Lecture Notes in Networks and Systems 572
Ashish Khanna Zdzislaw Polkowski Oscar Castillo Editors
Proceedings of Data Analytics and Management ICDAM 2022
Lecture Notes in Networks and Systems Volume 572
Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Advisory Editors Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Turkey Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA Institute of Automation, Chinese Academy of Sciences, Beijing, China Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus Imre J. Rudas, Óbuda University, Budapest, Hungary Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
Ashish Khanna · Zdzislaw Polkowski · Oscar Castillo Editors
Proceedings of Data Analytics and Management ICDAM 2022
Editors Ashish Khanna Maharaja Agrasen Institute of Technology Rohini, Delhi, India
Zdzislaw Polkowski The Karkonosze University of Applied Sciences in Jelenia Góra Jelenia Góra, Poland
Oscar Castillo Tijuana Institute of Technology Tijuana, Baja California, Mexico
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-981-19-7614-8 ISBN 978-981-19-7615-5 (eBook) https://doi.org/10.1007/978-981-19-7615-5 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Dr. Ashish Khanna would like to dedicate this book to his mentors Dr. A. K. Singh and Dr. Abhishek Swaroop for their constant encouragement and guidance and his family members including his mother, wife, and kids. He would also like to dedicate this work to his (Late) father Sh. R. C. Khanna with folded hands for his constant blessings. Dr. Zdzislaw Polkowski would like to dedicate this book to his wife, daughter, and parents.
ICDAM-2022 Steering Committee Members
Patrons Prof. (Dr.) Wioletta Palczewska, Rector, The Karkonosze State University of Applied Sciences in Jelenia Góra, Poland Prof. (Dr.) Beata Telążka, Vice-Rector, The Karkonosze State University of Applied Sciences in Jelenia Góra, Poland
General Chairs Prof. Dr. Janusz Kacprzyk, Polish Academy of Sciences, Systems Research Institute, Poland Prof. Oscar Castillo, Tijuana Institute of Technology, Mexico
Honorary Chairs Prof. Dr. Aboul Ella Hassanien, Cairo University, Egypt Prof. Dr. Vaclav Snasel, Rector, VSB-Technical University of Ostrava, Czech Republic
Conference Chairs Dr. Magdalena Baczyńska, Dean, The Karkonosze State University of Applied Sciences in Jelenia Góra, Poland
Dr. Zdzislaw Polkowski, Adjunct Professor KPSW, The Karkonosze State University of Applied Sciences in Jelenia Góra, Poland Prof. Dr. Abhishek Swaroop, Bhagwan Parshuram Institute of Technology, Delhi, India Dr. Salama A. Mostafa, Universiti Tun Hussein Onn Malaysia
Technical Program Chairs Prof. Joel J. P. C. Rodrigues, Federal University of Piauí (UFPI), Teresina, PI, Brazil Dr. Ali Kashif Bashir, Manchester Metropolitan University, UK Prof. Joanna Paliszkiewicz, Warsaw University of Life Science Management Institute, Poland Prof. Dr. Anil K. Ahlawat, KIET Group of Institutes, Ghaziabad, India
Conveners Dr. Deepak Gupta, Maharaja Agrasen Institute of Technology (GGSIPU), New Delhi, India Dr. Utku Kose, Suleyman Demirel University, Isparta, Turkey, Europe
Publication Chairs Dr. Vicente García Díaz, University of Oviedo, Spain Dr. Ashish Khanna, Maharaja Agrasen Institute of Technology (GGSIPU), New Delhi, India Prof. Adriana Burlea-Schiopoiu, University of Craiova, Romania
Publicity Chairs Dr. Józef Zaprucki, Prof. KPSW, Rector’s Proxy for Foreign Affairs, The Karkonosze State University of Applied Sciences in Jelenia Góra, Poland Syed Bilal Hussain Shah, School of Computing and Mathematics, Manchester Metropolitan University, UK Ashraf Elnagar, University of Sharjah, UAE
Co-conveners Dr. Prayag Tiwari, Aalto University, Finland, Europe Mr. Moolchand Sharma, Maharaja Agrasen Institute of Technology, India
Preface
We are delighted to announce that The Karkonosze University of Applied Sciences, Poland, in collaboration with the University of Craiova, Romania, Warsaw University of Life Sciences, Poland, and Tun Hussein Onn University, Malaysia, hosted the eagerly awaited and much coveted International Conference on Data Analytics and Management (ICDAM-2022). The third edition of the conference attracted a diverse range of engineering practitioners, academicians, scholars, and industry delegates, with the submitted abstracts involving more than 1500 authors from different parts of the world. The committee of professionals dedicated to the conference strove to achieve a high-quality technical program with tracks on data analytics, data management, big data, computational intelligence, and communication networks. All the tracks chosen for the conference are interrelated and very popular among the present-day research community, and a great deal of research is happening in these tracks and their related sub-areas. More than 370 full-length papers were received, with contributions focused on theoretical work, computer simulation-based research, and laboratory-scale experiments. Among these manuscripts, 71 papers have been included in the Springer proceedings after a thorough two-stage review and editing process. All the manuscripts submitted to ICDAM-2022 were peer reviewed by at least two independent reviewers, who were provided with a detailed review proforma. The comments from the reviewers were communicated to the authors, who incorporated the suggestions in their revised manuscripts. The recommendations from both reviewers were taken into consideration while selecting a manuscript for inclusion in the proceedings. The exhaustiveness of the review process is evident, given the large number of articles received addressing a wide range of research areas. The stringent review process ensured that each published manuscript met rigorous academic and scientific standards. It is an exalting experience to finally see these contributions materialize into the two book volumes of the ICDAM proceedings, published by Springer and entitled "Proceedings of Data Analytics and Management: ICDAM-2022." ICDAM-2022 invited three keynote speakers, who are eminent researchers in the field of computer science and engineering, from different parts of the world. In
addition to the plenary sessions on each day of the conference, seven concurrent technical sessions were held every day to ensure the oral presentation of around 71 accepted papers. Keynote speakers and session chair(s) for each of the concurrent sessions were leading researchers from the thematic area of the session. The delegates were provided with a book of extended abstracts so that they could quickly browse through the contents and participate in the presentations, giving the work exposure to a broad audience. The research part of the conference was organized in a total of 15 special sessions. These special sessions provided the opportunity for researchers conducting research in specific areas to present their results in a more focused environment. An international conference of such magnitude and the release of the ICDAM-2022 proceedings by Springer have been the remarkable outcome of the untiring efforts of the entire organizing team. The success of an event undoubtedly involves the painstaking efforts of several contributors at different stages, dictated by their devotion and sincerity. Fortunately, since the beginning of its journey, ICDAM-2022 has received support and contributions from every corner. We thank all who have wished the best for ICDAM-2022 and contributed by any means toward its success. The edited proceedings volumes by Springer would not have been possible without the perseverance of all the steering, advisory, and technical program committee members. The organizers of ICDAM-2022 owe thanks to all the contributing authors for their interest and exceptional articles. We would also like to thank the authors of the papers for adhering to the time schedule and for incorporating the review comments. We wish to extend our heartfelt acknowledgment to the authors, peer reviewers, committee members, and production staff whose diligent work gave shape to the ICDAM-2022 proceedings. We especially want to thank our dedicated team of peer reviewers who volunteered for the arduous and tedious task of quality checking and critiquing the submitted manuscripts. We wish to thank our faculty colleague Mr. Moolchand Sharma for extending his enormous assistance during the conference. The time he spent and the midnight oil he burnt are greatly appreciated, and we will ever remain indebted to him. The management, faculty, administrative, and support staff of the college have always extended their services whenever needed, for which we remain thankful to them. Lastly, we would like to thank Springer for accepting our proposal to publish the ICDAM-2022 conference proceedings. The help received from Mr. Aninda Bose, the senior acquisition editor, in the process has been very useful. New Delhi, India
Ashish Khanna
Deepak Gupta
Organizers, ICDAM-2022
Contents
Penetration Testing in Application Using TestNG Tool . . . . . . . . . . . . . . . . 1
Bhawna Sharma and Rahul Johari
Empirical Analysis of Psychological Well-Being of Students During the Pandemic with Rebooted Remote Learning Mode . . . . . . . . . . 13
Akshi Kumar, Kapil Sharma, and Aditi Sharma
Handwritten Digit Recognition Using Machine Learning . . . . . . . . . . . . . . 31
Mayank Sharma, Pradhyuman Singh Sindal, and M. Baskar
A Smart Movie Recommendation System Using Machine Learning Predictive Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Pranshi Verma, Preeti Gupta, and Vijai Singh
A Strategy to Accelerate the Inference of a Complex Deep Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
P. Haseena Rahmath, Vishal Srivastava, and Kuldeep Chaurasia
Predicting Aramco's IPO Long-Term Performance During COVID Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Mohammad Imdadul Haque, Master Prince, and Abdul Rahman Shaik
Development of a Transdisciplinary Role Concept for the Process Chain of Industrial Data Science . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Jörn Schwenken, Christopher Klupak, Marius Syberg, Nikolai West, Felix Walker, and Jochen Deuse
Convolutional Neural Network-Based Lung Cancer Nodule Detection Based on Computer Tomography . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Ahmed Hamid Ahmed, Hiba Basim Alwan, and Muhammet Çakmak
D-Test: Decentralized Application for Preventing Fake COVID-19 Test Certificate Scam by Blockchain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 Himani Mishra and Amita Jain
A Novel QIA Protocol Based on Bell States Position by Random Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 B. Devendar Rao and Ramkumar Jayaraman Some Methods for Digital Image Forgery Detection and Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 Ankit Kumar Jaiswal, Shiksha Singh, Santosh Kr. Tripathy, Nirbhay Kr. Tagore, and Arya Shahi A Novel Approach to Visual Linguistics by Assessing Multi-level Language Substructures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 Monika Arora, Pooja Mudgil, Rajat Kumar, Tarushi Kapoor, Rishabh Gupta, and Ankit Agnihotri Sentiment Analysis on Amazon Product Review: A Comparative Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 Shivani Tufchi, Ashima Yadav, Vikash Kumar Rai, and Avishek Banerjee GAER-UWSN: Genetic Algorithm-Based Energy-Efficient Routing Protocols in Underwater Wireless Sensor Networks . . . . . . . . . . . 151 Mohit Sajwan, Shivam Bhatt, Kanav Arora, and Simranjit Singh Occlusion Reconstruction for Person Re-identification . . . . . . . . . . . . . . . . 161 Nirbhay Kumar Tagore, Ramakant Kumar, Naina Yadav, and Ankit Kumar Jaiswal Analysis and Prediction of Purchase Intention of Online Customers with Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173 Megha Bansal and Vaibhav Vyas Create and Develop a Management System for Cardiovascular Clinics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183 Mohamed A. Fadhel, Haider Hussein Ayaall, Zainab Yasir Hanuyt, and Hussein Hamad Hussein Event Detection on Social Data Streams Using Hybrid-Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197 Mohammed Ali Mohammed and Narjis Mezaal Shati Automated Machine Learning Deployment Using Open-Source CI/CD Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209 Ashish Singh Parihar, Umesh Gupta, Utkarsh Srivastava, Vishal Yadav, and Vaibhav Kumar Trivedi Movie Recommendation System Using Machine Learning and MERN Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 Shikhar Gupta, Dhruv Rawat, Kanishk Gupta, Ashok Kumar Yadav, Rashmi Gandhi, and Aakanshi Gupta
A Modified Newman-Girvan Technique for Community Detection . . . . . 233 Samya Muhuri and Deepika Vatsa Mental Stress Level Detection Using LSTM for WESAD Dataset . . . . . . . 243 Lokesh Malviya, Sandip Mal, Radhikesh Kumar, Bishwajit Roy, Umesh Gupta, Deepika Pantola, and Madhuri Gupta Movie Tag Prediction System Using Machine Learning . . . . . . . . . . . . . . . . 251 Vivek Mehta, Tanya Singh, K. Tarun Kumar Reddy, V. Bhanu Prakash Reddy, and Chirag Jain Online Recommendation System Using Collaborative Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267 S. B. Goyal, Kamarolhizam Bin Besah, and Ashish Khanna Image Encryption Based on Cyclic Chaos, PRNG and Arnold’s Cat Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281 Dibyasha Das and Chittaranjan Pradhan Heartbeat Classification Using Sequential Method . . . . . . . . . . . . . . . . . . . . 293 Rajesh Kumar Shrivastava, Simar Preet Singh, Avishek Banerjee, and Gagandeep Kaur Deep Neural Ideal Networks for Brain Tumour Image Segmentation . . . 301 Sadeq Thamer Hlama, Salam Abdulabbas Ghanim, Hayder Rahm Dakheel, and Shaimaa Hadi Mohammed Exploring Correlation of Deep Topic Models Using Structured Topic Coherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315 G. S. Mahalakshmi, S. Hemadharsana, K. Srividhyasaradha, S. Sendhilkumar, and C. Sushant Value-Added Tax Fraud Detection and Anomaly Feature Selection Using Sectorial Autoencoders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323 Nasser A. Alsadhan Hybrid Intrusion Detection System Using Machine Learning Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333 N. Maheswaran, S. Bose, G. Logeswari, and T. Anitha A Comprehensive Review of CNN-Based Sign Language Translation System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347 Seema and Priti Singla HMM-Assisted Proactive Vulnerability Mitigation in Virtualization Datacenter Though Controlled VM Placement . . . . . . . . . . . . . . . . . . . . . . . 363 J. Manikandan and Uppalapati SriLaskhmi ViDepBot: Assist People to Tackle Depression Due to COVID Using AI Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379 Jiss Joseph Thomas and D. Venkataraman
A Review on Prevalence of Worldwide COPD Situation . . . . . . . . . . . . . . . 391 Akansha Singh, Nupur Prakash, and Anurag Jain Optimal Decision Making to Select the Best Suppliers Using Integrating AHP-TOPSIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407 Zahra M. Nizar, Watheq H. Laith, and Ahmed K. Al-Najjar Drug Discovery Analysis Using Machine Learning Bioinformatics . . . . . . 419 S. Prabha, S. Sasikumar, S. Surendra, P. Chennakeshava, and Y. Sai Mohan Reddy Effect of GloVe, Word2Vec and FastText Embedding on English and Hindi Neural Machine Translation Systems . . . . . . . . . . . . . . . . . . . . . . 433 Sitender, Sangeeta, N. Sudha Sushma, and Saksham Kumar Sharma Resource Provisioning Aspects of Reserved and On-Demand VMs in Cloud Computing Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449 Yogesh Kumar, Jitender Kumar, and Poonam Sheoran Open-Source Simulators for Drone-Assisted Vehicular Ad Hoc Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461 Santosh Kumar, Amol Vasudeva, and Manu Sood A Comparative Analysis of Various Methods for Attendance Framework Based on Real-Time Face Recognition Technology . . . . . . . . 477 A. M. Jothi, Sandeep Kumar Satapathy, and Shruti Mishra Identification of Efficient Industrial Robot Selection (IRS) Methods and Their Performance Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 495 Sasmita Nayak, Neeraj Kumar, and B. B. Choudhury Brain Tumor Diagnosis Using K-Means and Morphological Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507 Jadhav Jaichandra, P. Hari Charan, and Shashi Mehrotra Anomaly Detection Techniques in Intelligent Surveillance Systems . . . . . 517 Viean Fuaad Abd Al-Rasheed and Narjis Mezaal Shati Effective Detection of DDoS Attack in IoT-Based Networks Using Machine Learning with Different Feature Selection Techniques . . . . . . . . 527 Akash Deep and Manu Sood A Collaborative Destination Recommender Model in Dravidian Language by Social Media Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541 Muneer V. K. and K. P. Mohamed Basheer Analysis of Influential Features with Spectral Features for Modeling Dialectal Variation in Malayalam Speech Using Deep Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553 Rizwana Kallooravi Thandil and K. P. Mohamed Basheer
Effect of Feature Selection Techniques on Machine Learning-Based Prediction Models: A Case Study on DDoS Attack in IoT-Based Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567 Shavnam and Manu Sood Optimal Feature Selection of Web Log Data Using Optimization Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581 Meena Siwach and Suman Mann Colorizing Black and White Images Using Deep ConvNets and GANs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593 Savita Ahlawat, Amit Choudhary, Chirag Wadhwa, Hardik Joshi, and Rohit Shokeen Water Potability Prediction on Crops Considering pH, Chloramine, and Lead Content Using Support Vector Machine . . . . . . . . 609 V. Varsha, R. Shree Kriti, and Sekaran Kripa CRACLE: Customer Resource Allocation in CLoud Environment . . . . . 621 Siya Garg, Rahul Johari, Vinita Jindal, and Deo Prakash Vidyarthi Detection SQL Injection Attacks Against Web Application by Using K-Nearest Neighbors with Principal Component Analysis . . . . 631 Ammar Hatem Farhan and Rehab Flaih Hasan Certain Investigations on Ensemble Learning and Machine Learning Techniques with IoT in Secured Cloud Service Provisioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643 S. Sivakamasundari and K. Dharmarajan CAPTCHA-Based Image Steganography to Achieve User Authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 659 Jayeeta Majumder and Chittaranjan Pradhan A Systematic Review on Deepfake Technology . . . . . . . . . . . . . . . . . . . . . . . . 669 Ihtiram Raza Khan, Saman Aisha, Deepak Kumar, and Tabish Mufti Predicting Stock Market Price Using Machine Learning Techniques . . . 687 Padmalaya Nayak, K. Srinivasa Nihal, Y. Tagore Ashish, M. Sai Bhargav, and K. Saketh Kumar Hyper Lattice Structure for Data Cube Computation . . . . . . . . . . . . . . . . . 697 Ajay Kumar Phogat and Suman Mann Malicious Network Traffic Detection in Internet of Things Using Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 707 Manjula Ramesh Bingeri, Sivaraman Eswaran, and Prasad Honnavalli
Emotion Recognition from Speech Using Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 719 Bayan Mahfood, Ashraf Elnagar, and Firuz Kamalov Performance Evaluation of Contextualized Arabic Embeddings: The Arabic Sentiment Analysis Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 733 Fatima Dakalbab and Ashraf Elnagar Estimating Human Running Indoor Based on the Speed of Human Detection by Using OpenPose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 749 Mohammed Abduljabbar Ali, Abir Jaafar Hussain, and Ahmed T. Sadiq Enhanced Multi-label Classification Model for Bully Text Using Supervised Learning Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 763 V. Indumathi and S. Santhana Megala Prediction of Donor–Recipient Matching in Liver Transplantation Using Correlation Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 779 M. Usha Devi, A. Marimuthu, and S. Santhana Megala Blockchain for 5G-Enabled IoHT—A Framework for Secure Healthcare Automation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 793 Md Imran Alam, Md Oqail Ahmad, Shams Tabrez Siddiqui, Mohammad Rafeek Khan, Haneef Khan, and Khalid Ali Qidwai PCG Heart Sounds Quality Classification Using Neural Networks and SMOTE Tomek Links for the Think Health Project . . . . . . . . . . . . . . . 803 Carlos M. Huisa, C. Elvis Supo, T. Edward Figueroa, Jorge Rendulich, and Erasmo Sulla-Espinoza An Influential User Prediction in Social Network Using Centrality Measures and Deep Learning Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 813 P. Jothi and R. Padmapriya An Approach to Enhance the Character Recognition Accuracy of Nepalese License Plates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 831 Pankaj Raj Dawadi, Manish Pokharel, and Bal Krishna Bal Detection of Fake News by Machine Learning with Linear Classification Algorithms: A Comparative Study . . . . . . . . . . . . . . . . . . . . . 845 Heba Yousef Ateaa, Ali Hussein Hasan, and Ahmed Sabeeh Ali An Ensemble Approach for Aspect-Level Sentiment Classification Using Deep Learning Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 861 Tanu Sharma and Kamaldeep Kaur A Novel Clustering Approach in Wireless Sensor Networks Using Artificial Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 873 Suman Devi and Avadhesh Kumar
Analysis on Exposition of Speech Type Video Using SSD and CNN Techniques for Face Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 883 Nagendar Yamsani, Sk. Hasane Ahammad, Ahmed J. Obaid, K. Saikumar, Amer Hasan Alshathr, and Zainab Saadi Mahdi Ali Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 895
Editors and Contributors
About the Editors

Ashish Khanna has 16 years of expertise in Teaching, Entrepreneurship, and Research and Development. He received his Ph.D. degree from the National Institute of Technology, Kurukshetra. He completed his M.Tech. and B.Tech. from GGSIPU, Delhi. He completed his postdoc at the Internet of Things Lab at Inatel, Brazil, and the University of Valladolid, Spain. He has published around 50 SCI-indexed papers in IEEE Transactions, Springer, Elsevier, Wiley and many more reputed journals, with a cumulative impact factor of above 100. He has around 100 research articles in top SCI/Scopus journals, conferences and book chapters. He is co-author of around 20 edited books and textbooks. His research interests include Distributed Systems, MANET, FANET, VANET, IoT, Machine Learning and many more. He is the originator of Bhavya Publications and Universal Innovator Lab. Universal Innovator is actively involved in research, innovation, conferences, startup funding events and workshops. He has served the research field as a Keynote Speaker/Faculty Resource Person/Session Chair/Reviewer/TPC Member and through post-doctoral supervision. He is the convener and organizer of the ICICC conference series. He is currently working at the Department of Computer Science and Engineering, Maharaja Agrasen Institute of Technology, under GGSIPU, Delhi, India. He is also serving as a series editor with the Elsevier and De Gruyter publishing houses.

Zdzislaw Polkowski is Professor of UJW at the Faculty of Technical Sciences and Rector's Representative for International Cooperation and the Erasmus+ Program at the Jan Wyzykowski University, Polkowice. Since 2019 he has also been an Adjunct Professor in the Department of Business Intelligence in Management, Wroclaw University of Economics and Business, Poland. Moreover, he is a visiting professor at the University of Pitesti, Romania, and an adjunct professor at Marwadi University, India. He is the former dean of the Technical Sciences Faculty during the period 2009–2012 at UZZM in Lubin. He holds a Ph.D. degree in Computer Science and Management from Wroclaw University of Technology, a Post Graduate degree in Microcomputer
Systems in Management from University of Economics in Wroclaw and Post Graduate degree IT in Education from Economics University in Katowice. He obtained his Engineering degree in Computer Systems in Industry from Technical University of Zielona Gora. He is co-editor of 4 books and guest editor of 3 journals. He has published more than 75 papers in journals, 25 conference proceedings, including more than 20 papers in journals indexed in the Web of Science, Scopus, IEEE. He served as a member of Technical Program Committee in many International conferences in Poland, India, China, Iran, Romania and Bulgaria. Till date he has delivered 24 invited talks at different international conferences across various countries. He is also the member of the Board of Studies and expert member of the doctoral research committee in many universities in India. He is also the member of the editorial board of several journals and served as a reviewer in a wide range of international journals. His area of interests includes IT in Business, IoT in Business and Education Technology. He has successfully completed a research project on Developing the innovative methodology of teaching Business Informatics funded by the European Commission. He also owns an IT SME consultancy company in Polkowice and Lubin, Poland. Oscar Castillo holds the Doctor in Science degree (Doctor Habilitatus) in Computer Science from the Polish Academy of Sciences (with the Dissertation “Soft Computing and Fractal Theory for Intelligent Manufacturing”). He is a Professor of Computer Science in the Graduate Division, Tijuana Institute of Technology, Tijuana, Mexico. In addition, he is serving as Research Director of Computer Science and head of the research group on Hybrid Fuzzy Intelligent Systems. Currently, he is President of HAFSA (Hispanic American Fuzzy Systems Association) and Past President of IFSA (International Fuzzy Systems Association). Prof. Castillo is also Chair of the Mexican Chapter of the Computational Intelligence Society (IEEE). He also belongs to the Technical Committee on Fuzzy Systems of IEEE and to the Task Force on “Extensions to Type-1 Fuzzy Systems”. He is also a member of NAFIPS, IFSA and IEEE. He belongs to the Mexican Research System (SNI Level 3). His research interests are in Type-2 Fuzzy Logic, Fuzzy Control, Neuro-Fuzzy and Genetic-Fuzzy hybrid approaches. He has published over 300 journal papers, 10 authored books, 50 edited books, 300 papers in conference proceedings, and more than 300 chapters in edited books, in total more than 998 publications (according to Scopus) with h index of 80 according to Google Scholar. He has been Guest Editor of several successful Special Issues in the past, like in the following journals: Applied Soft Computing, Intelligent Systems, Information Sciences, Soft Computing, Non-Linear Studies, Fuzzy Sets and Systems, JAMRIS and Engineering Letters. He is currently Associate Editor of the Information Sciences Journal, Journal of Engineering Applications on Artificial Intelligence, International Journal of Fuzzy Systems, Journal of Complex and Intelligent Systems, Granular Computing Journal and Intelligent Systems Journal (Wiley). He was Associate Editor of Journal of Applied Soft Computing and IEEE Transactions on Fuzzy Systems. He has been elected IFSA Fellow in 2015 and MICAI Fellow in 2016. Finally, he recently received the Recognition as Highly Cited Researcher in 2017 and 2018 by Clarivate Analytics and Web of Science.
Contributors Ankit Agnihotri Department of Information Technology, Bhagwan Parshuram Institute of Technology, GGSIPU, Delhi, India Sk. Hasane Ahammad Department of ECE, Koneru Lakshmaiah Education Foundation, Guntur, India Savita Ahlawat Maharaja Surajmal Institute of Technology, GGSIP University, New Delhi, India Md Oqail Ahmad Department of Computer Applications, B.S Abdur Rahman Crescent Institute of Science and Technology, Chennai, Tamil Nadu, India Ahmed Hamid Ahmed Departement of Computer Science, University of Technology, Baghdad, Iraq Saman Aisha Department of Computer Science and Engineering, School of Engineering Sciences and Technology, Jamia Hamdard, New Delhi, India Ahmed K. Al-Najjar Ministry of Higher Education and Scientific Research, Bagdad, Iraq Viean Fuaad Abd Al-Rasheed Department of Computer Science, College of Sciences, Mustansiriyah University, Baghdad, Iraq Md Imran Alam Department of Computer and Network Engineering, Jazan University, Jazan, Saudi Arabia Ahmed Sabeeh Ali Iraqi Ministry of Interior, Baghdad, Iraq Mohammed Abduljabbar Ali Computer Sciences Department, University of Technology, Baghdad, Iraq Zainab Saadi Mahdi Ali Ashur University College, Baghdad, Iraq Nasser A. Alsadhan College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia Amer Hasan Alshathr Al-Esraa University College, Baghdad, Iraq Hiba Basim Alwan Departement of Computer Science, University of Technology, Baghdad, Iraq T. Anitha Department of Computer Science and Engineering, College of Engineering Guindy, Anna University, Chennai, India Kanav Arora Bennett University, Greater Noida, UP, India Monika Arora Department of Information Technology, Bhagwan Parshuram Institute of Technology, GGSIPU, Delhi, India
Heba Yousef Ateaa College of Computer Science and Information Technology, University of Sumer, Rifai, Iraq Haider Hussein Ayaall College of Computer Science and Information Technology, University of Sumer, Thi Qar, Iraq Bal Krishna Bal Department of Computer Science and Engineering, Kathmandu University, Dhulikhel, Kavre, Nepal Avishek Banerjee CSBS, Asansol Engineering College, Asansol, West Bengal, India Megha Bansal Department of Computer Science, Faculty of Mathematics and Computing, Banasthali Vidyapith, Aliyabad, Rajasthan, India M. Baskar Associate Professor, Department of Computing Technologies, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu, Tamil Nadu, India Kamarolhizam Bin Besah City University, Petaling Jaya, Malaysia Shivam Bhatt Bennett University, Greater Noida, UP, India Manjula Ramesh Bingeri Department of Computer Science, PES University, Bengaluru, India S. Bose Department of Computer Science and Engineering, College of Engineering Guindy, Anna University, Chennai, India Muhammet Çakmak Department of Electrical-Electronics Engineering, Karabuk University, Karabuk, Turkey
Kuldeep Chaurasia Bennett University, Greater Noida, Uttar Pradesh, India P. Chennakeshava Department of ECE, Hindustan Institute of Technology and Science, Chennai, India Amit Choudhary Maharaja Surajmal Institute, GGSIP University, New Delhi, India B. B. Choudhury Mechanical Engineering Department, Indira Gandhi Institute of Technology Sarang, Dhenkana, Odisha, India Fatima Dakalbab Department of Computer Science, College of Computing and Informatics, University of Sharjah, Sharjah, UAE Hayder Rahm Dakheel University of Sumer, Rifai, Iraq Dibyasha Das Kalinga Institute of Industrial Technology, Bhubaneswar, India Pankaj Raj Dawadi Department of Computer Science and Engineering, Kathmandu University, Dhulikhel, Kavre, Nepal
Akash Deep Department of Computer Science, Himachal Pradesh University, Shimla, India Jochen Deuse Institute of Production Systems, Technical University Dortmund, Dortmund, Germany; Center for Advanced Manufacturing, University of Technology Sydney, Ultimo, NSW, Australia B. Devendar Rao Department of Computing Technologies, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai, India M. Usha Devi PG and Research Department of Computer Science, Department of Computer Science, Government Arts College, Coimbatore, Tamilnadu, India Suman Devi School of Computing Science and Engineering, Galgotias University, Greater Noida, India K. Dharmarajan Department of Information Technology, Vels Institute of Science and Technology and Advanced Studies (VISTAS), Chennai, Tamil Nadu, India T. Edward Figueroa Electronic Engineering, Faculty of Engineering, Production and Services, Universidad Nacional de San Agustín de Arequipa, Arequipa, Peru Ashraf Elnagar Department of Computer Science, College of Computing and Informatics, University of Sharjah, Sharjah, UAE C. Elvis Supo Electronic Engineering, Faculty of Engineering, Production and Services, Universidad Nacional de San Agustín de Arequipa, Arequipa, Peru Sivaraman Eswaran Department Bengaluru, India
of Computer Science, PES University,
Mohamed A. Fadhel College of Computer Science and Information Technology, University of Sumer, Thi Qar, Iraq Ammar Hatem Farhan Computer Sciences Department, University of Technology, Baghdad, Iraq Rashmi Gandhi Department of Computer Science and Engineering, Amity School of Engineering and Technology, Amity University Uttar Pradesh, Noida, India Siya Garg Department of Computer Science, Keshav Mahavidyalaya, University of Delhi, Delhi, India Salam Abdulabbas Ghanim Dhi Qar Education Directorate, Nasiriyah, Iraq S. B. Goyal City University, Petaling Jaya, Malaysia Aakanshi Gupta Department of Computer Science and Engineering, Amity School of Engineering and Technology, Amity University Uttar Pradesh, Noida, India Kanishk Gupta Department of Computer Science and Engineering, Amity School of Engineering and Technology, Amity University Uttar Pradesh, Noida, India
Madhuri Gupta School of CSET, Bennett University, Greater Noida, UP, India Preeti Gupta Department of Computer Science and Engineering, Inderprastha Engineering College, Ghaziabad, India Rishabh Gupta Department of Information Technology, Bhagwan Parshuram Institute of Technology, GGSIPU, Delhi, India Shikhar Gupta Department of Computer Science and Engineering, Amity School of Engineering and Technology, Amity University Uttar Pradesh, Noida, India Umesh Gupta Department of Computer Science and Engineering, Bennett University, Noida, India; School of CSET, Bennett University, Greater Noida, UP, India Zainab Yasir Hanuyt College of Computer Science and Information Technology, University of Sumer, Thi Qar, Iraq Mohammad Imdadul Haque Department of Economics, Faculty of Social Science, Aligarh Muslim University, Aligarh, India P. Hari Charan Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India Ali Hussein Hasan College of Computer Science and Information Technology, University of Sumer, Rifai, Iraq Rehab Flaih Hasan Computer Sciences Department, University of Technology, Baghdad, Iraq P. Haseena Rahmath Bennett University, Greater Noida, Uttar Pradesh, India S. Hemadharsana Department of Computer Science and Engineering, Anna University, Chennai, Tamil Nadu, India Sadeq Thamer Hlama College of Science, University of Sumer, Rifai, Iraq Prasad Honnavalli Department of Computer Science, PES University, Bengaluru, India Carlos M. Huisa Electronic Engineering, Faculty of Engineering, Production and Services, Universidad Nacional de San Agustín de Arequipa, Arequipa, Peru Abir Jaafar Hussain School of Computer Sciences and Mathematics, Liverpool John Moores University, Liverpool, England Hussein Hamad Hussein College of Computer Science and Information Technology, University of Sumer, Thi Qar, Iraq V. Indumathi School of Computer Studies, RVS College of Arts and Science, Coimbatore, India Jadhav Jaichandra Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India
Amita Jain Netaji Subhas University of Technology, Delhi, India Anurag Jain USCIT, GGSIPU, New Delhi, Delhi, India Chirag Jain School of Computer Science Engineering and Technology, Bennett University, Greater Noida, India Ankit Kumar Jaiswal School of CSE & Technology, Bennett University, Greater Noida, India Ramkumar Jayaraman Department of Computing Technologies, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai, India Vinita Jindal Department of Computer Science, Keshav Mahavidyalaya, University of Delhi, Delhi, India Rahul Johari SWINGER: Security, Wireless, IoT Network Group of Engineering and Research, University School of Automation and Robotics (USAR), Guru Gobind Singh Indraprastha University, Delhi, India; SWINGER: Security, Wireless, IoT Network Group of Engineering and Research, University School of Information, Communication and Technology (USICT), Guru Gobind Singh Indraprastha University, Delhi, India Hardik Joshi Maharaja Surajmal Institute of Technology, GGSIP University, New Delhi, India A. M. Jothi School of Computer Science and Engineering, Vellore Institute of Technology University, Chennai, India P. Jothi School of Computer Studies, Rathnavel Subaramaniam College of Arts and Science, Coimbatore, Tamilnadu, India Firuz Kamalov Department of Electrical Engineering, Canadian University Dubai, Dubai, UAE Tarushi Kapoor Department of Information Technology, Bhagwan Parshuram Institute of Technology, GGSIPU, Delhi, India Gagandeep Kaur School of Computer Science Engineering and Technology (SCSET), Bennett University, Greater Noida, India Kamaldeep Kaur University School of Information, Communication and Technology, Guru Gobind Singh Indraprastha University, Dwarka, Delhi, India Haneef Khan Department of Computer and Network Engineering, Jazan University, Jazan, Saudi Arabia Ihtiram Raza Khan Department of Computer Science and Engineering, School of Engineering Sciences and Technology, Jamia Hamdard, New Delhi, India Mohammad Rafeek Khan Department of Computer and Network Engineering, Jazan University, Jazan, Saudi Arabia
Ashish Khanna Maharaja Agrasen Institute of Technology (GGSIPU), New Delhi, India Christopher Klupak Institute for Vocational and Business Education, University Hamburg, Hamburg, Germany; Institute of Technical Didactics, Technical University Kaiserslautern, Kaiserslautern, Germany Sekaran Kripa St Joseph's College of Engineering, Chennai, India Akshi Kumar Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, UK
Avadhesh Kumar School of Computing Science and Engineering, Galgotias University, Greater Noida, India Deepak Kumar Amity Institute of Information Technology, Amity University, Noida, India Jitender Kumar Computer Science and Engineering Department, DCRUST Murthal, Sonipat, Haryana, India Neeraj Kumar Department of Mechanical Engineering, Suresh Gyan Vihar University, Jaipur, India Radhikesh Kumar Department of CSE NIT, Patna, Bihar, India Rajat Kumar Department of Information Technology, Bhagwan Parshuram Institute of Technology, GGSIPU, Delhi, India Ramakant Kumar CSE Department IIT (BHU), Varanasi, India Santosh Kumar Department of Computer Science, Himachal Pradesh University, Shimla, India Yogesh Kumar Computer Science and Engineering Department, DCRUST Murthal, Sonipat, Haryana, India Watheq H. Laith University of Sumer, Thi-Qar, Rifai, Iraq G. Logeswari Department of Computer Science and Engineering, College of Engineering Guindy, Anna University, Chennai, India G. S. Mahalakshmi Department of Computer Science and Engineering, Anna University, Chennai, Tamil Nadu, India N. Maheswaran Department of Computer Science and Engineering, College of Engineering Guindy, Anna University, Chennai, India Bayan Mahfood Department of Computer Science, College of Computing and Informatics, University of Sharjah, Sharjah, UAE Jayeeta Majumder KIIT University, Bhubaneswar, Odisha, India
Sandip Mal School of SCSE, VIT, Bhopal, MP, India Lokesh Malviya School of SCSE, VIT, Bhopal, MP, India J. Manikandan Vignan’s Foundation for Science, Technology and Research (Deemed to be University), Vadlamudi, Guntur, Andhra Pradesh, India Suman Mann Department of Information Technology, Maharaja Surajmal Institute of Technology, GGSIPU, New Delhi, India A. Marimuthu Department of Computer Science, Government Arts and Science College, Coimbatore, India S. Santhana Megala School of Computer Studies, RVS College of Arts and Science, Coimbatore, India Shashi Mehrotra Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India Vivek Mehta School of Computer Science Engineering and Technology, Bennett University, Greater Noida, India Himani Mishra Netaji Subhas University of Technology, Delhi, India Shruti Mishra School of Computer Science and Engineering, Vellore Institute of Technology University, Chennai, India K. P. Mohamed Basheer Research Department of Computer Science, Sullamussalam Science College, University of Calicut, Kerala, India Mohammed Ali Mohammed Department of Computer Science, College of Sciences, Mustansiriyah University, Baghdad, Iraq Shaimaa Hadi Mohammed College of Computer Science and Information Technology, Sumer University, Rifai, Iraq Pooja Mudgil Department of Information Technology, Bhagwan Parshuram Institute of Technology, GGSIPU, Delhi, India Tabish Mufti Department of Computer Science and Engineering, School of Engineering Sciences and Technology, Jamia Hamdard, New Delhi, India Samya Muhuri Thapar Institute of Engineering & Technology, Patiala, Punjab, India V. K. Muneer Research Department of Computer Science, Sullamussalam Science College, University of Calicut, Kerala, India Padmalaya Nayak Gokaraju Lailavathi Womens Engineering College, Hyderabad, India Sasmita Nayak Department of Mechanical Engineering, Government College of Engineering, Bhawanipatna, Odisha, India
Zahra M. Nizar University of Sumer, Thi-Qar, Rifai, Iraq Ahmed J. Obaid Faculty of Computer Science and Mathematics, University of Kufa, Kufa, Iraq R. Padmapriya School of Computer Studies, Rathnavel Subaramaniam College of Arts and Science, Coimbatore, Tamilnadu, India Deepika Pantola School of CSET, Bennett University, Greater Noida, UP, India Ashish Singh Parihar Department of Computer Science & Engineering and Information Technology, Jaypee Institute of Information Technology, Noida-Sector (62), Uttar Pradesh, India Ajay Kumar Phogat Maharaja Surajmal Institute of Technology, GGSIPU, New Delhi, India Manish Pokharel Department of Computer Science and Engineering, Kathmandu University, Dhulikhel, Kavre, Nepal S. Prabha Department of ECE, Hindustan Institute of Technology and Science, Chennai, India Chittaranjan Pradhan Kalinga Institute of Industrial Technology, Bhubaneswar, Odisha, India Nupur Prakash USCIT, GGSIPU, New Delhi, Delhi, India Master Prince Department of Computer Science, Qassim University, Mulaydha, Saudi Arabia Khalid Ali Qidwai Department of Computer Science, Jazan University, Jazan, Saudi Arabia Vikash Kumar Rai Bennett University, Greater Noida, Uttar Pradesh, India Dhruv Rawat Department of Computer Science and Engineering, Amity School of Engineering and Technology, Amity University Uttar Pradesh, Noida, India K. Tarun Kumar Reddy School of Computer Science Engineering and Technology, Bennett University, Greater Noida, India V. Bhanu Prakash Reddy School of Computer Science Engineering and Technology, Bennett University, Greater Noida, India Y. Sai Mohan Reddy Department of ECE, Hindustan Institute of Technology and Science, Chennai, India Jorge Rendulich Electronic Engineering, Faculty of Engineering, Production and Services, Universidad Nacional de San Agustín de Arequipa, Arequipa, Peru Bishwajit Roy School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India
Ahmed T. Sadiq Computer Sciences Department, University of Technology, Baghdad, Iraq M. Sai Bhargav Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, India K. Saikumar Malla Reddy University, Hyderabad, Telangana, India Mohit Sajwan Bennett University, Greater Noida, UP, India K. Saketh Kumar Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, India Sangeeta Computer Science and Engineering, Maharaja Surajmal Institute of Technology, Delhi, India S. Sasikumar Department of ECE, Hindustan Institute of Technology and Science, Chennai, India Sandeep Kumar Satapathy School of Computer Science and Engineering, Vellore Institute of Technology University, Chennai, India Jörn Schwenken Institute of Production Systems, Technical University Dortmund, Dortmund, Germany Seema Department of Computer Science and Engineering, Baba Mastnath University, Rohtak, Haryana, India S. Sendhilkumar Department of Information Science and Technology, Anna University, Chennai, Tamil Nadu, India Arya Shahi Banasthali Vidyapith, Jaipur, India Abdul Rahman Shaik College of Business Administration, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia Aditi Sharma Department of Computer Science and Engineering, Delhi Technological University, New Delhi, India Bhawna Sharma SWINGER: Security, Wireless, IoT Network Group of Engineering and Research University School of Information, Communication and Technology (USICT), Guru Gobind Singh Indraprastha University, Delhi, India Kapil Sharma Department of Information Technology, Delhi Technological University, New Delhi, India Mayank Sharma SRM Institute of Science and Technology, Kattankulathur, Chengalpattu, Tamil Nadu, India Saksham Kumar Sharma Information Technology, Maharaja Surajmal Institute of Technology, Delhi, India Tanu Sharma University School of Information, Communication and Technology, Guru Gobind Singh Indraprastha University, Dwarka, Delhi, India
Narjis Mezaal Shati Department of Computer Science, College of Sciences, Mustansiriyah University, Baghdad, Iraq Shavnam Department of Computer Science, Himachal Pradesh University, Shimla, India Poonam Sheoran Biomedical and Engineering Department, DCRUST Murthal, Sonipat, Haryana, India Rohit Shokeen Maharaja Surajmal Institute of Technology, GGSIP University, New Delhi, India R. Shree Kriti St Joseph’s College of Engineering, Chennai, India Rajesh Kumar Shrivastava School of Computer Science Engineering and Technology (SCSET), Bennett University, Greater Noida, India Shams Tabrez Siddiqui Department of Computer Science, Jazan University, Jazan, Saudi Arabia Pradhyuman Singh Sindal SRM Institute of Science and Technology, Kattankulathur, Chengalpattu, Tamil Nadu, India Akansha Singh USCIT, GGSIPU, New Delhi, Delhi, India Shiksha Singh Shambhunath Institute of Engineering and Technology, CSED, Prayagraj, India Simar Preet Singh School of Computer Science Engineering and Technology (SCSET), Bennett University, Greater Noida, India Simranjit Singh Bennett University, Greater Noida, UP, India Tanya Singh School of Computer Science Engineering and Technology, Bennett University, Greater Noida, India Vijai Singh Department of Computer Science and Engineering, Inderprastha Engineering College, Ghaziabad, India Priti Singla Department of Computer Science and Engineering, Baba Mastnath University, Rohtak, Haryana, India Sitender Information Technology, Maharaja Surajmal Institute of Technology, Delhi, India S. Sivakamasundari Department of Computer Science, Vels Institute of Science and Technology and Advanced Studies (VISTAS), Chennai, Tamil Nadu, India Meena Siwach Guru Gobind Singh Indraprastha University, Delhi, India; Department of Information Technology, Maharaja Surajmal Institute of Technology, Delhi, India Manu Sood Department of Computer Science, Himachal Pradesh University, Shimla, India
Uppalapati SriLaskhmi Vignan’s Foundation for Science, Technology and Research (Deemed to be University), Vadlamudi, Guntur, Andhra Pradesh, India K. Srinivasa Nihal Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, India Utkarsh Srivastava Department of Computer Science, KIET Group of Institutions, Delhi-NCR, Uttar Pradesh, Ghaziabad, India Vishal Srivastava Bennett University, Greater Noida, Uttar Pradesh, India K. Srividhyasaradha Department of Computer Science and Engineering, Anna University, Chennai, Tamil Nadu, India Erasmo Sulla-Espinoza Electronic Engineering, Faculty of Engineering, Production and Services, Universidad Nacional de San Agustín de Arequipa, Arequipa, Peru S. Surendra Department of ECE, Hindustan Institute of Technology and Science, Chennai, India C. Sushant Department of Computer Science and Engineering, Anna University, Chennai, Tamil Nadu, India N. Sudha Sushma Computer Science and Engineering, Maharaja Surajmal Institute of Technology, Delhi, India Marius Syberg Institute of Production Systems, Technical University Dortmund, Dortmund, Germany Y. Tagore Ashish Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, India Nirbhay Kr. Tagore School of CSE & Technology, Bennett University, Greater Noida, India Nirbhay Kumar Tagore School of CSET, Bennett University, Greater Noida, India Rizwana Kallooravi Thandil Sullamussalam Science College, University of Calicut, Kerala, India Jiss Joseph Thomas Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India Santosh Kr. Tripathy CSED, Indian Institute of Technology (BHU), Varanasi, India Vaibhav Kumar Trivedi Department of Computer Science, KIET Group of Institutions, Delhi-NCR, Uttar Pradesh, Ghaziabad, India Shivani Tufchi Bennett University, Greater Noida, Uttar Pradesh, India V. Varsha St Joseph’s College of Engineering, Chennai, India
Amol Vasudeva Department of Computer Science and Engineering, Jaypee University of Information Technology, Waknaghat, Solan, Himachal Pradesh, India Deepika Vatsa Bennett University, Greater Noida, UP, India D. Venkataraman Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India Pranshi Verma Department of Computer Science and Engineering, Inderprastha Engineering College, Ghaziabad, India Deo Prakash Vidyarthi Parallel and Distributed System Lab, School of Computer and System Sciences, JNU, Delhi, India Vaibhav Vyas Department of Computer Science, Faculty of Mathematics and Computing, Banasthali Vidyapith, Aliyabad, Rajasthan, India Chirag Wadhwa Maharaja Surajmal Institute of Technology, GGSIP University, New Delhi, India Felix Walker Institute for Vocational and Business Education, University Hamburg, Hamburg, Germany Nikolai West Institute of Production Systems, Technical University Dortmund, Dortmund, Germany Ashima Yadav Bennett University, Greater Noida, Uttar Pradesh, India Ashok Kumar Yadav Department of Computer Science and Engineering, Amity School of Engineering and Technology, Amity University Uttar Pradesh, Noida, India Naina Yadav CSE Department IIT (BHU), Varanasi, India Vishal Yadav Department of Computer Science, KIET Group of Institutions, DelhiNCR, Uttar Pradesh, Ghaziabad, India Nagendar Yamsani School of Computer Science and Artificial Intelligence, SR University, Warangal, India
Penetration Testing in Application Using TestNG Tool Bhawna Sharma
and Rahul Johari
Abstract In today's world, where during pandemic times a humongous amount of data is transmitted over the network, the security of that data is of paramount importance. Various skilled, semi-skilled, and amateur hackers and crackers prey on the data and try to launch active and passive attacks on it. This is where the domain of penetration testing is gaining importance. To show its worth and significance, in the current research work a program designed and developed in Java was launched to perform a brute force attack on a password that was input by the user at run time. Thereafter, an effort was made to perform penetration testing on the program using the TestNG tool, and the result exceeded expectations.
Keywords Network security · Penetration testing · TestNG
B. Sharma · R. Johari (B) SWINGER: Security, Wireless, IoT Network Group of Engineering and Research University School of Information, Communication and Technology (USICT), Guru Gobind Singh Indraprastha University, Delhi, India e-mail: [email protected] B. Sharma e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_1

1 Introduction

Network security is a serious challenge for network operators and Internet service providers, who must prevent attacks (active or passive) by intruders. It concerns the protection of all of an organization's information assets and relies at least as much on people as on technical controls, guided by policy and implemented properly. Traditional network security includes the maintenance and implementation of physical controls as well as technical controls such as hardened routers, firewalls, and intrusion detection systems. Nowadays, every other person is connected to the Internet without any restriction or boundary.
Fig. 1 Methods of penetration testing
Computers and the Internet affect our lives to a great extent. It seems that life without security is impossible; therefore, network security is essential in this kind of environment. As almost the whole world depends on networks, keeping potential threats such as ransomware, malware, denial of service (DoS), and DDoS attacks at bay is a great responsibility. Penetration testing is the process of discovering security weaknesses, identifying loopholes, and assessing the existing defense mechanisms to check how insecure the system is. There are three methods of penetration testing, as shown in Fig. 1.
White Box Penetration Testing—Full information about the internal system or network, such as the system source code and IP addresses, is provided in this type of testing; the tester can therefore test the overall security of the system internally and identify vulnerabilities exploitable if the attacker is an internal source.
Black Box Penetration Testing—No information is known beforehand. The tester has to gather the information based on his expertise, as he only knows the outcome of the tests without prior knowledge of the system.
Gray Box Testing—A limited amount of information about the targeted system is provided to the tester, who has to obtain the remaining information while performing security tests. It combines features of both white and black box penetration testing.
2 Vulnerability Assessment and Penetration Testing (VAPT)

Vulnerability assessment is a systematic review and approach for identifying security weaknesses in a system or application. It is essentially a process of assessing whether the system is susceptible to known vulnerabilities by finding those vulnerabilities, and it is primarily a scanning activity performed either manually or with certain tools.
The resulting report lists all vulnerabilities, which are then categorized by severity level. This report is then used for penetration testing and to recommend mitigation or remedial measures. Examples of such vulnerabilities are SQL injection, firewall weaknesses, guessable passwords, etc. There are various types of vulnerability assessment (VA):
1. Host assessment identifies vulnerabilities in workstations and other network hosts, such as application-level bugs, backdoors, and insecure file permissions.
2. Network and wireless assessment identifies vulnerabilities in network security so that unauthorized access to public and private networks can be prevented.
3. Database assessment seeks out database-level vulnerabilities and assesses their risk exposure.
4. Application scans detect security vulnerabilities and incorrect configuration in the source code of web applications through automated front-end scans.
Penetration Testing (PT)—Penetration testing is a process that identifies vulnerabilities and further proves that exploiting them could damage the application or network. The result of penetration testing is evidence, in the form of screenshots, from which possible remediation can be derived. The steps followed in the VAPT process are:
1. Finding security weaknesses.
2. Exploiting security weaknesses.
3. Preparing the final test report.
VA and PT differ in two respects. The VA process provides a horizontal map of the security of the application and the network, whereas the PT process takes a vertical dive into the findings; in other words, VA emphasizes how broad a weakness is, whereas PT shows how dangerous it can be. The other difference is that VA is performed using automated tools, while PT is manual in most cases.
3 Role of OWASP and CERT in Network Security

OWASP, the Open Web Application Security Project, is a community that provides the ASVS secure coding standard, tools, documentation, and security guidelines to enhance the security of software. Its testing framework comprises (1) information collection, (2) configuration testing, (3) authorization testing, (4) logging, and (5) session testing. OWASP also maintains a list of the top ten most critical web application security risks and provides different methods to cope with them; this top-ten checklist is a great resource for secure programming and gives developers resources to build secure applications. The term Computer Emergency Response Team (CERT) was first used in 1988 for a group of information security experts focused on the detection of, protection against, and response to cyber security incidents. They resolved incidents such as denial of service attacks and data breaches, providing
alerts and guidelines for handling related incidents. They also engaged in conducting public awareness activities to improve the security of applications.
4 Types of Networking Attacks

This section describes a variety of attacks, categorized under four main types:
• Denial of Service (DoS)—An attack in which an attacker keeps the memory and computational resources fully engaged and unavailable for legitimate networking requests, thereby denying authorized users access to the machine, e.g., smurf, mailbomb, apache, etc.
• Remote-to-User attacks (R2U)—An attack in which an attacker who has no account on a system is able to send packets to it and exploit its vulnerabilities and privileges, thereby gaining local access as a particular user of that machine, e.g., sendmail dictionary, Xlock, guest, etc.
• User-to-Root attacks (U2R)—Exploits in which the attacker gains access to a normal user account on the system and takes advantage of some susceptibility of the system to obtain higher access to the system, e.g., via social engineering.
• Probes—Attacks in which an attacker scans the network or a machine, collecting information and determining well-known vulnerabilities of the system to exploit in the future; this is often used in data mining, e.g., Nmap, Saint, etc.
5 Literature Survey

Chu and Lisitsa [1] analyze the security problems of IoT and, based primarily on the belief-desire-intention (BDI) model, propose a penetration testing approach and its automation to evaluate the security of the IoT. Chandan and Khairnar [2] carried out a security assessment of an IoT network that highlighted the weaknesses in the network; appropriate countermeasures were applied based on the highlighted weaknesses to make it secure, and the paper presents an IoT testing methodology to be incorporated during or after implementation. Goutam and Tiwari [3] targeted Internet application security and engineered a framework to test for vulnerabilities; once penetration testing based on these vulnerabilities is done, a framework is designed which can give such websites much stronger security. Visoottiviseth et al. [4] developed PENTOS, a penetration testing system for IoT devices intended to enhance the security awareness of users. Target device information is gathered automatically via wireless communication (WiFi, Bluetooth, etc.). The tool reports the outcomes of its attack modules, and suggestions are given for secure deployment to
keep away from viable threats. Yadav et al. [5] established a novel IoT penetration testing framework known as IoT-PEN. It follows a server–client architecture with the server as "a system with resources" and all "IoT nodes" as clients. IoT-PEN is a flexible, scalable, and end-to-end automated IoT penetration testing framework; using target graphs, it finds all the feasible ways in which an intruder can breach the target device. Khan [6] proposes a group of validation techniques used to confirm vehicle network security; the analysis puts forward a collection of tools and strategies for detecting defects and protecting against security loopholes in the implementation of in-vehicle network security. Altayaran and Elmedany [7] provide a general view of integrating web application penetration testing into two stages of the software development lifecycle (SDLC), namely testing and post-release. The paper also addresses choosing the right tools and reviews the results of recent studies examining the security measures of web applications. Anand and Shankar Singh [8] give a broader view of the equipment utilized in pen testing; various tools, such as Wireshark, Metasploit, w3af, Core, and Backtrack, are discussed along with their features. Ul Haq and Ahmed Khan [9] analyze most frameworks and techniques with the help of the ISO/IEC 25010 software quality model; the paper also raises the issues faced by Android developers while designing a secure application and provides a survey of various penetration testing tools, seeking out the gaps and issues that can help developers design a penetration-secure mobile application. A comparison between various penetration testing tools is shown in Table 1.
6 Types of Penetration Testing

6.1 Network Service Penetration Tests

Network service penetration testing identifies vulnerabilities in the network infrastructure. It is one of the most important types of penetration test, as a loophole could be a deficient firewall or an inadequately protected computer within the range of the company network. Some common network service tests are firewall bypass testing, DNS attacks, etc.
6.2 Web Application Penetration Tests Web application penetration testing involves testing the entire web application and all its components like browsers, database, source code, back-end network, and so on. It is used for the identification of any vulnerability, threat, or security weakness in a web application and prioritizes solutions to mitigate them.
Table 1 Penetration testing tools

S.No. | Tools | Description | Cost | Platforms
1 | Wireshark | (1) Network protocol analyzer (2) Often used as a packet sniffer: (a) captures packets and grabs traffic, (b) filters the relevant information, (c) visualizes network streams (3) Gives information regarding packet contents, network protocols, decryption, etc. | Free | Windows, Linux, OS X, Solaris, FreeBSD, Unix, etc.
2 | Metasploit | (1) A powerful testing tool to seek out vulnerabilities on servers and networks (2) Open-source framework that injects custom code to identify weak spots (3) Finds systematic vulnerabilities and prioritizes solutions | Free | Unix, Linux, Windows
3 | nmap | (1) "Network Mapper", an open-source tool for scanning vulnerabilities and network discovery (2) Finds open ports and devices on the network, etc. | Free | Linux, UNIX, FreeBSD, Windows, etc.
4 | John the Ripper | (1) Originally developed for Unix OS, a tool used for password cracking (2) Designed to test password strength, brute-force encrypted passwords, and crack passwords using dictionary attacks | Free | Unix, Windows, DOS, and OpenVMS
5 | Burp Suite | (1) Developed by PortSwigger, a set of tools for pen testing of web applications (2) Supports the entire testing process, from initial mapping and analysis of the application through exploiting security vulnerabilities | Free | Windows, Linux, and Mac OS X
6.3 Wireless Network Penetration Testing

It analyzes and highlights the security risks in the devices used at the client's location. It assesses the wireless network protocols followed for the configuration at the
client location, the violation of which can enable hackers to exploit vulnerabilities and gain unauthorized access to confidential information.
6.4 Client-Side Penetration Tests The goal of this test is to identify vulnerabilities in a particular client or an employee’s computer. Various applications like web browsers, email servers, etc. could have a flaw that can allow hackers to breach and steal information. It is one of the essential penetration tests for cybersecurity measures.
6.5 Social Engineering Tests

It is one of the extremely important types of tests, involving the theft of information using the human aspect. Dedicated hackers obtain information such as login credentials by illegal means. Employees therefore need to be trained to recognize such social engineering attempts (phishing, smishing, imposters) and to build strong passwords.
6.6 Mobile Penetration Test It analyzes the parameters of security in a mobile environment. Smartphones, nowadays, have become one of the lucrative targets for hackers to steal confidential information. The process of discovery, assessment, analysis, exploitation, and reporting is followed to protect mobile security. Types of penetration testing are comprehensively shown in Fig. 2.
7 Methodology Adopted

A program designed to launch a brute force attack was written in Java. The program has a password stored in it, for example: india. When the program was launched, an input string containing the password was given as a run-time parameter, and the brute force program ran to check for the occurrence of the password. If it was found, the said password was displayed. Thereafter, penetration testing [10, 11] was performed on the same program by inserting an assertion through a testAdd() method in the brute force program. For better understanding, the methodology is depicted as a flowchart in Fig. 3. The relevant Java APIs, viz. org.testng.annotations.Test and org.testng.Assert.assertEquals, were imported in the program. The program was then compiled and executed.
Fig. 2 Types of penetration testing
For execution, an .xml (extensible markup language) file was written; its source code has been posted in a public repository on GitHub [12]. From the .xml configuration, the brute force program was launched, and the output thus obtained is shown in Fig. 4a–c.
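For illustration, the brute force step and the assertion-style check can be sketched as follows. This is a minimal sketch written in Python for brevity rather than the authors' Java/TestNG implementation, and the names used here (STORED_PASSWORD, brute_force, test_brute_force) are hypothetical.

```python
# Minimal sketch (Python); the authors' implementation is in Java with TestNG.
# STORED_PASSWORD, brute_force, and test_brute_force are hypothetical names.
import itertools
import string

STORED_PASSWORD = "india"  # example password from the text

def brute_force(target, max_len=5):
    """Try every lowercase candidate up to max_len characters and
    return the first guess that matches the target password."""
    for length in range(1, max_len + 1):
        for combo in itertools.product(string.ascii_lowercase, repeat=length):
            guess = "".join(combo)
            if guess == target:
                return guess
    return None

def test_brute_force():
    # Analogue of the assertEquals check inserted via testAdd() in the paper
    assert brute_force(STORED_PASSWORD) == STORED_PASSWORD

if __name__ == "__main__":
    print("Recovered password:", brute_force(STORED_PASSWORD))
```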
8 Result

To show the effectiveness of the proposed work, a program was written in the Java programming language, and a brute force attack was launched to detect the occurrence of a password that was input by the user at run time. As shown in Fig. 4b, the password was detected in 6.5 s.
9 Conclusion and Future Work

The primary objective of the current research work was to test the efficiency and efficacy of the TestNG tool in designing test cases for a Java program; after its successful run and scanning of the program, no failure was reported, as shown in Fig. 4c. In future, vulnerability assessment and penetration testing will be performed on programs in Internet of Things (IoT)-based networks.
Fig. 3 Flowchart of penetration testing experiment
Fig. 4 Snapshot a, b, and c depicting the launch of the brute force program and TestNG result
References 1. Chu G, Lisitsa A (2018) Penetration testing for internet of things and its automation. In: 2018 IEEE 20th international conference on high performance computing and communications; IEEE 16th international conference on smart city; IEEE 4th international conference on data science and systems (HPCC/SmartCity/DSS). IEEE, pp 1479–1484 2. Chandan AR, Khairnar VD (2018) Security testing methodology of IoT. In: 2018 international conference on inventive research in computing applications (ICIRCA). IEEE, pp 1431–1435 3. Goutam A, Tiwari V (2019) Vulnerability assessment and penetration testing to enhance the security of web application. In: 2019 4th international conference on information systems and computer networks (ISCON). IEEE, pp 601–605 4. Visoottiviseth V, Akarasiriwong P, Chaiyasart S, Chotivatunyu S (2017) PENTOS: penetration testing tool for Internet of Thing devices. In: TENCON 2017–2017 IEEE region 10 conference. IEEE, pp 2279–2284 5. Yadav G, Paul K, Allakany A, Okamura K (2020) Iot-pen: a penetration testing framework for iot. In: 2020 international conference on information networking (ICOIN). IEEE, pp 196–201 6. Khan J (2017) Vehicle network security testing. In: 2017 third international conference on sensing, signal processing and security (ICSSS). IEEE, pp 119–123 7. Altayaran SA, Elmedany W (2021) Integrating web application security penetration testing into the software development life cycle: a systematic literature review. In: 2021 international conference on data analytics for business and industry (ICDABI). IEEE, pp 671–676 8. Anand P, Singh AS (2021) Penetration testing security tools: a comparison. In: 2021 10th international conference on system modeling advancement in research trends (SMART). IEEE, pp 182–184 9. Ul Haq I, Ahmed Khan T (2021) Penetration frameworks and development issues in secure mobile application development: a systematic literature review. IEEE Access 10. Johari R, Kaur I, Tripathi R, Gupta K (2020) Penetration testing in IoT network. In: 2020 5th international conference on computing, communication and security (ICCCS). IEEE, pp 1–7 11. Ahuja S, Johari R, Khokhar C (2016) CRiPT: cryptography in penetration testing. In: Proceedings of the second international conference on computer and communication technologies. Springer, pp 95–106 12. https://github.com/rahuljohaari/TestNGBrute
Empirical Analysis of Psychological Well-Being of Students During the Pandemic with Rebooted Remote Learning Mode Akshi Kumar, Kapil Sharma, and Aditi Sharma
Abstract The direct and indirect mental health stressors, especially associated with the “tele-burden of pandemic” added due to the adoption of the remote learning paradigm, have led to increased online fatigue, distress, and burnout. This research aims to comprehend the perception of psychological distress experienced by Indian students placed in the new online learning setting. Subsequently, the observed symptomatology is used to predict the student’s susceptibility toward developing specific psychological challenges. Primarily, a phenomenological study is conducted on 732 student participants to understand their psychological well-being during this ongoing COVID-19 crisis. Subsequently, machine learning is used to train a model with learned features from the data extracted to detect six psychological states, amusement, neutral, low stress, high stress, depression, and anxiety. Two supervised machine learning algorithms, namely random forest and artificial neural network, are used to perform the predictive analytics of psychological well-being. Experimental evaluation reports a classification accuracy of 90.4% for the random forest and 89.15% for the neural network. The qualitative research findings help foster the need to look for coping strategies involving counselors and psychologists to decrease the risk of psychological distress and preserve students’ psychological health and well-being in the current setting. Keywords Stress · COVID-19 · Machine learning · Psychological states
A. Kumar Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, UK K. Sharma Department of Information Technology, Delhi Technological University, New Delhi, India A. Sharma (B) Department of Computer Science and Engineering, Delhi Technological University, New Delhi, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_2
1 Introduction

As the world combats the outrageous and perilous novel coronavirus, social restrictions, minimal and limited physical meetings, and isolation are necessary for public health, but on the flip side they are detrimental to citizens' psychological health. Stress is a natural part of day-to-day life for everyone, and acute stress can even help us perform better, but stress combined with self-isolation due to social distancing and quarantine during the pandemic has affected the liveliness of all. A surge has been seen in the psychological issues that people face due to this sudden change. The psychosocial disturbance, coupled with the restraints of lockdown on everyday activities, has further rippling effects on everyone that can lead to lasting behavioral changes.

The current pandemic has forced the academic community to adopt and adapt to a fully online teaching paradigm at a global level. From pre-schools to postgraduate professional courses, e-learning is now being used to sustain academics and prevent the collapse of educational systems. The primary issue with the online teaching method (online pedagogy) is that the masses have been forced to use it: it was an unplanned and rapid move with no training, insufficient resources (devices, bandwidth), and little preparation [1, 2]. This transition from classroom-based to online learning can be quite stressful for students because, along with other social restrictions, they are not accustomed to the new way of learning. Undeniably, students are experiencing psychological and behavioral issues with this sudden new arrangement, which is "emotionally and mentally draining" [3]. Figure 1 depicts the common COVID-19 triggered mental health issues in students.

This research primarily analyzes the perception of psychological distress experienced by Indian students placed in the new remote/online learning setting. Subsequently, the observed symptomatology is used to predict the students' susceptibility toward developing specific psychological challenges. Every psychological/mental disorder has its own set of symptoms, not necessarily unique, but these characteristics help professionals identify a particular disorder or the psychological state of an individual. An automated psychological disorder diagnostic tool can use these characteristic symptoms to diagnose a psychological disorder early and trigger an alert for clinical support if needed. Reliable artificial intelligence-based preventative and intervention tools can aid mental health professionals [4, 5]. It is imperative to build an automatic ML model that classifies psychological distress with minimal human intervention [6, 7]. A COVID-19 Mental Health Questionnaire (C-19 MHQ) was created to acquire data and then use machine learning-based predictive analytics to identify the likelihood of students' stress and their eventual mental well-being. The model is trained using learned features of 732 participants to detect six psychological states: amusement, neutral, low stress, high stress, depression, and anxiety. Two supervised machine learning algorithms, namely a random forest of 7 decision trees and an artificial neural network, have been used to perform the predictive analytics of students' psychological well-being. Thus, the novelty of this paper is threefold:
Fig. 1 COVID-19 triggered common mental health issues in students
• Firstly, a phenomenological study is done to collect and interpret the student’s perception of remote learning during India’s COVID crisis. • Secondly, an artificial intelligence (AI)-based solution is put forward to aid early intervention for impaired psychological well-being. • This work intends to identify emotionally struggling students by vigilantly observing possible symptoms and substantiates the need for a flexible, empathic, and holistic institution-wide support of counselors and psychologists to help students adopt and adapt to the new normal. The structure of this paper is as follows: Sect. 2 discusses related work, followed by the methodology used to conduct the study. Section 4 presents the architecture and working details of learning models, followed by the results and analysis. Section 6 discusses the empirical analysis of the participants’ responses followed by the conclusion.
2 Related Work

Pertinent literature reports the use of machine learning techniques to analyze the mental well-being of users through questionnaires, surveys, social media activity, or smart sensor tracking [8–11]. Recent studies describe the impact of COVID-19 on mental health and the need for psychological intervention [12, 13]. Liu presented the mental health effects of the COVID-19 pandemic in China [14]. Both cross-sectional and focused studies (isolation of the elderly [15], dementia care [16], medical staff [17], and home confinement of children [18]) are available. Lai identified the degree of symptoms and reported the mental health outcomes of 1257 healthcare workers in China who treated COVID-19 patients [19]. Qualitative and quantitative studies have been conducted to assess students' mental health and well-being in various countries. Kamarianos et al. conducted a study on students in Europe and concluded that the students are comfortable with online teaching [20]. Patsali et al. investigated mental health in university students in Greece during the lockdown due to COVID-19 [21]. A recent study analyzed the mental well-being of Indian students from the state of West Bengal [22]. Iannizzotto et al. provided a novel technique that can help students with Rett genetic syndrome access remote learning classes [23]. Some works also suggest better teaching pedagogies amidst the pandemic to make remote learning more understandable for students [17, 24]. Irawan et al. used a phenomenological approach to identify the impact of online learning on student psychology during the COVID-19 pandemic [25]. Christian et al. described techno-stressors affecting lecturers' stress levels when applying online teaching methods [26]. Cao et al. conducted a qualitative questionnaire-based study on the psychological effects of the COVID-19 epidemic on students in China [27]. Tiwari and Bhati used ensemble regressor approaches to analyze the impact of COVID-19 in India [28]. IoT-based sensors can be used to monitor well-being from remote locations (Ranjan et al., 2019). To accurately detect the affective state of subjects, Kumar et al. proposed a deep hierarchical approach applied to the biosignals of subjects recorded through IoT-based wearable devices [29–31]. If these two ways of collecting information about a subject are incorporated together, the affective state of the subject can be accurately predicted in real time.
3 Materials and Methods For conducting a qualitative analysis, participants’ experience regarding a specific event is required. Information can be collected mainly by four common approaches— observation, questionnaire, interview, and focus group discussion [32]. Due to the pandemic, the only feasible option was to collect the data using web-based technologies. Therefore, to analyze the impact of the pandemic on students’ affective state, an online survey was conducted among the students of a reputed Indian university
using Microsoft Forms. To capture how students felt during the pandemic, a questionnaire-based approach was used, with 40 close-ended questions and an average filling time of 12 min 25 s. We refer to this questionnaire as the COVID-19 Mental Health Questionnaire (C-19 MHQ). The C-19 MHQ was disseminated to the students through Microsoft Teams for voluntary participation, and the form was accessible only to students of the university with an affiliated mail account, which helped avoid untraceable data sources. The confidentiality of information was assured. The survey was active for a period of four weeks, from May 24, 2020, to June 21, 2020. The candidates were also asked about their physical health to ascertain whether any candidate was experiencing more stress or anxiety due to physiological reasons.
3.1 Participants and C-19 MHQ A total of 733 students completed the C-19 MHQ, but one student with a history of clinical depression prior to the pandemic was not taken into account. Therefore, total responses of 732 students from both undergraduate and postgraduate courses were considered for analyzing the causal effect of pandemic and online learning on the affective state of an individual. Though stress and pressure are normal among university students, especially during the examinations or placement sessions, the pandemic and the nationwide lockdown have induced increased levels of stress. The questionnaire assessed the basic personality characteristics of the subjects like age, sex, resources available, interests, health condition, and the course they are pursuing. Participants were also asked about various questions related to remote learning, online exams, and other experiences they are having due to lockdowns, such as the different kinds of support they are receiving from friends, family, and the university. Subjects indicated the different issues they are facing, such as restlessness, excessive worrying, irrational phobias, altered sleep pattern, video fatigue, changes in eating habits (emotional eating), difficulty in concentration, and feeling of guilt, among others that triggered stress. Finally, participants were asked to rate their level of stress too on a scale of 1 to 5. The age of students varied from 17 to 26, with a mean age of 20.01 years. The basic characteristics of participants are shown in Table 1. In India, engineering undergraduate students get placement opportunities from organizations/companies visiting their campus for recruitment. Job losses and economic malaise as an impact of COVID-19 have instigated fear of the future and upheaval in students’ career prospects, leading to increased stress situations. Job prospects are one of the major factors contributing to the stress and depression in students, as most of the students are from the second and third year, who don’t have an active job offer with them. Four hundred ninety-two students were worried about their offered job/internship opportunities as COVID-19 might affect their existing offers, or they might not receive as many offers as they would have in the pre-pandemic situation. The participants who had job or internship offers were
still under pressure, as 120 had delays in their joining date. Table 2 presents this analysis of job prospects.

Table 1 Characteristics of the participants
Characteristics | Frequency | Percentage (%)
Gender
  Male | 554 | 76
  Female | 188 | 24
Age
  17–20 | 342 | 46.50
  20–22 | 368 | 50.30
  > 22 | 22 | 3.10
Course enrolled in
  B.Tech. CSE/IT | 427 | 58.30
  B.Tech. CSE with specialization | 183 | 25
  B.Tech. other than CSE | 77 | 10.50
  M.Tech. | 38 | 5.20
  Ph.D. | 7 | 0.90
Year of the study
  1st | 20 | 2.70
  2nd | 402 | 54.90
  3rd | 214 | 29.20
  4th | 96 | 13.10

Table 2 Analysis of job offers to students
Characteristics | Frequency | Percentage (%)
Job offer received
  Yes | 100 | 14
  No | 560 | 77
  NA | 72 | 10
Stress that the pandemic will affect job/internship opportunities
  Yes | 492 | 67
  No | 68 | 9
  Maybe | 172 | 23
Joining delayed?
  Yes | 120 | 16
  No | 184 | 25
  NA | 428 | 58
The third main focus of the C-19 MHQ was the adoption of and adaptation to remote learning, that is, whether the students were able to attend online lectures: 430 students responded with yes, whereas 238 students attended the lectures sometimes, depending upon network availability, and 34 students were unable to attend any of the lectures. The issues that students faced while attending online lectures were internet connectivity problems (518 students), unavailability of a laptop/smartphone (160), inability to focus during online lectures (350), background disturbance either from the faculty side or from the student's own side (240), not enough practice questions online (208), and other issues (64 students). Only 64 students were comfortable with the online classes and did not face any issues. Overall, 84 students preferred online classes over traditional classroom teaching, whereas 526 students still preferred traditional classroom teaching, and 122 students were inconclusive about the decision. The final section of the C-19 MHQ analyzed the effect of online examinations during the pandemic. To understand the students' pre-condition, their comfort level with online exams is a must. Table 3 outlines the responses of participants toward the online exam.

Table 3 Analysis of the online exam
Characteristics | Frequency | Percentage (%)
Prior experience with online exams
  Yes | 664 | 91
  No | 68 | 9
Evaluation process will be the same in both online and traditional exams?
  Yes | 160 | 22
  No | 356 | 49
  Maybe | 216 | 30
Preference of online exam over traditional exam
  Online exam | 288 | 39
  Offline exam | 284 | 39
  Doesn't make a difference | 160 | 22
Considers grades will not be the same as if the exam were in offline mode
  Yes | 420 | 57
  No | 120 | 16
  Maybe | 192 | 26
Stress induced due to online exams | 3.58 | 71.60

Based on the answers to the self-assessment questionnaire, the students' stress state was categorized into six different classes: amused, neutral, low stress, high stress, depression, and anxiety. It was observed that various factors that caused stress included online exams, job and internships, isolation, competition among friends,
physical health issues, and unavailability of resources, among others. The complete empirical analysis of responses of participants has been discussed in detail in Sect. 5.
4 Empirical Analysis of the Affective Mental State of Students

To detect the mental stress level of students in the pandemic, two different machine learning algorithms, namely random forest (RF) and artificial neural network (ANN), were trained and tested on the 732 responses collected through the participant data of the C-19 MHQ. For prediction modeling, the data was first pre-processed by converting the responses of the C-19 MHQ to numeric values, replacing Yes with 1, No with 0, and Maybe, NA, and Sometimes with 2 in all the features. Similarly, all other features present in the data were converted into numeric values, and the mental health issues reported by the users were further categorized into six classes: amused, neutral, low stress, high stress, depression, and anxiety. The target class assignment was based on the information provided by the participants about the different issues they were facing and was finally verified with the help of a practicing clinical psychiatrist. Lastly, the data was transformed using one-hot encoding to avoid giving higher importance to larger values. After validation of the target classes, the final data contained 97 students in the amused category, 126 with a neutral mental state, 211 with low stress, and 192 under high stress, while 74 students were under depression and 32 students had severe anxiety, as shown in Table 4.

Table 4 % distribution of participants' affective psychological state
Affective mental state | Participants | Percentage (%)
Neutral | 126 | 17.21
Low stress | 211 | 28.82
High stress | 192 | 26.22
Depression | 74 | 10.11
Anxiety | 32 | 4.37
Amusement | 97 | 13.25

After data pre-processing, feature selection was performed by generating a correlation matrix. Features that had no or minimal effect on the output were identified and removed, while higher initial weightage was given to highly correlated features. Following this approach, seven features were found to be redundant and were removed, and 32 features were finally retained for processing. Figure 2 shows the generic architecture of the ML-based predictive model for affective mental states. Four-fold cross-validation was used to avoid overfitting. First, the data was applied to an ensemble method, random forest (RF), in which seven mini decision trees were used to predict the mental state of the user. RF combines the decisions of its individual decision trees: the class prediction is not based on a single decision tree but on an (almost) unanimous prediction made by seven decision trees.
Fig. 2 Affective mental state prediction model

Fig. 3 Random forest with K decision trees

Prediction in RF is truly ensemble: each decision tree predicts the class of the instance, and the forest returns the class that is predicted most often (Fig. 3). The second model used to predict the user's mental state was an artificial neural network (ANN). To provide the feature values as input to the neural network, the data is normalized to the range 0–1; as decision trees work well on the actual values, the normalized data is used only for the neural network. The model was trained and, rather than initializing all inputs with equal weights, the features job offer, availability of resources, health issues, and family support were initialized with a higher weight (0.1 each), while the rest of the features were initialized with a weight of 0.02. One hundred and fifty epochs were used to train the neural network with backpropagation, with a learning rate of 0.01 optimized using stochastic gradient descent.
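To make the modeling pipeline concrete, the sketch below shows, with scikit-learn, the described pre-processing (Yes/No/Maybe encoding and one-hot transform), the random forest of seven trees evaluated with 4-fold cross-validation, and an MLP trained by backpropagation with stochastic gradient descent at a 0.01 learning rate for up to 150 epochs on 0–1 normalized inputs. This is a hedged sketch: the feature names and the synthetic data frame are hypothetical stand-ins for the C-19 MHQ responses, and the paper's feature-specific initial weights (0.1/0.02) are not reproduced.

```python
# Hedged sketch of the described pipeline (scikit-learn assumed).
# Feature names and the synthetic data are hypothetical stand-ins for the C-19 MHQ.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

rng = np.random.default_rng(0)
n = 200  # synthetic stand-in for the 732 responses
states = ["amused", "neutral", "low stress", "high stress", "depression", "anxiety"]

df = pd.DataFrame({
    "job_offer":      rng.choice(["Yes", "No", "NA"], n),
    "online_classes": rng.choice(["Yes", "No", "Sometimes"], n),
    "exam_stress":    rng.choice(["Yes", "No", "Maybe"], n),
    "family_support": rng.choice(["Yes", "No"], n),
    "target":         rng.choice(states, n),
})

# Yes -> 1, No -> 0, Maybe/NA/Sometimes -> 2, as described in the text
mapping = {"Yes": 1, "No": 0, "Maybe": 2, "NA": 2, "Sometimes": 2}
X_num = df.drop(columns="target").replace(mapping)
y = df["target"]

# One-hot encoding so larger numeric codes do not receive higher importance
X_onehot = OneHotEncoder().fit_transform(X_num).toarray()

# Random forest of 7 decision trees, evaluated with 4-fold cross-validation
rf = RandomForestClassifier(n_estimators=7, random_state=42)
print("RF CV accuracy:", cross_val_score(rf, X_onehot, y, cv=4).mean())

# ANN: inputs normalized to [0, 1]; backpropagation via SGD, lr 0.01, 150 epochs
X_scaled = MinMaxScaler().fit_transform(X_num)
ann = MLPClassifier(hidden_layer_sizes=(64,), solver="sgd",
                    learning_rate_init=0.01, max_iter=150, random_state=42)
print("ANN CV accuracy:", cross_val_score(ann, X_scaled, y, cv=4).mean())
```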
5 Results and Discussion

RF and ANN were trained on the selected features to identify the affective psychological state of the participants. The models were trained on 75% of the data and tested on the remaining 25%, and 4-fold cross-validation was performed. An accuracy of 90.4% was obtained for RF and 89.15% for ANN. The results are shown in Table 5.

Table 5 Performance results of the predictive models
Model | Precision | Recall | Accuracy | F-measure
Neural network | 88.0 | 87.4 | 89.15 | 87.48
Random forest | 88.7 | 89.9 | 90.4 | 89.29

The confusion matrices generated by the two models are shown in Figs. 4 and 5 for ANN and RF, respectively; they indicate that RF has an overall better performance than ANN. The phenomenological study highlights that students faced a huge amount of stress during this isolation period. Multiple factors spike the stress levels of different students, including the unavailability of resources to study online, the stress of getting a job offer or repaying a loan, feeling isolated, not having the support of friends and family, missing college and other day-to-day activities, insecurity about one's own learning growth in comparison to friends and classmates, and, along with all of that, the stress of taking online examinations. Some of these attributes are shown in Figs. 6 and 7. Owing to insecurity and the competition among students, when asked what would ease spending time at home alone, 67% of the participants selected Free MOOC Courses (Fig. 8), as they feel vulnerable in the current pandemic. As some companies have already declared the financial year 2020–2021 a pandemic-induced recession, students are under a lot of pressure to learn new technologies and be market-ready on completion of their graduation. When asked what technologies they would like to learn, most of the students wanted to learn at least two new technologies, as shown in Fig. 9.
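For completeness, the following sketch shows how Table 5-style metrics can be obtained from a held-out 75/25 split (macro-averaging is used here as an assumption, since the averaging scheme is not stated); it continues from the previous snippet's rf, X_onehot, and y.

```python
# Sketch: computing Table 5-style metrics for a fitted model on a 75/25 split.
# Continues from the previous snippet's rf, X_onehot, and y.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

X_tr, X_te, y_tr, y_te = train_test_split(X_onehot, y, test_size=0.25,
                                          stratify=y, random_state=42)
rf.fit(X_tr, y_tr)
pred = rf.predict(X_te)

prec, rec, f1, _ = precision_recall_fscore_support(y_te, pred, average="macro",
                                                   zero_division=0)
print(f"accuracy={accuracy_score(y_te, pred):.3f}  precision={prec:.3f}  "
      f"recall={rec:.3f}  F1={f1:.3f}")
```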
Fig. 4 Confusion matrix of artificial neural network
Fig. 5 Confusion matrix of random forest
The nationwide lockdown provided students with a unique opportunity to stay with their families; as most students come from different states or districts, this was the first such chance for participants to spend time with their families, but with the ongoing financial crisis their focus was more on becoming industry-ready than on enjoying this time. The issues that students were facing are illustrated with the help of a pie chart, as shown in Fig. 10.
Fig. 6 Support received during pandemic from a social group
Fig. 7 Activities missed most during the COVID-19
The pie chart shows that 17% of students have difficulty concentrating, followed by 13% of participants who face boredom; 12% have an altered sleep pattern, and 8% suffer from excessive worrying, stress, changed eating habits along with significant weight change, and restlessness. When asked whether they had any thoughts of worthlessness or self-harm, 52 students responded with yes, whereas 79 students answered sometimes, which is an alarming state and needs corrective measures and intervention. Being students of a well-equipped state university in India, surrounded by all amenities and established in a smart city, the students were given as much support as possible in the remote environment; still, they are under stress: as predicted by the classifier, 509 of the 732 students are under some kind of stress
Fig. 8 Resources to reduce stress during the pandemic
Fig. 9 Technologies students want to learn during the nationwide lockdown
ranging from low stress to severe anxiety. The variation of stress faced by students of different years is shown in Fig. 11. Students of each year face a different level of stress because of diverse parameters: the major factors contributing to stress in final-year students were job offers and delays in joining their jobs, whereas the major factors for second- and third-year students were keeping their technical skills up to industry standards and the pressure to learn new technologies for future prospects. First-year students faced stress because of uncertainty and the unavailability of resources needed to cope with the changing scenario. Therefore, each student had a different
Fig. 10 Issues faced by the participants
Fig. 11 Box plots showing variation in student stress
effect from the same event, as the pre-conditions and the current environmental and physical conditions varied for each individual, resulting in a distinct affective psychological state. The long-term mental health effects of the pandemic and the consequent remote learning affect the well-being of students [33]. Conclusively, this study emphasizes that a rapid assessment of outbreak-associated psychological disorders among various sections of society is needed, as the pandemic may lead to severe public mental health implications. As prolonged stress, depression, and anxiety can leave a permanent affective change on the psychological state of an individual, it is imperative for public health bodies and policymakers, along with the university administration, to take extra care of students by providing all the support that can keep them motivated during this crucial period. It is vital to identify students at risk of emotional difficulties and develop a plan of action to connect them to effective support. Educational centers must adopt a multi-tier system which:
• creates mental health first-aid services in centers and the community with counselors and psychologists;
• assists in identifying at-risk students through AI-based predictive learning;
• advocates for universal screening of the school population during and following online learning phases; and
• increases awareness of the importance of mental health screening among teacher colleagues as well as school administrators and parents with the help of webinars, talks, close group discussions, and virtual one-on-ones.
6 Conclusion

The pandemic has forced educational centers to shut down and quickly pivot to remote learning since March 2020, leading to a sense of uncertainty about the academic future. It has created havoc in students' routine lives, negatively affected their mental health, and disrupted their studies. The ongoing remote learning and online assessments, undertaken without considering the digital divide and economic burden, have traumatized many students, leaving them exposed to indefinite stress and mental health morbidity. Tracing mental health is imperative for timely interventions to improve the psychological well-being of students. A phenomenological study of 732 students from an Indian state university was conducted to understand their psychological well-being during the ongoing COVID-19 crisis. The study data was used for predictive modeling with two ML algorithms to detect six psychological states: amusement, neutral, low stress, high stress, depression, and anxiety. The purpose of the study was to probe the prevalence and severity of COVID-19-associated psychological distress in Indian students using the interim remote learning arrangement. The study had a few limitations. Firstly, the number of study samples was small, but we believe it was a sufficiently representative sample that gave a foretaste of the gravity of the mental health situation. Secondly, the study was done without personal interactions for data collection due to the nationwide lockdown, and therefore the collection of data was
possible from tech-savvy participants only. Further, as the data was collected online, the physiological changes of the individuals could not be monitored with the help of biomarkers for clinically meaningful symptoms. Also, the study was carried out over four weeks and lacked longitudinal follow-up. Additionally, the work relied on hand-crafted features, which is efficient for the current data size and time frame; more advanced techniques such as deep learning can be a promising direction only if large-sized data is available, as such data is required for automatic feature extraction, which can learn a hierarchical representation of features and improve performance.

Acknowledgements The authors would like to thank all the student participants at DIT University, Dehradun, Uttarakhand, India.
References 1. Ortiz JS, Guevara BS, Espinosa EG, Andaluz VH (2020) Smart university immersive virtual learning. In: 2020 15th Iberian conference on information systems and technologies (CISTI). IEEE, pp 1–5 2. Kwet M, Prinsloo P (2020) The ‘smart’ classroom: a new frontier in the age of the smart university. Teach High Educ 25(4):510–526 3. Kumar A, Sharma A, Arora A (2019) Anxious depression prediction in real-time social data. In: International conference on advances in engineering science management & technology (ICAESMT)-2019, Uttaranchal University, Dehradun, India 4. Tull MT, Edmonds KA, Scamaldo K, Richmond JR, Rose JP, Gratz KL (2020) Psychological outcomes associated with stay-at-home orders and the perceived impact of COVID-19 on daily life. Psychiatry Res 113098 5. Luxton DD (2016) An introduction to artificial intelligence in behavioral and mental health care. In: Artificial intelligence in behavioral and mental health care. Academic Press, pp 1–26 6. Al-Turjman F (2019) Cognitive routing protocol for disaster-inspired internet of things. Futur Gener Comput Syst 92:1103–1115 7. Kumar A (2021) Machine learning for psychological disorder prediction in Indians during COVID-19 nationwide lockdown. Intell Decis Technol 15(1):161–172 8. Casagrande M, Favieri F, Tambelli R, Forte G (2020) The enemy who sealed the world: effects quarantine due to the COVID-19 on sleep quality, anxiety, and psychological distress in the Italian population. Sleep Med 9. Zhou D, Luo J, Silenzio V, Zhou Y, Hu J, Currier G, Kautz H (2015) Tackling mental health by integrating unobtrusive multimodal sensing. In: Proceedings of the AAAI conference on artificial intelligence, vol 29, no 1 10. Lim CKA, Chia WC (2015) Analysis of single-electrode EEG rhythms using MATLAB to elicit correlation with cognitive stress. Int J Comput Theory Eng 7(2):149 11. Kessler RC, van Loo HM, Wardenaar KJ, Bossarte RM, Brenner LA, Cai T et al (2016) Testing a machine-learning algorithm to predict the persistence and severity of major depressive disorder from baseline self-reports. Mol Psychiatry 21(10):1366–1371 12. Duan L, Zhu G (2020) Psychological interventions for people affected by the COVID-19 epidemic. Lancet Psychiatry 7(4):300–302 13. Xiang YT, Yang Y, Li W, Zhang L, Zhang Q, Cheung T, Ng CH (2020) Timely mental health care for the 2019 novel coronavirus outbreak is urgently needed. Lancet Psychiatry 7(3):228–229 14. Liu D, Ren Y, Yan F, Li Y, Xu X, Yu X et al (2020) Psychological impact and predisposing factors of the coronavirus disease 2019 (COVID-19) pandemic on general public in China
15. Ho CS, Chee CY, Ho RC (2020) Mental health strategies to combat the psychological impact of COVID-19 beyond paranoia and panic. Ann Acad Med Singapore 49(1):1–3 16. Yang Y, Li W, Zhang Q, Zhang L, Cheung T, Xiang YT (2020) Mental health services for older adults in China during the COVID-19 outbreak. Lancet Psychiatry 7(4):e19 17. Wang X, Hegde S, Son C, Keller B, Smith A, Sasangohar F (2020) Investigating mental health of US college students during the COVID-19 pandemic: cross-sectional survey study. J Med Internet Res 22(9):e22817 18. Chen Q, Liang M, Li Y, Guo J, Fei D, Wang L et al (2020) Mental health care for medical staff in China during the COVID-19 outbreak. Lancet Psychiatry 7(4):e15–e16 19. Lai J, Ma S, Wang Y, Cai Z, Hu J, Wei N, Hu S (2020) Factors associated with mental health outcomes among health care workers exposed to coronavirus disease 2019. JAMA Netw Open 3(3):e203976–e203976 20. Kamarianos I, Adamopoulou A, Lambropoulos H, Stamelos G (2020) Towards an understanding of University students’ response in times of pandemic crisis (COVID-19). Eur J Educ Stud 7(7) 21. Patsali ME, Mousa DPV, Papadopoulou EV, Papadopoulou KK, Kaparounaki CK, Diakogiannis I, Fountoulakis KN (2020) University students’ changes in mental health status and determinants of behavior during the COVID-19 lockdown in Greece. Psychiatry Res 292:113298 22. Kapasia N, Paul P, Roy A, Saha J, Zaveri A, Mallick R, Chouhan P (2020) Impact of lockdown on learning status of undergraduate and postgraduate students during COVID-19 pandemic in West Bengal, India. Child Youth Serv Rev 116:105194 23. Iannizzotto G, Nucita A, Fabio RA, Caprì T, Lo Bello L (2020) Remote eye-tracking for cognitive telerehabilitation and interactive school tasks in times of COVID-19. Information 11(6):296 24. Nilima N, Kaushik S, Tiwary B, Pandey PK (2020) Psycho-social factors associated with the nationwide lockdown in India during COVID-19 pandemic. Clin Epidemiol Global Health 25. Irawan DL (2020) Psychological impacts of students on online learning during the pandemic COVID-19 1. Jurnal Bimbingan dan Konseling (E-Journal). 26. Christian M, Purwanto E, Wibowo S (2020) Technostress creators on teaching performance of private universities in Jakarta during Covid-19 pandemic. Tech Rep Kansai Univ 62(6):2799– 2809 27. Cao W, Fang Z, Hou G, Han M, Xu X, Dong J, Zheng J (2020) The psychological impact of the COVID-19 epidemic on college students in China. Psychiatry Res 287:112934 28. Tiwari D, Bhati BS (2021) A deep analysis and prediction of COVID-19 in India: using ensemble regression approach. Artif Intell Mach Learn for COVID-19, pp 97–109 29. Sharma A, Sharma K, Kumar A (2022) Real-time emotional health detection using fine-tuned transfer networks with multimodal fusion. Neural Comput Appl. https://doi.org/10.1007/s00 521-022-06913-2. 30. Kumar A, Sharma K, Sharma A (2022) MEmoR: a multimodal emotion recognition using affective biomarkers for smart prediction of emotional health for people analytics in smart industries. Image Vis Comput 104483 31. Kumar A, Sharma K, Sharma A (2021) Hierarchical deep neural network for mental stress state detection using IoT based biomarkers. Pattern Recogn Lett 145:81–87 32. Hawryluck L, Gold WL, Robinson S, Pogorski S, Galea S, Styra R (2004) SARS control and psychological effects of quarantine, Toronto, Canada. Emerg Infect Dis 10(7):1206 33. 
Kumar A, Sharma K, Sharma A (2021) Genetically optimized fuzzy C-means data clustering of IoMT-based biomarkers for fast affective state recognition in intelligent edge analytics. Appl Soft Comput
Handwritten Digit Recognition Using Machine Learning Mayank Sharma, Pradhyuman Singh Sindal, and M. Baskar
Abstract Today, we live in a world where we are surrounded by digits, directly or indirectly. Digits are everywhere, whether on smartphones, the latest gadgets, vehicles, or technology-oriented infrastructure. Today, we focus more and more on the digitalization and automation of almost everything, so it becomes very important to apply technologies such as machine learning, artificial intelligence, and deep learning to classify and recognize those digits. In this project, we implemented a model that recognizes digits and makes such tasks much easier. Our model identifies the given digits with an accuracy of 98.40%, which is a strong result. We use a convolutional neural network (CNN) to train our model.
Keywords Digit recognition · Machine learning
M. Sharma · P. S. Sindal SRM Institute of Science and Technology, Kattankulathur, Chengalpattu, Tamil Nadu 603203, India e-mail: [email protected] P. S. Sindal e-mail: [email protected] M. Baskar (B) Associate Professor, Department of Computing Technologies, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu, Tamil Nadu 603203, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_3

1 Introduction

Information is the key to success in today's world, and this age is rapidly revolutionizing the way exchanges take place. Today, various tasks and actions are regularly managed digitally rather than by handwriting or in person. This growth in digital or e-exchanges has contributed to a higher demand for quick and accurate user identification and authentication. Access codes for buildings, bank accounts, bank checks, vehicle number plates, and various computer systems often use PINs and digits for identification and security clearance.
With such widespread applications, digits and their accurate recognition become very important. Imagine writing digits and your signature on a check, or the artistically printed number plate of your vehicle: in all such places digits play an important role [1]. None of this would work if the digits were not recognized properly, and all your efforts might be in vain. Digit recognition technology therefore becomes important and may solve these problems. We use a CNN for our project with the MNIST dataset, which gives an accuracy of 98.40%.
2 Related Work Many digit recognition systems and software packages have been developed and tested over the last ten years. Each application relies on different methods and algorithms. Some digit recognition software extracts digit features from a given photo to identify the digit [2]. Other algorithms normalize a set of digit images, reduce the digit data, and store the result as a single reference image that is then compared against the input image for prediction. Below is an overview of some existing digit recognition programs developed for various purposes. Anuj Dutt showed in one of his papers that deep learning models achieve very good accuracy; in particular, a CNN implemented with TensorFlow reached a result of 99.70% [3]. Although the complexity of the process and code is higher than for generic machine learning algorithms, the accuracy obtained is clearly better. In another work, a multilayer perceptron (MLP) neural network was implemented to learn and classify handwritten digits from 0 to 9; the network was trained and tested on data taken from MNIST.
3 Existing Model 3.1 K-Nearest Neighbor KNN is a supervised learning algorithm. It has two main advantages: it is robust to noisy training data, and it remains effective when the dataset is large. To perform classification, the algorithm needs a set of training data containing correctly labeled data points [4]. It takes a new data point as input and classifies it by measuring the distance between the new point and the labeled points using the Euclidean or Hamming distance.
3.2 Random Forest Classifier Random forest is a supervised algorithm. It establishes a direct relationship between the number of trees and the quality of the result it obtains: the greater the number of trees, the more accurate the result. The RFC algorithm largely avoids overfitting [5] and can handle missing values. Once training is complete, a prediction is obtained from every tree and the results are averaged.
3.3 Support Vector Machine SVM is a supervised algorithm in which samples are represented as points in an n-dimensional space. The classifier works by finding a separating boundary between the two classes. An advantage of this model is that it includes a regularization parameter, which helps avoid overfitting [6]. An illustrative application of these baseline classifiers is sketched below.
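As a rough illustration only (the paper does not give its implementation of these baselines), the three classifiers described above could be applied to the digit data with scikit-learn roughly as follows; the subsample size and hyperparameters are assumed values, not the authors' settings.

```python
# Illustrative sketch, not the authors' code: KNN, random forest, and SVM baselines on MNIST digits.
# The subsample size and hyperparameters below are assumptions chosen to keep the run short.
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Load MNIST as flat 784-dimensional vectors with pixel values 0-255, then subsample.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X, y = X[:10000] / 255.0, y[:10000]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=3),       # distance-based voting over labeled neighbors
    "RFC": RandomForestClassifier(n_estimators=100),  # prediction averaged over many trees
    "SVM": SVC(kernel="rbf"),                         # margin-based separation between classes
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(name, "accuracy:", clf.score(X_test, y_test))
```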
4 Proposed Model In this work, we perform digit recognition using deep learning. We use a convolutional neural network to exploit the representational power of multi-layer networks for digit recognition, and we use OpenCV to predict digits entered by real users. Experimental results show that this approach achieves good results, and there is still scope to optimize the work further in the future. The implemented model is a real-time digit classification system that reads an image and recognizes any digit present in it. The software is divided into two parts: model training and digit recognition.
4.1 Algorithm Suggested for Digit Detection The procedure starts with the input image, which is converted to grayscale to accelerate processing. The image is then thresholded so that the distinctive characteristics of the digit stand out, contours are drawn around the digit candidates, and only the digit region is selected for prediction. A minimal OpenCV sketch of this pipeline is given below.
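The following OpenCV sketch illustrates the grayscale–threshold–contour pipeline described above; the file name, threshold value, and minimum-area filter are assumptions, not the authors' exact settings.

```python
# Sketch of the detection step described above (assumed parameters, not the authors' exact code).
import cv2

image = cv2.imread("digits.png")                     # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)       # grayscale to speed up processing

# Binarize so the digit strokes stand out (inverse threshold: dark digits become white).
_, thresh = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY_INV)

# Draw contours around candidate digits and keep only the digit regions for prediction.
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
digit_regions = []
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    if w * h > 100:                                  # assumed minimum area to discard noise
        digit_regions.append(thresh[y:y + h, x:x + w])
```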
4.2 Algorithm Suggested for Digit Recognition Data Acquisition The input can be a still image. We initially trained our model on the MNIST dataset, which consists of 70,000 images.
4.3 Input Processing The images were pre-processed by normalizing the pixel values and one-hot encoding the labels.
4.4 Digit Classification and Decision Making The trained network is then used to identify digits in a provided photo and, when a digit is located, to surround it with a bounding box. A newly captured pattern is pre-processed and compared with the digit patterns stored in the database. There are two ways to supply input to the system: the user can either provide a photo of the number to be recognized or use numbers taken from the dataset. The given photos are pre-processed, the various classifiers are applied, and the recognition outcome is obtained and reported along with its accuracy (Fig. 1).
Fig. 1 Digit detection algorithm (workflow: user input image and MNIST data → preprocessing → classification using KNN, SVM, RFC and CNN classifiers → train and evaluate model → results with accuracy)
5 Implementation Generic recognition models mostly rely on a few specific characteristics of the numbers, for example their shape, size, and alignment, which has known disadvantages: such characteristics are easily lost. If a few well-chosen characteristics are used for recognition, the results can still be quite good, for example in number identification or fingerprinting. The advantage of this approach is the reduced memory requirement, and the number of parameters to be trained is ultimately lowered, so the implementation of the algorithm is more efficient. In other machine learning methods, the images require explicit pre-processing or feature extraction, whereas these steps are rarely needed when using a CNN for digit classification. One drawback of CNNs is the need for a large number of samples to train a deeper model, which can hinder the use of this approach. Even so, very good results have been achieved in the field of digit recognition.
5.1 Convolutional Neural Network Introduction As CNNs continue to develop, they are widely used for many purposes and have become a central subject of study. To speed up training for both forward and backward propagation, the number of learnable parameters is reduced by exploiting the spatial relationships captured by the CNN [7]. In a convolutional neural network, data is fed in at the input layer and processed layer by layer, and each stage applies a convolution kernel to extract the most important data features.
5.2 CNN Model The CNN structure is used here to classify images, in this case images of digits. The network consists of seven layers in total; every layer except the input layer has trainable parameters. Input features are extracted by convolution kernels, and each feature map contains many neurons [8]. The architecture is shown in Fig. 2.
5.3 Digit Image Collection and Processing To process images using CNN, we need to have a large number of images so that our computer can learn. To do this, we will need a lot of pictures of handwritten digits
Fig. 2 Architecture diagram
of various textures [9]. A sample of the digit dataset used is shown in Fig. 3. The CNN designed in this paper consists of several layers, including the input layer, convolution layers, max pooling layers, dropout layers, and fully connected layers. In the final output layer, we use softmax (multinomial logistic regression) to perform multi-class classification (Fig. 4).
Fig. 3 Digit dataset
Fig. 4 CNN structure diagram (input layer → convolution + sampling layer → convolution + sampling layer → hidden layer → classification layer)
The Keras programming interface helps us load the dataset. The shape (60,000, 28, 28) shows that there are 60,000 images in our training set, each of size 28 * 28 pixels. Each image can equivalently be viewed as a 784-dimensional vector with values ranging from 0 to 255, where, for example, 0 means black and 255 means white.
6 Result 6.1 Data Processing Our dataset consists of 60,000 images for training and 10,000 samples for testing, so the data is split into train and test sets. x_train and x_test contain the grayscale pixel values, while y_train and y_test contain the labels from 0 to 9 indicating the digits (Fig. 5). Checking the structure of the dataset for compatibility with the CNN again gives (60,000, 28, 28), i.e., 60,000 images of size 28 * 28 each (Fig. 6). A short sketch of this data-preparation step is shown below.
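The following Keras sketch illustrates the data-preparation step described above (loading MNIST, adding the channel dimension, normalizing to the 0–1 range, and one-hot encoding the labels); it is an illustration rather than the authors' exact script.

```python
# Sketch of the data-preparation step: load MNIST, normalize pixels, one-hot encode labels.
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape)                                  # (60000, 28, 28)

# Add the channel dimension expected by Conv2D and scale pixel values from 0-255 to 0-1.
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0

# One-hot encode the digit labels 0-9.
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
```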
Fig. 5 Data processing
Fig. 6 Data normalizing
6.2 Building the Model We create a CNN model to predict handwritten digits. A CNN usually contains convolutional and pooling layers and works well for image classification problems such as digit recognition; we also add a dropout layer. The model is later compiled with the Adam optimizer (Adaptive Moment Estimation) (Fig. 7). We use TensorFlow as the backend through the Keras API: we start from the Keras Sequential model and add layers such as convolutional and max pooling layers. Dropout layers help overcome overfitting, while the flatten layer converts the 2D feature maps into a 1D array (Fig. 8). An illustrative definition of such a model is sketched below.
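The sketch below defines a model with the layer types listed above; the exact filter counts and layer sizes used by the authors appear only in Fig. 8, so the values here are assumptions.

```python
# Illustrative Sequential model with the layer types described above (filter counts are assumed).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Dropout(0.25),                         # dropout to reduce overfitting
    Flatten(),                             # flatten 2D feature maps to a 1D vector
    Dense(128, activation="relu"),
    Dropout(0.5),
    Dense(10, activation="softmax"),       # one output per digit class
])
```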
Fig. 7 Model building
Fig. 8 Model summary
6.3 Compiling and Fitting/Training the Model Up to this point, only an untrained CNN has been created. We then compile it with an optimizer, a loss function, and a metric and train it on the training data. The Adam optimizer is used as it works well compared to the alternatives (Fig. 9). The model.fit() function in Keras trains the model on the dataset. We obtain good accuracy with a small number of epochs since our dataset is well balanced; as long as the difference between accuracy and val_accuracy remains small, the model is performing well.
6.4 Evaluate the Model About 10,000 pictures in the dataset are reserved to check how well our model works. We used the training data to train the model and the test data to evaluate it [10]. The MNIST dataset is constructed in such a way that accuracies of around 99% are achievable (Fig. 10). The compile–train–evaluate sequence is sketched below.
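Continuing the sketches above, the compile–train–evaluate sequence of Sects. 6.3 and 6.4 could be written as follows; the epoch count and batch size are assumed values.

```python
# Compile with the Adam optimizer, train, and evaluate on the held-out test set.
# Epochs and batch size are assumptions, not the authors' exact settings.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(x_train, y_train,
                    epochs=10, batch_size=128,
                    validation_split=0.1)            # compare accuracy vs. val_accuracy for overfitting

test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print("Test accuracy:", test_acc)
```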
Fig. 9 Compiling the model
Fig. 10 Training model
6.5 Predictions Here we select a picture, process it, run the prediction, and display both the image and the prediction to see whether the result is accurate. We first check samples from the MNIST test set itself before moving to user-based input (Fig. 11). We then check predictions for user-provided images, which are drawn in a paint program; examples of such user input are shown in Fig. 12. We use the OpenCV library for Python to handle the image input: input images must first be resized and normalized before further processing (Figs. 13 and 14). We also check some other images and their predictions (Fig. 15). A sketch of this prediction step is given below.
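The following sketch, continuing the model above, illustrates predicting a user-drawn digit; the file name and the inversion step (dark digit on a light background) are assumptions about how the paint-drawn images look.

```python
# Sketch of predicting a user-drawn digit image with OpenCV preprocessing (assumed file name).
import cv2
import numpy as np

img = cv2.imread("user_digit.png", cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, (28, 28))                      # match the MNIST input size
img = 255 - img                                      # assumed: dark digit on white paper, invert to MNIST style
img = img.astype("float32") / 255.0
img = img.reshape(1, 28, 28, 1)                      # batch of one sample

pred = model.predict(img)
print("Predicted digit:", np.argmax(pred))
```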
7 Conclusion In this work, we performed digit recognition using deep learning, using a convolutional neural network to exploit the representational power of multi-layer networks.
Fig. 11 Prediction
Fig. 12 Sample image
We additionally used the OpenCV library to predict digits entered by real users. Experimental results show that good results have been achieved. There is still ample scope in this project, and continuing to optimize the work will be our focus in the future.
Fig. 13 Drawing contours
Fig. 14 Contouring images
Fig. 15 Output
References 1. Vinjit BM, Bhojak MK, Kumar S, Chalak G, A review on handwritten character recognition methods and techniques 2. Wu M, Zhang Z (2010) Handwritten digit classification using the MNIST dataset 3. Dutta A, Dutta A (2017) Handwritten digit recognition using deep learning. Int J Adv Res Comput Eng Technol (IJARCET) 6(7) 4. Mihalyi RG (2011) Handwritten digit classification using support vector machines 5. Jain G, Ko J (2008) Handwritten digits recognition, project report, University of Toronto, 11/21/2008 6. Al Maadeed S, Hassaine A (2014) Automatic prediction of age, gender, and nationality in offline handwriting. EURASIP J Image Video Process 1 7. http://cvisioncentral.com/resources-wall/?resource=135 8. MNIST image-Lim S-H, Young S, Patton R (2016) An analysis of image storage systems for scalable training of deep neural networks 9. Ruiz-Castilla J-S, Rangel Cortés J-J, García-Lamont F, Espinosa AT (2019) Chapter 54 CNN and metadata for classification of Benign and Malignant Melanomas. Springer Science and Business Media LLC 10. Norris DJ (2020) Chapter 6 CNN demonstrations. Springer Science and Business Media LLC
A Smart Movie Recommendation System Using Machine Learning Predictive Analysis Pranshi Verma, Preeti Gupta, and Vijai Singh
Abstract These days, recommendation engines are so pervasive that many of us are not even aware of them. A recommendation system is essential for a better user experience because no one could possibly read through all of a website's offerings or information. Additionally, it makes visible inventory that would otherwise remain hidden. Amazon's review pages, Netflix's suggestions for shows and movies, YouTube's suggested videos, Spotify's suggested music, Instagram's newsfeed, and Google AdWords are all examples of recommender systems in use. This study proposes an intelligent movie recommendation system (RS) built in Python using machine learning predictive analysis. The RS uses the correlation between a number of factors to obtain precise results, and the simulation results show that it improves recommendations in terms of content and data similarity from users. The purpose of this study is to acquire the skills necessary to carry out feature engineering, handle missing values, manipulate real-time data in accordance with requirements, and suggest appropriate items based on specific content and similarity. Keywords Item-based · Jupyter Notebook · Machine learning · Movies · Predictive analysis · Python · Rating · Recommendation
1 Introduction Some people are fond of online shopping. There are many online shopping sites such as Amazon, Myntra, Flipkart. Every such website recommends some products which surprisingly seem to be of one’s interest. All such recommendations are made using the recommender system structure embedded in several online sites. The recommender system has grown in importance in the digital world, as consumers are sometimes overwhelmed by options and want aid in locating what they are looking for. Trends in user behavior are detected using machine learning algorithms, which P. Verma · P. Gupta (B) · V. Singh Department of Computer Science and Engineering, Inderprastha Engineering College, Ghaziabad, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_4
are then utilized to offer new items to them. This results in happier customers, which leads to an increase in sales. A recommendation system can assist any company, and several companies that use a recommendation engine serve as examples. On most pages of their website and in their email campaigns, Amazon.com uses item-to-item collaborative filtering recommendations; according to McKinsey, recommendation algorithms account for 35% of Amazon transactions. Another data-driven corporation that uses recommendation algorithms to improve consumer satisfaction is Netflix. According to the same McKinsey study cited above, recommendations influence 75% of Netflix streaming. Netflix is so focused on producing the best results for its viewers that it has held data science competitions, dubbed the Netflix Prize, in which the creator of the most accurate movie recommendation algorithm receives a prize of $1,000,000. LinkedIn similarly shows "You might also like" or "You might know" recommendations. The following two factors influence how much a company gains from a recommendation system: (i) the scope of the data, (ii) the data volume. Because of the volume of data, an organization serving a small client base that behaves in a multitude of ways will not benefit from an automated recommendation system. When it comes to learning from only a few examples, humans are still far better than technology. In such scenarios, staff will specialize in providing recommendations based on their intellect and their qualitative and quantitative understanding of users. Having only a single data point for each user, on the other hand, is not enough for software applications. Deep data on a customer's browsing activities, as well as offline purchases if possible, can ultimately accelerate accurate recommendations [2]. The following are some of the reasons why service providers need to use RS technology:
1. Increase product sales
2. Sell a wide range of products
3. Increase customer loyalty and share of mind
4. Gain a better understanding of user needs.
Types of Recommender Systems See Fig. 1. 1. Recommendation System Using Information (Content-Based Filtering): Based on the users' previous actions or feedback, content-based filtering, also called contextual filtering, uses item characteristics to suggest additional items similar to the ones the user liked; a movie, for instance, is processed by the recommendation system on the basis of its attributes. Content filtering involves identifying items associated with particular keywords, understanding what the user looks for online, searching the database for those keywords, and recommending comparable items.
Fig. 1 Types of recommender systems
2. Collaborative Recommender System: This methodology makes suggestions based on "usage patterns." No item characteristics are needed to relate similar users in this methodology; it relies on a utility matrix and is therefore unaffected by additional item data. It is based on historical information, and the implicit premise is that users who agreed in the past are likely to agree again in the future. This kind of recommender system needs nothing more than the users' previous preferences for a set of items. 3. Hybrid (Combination) Recommender Framework: A hybrid recommender framework combines the two methodologies in a way that is tailored to a particular sector. This is the most sought-after type of recommender framework, as it combines the benefits of several recommender approaches and eliminates the shortcomings that may emerge when only one recommendation approach is used.
1.1 Contribution In this paper, we propose a machine learning-based smart movie recommendation system. We conducted research and made basic recommendations based on similar genres and films that users enjoy. The goal of this work is to learn how to execute feature engineering, missing-value handling, data manipulation according to requirements, and recommendations based on content and similarity. The following are some of the key points covered in this paper:
• Movies' competitiveness
• Expansive evaluation based on ethnicity
• Comparing revenue and monetary gain for various genres
• Recommendation algorithms built from celebrities, feature films, and genres.
2 Related Works The concept of recommendation engines has attracted a lot of attention. Several academics are exploring alternative strategies for every category of recommendation system, and data analysts utilize a variety of techniques for each category. One recommender system uses a collaborative filtering technique to provide a user with the best research articles in their field based on the searches and patterns discovered from other customers' searches, saving the client time by reducing time-consuming browsing [10]. Because no pre-prepared dataset with ratings of current research was available, the authors had to create a synthetic dataset with evaluations for the published papers; the method takes advantage of the domain's unique features to provide a strategy that is quick and produces high-quality suggestions. A movie recommendation system used Apache Mahout to develop two information retrieval algorithms [14]; that research also explored ways to analyze data using Python's Matplotlib modules and gain insights into the film database. Another movie recommender system uses an item-based collaborative filtering method to generate dynamic item recommendations that learn from positive feedback [7]. Various Python programming paradigms have been used in a movie recommendation engine that applies content-based and collaborative filtering; it uses a dataset of movies released between July 2017 and July 2018 to provide recommendations using a range of methods, including a simple recommendation system, content-based filtering, and collaborative filtering. KNN algorithms combined with collaborative filtering have also been proposed to improve accuracy and demonstrate efficacy [4]. The primary purpose of that research was to improve the accuracy of results obtained with content-based filtering; the method relies on cosine similarity, k-nearest neighbors, and a collaborative filtering mechanism to circumvent the drawbacks of content-based filtering, and cosine similarity is used instead of Euclidean distance since cosine-angle accuracy and movie equidistance are virtually equivalent. A recommendation system based on the user's preferred genres of entertainment offers content-based filtering with genre correlation [11]; its dataset is the MovieLens dataset, and R is used as the statistical package. Another major focus has been leveraging a public Steam dataset to create a genre-based and topic-modeling approach in a recommender system to forecast a user's game rating, with the two models integrated into a hybrid recommender system. The KNN technique is used in that model to predict a customer's rating. The system was implemented in Python, and several Python packages were used for data cleaning. All the predicted ratings were examined and compared to one another; the genre-based method surpasses both the topic modeling and hybrid models in terms of performance.
3 Working Environment and Technology Used In this study, we presented a novel machine learning-based paradigm for movie recommendation systems. We conducted research and made basic suggestions based on similar genres and films that people enjoy, as well as analyzed numerous parameters to improve the efficiency of recommendation engines. Anaconda Jupyter Notebook is used to implement our planned work in Python.
4 Proposed Methodology This research establishes a recommender system for cinephiles. The workflow of the chosen technique is depicted in Fig. 2. Our proposed system consists of six steps. In the first step, we create a working environment and download the IMDb movies dataset (the dataset link is given in the references). In the second step, we import the dataset as CSV files, along with the essential libraries, into Jupyter Notebook. In the third step, we clean the dataset for consistency and analyze it to impute missing values. In the fourth step (feature engineering), we compute the top 10 movies that raise millions, transform the duration and language columns, and extract movie genres for further analysis using feature extraction and various parameters. In the fifth step (data visualization), we visualize the data and calculate the top 10 most successful films on social media, determine which genres are the most bankable, calculate financial profit and loss for English versus foreign-language films, compute box-office figures for a specified time frame, examine the relationship between IMDb rating and movie length, rank the top 10 films by box-office receipts and IMDb scores, and compare highly rated actors and movies. After this in-depth analysis, our system recommends movies based on affinities in languages, actors, genres, and other aspects in the final step (step six, recommendations).
5 Implementation and Result To see if our recommender system is providing reliable results or not, we conduct several analyses and evaluate our findings using a variety of parameters. The actions we took to implement our RS are listed below. Step 1: Create a Working Environment and Download the IMDb Movies Dataset To implement our proposed system, initially download the data set from the internet and create a working environment in Jupyter Notebook. The movie dataset was
Fig. 2 Workflow
obtained from the IMDb website, which is a prominent and free source of information about movies, TV series, home videos, video games, and online streaming entertainment. Step: 2 Import the Dataset and Required Libraries In this stage, import the dataset in CSV format, as well as the necessary libraries, such as NumPy, Pandas, Matplotlib, Seaborn, Plotly Express, and Ipywidgets, into Jupyter Notebook. After that, remove some extraneous columns from the dataset and inspect the remaining columns. Step: 3 Missing Values Imputation Clean the dataset for further evaluation in this step. To begin, look for rows in the dataset that have a high percentage of missing values. Then fill in all the missing values using the mean and mode functions, and check that the missing-value count returns zero, indicating that there are no missing values left in the dataset and that it is ready for further research and analysis. A short sketch of these steps is given below.
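A short pandas sketch of Steps 2 and 3 might look like the following; the file name and the column types are assumptions, since the paper does not list them explicitly.

```python
# Sketch of Steps 2-3: load the IMDb CSV and impute missing values (assumed file name).
import pandas as pd

movies = pd.read_csv("imdb_movies.csv")
print(movies.isnull().sum())                         # inspect missing values per column

# Numeric columns: fill with the mean; categorical columns: fill with the mode.
for col in movies.columns:
    if movies[col].dtype == "object":
        movies[col] = movies[col].fillna(movies[col].mode()[0])
    else:
        movies[col] = movies[col].fillna(movies[col].mean())

assert movies.isnull().sum().sum() == 0              # dataset is now ready for analysis
```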
Step: 4 Feature Engineering (A) Calculate Top 10 Movies that Raise Millions Feature engineering is employed to generate a new profit feature, and then the top 10 movies that raise millions are examined. The parameters listed below are used to calculate a film's earnings. Budget: the total amount of money spent by producers to make a film, including production, casting, and advertising expenditures. Revenue: what producers earn from the distribution of their films in theaters, the sale of satellite rights to television, and the sale to OTT platforms such as Prime, Hulu, Disney + Hotstar, and Netflix. Profit: revenue minus budget; in our calculations, we used this formula to determine the most profitable movies (a short sketch of this computation follows this step). (B) Manipulate the Duration or Language Column and Extract the Movie Genres Here we can figure out which kinds of films do better at the box office and which films are more popular among users. Step: 5 Data Visualization (A) Calculate the Top 10 Most Popular Movies on Social Media As part of this stage, visualize the data and, using a bar plot, analyze the top 10 most popular movies on social media. In the simulation, the social media popularity of these movies is computed using the following formula: (No. of People Who Reviewed the Movie / No. of People Who Voted for the Movie) * No. of Facebook Likes. After determining each movie's social media popularity, check out the top 10 most popular movies on social media. (B) Analyze Which Genres Are Most Bankable A genre investigation has the most potential to assist the film business in making critical monitoring and financial decisions. The data must now be grouped by genre, and the results must be aggregated into minimum, average, and maximum revenue. We have plotted these figures on a line chart (Fig. 3). (C) Analyze Profit and Loss for English Versus Foreign-Language Movies Using the results on which genres are most bankable, we examine the years in which the box office made the greatest profit. We then group the data by title year and total the profit earned in each year. Finally, using a time series, we plot the profit of English and foreign-language movies, with blue denoting English and red denoting foreign-language films (Fig. 4).
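The profit and social-media-popularity computations of Steps 4(A) and 5(A) could be sketched as follows, continuing the DataFrame above; the column names ("gross", "budget", "num_user_for_reviews", and so on) are assumptions and may differ in the actual dataset.

```python
# Sketch of Step 4(A): derive a profit feature and list the ten most profitable movies.
# Column names are assumed; adjust them to the actual IMDb dataset.
movies["profit"] = movies["gross"] - movies["budget"]
top10_profit = movies.sort_values("profit", ascending=False).head(10)
print(top10_profit[["movie_title", "budget", "gross", "profit"]])

# Sketch of Step 5(A): social-media popularity following the formula in the text.
movies["popularity"] = (movies["num_user_for_reviews"] /
                        movies["num_voted_users"]) * movies["movie_facebook_likes"]
print(movies.sort_values("popularity", ascending=False).head(10)["movie_title"])
```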
Fig. 3 Line chart shows gross with genres
Fig. 4 Time series for box office profit for English versus foreign movies
(D) Compare the Gross of Movies Based on Their Duration We find that movies with a longer runtime have greater earning capacity. Then, using a grouped bar chart from the Seaborn library, we examine the gross of long and short duration films, split into English and foreign-language movies, where orange represents foreign-language movies and pink represents English-language movies (Fig. 5).
Fig. 5 Based on English and foreign language movies, the impact of gross on long and short duration
(E) Compare Critically Acclaimed Actors and Movies In this stage, we compare some of Hollywood's most well-known and critically acclaimed performers. This comparison is a feature used by the largest OTT platforms to help users get more accurate recommendations. (F) Analyze the Top Ten Films Based on Box Office Receipts and IMDb Ratings These analyses are carried out with the help of interactive widgets from the Ipywidgets library. We create an interactive function to run queries interactively and set the score parameter to nine; this parameter provides a slider with which the value can be modified, and the result is ordered by IMDb score. We then apply the same approach to the budget and gross columns, so that changing the value in the dropdown changes the result. Step: 6 Recommendations (A) Recommend Movies Based on Languages and Actors We assume that the majority of people have strong preferences for actors and languages. Because we only have two language options, English and foreign, a system has been designed to recommend the best movies based on language. For more accurate results, we select the language column, the movie title, and the IMDb score, and then sort the dataset in descending order of IMDb score to place the highest-scoring movies at the top (Fig. 6). A short sketch of this step is shown below.
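A minimal sketch of the language-based recommendation of Step 6(A), again with assumed column names, is:

```python
# Sketch of Step 6(A): top English-language movies ranked by IMDb score (column names assumed).
english = movies[movies["language"] == "English"]
recommendations = (english[["movie_title", "language", "imdb_score"]]
                   .sort_values("imdb_score", ascending=False)
                   .head(10))
print(recommendations)
```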
Fig. 6 Recommending movies based on dialects
Fig. 7 Recommend movies based on actor Tom Cruise
After that, we examine all of the dataset's actor columns, combine them, sort them in descending order, take the top 15 values, and perform some testing; for instance, searching for the actor "Tom Cruise" (Fig. 7). (B) Recommend Movies Based on Similar Genres Most individuals who watch movies on OTT platforms like Prime, Hulu, Disney + Hotstar, and Netflix use genres to narrow down what they want to see. We therefore recommend additional movies to viewers based on comparable genres for a better experience. To complete this phase, we build a pivot table of all movie genres using a transactional encoder, which indicates whether or not a specific film belongs to a genre, and vice versa; see, for example, the action genre in Fig. 8.
Fig. 8 Result for the action genre: adventure, Sci-Fi, and thriller are the most similar genres to action
Fig. 9 List includes The Expendables 3, The Expendables 2, Mission: Impossible, James Bond, and many other similar films
(C) Recommend Similar Movies In this phase, we examine the user's previous viewing history and propose similar movies. To do so, we build a sparse matrix for recommending movies, transpose the genre pivot table created in the previous step, and use the correlation function to determine how similar the movies are. We then built a recommendation engine on top of this to find related movies and tested it on sample data, for example, the movie "The Expendables" (Fig. 9). A sketch of this correlation-based lookup is given below.
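The sketch below illustrates the genre pivot and correlation lookup from Steps 6(B)–(C). It uses pandas' str.get_dummies as a stand-in for the transactional encoder mentioned in the text, and it assumes a '|'-separated "genres" column and a "movie_title" column; duplicate titles would need additional handling.

```python
# Sketch of Steps 6(B)-(C): one-hot encode genres, then correlate over the transposed
# genre matrix to find movies with a similar genre profile (column names assumed).
genre_dummies = movies["genres"].str.get_dummies(sep="|")   # stand-in for the transactional encoder
genre_dummies.index = movies["movie_title"]

genre_matrix = genre_dummies.T                              # rows: genres, columns: movie titles
similarity = genre_matrix.corrwith(genre_matrix["The Expendables"])

print(similarity.dropna().sort_values(ascending=False).head(10))   # most similar titles
```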
6 Conclusion and Future Work The proposed system's outcome depends entirely on data quality, and which tactics and approaches are most effective in a given analytical scenario remains an open question. In this work, we used many types of analysis and various parameters to improve the results of our RS. Following this in-depth examination, we conclude that domain expertise is critical for achieving more precise outcomes in the era of machine learning-based recommendation engines: domain knowledge aids in discovering a plethora of previously unknown patterns, correlations, and relationships, so a solid and clear understanding of the subject is essential when solving analytical problems. Untidy data can sometimes lead to unfavorable results in this system. Overall, the paradigm we provide is quite robust. It may be readily embedded in any website, mobile application (iOS, Android, or native), or worldwide OTT platform to deliver suggestions, and it works on both small and large amounts of data in real time. Using the IMDb movie dataset, we were able to obtain 99.99% accuracy with our RS. In future work, we still need to find better ways to cut across categories and recommend items from the catalog at a larger scale.
References 1. Agrawal S, Jain P (2017) An improved approach for movie recommendation system. In: International conference on I-SMAC (IoT in social, mobile, analytics, and cloud) (I-SMAC 2017. https://doi.org/10.1109/I-SMAC.2017.8058367 2. Dilmegani C (2017) Recommendation systems: applications, examples & benfits. https://res earch.aimultiple.com/recommendation-system/ 3. Falk K (2019) Practical recommender systems. Manning Publications, 1st edn, 2 Feb 2019 4. Gupta M, Thakkar A, Aashish G, Vishal R, Pratap Singh D (2020). Movie recommender system using collaborative filtering. In: Proceedings of the international conference on electronics and sustainable communication systems (ICESC 2020) IEEE Xplore Part Number: CFP20V66ART; ISBN: 978-1-7281-4108-4. https://doi.org/10.1109/ICESC48915.2020.9155879 5. https://www.imdb.com/interfaces/ 6. Kumar S, De K, Roy PP (2020). Movie recommendation system using sentiment analysis from microblogging data. In: IEEE transactions on computational social systems, pp 1–9. https:// doi.org/10.1109/TCSS.2020.2993585 7. Kharita MK, Kumar A, Singh P (2018) Item-based collaborative filtering in movie recommendation in real time. In: First international conference on secure cyber computing and communication (ICSCCC).https://doi.org/10.1109/ICSCCC.2018.8703362 8. Li H, Cui J, Shen B, Ma J (2016) An intelligent movie recommendation system through grouplevel sentiment analysis in microblogs. Neurocomputing (S0925231216305872). https://doi. org/10.1016/j.neucom.2015.09.134 9. Mohanty SN, Chatterjee JM, Jain S, Elngar AA, Gupta P (2020) Recommender system with machine learning and artificial intelligence (practical tools and applications in medical, agricultural and other industries). In: An introduction to basic concepts on recommender systems. https://doi.org/10.1002/9781119711582(pp.1-25). https://doi.org/10.1002/ 9781119711582.ch1 10. Murali MV, Vishnu TG, Victor N (2019) A collaborative filtering based recommender system for suggesting new trends in any domain of research. In: 5th international conference on advanced computing & communication systems (ICACCS), pp 550–553. https://doi.org/10. 1109/ICACCS.2019.8728409 11. Reddy MM, Sujithra KR, Surendiran B (2020) Analysis of movie recommendation systems; with and without considering the low rated movies. In: International conference on emerging trends in information technology and engineering (ic-ETITE), pp 1–4. https://doi.org/10.1109/ ic-ETITE47903.2020.453 12. Sang-Min C, Sang-Ki K, Yo-Sub H (2012) A movie recommendation algorithm based on genres correlations. Expert Syst Appl 39(9):8079–8085. https://doi.org/10.1016/j.eswa.2012. 01.132. 13. Satapathy SC, Bhateja V, Das S (2019) Content-based movie recommendation system using genres correlation. Smart intelligent computing and applications volume 105 (Proceedings of the second international conference on SCI 2018, Volume 2)||Content-based movie recommendation system using genres correlation. https://doi.org/10.1007/978-981-131927. (chapter 42), pp 391–397. https://doi.org/10.1007/978-981-13-1927-3_42 14. Wu C-SM, Garg D, Bhandary U (2018) Movie recommendation system using collaborative filtering. In: IEEE 9th international conference on software engineering and service science (ICSESS), pp 11–15. https://doi.org/10.1109/ICSESS.2018.8663822 15. Widiyaningtyas T, Hidayah I, Adji TB (2021) User profile correlation-based similarity (UPCSim) algorithm in movie recommendation system. J Big Data 8(1). https://doi.org/10. 1186/s40537-021-00425-x 16. 
Zhang J, Wang Y, Yuan Z, Jin Q (2020) Personalized real-time movie recommendation system. Practical prototype and evaluation. Tsinghua Sci Technol 25(2):180–191. https://doi.org/10. 26599/TST.2018.9010118
A Strategy to Accelerate the Inference of a Complex Deep Neural Network P. Haseena Rahmath, Vishal Srivastava, and Kuldeep Chaurasia
Abstract Deep learning is an effective ML algorithm capable of learning and extracting deep representations of data with utmost accuracy. The outstanding performance of deep learning models comes with a series of network layers that demand high computational energy and add latency overhead to the system. Inference of a deep neural network (DNN) completes and delivers output only after processing all the network layers, irrespective of the input pattern. The complexity of the deep neural network prohibits its usage in energy-constrained, low-latency real-time applications. A possible solution is multi-exit neural networks, which introduce multiple exit branches to standard neural networks. These early exit neural networks deliver output from their intermediate layers through exit points based on specific confidence criteria. The majority of input samples can be processed at the initial layers of the network, while more complex input samples are forwarded for processing to the remaining layers. This paper analyzes the performance of early exit deep neural networks against their confidence criteria and the number of branches. The study also evaluates the classification accuracy among exit branches. For the analysis, we implement an object detection application using an early exit MobileNetV2 neural network and the Caltech-256 dataset. The experiments prove that early exit DNN can speed up the inference process with acceptable accuracy and that the selection of confidence criteria has a significant impact on system performance. Keywords Deep neural networks · Early exit deep neural networks · Fast inference
P. Haseena Rahmath (B) · V. Srivastava · K. Chaurasia Bennett University, Greater Noida, Uttar Pradesh, India e-mail: [email protected] V. Srivastava e-mail: [email protected] K. Chaurasia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_5
1 Introduction Among various machine learning algorithms, deep learning has proved its remarkable capabilities in data processing and analysis. With more complex network structures, a deep neural network extracts more features and performs better, but it demands high computational costs and execution time during deployment. For a real-time application on energy- and resource-constrained devices such as mobile or IoT devices, implementing a complex DNN is a challenging task. Recent studies proposed techniques like model compression [2], quantization [16], and optimized implementation [22] for reducing the complexity of the DNN model. These approaches ignore the natural complexity variation of input samples: to obtain a prediction, they mandate full-stack execution of all DNN layers irrespective of the nature of the input pattern. Teerapittayanon et al. [20] introduced an early exit architecture (BranchyNet) that modifies the standard DNN structure by adding multiple side exit branches to the intermediate layers. In this type of architecture, inference can stop early and output can be delivered from the intermediate layers of the DNN if the confidence obtained at the added exit points meets some predefined threshold value. Early exit DNN is an ideal choice in edge computing [17] as it requires fewer computational resources and delivers results very quickly from its early exit branches. Its branching structure gives more flexibility to partition the DNN model among device, edge, and cloud servers. This paper analyzes an early exit DNN, its architecture, and the relationship among parameters such as threshold values, exit branches, inference time, and accuracy. The analysis exposes the constraints of early exit DNN and elicits significant research interest in this emerging family of neural networks. The main contributions of this paper are the following:
• Construction of a multi-branch early exit DNN using MobileNetV2 and an object detection application using the Caltech-256 dataset.
• Analysis of classification accuracy and processing time of input samples using different threshold values and across all exit branches.
• Investigation of the impact of predefined threshold values on execution time and accuracy, and examination of the effect of early exit branches on processing time.
The remainder of this paper is organized as follows: Sect. 2 discusses the background and related literature, and Sect. 3 briefly reviews the model architecture of the standard deep neural network and early exit deep neural network. Section 4 explains the training and inference process of the early exit DNN, and Sect. 5 covers the methodology adopted for the study. Section 6 analyzes the early exit DNN in terms of its processing time, threshold, and the number of branches. Finally, Sect. 7 concludes this work and presents various applications of early exit DNN.
2 Background and Related Works Many recent contributions [2, 7, 8] considered various DNN model compression techniques, such as weight pruning, quantization, and compact network architecture implementation, to reduce model complexity. Weight pruning is a widely used model compression method that removes redundant weights from a trained DNN; a magnitude-based weight pruning is applied in [3] that removes small weights whose magnitudes are below a threshold. The quantization technique uses a more compact format to represent layer inputs and weights [16], whereas compact network architecture implementation reduces the number of weights and operations by improving the network architecture itself. Optimized implementation techniques are proposed in [10, 14, 22] to reduce the execution time by making the neural network computation algorithmically faster. Kim et al. [6] and Umuroglu et al. [21] applied various hardware acceleration techniques to speed up neural network execution at the hardware level. Recent studies [11–13, 24] claimed that smaller neural network architectures process most input patterns accurately. Leroux et al. [12] achieved top-5 accuracy in feature extraction on one-third of the ImageNet dataset with a single convolutional layer. Kaya et al. [5] and Wang et al. [25] argue that smaller network architectures avoid the over-thinking phenomenon of deeply layered architectures, which results in incorrect predictions. Laskaridis et al. [9] and Passalis et al. [18] demonstrated that features extracted at the layers closer to the input can effectively classify the majority of the samples from a given dataset. Based on these facts, a new family of neural networks called early exit neural networks has emerged that can deliver output at intermediate layers of a DNN depending upon the nature of the input pattern. BranchyNet [20] is the first such early exit convolutional neural network; it introduced several side exit branches to standard deep neural network architectures (LeNet, AlexNet, and ResNet). Rather than searching for a single, efficient neural network for processing all the input samples, early exit neural networks provide multiple exit points that deliver output early if the confidence of the exit branch meets a predefined threshold value.
3 Early Exit DNN Architecture Deep neural networks are efficient machine learning tools constructed as a stack of intermediate layers. Each layer consists of numerous neurons or computational units that take input from previous layers and pass through a nonlinear function. These neurons are connected, and the weights associated with these connections are learned during the training process. The layers may be fully connected layers or convolutional layers. In a fully connected layer, each neuron connects to all other neurons present in the subsequent layer, while in the convolution layer, neuron connections are more localized in which neurons connect to the nearest neuron that shares their weights
Fig. 1 Standard deep neural network [15]
across the group. Figure 1 shows a standard neural network in which the input layer receives the input sample and passes it to the subsequent layers up to the output layer. Early exit deep neural networks consist of one entry point and multiple exit points. The multiple exit points are created by introducing branches; each branch contains more than one neural network layer followed by an exit point. The early exit DNN can perform a prediction and stop inference early, without the sample being processed by the entire DNN, which accelerates the inference process. In this paper, we implement a standard MobileNetV2 neural network and add branches to it based on the procedure proposed by BranchyNet [20]. A generic MobileNetV2 network acts as the backbone of the structure, and five exit branches are added to it to construct an early exit MobileNetV2 network, as shown in Fig. 2. These added side branches perform predictions and deliver output if the prediction confidence at each exit point is above some predetermined confidence threshold; a sketch of attaching such a branch is given after Fig. 2.
Fig. 2 MobileNetV2 neural network with five exit branches
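The construction in Fig. 2 could be sketched with the Keras functional API roughly as follows. This is not the authors' exact architecture: only one side branch is shown, the intermediate layer name is an assumed MobileNetV2 layer, and the branch head is kept deliberately simple.

```python
# Sketch: attach one early-exit branch to an intermediate MobileNetV2 layer (assumed layer name).
import tensorflow as tf
from tensorflow.keras import layers, Model

num_classes = 257                                   # 256 Caltech-256 categories + 1 clutter class
backbone = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                             include_top=False, weights="imagenet")

# Side branch attached to an intermediate feature map.
mid = backbone.get_layer("block_6_expand_relu").output
branch = layers.GlobalAveragePooling2D()(mid)
early_exit = layers.Dense(num_classes, activation="softmax", name="exit_1")(branch)

# Final exit at the end of the backbone.
top = layers.GlobalAveragePooling2D()(backbone.output)
final_exit = layers.Dense(num_classes, activation="softmax", name="exit_final")(top)

early_exit_model = Model(inputs=backbone.input, outputs=[early_exit, final_exit])
```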
4 Training and Inference of Early Exit DNN Early exit DNN can be trained in three ways [19]. In the first approach, each exit branch is trained separately, where the training of each exit point includes that exit branch and the preceding layers of the backbone network [4]. The second approach treats the training of the entire architecture as a single optimization problem that covers all exit branches [11], and the third approach trains the backbone network first and then trains each exit point separately [23]. This study used the second approach, proposed by BranchyNet [20], to train the early exit DNN. Training completes in two steps: forward training and backward training. Forward propagation trains the backbone network along with the side branches; it records the output from each exit branch and calculates the network error. In backward training, the calculated error is propagated back through the backbone network and the weight parameters are adjusted using a gradient descent algorithm (Adam). During the training process, each exit branch learns its parameters and tries to minimize the optimization function, which is a softmax cross-entropy loss function as given in Eq. (1) [20]:

$L(\hat{y}, y; \theta) = -\frac{1}{|C|} \sum_{c \in C} y_c \log \hat{y}_c$   (1)

where

$\hat{y} = \mathrm{softmax}(z) = \frac{\exp(z)}{\sum_{c \in C} \exp(z_c)}$   (2)

and

$z = E_n(x; \theta)$   (3)

where x is the input sample, y is the one-hot ground-truth label vector, and C is the set of all possible labels. $E_n$ represents the output of the nth exit branch, while $\theta$ represents the set of parameters of the layers from the entry point to the corresponding exit point. Training of the entire model is performed as a joint optimization problem, and the weighted sum of all the loss functions calculated at the exit branches is the objective function for optimization, as given in Eq. (4) [20]:

$L(\hat{y}, y; \theta) = \sum_{n=1}^{N} w_n L(\hat{y}_n, y; \theta)$   (4)

where N is the total number of exit branches and $w_n$ are the weights. A Keras sketch of this weighted joint objective is given below.
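The following sketch, continuing the two-exit model above, shows one way to express the weighted joint objective of Eq. (4) in Keras; the weights w_n and the learning rate are assumed values, not the authors' settings.

```python
# Sketch of the joint objective of Eq. (4): weighted sum of cross-entropy losses, one per exit.
import tensorflow as tf

early_exit_model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss={"exit_1": "categorical_crossentropy",
          "exit_final": "categorical_crossentropy"},
    loss_weights={"exit_1": 0.3, "exit_final": 1.0},   # assumed w_n values
    metrics=["accuracy"],
)
# Keras then minimizes sum_n w_n * L_n jointly over all exits during training.
```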
Inference on Early Exit DNN Once trained, early exit deep neural networks can receive any input sample and deliver inference quickly by processing the sample at the early exit branches instead of passing it through the entire DNN. The input layer receives the input sample and processes it layer by layer until it reaches an exit branch. At the exit branch, it passes through a set of fully connected neuron layers and produces an output. This output, $z = E_n(x)$, is then passed through a softmax function, $y = \mathrm{softmax}(z)$, and finally the entropy is calculated as given in Eq. (5) [20]:

$\mathrm{entropy}(y) = \sum_{c \in C} y_c \log y_c$   (5)

where y is the computed class-label probability vector and C is the set of all possible labels. If the calculated entropy of an exit branch is above a predetermined threshold, T, then the input sample finishes the inference and returns the result from that exit point with no further computation in higher layers. Otherwise, the input sample continues through the subsequent layers and the adjacent exit points until it reaches the output layer. At the final output layer, it always returns the output together with the entropy score. A sketch of this early-exit inference loop is given below.
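The following sketch illustrates the early-exit inference loop. The paper states the stopping rule in terms of both entropy (Eq. 5) and prediction confidence; this sketch uses the max-softmax-confidence form, which matches the 0.7–0.9 threshold values used in the analysis section, so treat it as one possible interpretation rather than the authors' exact rule.

```python
# Sketch: stop at the first exit whose prediction confidence meets the threshold T.
import numpy as np

def early_exit_predict(model, x, threshold=0.8):
    """Return (exit_branch_index, predicted_class) for a single input sample."""
    outputs = model.predict(x[np.newaxis, ...])      # one softmax vector per exit, earliest first
    for branch_idx, probs in enumerate(outputs):
        probs = probs[0]
        if np.max(probs) >= threshold:               # confident enough: exit early
            return branch_idx, int(np.argmax(probs))
    # No branch met the criterion: fall back to the final exit's prediction.
    return len(outputs) - 1, int(np.argmax(outputs[-1][0]))
```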
5 Methodology For analyzing the early exit DNN, the study implemented an object detection application using the MobileNetV2 neural network and the Caltech-256 dataset. MobileNetV2 is a lightweight convolutional neural network introduced by Google. It performs well on mobile devices as it requires less computation than a standard convolutional neural network. It consists of 19 inverted residual layers preceded by convolutional and rectified linear unit layers. The data is then passed to a dropout layer, an average pooling layer, a fully connected layer, and a softmax layer to perform prediction. This study used a pre-trained MobileNetV2 standard neural network and added five branches to it, as shown in Fig. 2. Caltech-256 [1] is an image dataset that includes 30,607 real-world images of varying sizes, categorized into 256 classes such as coffee, sunflower, and headphone, plus one clutter class. Each category contains at least 80 images. Eighty percent of the dataset is used for training the model and 20% for analysis. Figure 3 illustrates the methodology used for the experiment. Weights are initialized from the pre-trained MobileNetV2 model; the parameters of the main branch are adjusted, and the parameters of the exit branches are learned during the training process. The trained model is then saved for inference. The analysis phase records the processing time and classification accuracy of each input sample against different threshold values along the five exit branches.
Fig. 3 Methodology used for the analysis of early exit DNN
The threshold values used for the analysis are 0.7, 0.75, 0.8, 0.85, and 0.9. The analysis also examines the impact of threshold values and exit branches on both classification accuracy and processing time.
6 Analysis of Early Exit DNN This section evaluates the performance of the early exit MobileNetV2 model against different confidence criteria along five exit branches. It compares the image classification accuracy of exit branches with various threshold values, as shown in Figs. 4, 5, and 6. It also analyzes the correlation between the inference time and the processing of input samples at each exit branch, as shown in Fig. 7. Finally, it evaluates the tradeoff between processing time and pre-selected confidence criteria, as shown in Fig. 8. Figure 4 shows the image classification accuracy and processing time of early exit DNN at branches 2, 3, 4, and 5 against the threshold value of 0.7. As shown in Fig. 4a, none of the images is classified at the second branch, while at branches 3, 4, and 5, the percentage of images correctly classified are 19%, 57%, and 85%, respectively, as illustrated in Fig. 4b–d. Figure 5 displays the image classification accuracy and processing time at branches 2, 3, 4, and 5 with the threshold value of
Fig. 4 Image classification accuracy and processing time on branches 2, 3, 4, and 5 with 0.7 as a threshold value
Fig. 5 Image classification accuracy and processing time on branches 2, 3, 4, and 5 with a threshold value of 0.8
0.8. As shown in Fig. 5a, the second branch fails to classify any images, while in branches 3, 4, and 5, the percentage of images correctly classified are 14%, 49%, and 81%, respectively, as shown in Fig. 5b–d. Figure 6 depicts image classification accuracy and processing time at branches 2, 3, 4, and 5 with the threshold value of 0.9. The second branch again failed to classify
Fig. 6 Image classification accuracy and processing time on branches 2, 3, 4, and 5 with a threshold value of 0.9
Fig. 7 Overall processing time in each branch
images (Fig. 6a), while in branches 3, 4, and 5, the percentage of images correctly classified is 8%, 39%, and 72%, respectively, as shown in Fig. 6b–d. Figures 4, 5, and 6 indicate that none of the images is classified at the first and second exit branches of early exit DNN irrespective of the confidence criteria selected. As these branches are very close to the input layer, the model might have failed to extract enough features at early layers for correct image classification. The image classification accuracy linearly increased while traversing to higher level exit branches. It is also observed that the performance of exit branches depends
Fig. 8 Overall processing time in terms of threshold values
on the predetermined threshold value. For instance, at exit branch five, 85% of images are correctly classified with a threshold value of 0.7, while only 72% of images are classified with 0.9 as the threshold value. Figure 7 depicts how the processing time changes as input samples traverse different exit branches. An input sample takes almost twice the processing time at branch 3 compared with branch 2. This shows the effectiveness of the early exit DNN, as it tries to process the input sample at the earliest exit branch possible. The tradeoff between processing time and the pre-selected confidence criterion is pictured in Fig. 8, which shows the mean processing time of an input sample for different threshold values. It is clear from the figure that an increase in the threshold value directly affects the inference speed. Hence, the threshold values should be selected carefully.
7 Conclusion Analysis of early exit deep neural networks proved that early exit DNN speeds up the inference with higher accuracy. It is observed that the value of the predetermined threshold in each branch directly affects the performance and accuracy of the early exit DNN. A high threshold value classifies fewer images at earlier exit branches, while a relatively low threshold value degrades the classification accuracy. Hence, the threshold value decision at each exit branch is a critical task. Resource-restricted devices like mobile or IoT devices can exploit early exit DNN in their low latency real-time application and deliver a better user experience. Edge computing can take advantage of early exit DNN as it reduces the requirement of computational resources and delivers results very quickly from its early exit branches. Early exit DNN can
be executed at the device or the edge server. Hence, it avoids data transmission to the cloud server, which ultimately decreases the latency of the application. The early exit DNN can deliver results fast from its early exit points based on a predetermined confidence value. If the inferred result is not satisfactory, then the device needs to send data further for inference. The branching architecture of early exit DNN can be partitioned into different parts and executed in collaboration among device, edge, and cloud servers.
Predicting Aramco’s IPO Long-Term Performance During COVID Times Mohammad Imdadul Haque, Master Prince, and Abdul Rahman Shaik
Abstract This paper aims to assess whether the outbreak of the highly contagious pandemic had an impact on the share prices of recently listed Aramco in light of the Fads hypothesis using the methods of neural network and ARIMA. The IPO of Aramco, the world’s largest oil company, was a much-hyped affair. Given the relevant importance of the company, it was expected that Aramco’s share prices would not underperform in the long run. But the analysis indicates the opposite. The study uses two time periods using the announcement of the pandemic by the World Health Organization as the threshold date to see the impact of the pandemic on Aramco’s share prices. The forecasting results validate the Fads hypothesis implying that Aramco’s share prices would have underperformed in the long run, even in the absence of a pandemic outbreak. Finally, the study cautions investors against the hype created by IPOs. Keywords Initial public offer · Impresario hypothesis · Coronavirus · Neural network · ARIMA
M. I. Haque (B), Department of Economics, Faculty of Social Science, Aligarh Muslim University, Aligarh, India, e-mail: [email protected]; M. Prince, Department of Computer Science, Qassim University, Mulaydha, Saudi Arabia, e-mail: [email protected]; A. R. Shaik, College of Business Administration, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia, e-mail: [email protected]
1 Introduction Initial Public Offers (IPOs) are a significant source of mobilizing funds, offering huge profits on the day of listing [24]. An IPO is a critical decision related to the firm's structure, ownership, and prestige and has been referred to as an 'inevitable stage in the life cycle of a firm' [34]. Nevertheless, the underperformance of IPOs, in the
long run, is a global phenomenon and has been reported by many studies ([3] for Latin America; [22] for the USA; [14] for China; [16] for U.K. and Germany; [21] for Thailand; Wang 2005 for Taiwan; [35] for Turkey; [25] for Tunisia; [17] for Bangladesh). The literature on financial economics provides many hypotheses to explain the abnormal initial returns and long-term underperformance [15]. Nevertheless, this study mainly concentrates on the Fads (impresario) hypothesis. This hypothesis points out that IPOs with high initial returns tend to have lower aftermarket returns. The hypothesis postulates that the unusual initial returns arise because investors overvalued the IPOs, not because of systematic underpricing. Moreover, due to the phenomenon of efficient markets, the IPO's prices move to the equilibrium price, which is lower than the initial price. Aggarwal and Rivoli [2], Shiller [28], and Ritter [26] are some of the early proponents of this Fads hypothesis. A high opening price leading to easy and excess funds is attributed to over-optimism or fads of the investor, as hypothesized by Ritter [26]. Recent empirical studies by Cogliati [13] on IPOs in Italy, Germany, and France; Chan [11] on IPOs in the U.S.; Chipeta and Jardine [12] on South African IPOs; and Wong et al. [33] on Malaysian IPOs lend further credence to this hypothesis. Saudi Arabia is an emerging financial market [4, 5]. Recently, a major oil giant, Aramco of Saudi Arabia, announced its IPO. Aramco manages the world's largest oil fields and is currently the largest producer of crude oil globally. On December 11, 2019, Aramco was officially listed on Tadawul, the Saudi Stock Exchange, and its shares began trading at 32 Saudi Arabian Riyal. The offering process concluded on December 4, 2019, generating subscriptions by institutional and individual subscribers of SAR446 billion, equivalent to USD119 billion. It attracted 5 million subscribers as it sold 3 billion shares equivalent to 1.5% of its share capital. The IPO generated proceeds of SAR96.0 billion, equivalent to USD25.6 billion, making it the world's largest IPO. The firm was valued at $1.7 trillion. The 3 billion shares were sold at 32 riyals ($8.53). Institutional investors acquired two-thirds of the shares. Local Saudi-based companies acquired 37.5% of the shares. The offer started on November 17 and closed on December 5, 2019. A significant development that coincides with the issue of this important IPO is the outbreak of the Coronavirus pandemic. The extremely contagious nature of this virus and its aftermath of lockdowns, suspension of work, and even restrictions on movement have changed not only the way things work but also human sentiments. Sentiments are essential for stock markets. Fluctuations in the mood of investors affect stock market prices. A downward trending market accompanied by high-risk perceptions leads to pessimism, and investors wait for the market to revive [10]. Anxiety and bad mood lead to pessimism, resulting in adverse investment decisions [20]. Stock prices symbolize the prospect of earnings in the future. Investors in the stock markets perceive the outbreak of the pandemic as inhibiting economic activity, and this creates concerns about future earnings. Moreover, the Corona crisis is also associated with news on the spread of contagion and deaths. News influences investors' buying and selling decisions. Though both good news and bad news are said to impact financial decision making, the impact of bad news
on financial decision making is argued for by Svensson [29], Akinchi and Chahrour [7], and Shehzad et al. [31]. During the SARS crisis, WHO and related health news mediated the information to investors, which was instrumental in forming negative sentiments toward investments [27]. Baker et al. [8] believe that for the USA, none of the previous pandemics or related outbreaks, like the Spanish Flu of 1918–20 and the influenza pandemics of 1957–58 and 1968, impacted stock market volatility as much as COVID-19 has. Besides, nowadays, social media investment platforms also catalyze crowd wisdom, promoting financial decision making [9]. The study aims to examine the long-term performance of Aramco shares. In the context of Saudi Arabia, Alanazi et al. [6] found that firm performance deteriorated after the IPO for 16 firms that issued IPOs between 2003 and 2009. Apart from the study mentioned above, the study finds a literature gap in studies on overall IPOs in Saudi Arabia and particularly on Aramco. Toward this, the study proceeds, intending to study the expected performance of Aramco. As the current prices of Aramco are lower than the initial price, this raises the research question of whether this may be due to the Fads hypothesis of overvaluation by investors, as proposed by Shiller [28] and others, or due to the COVID-19 pandemic. The main contributions of this study are: • The study uses and compares two popular but different approaches to forecasting, namely, neural network and ARIMA. • The study focuses on the abnormal period of COVID-19. • The study finds that the method of neural network predicts better than ARIMA in abnormal situations. • The study validates the Fads hypothesis as it predicts that in the long run the much-hyped IPO of Aramco will not be able to attain the opening price hike.
2 Methodology A long short-term memory (LSTM) network has been proven to perform exceptionally well in time series forecasting [19, 30]. LSTMs are special RNNs that are suitable for learning long-term dependencies. One of the objectives was to design an artificial recurrent neural network (RNN), a deep learning architecture. (1) RNN Architecture Figure 1 shows the architecture of the RNN, which consists of four layers: a sequence input layer (1 feature), an LSTM layer (200 hidden units), a fully connected layer (1 response), and a regression output layer (mean-square-error loss function with respect to the response). (2) Dataset The considered dataset is the price of Aramco's IPO. The forecasting is carried out on two different datasets. The first consists of prices from December 12, 2019, till 27 August 2020. The second consists of prices from December 12, 2019, till March
Fig. 1 RNN Architecture
10, 2020 (the day COVID-19 was announced as a pandemic by WHO). The data is partitioned into training and test sets: the model is trained on the first 90% of the sequence and tested on the last 10%. Preprocessing is carried out on the dataset for a better fit and to prevent the training from diverging. Equation (1) is used to standardize the data to have zero mean and unit variance:

$$\text{dataTrain}_{\text{standardized}} = \frac{\text{dataTrain} - \mu}{\sigma} \qquad (1)$$
where μ and σ represent the mean and standard deviation of the training dataset, respectively. At prediction time, the test data is standardized using the same parameters as the training data. (3) Training Training is carried out on the same architecture and with the same parameters twice: first, on data from December 12, 2019, to August 27, 2020, consisting of 176 observations; second, on data from December 12, 2019, to March 10, 2020, consisting of 64 observations. Adaptive moment estimation (adam) is used as the optimizer; to prevent the gradients from exploding, the gradient threshold is set to 1, and the model is trained with this initialization for 250 epochs. Initially, the learning
rate was set to 0.005, and the learning rate is dropped after 125 epochs by multiplying it by a factor of 0.2. Finally, we stopped the training after 250 epochs with the reduced learning rate when the loss plateaued. The models are tested with the test data, and the model with optimal performance is saved. The root-mean-square error (RMSE) is used to test the performance of the model and the behavior of the data:

$$\text{RMSE} = \sqrt{\operatorname{mean}\left(\left(\text{Pred}_{\text{data}} - \text{Test}_{\text{data}}\right)^{2}\right)} \qquad (2)$$
Equation (2) is used to obtain the RMSE. Trained models are saved for both cases (without pandemic and with pandemic). The training and testing samples are not randomized here, as the idea is to test the behavior of the data, particularly during the COVID-19 period. (4) Forecast Using each of the trained models, we forecast the price of the IPO separately. Using the first model (without pandemic), prediction is carried out up to 510 days. Using the second model (with pandemic), prediction is carried out up to 605 days (March 11, 2020 onwards). Further, to validate the results, an econometric method, the autoregressive integrated moving average (ARIMA), is also used to predict the prices. The ARIMA model expresses the present value of a time series in terms of its own past values and the error term; the former is known as the autoregressive element, while the latter is known as the moving average. It is a commonly used econometric forecasting model [18]. The integrated part of the model is the number of times the data is differenced for stationarity. Y_t represents the AR(p) process and is modeled as follows:

$$Y_t = \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \cdots + \beta_p Y_{t-p} + a_t = \sum_{i=1}^{p} \beta_i Y_{t-i} + a_t \qquad (3)$$

Alternatively, model (3) can be written as

$$\beta(B)\, Y_t = a_t \qquad (4)$$

where β(B) is a polynomial of order p in the backshift and B is the backshift operator. Y_t in the following model represents the MA(q) process and is modeled as

$$Y_t = a_t + \phi_1 a_{t-1} + \phi_2 a_{t-2} + \cdots + \phi_q a_{t-q} = \sum_{j=0}^{q} \phi_j a_{t-j} \qquad (5)$$

Alternatively, model (5) can be written as

$$Y_t = \phi(B)\, a_t \qquad (6)$$
where $a_t$ is a white noise term, and $\phi(B)$ is a polynomial of order q in the backshift. The ARMA (p, q) model is given by

$$\sum_{i=1}^{p} \beta_i Y_{t-i} = \sum_{j=0}^{q} \phi_j a_{t-j} \qquad (7)$$

where $\beta_0$ and $\phi_0$ are equal to 1. The ARIMA (p, d, q)(P, D, Q) model is represented as follows:

$$\beta(B)\, \nabla^d \nabla_S^D Y_t = \phi(B)\, a_t \qquad (8)$$

where p is the AR term, q is the MA term, and d is the stationarity (differencing) term. Further, the components P, D, Q are used to introduce seasonality. The differenced series

$$\nabla^d \nabla_S^D Y_t \qquad (9)$$

in Eq. (9) is stationary, where $\nabla^d = (1 - B)^d$ represents the regular differences, while $\nabla_S^D = (1 - B^S)^D$ represents the seasonal differences.
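Before turning to the analysis, the following is a minimal, hedged sketch of the LSTM part of the methodology in Python/Keras; the original experiments appear to have been run in a different deep learning toolbox, so this is an illustrative reimplementation, not the authors' code. The toy price series, the one-step input windows, and the shortened training schedule are assumptions for illustration only.

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for the daily Aramco closing-price series (assumption).
prices = 32 + np.cumsum(np.random.default_rng(1).normal(0, 0.3, 176))

# Standardize with training-set statistics only (Eq. 1).
split = int(0.9 * len(prices))
mu, sigma = prices[:split].mean(), prices[:split].std()
series = (prices - mu) / sigma

# One-step supervised pairs: predict the next value from the previous one.
x = series[:-1].reshape(-1, 1, 1)        # (samples, timesteps=1, features=1)
y = series[1:].reshape(-1, 1)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1, 1)),   # sequence input layer, 1 feature
    tf.keras.layers.LSTM(200),             # LSTM layer, 200 hidden units
    tf.keras.layers.Dense(1),              # fully connected layer, 1 response
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.005, clipnorm=1.0),
              loss="mse")                  # regression output with MSE loss
model.fit(x[:split], y[:split], epochs=25, verbose=0)  # far fewer epochs than the paper's 250

# Recursive multi-step forecast from the last training value, then invert the scaling.
last, forecast = series[split - 1], []
for _ in range(len(series) - split):
    last = model.predict(np.array(last).reshape(1, 1, 1), verbose=0)[0, 0]
    forecast.append(last * sigma + mu)

rmse = float(np.sqrt(np.mean((np.array(forecast) - prices[split:]) ** 2)))  # Eq. (2)
print(round(rmse, 4))
```

The second model (trained on data up to March 10, 2020, only) would be obtained in this sketch by simply truncating the series before the split.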
3 Analysis Figure 2 shows the daily movements of the share prices of the IPO between December 12, 2019, and August 27, 2020, consisting of 176 observations. After the initial hike in prices, there is a steep downward trend. The series reaches its lowest price of 27.8 SAR on March 16, 2020, after which there is an upward trend, but with a lot of fluctuations. The figure also depicts the predicted values (in red) for the period 159–175 (August 5, 2020, to August 27, 2020) using the actual data for the period 1–158 (December 12, 2019, to July 28, 2020). Figure 3 shows the prediction for a further 510 days (observations 161–670) using the trained RNN model. It can be seen that the predicted values lie between 34 and 28. This performance of Aramco share prices is contrary to the popular optimism of strong positive performance. It hints at the impact of the COVID-19 pandemic, which led to a global fall in the demand for oil and in oil prices. Next, the study tests whether Aramco's prices would have behaved in the same way had there been no outbreak of the coronavirus. For this, the study takes the data for the period before the outbreak/announcement of the pandemic and predicts future prices. Figure 4 depicts the forecast for the period March 11, 2020, to August 27, 2020, using the actual data for the period December 12, 2019, to March 10, 2020. The predicted points are shown in red. Figure 5 shows the RMSE calculated for the projected data from March 11 to August 27 as 2.7155. It is much higher than the RMSE when the data till July 28 is considered, that is, when more observations of the pandemic period are incorporated. This result is
Fig. 2 Daily movements of the share prices
Fig. 3 Graphical representation of forecasting-I
Fig. 4 Graphical representation of forecasting-II
interesting as it indicates that Aramco's prices are poorly predicted when the outbreak of the pandemic is ignored, as the RMSE is higher than when the prices during the pandemic period are included. Figure 6 shows the projected price of the IPO for 605 days (from March 11 onwards) based on the trained CNN model. As is evident from Fig. 6, the prices do not reach the initial hike in prices, and there is also heavy fluctuation in the prices. This implies that even if there had been no pandemic, Aramco's share prices would still have underperformed in the long run. Next, the study uses an econometric methodology to predict the IPO prices. Comparison of the results would validate the results obtained through the neural network. The data is not stationary at level. The log of the data is taken to make it stationary at first difference. Among the various possible estimates, the best model is chosen based
Fig. 5 Graphical representation of forecasting error
Fig. 6 Graphical representation of price predicted by CNN model
on the Akaike Information Criterion (AIC). The estimation results indicate that ARIMA (2,1,3) is the best model as it has the lowest AIC value of − 5.375376. The model has an R-squared value of 0.92, and its F-ratio of 358.4021 is significant at the 5% level of significance with a p-value of 0.000. To look for the impact of the coronavirus on the prices, out-of-sample forecasting is done using two different time periods. Two different models are run. In the first model, the period of COVID-19 is included. For this model, the RMSE is 1.92. In the second model, the time period is taken after March 3, 2020. For this model, the RMSE is 1.08. The results indicate that the RMSE is higher in the model which incorporates the COVID-19 period. A comparison of the actual prices and the prices predicted through ARIMA is given in Fig. 7. As is evident, the initial downward fall in actual prices is captured well by ARIMA, but when the prices are actually recovering, ARIMA does not show a corresponding upward trend. Finally, a comparison between the forecasts of ARIMA, CNN, and the actual prices is done taking a small forecast sample period, August 5, 2020, to August 27, 2020. As is evident in Fig. 8, the prices predicted by the neural network correspond to the pattern of the actual prices to a great extent when compared to the prices predicted by ARIMA. This leads to the conclusion that the method of neural network is better suited for predicting prices than ARIMA, at least during turbulent times like the COVID-19 period.
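A hedged sketch of how such an ARIMA estimation and AIC-based order selection could be reproduced with statsmodels is given below; the synthetic price series, the small (p, 1, q) search grid, and the train/test split are illustrative assumptions rather than the authors' exact data or estimation settings.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Toy stand-in for the daily price series (assumption).
rng = np.random.default_rng(2)
prices = pd.Series(32 + np.cumsum(rng.normal(0, 0.3, 176)))
log_prices = np.log(prices)              # log transform; d=1 handles differencing

train, test = log_prices[:-17], log_prices[-17:]

# Choose (p, 1, q) by the lowest AIC over a small grid.
best_order, best_aic = None, np.inf
for p in range(4):
    for q in range(4):
        try:
            fit = ARIMA(train, order=(p, 1, q)).fit()
            if fit.aic < best_aic:
                best_order, best_aic = (p, 1, q), fit.aic
        except Exception:
            continue  # some orders may fail to converge

# Out-of-sample forecast and RMSE on the original price scale.
fit = ARIMA(train, order=best_order).fit()
forecast = np.exp(fit.forecast(steps=len(test)))
rmse = float(np.sqrt(np.mean((forecast.values - np.exp(test).values) ** 2)))
print(best_order, round(best_aic, 3), round(rmse, 3))
```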
Fig. 7 Graphical representation of comparison between actual prices and prices predicted by ARIMA
Fig. 8 Graphical representation of the comparison between actual prices and prices predicted by ARIMA and CNN
4 Conclusion The novelty of this study is the application of neural networks to predict the long-term performance of Aramco's share prices. It also uses ARIMA, a popular econometric technique for forecasting. Estimation results of both methods indicate that the volatility of the IPO's price is higher during the COVID times. The results also indicate that the IPO will not be able to attain the opening price hike in the long run. In doing so, the study validates the Fads hypothesis: Aramco's share prices had an initial high return followed by long-term underperformance. The results of this study are similar to previous studies by Cogliati et al. [13], Chan [11], Chipeta and Jardine [12], and Wong et al. [33]. The results of this study further validate the long-run underperformance of IPOs as reported by previous studies by Aggarwal et al. [3], Loughran and Ritter [22], Chen et al. [14], Goergen and Renneboog [16], Kim et al. [21], Yalama and
Ünlü [35], Rekik and Boujelbene [25], and Haque and Imam [17]. As a practical implication, the study cautions investors against the hype created by IPOs, both in normal and in abnormal circumstances. The long-term projected share prices incorporating the announcement of the pandemic by WHO are between 34 and 28. This is lower than the opening day price of 36.8. This indicates that Aramco's share prices will not be able to retain the initial hike for a period of more than a year. This performance of Aramco shares can be attributed to the outbreak of the COVID-19 pandemic. The long-term projected share prices using March 10, that is, the day before the announcement of the pandemic by WHO, as the cutoff date are between 36 and 30. This again is lower than the opening day price of 36.8. This implies that Aramco's share prices would have been lower than the initial high prices even if there had been no pandemic outbreak. The price movements of the shares of Aramco indicate an element of underpricing, as there is a significantly higher price on the day of listing compared to the offer price. The results of this study validate the Fads hypothesis as, in the long run, the market behaves in a way that rectifies the overvaluation of the IPO. Finally, based on comparing the two methods of neural network and ARIMA, the study recommends using the former method for prediction during abnormal situations.
References 1. Agrawal R, Gupta N (2021) Analysis of COVID-19 da- ta using machine learning techniques. In: Data analytics and management. Springer, Singapore, pp 595–603 2. Aggarwal R, Rivoli P (1990) Fads in the initial public offering market? Financ Manag, 45–57 3. Aggarwal R, Leal R, Hernandez L (1993) The after-market performance of initial public offerings in Latin America. Financ Manag 42–53 4. Al-Maadid A, Alhazbi S, Al-Thelaya K (2022) Using machine learning to analyze the impact of coronavirus pandemic news on the stock markets in GCC countries. Res Int Bus Financ 101667 5. Alzyadat JA, Asfoura E (2021) The effect of COVID-19 pandemic on stock market: an empirical study in Saudi Arabia. J Asian Financ Econ Bus 8(5):913–921 6. Alanazi AS, Liu B, Forster J (2011) The financial performance of Saudi Arabian IPOs. Int J Islamic Middle East Financ Manag 4(2):146–157 7. Akinchi O, Chahrour R (2018) Good news is bad news: leverage cycles and sudden stops. J Int Econ 114:362–375 8. Baker SR, Bloom N, Davis SJ, Kost K, Sammon M, Viratyosin T (2020) The unprecedented stock market reaction to COVID-19. Rev Asset Pricing Stud 10(4):742–758 9. Breitmayer B, Massari F, Pelster M (2019) Swarm intelligence? Opinion of the crowd and stock returns. Int Econ Financ 64:443–464 10. Burns WJ, Peters E, Slovic P (2012) Risk perception and the economic crisis: a longitudinal study of the trajectory of perceived risk. Risk Anal Int J 32(4):659–677 11. Chan Y (2014) How does retail sentiment affect IPO returns? Evidence from the internet bubble period. Int Rev Econ Financ 29(1):235–248 12. Chipeta C, Jardine A (2014) A review of the determinants of long run share price and operating performance of initial public offerings on the Johannesburg stock exchange. Int Bus Econ Res J 13(5):1161–1176 13. Cogliati GM, Paleari S, Vismara S (2011) IPO pricing: growth rates implied in offer prices. Ann Financ 7(1):53–82
14. Chen G, Firth M, Kim JB (2000) The post-issue market performance of initial public offerings in China’s new stock markets. Rev Quant Financ Acc 14(4):319–339 15. Durukan MB (2002) The relationship between IPO returns and factors influencing IPO performance: case of Istanbul Stock Exchange. Manag Financ 28(2):18–38 16. Goergen M, Renneboog LDR (2003) Insider retention and long-run performance in German and UK IPO’s. Discussion paper. Tilburg University, Tilburg Law and Eco nomic Center 17. Haque R, Imam MO (2014) Earning management, timing ability and long-run underperformance of IPOs in Bangladesh. Res J Financ Acc 5(17):180–192 18. Haque MI, Shaik AR (2021) Predicting crude oil prices during a pandemic: a comparison of ARIMA and GARCH models. Montenegrin J Econ 17(1):197–207 19. Hua Y, Zhao Z, Li R, Chen X, Liu Z, Zhang H (2019) Deep learning with long short-term memory for time series prediction. IEEE Commun Mag 57(6):114–119 20. Kaplanski G, Levy H (2010) Sentiment and stock prices: the case of aviation disasters. J Financ Econ 95(2):174–201 21. Kim KA, Kitsabunnarat P, Nofsinger JR (2004) Ownership and operating performance in an emerging market: evidence from Thai IPO firms. J Corp Finan 10(3):355–381 22. Loughran T, Ritter JR (1995) The new issues puzzle. J Financ 50(1):23–51 23. Narayan PK (2019) Can stale oil price news predict stock returns? Energy Econ 83:433–444 24. Poornima S, Haji AJ, Deepha B (2016) A study on the performance of initial public offering of companies listed in NSE, Indi & Gulf Base GCC index. Int J Res Financ Market 6(11):31–46 25. Rekik YM, Boujelbene Y (2013). Tunisian IPOs underpricing and long-run underperformance: highlight and explanation. E3 J Bus Manag Econ 4(4):093–104 26. Ritter J (1991) The long-run performance of initial public offerings. J Financ 46(1):3–27 27. Smith RD (2006) Responding to global infectious disease outbreaks: lessons from SARS on the role of risk perception, communication and management. Soc Sci Med 63(12):3113–3123 28. Shiller RJ (1990) Speculative prices and popular models. J Econ Perspect 4(2):55–65 29. Svensson J (1999) Is the bad news principle for real? Econ Lett 66:327–331 30. Sahoo BB, Jha R, Singh A, Kumar D (2019) Long short-term memory (LSTM) recurrent neural network for low-flow hydrological time series forecasting. Acta Geophys 67(5):1471–1481 31. Shehzad K, Xiaoxing L, Kazouz H (2020) COVID-19’s disasters are perilous than global financial crisis: a rumor or fact? Financ Res Lett 101669 32. Wang YJ (2014) The evaluation of financial performance for Taiwan container shipping companies by fuzzy TOPSIS. Appl Soft Comput 22:28–35 33. Wong ES, WB RW, Ting LS (2017) Initial public offering (IPO) underpricing in Malaysian settings. J Econ Financ Stud 5(02):14–25 34. Yalçin N, Ünlü U (2018) A multi-criteria performance analysis of initial public offering (IPO) firms using CRITIC and VIKOR methods. Technol Econ Dev Econ 24(2):534–560 35. Yalama A, Ünlü U (2010) The calendar anomalies in IPO returns: evidence from Turkey. J Econ Bus Financ 25(286):89–109
Development of a Transdisciplinary Role Concept for the Process Chain of Industrial Data Science Jörn Schwenken , Christopher Klupak , Marius Syberg , Nikolai West , Felix Walker , and Jochen Deuse
Abstract In recent years, there has been an increasing interest in using industrial data science (IDS) in manufacturing companies. Structured IDS projects proceed according to process models such as the cross industry standard process for data mining (CRISP-DM), knowledge discovery in databases (KDD), or the process chain of industrial data science. Because of the process Chain’s transdisciplinary procedure, the participation of different people in the analysis with different tasks and competencies is necessary. Therefore, a concept to define specific roles is required, since it provides unequivocal descriptions of the respective tasks. As no role concept for the process chain of IDS exists, this paper develops and presents a transdisciplinary role concept including the four essential roles: Data engineer, analyst, user, and J. Schwenken (B) · M. Syberg · N. West · J. Deuse Institute of Production Systems, Technical University Dortmund, Leonhard-Euler-Str. 5, 44227 Dortmund, Germany e-mail: [email protected] M. Syberg e-mail: [email protected] N. West e-mail: [email protected] J. Deuse e-mail: [email protected] J. Deuse Center for Advanced Manufacturing, University of Technology Sydney, 15 Broadway, Ultimo, NSW 2007, Australia C. Klupak · F. Walker Institute for Vocational and Business Education, University Hamburg, Sedanstr. 19, 20146 Hamburg, Germany e-mail: [email protected] F. Walker e-mail: [email protected] C. Klupak Institute of Technical Didactics, Technical University Kaiserslautern, Kurt-Schumacher-Str. 74a, 67663 Kaiserslautern, Germany © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_7
project manager. These roles are described in terms of their characteristics to enable structured cooperation between them. In addition, this paper presents the AKKORD platform for learning and collaboration, which especially should make an important contribution for small and medium-sized enterprises (SME) to develop knowledge in the usage of the process chain of IDS. The platform provides the opportunity for the users to train the essential roles with their characteristics in the company through targeted competence development. Keywords Industrial data science · Modular data science · Learn platform · Role concept · Process chain of industrial data science
1 Introduction Ever since the use of IDS in manufacturing companies, scientists proposed several process models for industrial applications. Only some of those models have established themselves in the industry and became the standard of IDS. One of the first models developed is the KDD Process [1]. It provides companies with a linear process model from data to knowledge with iterative loops to optimize their analyses. Using the model helps users to find valid, novel, potentially useful, and ultimately understandable patterns in industrial datasets. Today, this basic idea behind the analysis of data is methodically carried out in the industrial environment according to the CRISP-DM [2]. By dividing the process into six phases, it supports companies with a systematic approach to solve data analysis problems. Executing those projects needs several skills. Companies require skills in statistics, computer science, and knowledge in the respective domain of application [3]. This is usually not achievable by a single person. Especially in SME, companies need interdisciplinary teams with different roles to handle IDS projects. In the AKKORD research project, researchers developed a modular and datadriven reference toolkit (Fig. 1). The objective of this reference toolkit is the application-oriented provision of modular and uniform solution modules for industrial data analysis. Therefore, IDS projects become more convenient and easier to implement. At the core of the toolkit lies the process chain of IDS. In practical implementation, this part contains modular, individually expandable modules for access, analysis, and application. The fourth module is administrate which is not a separate part of the toolkit as a logical continuous task. Among others, access includes modules to identify relevant data sources for a given analysis, recode missing data and provide the data to an analyzing system [4]. In addition to modules for data pre-processing, analysis provides modules for the use of a wide range of packages and data science algorithms. The modules are adaptable and configurable to individual usage. With the third step application, users carry out analyses on specific IDS issues as well as the continuous implementation of long-term observations, visualizations, and individual interfaces to numerous systems. Lastly, administrate includes indirect operational
Fig. 1 Modular and data-driven AKKORD-reference toolkit
and organizational tasks, for example, the assignment of a long-term data manager, the assurance of consistent data security, and ethically correct data use [4]. With the help of best practice stories, the reference toolkit gives future users an impression of possible applications. To develop the necessary competencies and prepare employees for their tasks in the project, the toolkit includes a Platform for Learning and Collaboration. Existing training offers are often associated with a high expenditure of time and money, which SME tend to avoid due to the increased efforts [5]. The toolkit incorporates a new concept of a learning and collaboration platform that takes into account the status quo in companies and their requirements. Through the platform-based support of the preconfigured modules, companies gain access to software and service solutions that enable the integrated application of IDS for value-creating, competence-oriented collaboration. As already argued, especially SMEs in IDS projects need transdisciplinary teams. To ensure that the learning platform trains the right competencies, this paper presents a role concept for the process chain of IDS. This paper discusses the tasks and competencies of the roles below. First, already existing role concepts in the environment of IDS will be discussed (Sect. 2). Then the role concept with the four roles: Data engineer, analyst, user, and project manager, will be elaborated (Sect. 3). Finally, the learning platform will be explained (Sect. 4).
2 Related Work In the field of data science, there are already approaches to define the participating persons and roles in role concepts. Researchers usually describe them based on competencies as well as tasks and responsibilities. In the following, this chapter explains roles and role concepts from the general area of data science and subsequently from the specific area of IDS.
The two most frequently represented roles in data science projects are the data scientist and the data engineer. There is a discernible difference between them, although a so-called full stack data scientist can also perform both roles at once [6]. In addition to these two more technical roles, the researchers mention domains such as sales, marketing, and software development. Another role, which is also mentioned, is a mix of manager and data scientist, which is a so-called data science manager [7]. In total, Crisan et al. [8] identified nine distinctive roles in a study and assigned them to four categories. These four categories are similar to the role concepts already mentioned. There is usually a computer science expert, a data science expert, a domain expert (or user), and a role for managing the data science project. In the IDS area, researchers partly created the existing role concepts along process models. Roles along the CRISP-DM were identified by Mazarov et al. [9] as a part of the AKKORD project focusing on the roles of data scientist, IT employee, domain expert, and management. Furthermore, both Kühn et al. [10] and Deuse et al. [11] argue that a fifth, orchestrating role, may be necessary for IDS projects. Especially Deuse et al. [11] use the role of the citizen data scientist, to fulfill this task. Moore [12] defines this citizen data scientist as a person who creates or generates models that use advanced diagnostic analytics or predictive and prescriptive capabilities, but whose primary job function is outside the realm of statistics and analytics. At the same time, the citizen data scientist closes the gap between the usual self-service analytics of business users and the advanced analytics techniques of data scientists. This role can perform sophisticated analysis that would have required more expertise in the past and can thus deliver advanced analysis without having the skills typical of data scientists. In the following, this paper shows how the understanding of the citizen data scientist anchors the AKKORD toolkit. The idea here is to make the tasks in the analysis module more convenient and easier so that the toolkit enables citizen data scientists to carry out the analysis. Particularly, SME with low personnel capacity and little experience in IDS projects benefits from AKKORD.
3 Role Concept The process chain of IDS has evolved from the process chain of time management with its modular structure [13]. It contains four modules: access, analyze, apply, and administrate. The role concept assigns each of the four modules a role (data engineer, analyst, user, and project manager) (Fig. 2). The analysis process in the process chain runs from left to right, while the requirements are defined from right to left. As such, the user defines the analysis requirements and passes them on to an analyst, while the analyst frames the data requirements and passes them on to the data engineer.
Fig. 2 Role concept along the process chain of industrial data science [13]
The responsibility of the data engineer is to provide the requested raw data for the subsequent analysis in the appropriate data format. To do this, he must identify relevant data sources in the company, collect its data and improve its data quality. Typically, the data engineer has an IT technical background. Due to the modular structure of the process chain of IDS and the collaborative AKKORD platform, the concept divides the analyst into two sub-roles. If the analyst creates a new analysis module, the analyst is a module creator. This sub-role needs a deeper data science knowledge, like a data scientist, and provides his created modules to other analysts afterward. Another analyst who benefits from the created module and uses it for his or her application is called module configurator. This sub-role only needs to adapt the modules to his or her application and accordingly requires less in-depth data science knowledge than a citizen data scientist. In addition to creation and configuration, the analyst specifies the data requirements and structure to the data engineer. The analyst also has to provide the analysis results in a visual form to the user. The user is at the end of the process chain. This role initiates the execution of the IDS project and uses the results of the analysis in his or her domain. By nature, the user is an expert in his domain and has the competence to read visual representations of the analysis results and interpret them. During the analysis process, the project manager deals with indirect operational and organizational tasks. The role is primarily responsible for communication within and outside the project. The role of the project manager comes from a more management and leadership-oriented area. It corresponds to the administrate module of the process chain of IDS. The objective of the role concept is to define clearly the necessary roles for the process chain of IDS concerning their tasks and competencies. Especially for SME, the AKKORD-reference toolkit opens the possibility to use the massive potential of modern methods of data networking and analysis in a sustainable, low-cost, and application-oriented way. The next section explains the platform for learning and collaboration. It describes a competence development for IDS that is suitable for SME and intends to contribute to empowering the users for their roles.
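Purely as an illustration of how this role-to-module mapping could be encoded, for example as configuration for a learning platform, a minimal sketch follows; the data structure, naming, and wording are assumptions and not part of the AKKORD implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Role:
    name: str
    module: str                              # module of the process chain of IDS
    responsibilities: list = field(default_factory=list)

# Illustrative encoding of the role concept shown in Fig. 2 (assumed structure).
ROLES = [
    Role("Data Engineer", "Access",
         ["identify data sources", "collect data", "improve data quality"]),
    Role("Analyst (module creator / module configurator)", "Analyze",
         ["create or configure analysis modules", "specify data requirements",
          "visualize results for the user"]),
    Role("User", "Apply",
         ["initiate the IDS project", "interpret results in the domain"]),
    Role("Project Manager", "Administrate",
         ["communication", "organizational tasks", "data security and ethics"]),
]

for role in ROLES:
    print(f"{role.module:12s} -> {role.name}: {', '.join(role.responsibilities)}")
```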
4 Platform for Learning and Collaboration for SME Bigger companies are already increasingly investing in the use of smart data technologies and profiting from them in the long term. SME, on the other hand, face additional challenges applying IDS because they lack the monetary and human resources as well as the knowledge about the benefits of identifying relevant data sources and storing and analyzing these data [14, 15]. The AKKORD platform for learning and collaboration fills this gap by offering a SME-oriented skill development. On the platform, users can acquire the competencies for their role in five areas (Fig. 3). Furthermore, they can collaborate for the successful execution of IDS projects. In the channel area, users can select data science topics that are relevant to them. In this way, the platform keeps the users up to date by continuous enrichment with new thematic contributions. Another part is the service area, where users can take advantage of a variety of services. For example, it presents the area best practice stories from well-known companies here, which provide users with new ideas and a broader understanding of application fields. The next main area is the learning area, which is the central aspect of competence acquisition and development. Here, the competence development of the relevant content on the topic of IDS takes place via numerous course units, which the area assigns to a basic course, an advanced course, additional lessons, and a best practice training. The course units are designed as micro-course units, making them particularly suitable for users in SME. In the beginning, a questionnaire-based initial survey determines the actual competencies of the user. The results represent the starting point for the targeted development of competence building. The selected role (see Fig. 2) recommends to the user a learning path with essential course units. For explorative users, so-called explorers, the possibility to visit role-independent course units is available. After completing the path-dependent course units, the learning area gives the user an additional questionnaire for self-assessment of his competence development. To measure objective
Fig. 3 Concept of the AKKORD platform for learning and collaboration
competence development, the user can complete a learning path-oriented test session at the end. Upon successful completion, the user receives an AKKORD certificate. In the community area, users can exchange information, for example on how to solve tasks in the learning area. This area helps to collaborate successfully. The news area provides current contributions from research and practice on the topic of IDS on an ongoing basis. In summary, the AKKORD platform for learning and collaboration gives users an individual opportunity to develop their competencies in IDS. By integrating the role concept, the platform reaches the competencies for the tasks within the process chain of IDS in a targeted manner. The different areas of the platform are closely interlinked. The broad spectrum of offerings thus specifically supports SME in building up the necessary data science competencies.
5 Conclusion Due to its modular structure, the process chain of IDS allows a systematic and task-oriented application of IDS. The presented role concept contributes further to the development of the process chain, by providing a task description for future users. The realized AKKORD-reference toolkit serves as an exemplary application of IDS that supports its usage, especially for SME. Through the help of its modular structure, it enables citizen data scientists, with fewer data science competencies, to carry out IDS projects. To empower SME for the implementation of modular IDS, the AKKORD platform for learning and collaboration provides a SME-oriented competence development for IDS. Acknowledgements The work on this paper has been supported by the German Federal Ministry of Education and Research (BMBF) as part of the funding program “Industry 4.0—Collaborations in Dynamic Value Networks (InKoWe)” in the project AKKORD (02P17D210).
References 1. Fayyad U, Piatetsky-Shapiro G, Smyth P (1996) From data mining to knowledge discovery in databases. AIMag. https://doi.org/10.1609/aimag.v17i3.1230 2. Chapman P, Clinton J, Kerber R, Khabaza T, Reinartz T, Shearer C, Wirth R (2000) CRISP-DM 1.0: step-by-step data mining guide 3. Bauer N, Stankiewicz L, Jastrow M, Horn D, Teubner J, Kersting K, Deuse J, Weihs C (2018). Industrial data science: developing a qualification concept for machine learning in industrial production. https://doi.org/10.5445/KSP/1000087327/27 4. West N, Gries J, Brockmeier C, Gobel JC, Deuse J (2021) Towards integrated data analysis quality: criteria for the application of industrial data science. In: 2021 IEEE 22nd international conference on information reuse and integration for data science (IRI). 2021 IEEE 22nd international conference on information reuse and integration for data science (IRI), Las Vegas, NV, USA, 10–12 Aug 2021. IEEE, pp 131–138. https://doi.org/10.1109/IRI51335.2021.00024
5. Meierhofer J, Etschmann R, Kugler P, Olbert-Bock S, Redzepi A, Thiel C, Tietz R, Dobler M, Schumacher J, Mueller R (2020) Data science für KMU leicht gemacht. Aktuelle Erkenntnisse und Lösungen. Data4KMU Projektbericht 6. Aho T, Sievi-Korte O, Kilamo T, Yaman S, Mikkonen T (2020) Demystifying data science projects: a look on the people and process of data science today. In: Morisio M, Torchiano M, Jedlitschka A (eds) Product-focused software process improvement. 21st international conference, PROFES 2020, Turin, Italy, 25–27 Nov 2020, Proceedings, Cham, 2020. Springer International Publishing; Imprint Springer, Cham, pp 153–167. https://doi.org/10.1007/978-3-03064148-1_10 7. Golan S, Bouhnik D (2019) MaDaScA: instruction of data science to managers. In: Proceedings of the 2019 InSITE Conference. InSITE 2019: Informing Science + IT Education Conferences: Jerusalem, 30 Jun 2019. Informing Science Institute, pp 125–140. https://doi.org/10.28945/ 4271 8. Crisan A, Fiore-Gartland B, Tory M (2021) Passing the data Baton: a retrospective analysis on data science work and workers. IEEE Trans Visual Comput Graphics. https://doi.org/10.1109/ TVCG.2020.3030340 9. Mazarov J, Schmitt J, Deuse J, Richter R, Kühnast-Benedikt R, Biedermann H (2020) Visualisierung in Industrial Data-Science-Projekten. Nutzen grafischer Darstellung von Informationen und Daten in Industrial-Data-Science-Projekten. Industrie 4.0 Management 36:63–66 10. Kühn A, Joppen R, Reinhart F, Röltgen D, von Enzberg S, Dumitrescu R (2018) Analytics canvas—a framework for the design and specification of data analytics projects. Procedia CIRP. https://doi.org/10.1016/j.procir.2018.02.031 11. Deuse J, Wöstmann R, Schulte L, Panusch T (2021) Transdisciplinary competence development for role models in data-driven value creation. The citizen data scientist in the centre of industrial data science teams. In: Sihn W, Schlund S (eds) Competence development and learning assistance systems for the data-driven future. Goto Verlag, pp 37–58. https://doi.org/ 10.30844/wgab_2021_3 12. Moore S (2017) Gartner says more than 40 percent of data science tasks will be automated by 2020. Analysts to explore trends in data science at Gartner data & analytics summits 2017 [Press release]. Sydney, Australia 13. Deuse J, West N, Syberg M (2022) Rediscovering scientific management. The evolution from industrial engineering to industrial data science. Int J Prod Manag Eng. https://doi.org/10.4995/ ijpme.2022.16617 14. Zamani ED, Griva A, Conboy K (2022) Using business analytics for SME business model transformation under pandemic time pressure. Inf Syst Front J Res Innov. https://doi.org/10. 1007/s10796-022-10255-8 15. Bianchini M, Michalkova V (2019) Data analytics in SMEs. Trends and policies. OECD SME and entrepreneurship papers. https://doi.org/10.1787/1de6c6a7-en
Convolutional Neural Network-Based Lung Cancer Nodule Detection Based on Computer Tomography Ahmed Hamid Ahmed, Hiba Basim Alwan, and Muhammet Çakmak
Abstract Because of its high sensitivity for pulmonary nodule detection, computed tomography (CT) is widely used to diagnose lung cancer without performing a biopsy, which could cause physical damage to nerves and vessels. Nevertheless, distinguishing malignant from benign pulmonary nodules remains difficult. Since CT scans are often of low resolution, it is challenging for radiologists to read the details of the scan images. The continued rapid growth of CT scan analysis systems in recent years has created a pressing need for advanced computational tools that extract useful features to support the radiologist during reading. Computer-aided detection (CAD) systems have been developed to reduce observational errors by identifying the suspicious features a radiologist looks for during case review. Our project aims to compare the performance of various low-memory, lightweight deep neural network (DNN) architectures for biomedical image analysis. It involves networks such as a vanilla 2D CNN, U-Net, 2D SqueezeNet, and 2D MobileNet for two-class classification to detect the presence of lung cancer in patient CT scans of lungs with and without primary-stage lung cancer. Keywords Pulmonary nodule · Computer tomography · AI model · CAD system
A. H. Ahmed (B) · H. B. Alwan, Department of Computer Science, University of Technology, Baghdad, Iraq, e-mail: [email protected]; H. B. Alwan, e-mail: [email protected]; M. Çakmak, Department of Electrical-Electronics Engineering, Karabuk University, Karabuk, Turkey, e-mail: [email protected]
1 Introduction Lung cancer is considered one of the most common types of cancer worldwide. Lung cancer is the leading cause of cancer deaths,
according to data published by the American Cancer Society [1]. Early detection and diagnosis are important ways of improving cancer survival rates. In lung cancer diagnosis, a patient suspected of having lung cancer is usually advised to undergo a chest X-ray first because of its low cost and ease of administration. Lung cancer develops from pulmonary nodules, which are masses of abnormal tissue inside the lung. When a pulmonary nodule is projected onto an X-ray image, the mass has regions of different density than healthy tissue. The radiologist can detect lung cancer by observing these changes in density. A chest X-ray, however, is a two-dimensional projection of the lung, and a pulmonary nodule is typically smaller than 10 mm in size. Accordingly, the radiologist's challenge is to confirm the type of nodule (i.e., benign or malignant) as well as the exact location of the nodule in order to plan further clinical treatment. If the radiologist sees any suspicious findings on the chest X-ray, he or she will recommend a computed tomography examination to confirm the diagnosis. Radiologists frequently use computed tomography (CT) scans because of their high sensitivity in detecting pulmonary nodules. CT has a higher detection rate than other chest radiography methods and thereby contributes to a reduction in lung cancer mortality [2]. To overcome the limitations of the chest X-ray, a CT scanner examines the patient's lung by taking multiple X-ray measurements from different angles and producing cross-sectional images of the lung. The radiologist examines the patient's lung in three-dimensional space by combining the scanned CT images. Unlike X-ray images, CT images have a quantitative measurement (referred to as Hounsfield Units) to help the radiologist distinguish abnormal from normal tissue. As shown in Fig. 1a, a single CT scan produces a series of cross-sectional images that run from the top to the bottom of the lung. A slice is a cross-sectional image obtained from a CT scanner. Figure 1b, c shows two different slices acquired from a 64-year-old female lung cancer patient at different axial positions. In practice, despite the fact that CT scans have a high sensitivity in detecting pulmonary nodules [4], it is difficult for a radiologist to decide whether a nodule is benign or malignant. When there are a large number of patient cases, it takes considerable time for the radiologist to produce a diagnosis. The original National Lung Screening Trial observed that screening with low-dose CT scans reduced lung cancer mortality by 20%. The rate of positive low-dose CT scans, however, was 24.2%, with a total of 96.4% of positive screenings turning out to be false positives. A false positive diagnosis will result in unnecessary follow-up clinical examinations, delaying the diagnosis, while the increased radiation exposure harms the patient. Since a CAD system improves radiologists' diagnostic accuracy, it is an excellent choice for preliminary diagnosis [3, 5, 6]. A typical CAD system for a lung CT scan is divided into two stages: (1) nodule detection and (2) lung disease classification. The first stage refers to regions of concern that are highlighted as suspect nodule candidates. To generate marks, general
Fig. 1 64-year-old female's lung CT scan [3]. a Direction of slice acquisition b a single normal slice c one abnormal slice at the upper part of the right lobe with a malignant nodule
approaches are applied through image processing techniques such as binary thresholding or morphological operations. CAD-generated marks, however, have high sensitivities and high false positive rates [3]. For nodule classification, features from the marked regions of concern are typically extracted, and machine learning schemes are applied [20]. Figure 2 depicts a complete CAD system to assist clinical radiologists in the diagnosis of pulmonary nodules. A computer-aided diagnosis tool for pulmonary nodule malignancy grading belongs to the second stage of the CAD system and focuses on reducing false positives by distinguishing nodule types between benign and malignant cases. Since pulmonary nodules come in a wide range of shapes and sizes, with high visual similarity between benign and malignant examples, analyzing low-level non-textural attributes such as nodule shape or size rarely yields a promising diagnostic performance. Early CAD systems for pulmonary nodule detection used traditional image processing techniques to describe low-level pulmonary nodule features in higher-dimensional spaces [6]. However, because of their low discriminative power, such features generalized poorly when new nodule patterns appeared [7].
Fig. 2 Typical CAD system that uses a CT scan to detect a malignant pulmonary nodule
Deep learning has seen great success in image classification, object detection, image segmentation, and natural language processing in recent years. Deep learning can achieve near-human performance in many of these fields [8]. The convolutional neural network, the most popular deep learning architecture for image classification, can extract high-level discriminative features. For clinical use in pulmonary nodule detection, a ConvNet uses patch-based raw images with no extra information such as nodule segmentation or nodule volume, and the ConvNet is trained automatically end-to-end in a supervised manner without the use of any additional feature extractor or classifier. Accordingly, ConvNets are expected to support the improvement of CAD system performance. Many studies have been conducted to investigate the use of deep learning in the medical field [9] in order to improve diagnostic accuracy.
2 Proposed System Lung cancer is the second most common cancer in both men and women, afflicting 225,500 people per year in the USA. Almost 1 in 4 cancer deaths are from lung cancer, more than colon, breast, and prostate cancers combined. Early detection of the cancer allows early treatment, which significantly increases the chances of survival. This project develops an algorithm that automatically detects candidate nodules and predicts the probability that the lung will be diagnosed with cancer within 1 year of the CT scan. The algorithm is summarized by the following framework (Fig. 3).
Fig. 3 Obtain a large lung CT dataset containing annotated nodules verified by trained radiologists, preprocess the dataset into inputs suitable for image segmentation (see Fig. 4), and train a U-Net CNN for image segmentation of the nodules
2.1 Data Description The Lung Image Database Consortium (LIDC-IDRI) image collection [4] is our primary dataset; it contains diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions. The annotations and associated candidates files are provided in .csv format. The dataset itself stores images as .mhd files (512 × 512). The characteristics of the data are detailed below. For every patient, the data consist of CT scan data and a label (0 for no cancer, 1 for cancer). The candidates file is a CSV file that contains one nodule candidate per line. Each line holds the scan name, the x, y, and z positions of each candidate in world coordinates, and the corresponding class. The annotation file is a CSV file with one finding on each line. Each line contains the scan's SeriesInstanceUID, the x, y, and z positions in world coordinates, and the corresponding diameter in mm. There are 1186 nodules in the annotation file.
2.2 Data Preprocessing We use scipy and the SimpleITK toolkit alongside the Python Imaging Library (PIL) to process the images for nodule detection and scaling (important for identifying tumors associated with lung cancer). We performed down sampling, segmentation, and normalization, in addition to zero centering. The images used contain the ground truth coordinates as well as the nodule radii for each scan.
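For illustration, the following is a minimal sketch of this loading and normalization stage using SimpleITK and NumPy. The HU window, the pixel mean, and the function names are our own assumptions for the sketch and are not fixed by this work.

```python
import numpy as np
import SimpleITK as sitk

# Hounsfield-unit window used for normalization (a common choice for lung CT; assumed here).
MIN_HU, MAX_HU = -1000.0, 400.0
PIXEL_MEAN = 0.25  # assumed dataset mean of the normalized intensities, used for zero-centering

def load_scan(mhd_path):
    """Read a .mhd/.raw CT volume and return the voxel array plus its geometry."""
    itk_img = sitk.ReadImage(mhd_path)
    volume = sitk.GetArrayFromImage(itk_img)     # array with shape (z, y, x)
    origin = np.array(itk_img.GetOrigin())       # world coordinate of voxel (0, 0, 0), in mm
    spacing = np.array(itk_img.GetSpacing())     # voxel size in mm
    return volume, origin, spacing

def world_to_voxel(world_xyz, origin, spacing):
    """Map a candidate's (x, y, z) world coordinate from the CSV to voxel indices."""
    return np.rint((np.asarray(world_xyz) - origin) / spacing).astype(int)

def normalize_and_zero_center(volume):
    """Clip to the HU window, scale to [0, 1], then zero-center."""
    vol = np.clip(volume.astype(np.float32), MIN_HU, MAX_HU)
    vol = (vol - MIN_HU) / (MAX_HU - MIN_HU)
    return vol - PIXEL_MEAN
```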
2.3 Down Sampling The dataset used for analysis contains only 1351 positive samples, which constitute only 0.2451% of the available dataset. This is indicative of a heavy imbalance in the dataset. Training on the unbalanced dataset yields suboptimal results, because the algorithm can simply predict "no nodule" for every example; the false negative rate is then far too high. When a nodule is present but the model predicts none, this is a false negative (a Type II error). Down sampling produces a balanced dataset by matching the number of samples in the minority class with a random sample from the majority class. We built a new dataset with the 1351 positive cases and a set of 5404 randomly sampled negative cases (4 times the number of positive samples), thus giving a dataset that comprises 20% positives and 80% negatives. The dataset is still unbalanced, although the imbalance is not as substantial as before. This imbalance is further addressed in the data augmentation section.
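A minimal down-sampling sketch with pandas is given below; the column names of the candidates file are assumed to follow the layout described in the data description and may differ in practice.

```python
import pandas as pd

def downsample_candidates(candidates_csv, neg_per_pos=4, seed=42):
    """Keep all positive candidates and a 4x random sample of negatives (~20/80 split)."""
    df = pd.read_csv(candidates_csv)                 # assumed columns include a 0/1 'class' label
    positives = df[df["class"] == 1]
    negatives = df[df["class"] == 0].sample(n=neg_per_pos * len(positives), random_state=seed)
    # Concatenate and shuffle so positives and negatives are interleaved.
    return pd.concat([positives, negatives]).sample(frac=1.0, random_state=seed)
```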
2.4 Thresholding and Segmentation First, each scan was rescaled so that each voxel represents a 1 × 1 × 1 mm volume. To handle and detect nodules and masses hidden near the lung tissue edge, segmentation needs to be performed on the tomography images provided in the dataset. This must be done with caution, because any suboptimal process could remove them from the images, leaving no possibility for a nodule detector to find them. The preprocessing involves masking the regions which are not lungs: this thresholding process masks all regions of the scan that are not lung tissue, using Hounsfield Units (HU), which measure radiodensity. Typical values are about −500, −1000, 700, and 0 HU for lung tissue, air, bone, and other tissues and blood, respectively. Pixel intensities in CT scans can be expressed in HU and have semantic meaning. The minimum and maximum Hounsfield values of interest were used to clip all intensities; thresholding therefore masks pixels that are close to −1000 HU or higher than 320 HU, leaving just lung tissue. 2D slices with a size of 512 × 512 are generated only for images with nodules (using the nodule coordinates given in the candidates .csv spreadsheet) after thresholding each CT scan. The proposed system could input these images directly into the neural network; however, such a large image size is not suitable, so it was reduced. Figures 4 and 5 show how images have been reduced in size while segmenting out the region around the nodule. This area-specific segmentation is carried out with the help of the data annotated in the candidates .csv file, which contains the coordinates of every nodule in each scan. The final image size considered in the model is 100 × 100 pixels.
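A sketch of the two operations described above (HU masking and cropping a 100 × 100 patch around a nodule) is shown below; the exact HU cut-offs and function names are assumptions for illustration.

```python
import numpy as np

def lung_mask(slice_hu, low=-1000, high=320):
    """Keep voxels in the lung-tissue HU range; push air (~-1000 HU) and bone (>320 HU) to background."""
    masked = slice_hu.copy()
    masked[(slice_hu <= low) | (slice_hu >= high)] = low
    return masked

def crop_around_nodule(slice_2d, center_xy, size=100):
    """Cut a size x size patch centred on the nodule's voxel coordinates, clipped to the image."""
    half = size // 2
    x, y = center_xy
    x = int(np.clip(x, half, slice_2d.shape[1] - half))
    y = int(np.clip(y, half, slice_2d.shape[0] - half))
    return slice_2d[y - half:y + half, x - half:x + half]
```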
Fig. 4 Segments of positive and negative samples
Fig. 5 Subsampling of CT scan images based on HU for a negative sample of patient #600
2.5 Data Augmentation The previous section dealt with the class imbalance using down sampling. To further resolve the issue, we chose another popular technique, data augmentation, to create new images by performing slight transformations on the images obtained after the earlier processing steps. Slight variations such as translations, flips, or rotations alter an image in such a way that the neural network interprets it as a distinct image. Figure 6 shows augmented images added to the dataset.
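A minimal augmentation sketch using the Keras ImageDataGenerator is shown below; the specific transformation ranges and array names are assumptions, chosen to keep the nodule inside the 100 × 100 patch.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Mild transformations only: small rotations, small translations, and flips.
augmenter = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    vertical_flip=True,
)

# X_train: (n, 100, 100, 1) patches, y_train: 0/1 labels (names assumed).
# flow() yields freshly transformed batches every epoch:
# train_generator = augmenter.flow(X_train, y_train, batch_size=32)
```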
2.6 Data Splitting Once the dataset imbalance is minimized, we split the data into training and test sets based on an 80–20 split. Further, the training data is split using an 80–20 split to yield training and validation sets (for cross-validation and hyper-parameter tuning of the chosen model). From the pickle files (used to store labels and addresses of image files) for the train, validation, and test data, images are extracted, converted into .jpg format, and stored in HDF5 files for further use in Keras/TensorFlow processing.
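A sketch of the nested 80–20 splits with storage in an HDF5 file is given below; stratification, the output file name, and the use of scikit-learn and h5py are our assumptions.

```python
import h5py
from sklearn.model_selection import train_test_split

def split_and_store(X, y, out_path="patches.h5", seed=42):
    """80-20 train/test split, then 80-20 train/validation split, stored in one HDF5 file."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    X_train, X_val, y_train, y_val = train_test_split(
        X_train, y_train, test_size=0.2, stratify=y_train, random_state=seed)
    with h5py.File(out_path, "w") as f:
        for name, arr in [("X_train", X_train), ("y_train", y_train),
                          ("X_val", X_val), ("y_val", y_val),
                          ("X_test", X_test), ("y_test", y_test)]:
            f.create_dataset(name, data=arr, compression="gzip")
```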
2.7 Model Selection In medical imaging, the most widely used architectures include U-Net + ResNet and U-Net + AlexNet, which are heavy on the system's memory. As opposed to these conventional preferences, our intention is to experiment with lightweight networks
Fig. 6 Data augmentation of segment image of patient #3285
for medical image analysis. We do not use any of the available pre-trained networks, as our data differs vastly from the ImageNet data used to train those models. Baseline CNN Model As described in Fig. 7, we use a baseline vanilla CNN with 3 convolution layers. Each CNN layer has stride 1; since the nodule is defined by the transition from a white to a dark patch, we believed this stride would be useful to identify the transition. A weighted soft-max cross-entropy loss is used, as a label of 0 is far more common than a label of 1. The class weights are hyperparameters subject to experimentation; experimenting with the class weights on the vanilla CNN model, weights = [0.4, 0.6] provided the best results. The architecture of the baseline convolutional neural network is detailed above. Since this network has 2 fully connected layers, the model takes longer to train on a GPU. The optimizer used was Adam with learning rate 0.001. The best accuracy achieved with this model on the validation set was 89.932.
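A minimal Keras sketch of a vanilla CNN of this kind follows. The filter counts and dense width are illustrative rather than taken from the paper, and the weighted soft-max loss is approximated with a sigmoid output plus the Keras class_weight argument (weights {0: 0.4, 1: 0.6}).

```python
from tensorflow.keras import layers, models, optimizers

def build_baseline_cnn(input_shape=(100, 100, 1)):
    """Vanilla CNN: 3 convolution layers (stride 1) followed by 2 fully connected layers."""
    model = models.Sequential([
        layers.Conv2D(32, 3, strides=1, activation="relu", input_shape=input_shape),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, strides=1, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, strides=1, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Class weighting approximates the weighted loss described above:
# model.fit(train_generator, validation_data=(X_val, y_val),
#           epochs=20, class_weight={0: 0.4, 1: 0.6})
```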
Fig. 7 Block diagram depicting architecture of the vanilla 2D CNN
SqueezeNet It has long been possible to train heavy networks such as AlexNet and ResNet for nodule detection with comparable accuracy; this served as a motivation for using low-memory networks such as SqueezeNet. For our classification we have chosen to ignore the max-pooling layer, because our images are relatively small. SqueezeNet is similar to AlexNet but involves some changes to the latter's architecture, including: (1) 3 × 3 filters are replaced with 1 × 1 filters. (2) The number of input channels to the 3 × 3 filters is reduced; to decrease the number of parameters, we have to decrease the inputs to the 3 × 3 filters. (3) Fire modules are designed, consisting of a squeeze layer (1 × 1 filters) as well as an expansion layer (3 × 3 filters). The macroarchitecture of the CNN employed within the SqueezeNet framework is shown in Fig. 8. MobileNet Depthwise separable convolution is used to decrease the size of the model as well as the computational complexity. It is particularly advantageous for mobile and embedded applications. To put it another way, depth-wise convolution is a channel-wise spatial convolution, whereas point-wise convolution is a 1 × 1 convolution. It should be noted that batch normalization (BN) and ReLU are performed after every convolution. The depth-wise separable convolution divides the filtering and combining into separate layers, which significantly reduces computation and model size [10].
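For illustration, a sketch of the two building blocks just described, a SqueezeNet fire module and a MobileNet depth-wise separable unit, is given below in Keras; the filter counts and function names are assumptions, not the exact configuration used in this work.

```python
from tensorflow.keras import layers

def fire_module(x, squeeze_filters, expand_filters):
    """SqueezeNet fire module: a 1x1 squeeze layer feeding 1x1 and 3x3 expand layers."""
    s = layers.Conv2D(squeeze_filters, 1, activation="relu")(x)
    e1 = layers.Conv2D(expand_filters, 1, activation="relu")(s)
    e3 = layers.Conv2D(expand_filters, 3, padding="same", activation="relu")(s)
    return layers.Concatenate()([e1, e3])

def depthwise_separable_block(x, pointwise_filters, stride=1):
    """MobileNet unit: depth-wise 3x3 convolution, then point-wise 1x1 convolution, each with BN + ReLU."""
    x = layers.DepthwiseConv2D(3, strides=stride, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(pointwise_filters, 1)(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)
```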
3 Results and Discussion In clinical testing, sensitivity is the degree to which true positives are not overlooked (so false negatives are few) and specificity is the degree to which true negatives are classified as such (so false positives are few) (Table 1). Apart from the plots in Figs. 6, 7, and 8, we also tested each model on our test data, which the models had not seen before. The results were: 0.8478 (vanilla CNN) > 0.8324 (MobileNet) > 0.8296 (SqueezeNet), as shown in Figs. 9 and 10. In medical applications it is also necessary to be concerned not only with accuracy as a metric but also with specificity (Sp) and sensitivity (Se). These two metrics are significant since people with malignant cancer should never be misclassified as "safe" patients. Therefore, the best model is contingent upon Sp and Se. Observing the area
Fig. 8 Functional unit for a a fire module architecture of SqueezeNet, b the fundamental functional unit of MobileNet

Table 1 Evaluation metrics
Model | Accuracy | Specificity | Sensitivity | AUC score
Baseline CNN | 89.921 | 0.877 | 0.838 | 0.9209
SqueezeNet | 85.212 | 0.883 | 0.874 | 0.9019
MobileNet | 86.315 | 0.765 | 0.916 | 0.8919
under the ROC curve of a test can be used as a criterion to gauge the test's discriminating ability, i.e., how good the test is in the given situation, as shown in Fig. 11.
Fig. 9 Training plots over 20 epochs
Fig. 10 Validation plots over 20 epochs
Fig. 11 Receiver operating characteristics (ROC) curve
4 Conclusion Although the vanilla CNN achieves higher accuracy on the test data, it is not an ideal choice, as it takes longer to train and its Sp-Se trade-off is comparable to SqueezeNet. Furthermore, the higher accuracy of MobileNet does not make it a favorable choice due to its poor Sp-Se trade-off. Hence, we decide that for the given data SqueezeNet is the best model, given its faster computation time and good Sp-Se trade-off along with satisfactory accuracy. We believe these results are significant, as SqueezeNet and MobileNet are not widely used networks for the analysis of biomedical images. We aim to extend the project further to adapt these deep learning architectures to predict using 3D volumetric slices. Besides, the bright pixels are for the most part larger than the area of the malignant nodules, so it may well be feasible to extend the current model to select the exact zone of the malignant nodules.
References 1. American Cancer Society (2017) Cancer facts & figures 2017. Available online: https://www. cancer.org/research/cancer-facts-statistics/all-cancer-facts-figures/cancerfacts-figures-2017. html 2. Buda M, Maki A, Mazurowski MA (2017) A systematic study of the class imbalance problem in convolutional neural networks. CoRR, abs/1710.05381 3. Castellino RA (2005) Computer aided detection (cad): an overview. Cancer Imag 5(1):1719
4. Bar Y, Diamant I, Wolf L, Lieberman S, Konen E, Greenspan H (2015) Chest pathology detection using deep learning with non-medical training. In: 2015 IEEE 12th international symposium on biomedical imaging (ISBI), pp 294–297 5. Armato SG, Lissak Giger M, Moran CJ, MacMahon H, Doi K (1998) Automated detection of pulmonary nodules in helical computed tomography images of the thorax. Proc.SPIE 3338:4 6. Haixiang G, Yijing L, Jennifer Shang G, Mingyun HY, Bing G (2017) Learning from classimbalanced data: review of methods and applications. Expert Syst Appl 73:220–239 7. Bishop CM (2006) Pattern recognition and machine learning (information science and statistics). Springer-Verlag, New York Inc, Secaucus, NJ, USA 8. Armato SG, Drukker K, Li F, Hadjiiski L, Tourassi GD, Kirby JS, Clarke LP, Engelmann RM, Giger ML, Redmond G, Farahani K (2016) Lungx challenge for computerized lung nodule classification. J Med Imag 3:3–9 9. Ciompi F, de Hoop B, van Riel SJ, Chung K, Th. Scholten E, Oudkerk M, de Jong PA, Prokop M, van Ginneken B (2015) Automatic classification of pulmonary peri-fissural nodules in computed tomography using an ensemble of 2d views and a convolutional neural network out-of-the-box. Med Image Anal 26(1):195–202 10. Bottou L (1998) Online learning in neural networks. Chapter on-line learning and stochastic approximations. Cambridge University Press, New York, pp 9–42
D-Test: Decentralized Application for Preventing Fake COVID-19 Test Certificate Scam by Blockchain Himani Mishra and Amita Jain
Abstract More than 6 million people have lost their lives due to COVID-19 across the world (Ghatkopar in Fake negative COVID-19 certificate scam unearthed, 2019, [2]; WHO (World Health Organization) in https://covid19.who.int/table, [3]). Recently, fake COVID-19 test certificate scams have spiked drastically and become one of the reasons for the spread of COVID-19. In light of the current scenario, this paper proposes a decentralized approach called "D-Test" for COVID-19 testing, which allows hospitals and the general public to register themselves on a common platform that follows the concept of the CIA triad (Confidentiality, Integrity, and Availability) and allows users to register without any fear of a data breach. This platform registers users based on a smart contract and enables the user to do the following once registered successfully: (a) book a testing slot, (b) find nearby registered testing laboratories, and (c) generate the COVID-19 reports, which can be imported and exported as and when required by the user. This carries a higher value of trust because the source of the report can be traced back, since the usage of blockchain prevents the likelihood of data tampering by any entity. This framework could help governments keep track of the distribution of authentic COVID-19 testing certificates, prevent fake COVID-19 testing certificate scams, and speed up the process of verifying users' test reports, thereby saving the lives of many citizens around the world. Keywords Fake COVID-19 certificate · Blockchain · Smart contract · Meta-mask · COVID-19
H. Mishra · A. Jain (B) Netaji Subhas University of Technology, Delhi 110031, India e-mail: [email protected] H. Mishra e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_9
1 Introduction In 2020, the world saw the worst pandemic, called COVID-19. During the pandemic, in order to check whether a person was infected or not, COVID-19 tests were performed by various hospitals and laboratories. During the second COVID-19 wave, the demand for testing increased tremendously, and even booking a slot was a huge task. By the end of the year 2021, self-testing and self-proclaimed COVID-19 test kits and facilities were introduced by various health organizations. It is important to note that from the year 2020 till now, about 474,659,674 total COVID-19 cases [1] have been registered globally and around 6,103,355 deaths have been registered as of now, which is certainly tragic and alarming [2, 3]. This is one of the major causes of such a large-scale demand for testing. The COVID-19 cases recorded in various countries are as follows: India recorded 43,014,687 COVID cases till date, the USA 79,091,857, Europe alone recorded 196,493,754, and the United Kingdom 20,516,002 till date [2, 3]. Today, COVID-19 test certificates are used in various places as a means to be allowed admission for travel, offices, and public places like hotels, and also to verify whether a user should be sent to quarantine or hospital depending on the status of the COVID-19 test post-travel, for example. It was observed that many passengers, to expedite their admission, knowingly or unknowingly took advantage of fake COVID-19 certificates that were not generated by any authentic laboratories or testing kits. The given paper proposes an approach toward a decentralized platform for COVID-19 fake news detection with the help of blockchain and machine learning. This platform will help the reviewers to detect as well as report fake reports, as logs for each report would be maintained in the form of a private blockchain. These would be maintained and updated from time to time by the reviewers as well as the administration.
2 Brief Literature Survey During the research, various scientific papers were found focusing on scams and fake COVID-19 certificates; even though solutions existed, they were not discussed as the main point but only separately [4]. The work in [5] allowed the hospital facility to view the patient's data and details, but verifying the user was quite complex. This paper focuses on using the unique identification number which is assigned to each citizen of India, that is, the Aadhar card. Adding this feature during the registration phase will help in tracking the users easily, and in future many other countries could link their unique identification number (UID) with the government portal to allow fast-track verification of COVID-19 test certificates with ease, especially in the case of traveling from country to country or
state to state. The portal will maintain the logs of each COVID-19 test done, thus making the process smooth, scalable, and most importantly safe. In the proposed methodology, this paper focuses fully on creating a decentralized platform to prevent scams involving COVID-19 testing certificates.
3 Methodology In the given section, the proposed methodology of the D-Test application to prevent fake certificates or reports is discussed in detail. The system design is shown below (Fig. 1). The steps followed for the D-Test decentralized application are detailed below: (a) Registration of the hospitals and the users on the decentralized platform. The hospitals register their respective testing laboratories in the decentralized portal, where a smart contract is signed to ensure the integrity and authenticity of the testing laboratories. These details are verified by the administration of the decentralized platform and, when approved, are automatically added to the platform. The same process takes place for the users as well, except that instead of hospital details the users' personal details are supplied, which are accessed only by the user and by the administration for verification purposes. (b) Two-step authentication using an OTP (one-time password) and face detection. The two-step authentication verifies the user's authenticity in real time. The users first verify themselves through their mobile number by generating an OTP, which they must provide in order to go to the next step.
Fig. 1 Framework for reporting COVID-19 on the blockchain (organization → smart contract → two-step login → authorized user → review)
Fig. 2 Process for the two-step login using MetaMask: (1) the user sends their public address, (2) the server generates a random nonce, (3) the nonce is held in process, (4) the client fetches the current nonce, (5) signs the nonce, (6) returns the signature, and (7) the server verifies the signature
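For illustration, a minimal sketch of the nonce-based signature login summarized in Fig. 2 is shown below, assuming an Ethereum-style wallet (as with MetaMask) and the eth_account Python package; the in-memory nonce store and all function names are our own assumptions, not part of the D-Test implementation.

```python
import secrets
from eth_account import Account
from eth_account.messages import encode_defunct

NONCES = {}  # server-side store: public address -> currently issued nonce (assumed in-memory)

def issue_nonce(address):
    """Steps 2-3: generate and record a fresh random nonce for the user's public address."""
    nonce = secrets.token_hex(16)
    NONCES[address] = nonce
    return nonce

def verify_login(address, signature):
    """Step 7: recover the signer of the nonce and compare it with the claimed address."""
    message = encode_defunct(text=NONCES[address])
    recovered = Account.recover_message(message, signature=signature)
    return recovered.lower() == address.lower()

# Client side (normally performed by the wallet, e.g. MetaMask), steps 4-6:
# signed = Account.sign_message(encode_defunct(text=nonce), private_key=user_key)
# verify_login(user_address, signed.signature)
```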
As soon as they log in, the platform asks for face detection, in which a snapshot of the user's face is taken for future record (Fig. 2). (c) Register for COVID-19 testing. The user registers with the listed COVID-19 hospitals that have registered earlier in the portal, according to their nearest laboratories or testing centers. During this registration, user details such as name, place, date of birth, Aadhar card number, and vaccination details are taken directly from the portal, as provided during the initial registration. (d) Report verification on the platform. Once the user's identity has been verified as unique and authentic, the hospital directly reports the status of the test in the given platform. The proposed methodology ensures not only integrity and authenticity but also security, through the real-time check built into the secured platform (Fig. 3).
Algorithm 1: User’s Request for COVID19 testing 1.Input: function (user_data, upublic_key, users_u_key): 2. if block authenticate then 3. Accept_data(user_data,upublic_key) 4. else if 5. Reject the request for covid19 report 6. else 7. Output : End Fig. 3 The algorithm used for requesting COVID-19 test report
4 Results and Discussion The given section discusses a comparative analysis of the currently available solutions against the proposed solution, that is, the D-Test application framework. D-Test is proposed as a decentralized platform designed to combat the generation of fake COVID-19 certificates with the help of blockchain. Figure 4 shows the comparative analysis, which depicts that D-Test achieves better results compared to off-the-shelf centralized and decentralized systems in healthcare. Based on the evaluation, it can be clearly stated that even though centralized systems have their own advantages, they are still not secure and are easily prone to cyber-attacks [4–6]. A blockchain-based solution not only allows a platform to connect around the world at once, it is also secure because of strong cryptographic security and the mutual consent of all trusted members in the network.
Fig. 4 Comparative analysis of a centralized system, a decentralized system, and the proposed D-Test solution against the KPIs that enable the prevention of fake reports
Since D-Test is blockchain based, the level of security, along with the two-step validation, acts as multiple layers of verification, which makes it more secure. Combined with the Aadhar card (unique identification number) that is unique for all citizens, it enforces more traceability. Also, the hospitals registered in the D-Test portal go through the same steps of verification and background check, which makes the system more secure and reliable than other systems. The D-Test solution can be integrated into any blockchain network, as well as incorporated into government-based systems accurately and easily. A. D-Test Application Dashboard The application gives users three options, as stated in Fig. 4: Dashboard of a user in the D-Test application. Once logged in, the user can do the following: (a) find nearby testing laboratories, (b) book a slot, and (c) check the reports of the COVID-19 test result. The hospitals also register on the same platform, but only after submitting the required documents during the smart contract and after a proper background check. B. Finding the Registered Laboratories and Slot Booking through the D-Test Application This feature allows the user to track the nearby testing laboratories. The location of the device should be turned on in order to give a smooth user experience while tracking the testing laboratories (Fig. 5; Table 1). These testing laboratories are registered with the D-Test portal; thus, only verified and authentic laboratories certified by the government are allowed, thereby letting the user book a slot without any fear of a scam. The laboratories directly update the report of the COVID-19 test in the portal. The user can then import or export the report as per convenience or requirement; the logs along with the report are maintained in the user dashboard in the given D-Test portal. The user can pay for the testing using a crypto-based wallet transaction as well as traditional payment methods such as cash or net banking (at the testing laboratory) in order to make booking fast and easy.
5 Conclusion and Future Work This paper proposes a decentralized application framework for preventing fake reports using blockchain technology. In the given proposed application, D-Test, the users, the hospitals, and the testing laboratories register using a unique identification number along with the necessary details in order to verify them in the portal. The users and their data are safe due to strong cryptographic checks. In addition, the user can import or export the reports as and when required. In
Fig. 5 Feature that allows the user to find nearby testing laboratories and book a slot

Table 1 Detailed comparison of the causes of scams, current solutions for preventing scams, shortcomings of current solutions, and the proposed solution through the D-Test application for preventing the fake COVID-19 certificate scam
S.no. | Domains | Causes of scams | Current solutions for preventing scams | Shortcoming of current solutions | Proposed solution: D-Test
1 | Check | Certificates are not checked | Manual check | Automated check | Automate
2 | Third party involvement | High | High | High | Low as compared to current solution
3 | Security | Very low | Very low | Very low | High
4 | Single point failure | Possible | Possible | Possible | Not possible
5 | Cyber attacks | High | High | High | Low
6 | Data breaches | Possible | Possible | Possible | Rare
7 | Control | Centralized authority | Centralized authority | Centralized authority | Decentralized authority, the user has full control
future, it can be expanded further by adding more features such as certificate tracking and home testing laboratories, which could further enhance the usability.
References 1. WHO (World Health Organization). https://covid19.who.int/table 2. COVID-19 cases https://covid19.who.int/table (2022) 3. Ghatkopar (2019) Fake negative COVID-19 certificate scam unearthed. https://www.mumbai live.com/en/crime/fake-negative-covid19-certificate-scam-unheartened-71665 4. Shang Q, Price A (2019) A Blockchain-based land titling project in the Republic of Georgia: Rebuilding Public trust and lessons for future pilotprojects. Innovations: Technology, Governance, Globalization 12:3–4 5. Hasavari S, Song YT (2019) A secure and scalable data source for emergency medical care using Blockchain technology. In: 2019 IEEE 17th international conference on software engineering research, management and applications (SERA), pp 71–75. https://doi.org/10.1109/SERA.2019. 8886792.(2019). https://ieeexplore.ieee.org/document/8886792 6. Shirin H, Yeong TS (2019) A secure and scalable data source for emergency medical care using Blockchain technology. https://doi.org/10.1109/SERA.2019.8886792
A Novel QIA Protocol Based on Bell States Position by Random Selection B. Devendar Rao and Ramkumar Jayaraman
Abstract Identifying a trusted user plays an important role before initiating secure communication; therefore, a quantum identity authentication scheme based on Bell pairs is proposed. Two trusted parties initially share a common secret key (known as the pre-shared key) which is known only to them. Various existing protocols use quantum resources with memory for the authentication process, but the storage time of a qubit is about 3 ns. In the proposed protocol, the authentication process is carried out without the trusted parties storing the Bell states. The sending party selects 4 classical bits to form Bell pairs in one of the position sets {(1, 2), (3, 4)} or {(1, 3), (2, 4)} based on consecutive bits in the pre-shared key. The receiving party uses the consecutive bits in the pre-shared key to decode the information about the authentication key using Bell state measurement. The adversary has no knowledge of the pre-shared key, and guessing an incorrect position causes entanglement swapping of the Bell states. The adversary's incorrect position choice leads to its identification by the trusted parties when verifying the authentication key over the classical channel. The security of the proposed protocol is analyzed under the intercept-measure-and-resend (IR) attack. In addition, the proposed protocol can prevent the adversary from fetching information about the pre-shared key. Various existing protocols are compared with the proposed one in terms of quantum resources and memory requirements. The proposed protocol is implemented in the IBM Quantum Lab, and the circuit simulation is shown visually in the IBM Quantum Composer. Keywords EPR · Pre-shared key · Authentication · Entanglement swapping
B. Devendar Rao · R. Jayaraman (B) Department of Computing Technologies, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai, India e-mail: [email protected] B. Devendar Rao e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_10
1 Introduction Modern-day security relies not only on the secure transmission of information among users but also on identifying whether a user is trusted or not [1]. The current generation depends on online services for various transactions, from banking to grocery shopping, and a large quantity of communication tasks is involved; this requires the design of more reliable authentication systems. Authentication is done to identify the legitimate user or component/device before initiating communication. For example, users are authenticated during online voting, and devices are authenticated during IoT communication. Classical authentication systems, built on mathematical algorithms and computational hardness, are under threat due to quantum computing. Some QIA protocols are designed without using Bell states, where a single photon is used to communicate between trusted parties [2, 3]. An alternative design of QIA protocols uses Bell states, where two entangled photons are communicated. Recently, several authentication schemes [4, 5] have been proposed using EPR states. In 1995, the first QIA protocol was proposed by Crepeau et al., based on quantum teleportation [6]. Quantum key distribution (QKD) and classical procedures were merged to obtain a new authentication scheme proposed by Dusek et al. [7]. Quantum secure direct communication (QSDC) schemes have been used to provide identity authentication based on correlations among GHZ states [8]. Hong et al. [9] proposed a quantum identity protocol based on single qubits which performs authentication unidirectionally. Third parties are extensively used to provide simultaneous authentication for two users in a bidirectional way; however, the usage of a third party in an authentication protocol always leads to some information gain about the authentication key. Most QIA protocols perform well in theoretical research by having one of the communicating parties store a single qubit, but they fail in actual implementation: the Department of Modern Physics and the National Laboratory for Physical Sciences at the Microscale have achieved a maximum qubit storage time of about 3 ns. In the proposed model, we show how Bell states can be used to perform QIA without storing the qubits. Many QIA protocols, designed with or without EPR pairs, can be constructed via communication [10] or computation tasks [11]. The paper is organized as follows: Sect. 2 discusses the basics of Bell states and entanglement swapping. The working model of the proposed protocol and how the protocol works when an adversary is not involved in the communication are given in Sect. 3. The security model when an adversary is involved in the communication, and its implementation in the IBM Qiskit tool, are given in Sect. 4. A comparison among various QIA protocol methodologies and their quantum resource utilization is discussed in Sect. 5.
Table 1 Applying the Pauli operator depending on the pre-shared key
ki | ki+1 | Pauli operator | Applied on qubit
0 | 0 | I | 1
0 | 1 | X | 2
1 | 0 | Z | 3
1 | 1 | Y | 4
2 Proposed Protocol 2.1 Initial Procedure Step 1: Before initiating the authentication protocol, Alice and Bob must share a unique key (known as the pre-shared key) obtained after a QKD process: K = {k1, k2, k3, ..., kn}. Step 2: Alice and Bob agree that each of the four Bell states {00 → |Φ+⟩, 01 → |Ψ+⟩, 10 → |Φ−⟩, 11 → |Ψ−⟩} carries two bits of classical information to encode the authentication key. Step 3: Alice and Bob agree on the position information based on the pre-shared key: ki ⊕ ki+1 = 0 for positions {(1, 2) and (3, 4)}, or ki ⊕ ki+1 = 1 for positions {(1, 3) and (2, 4)}. Step 4: Both trusted parties share Table 1, where the choice of Pauli operator depends on the values of the pre-shared key bits ki and ki+1, as does the qubit on which the Pauli operator is applied, as shown in Table 1.
2.2 Quantum and Classical Procedure Step 5: Alice wants to communicate with Bob; Alice creates a random authentication key of length 4m, where m = n % 10: AK = {Ak1, Ak2, Ak3, ..., Akm}. Step 6: Alice selects a group of 4 bits in sequential order and generates the position choice based on the pre-shared key values xi = ki ⊕ ki+1. If xi = 0, she generates the Bell pairs in positions {(1, 2) and (3, 4)}, or else in {(1, 3) and (2, 4)}.
Step 7: Alice applies the Pauli operator {I, X, Z, Y} on the Bell state qubit based on the values of ki and ki+1, as given in Table 1. Step 8: Alice sends the two generated EPR pairs to Bob. Step 9: Bob receives the Bell pairs and applies the Pauli operator {I, X, Z, Y} based on ki and ki+1, as given in Table 1. Step 10: Bob performs Bell state measurements on the Bell states based on the position pairs given by the pre-shared key xi = ki ⊕ ki+1 and stores Alice's authentication key A′K. Step 11: Repeat Steps 5 to 10 until all 4m authentication key bits are sent from Alice to Bob. Step 12: Bob compares the authentication keys AK and A′K over the classical channel. If the difference between the authentication keys is more than the Qubit Bit Error Rate (QBER) [12], abort the protocol; otherwise continue to Step 13. Step 13: Bob prepares his authentication key BK and performs the same procedure as discussed in Steps 6 to 8. Alice measures using BSM as specified in Steps 9 to 10 and stores Bob's authentication key B′K. Step 14: Bob compares the authentication keys BK and B′K over the classical channel. If the difference between the authentication keys is more than the QBER [12], abort the protocol; otherwise continue with secure message communication.
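For illustration, a minimal Qiskit sketch of one round of the quantum procedure is given below. The 0-based qubit indexing, the mapping of authentication bits onto the Bell states, and the readout convention are our own assumptions; the sketch only shows the structure of Steps 6 to 10.

```python
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator

# Table 1: two pre-shared-key bits -> Pauli gate and the qubit it acts on (0-based index assumed).
PAULI_GATE = {"00": "id", "01": "x", "10": "z", "11": "y"}
PAULI_QUBIT = {"00": 0, "01": 1, "10": 2, "11": 3}

def apply_pauli(circ, key_bits):
    getattr(circ, PAULI_GATE[key_bits])(PAULI_QUBIT[key_bits])

def protocol_round(auth_bits, key_bits):
    """Alice encodes 4 authentication-key bits into two Bell pairs whose qubit positions
    depend on k_i XOR k_{i+1}; Bob applies the same Pauli (P*P = I up to a global phase)
    and performs Bell-state measurements at the agreed positions."""
    pairs = [(0, 1), (2, 3)] if int(key_bits[0]) ^ int(key_bits[1]) == 0 else [(0, 2), (1, 3)]
    circ = QuantumCircuit(4, 4)
    for (a, b), bits in zip(pairs, (auth_bits[:2], auth_bits[2:])):
        circ.h(a)
        circ.cx(a, b)              # |Phi+> on the chosen positions
        if bits[1] == "1":
            circ.x(b)              # second auth bit -> bit flip
        if bits[0] == "1":
            circ.z(b)              # first auth bit  -> phase flip
    apply_pauli(circ, key_bits)    # Alice's pre-shared-key Pauli (Table 1)
    circ.barrier()                 # --- qubits travel from Alice to Bob ---
    apply_pauli(circ, key_bits)    # Bob undoes it with the same Pauli
    for a, b in pairs:             # Bell-state measurement: disentangle, then measure
        circ.cx(a, b)
        circ.h(a)
    circ.measure(range(4), range(4))
    return circ

sim = AerSimulator()
counts = sim.run(transpile(protocol_round("1011", "00"), sim), shots=1).result().get_counts()
```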
3 Security of the Proposed Protocol In trusted communication, the adversary Eve impersonates the trusted users to obtain the authentication key information. The adversary uses the intercept-measure-and-resend (IR) attack from the family of individual attacks. Eve intercepts the incoming Bell states from Alice, performs a Bell state measurement (BSM) to decode the key, encodes the obtained key into new Bell states, and sends them to Bob. Eve tries to maximize her information about the trusted parties' authentication key without being detected. The proposed protocol was implemented in the IBM Qiskit tool using Quantum Lab and Quantum Composer [13].
3.1 Adversary Guesses the Correct Bell Position Eve has no information about the pre-shared key K; therefore, Eve has to randomly select the Bell positions, either {(1, 2) and (3, 4)} or {(1, 3) and (2, 4)}. If Eve is lucky enough to guess the right positions, she can trespass the security measures created by the proposed protocol. Let us see an example: Alice generates the Bell pair |Φ−⟩12 |Ψ−⟩34 based on the pre-shared key ki ⊕ ki+1 = 0 ⊕ 0 = 0, and applies the
Fig. 1 Quantum circuit when Eve guesses the key ‘00’
Fig. 2 Quantum circuit when Eve guesses the key ‘11’
identity gate for the key value '00', and sends it to Bob. Eve has two choices here to select the correct positions: either eki ⊕ eki+1 = 0 ⊕ 0 = 0 or eki ⊕ eki+1 = 1 ⊕ 1 = 0. If Eve is lucky enough to select eki = eki+1 = 0, then Eve is able to apply the correct Pauli operator 'I' and decode the authentication bits by applying BSM at positions {(1, 2) and (3, 4)}, as shown in Fig. 1. In each such iteration, Eve has a 100% chance of generating an authentication key EAK equal to AK and A′K. If Eve instead has the 1/2 chance of selecting eki = eki+1 = 1, then Eve applies the incorrect Pauli operator 'Y' on the 4th qubit and decodes the authentication bits by applying BSM at positions {(1, 2) and (3, 4)}, as shown in Fig. 2. Bob performs the Pauli operator 'I', applies BSM based on ki ⊕ ki+1 = 0 ⊕ 0 = 0, and decodes the authentication key bits. Eve's information gain about the authentication key is therefore a 1/2 chance of obtaining the full key plus another 1/2 chance of obtaining half of the key, so the total information gain is 1/2 · 1 + 1/2 · 1/2 = 3/4. The probability of Eve being detected by the trusted parties Alice and Bob is zero, because even the incorrect Pauli operator only changes Eve's key, while Alice's and Bob's keys remain the same.
3.2 Adversary Guesses an Incorrect Bell Position Choice Eve has a 1/2 chance of selecting the wrong Bell positions and then has a higher probability of getting caught by the trusted users. Let us see an example: Alice generates the Bell pair |Φ−⟩12 |Ψ−⟩34 based on the pre-shared key ki ⊕ ki+1 = 0 ⊕ 0 = 0, applies the identity gate for the key value '00', and sends it to Bob. Eve has two choices here that select an incorrect position: either eki ⊕ eki+1 = 0 ⊕ 1 = 1 or eki ⊕ eki+1 = 1 ⊕ 0 = 1. If Eve has the 1/2 chance of selecting eki = 0 and eki+1 = 1, then Eve applies the incorrect Pauli operator 'X' on qubit 2 and decodes the authentication bits by applying BSM at positions {(1, 3) and (2, 4)}, as shown in Fig. 3. Since Eve selects the wrong position choice, her BSM results in entanglement swapping and she decodes wrong authentication key bits. The wrongly decoded bits are encoded using the incorrect positions with the applied Pauli
Fig. 3 Quantum circuit when Eve guesses the key ‘01’
Fig. 4 Quantum circuit when Eve guesses the key ‘10’
operator 'X' on qubit 2, and are sent to Bob. If Eve instead has the other 1/2 chance of selecting eki = 1 and eki+1 = 0, then Eve applies the incorrect Pauli operator 'Z' on qubit 3 and decodes the authentication bits by applying BSM at positions {(1, 3) and (2, 4)}, as shown in Fig. 4. Since Eve selects the wrong position choice, her BSM results in entanglement swapping and she decodes wrong authentication key bits. The wrongly decoded bits are encoded using the incorrect positions with the Pauli operator 'Z' applied on qubit 3 and sent to Bob. Bob applies the Pauli operator 'I' on qubit 1, performs BSM based on ki ⊕ ki+1 = 0 ⊕ 0 = 0, and decodes the authentication key bit A′k. Eve's information gain when guessing the wrong position choice is zero, since the proposed protocol applies the unitary operator based on the pre-shared key. The total information gain about the authentication key by Eve is therefore 1/2 · 3/4 + 1/2 · 0 = 3/8. The probability that Eve trespasses the security measures of the proposed protocol when she guesses the wrong positions is 1/4, since Alice and Bob then still generate the same authentication bits with that probability. Eve's probability of trespassing the security measures over both correct and incorrect position choices is given as 1/2 · 1 + 1/2 · 1/4 = 5/8. The total detection probability of Eve by the trusted parties during the entire communication is therefore Pd = 1 − (5/8)^n.
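The detection probability above grows quickly with the number of rounds; a short calculation (using only the 5/8 per-round escape probability derived above) illustrates this:

```python
def detection_probability(n_rounds):
    """P_d = 1 - (5/8)^n: probability that Eve is detected within n_rounds rounds."""
    return 1 - (5 / 8) ** n_rounds

for n in (1, 5, 10, 20):
    print(n, round(detection_probability(n), 4))   # e.g. 10 rounds -> ~0.9909
```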
4 Comparison with Previous Protocols The proposed protocol has high efficiency and security compared to the existing protocols. Xiaoyu and Liju [14] presented a protocol using Bell states, as does the proposed one, but it requires memory for qubit storage; with current technology, the storage time of a qubit is about 3 ns. Kang et al. [15] used a fully trusted third party, who can obtain information about the secure key KAB from Alice and Bob during the authentication phase while they are identifying each other. Kang et al. [16] proposed an improved version of the previous protocol by introducing an untrusted third party instead of a fully trusted one; secret key information leakage still exists in the authentication phase while the legitimate users are being verified. Zhang et al. [17] proposed a
Table 2 Comparison of existing protocols with the proposed protocol
Protocol | Quantum resources | Memory needed | The third party | Way of authentication
Xiaoyu and Liju [14] | Bell states | Yes | No | Unidirectional
Kang et al. [15] | GHZ states | Yes | Fully trusted | Bidirectional
Zhang et al. [17] | Bell states | Yes | Semi-honest | Bidirectional
Kang et al. [16] | GHZ states | Yes | Un-trusted third party | Bidirectional
Proposed | Bell states | No | No | Unidirectional
protocol using Bell states and semi-honest trusted parties, where both parties verify their identities simultaneously. Many protocols use a third party, which provides a bidirectional authentication process by concurrently validating the identities, but such protocols require the trustworthiness of the third party, as shown in Table 2. Bell states are easier to prepare than GHZ states; therefore, they are more feasible for implementation with current technology. The proposed protocol needs no memory, since the Bell states are sent based on the value of the pre-shared key, and it is difficult for an adversary to obtain the pre-shared key, which was generated via quantum key distribution protocols.
5 Conclusion The proposed quantum identity protocol uses Bell states instead of single photons to authenticate the trusted user. The Bell state positions are selected indirectly based on two consecutive pre-shared key bits by applying the XOR operation between them. After generating the two Bell states, the trusted user applies the Pauli operator {I–00, X–01, Z–10, Y–11} depending on the pre-shared key bits. Eve's information gain about the authentication key is zero if she guesses the incorrect position of the Bell states, and her detection rate is then 75%. The protocol was analyzed in an ideal quantum channel, ignoring noise and imperfect devices. The proposed protocol is verified visually in the IBM Quantum Composer and implemented in the IBM Quantum Lab. An extension of the proposed protocol can be implemented using higher-dimensional qubits, choosing between GHZ and W states depending on the value of the pre-shared key.
References 1. Shor PW (1997) Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM J Comput 26(5):1484–1509 2. Zawadzki P (2019) Quantum identity authentication without entanglement. Quantum Inf Process 18(1):7 3. Zhu H, Wang L, Zhang Y (2020) An efficient quantum identity authentication key agreement protocol without entanglement. Quantum Inf Process 19(10):381 4. Li X, Barnum H (2004) Quantum authentication using entangled states. Int J Found Comput Sci 15(04):609–617 5. Zhang Z, Zeng G, Zhou N, Xiong J (2006) Quantum identity authentication based on ping-pong technique for photons. Phys Lett A 356(3):199–205 6. Creau C, Salvail L (1995) Quantum oblivious mutual identification. Advances in cryptology. In: Proceedings of Eurocrypt’ vol 95. Springer, Berlin, pp 133–146 7. Dusek M, Haderka O, Hendrych M, Myška R (1999) Quantum identification system. Phys Rev A 6(01):1–9 8. Nayana D, Goutam P, Ritajit M (2021) Quantum secure direct communication with mutual authentication using a single basis. Int J Theor Phys 60:4044–4065 9. Hong CH, Heo J, Jang JG (2017) Quantum identity authentication with single photon. Quantum Inf Process 16(10):236 10. Bennett CH, Brassard G (2014) Quantum cryptography: public key distribution and coin tossing. Theoret Comput Sci 560(1):7–11 11. Shan R-T, Chen X, Yuan K-G (2021) Multi-party blind quantum computation protocol with mutual authentication in network. Sci China Inf Sci 64(6):162302 12. Shor PW, Preskill J (2000) Simple proof of security of the BB84 quantum key distribution protocol. Phys Rev Lett 85:441–444 13. IBM Quantum, https://quantum-computing.ibm.com/. Last accessed 2022/03/15 14. Xiaoyu L, Liju C (2007) Quantum authentication protocol using bell state. In: Proceedings of the first international symposium on data, privacy, and E-commerce, pp 128–132 15. Kang MS, Hong CH, Heo J (2015) Controlled mutual quantum entity authentication using entanglement swapping. Acta Phys Sin 24(9):90306–90306 16. Kang MS, Heo J, Hong CH (2018) Controlled mutual quantum entity authentication with an untrusted third party. Quantum Inf Process 17(10):159 17. Zhang S, Zhang-Kai C, Run-Hua S, Feng-Yu L (2020) A novel quantum identity authentication based on Bell states. Int J Theor Phys 59:236–249
Some Methods for Digital Image Forgery Detection and Localization Ankit Kumar Jaiswal, Shiksha Singh, Santosh Kr. Tripathy, Nirbhay Kr. Tagore, and Arya Shahi
Abstract Digital images are a critical wellspring of information. However, a forged image has the potential to spread wrong information and thus may defeat that purpose. This has resulted in wide research interest in developing image authenticity mechanisms. To be specific, digital image forgery (DIF) detection is an open research area. This paper first provides an overview of DIF and its consequences for society at large. Next, image authentication techniques developed by us in the recent past are briefly discussed. In the end, key future research directions are listed with concluding remarks. Keywords Digital forensics · Image forgery detection · Data-driven
1 Introduction With the advancement of handheld devices, graphic editing applications are easily available. An image can be altered by these editing applications (Adobe Photoshop [1], Sensi, and Faceapp [2]) in such a way that a bare human eye cannot detect whether an image is manipulated or not. An altered image with a change in semantics is considered a forged image. DIF can be classified into two categories; one is a copymove forgery (CMF), and another is image splicing. In the former, a part of an image A. K. Jaiswal (B) · N. Kr. Tagore School of CSE & Technology, Bennett University, Greater Noida, India e-mail: [email protected] N. Kr. Tagore e-mail: [email protected] S. Singh Shambhunath Institute of Engineering and Technology, CSED, Prayagraj, India S. Kr. Tripathy CSED, Indian Institute of Technology (BHU), Varanasi, India e-mail: [email protected] A. Shahi Banasthali Vidyapith, Jaipur, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_11
Fig. 1 Instances of DIF: a CMF, b image splicing
is copied and pasted onto the same image, and in the latter, a cropped region of one image is pasted onto another image (Fig. 1). Images uploaded on social media platforms can influence public opinion; hence, a forged image may change the actions of a community [3]. In a survey conducted in [4], it was shown that around 15% of respondents were involved in scientific misconduct (manipulating data, plagiarizing, fabricating, etc.) during the years 2011–2014. In another study [5], it was shown in a cell biology research journal that 20% of accepted manuscripts contained altered figures. Digital images work as visual evidence in courtrooms as primary or secondary evidence; but then again, they are being altered smoothly to manipulate legal or court proceedings. Manipulation of an image puts a question mark on the integrity of the image, which draws attention to this research problem. Two questions arise regarding the authenticity of an image: 1. What are the ways to define the authenticity of an image? 2. What are the methods to localize the regions of a forged image? To test whether a human eye can detect a forged image, an experiment was conducted in [3]. In this experiment, participants were asked to select fake images from given news articles [6]. It was found that most of the participants were unable to distinguish the forged images from the original ones. So, it has become the need of the hour to develop a technique that can detect a forged image and localize the manipulated region. The manuscript is divided into five sections. The first section of the paper gives an introduction to DIF and the motivation behind the research area. A literature review is given in the second section. The third section is dedicated to the thesis contributions, and future research directions are given in the fourth section. The manuscript is concluded in the last section.
2 Literature Review Existing image authentication techniques are grouped under active protection schemes and passive detection techniques. In an active protection scheme, during
the image acquisition process, some content is attached to the acquired image, which later helps in image authentication. This content can be in the form of a digital signature or a watermark [7]. Passive detection techniques, on the other hand, consider the intrinsic traces left by different components during the process of digital image acquisition. These could be sensor-based inconsistent noise patterns, the color filter array, lighting conditions, or even the metadata of the image. Active protection schemes are sensitive to changes in pixel values, such as brightness or contrast adjustment. Moreover, an image transmitted through a channel requires compression; in such a case, resaving the image changes the pixel values, which is undesirable. Minor changes, like contrast and brightness adjustment, do not manipulate the semantic content of an image. Hence, generic signatures are not robust to mild processing of an image, and the image may be detected as forged. Considering the challenges of active protection schemes, a lot of research has been done on passive detection techniques. These techniques are based on intrinsic footprints (traces/evidence) left at different phases of the image acquisition pipeline during the formation process of the image. They are mathematical or statistical models built on the assumption that forged images leave traces during the editing process. These traces take the form of heterogeneous noise added by sensors [8–11], light intensity variation due to lens aberration [12–14], inconsistent patterns of the color filter array [15, 16], artifacts from JPEG compression [17–19], and many more. These footprints are quite effective for the detection of regions in a forged image but have several shortcomings and challenges. A very important challenge among them is the single scope of application of a single footprint. Other challenges are that these systems do not define a sequence of morphological operations to localize forged regions and do not produce the result in efficient time. A system that always gives perfect results but takes a long time to produce the output is not useful. Several methods therefore exist for copy-move forgery detection (CMFD). Most of the block-based methods are not invariant to geometrical transformations, and most of the keypoint-based methods are not robust to mild processing operations (i.e., noise addition, brightness change) [20].
3 Contributions The research aims to develop efficient approaches for DIF detection and localization. The main contributions of our thesis are discussed in this section. To detect whether an input image is forged or authentic, a machine learning classification technique can be used. The major challenge here is to find the right features from the image to train the classification model. To tackle this challenge, a contribution to digital image splicing detection is made based on a machine learning technique: a relevant feature set is proposed in [21], and these features are provided to logistic regression (a machine learning classifier) to classify the feature space into forged or authentic. In the proposed method, a set of texture and shape features is extracted from the converted grayscale image, and a logistic regression classifier is then trained on these features.
Fig. 2 Abstract overview of the machine learning-based technique (preprocessing → feature extraction: LBP, DWT, HoG, LTE → logistic regression → identify tampered images)
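For illustration, a minimal sketch of the feature extraction and classification in [21] is given below using scikit-image and scikit-learn. Only the LBP and HoG parts are shown (the DWT and LTE descriptors would be appended in the same way), and the parameter values and fixed image size are our own assumptions.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import local_binary_pattern, hog
from sklearn.linear_model import LogisticRegression

def splice_features(image_rgb):
    """Texture + shape descriptor for one image: LBP histogram concatenated with HoG."""
    gray = rgb2gray(image_rgb)
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    # Images are assumed resized to a common size so the HoG vector length is consistent.
    hog_vec = hog(gray, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
    return np.concatenate([lbp_hist, hog_vec])

# X = np.stack([splice_features(img) for img in images]); y = labels (0 authentic, 1 forged)
# clf = LogisticRegression(max_iter=1000).fit(X, y)
```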
The model in [21] is validated on three different datasets: CASIA 1.0, CASIA 2.0 [22], and Columbia [23]. These publicly available datasets contain different types of images, from natural and indoor scenes to outdoor and texture images. Figure 2 represents an abstract overview of this method. Existing methods based on intrinsic footprints such as noise suffer from challenges like localization of the tampered region in the manipulated image, estimation of non-consistent and non-Gaussian noise, and the requirement of prior knowledge. To address these issues and reduce detection time, a time-efficient spliced image localization technique using higher-order statistics is proposed in [24]. In this technique, four major steps are used to obtain the forged region in the spliced image: preprocessing of the image, wavelet transformation of the pre-processed image, block-wise noise sample estimation on the transformed image, and post-processing of the estimated samples to obtain the output. In the pre-processing step, the color image is converted into a grayscale image. This pre-processed grayscale image is then transformed into the wavelet domain, and the transformed image is divided into 2 × 2 distinct blocks to gather noise statistics: the fourth-order central moment is estimated on each 2 × 2 block of the wavelet-transformed image. A threshold is calculated from the estimated moment distribution and used to classify blocks into forged and original. The resulting binary image is the localized output revealing the tampered region in the spliced image. The presented method is evaluated on three datasets: the Columbia Uncompressed Dataset (CUD) [23], the CASIA dataset [22], and the IEEE IFS-TC image forensics challenge dataset [25]. Figure 3 represents the overall abstract diagram of the presented method.
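A minimal sketch of this block-wise higher-order-statistics localization is given below using PyWavelets and SciPy. The mean-plus-k-standard-deviations threshold is an assumed stand-in for the threshold derived from the moment distribution in [24], and the post-processing morphology is omitted.

```python
import numpy as np
import pywt
from scipy.stats import moment   # central moments

def splice_localization_map(gray, block=2, k=1.5):
    """Block-wise 4th-order central moment of the diagonal wavelet detail band;
    blocks far from the global statistic are flagged as possibly spliced."""
    _, (_, _, cD) = pywt.dwt2(gray.astype(np.float64), "db1")   # diagonal detail coefficients
    h, w = cD.shape
    h, w = h - h % block, w - w % block
    blocks = cD[:h, :w].reshape(h // block, block, w // block, block).swapaxes(1, 2)
    m4 = moment(blocks.reshape(-1, block * block), moment=4, axis=1)  # one value per block
    threshold = m4.mean() + k * m4.std()                              # assumed data-driven threshold
    return (m4 > threshold).reshape(h // block, w // block)           # binary localization map
```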
Some Methods for Digital Image Forgery Detection and Localization Fig. 3 Various steps involved in the method [24]
123
Pre-processing
Wavelet Transformaon
Post Processing
Noise stascs esmaon
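A minimal numerical sketch of the blockwise noise-statistics step of the method in [24] (Fig. 3) is given below, assuming PyWavelets, scikit-image, and NumPy; the 2 × 2 blocking and the fourth-order central moment follow the description above, while the mean-plus-standard-deviation threshold is only a simplified stand-in for the threshold actually derived from the moment distribution in [24].

```python
import numpy as np
import pywt
from skimage.color import rgb2gray

def splicing_localization_map(rgb_image, block=2):
    gray = rgb2gray(rgb_image)                          # pre-processing: color -> grayscale
    cA, (cH, cV, cD) = pywt.dwt2(gray, "db1")           # wavelet transformation
    h = (cD.shape[0] // block) * block
    w = (cD.shape[1] // block) * block
    blocks = cD[:h, :w].reshape(h // block, block, w // block, block).swapaxes(1, 2)
    mean = blocks.mean(axis=(2, 3), keepdims=True)
    m4 = ((blocks - mean) ** 4).mean(axis=(2, 3))       # fourth-order central moment per block
    threshold = m4.mean() + m4.std()                    # simplified stand-in threshold
    return (m4 > threshold).astype(np.uint8)            # binary map of suspected blocks
```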
Another challenge in this type of method is the confusion between noise and edges, as well as the choice of the sequence and set of morphological post-processing operations for localization. To overcome these challenges, a framework is proposed in [26]. In this method, the image is transformed into the wavelet domain, and from its diagonal component the standard deviation is calculated on distinct blocks. The difference between the standard deviation and the lower envelope of the calculated noise samples is computed to reduce the confusion between noise and edges. This method is also divided into three steps: pre-processing, noise estimation, and post-processing. First, the image is converted from color to YCbCr, and the luminance component (Y) is taken for further computation. This Y component is then transformed into the wavelet domain, and noise estimation is performed on B × B distinct blocks of the transformed image. To estimate noise, the median absolute deviation (MAD) is calculated on each block. All samples are fitted to a lower envelope, and a binary image is then generated based on a threshold value. A set of morphological operations is applied to obtain the localized result. This method is also evaluated on two publicly available datasets, the Columbia Uncompressed Dataset (CUD) [23] and the IEEE IFS dataset [25]. Figure 4 represents the overall abstract diagram of the presented method.

Fig. 4 Overall abstract diagram of the proposed method [26]: color image converted to YCbCr, wavelet transformation, MAD samples, lower envelope fitting, and result

To address the issues of block-based and keypoint-based approaches to CMFD, a method is developed that combines block-based and keypoint-based techniques [27–29]. To reduce time complexity, a data-driven CNN approach is proposed [30]. Feeding multiple scales of the input image to a multi-stage deep learning framework may overcome the challenge of scale invariance, so the proposed framework uses this concept to detect and localize the forged region. In this deep learning-based approach, multiple scales of the input image are taken, and feature spaces are extracted from these multi-scale images. To extract features, encoder and decoder
blocks are used. In the encoder block, convolution layers are used to extract features, and max-pooling layers are used to down-sample the feature space by half. Similarly, in the decoder block, convolution layers extract the feature space and up-sampling layers upscale it. Finally, the feature space is passed to a sigmoid activation function to localize the result. This proposed method is evaluated on different performance measures on two publicly available datasets, CoMoFoD [31] and CMFD. Figure 5 represents the overall abstract overview of the presented method. The results of the proposed approaches are shown in Table 1. In this table, the first column presents the method name, the second column presents the type of forgery detected by the proposed method, and the third column presents the publicly available dataset on which the given method is validated. The remaining columns present various performance measures (i.e., precision (p), recall (r), accuracy (a), F1-score (F1), and MCC values). Figure 6 represents a bar graph of the proposed results on different performance measures.
Fig. 5 Abstract overview of the method [30]: multiple scales of the input image (scale 1 to scale n) pass through an encoder phase and a decoder phase with feature extraction at levels 1 to n, followed by sigmoid classification to produce the output
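A compact, hedged sketch of such an encoder-decoder localizer is shown below using PyTorch; the layer counts, channel widths, and the way the multi-scale outputs are fused are illustrative placeholders rather than the exact architecture of [30].

```python
import torch
import torch.nn as nn

class EncoderDecoderLocalizer(nn.Module):
    """Encoder: convolutions extract features and max-pooling halves them.
    Decoder: convolutions refine features and up-sampling restores resolution.
    A sigmoid gives a per-pixel tamper probability map."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(), nn.Upsample(scale_factor=2),
            nn.Conv2d(16, 1, 3, padding=1), nn.Upsample(scale_factor=2),
        )

    def forward(self, x):                        # x: (batch, 3, H, W) with H, W divisible by 4
        return torch.sigmoid(self.decoder(self.encoder(x)))

# Multi-scale use, in the spirit of [30]: run the network on several rescaled copies of the
# image and fuse the resulting maps (the fusion strategy of [30] is not reproduced here).
model = EncoderDecoderLocalizer()
tamper_map = model(torch.rand(1, 3, 256, 256))   # one scale; repeat for rescaled copies
```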
Table 1 Result of the proposed approaches on different performance measures

Method | Forgery type | Name of dataset | p      | r      | a      | F1     | MCC
[24]   | Spliced      | IFS             | 0.7493 | 0.8327 | 0.9547 | 0.7435 | 0.7476
[24]   | Spliced      | CUD             | 0.9508 | 0.9007 | 0.9651 | 0.9221 | 0.9024
[26]   | Spliced      | IFS             | 0.6591 | 0.5281 | 0.9533 | 0.6020 | –
[26]   | Spliced      | CUD             | 0.8129 | 0.5955 | 0.8808 | 0.6812 | –
[30]   | Copy-move    | CMFD            | 0.9892 | 0.9982 | 0.9878 | 0.9936 | 0.8329
[30]   | Copy-move    | CoMoFoD         | 0.9863 | 0.9962 | 0.9839 | 0.9909 | 0.8578
Fig. 6 Bar graph of the presented contributions' results on multiple performance measures (p, r, a, F1, MCC) and on different datasets (IFS, CUD, CMFD, CoMoFoD)
4 Future Research Direction With the advancement of handheld devices, networks, and image editing applications, the challenges and issues associated with image authentication are also increasing day by day. Therefore, research needs to be channeled toward fresh, innovative contributions for digital image authentication. We are currently in an era in which technology advances rapidly. Improvements in digital camera design and the use of digital images in healthcare, transportation systems, and other domains generate many open research challenges for image authentication that need to be addressed. Some of these open research areas and future directions are briefly discussed in the remaining part of this section. One advancement in mobile phones and digital cameras is the light field camera. It allows the depth of field to be changed, which contributes to significant variations in the content of an image. In this case, assuming that such variations indicate forgery may lead to false detections, so new techniques will be required for the forensic analysis of such images. Modern mobile phone cameras have multiple lenses and multiple sensors, while the usual assumption about the image acquisition pipeline is a single lens and a single sensor; this again opens a research challenge in the field of image forensics.
5 Conclusion This paper gives an overview of the methods proposed for DIF detection and localization. These methods are based either on intrinsic footprints left during the process of DIF or on data-driven models. The proposed methods try to fill the research gaps of state-of-the-art methods. Forgery detection methods using intrinsic footprints (given in the literature) are semi-automatic techniques based on manual selection of the forged region. To overcome this limitation, data-driven techniques are proposed; these methods localize the forged region automatically without any manual selection. Such work benefits not only society but also government organizations such as cyber-cells. Hence, this research is essential and needs further contribution and development in the field of digital image forensics from researchers and professionals.
References 1. Adobe: Adobe Sensi. https://www.adobe.com/in/sensei.html. Accessed 19 Mar 2018 2. FaceApp: FaceApp-AI Face Editor, https://www.faceapp.com/. Accessed 19 Mar 2018 3. Schetinger V, Oliveira MM, da Silva R, Carvalho TJ (2017) Humans are easily fooled by digital images. Comput Graph 68:142–151. https://doi.org/10.1016/j.cag.2017.08.010 4. Tijdink JK, Verbeke R, Smulders YM (2014) Publication pressure and scientific misconduct in medical scientists. J Empir Res Hum Res Ethics 9:64–71. https://doi.org/10.1177/155626461 4552421 5. Bakiah N, Warif A, Wahid A, Wahab A, Yamani M, Idris I, Ramli R, Salleh R, Shamshirband S (2016) Copy-move forgery detection: survey, challenges and future directions. J Netw Comput Appl 75:259–278. https://doi.org/10.1016/j.jnca.2016.09.008 6. Doty M (2016) Misinformation in 2016: A timeline of fake news (photos) 7. Fridrich J (1999) Methods for tamper detection in digital images. In: Proceedings of workshop on multimedia and security, pp 19–23 8. Zhu N, Li Z (2018) Blind image splicing detection via noise level function. Signal Process: Image Commun 68:181–192. https://doi.org/10.1016/j.image.2018.07.012 9. Lyu S, Pan X, Zhang X (2013) Exposing region splicing forgeries with blind local noise estimation. Int J Comput Vision 110:202–221. https://doi.org/10.1007/s11263-013-0688-y 10. Pan X, Zhang X, Lyu S (2012) Exposing image splicing with inconsistent local noise variances. In: 2012 IEEE international conference on computational photography, ICCP 2012. https://doi. org/10.1109/ICCPhot.2012.6215223 11. Mahdian B, Saic S (2009) Using noise inconsistencies for blind image forensics. Image Vis Comput 27:1497–1503. https://doi.org/10.1016/j.imavis.2009.02.001 12. Riess C, Unberath M, Naderi F, Pfaller S, Stamminger M, Angelopoulou E (2017) Handling multiple materials for exposure of digital forgeries using 2-D lighting environments. Multimed Tools Appl 76:4747–4764. https://doi.org/10.1007/s11042-016-3655-0 13. Carvalho TJD, Riess C, Angelopoulou E, Pedrini H, Rocha ADR (2013) Exposing digital image forgeries by illumination color classification. IEEE Trans Inf Forens Secur 8:1182–1194. https://doi.org/10.1109/TIFS.2013.2265677 14. Yao H, Wang S, Zhao Y, Zhang X (2012) Detecting image forgery using perspective constraints. IEEE Signal Process Lett 19:123–126 15. Ferrara P, Bianchi T, De Rosa A, Piva A (2012) Image forgery localization via fine-grained analysis of CFA artifacts. IEEE Trans Inf Forens Secur 7:1566–1577. https://doi.org/10.1109/ TIFS.2012.2202227
16. Singh A, Singh G, Singh K (2018) A Markov based image forgery detection approach by analyzing CFA artifacts. Multimed Tools Appl 77:28949–28968. https://doi.org/10.1007/s11 042-018-6075-5 17. Korus P, Huang J (2016) Multi-scale fusion for improved localization of malicious tampering in digital images. IEEE Trans Image Process 25:1312–1326. https://doi.org/10.1109/TIP.2016. 2518870 18. Iakovidou C, Zampoglou M, Papadopoulos S, Kompatsiaris Y (2018) Content-aware detection of JPEG grid inconsistencies for intuitive image forensics. J Vis Commun Image Represent 54:155–170. https://doi.org/10.1016/j.jvcir.2018.05.011 19. Li W, Yuan Y, Yu N (2009) Passive detection of doctored JPEG image via block artifact grid extraction. Signal Process 89:1821–1829. https://doi.org/10.1016/j.sigpro.2009.03.025 20. Zhu Y, Shen X, Chen H (2016) Copy-move forgery detection based on scaled ORB. Multimed Tools Appl. https://doi.org/10.1007/s11042-014-2431-2 21. Jaiswal AK, Srivastava R (2020) A technique for image splicing detection using hybrid feature set. Multimed Tools Appl 79:11837–11860. https://doi.org/10.1007/s11042-019-08480-6 22. Dong J, Wang W. CASIA v1.0 and CASIA v2.0 image splicing dataset. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Science, Corel Image Database 23. Ng T-T, Hsu J, Chang S-F Columbia image splicing detection evaluation dataset 24. Jaiswal AK, Srivastava R (2020) Time-efficient spliced image analysis using higher-order statistics. Mach Vis Appl 31. https://doi.org/10.1007/s00138-020-01107-z 25. IFS T. IEEE IFS-TC Image Forensics Challenge Database. Accessed 12 Mar 2019 26. Jaiswal AK, Srivastava R (2020) Forensic image analysis using inconsistent noise pattern. Pattern Anal Appl. https://doi.org/10.1007/s10044-020-00930-4 27. Jaiswal AK, Gupta D, Srivastava R (2020) Detection of copy-move forgery using hybrid approach of DCT and BRISK. In: 2020 7th international conference on signal processing and integrated networks, SPIN 2020, pp 471–476. https://doi.org/10.1109/SPIN48934.2020. 9071015 28. Mehta V, Jaiswal AK, Srivastava R (2020) Copy-move image forgery detection using DCT and ORB feature set. Springer, Singapore. https://doi.org/10.1007/978-981-15-4451-4_42 29. Jaiswal AK, Srivastava R (2019) Copy-move forgery detection using shift-invariant SWT and block division mean features. Springer, Singapore. https://doi.org/10.1007/978-981-13-26851_28 30. Jaiswal AK, Srivastava R (2021) Detection of copy-move forgery in digital image using multiscale, multi-stage deep learning model. Neural Process Lett 1–6. https://doi.org/10.1109/aim s52415.2021.9466005 31. Tralic D, Zupancic I, Grgic S, Grgic M. CoMoFoD—new database for copy-move forgery detection
A Novel Approach to Visual Linguistics by Assessing Multi-level Language Substructures Monika Arora, Pooja Mudgil, Rajat Kumar, Tarushi Kapoor, Rishabh Gupta, and Ankit Agnihotri
Abstract A VQA system takes a picture and an open-ended query in natural language related to the image as inputs and outputs a response in natural language. In this paper, we aim to comprehend numerous methods devised by researchers and compare their performance on various datasets. This includes several methods including Bilinear models, Attention and Non-Attention models, Multimodal approach, etc. Additionally, we have proposed a Hybrid Co-Attention model that addresses visual and linguistic features simultaneously over different levels to find semantic overlaps between them. We were able to get an accuracy that is similar to state-of-the-art models by altering only the text-level features, i.e., training accuracy of 57.02% and validation accuracy of 42.78%. We also got top 5 accuracy of 93.47% on the training set and 77.67% on the validation dataset. Keywords Computer vision · Visual question answering (VQA) · Natural language processing · Bilinear models · Multimodal approach · Attention-based VQA
M. Arora · P. Mudgil · R. Kumar (B) · T. Kapoor · R. Gupta · A. Agnihotri Department of Information Technology, Bhagwan Parshuram Institute of Technology, GGSIPU, Delhi, India e-mail: [email protected] M. Arora e-mail: [email protected] P. Mudgil e-mail: [email protected] T. Kapoor e-mail: [email protected] R. Gupta e-mail: [email protected] A. Agnihotri e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_12
1 Introduction Deep learning research has made significant strides in many computer vision tasks. In general, it is easy for humans to detect objects in an image, understand the spatial position, understand their properties and their relationships among each other, relate each object in the context of its surroundings, etc. However, this is not true for computers. It is quite recent that computer vision systems answer natural language questions about images. Visual Question Answering (VQA) is a task in which the system is asked a text-based question about a particular image and needs to derive an answer from it. To precisely answer visual-based questions, the computer must understand both the image and question [1]. This research was inspired by VQA’s potential applications, which include generic object identification, holistic scene understanding, interpreting information and narratives from visuals, and creating interactive educational applications that pose questions about images. In this paper, we analyze various methods for VQA and compare these approaches in detail. Next, we introduced a Hybrid Co-Attention model for VQA that looks for semantic overlaps between visual and linguistic information simultaneously at multiple levels. We used co-attention on images with unigram, bigram, and trigram features. To get the most information out of textual features, these features were summed in a bottom-up manner. We achieved an accuracy comparable to the other models by only varying these text-level features.
2 Related Work 2.1 Bilinear Models Bilinear Attention Networks point out that current methods overlook the link between words in the question and objects in the image and recommend that each multimodal channel should be used to create a bilinear co-attention map. BGN [2] talks about an image graph that conveys the characteristics of the identified object to words in the query, which enables having both referring and accurate knowledge at the output nodes, and a question graph that alternates data between these output nodes of the image graph to reinforce the absolute but essential links between objects. These two types of graphs interact, and their resulting model can represent connections and dependencies between objects, leading to the implementation of multi-step reasoning. Bilinear Graph Networks (BGNs) are made up of layers of image and question graphs. MUTAN The authors of [3] present MUTAN, which is a multimodal tensor-based Tucker decomposition that efficiently parametrizes bilinear interactions among image and text descriptions. It concentrates on modeling rich and fair relations between image and textual models. With MUTAN, they check the complexity
while maintaining good and understandable fusion relations. MUTAN resolves the interaction tensor into interpretable elements that provide control over the model’s expressiveness.
2.2 Multimodal Approaches Multimodal learning combines the representations of several modalities and aims to create models that can handle and relate data from multiple modalities. Such multimodal approaches have been used in VQA. UNITER [4] proposed UNiversal Image-TExt Representation (UNITER), which is a large-scale pre-trained model used for common multimodal embedding. They used a transformer at the core of their model to exploit its sophisticated self-attention mechanism for learning various descriptions. Overall, their research indicates that conditional masking and OT-based WRA result in a better training process. MOVIE [5] focuses on visual counting. MoVie: Modulated conVolutional bottlenecks use a feed-forward network on a feature map to achieve inference, and reasoning is done implicitly. Symbolic reasoning and counting are done implicitly and holistically in MoVie. MoVie can be added to common VQA models and enhances the "number" type. Bottom-Up and Top-Down Attention Model [6] proposes a combined mechanism of bottom-up and top-down visual attention. The bottom-up mechanism provides a set of image regions that are represented by a combined vector of convolutional features; bottom-up attention is run with the help of Faster R-CNN. The top-down mechanism uses the context of a specific activity to predict the distribution of attention between image regions. The weighted average of all visual features is the attended feature vector. OSCAR [7] OSCAR (Object-Semantics Aligned Pre-training) is a VLP approach in which the training samples contain a series of words, object tags, and visual features. VQA is executed as a multi-label classification task, allocating a soft target score to every textual output depending on its relevance to real answers, and adjusting the model to reduce the cross-entropy loss obtained from the soft target scores. Weakly Supervised Visual-Retriever-Reader [8] The authors present visual information and test a cross-modality model and a text-only caption-driven model on the retriever side. They create two types of visual readers, a classification and an extraction kind, both of which rely on visual data. Overall, their method proposes that weak supervision training on SR tasks may allow the model to lessen its dependency on false linguistic correlations, allowing for stronger generalization abilities. VL-BERT This paper [9] proposes a unified single-stream network that learns generic feature representations from both the visual and language contents simultaneously. They found that a single-stream model design outperforms a two-stream model design. Since the output depends on the previous input, the next input, and multiple input
elements in both directions, the model tries to extract generic inference from the joint visual and linguistic input components.
2.3 Attention-Based Models In Natural Language Processing (NLP), the attention mechanism evolved as an upgrade over the encoder-decoder-based neural machine translation system. This approach, or adaptations of it, is used in visual linguistic tasks like VQA. Hierarchical Question-Image Co-Attention In [1], the authors present a Co-Attention Model for VQA which focuses on visual and question attention simultaneously. The three-level hierarchical architecture for processing images and questions applies at (a) the word level, (b) the phrase level, and (c) the question level. A convolutional neural network is applied to extract unigram, bigram, and trigram features. Every level of question expression returns a generic question map and an image co-attention map. The maps are then iteratively merged for forecasting the overall response distribution. The visual and question features acquired from various levels are consolidated and passed through an FC layer; hence, they get a softmax distribution over the range of responses. Analyzing the Behavior of Visual Question Answering Models [10] presents structured methods for analyzing the behavior of numerous VQA models, gaining insight into their failures, as well as suggesting various avenues for growth. They examine models with and without attention, using an LSTM model along with a CNN model in both settings. MCAN [11] The proposed Modular Co-Attention Networks (MCAN) architecture for VQA includes two deep co-attention models, called stacking and encoder-decoder, which are made up of several highly connected layers that progressively enhance the observed visual and textual features. After obtaining the attended image and question information, they create a multimodal fusion model to fuse the multimodal variables and then pass them to a multi-label classifier to predict answers for VQA tasks.
2.4 Solving Bias and Compositional Errors The implicit biases that Natural Language Processing (NLP) models learn are one of the most critical challenges they face. VQA systems naturally inherit this issue. Apart from bias, VQA systems are tested based on their compositionality, i.e., being able to answer queries that are related to unseen concepts. GGE [12] The authors present Greedy Gradient Ensemble (GGE) method to eliminate biases of two types. At first, GGE pushes models to over-fit the biased data
distributions, thereby imparting more focus on instances which are difficult to resolve using these models. They investigate linguistic bias in VQA on the basis of visual modeling to check response decisions. N2NMNs End-to-End Module Networks (N2NMNs) [13] are proposed, which learn reasoning without the use of a parser, i.e., by directly anticipating instance-specific network architectures. Their model learns to construct network architectures while also learning network parameters. Their model predicts network architectures and, after a period of exploration, even enhances expert-designed networks. Deep Compositional QA with Neural Module Networks [14] describes a neural module network (NMN)-based solution to visual question answering. The authors begin with a semantic parser analysis of each question, which is then used to define the basic computational units needed to respond to the question, and the links between various modules. All modules in an NMN are self-contained and configurable, enabling different computations for each question, including configurations unseen during training. Beyond the NMN, their final solution reads the query using a recurrent network (LSTM), which was demonstrated to be useful in modeling common sense knowledge and dataset biases.
2.5 Miscellaneous Models Other than multimodal approaches, bilinear models, attention-based models, and models reducing bias and compositional errors, there are more methods that use various approaches toward VQA. RAMEN [15] RAMEN (Recurrent Aggregation of Multimodal Embeddings Network) goes through three stages: early fusion of visual and linguistic characteristics, based on spatially localized visual cues combined with textual attributes; learning bimodal embeddings through shared projections, which helps the network capture the interrelationships of the visual and textual features; and recurrent aggregation of the learned bimodal embeddings with a bi-directional Gated Recurrent Unit. LoRRA [16] They present an approach which interprets the words in the image, processes the relation between the input question and the image, and hence gives a response that is based on the text in the image. Their method is known as Look, Read, Reason, and Answer (LoRRA). They attempt to combine OCR with VQA. VC R-CNN The authors of [17] present an unsupervised representation learning method called the Visual Commonsense region-based Convolutional Neural Network. It is used for demanding tasks such as image captioning and VQA. It can be built on top of an R-CNN framework, and many tasks can benefit from it by simply concatenating its features (Table 1; Fig. 1).
Table 1 Comparative analysis of various VQA methods

VQA model | Dataset used | Accuracy achieved (%)
BGN (bilinear graph networks) [2] | VQA v2.0 | 72.28
MUTAN (multimodal tucker fusion) [3] | MSCOCO | 67.36
UNITER (universal image-text representation) [4] | COCO, Visual Genome, Conceptual Captions, SBU Captions | 77.85
MoVie (modulated convolutional bottlenecks) [5] | CLEVR | 97.42
Bottom-up and top-down attention model [6] | MSCOCO, Visual Genome, VQA v2.0 | 70.34
OSCAR [7] | VQA v2.0 | 80.37
Visual-retriever-reader [8] | OK-VQA | 66.60
VL-BERT [9] | VQA v2.0 | 72.22
Hierarchical question-image co-attention [1] | VQA v2.0, MSCOCO | 65.40
Analyzing the behavior of visual question answering models [10] | VQA v2.0 | 56.00
MCAN (modular co-attention networks) [11] | VQA v2.0 | 70.90
GGE (greedy gradient ensemble) [12] | VQA-CP v2, VQA v2.0 | 57.32
N2NMNs (end-to-end module networks) [13] | CLEVR | 64.90
Deep compositional QA with neural module networks [14] | VQA v2.0 | 55.10
RAMEN [15] | CLEVR, CLEVR-CoGenT, VQA v2.0 | 71.02
LoRRA (look, read, reason, and answer) [16] | VQA v2.0, VizWiz | 69.21
VC R-CNN [17] | MSCOCO | 71.49
Multi-level text co-attention model (proposed model) | VQA v2.0 | 57.02
3 Work Done We have extended the work done in Hierarchical Co-Attention Models [1] and have proposed a model that increases accuracy and provides state-of-the-art results. Multi-level Language Substructures: In visual reasoning, the language substructures were first mentioned in [1] which learned both question and image attention simultaneously. They co-attended three levels of language structures—word level, phrase level, and sentence level, along with visual features to identify their semantic
Fig. 1 Graphical comparison of accuracy achieved by various VQA methods
overlap. We are proposing a hybrid model whose baseline is the hierarchical co-attention model [1]. Dataset: The VQA v2.0 dataset was utilized for training and validation. We have used the 1000 most frequent classes for the training of the model. Further, a 10% split of the dataset (a combination of training and validation samples) was utilized for evaluation. Notation: Given an image, its feature vectors are represented as V = {v1, v2, ..., vn} and the question vectors are represented as Q = {q1, q2, ..., qn}. The image feature vector representations are obtained using the last layer of the VGG19 model. The question features are represented as a collection of word embeddings generated using an embedding layer on the tokenized words. Model: Using the tokenized word sequences, we generate word embeddings for all the words in the question, where each word is represented as a vector qi. Next, 1-D convolution operations are applied to the sequence of word embeddings Q to generate unigram, bigram, and trigram representations using varying kernel sizes, i.e., 1, 2, and 3. Then, we concatenate all of these to generate the phrase-level features and, subsequently, the sentence-level features using a recurrent bidirectional LSTM layer (Fig. 2). At this point, our model differs from [1], which uses the word-, phrase-, and sentence-level features with image features, V, in the co-attention layer to obtain Word (Qw), Phrase (Qp), and Sentence (Qs) attentions on the question and image features. In our model, we feed all the multi-level substructures (unigrams, bigrams, and trigrams) into the co-attention layer instead of word embeddings, along with the image, to generate multi-level attention features on the question and the image for all three levels ((qu, vu), (qb, vb), (qt, vt)). For the co-attention layer, we have followed the parallel attention mechanism mentioned in [1].
Fig. 2 Process flow diagram
First, we calculate the affinity matrix from the image and question features to find their semantic similarities. This matrix is, in turn, used alternately with both the image and question features to find their respective attention features aq and av. The attention features signify localized attention to atomic question and image features, i.e., image regions and words. For the classification task, we combine all the multi-level image-attended question features and their respective question-attended image features in a bottom-up manner, where at each level we use fully connected layers activated with the tanh function along with a dropout layer. The resulting features from the last level are finally passed through a softmax-activated layer to obtain the prediction. The above process is as follows:

v = tanh(W(qu + vu))
v = tanh(W[(qb + vb), v])
v = tanh(W[(qt + vt), v])
v = tanh(W[(qp + vp), v])
v = tanh(W[(qs + vs), v])
P = softmax(W v)
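A hedged PyTorch sketch of this bottom-up fusion and classification step is given below; the hidden size, dropout rate, and number of answer classes are placeholders, and the co-attention step that produces the attended (q, v) pairs for the five levels is assumed to have been computed already.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BottomUpFusion(nn.Module):
    """Fuse attended (question, image) pairs from the unigram, bigram, trigram,
    phrase, and sentence levels with tanh-activated FC layers, then classify."""
    def __init__(self, d=512, n_answers=1000, p_drop=0.5):
        super().__init__()
        # level 0 takes (q + v); later levels also take the previous fused vector
        self.fc = nn.ModuleList([nn.Linear(d if i == 0 else 2 * d, d) for i in range(5)])
        self.drop = nn.Dropout(p_drop)
        self.classifier = nn.Linear(d, n_answers)

    def forward(self, levels):            # levels: list of 5 (q_attended, v_attended) pairs
        v = None
        for (q, im), fc in zip(levels, self.fc):
            joint = q + im
            v = torch.tanh(fc(joint)) if v is None else torch.tanh(fc(torch.cat([joint, v], dim=1)))
            v = self.drop(v)
        return F.log_softmax(self.classifier(v), dim=1)

fusion = BottomUpFusion(d=512, n_answers=1000)
levels = [(torch.rand(8, 512), torch.rand(8, 512)) for _ in range(5)]   # dummy attended features
log_probs = fusion(levels)                                              # shape: (8, 1000)
```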
Result: The model was trained for 27 iterations with learning rate = 0.0001 and patience 5. Early stopping relied on validation accuracy. We achieved an absolute accuracy of 57.02% on training data and 42.78% on the validation data. We used a metric called TopKAccuracy, where we measure the percentage of the correct answer existing in the top k predicted answers. We got top 5 accuracy of 93.47% on the training dataset and 77.67% on the validation dataset.
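The TopKAccuracy metric mentioned above can be computed as in the following small NumPy sketch, written here purely for illustration:

```python
import numpy as np

def top_k_accuracy(probs, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring
    predictions; probs has shape (n_samples, n_classes), labels are class indices."""
    top_k = np.argsort(probs, axis=1)[:, -k:]
    return float(np.mean([labels[i] in top_k[i] for i in range(len(labels))]))
```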
4 Conclusion Through this paper, we discussed Multimodal approaches, Bilinear models, Attention-based models, and also various approaches that reduce bias and compositionality issues associated with VQA. Many approaches have performed well on natural datasets like MSCOCO, Visual Genome, VQA v2.0, etc., giving accuracies ranging from 50 to 80%. It has also been seen that models like UNITER [4] and RAMEN [15] perform exceptionally well on synthetic datasets like CLEVR and CLEVR-CoGenT achieving accuracies above 90%. Finally, we proposed a hybrid model with the hierarchical co-attention model [1] as its foundation model. In contrast to [1], which utilized co-attention on images with single word-level features, we employed co-attention on images with unigram, bigram, and trigram features. These features were aggregated in a bottom-up manner to gain maximum information from textual features. By modifying solely the text-level features, we were able to attain an accuracy that is at par with the other models.
5 Future Work Currently, we have focused on the effect of multiple-level linguistic features on solving VQA problems and how the results are affected when we attend each multilevel feature separately with an image. We can also get better inference from the image data by using various techniques like using an object-labeled image dataset like GQA or using techniques mentioned in VL-BERT [9] model. We will be investigating their effects in the future.
References 1. Lu J, Yang J, Batra D, Parikh D (2016) Hierarchical question-image co-attention for visual question answering. Advances in neural information processing systems, p 29 2. Guo D, Xu C, Tao D (2021) Bilinear graph networks for visual question answering. IEEE Trans Neural Netw Learn Syst 3. Ben-Younes H, Cadene R, Cord M, Thome N (2017) Mutan: multimodal tucker fusion for visual question answering. In: Proceedings of the IEEE international conference on computer vision, pp 2612–2620 4. Chen YC, Li L, Yu L, El Kholy A, Ahmed F, Gan Z, Cheng Y, Liu J (2020) Uniter: universal image-text representation learning. In: European conference on computer vision. Springer, Cham, pp 104–120 5. Nguyen DK, Goswami V, Chen X (2020) Movie: revisiting modulated convolutions for visual counting and beyond. arXiv preprint arXiv:2004.11883 6. Anderson P, He X, Buehler C, Teney D, Johnson M, Gould S, Zhang L (2018) Bottom-up and top-down attention for image captioning and visual question answering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6077–6086 7. Li X, Yin X, Li C, Zhang P, Hu X, Zhang L, Wang L, Hu H, Dong L, Wei F, Choi Y (2020) Oscar: object-semantics aligned pre-training for vision-language tasks. In: European conference on computer vision. Springer, Cham, pp 121–137 8. Luo M, Zeng Y, Banerjee P, Baral C (2021) Weakly-supervised visual-retriever-reader for knowledge-based question answering. arXiv preprint arXiv:2109.04014 9. Su W, Zhu X, Cao Y, Li B, Lu L, Wei F, Dai J (2019) Vl-Bert: pre-training of generic visuallinguistic representations. arXiv preprint arXiv:1908.08530 10. Agrawal A, Batra D, Parikh Dp (2016) Analyzing the behavior of visual question answering models. arXiv preprint arXiv:1606.07356 11. Yu Z, Yu J, Cui Y, Tao D, Tian Q (2019) Deep modular co-attention networks for visual question answering. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6281–6290 12. Han X, Wang S, Su C, Huang Q, Tian Q (2021) Greedy gradient ensemble for robust visual question answering. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1584–1593 13. Hu R, Andreas J, Rohrbach M, Darrell T, Saenko K (2017) Learning to reason: end-to-end module networks for visual question answering. In: Proceedings of the IEEE international conference on computer vision, pp 804–813 14. Andreas J, Rohrbach M, Darrell T, Klein D (2016) Deep compositional question answering with neural module networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 39–48 15. Gamage BMSV, Hong LC (2021) Improved RAMEN: towards domain generalization for visual question answering. arXiv preprint arXiv:1903.00366 16. Singh A, Natarajan V, Shah M, Jiang Y, Chen X, Batra D, Parikh D, Rohrbach M (2019) Towards VQA models that can read. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8317–8326 17. Wang T, Huang J, Zhang H, Sun Q (2020) Visual commonsense R-CNN. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10760–10770
Sentiment Analysis on Amazon Product Review: A Comparative Study Shivani Tufchi, Ashima Yadav, Vikash Kumar Rai, and Avishek Banerjee
Abstract Social media has evolved into a highly strong means of communication between individuals, allowing them to express their opinions and ideas in each conversation or article, resulting in a massive volume of unstructured data. Organizations must process and research the data as well as gather business information in order to analyze it. In this article, machine learning models such as Multinomial Naive Bayes, Logit Regression, Linear Support Vector Classifier SVC, and Multinomial Random Forest are used to analyze Amazon’s product reviews. We conducted a comparison examination of these models by implementing them and deciding which model detects the polarity of sentiments with the greatest accuracy, and we discovered that Logit Regression and Linear SVC both perform well, with 87.3% and 87.4% accuracy, respectively. To summarize, the purpose of this study is to do a comparison analysis in order for future researchers to choose the best algorithm for their research. Keywords Machine learning · Opinion mining · Sentiment detection · Product reviews · Classification
1 Introduction Online retail which is also popularly called as commerce contributes to global trade with sales of around 4.9 trillion USD annually. Now, it goes without saying that its backbone is nothing but the Internet. People buy and sell products through a plethora of e-commerce websites, but experience, workflows, and user(s) expectations stay similar. Having mentioned this, it is also worth noting that product reviews have a very strong emotional impact on sales, especially for the future buyers of the products S. Tufchi (B) · A. Yadav · V. K. Rai Bennett University, Greater Noida, Uttar Pradesh, India e-mail: [email protected] A. Yadav e-mail: [email protected] A. Banerjee CSBS, Asansol Engineering College, Asansol, West Bengal, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_13
across various categories [1]. Sellers taking the decision to continue production and enhancing features based upon the reviews received is also an effective and common practice. Therefore, analyzing data from these customer’s reviews and understanding the sentiments or expectations from the large datasets of reviews is a very logical explanation. Also, in the era of machine learning, by reading reviews we can realize a product’s receptiveness; this is very much possible as well [2]. Sentiment Analysis is an evident and applicable way to process Natural Language. The same can be applied to derive the customer’s attitude toward browsing and buying on the online shopping portals [3]. It is also worth noting that the development of social media and its popularity are increasing day by day. As people find themselves online, hence sharing with others on the buying experiences also creates attraction toward the products [4]. We never know what you are doing well or wrong without sentiment analysis. We never comprehend why it is seen as right or bad, to be more explicit. We have no choice but to make assumptions. And those assumptions are frequently incorrect. Sentiment analysis, on the other hand, provides a wealth of information that can be used to dig deeper and acquire a better picture of where your brand is succeeding and where it needs to rethink messaging. Positive, negative, neutral, or fine-grained classifications are applied to the sentiments or opinions expressed in the data (most positive, least positive, most negative, and least negative). Sentences can also be represented using text, movies, audio, emoticons, and graphics [5]. The use of sentiment categorization techniques to automatically collect multiple view points from numerous platforms helps researchers and decision-makers better comprehend opinions and client satisfaction. This work uses Random Forest, Logit Regression, Linear SVC, and Multinomial NB to establish how sentiments can be classified [6].
2 Background and Literature Survey In this paper, we have applied supervised learning algorithms to correctly predict the polarity of the sentiments, and these are following methods given in brief.
2.1 Random Forest or Decision Tree The random forest approach is one of the most powerful classification algorithms, capable of classifying large amounts of data accurately. It is a classification and regression ensemble learning method that constructs a number of decision trees during training and outputs a class that is the mode of the individual trees’ output classes [7] (Fig. 1). Decision trees are simple to learn and are used by both executives and researchers. It can simply be used by people who do not have a solid mathematical background.
Sentiment Analysis on Amazon Product Review: A Comparative Study
141
Fig. 1 Random forest or decision tree architecture
2.2 Linear SVC The most appropriate machine learning method is Linear SVC. This is mainly a technique for classifying linear issues. The aim of this classifier as known as Support Vector Classifier is to couple the data and give back the hyperplane that best classifies the data. After the hyperplane, for prophecy, we have to use some characteristics. This makes it possible to use certain methods and techniques in any way. Linear SVC has adjustable execution of SVC with the linear kernel. In the sklearn testimonial, the process used in Linear SVC is more accurate [8].
2.3 Multinomial Naive Bayes NB classifier is a supervised machine learning which focusses on the classification of text. It assesses the likelihood of each tag for a given sample and returns the tag with the highest probability.
2.4 Logit or Logistic Regression It is used to determine the output whether there are one or more independent variables. The output value can be 0 or 1 which is in the binary form [2]. It determines the relationship between the dependent variable and one or more independent variables. A brief literature survey of research papers is discussed below. The goal is to describe what is known about the proposed field of study in order to provide context and rationale for the research. Amrani et al. [7] presented numerous strategies for determining the polarity of tweets. Naive Bayes and Random Forest were the approaches used. The Naïve Bayes classifier produced the best results. The Naive Bayes classifier had an accuracy of 81.45%, while the Random Forest classifier had an accuracy of 78.65%. Khanvilkar and Vora
142
S. Tufchi et al.
[9] discussed that SVM and Random Forest machine learning algorithms will help to improve Sentiment analysis for Product Recommendation utilizing Multi-class classification. Ahuja et al. [10] discovered that logistic regression gave the best sentiment predictions by giving the highest output for all four comparison parameters, namely accuracy, recall, precision, and F1-score, as well as for both feature extraction methods, namely N-Gram and word-level TF-IDF. Dey et al. [6] compared SVM and Naive Bayes classifiers to examine the polarization of sentiment in Amazon product reviews and discovered that the support vector machine can polarize Amazon product reviews with a higher accuracy rate. Bhatt et al. [11] established their new methodology that integrates existing sentiment analysis approaches, and as a result, they were able to improve the system’s accuracy, which in turn provided correct reviews to the user. Jagtap et al. [12] looked at a solution for sentiment classification at a fine-grained level, namely the phrase level, where the polarity of the sentence can be classified into three categories: positive, negative, and neutral. Tsai and Lin [13] did an empirical analysis of the performance of three typical OCC classifiers, including OCC SVM, and discovered that for most datasets, the majority class does not increase the ultimate performance of the OCC classifiers. Haque et al. [2] observed that in most circumstances, ten folding produced improved accuracy, while Support Vector Machine (SVM) provided the best classifying results, with accuracy over 90% with the F1 measure, precision, and recall over 90%. Syamala and Nalini [14] in their paper defined a method to maximize classification accuracy for Amazon product reviews sentiment prediction. Experimental results reveal that the proposed model outperforms existing classification techniques by combining three decision tree algorithms: Random tree, Hoeffding tree, and AdaBoost + Random tree. On real-time training data, the suggested model beats conventional classification algorithms by about 12%.
3 Proposed Approach As displayed in Fig. 2, we have used this approach “predict the polarity of the sentiments” as negative or positive.
3.1 Preprocessing Techniques These are some of the techniques which have been used for preprocessing of data so that it can be used efficiently for training purpose.
Sentiment Analysis on Amazon Product Review: A Comparative Study
143
Fig. 2 Approach Undertaken
Tokenization: The process of dividing a unit of successive characters into identifiers namely keywords, words, symbols, phrases, and other elements are called tokenization. Therefore, we can consider tokens as words, phrases, or complete sentences. Also, certain characters such as punctuation marks are eliminated in the token-making phase. Tokens serve as inputs for certain processes such as text analysis and digging [6]. Lemmatization: The process of lemmatization is the reduction of a word to its simplest form. It helps to improve accuracy by inserting terms in the same form that have the same meaning but are spelled differently in the data collection. Elimination of Stop words: They are words that are often used yet have no bearing on the data. They also do not contribute anything to a deeper understanding of the sentiment or play a role in analyzing it. In order to improve the accuracy of the test, these terms are often overlooked. There are different stop names in different formats depending on location, language, and more. However, there are a few exceptions to the English form [6]. Cross-validation Technique (K-FOLD METHOD): This method is used for model validation that divides the dataset into k-folds (one test for each training). The model repeats the structure and test for each wrap. Ultimately, the total inaccuracy of the k test is intended [15]. In the implementation part, we used fivefold cross sectional validation to measure model performance in the dataset.
4 Experimental Setup The experimentation started with a survey of different review papers and an analysis of commonly used classification algorithms such as Multinomial NB, Logit Regression, Linear Support Vector Machine (SVC), and Random Forest. The steps of the experimental approach are as follows. First, we collected Amazon movie review data from http://www.mediafire.com/file/gg77787ihf1dht6/product_reviews.json. Then, the data was fetched for preprocessing, which consisted of a few steps: first, tokenization of the data; second, lemmatization of the data; and then removal of stop words from the dataset, so that the data is ready for training and testing. After preprocessing of the data, the classifiers are trained so that they perform well while predicting the sentiments. After training, the models produce predictions that are assessed with the evaluation metrics: F1-score, accuracy, precision, recall, and the confusion matrix.

Table 1 Comparison table of different methods by different authors

References         | Prediction method              | Accuracy (%)
Amrani et al. [7]  | Naïve Bayes (NB)               | 81.43
Haque et al. [2]   | Logit regression (LR)          | 87.84
Urkude et al. [16] | Logit regression (LR)          | 83.89
Proposed method    | LR, linear SVC, multinomial NB | 87.94, 87.48, 85.05
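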
5 Experimental Results 5.1 Comparison of Various Results Obtained by the Authors Table 1 shows a comparison of the machine learning methods used by different authors in their papers. We found that our proposed method gives higher accuracy, and this comparison will help future authors choose the best approach for their work.
5.2 Comparison of the Classifiers Implemented We have taken a final dataset generated by randomly selecting 55,000 samples; in total, there are 1,000,000 (1 million) samples in the full dataset. This paper includes a comparative study of the performance of the four machine learning models. The results that we have achieved are given in Fig. 3.
5.3 Comparison of Our Proposed Machine Learning Classifiers Table 2 compares the outcomes of all four algorithms in numeric form so that they can be correlated correctly. Logistic regression and linear SVC emerge as the best classifiers, achieving the highest recall of about 0.88, while Random Forest obtains the lowest values, around 0.81, among all four algorithms.
Fig. 3 Comparison on the basis of selected features
Table 2 Comparison of various classifiers

S. No. | Classifier used  | Accuracy | Precision | Recall   | F1-score | Roc Auc
1      | Multinomial NB   | 0.850708 | 0.850622  | 0.854875 | 0.852743 | 0.85066
2      | Logit regression | 0.873964 | 0.869861  | 0.882822 | 0.876294 | 0.873863
3      | Linear SVC       | 0.87484  | 0.870509  | 0.883968 | 0.877187 | 0.874735
4      | Random forest    | 0.813517 | 0.787263  | 0.864922 | 0.824268 | 0.81293
5.4 Performance of Classifiers on the Basis of Evaluation Measures Taken The following charts show the performance of the trained models for each evaluation measure, starting with Fig. 4. Figure 4 compares all models in terms of accuracy on the given dataset; it is clearly visible that logistic regression and Linear SVC give the highest accuracy. Figure 5 compares all models in terms of precision; again, logistic regression and Linear SVC give the highest precision. Figure 6 compares all models in terms of F1-score, where logistic regression and Linear SVC give the highest F1-score.
Fig. 4 Prediction in terms of accuracy
Fig. 5 Prediction in terms of precision
Fig. 6 Prediction in terms of F1-Score
Figure 7 compares all models in terms of recall on the given dataset; it is clearly visible that logistic regression and Linear SVC give the highest recall.
Fig. 7 Prediction in terms of recall
Fig. 8 Prediction in terms of ROC AUC
Figure 8 compares all models in terms of ROC AUC on the given dataset; it is clearly visible that logistic regression and Linear SVC give the highest ROC AUC (Fig. 9).
6 Conclusion and Future Work The core of sentiment analysis is sentiment classification. Sentiment analysis is the technique of obtaining information from people's feelings, views, and opinions about entities, events, and their characteristics. In this paper, we used various techniques to detect the polarity of the reviews. The classification techniques implemented are Multinomial Naïve Bayes, Linear SVC, Logit Regression, and Random Forest.
Fig. 9 Overall comparison of the evaluation measures
From the results, we can clearly see that Logit Regression and Linear SVC both perform well, with 87.3% and 87.4% accuracy, respectively. The limitation of this research is that the analysis was done using only four machine learning models. In the future, we will focus on implementing more machine learning models as well as deep learning models so that sentiment polarity can be predicted more accurately, and we may develop hybrid methods to obtain increased accuracy in our results. Finding the polarity of reviews can help in a variety of domains: smart and efficient models can be developed that give users a complete overview of products and services, so that users need not read a product's individual reviews and can make decisions directly on the basis of the outcomes provided by such systems.
References 1. Asghar MZ, Khan A, Ahmad S, Kundi FM (2014) A review of feature extraction in sentiment analysis. J Basic Appl Sci Res 4(3):181–186. Van Thai D, Son L, Tien PV, Anh N, Ngoc Anh NT (2019) Prediction car prices using quantify qualitative data and knowledge-based system. In: 11th international conference on knowledge and systems engineering (KSE), pp 1–5 2. Haque TU, Saber NN, Shah FM (2018) Sentiment analysis on large scale Amazon product reviews. In: 2018 IEEE international conference on innovative research and development (ICIRD). IEEE, pp 1–6 3. Bhavitha BK, Rodrigues AP, Chiplunkar NN (2017) Comparative study of machine learning techniques in sentimental analysis. In: 2017 international conference on inventive communication and computational technologies (ICICCT). IEEE, pp 216–221 4. Khanvilkar G, Vora D (2018) Sentiment analysis for product recommendation using random forest. Int J Eng Technol 7(3):87–89 5. Yadav A, Vishwakarma DK (2020) Sentiment analysis using deep learning architectures: a review. Artif Intell Rev 53(6):4335–4385 6. Dey S, Wasif S, Tonmoy DS, Sultana S, Sarkar J, Dey M (2020) A comparative study of support vector machine and Naive Bayes classifier for sentiment analysis on Amazon product reviews. In: 2020 international conference on contemporary computing and applications (IC3A). IEEE, pp 217–220
7. Al Amrani Y, Lazaar M, El Kadiri KE (2018) Random forest and support vector machine based hybrid approach to sentiment analysis. Procedia Comput Sci 127:511–520 8. Javed Awan M, Mohd Rahim MS, Salim N, Mohammed MA, Garcia-Zapirain B, Abdulkareem KH (2021) Efficient detection of knee anterior cruciate ligament from magnetic resonance imaging using deep learning approach. Diagnostics 11(1):105 9. Khanvilkar G, Vora D (2019) Smart recommendation system based on product reviews using Random Forest. In: 2019 international conference on nascent technologies in engineering (ICNTE). IEEE, pp 1–9 10. Ahuja R, Chug A, Kohli S, Gupta S, Ahuja P (2019) The impact of features extraction on the sentiment analysis. Procedia Comput Sci 152:341–348 11. Bhatt A, Patel A, Chheda H, Gawande K (2015) Amazon review classification and sentiment analysis. Int J Comput Sci Inf Technol 6(6):5107–5110 12. Jagtap VS, Pawar K (2013) Analysis of different approaches to sentence-level sentiment classification. Int J Sci Eng Technol 2(3):164–170 13. Tsai CF, Lin WC (2021) Feature selection and ensemble learning techniques in one-class classifiers: an empirical study of two-class imbalanced datasets. IEEE Access 9:13717–13726 14. Syamala M, Nalini NJ (2020) A filter based improved decision tree sentiment classification model for real-time Amazon product review data. Int J Intell Eng Syst 13(1):191–202 15. Ilyas H, Ali S, Ponum M, Hasan O, Mahmood MT, Iftikhar M, Malik MH (2021) Chronic kidney disease diagnosis using decision tree algorithms. BMC Nephrol 22(1):1–11 16. Urkude SV, Urkude VR, Kumar CS (2021) Comparative analysis on machine learning techniques: a case study on Amazon product
GAER-UWSN: Genetic Algorithm-Based Energy-Efficient Routing Protocols in Underwater Wireless Sensor Networks Mohit Sajwan, Shivam Bhatt, Kanav Arora, and Simranjit Singh
Abstract Underwater wireless sensor networks (UWSNs) are composed of numerous underwater wireless sensor nodes dispersed in the marine environment. Since UWSN nodes have limited battery capacity that is difficult to replace or recharge, energy efficiency becomes a difficult design issue. Earlier research found that multipath routing can help the UWSN save energy, but optimal path selection is a nondeterministic polynomial-time (NP) hard optimization problem that can be handled using metaheuristic algorithms. The GAER-UWSN technique is a new, improved metaheuristics-based multihop multipath routing protocol for underwater wireless sensor networks. It chooses optimal cluster paths that lead to the least energy consumption, using genetic algorithm-based multipath routing as its core process and characteristics such as residual energy and inter-neighboring node distance. A fitness function is derived from factors including residual energy, transmission energy, aggregation energy, distance, average energy, and the total possible paths. The GAER-UWSN technique increases the UWSN's energy efficiency and lifetime. A number of simulations were run to verify the GAER-UWSN technique's results, and the findings showed that it performed well on metrics such as network lifetime and energy consumption. Keywords Routing protocol · Heterogeneous nodes · Node deployment · Genetic algorithm · HWSNs · Network lifetime
M. Sajwan (B) · S. Bhatt · K. Arora · S. Singh Bennett University, Greater Noida 201310, UP, India e-mail: [email protected] S. Singh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_14
1 Introduction Water covers more than 70% of the earth's total surface, yet we have explored very little of it [1]. There are many techniques for ocean exploration, such as submarines, sonar, submersibles, and ROVs/AUVs, but among them the underwater wireless sensor network (UWSN) is the most efficient and cost-effective. A UWSN is a collection of interconnected sensor nodes, deployed randomly or manually, that form a network to collect data and transmit/forward it to the surface node, also called the sink node [2]. Sensor nodes are cheap and require little computation energy. A node comprises four basic components: a processor, a transceiver, a sensor, and a power source. Sensor nodes may also include application-dependent additional components such as a power generator, a location-finding system (GPS), and a mobilizer [3]. Sensor nodes are restricted by limitations such as compute power, storage, and battery capacity, hence reducing power consumption is the most critical issue for an efficient UWSN model. UWSNs are capable of fulfilling important tasks in harsh, deep underwater environments where human intervention is not possible, such as unmanned underwater exploration, localized and precise knowledge acquisition, tetherless underwater networking, and large-scale underwater monitoring [4]. Nodes can be used to collect data such as temperature and pressure or to predict environmental events (Fig. 1). Information can be transmitted in two ways, either directly or indirectly. In direct transmission, the source/origin node sends data to the sink/receiving node directly [5]. Indirect transmission is preferred when the distance between the sink node and the source node is greater than the transmission range, so intermediate nodes, also called forwarding nodes, are used to pass the information to the sink node [6]. The widely used approaches for the transmission of data are acoustic transmission and radio transmission.
Fig. 1 Depiction of UWSNs
Radio waves are electromagnetic waves and can generally travel a greater distance than acoustic waves, which are mechanical (sound) waves. In denser mediums like water, however, acoustic waves can traverse long distances due to their low relative absorption, although in some models the use of radio frequencies (RF) is also efficient, as the propagation speed of acoustic waves is lower than that of radio waves [7]. The vital problem to be taken into consideration from the communications point of view is routing. There are various routing protocols, such as flooding-based, multipath-based, and cluster-based protocols [8]. Recently, several routing protocols have been designed in the literature with the sole purpose of providing an energy-efficient protocol. However, all of them face several problems, such as loss of data packets due to medium absorption, high latency, desynchronization, load imbalance, and high energy consumption. In this paper, we define a genetic algorithm-based energy-efficient multipath routing protocol. Using this algorithm, we can determine the most suitable path with nature-inspired genetic algorithms. Genetic algorithms are fast and efficient models for the optimization of an objective function under given constraints. This paper contributes to the field of underwater networks by presenting an energy-efficient model based on multipath routing with genetic algorithms.
2 Related Work UWSNs are a sort of decentralized network comprised of independent nodes that gather and analyze information before transmitting it to a receiving node through wireless links. Traditional routing methods do not take energy-limited nodes into account, which has a substantial influence on overall power dissipation. As a result, new and efficient routing techniques for UWSNs are required. In UWSNs, data transmission is the activity of sending discovered data from the transmitting/source node to the receiving/sink node. The widely used approaches for the transmission of data are acoustic transmission and radio transmission. Radio waves and optical waves cannot communicate at every point in the ocean. Considering all of these limitations [9], UWSNs mainly utilize acoustic signals because, in a dense environment, acoustic signals can traverse long distances due to their low relative absorption. The vital problem to be taken into consideration from the communications point of view is routing. UWSNs necessitate a routing technique to identify the way for transmitting data along the path from the source node to the receiving node, with intermediate nodes, also known as forwarding nodes, relaying data packets [10]. Routing protocols are mainly classified into hierarchical (also known as clustering) protocols and multipath routing protocols. In a hierarchical network [11], data acquired by sensor nodes is relayed to the cluster head (CH), causing a load imbalance on the CH. The CH removes redundancy from the aggregated data gathered by the child nodes and sends only one packet to the sink/receiving node. If the CH fails to send the data to the receiving or sink node, all of the data acquired by the child nodes is lost.
Multipath routing transmits data via several pathways, which has numerous advantages such as load balancing, reliability, and fault tolerance in addition to application-specific QoS [12]; it alleviates network congestion, smooths out the traffic, reduces the frequency of route discoveries, enhances the privacy of the information being sent, and extends the lifetime of the system in WSNs by distributing the power consumption more homogeneously among its nodes [13]. Multipath routing determines the most efficient path from several possible paths based on various factors such as the load on individual nodes and the energy consumption on each path [14, 15].
Challenges
1. The equipment, deployment, and maintenance of underwater sensor nodes are a long and costly process.
2. Battery efficiency is a critical issue, as the battery power of the nodes has a low capacity, and the nodes cannot be recharged with solar power in an efficient way because technologies to convert underwater photons to electrical energy are very expensive and unreliable.
3. Nodes get displaced due to the motion of underwater currents.
4. Natural factors like underwater pressure and temperature make the sensor nodes prone to corrosion over time.
5. Low bandwidth, high attenuation, multipath propagation, and large propagation delay because of the low speed of acoustic waves under the water are the major limiting factors and challenges of acoustic communication.
6. The network layers of a UWSN send information by delivering data packets wirelessly. This information can be eavesdropped on or changed by intruders. Commonly used attacks are the Sybil attack, wormhole attack, and identity replication attack.
3 System Model The following models have been discussed in detail:
• Network Model: In this paper, we created a rectangular cuboid terrain and randomly placed nodes along the (x, y, z) axes. Each node N has a vertex V (corresponding to a neighboring node) and an edge E (i.e., link information).
• Network Lifetime Model: This model considers the network lifespan, i.e., the time between the start of transmission and the death of the first node. First node dead (FND) events are detrimental to network performance in many applications and must be avoided.
• Energy Model: All components of a sensor node except the transmitter consume a small and constant amount of energy; transmission energy instead depends on the distance between the transmitting and receiving nodes. We used a first-order radio model, where k bits are sent between nodes a distance d_i apart. A sensor node's transmission energy is given by

E_Transmission(k, d_i) = E_elec · k + E_freespace · k · d_i^2,  if d_i < d_0   (1)

E_Transmission(k, d_i) = E_elec · k + E_multipath · k · d_i^4,  if d_i ≥ d_0   (2)

and the energy consumed by a receiving node j is given as

E_Reception(k) = E_elec · k   (3)
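To make the first-order radio model concrete, the following is a minimal Python sketch of Eqs. (1)–(3). The numerical constants (E_elec, the free-space and multipath amplifier coefficients, and the threshold d_0) are illustrative placeholders of the kind commonly used with this model, not values taken from this paper.

```python
import math

# Illustrative first-order radio model constants (placeholders, not values from the paper).
E_ELEC = 50e-9          # electronics energy per bit (J/bit)
E_FS = 10e-12           # free-space amplifier energy (J/bit/m^2)
E_MP = 0.0013e-12       # multipath amplifier energy (J/bit/m^4)
D0 = math.sqrt(E_FS / E_MP)  # distance threshold separating the two transmission models


def transmission_energy(k_bits: int, distance: float) -> float:
    """Energy to transmit k bits over the given distance, Eqs. (1)-(2)."""
    if distance < D0:
        return E_ELEC * k_bits + E_FS * k_bits * distance ** 2
    return E_ELEC * k_bits + E_MP * k_bits * distance ** 4


def reception_energy(k_bits: int) -> float:
    """Energy to receive k bits, Eq. (3)."""
    return E_ELEC * k_bits


if __name__ == "__main__":
    k = 4000  # packet size in bits
    for d in (30.0, 100.0, 300.0):
        print(f"d={d:6.1f} m  E_tx={transmission_energy(k, d):.3e} J  "
              f"E_rx={reception_energy(k):.3e} J")
```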
Assumptions
1. Sensor nodes are stationary and are randomly distributed in the ocean.
2. Through various localization approaches, the sensor nodes are aware of their positions.
3. The circuitry of the sensor node consumes the same amount of energy during packet transmission and reception.
4. The nodes' batteries are irreplaceable once they are deployed.
5. Depending on how far away the receiver is, nodes can alter their transmission level.
6. An aggregation function is performed by each node, which receives data from neighboring nodes.
7. The channel of communication is trustworthy and error-free.
4 Problem Description 4.1 Overview of Genetic Algorithm GA is based on natural evolution. Each generation of artificial evolution seeks useful modifications to address the given challenge [2]. A population-sizing equation is also devised. With GA, a global optimization search in a complicated space can be carried out with ease. Weakly coupled networks such as UWSNs are often created and tested in this way. GA is a multi-solution search method [16]. When using a GA to solve a problem, three procedures have an impact on the algorithm's performance: (1) the fitness function, (2) the representation of individuals, and (3) the values of the GA parameters. The GA search process has two significant challenges: efficiency and reliability. All three GA procedures rely on chance. While GA requires randomization, randomization also introduces numerous needless and worse solutions, and an inefficient search leads to low-quality solutions.
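For readers unfamiliar with the three GA procedures mentioned above, the following is a generic, textbook-style Python skeleton of one GA generation (fitness evaluation, tournament selection, crossover, and mutation). It illustrates the general mechanism only, not the exact GAER-UWSN procedure; all function names and the toy fitness function are our own.

```python
import random

def tournament_select(population, fitness, k=2):
    """Pick the fitter of k randomly chosen individuals."""
    contenders = random.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitness[i])]

def crossover(parent_a, parent_b):
    """Single-point crossover on list-encoded chromosomes."""
    point = random.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome, rate, gene_pool):
    """Replace each gene with a random value with probability `rate`."""
    return [random.choice(gene_pool) if random.random() < rate else g for g in chromosome]

def next_generation(population, fitness_fn, crossover_rate=0.8, mutation_rate=0.05, gene_pool=(0, 1)):
    fitness = [fitness_fn(ind) for ind in population]
    new_pop = []
    for _ in population:
        a = tournament_select(population, fitness)
        b = tournament_select(population, fitness)
        child = crossover(a, b) if random.random() < crossover_rate else a[:]
        new_pop.append(mutate(child, mutation_rate, gene_pool))
    return new_pop

# Toy example: maximize the number of ones in a 10-bit chromosome.
pop = [[random.randint(0, 1) for _ in range(10)] for _ in range(20)]
for _ in range(30):
    pop = next_generation(pop, fitness_fn=sum)
print(max(pop, key=sum))
```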
4.2 GAER-UWSN Algorithm GAER-UWSN is a self-sustaining routing protocol that saves energy by not communicating over long links. Each node simply keeps information about its neighbors, lowering the memory requirements. Initially, each network node delivers its Nbr
table data to the sink via multicasting rather than flooding. The initiator node begins relaying neighbor information to the base station. To minimize network loops, each sensor node sends the Nbr table information packet only once to each source node. Each node keeps a received-neighbor list, which minimizes network traffic and saves energy. The base station uses each sensor node's state to elect cluster heads and determine the path between them. Once the sink node has the complete information about the network (i.e., the coordinates of each node), the sink node applies the genetic algorithm, following the phases mentioned in Algorithm 1: Phase 1: Network setup. The GA-based optimization technique determines the ideal multipath for each underwater sensor node toward the sink. The GA algorithm is run for the entire region and finds the optimal path in the following steps. GA starts the chromosome population with n chromosomes, i.e., P = (p1, p2, p3, …, pn). This population defines the possible paths that a node can take toward the sink, given the 3D coordinates of each node and the sink. Phase 2: Fitness function. After the network setup phase, if the nodes fail to reach a consensus within a predetermined time period, they calculate the network lifetime for the WSN as stated in line number 20. If the nodes reach a consensus on the network setup phase within the predetermined time period, they calculate the objective function as stated in Eq. (4). Furthermore, the value of the objective function is calculated in each iteration of the program. After the ith iteration, the iteration with the lower value is more suitable, which is why we used a tournament selection algorithm to choose the most suitable candidates. Finally, mutation is applied. Once all stages have been completed, the best-suited path is chosen and the iteration with the lowest fitness value is selected. The sink node then propagates the same information to the rest of the network. As a result, the underwater sensor network is able to last longer.
objective_i = (1 / total possible paths) · Σ_{k=1}^{paths} [(E_tr + E_re + E_ag) · path_ki − E_avg]   (4)
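As an illustration of how the objective in Eq. (4) and the corresponding fitness could be evaluated for a single chromosome, the following is a minimal Python sketch; the helper names and the example energy values are assumptions made for demonstration, not identifiers or numbers from the paper.

```python
def objective_value(path_terms, e_avg, total_possible_paths):
    """Eq. (4): sum of per-path energy cost minus E_avg, normalized by the total number of possible paths.

    path_terms: list of tuples (E_tr, E_re, E_ag, path_ki) for each candidate path k
                of chromosome i, where path_ki is the path indicator term in Eq. (4).
    """
    total = 0.0
    for e_tr, e_re, e_ag, path_ki in path_terms:
        total += (e_tr + e_re + e_ag) * path_ki - e_avg
    return total / total_possible_paths


def fitness_value(path_terms, e_avg, total_possible_paths):
    """Fitness = 1 / objective, as used in Algorithm 1 (a lower objective gives a higher fitness)."""
    obj = objective_value(path_terms, e_avg, total_possible_paths)
    return float("inf") if obj == 0 else 1.0 / obj


if __name__ == "__main__":
    # Two candidate paths with made-up energy terms (in Joules) for illustration only.
    paths = [(2.0e-4, 0.5e-4, 0.3e-4, 1), (2.6e-4, 0.5e-4, 0.3e-4, 1)]
    print(fitness_value(paths, e_avg=2.5e-4, total_possible_paths=len(paths)))
```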
5 Comparative Analysis and Simulation Results The proposed algorithm is simulated in MATLAB. The simulation yields findings in terms of network lifetime and residual energy, as depicted in Fig. 2. All the simulation parameters are taken from [5]. Figure 2a depicts the alive-node statistics against rounds, where GAER-UWSN outperforms the others in first node dead, half node dead, and last node dead statistics, while Fig. 2b depicts the energy consumption of the routing protocols over a period of time.
Algorithm 1 GA-based routing algorithm for UWSN
1: declare s = number of homogeneous sensor nodes
2: declare e = energy of homogeneous nodes
3: declare i = number of iterations
4: declare n = population size
5: declare t = terrain size
6: declare (x, y, z) = coordinate positions of sensor nodes
7: declare (sinkX, sinkY, sinkZ) = coordinate position of the sink node
8: declare cr = crossover rate
9: declare mr = mutation rate
10: initial_lifetime1 = lifetime1
11: saturation_count = 0    ▷ generate initial population
12: for z in [1, i] do
13:   initialize lifetimes, a vector of size n (population size), to −1
14:   initialize fitness, a vector of size n, to −1
15:   initialize objective, a vector of size n, to −1
16:   for i in [1, n] do
17:     for j in [1, h] do
18:       calculate energy
19:     end for
20:     [lifetime2, en3] = RoutingProtocol(x, y, z, energy, sinkX, sinkY, sinkZ)    ▷ find the lifetime and energy of this generation's path
21:     objective_i = (1 / total possible paths) · Σ_{k=1}^{paths} [(E_tr + E_re + E_ag) · path_ki − E_avg]
22:     fitness = 1 / objective    ▷ calculate fitness
23:   end for
24:   update best lifetime
25:   crossover_count = round(cr · n)    ▷ TOURNAMENT SELECTION
26:   /* crossover rate cr decides the number of crossover operations */
27:   mutation_count = round(mr · n)    ▷ MUTATION
28:   /* mutation rate mr decides the number of mutation operations */
29:   update homogeneous nodes
30:   calculate mean lifetime
31:   if mean(lifetimes) > prev_mean then
32:     update saturation point
33:   end if
34:   if saturation_point > 5 then
35:     break
36:   end if
37:   prev_mean = mean(lifetimes)
38: end for
A routing protocol that consumes less energy is better because the same amount of data is transmitted using less energy, and Fig. 2b shows that our proposed protocol uses less energy; hence, it will sustain for a longer period of time. Finally, in this paper, we computed the network lifetime for a total of 10 iterations, and in all of them, GAER-UWSN outperforms the existing techniques, as shown in Fig. 2a.
Fig. 2 Performance parameters of the UWSN routing protocol: (a) network lifetime, (b) energy consumption, (c) box plot of FND statistics over 10 iterations
6 Conclusion The main objective of the proposed algorithm is to maximize the energy efficiency and longevity of the UWSN using genetic algorithms. The GAER-UWSN technique led to the selection of the best-suited (i.e., least energy-consuming) multipath from the source nodes to the sink. A fitness function was also identified for selecting optimal routes to the BS. The route selection helps improve network performance. The GAER-UWSN technique's experimental results are evaluated and investigated under various conditions. The GAER-UWSN algorithm outperformed well-established existing techniques in all comparisons. The GAER-UWSN technique's energy efficiency can be improved in the future by combining data. Resource allocation strategies based on metaheuristic algorithms can also be created.
References 1. O’Rourke MJ (2012) Simulating underwater sensor networks and routing algorithms in matlab. PhD thesis, University of the Pacific 2. Singh VK, Sharma V (2014) Elitist genetic algorithm based energy balanced routing strategy to prolong lifetime of wireless sensor networks. Chin J Eng 1–6:2014 3. Matin MA, Islam MM (2012) Overview of wireless sensor network. In: Matin MA (ed) Wireless sensor networks, chap 1. IntechOpen, Rijeka 4. Cui JH, Kong J, Gerla M, Zhou S (2005) Challenges: building scalable and distributed underwater wireless sensor networks (UWSNs) for aquatic applications. Channels 45(4):22–35 5. Subramani N, Mohan P, Alotaibi Y, Alghamdi S, Khalaf OI (2022) An efficient metaheuristicbased clustering with routing protocol for underwater wireless sensor networks. Sensors 22(2):415 6. Ismail AS, Wang X, Hawbani A, Alsamhi S, Abdel Aziz S (2022) Routing protocols classification for underwater wireless sensor networks based on localization and mobility. Wirel Netw 1–30 7. Proakis JG, Sozer EM, Rice JA, Stojanovic M (2001) Shallow water acoustic networks. IEEE Commun Mag 39(11):114–119 8. Mamta M, Goyal N, Nain M (2022) Optimization techniques analysis in underwater wireless sensor network. ECS Trans 107(1):5403 9. Awan KM, Shah PA, Iqbal K, Gillani S, Ahmad W, Nam Y (2019) Underwater wireless sensor networks: a review of recent issues and challenges. Wirel Commun Mob Comput 2019 10. Ashraf S, Gao M, Chen Z, Naeem H, Ahmed T (2022) CED-OR based opportunistic routing mechanism for underwater wireless sensor networks. Wirel Pers Commun 1–25 11. Agarkhed J, Biradar GS, Mytri VD (2012) Energy efficient QoS routing in multi-sink wireless multimedia sensor networks. Int J Comput Sci Netw Secur (IJCSNS) 12(5):25 12. Mohan P, Subramani N, Alotaibi Y, Alghamdi S, Khalaf OI, Ulaganathan S (2022) Improved metaheuristics-based clustering with multihop routing protocol for underwater wireless sensor networks. Sensors 22(4):1618 13. Gupta O, Goyal N (2021) The evolution of data gathering static and mobility models in underwater wireless sensor networks: a survey. J Ambient Intell Hum Comput 12(10):9757–9773 14. Gupta S, Singh NP (2022) Energy efficient void avoidance routing for reduced latency in underwater WSNs. In: Mobile radio communications and 5G networks. Springer, pp 543–557 15. Luo J, Chen Y, Wu M, Yang Y (2021) A survey of routing protocols for underwater wireless sensor networks. IEEE Commun Surv Tutor 23(1):137–160 16. Abbas S, Javaid N, Almogren A, Gulfam SM, Ahmed A, Radwan A (2021) Securing genetic algorithm enabled SDN routing for blockchain based internet of things. IEEE Access 9:139739– 139754
Occlusion Reconstruction for Person Re-identification Nirbhay Kumar Tagore, Ramakant Kumar, Naina Yadav, and Ankit Kumar Jaiswal
Abstract Person re-identification is very important for monitoring and tracking crowd movement to provide security in public places. Although this domain has received considerable attention in the last few years with the introduction of deep learning to develop automated re-identification models, there are several challenges in the task; one of them is performing re-identification in the presence of unconstrained situations (i.e., occlusion), which has not been addressed effectively. In this work, we focus on re-identification from RGB frames captured by multiple cameras. Occlusion reconstruction in these captured non-sequential frames has been done by employing the generalization capability of generative adversarial networks (GANs). Our algorithm is based on a multi-model approach to jointly perform occlusion reconstruction along with re-identification. The performance of the proposed approach is evaluated on three publicly available datasets, and the results obtained are quite satisfactory and superior to all the existing approaches used in the study. Keywords Generative modeling · Occlusion reconstruction · Siamese network · Person re-identification
N. K. Tagore (B) · A. K. Jaiswal School of CSET, Bennett University, Greater Noida, India e-mail: [email protected] A. K. Jaiswal e-mail: [email protected] R. Kumar · N. Yadav CSE Department IIT (BHU), Varanasi, India e-mail: [email protected] N. Yadav e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_15
1 Introduction Identifying an individual in an image or video has nowadays become an important issue in terms of providing security to individuals in a surveillance zone and in multimedia applications. The process involves offline searching and online tracking of subjects of interest in an image or video. The continuous recording of surveillance videos from the camera network set up in the zone results in a large amount of data, which is not easy to analyze manually and is tiresome as well. In most public spaces such as railway stations, airports, shopping malls, hospitals, and office buildings, a network of cameras is installed for surveillance purposes. Manual analysis of this large amount of video data to perform person re-identification or other video surveillance tasks is laborious and computationally time-intensive, whether it is processed manually or through automated means. However, in most real-life situations, occlusion is an inevitable occurrence; it occurs when static or dynamic objects appear between the camera field-of-view and the target subject, causing certain body parts of the target subject to be obstructed. The occluded regions in the occluded re-identification task may contain noise and other factors, such as the unavailability of a complete frame or hindrance from some object, which results in incorrect matching. The unavailability of complete subject information in a frame causes improper identity matching. The presence of occlusion degrades the effectiveness of traditional image-based re-identification techniques. To the best of our knowledge, although there exist a few deep learning-based occlusion reconstruction strategies in the context of person re-identification, none of these consider occlusion reconstruction and re-identification as two separate modules. Rather, these methods train a single deep neural network to perform re-identification directly from the input occluded frames. It appears that the effectiveness of these approaches can be improved by training two separate dedicated deep neural network architectures for occlusion reconstruction and re-identification and stacking them during deployment as a single end-to-end model. The previous approaches to handling occlusion in re-identification attempt to reconstruct the occluded pixels from spatial information present in the other non-occluded pixels in the frame or in adjacent frames before performing the identity matching. To date, not much focus has been given to occlusion reconstruction in person re-identification. In this work, we address a plausible solution to this problem by proposing multi-model architectures for occlusion reconstruction from image data. We propose an Autoencoder model to predict the unoccluded versions of occluded frames. The main contributions of this work can be highlighted as follows:
1. Occlusion reconstruction followed by re-identification is an important contribution of this work.
2. Effective approaches to occlusion reconstruction have been proposed for non-sequential occluded image data. We propose an improved Autoencoder model for the reconstruction of the occluded regions in each frame of the synthetically occluded dataset.
3. We carry out an extensive experimental evaluation and perform a comparative analysis of our method with existing approaches using three of the most popular public datasets.
The rest of the paper is organized as follows. Section 2 presents a thorough literature review of existing re-id schemes, with emphasis on the latest learning-based methods. Section 3 explains the proposed approach. Next, in Sect. 4, we explain the evaluation settings and present the experimental results with a thorough analysis of the work. Finally, in Sect. 5, we provide the conclusions of the proposed work and point out future research scopes.
2 Related Work This section presents a thorough literature survey on person re-identification. Section 2.1 provides an overview of the traditional approaches that do not use any learning technique. Next, we provide an elaborate discussion of the recently developed deep learning-based person re-identification methods in Sect. 2.2. Based on the literature review, we point out the limitations of the existing methods and possible future research scopes.
2.1 Traditional Methods These methods rely entirely on visual descriptors and do not incorporate any external contextual information for establishing correspondences between images. They can further be classified as passive or active methods, as discussed next. Passive Methods: This category of approaches exclusively deals with visual descriptor design and does not depend on learning techniques to measure the similarities in person appearances. The human blob extracted using color and shape features from an image is split into polar bins in [13], and a descriptor is formed from the color Gaussian model and edge pixel counts corresponding to each bin. In [11], spatio-temporal edges detected through watershed segmentation and graph partitioning have been used to compute features for the person re-identification task. In [26], a two-step approach is presented that first detects the person images in a scene and next employs color features-based pictorial structures for re-identification. A weighted sum of complementary appearance features extracted from multiple camera images, known as histogram plus epitome (HPE), has been used for appearance matching in [4]. The work in [3] uses a weighted sum of three appearance features: a weighted HSV histogram, maximally stable color regions (MSCR) [9], and recurrent highly textured local patches for re-identification.
2.2 Modern Methods Here, we review the person re-identification techniques that use modern, sophisticated learning techniques. Till now, several deep neural network-based techniques have been developed by researchers across the world for person re-identification. The very first approach among these is the technique proposed in [17]. The authors make two major contributions through the paper: (i) a deep neural network called the filter pairing neural network (FPNN) and (ii) a person re-identification dataset (popularly known as CUHK_03). In [31], the authors propose a hierarchical scheme for person re-identification; the approach involves the combination of color-based filtering and Siamese network-based prediction. The relative distance comparison scheme is proposed in [8], which employs a deep neural network for scalable distance-driven learning. In [22], McLaughlin et al. first introduced the concept of modeling dynamic feature information in frame sequences, i.e., video clips, by employing a time-series network, i.e., a recurrent neural network (RNN); here, the average of each recurrent cell output has been used as a clip-level representation. Similar to [22], Yan et al. [36] also proposed an RNN-based method to encode sequential feature information and store the entire video feature information. A deep neural network, namely the quality aware network (QAN), was introduced in [21] by Liu et al. to compute static feature information through an attention-weighted average scheme employed in QAN. Joint frameworks for extracting temporal motion information were proposed in [43] and [35]; in these approaches, RNN-based features are used to extract attention features. The work proposed in [18] introduced a harmonious attention CNN (HA-CNN) model to jointly learn soft and hard region pixels in an input frame. In another work [27], a mask-guided technique is proposed with a triplet loss function. The first Siamese neural network was proposed in [5], and later on, with the development of deep learning architectures, there have been significant advancements in this area. Deep Siamese networks have also been used widely in research on person re-identification [2, 6, 15, 23, 25, 37, 40]. Generative adversarial networks (GANs) are highly effective in performing various types of image translation and prediction tasks [7, 10, 24] and also play an important role in reconstructing the missing information present in occluded image frames. These networks have also been used in the past to perform re-identification after GAN-based reconstruction of the frame embeddings. The authors in [30] proposed an ensemble-based approach for video-based person re-identification. The authors in [28] developed a model for capturing multi-scale features by applying dilation to the convolution layers in the Siamese architecture. The spatio-temporal feature information is exploited in [32] to develop a joint framework for occlusion reconstruction and re-identification; however, the accuracy of that approach is not significantly high, and it can be improved by employing two dedicated deep neural networks, one each for occlusion reconstruction and re-identification. A pose transfer model, namely the pose-normalization GAN (PN-GAN), was proposed in [24] by Qian et al., where the model is used to
generate different poses for an input image. Tagore et al. in [29] make a similar suggestion for a new deep network specifically devoted to occlusion reconstruction and re-identification; the authors proposed a two-step approach to jointly perform reconstruction and re-identification on a given non-sequential dataset. The pose variation issue was addressed in [10], where the authors develop a Siamese architecture-based model for handling pose misalignment; they proposed a feature distilling GAN (FD-GAN) deep model for removing misalignment and performing re-identification with a single network. Cross-modality is another aspect that is very useful for the deployability of the model; the authors in [7] proposed a cross-GAN model for extracting features from both RGB and IR (infrared) images. In [38], Zhang et al. presented the part-based non-direct coupling embedded GAN to perform re-identification by incorporating a block-based learning technique. The authors in [33] handled the lighting variations, viewpoint, and pose changes in the person re-identification task by employing a GAN architecture, namely the person transfer GAN (PTGAN). Similarly, in the work proposed by Zhou et al. [42], a multi-camera transfer GAN is introduced to improve the re-identification performance in cross-dataset experiments and to make the model more robust for real-life implementation. Unlike the previous works, the work in [1] handles the issue of low-resolution images by employing a GAN model to translate the input (low-resolution) images into equivalent output (high-resolution) images before performing re-identification. From the extensive literature survey, we observe that, in person re-identification, the quality of the occlusion-reconstructed outputs of the existing models needs to be improved. In this work, we propose a new and effective approach for occlusion reconstruction from non-sequential image frames, with a focus on enhancing the quality of the reconstruction and achieving better re-identification performance.
3 Proposed Work This section explains the proposed multi-model approach in detail. The proposed work is presented in three separate modules: (i) preparation of an occlusion dataset by applying synthetic occlusion, (ii) development of a deep generative model for occlusion reconstruction, and (iii) re-identification using a Siamese network. As mentioned above, we use deep neural networks to predict the unoccluded versions of input occluded frames. Since training any machine/deep learning model requires the availability of ground truth, we need to prepare an extensive dataset of occluded image frames along with their corresponding unoccluded counterparts and then train a neural model to map occluded images to their corresponding unoccluded versions. This training data can be generated by considering any unoccluded dataset and applying synthetic occlusion at random positions within the frames present in that dataset. The objects used for creating synthetic occlusion are everyday objects such as a table, stool, tree, lamp, and dog.
Fig. 1 Insight view of an Autoencoder model
For each unoccluded image frame, we randomly decide the maximum percentage of pixels to occlude and superimpose certain occluding objects (randomly scaled) at randomly selected positions in the frame such that the number of pixels altered does not exceed the decided maximum percentage of occlusion. The percentage of occlusion can be defined as

Occlusion percentage = (Corrupted pixels in an occluded image frame / Number of pixels in the frame) × 100   (1)

During implementation, the maximum percentage of occlusion in each frame is varied between 0 and 50%, and a synthetically occluded re-identification dataset is constructed by corrupting each frame with a certain percentage of synthetic occlusion. Like any neural network-based generator, an Autoencoder consists of two deep neural networks: (i) the encoder and (ii) the decoder. The encoder (refer to Fig. 1) first encodes the image into a lower-dimensional latent representation, and next, the decoder decodes the latent representation back to an image. The Autoencoder is trained to copy its input to its output by passing it through the layers of the networks, simultaneously compressing the data and minimizing the reconstruction error. In this work, we employ a convolutional stacked Autoencoder, due to its demonstrated effectiveness in handling image translation tasks, to reconstruct the occluded pixels in non-sequential images. The Autoencoder learns to reconstruct the occluded pixels and map them to the output, producing a reconstruction that is effectively similar to the original image (refer to Fig. 3). The Autoencoder architecture used in this work consists of six layers with the ReLU activation function, except for the last layer, in which sigmoid activation has been used. The layer-wise detailed configuration of the Autoencoder is shown in Table 1.
Table 1 Layered configuration of the Autoencoder model

Network | Layers          | Size of filter | Stride | # of filters
--------|-----------------|----------------|--------|-------------
Encoder | Conv2d_1        | 3 × 3          | 1      | 128
        | Conv2d_2        | 3 × 3          | 1      | 64
        | Max_Pooling2d   | 2 × 2          | –      | –
        | Conv2d_3        | 3 × 3          | 1      | 64
        | Conv2d_4        | 3 × 3          | 1      | 32
Decoder | Conv2dTranspose | 3 × 3          | 2      | 32
        | Conv2d_1        | 3 × 3          | 1      | 3
Fig. 2 Insight of the deep Siamese network

Table 2 Architecture of the Siamese network

Network         | Layers  | Size of filter | # of filters
----------------|---------|----------------|----------------------
Siamese network | Conv2D1 | 5 × 5          | 20
                | Conv2D2 | 5 × 5          | 25
                | Conv2D3 | 5 × 5          | 25
                | Conv2D4 | 3 × 3          | 25
                | FC      | –              | No. of neurons = 500
As shown in Table 1, the first four layers of the Autoencoder model are convolutional layers with same padding, a kernel size of 3 × 3, and 128, 64, 64, and 32 filters, respectively. The fifth layer is a Conv2DTranspose layer with same padding, a stride of 2, a kernel size of 3 × 3, and 32 filters. Finally, the last layer is a Conv2D layer with same padding, three filters, and a kernel size of 3 × 3. The model is trained using the synthetically occluded dataset. The Autoencoder model has been seen to achieve convergence in 1000 epochs. In recent years, Siamese networks have been used widely to predict whether a pair of input images is similar or not [2, 10, 31]. The detailed configuration of the Siamese network proposed in this study is shown in Fig. 2 and Table 2.
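For illustration, a minimal TensorFlow/Keras sketch of an Autoencoder wired up as in Table 1 is given below. The input resolution (64 × 32 × 3), the optimizer, and the reconstruction loss are our assumptions, since the paper does not state them, and the framework choice itself is ours.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_autoencoder(input_shape=(64, 32, 3)):
    """Convolutional stacked Autoencoder following Table 1 (ReLU throughout, sigmoid output)."""
    inp = layers.Input(shape=input_shape)
    # Encoder
    x = layers.Conv2D(128, (3, 3), padding="same", activation="relu")(inp)
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(x)
    # Decoder
    x = layers.Conv2DTranspose(32, (3, 3), strides=2, padding="same", activation="relu")(x)
    out = layers.Conv2D(3, (3, 3), padding="same", activation="sigmoid")(x)
    model = models.Model(inp, out, name="occlusion_autoencoder")
    # Pixel-wise reconstruction objective; the optimizer/loss choice is an assumption.
    model.compile(optimizer="adam", loss="mse")
    return model

# model = build_autoencoder()
# model.fit(occluded_frames, unoccluded_frames, epochs=1000, batch_size=32)  # image arrays scaled to [0, 1]
```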
The training of the Siamese network is carried out by creating a training set of positive and negative pairs of person images, where positive pairs are those with the same person identity, while negative pairs are those with different identities. We have used the Adam optimizer for network training, with the contrastive loss to optimize the network parameters. In Eq. (2), Y is the predicted label (either 0 or 1); Y = 1 indicates that the image pair belongs to the same class, whereas Y = 0 indicates that the image pair is from different classes, and D_w represents the output score obtained from the Softmax layer.

L_Contr = (1/2)(1 − Y)(D_w)^2 + (1/2) Y {max(0, m − D_w)}^2   (2)
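Below is a minimal TensorFlow sketch of the contrastive loss in Eq. (2), following the paper's convention that Y = 1 denotes a positive (same-identity) pair and D_w is the pair score produced by the network. The margin value m and the batching details are assumptions made for illustration, and the framework choice is ours.

```python
import tensorflow as tf

def contrastive_loss(y_true, d_w, margin=1.0):
    """Eq. (2): Y = 1 for a positive (same-identity) pair, Y = 0 for a negative pair.

    d_w is the pair score produced by the Siamese network (the paper's D_w);
    the margin value is an assumption for illustration.
    """
    y_true = tf.cast(y_true, tf.float32)
    positive_term = 0.5 * y_true * tf.square(tf.maximum(0.0, margin - d_w))
    negative_term = 0.5 * (1.0 - y_true) * tf.square(d_w)
    return tf.reduce_mean(positive_term + negative_term)

# Example: scores for three pairs and their labels (1 = same person, 0 = different).
# print(contrastive_loss(tf.constant([1.0, 0.0, 1.0]), tf.constant([0.9, 0.2, 0.4])))
```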
4 Results Here, we present an overview of the datasets used in the study to evaluate the performance of the proposed multi-model approach. Dataset: Three publicly available datasets have been used to test the performance of our proposed approach. The related details of each dataset (i.e., dataset name, total number of cameras used, total subjects, and total images) are presented in Table 3. Firstly, the combined set of unoccluded frames from all the datasets, along with their corresponding synthetically occluded versions, has been used to train the occlusion reconstruction (i.e., Autoencoder) model. Finally, the Siamese architecture-based re-identification network is trained by forming positive and negative pairs of image frames from the training data of all the datasets combined. Figure 3 presents the reconstruction results after applying the Autoencoder model. The first column in the figure presents the unoccluded images from the original dataset, while the second and third columns show the respective synthetically occluded samples and the samples reconstructed by the Autoencoder, respectively. It can easily be observed that the reconstruction capability of the Autoencoder model is significantly good, and it is able to remove the occluders from the occluded samples. Next, a comparison of the rank-wise re-identification performance of the proposed approach with existing state-of-the-art techniques, namely [14, 15, 17, 19, 20, 34, 39, 41], is presented.
Table 3 Image frame-based datasets used in the study

Dataset         | Total cameras used | Total images | Total Id's
----------------|--------------------|--------------|-----------
CUHK_01 [16]    | 2                  | 3884         | 971
Market1501 [39] | 6                  | 32,268       | 1501
CUHK_03 [17]    | 5 pairs            | 13,160       | 1360
Fig. 3 Sample ground-truth unoccluded non-sequential frames (1st column), frames with synthetic occlusion (2nd column), reconstruction using Autoencoder (3rd column)
The approaches of [15, 17, 20, 41] have been developed specifically to work in unoccluded scenarios; in this study, we test the effectiveness of these approaches on our test data after reconstructing the occlusion using the proposed Autoencoder model. Table 4 presents the re-identification results of our proposed approach for different ranks (i.e., R-1, R-5, and R-10) together with popular state-of-the-art methods. It can be observed from the table that the rank-1 performance of the proposed Autoencoder + Siamese model is the highest among all the approaches used in the comparative study. For the CUHK_01 dataset, the rank-1 performance is 91.0%, which is about 1.5 percentage points higher than the second-best approach [15]. Similar results can be seen for the other datasets used in the study; for the CUHK_03 and Market1501 datasets, our approach achieves 90.8% and 91.2% rank-1 accuracy, respectively.
Table 4 Rank-wise comparative results on CUHK_01, CUHK_03, and Market1501 datasets

Approaches               | CUHK-01 (R-1 / R-5 / R-10) | CUHK-03 (R-1 / R-5 / R-10) | Market1501 (R-1 / R-5 / R-10)
-------------------------|----------------------------|----------------------------|------------------------------
BoW [39] + KISSME [14]   | 54.8 / 63.0 / 69.0         | 51.0 / 57.0 / 65.0         | 44.5 / 63.4 / 72.2
LOMO [19] + XQDA [19]    | 64.1 / 77.4 / 82.0         | 66.1 / 75.4 / 82.0         | 32.4 / 44.8 / 60.5
HistLBP [34] + XQDA [19] | 56.7 / 70.1 / 81.6         | 54.7 / 61.1 / 67.6         | 36.7 / 50.2 / 67.6
WARCA [12]               | 56.4 / 66.8 / 74.6         | 49.4 / 61.8 / 74.6         | 45.4 / 68.8 / 76.6
FPNN [17]                | 31.5 / 44.6 / 56.2         | 26.4 / 44.8 / 57.2         | 26.4 / 40.6 / 60.2
MSCAN [15]               | 89.5 / 92.3 / 94.4         | 83.7 / 90.3 / 95.4         | 86.8 / – / –
CamStyle + re-rank [41]  | 86.0 / 88.8 / 93.5         | 88.6 / 90.8 / 92.5         | 85.2 / 92.8 / 95.5
SVD-Net [20]             | 87.4 / 89.5 / 92.0         | 89.2 / 92.5 / 95.0         | 82.2 / 91.5 / 95.0
Autoencoder + Siamese    | 91.0 / 93.6 / 95.2         | 90.8 / 92.6 / 92.2         | 91.2 / 93.2 / 94.8
5 Conclusions We propose an occlusion handling approach to address the problem of partial occlusion in person re-identification. First, we introduce an Autoencoder-based generative model to reconstruct the occluded image pixels by exploiting spatial-domain information from the unoccluded images present in the dataset. Next, a Siamese network is introduced to perform re-identification by establishing a one-to-one correspondence between the test and gallery subjects. We evaluated the proposed approach on three public datasets, and the appealing reconstruction and re-identification results demonstrate the superiority of the proposed approach over the other existing methods used in the study. The reconstruction results shown in Fig. 3 can be further improved by applying a fine-tuning model on the reconstructed image frames, which can be considered as the future scope of this work.
References 1. Adil M, Mamoon S, Zakir A, Manzoor MA, Lian Z (2020) Multi scale-adaptive super-resolution person re-identification using GAN. IEEE Access 8:177351–177362 2. Ahmed E, Jones M, Marks TK (2015) An improved deep learning architecture for person re-identification. In: Proceedings of the conference on CVPR, pp 3908–3916 3. Bazzani L, Cristani M, Murino V (2013) Symmetry-driven accumulation of local features for human characterization and re-identification. Comput Vis Image Underst 117(2):130–144 4. Bazzani L, Cristani M, Perina A, Farenzena M, Murino V (2010) Multiple-shot person reidentification by HPE signature. In: Proceedings of ICPR, pp 1413–1416
5. Bromley J, Guyon I, LeCun Y, Säckinger E, Shah R (1994) Signature verification using a “Siamese” time delay neural network. In: Proceedings of the advances in NIPS, pp 737–744 6. Chung D, Tahboub K, Delp EJ (2017) A two stream Siamese convolutional neural network for person re-identification. In: Proceedings of the ICCV, pp 1983–1991 7. Dai P, Ji R, Wang H, Wu Q, Huang Y (2018) Cross-modality person re-identification with generative adversarial training. In: Proceedings of the IJCAI, vol 1, p 2 8. Ding S, Lin L, Wang G, Chao H (2015) Deep feature learning with relative distance comparison for person re-identification. Pattern Recognit 48(10):2993–3003 9. Forssén PE (2007) Maximally stable colour regions for recognition and matching. In: Proceedings of the conference on CVPR, pp 1–8 10. Ge Y, Li Z, Zhao H, Yin G, Yi S, Wang X, Li H (2018) FD-GAN: pose-guided feature distilling GAN for robust person re-identification. arXiv preprint arXiv:1810.02936 11. Gheissari N, Sebastian TB, Hartley R (2006) Person re-identification using spatiotemporal appearance. In: Proceedings of the conference on CVPR, vol 2, pp 1528–1535 12. Jose C, Fleuret F (2016) Scalable metric learning via weighted approximate rank component analysis. In: Proceedings of the ECCV, pp 875–890 13. Kang J, Cohen I, Medioni G (2004) Object reacquisition using invariant appearance model. In: Proceedings of the ICPR, vol 4, pp 759–762 14. Koestinger M, Hirzer M, Wohlhart P, Roth PM, Bischof H (2012) Large scale metric learning from equivalence constraints. In: Proceedings of the conference on CVPR, pp 2288–2295 15. Li D, Chen X, Zhang Z, Huang K (2017) Learning deep context-aware features over body and latent parts for person re-identification. In: Proceedings of the conference on CVPR, pp 384–393 16. Li W, Zhao R, Wang X (2012) Human re-identification with transferred metric learning. In: Proceedings of the ACCV, pp 31–44 17. Li W, Zhao R, Xiao T, Wang X (2014) DeepReID: deep filter pairing neural network for person re-identification. In: Proceedings of the conference on CVPR, pp 152–159 18. Li W, Zhu X, Gong S (2018) Harmonious attention network for person re-identification. In: Proceedings of the conference on CVPR, pp 2285–2294 19. Liao S, Hu Y, Zhu X, Li SZ (2015) Person re-identification by local maximal occurrence representation and metric learning. In: Proceedings of the conference on CVPR, pp 2197– 2206 20. Liu X, Zhao H, Tian M, Sheng L, Shao J, Yi S, Yan J, Wang X (2017) Hydraplus-Net: attentive deep features for pedestrian analysis. In: Proceedings of the ICCV, pp 350–359 21. Liu Y, Yan J, Ouyang W (2017) Quality aware network for set to set recognition. In: Proceedings of the conference on CVPR, pp 5790–5799 22. McLaughlin N, Martinez del Rincon J, Miller P (2016) Recurrent convolutional network for video-based person re-identification. In: Proceedings of the conference on CVPR, pp 1325– 1334 23. Munir A, Martinel N, Micheloni C (2020) Multi branch Siamese network for person reidentification. In: 2020 IEEE international conference on image processing (ICIP). IEEE, pp 2351–2355 24. Qian X, Fu Y, Xiang T, Wang W, Qiu J, Wu Y, Jiang YG, Xue X (2018) Pose-normalized image generation for person re-identification. In: Proceedings of the ECCV, pp 650–667 25. Shen C, Jin Z, Zhao Y, Fu Z, Jiang R, Chen Y, Hua XS (2017) Deep Siamese network with multi-level similarity perception for person re-identification. In: Proceedings of the 25th ACM international conference on multimedia, pp 1942–1950 26. 
Sivic J, Zitnick CL, Szeliski R (2006) Finding people in repeated shots of the same scene. In: Proceedings of the BMVC, vol 2, p 3 27. Song C, Huang Y, Ouyang W, Wang L (2018) Mask-guided contrastive attention model for person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1179–1188 28. Tagore NK, Chattopadhyay P (2020) SMSNet: a novel multi-scale Siamese model for person re-identification. In: Proceedings of the ICETE, pp 103–112
29. Tagore NK, Chattopadhyay P (2021) A bi-network architecture for occlusion handling in person re-identification. Signal Image Video Process 1–9 30. Tagore NK, Chattopadhyay P, Wang L (2020) T-MAN: a neural ensemble approach for person re-identification using spatio-temporal information. Multimed Tools Appl 79(37):28393– 28409 31. Tagore NK, Singh A, Manche S, Chattopadhyay P (2021) Person re-identification from appearance cues and deep Siamese features. J Vis Commun Image Represent 75:103029 32. Wang G, Lai J, Huang P, Xie X (2019) Spatial-temporal person re-identification. In: Proceedings of the AAAI conference on AI, vol 33, pp 8933–8940 33. Wei L, Zhang S, Gao W, Tian Q (2018) Person transfer GAN to bridge domain gap for person re-identification. In: Proceedings of the conference on CVPR, pp 79–88 34. Xiong F, Gou M, Camps O, Sznaier M (2014) Person re-identification using kernel-based metric learning methods. In: Proceedings of the ECCV, pp 1–16 35. Xu S, Cheng Y, Gu K, Yang Y, Chang S, Zhou P (2017) Jointly attentive spatial-temporal pooling networks for video-based person re-identification. In: Proceedings of the ICCV, pp 4733–4742 36. Yan Y, Ni B, Song Z, Ma C, Yan Y, Yang X (2016) Person re-identification via recurrent feature aggregation. In: Proceedings of the ECCV, pp 701–716 37. Yi D, Lei Z, Liao S, Li SZ (2014) Deep metric learning for person re-identification. In: Proceedings of the ICPR, pp 34–39 38. Zhang Y, Jin Y, Chen J, Kan S, Cen Y, Cao Q (2020) PGAN: part-based nondirect coupling embedded GAN for person re-identification. IEEE MultiMedia 27(3):23–33 39. Zheng L, Shen L, Tian L, Wang S, Wang J, Tian Q (2015) Scalable person re-identification: a benchmark. In: Proceedings of the ICCV, pp 1116–1124 40. Zheng M, Karanam S, Wu Z, Radke RJ (2019) Re-identification with consistent attentive Siamese networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5735–5744 41. Zhong Z, Zheng L, Zheng Z, Li S, Yang Y (2018) Camera style adaptation for person reidentification. In: Proceedings of the conference on CVPR, pp 5157–5166 42. Zhou S, Ke M, Luo P (2019) Multi-camera transfer GAN for person re-identification. J Vis Commun Image Represent 59:393–400 43. Zhou Z, Huang Y, Wang W, Wang L, Tan T (2017) See the forest for the trees: joint spatial and temporal recurrent neural networks for video-based person re-identification. In: Proceedings of the conference on CVPR, pp 4747–4756
Analysis and Prediction of Purchase Intention of Online Customers with Deep Learning Megha Bansal and Vaibhav Vyas
Abstract Nowadays, online shopping has transformed the traditional shopping trend enormously. This shift has grown even larger after the COVID-19 pandemic, which provided customers a world full of opportunities to purchase online from the ease of home and without compromising on safety. Now, every platform, new or old, small or big, wants to retain its customers at any cost, for which it really needs to understand the demands and expectations of the customers coming in. Understanding the needs of the customer is the key to success for online shopping sites. In this paper, we have made use of a deep learning model to first predict whether a customer is going to make a purchase or not. After deciding this broader category, we can provide offers or better deals to non-purchasers that can make them shop, and for purchasers, we can provide some loyalty cash or coupons to make them stay. Our model achieves a training accuracy of 90.24% and a validation accuracy of 88.15%, with losses of 23% and 28%, respectively. Keywords Purchase intention · Machine learning · Deep learning · Online shopping · Customer retention
M. Bansal (B) · V. Vyas Department of Computer Science, Faculty of Mathematics and Computing, Banasthali Vidyapith, Aliyabad, Rajasthan, India e-mail: [email protected] V. Vyas e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_16
1 Introduction The COVID-19 crisis has given a new ray of hope to the e-commerce market. There has been a historic change in customer behavior. As per the PwC Global Consumer Insights Pulse Survey of June 2021 [1], trends predict a huge transition toward the online market, which is going to stay and increase rapidly. According to this survey, 46% of consumers go to physical stores, 34% prefer online shopping via PC, 38% did online
shopping via tablet, 44% did online shopping via mobile phones and smartphones, and 42% did online shopping by voice assistants. Moreover, web purchasing provides various comparative options for users from the comfort of home and, on the other hand, opens up a huge window for both big and small firms to generate profit by applying a few machine learning algorithms. Machine learning techniques can help these companies predict various things, such as whether a visitor came for actual purchasing or not, and how to capture a non-serious visitor by showing some interesting products at a better price or by providing some discount coupons; these algorithms also help in retaining serious customers by providing them some loyalty cash or coupons that can be availed on the next purchase. So, in totality, machine learning shows a win–win path for companies. In this paper, we have predicted whether an incoming customer will make a purchase or not using a deep learning algorithm over the Online Shoppers Purchasing Intention dataset available at the UCI Machine Learning Repository. The results generated in our study can really help companies toward better revenue generation in the future.
2 Related Work A very traditional approach to understanding the behavior or intention of a customer is to deeply study their pattern over the web. This can really help when analyzed further by a tool or machine learning algorithm, providing better suggestions and offers to customers, which can help the e-commerce market generate better profit. In [2], the authors used a Hidden Markov Model in 2005 to predict the purchase intention of web users, but the precision was only 51% and the recall 73%; hence, it clearly indicates that many more methods or techniques need to be tried for this prediction. [3] elaborates a structural equation modeling (SEM) approach for identifying various factors that make an impact on mobile web shopping. Further, [4] made a study of 350 college students to determine the parameters they use for online shopping; their decisions depended on various factors such as the usefulness of the product, gender, and social norms, but the overall study was aimed at comparing purchase intention at stores and on the web. Afterward, [5] discussed how technology plays an important role in online shopping, based on data collected from undergraduate students. Then, [6] describes how tourists use websites of rural areas for making bookings and purchases. [7] provides a conceptual model for predicting the impact of advertising on web shopping. Then, [8] succeeded in predicting buying behavior from clickstream data, website data, and session data by using an LSTM RNN. [9] focused on Naïve Bayes and decision trees to predict user behavior.
3 Background 3.1 Dataset Used The Online Shoppers Purchasing Intention dataset has been taken from the UCI Machine Learning Repository [10]. The dataset consists of feature vectors belonging to 12,330 sessions. Each session belongs to a different user over around a 1-year period, so that any bias toward a special occasion, special sale or offer, or any specific festival period is ruled out. The dataset is further divided into 8 categorical and 10 numerical attributes. Each example is associated with a label, and the attributes are as follows: Administrative (integer), Administrative_Duration (integer), Informational (integer), Informational_Duration (integer), ProductRelated, ProductRelated_Duration (integer), BounceRate, ExitRate, PageValues, SpecialDay, Browser, Region, TrafficType, VisitorType, Revenue, and Weekend.
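As an illustration of how such a session-level feature table can be prepared for model training, the following is a minimal pandas/scikit-learn sketch; the CSV file name, the one-hot encoding of the categorical columns, and the 80/20 split are our assumptions rather than steps prescribed by the paper.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# File name assumed to match the CSV distributed with the UCI repository entry.
df = pd.read_csv("online_shoppers_intention.csv")

# Encode categorical/boolean columns and keep Revenue as the prediction target.
X = pd.get_dummies(df.drop(columns=["Revenue"]), drop_first=True)
y = df["Revenue"].astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```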
3.2 Deep Learning In this paper, we implemented a deep learning technique of machine learning for predicting the user's purchase intention, because it seems best suited to imitate how a user thinks. This technique uses multiple layers so that higher-level feature extraction can be done over the data.
3.3 Tools Used The Google Colaboratory IDE is used with Python 3.8. NumPy, Pandas, Matplotlib, TensorFlow, and Keras were the libraries imported.
4 Methodology 4.1 Methodology Flow In our sequential model, we made four input layers and one output layer. ReLU is used as the activation function. All of the functionality is performed on the Google Colaboratory IDE, which is a free platform for deep learning problems. Python 3.8 is used, with NumPy for performing mathematical functions, Pandas for data analysis, Matplotlib for visualizing data and graphical plotting, and TensorFlow for building and prediction of the model with the help of Keras (Fig. 1).
Fig. 1 Methodology flow: Data Loading → Exploratory Data Analysis → Splitting the data into training and testing sets → Defining the Model → Model Compilation → Model Execution → Accuracy check on testing data → Visualizing Accuracy → Predictions from the model
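A minimal Keras sketch of a sequential model of the kind described in Sect. 4.1 (four Dense layers with ReLU followed by one output layer) is given below; the layer widths, the sigmoid output for the purchase/no-purchase decision, and the optimizer and loss are our assumptions, since the paper does not list them.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(n_features):
    """Sequential model: four ReLU layers and one output layer, as described in Sect. 4.1."""
    model = models.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(16, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # purchase vs. no purchase
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# model = build_model(X_train.shape[1])
# history = model.fit(X_train, y_train, validation_split=0.2, epochs=50, batch_size=64)
```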
4.2 Metrics These metrics are all about the data analysis and visualization of the model. Figure 2 shows the distribution of the data into two major classes: whether revenue was generated or not, and whether the purchase was done on a weekend or not. Figure 3 depicts the visitor type based on purchase history, i.e., returning visitors, new visitors, and others. Visitor categorization is also done based on different browsers. Further, Fig. 4 shows the empirical relationships between the different parameters available.
4.3 Clustering Analysis 4.3.1 Informative Duration Versus Bounce Rate See Fig. 5.
Fig. 2 Visitors count based on revenue generation and weekend purchases
Fig. 3 Visitor type
4.3.2 Administrative Duration Versus Bounce Rate
Clustering analysis is a statistical process for grouping similar things into respective categories. Here, clustering is done for uninterested customers, general customers, and target customers (Fig. 6).
5 Result As we can see, the accuracy of the model increased as the number of epochs increased. The accuracy on the training and validation data started at 89% and 82%, respectively, and ended up being 90.24% and 88.15%. The loss of the model, meanwhile, remained nearly constant as the number of epochs increased. The loss on the training and validation data started at 24% and 27%, respectively. The maximum loss should be 1; if it exceeds that, it means our model is losing much more knowledge than it is gaining. However, as the number of epochs increased, the
loss ended up being 23% and 28%, respectively, which are very good values for an efficient deep learning model in practice. We can identify the visitors who are going to generate revenue and those who are not through the confusion matrix. We can use this information as follows:
1. Once we are able to identify that someone is going to generate revenue, we do not need to provide any coupons; rather, we can give these visitors special points which they can use the next time they visit.
2. The visitors that are unlikely to make a purchase can be provided with discount coupons so that they are more likely to make a purchase (Figs. 7 and 8).
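A short sketch of how the confusion matrix behind this decision rule can be computed from a trained model's predictions is shown below; the labels and scores are dummy values, and the 0.5 decision threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# y_true: actual revenue labels; y_prob: the model's sigmoid outputs (dummy values here for illustration).
y_true = np.array([1, 0, 0, 1, 0, 1])
y_prob = np.array([0.8, 0.3, 0.6, 0.7, 0.1, 0.4])

y_pred = (y_prob > 0.5).astype(int)  # 0.5 threshold is an assumption
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"flagged as likely purchasers: {tp + fp}, flagged as likely non-purchasers: {tn + fn}")
# Visitors predicted as non-purchasers can be targeted with discount coupons,
# while predicted purchasers can instead be given loyalty points.
```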
Fig. 4 Bi-variate analysis
6 Conclusion and Future Work In this paper, we have explained a multi-class classification model, which means it is capable of making predictions over more than two classes. This can be of great help in predicting the purchase intention of a customer, so that a lucrative deal can be offered to the customer accordingly. In the future, the model can be improved, extended, and deployed as a web app to make it easier for new businesses to understand how ML works on web data. Our current model makes predictions over 10 classes; we can use data as well as models that are capable of making predictions over hundreds or even thousands of classes. Our current model is also trained and evaluated on a small sample of big data; we can extend it to work on more inclusive, complex, and high-quality data too.
Fig. 5 Informative duration versus bounce rate
Fig. 6 Administrative duration versus bounce rate
Fig. 7 Model accuracy and loss
Fig. 8 Confusion matrix
References 1. https://www.weforum.org/agenda/2021/07/global-consumer-behaviour-trends-online-sho pping/ 2. Wu F, Chiu IH, Lin JR (2005) Prediction of the intention of Pur-chase of the user surfing on the web using hidden Markov model 3. Lu HP, Su PY (2009) Factors affecting purchase intention on mobile shopping web sites 4. Cha J (2011) Exploring the internet as a unique shopping channel to sell both real and virtual items: a comparison of factors affecting purchase intention and consumer characteristics 5. Chen YH, Hsu IC, Lin CC (2009) Website attributes that in-crease consumer purchase intention: a conjoint analysis 6. San Martín H, Herrero Á (2011) Influence of the user’s psychological fac-tors on the online purhase intention in rural tourism: Integrating innovativeness to the UTAUT framework 7. Saadeghvaziri F, Dehdashti Z (2015) Web advertising 8. Sakar CO, Polat SO, Katircioglu M (2018) Neural Comput Appl 9. Bing L, Yuliang S (2017) Prediction of user’s purchase intention based on ma-chine learning 10. https://archive.ics.uci.edu/ml/datasets/Online+Shoppers+Purchasing+Intention+Dataset
Create and Develop a Management System for Cardiovascular Clinics Mohamed A. Fadhel, Haider Hussein Ayaall, Zainab Yasir Hanuyt, and Hussein Hamad Hussein
Abstract In reality, health care is very important and can do a lot for the patient, so the idea of a dedicated system to follow up on and diagnose the patient's cardiac health makes sense. Most clinics still record prescriptions and patient requirements manually on paper, which is unclear, complex, and disorganized. We converted this paper-based model into a software system so that work is carried out in an organized way, time and effort are not wasted, unnecessary complexity and detail are avoided, registration of all patients is complete, and the doctor's time is saved. The patient's diagnoses and results can be referenced at any time. The system is divided between two controllers, the system manager and the secretary. The database was designed and implemented in SQL Server 2014 and linked to Microsoft Visual Studio interfaces written in the C# language, and the system documents the information, procedures, activities, and other clinic records that belong to the patient. Keywords Cardiovascular · Database · SQL · Visual studio
1 Introduction Overview complexity issues have arisen because there are always new specialties, different roles, and public and private organizations in patients’ healthcare coverage [1–3]. There are IT alternatives for different parts of today’s healthcare systems, and there has been a lot of progress in innovation that can be used to help people in different parts of their lives, including the medical field. People need the medical field because it can save a lot of lives. This system helps to cut down on the problems that happen when you use the manual system. This system lets doctors and clinic assistants keep track of patients’ records, medical stock, and appointments, as well as make reports [4–7]. The system is made because of the problems that happen when M. A. Fadhel (B) · H. H. Ayaall · Z. Y. Hanuyt · H. H. Hussein College of Computer Science and Information Technology, University of Sumer, Thi Qar, Iraq e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_17
you use the traditional process. Inconsistent data, mixed data, and problems with reporting are the main difficulties users face, and this system was built to solve them. The system is easy and simple to use: it helps the clinic manage its work and focus on solving problems immediately. Keeping track of all patient information becomes easier and more structured, all information is stored in a database so there is less chance of it being lost, and users can view their history from anywhere [8–10]. With the "Cardiac Clinic Management System," we want to point out some interesting and noticeable improvements. The system was built to let doctors focus on caring for patients while the system handles the paperwork. Clinic managers can make the best use of their resources, be more productive, and obtain information about patients and administration quickly, all while improving patient care. The application is simple to obtain and user-friendly, and it automates clinical management tasks. There is much to understand about heart patients, which is why the Heart Clinic Management System was developed [11–15]. Health care today faces growing variety and divergence: many roles in patient care and many public and private organizations work together to help people, and all of these tasks in healthcare organizations play a part in caring for one person at a time. This makes it even more important for organizations, and for the systems and applications they use, to work together. Management is still mostly done by hand, which makes it ineffective; manual management creates major trouble, and errors accumulate from the time users register until checkout. It is hard for a physician to remember all of a patient's history during treatment if there is no way to search for it again, and when a doctor needs to diagnose a person again, he or she must first review the patient's history. The paper's primary contributions may be summarized as follows: 1. Turn manual work into highly automated work. Previously, most of the work in the ward was done by hand; once computerized, it becomes simply a list of data. The physician's report can be difficult to read because it is written in different and complicated ways; after computerizing the system, the report is easily readable. 2. Make record keeping much easier for management. 3. Design the system around the everyday tasks of the health center. 4. Keep all patient records in a safe place. 5. Record and display each patient visit. 6. Produce and print a job description for each patient. 7. Make it simple for the physician to run the clinic in an easy way, while making sure there is time for registration.
2 Previous Studies Technology in general, and information systems in particular, have grown and progressed across all disciplines. One study introduces a medical system intended to enhance both healthcare performance and patient satisfaction. The scheme connects a group of healthcare facilities that offer medical analysis, extracts the results, and then estimates suitable analyses for each patient. The researchers also needed to meet with people and collect all the data that could help them, and then build an automated system that shows prior services and makes the healthcare process much easier. This system would also make it easy for patients to change physicians or hospitals without having to pay for it [16]. In another work, people at Teaching Hospital Ragama are able to make dermatology appointments, register, and receive their diagnoses through a web-based project called "Web-based Dermatology" [17]. The system allows patients to get an appointment without having to go to the hospital; they can obtain one by text message or email instead. Patients can arrive shortly before their appointment, so they do not have to come very early, and people who request an appointment on a day when the clinic is not running are notified by the system. The system also lets patients register and receive their clinic numbers quickly and accurately, which makes it easier for staff to schedule appointments. A special dermatology template is used by physicians to enter information about their patients, which saves the doctor's time and allows more time with patients. The hospital pharmacy receives the patient's medication order with the patient's clinic number so that the medication is prepared when the patient arrives for the next visit, which can shorten the queue at the pharmacy of the Dermatology clinic at Teaching Hospital Ragama. It also makes it easier to keep track of important patient information and send out information in a timely manner [17]. Another document [18] gives a quick review of health care in Myanmar, looks at the literature on EMR systems in other countries, and summarizes the results of a pilot evaluation of the CMIS technology by Innovations for Poverty Action (IPA); it ends with basic guidance [19] for people who want to help and ideas for further studies. A further research study [19] describes a health center management solution built to fix the complex issues that a local clinic in Malaysia is currently facing. The Clinic Management System is a web-based platform that helps run the clinic business. By using digital technologies, this study aims to improve the business operations of a clinic in Malaysia and to better the clinic management software already on the market. The goal of the project is to make it easier for people to make appointments, send text messages about them, and print medical certificates (MCs). Structured planning ensures that each patient is placed in the queue fairly and shows how long each patient will wait. The patient can schedule an appointment
and a text message is sent to the patient's phone to let them know when they can come in. Patients can also use MyKad to register at the counter, which is very convenient and saves a lot of time, and the MC can be generated and printed on paper; the physician only needs to sign it and stamp it with his name [19]. Finally, a system called Clinic Management System with Alerts using a GSM modem was constructed to fix the issues at another clinic [20]. It was developed to give health-center employees better management tools, computerized and systematic patient records, and detailed treatment records so that they can do their jobs more quickly. An appointment feature is available in this system, which means that staff can look at the appointments made with physicians, process them, and send notifications to patients.
3 Proposed System The system illustrated in Fig. 1 is intended for use in any Iraqi clinic that needs it. The people who would use the system are physicians, nurses, or anyone else who works in a clinic's office. The proposal focuses mostly on building a system that can store computerized patient records and produce reports, with additional features that help its users do their jobs better. At its core is a computerized patient record module: the database part of the system holds the patient's information and medical record, including the medical history, previous diagnosis records, and treatment records kept by the physician.
3.1 Welcome The start screen of the system is for system users only and consists of the button for logging in to the system.
3.2 Login The interface for logging in to the system; the system administrator is primarily responsible for granting permission to access the system.
Fig. 1 Explanation of the system administrator architecture diagram
Fig. 2 The starting interface of the program
3.3 The Main Menus of the System The system's main menus are the basis for smoothly distributing and arranging the secondary interfaces so that every interface in the system can be reached and handled from the menus. The lists are divided into two parts: the first section, for the system administrator, contains several tasks and lists covering patient data, reports, and diagnoses; the second section relates to reservations and the patient's specialties, in addition to tables, the final outputs of the main menu, and the control of system settings. The menu in Fig. 2 is the general output menu for the system administrator.
3.4 Doctor Information The doctor's data is an important part of the system; the system needs access to the doctor's data previously registered in the system in order to work with the rest of the system's lists. The required data, such as name, job title, email, code number, date, and certificates, is filled in as shown in Fig. 3. The record can later be found through a search on its code and then updated, modified, or deleted as required.
3.5 Clinic Information Clinic data is another important part of using the cardiology system. It is fetched automatically, and parts of its fields are used to complete the doctor's data within the prescriptions or diagnoses in a list; the specific information in the lists is based on the doctor's input. This data is saved with the Save button and can always be retrieved through a text search and then modified or deleted, as shown in Fig. 4.
Fig. 3 Display the data doctor
Fig. 4 Display the data clinic
3.6 New Reservation Data A new reservation is created for the patient and is represented by the patient's name, reservation number, address, reservation date, and expiry date; it is one of the important parts of using the cardiology system.
The reservation data is fetched automatically, and parts of its fields are used to complete the patient's data at the doctor without being included in the prescriptions or diagnoses; the specific information in the lists is based on the doctor's input. This data is saved with the Save button and can always be retrieved through a text search and then modified or deleted, and previous reservations can be displayed, changed, or searched for previously registered patient information.
3.7 Case Information In the case information, only the important parts of each list of the main information approved in the system are fetched, such as the key information for the doctor, the clinic information, and the necessary information for the patient, for example the name, type, specialty, and date of the visit. The data is fetched automatically, parts of its fields are used to complete the patient's data at the doctor without being included in the prescriptions or diagnoses, and the specific information in the lists is based on the doctor's input. The data is saved with the Save button and can always be retrieved through a text search and then modified or deleted, as shown in Fig. 5.
Fig. 5 Display the Information Status
3.8 The First Diagnosis After the patient's information is fetched, the patient's first diagnosis begins. After the examination, a diagnosis is recorded together with a request for medical analyses, if necessary, and any notes; this depends on the doctor's diagnosis to ensure the patient's health. Only the important parts of the main information approved in the system are fetched, such as the key information for the doctor and the clinic and the necessary information for the patient (name, type, specialty, and date of visit). The data is fetched automatically, parts of its fields are used to complete the patient's data at the doctor and are entered within the prescriptions or diagnoses, and the specific information in the lists is based on the doctor's input. The data is saved with the Save button, as in Fig. 6a, and can always be retrieved through a text search and then modified or deleted. A report on the patient's condition can also be printed, as shown in Fig. 6a, and previously registered patient information can be modified, changed, or searched.
3.9 The Second Diagnosis After the patient's information is fetched, the second diagnosis begins. After the examination, the second diagnosis is made in light of the results of the first diagnosis, and medical tests are requested if necessary, together with notes; this depends on the doctor's diagnosis to ensure the patient's health. Only the important parts of the patient's first diagnosis information are fetched automatically, and parts of its fields are used to complete the patient's data at the doctor and are entered within the prescriptions or diagnoses, together with an image of the X-ray or ECG, etc. Based on the doctor's input, this data is saved with the Save button, as in Fig. 6a, and can always be retrieved through a text search, edited, or deleted. A patient status report can also be printed, as shown in Fig. 6b, and previously recorded patient information can be edited, changed, or searched.
3.10 Emergency Case A dangerous situation in which the patient requires an urgent operation requires bringing the patient's information, the necessary analyses, and guidance to the medical center after the patient has been diagnosed. After the examination, the diagnosis is made and medical tests are requested, if necessary, with notes; this depends on the doctor's diagnosis to ensure the patient's health. Only the important parts of the patient's first diagnosis information, as approved in the system, are fetched.
Fig. 6 Display the first (A) and second (B) diagnosis
The data is fetched automatically, and parts of its fields are used to complete the patient's data at the doctor and are entered within the prescriptions or diagnoses, together with an image of the X-ray or ECG, etc.
3.11 Prescription Information This part requires bringing the patient's information, the necessary analyses, and guidance to the medical center after the patient has been diagnosed. Based on the doctor's diagnosis, the patient's prescription is then written and printed with notes; this depends on the doctor's diagnosis to ensure the patient's health. Only the important parts of the main approved information are fetched into the system automatically, and parts of its fields are used to complete the patient's data at the doctor and are entered within the prescriptions or diagnoses, together with an image of the X-ray or ECG, etc.
3.12 System Settings System settings are available only to the system administrator and consist of registering users in the system, displaying registered users, and restoring or extracting a backup copy; they also allow the system administrator to enter a new reservation or view previous reservations. Access to the system interface is authorized by the administrator.
4 Discussion The heart clinic system goes through several stages. The first is to display and save patient data together with the reports on the patient's diagnosis and the prescriptions that doctors rely on; on this basis, follow-up and evaluation are performed and a report is prepared for each. Finally, the final results are displayed to track the patient's stages and give the final result, as well as to report and print the latest results.
4.1 Diagnostics In short, diagnostics is an important part of the system. Within five seconds, the registered name is fetched automatically and the data relevant to the patient is supplied to each section, so that information is collected automatically or used for assessment; after the
diagnosis, it is saved in the database, the final result appears in each section, and all the data is stored in the database so the results can be retrieved at any time.
4.2 System Output The output of the system is its most important part: it shows the patient's final condition in aggregate form and also displays all patients pre-registered in the system, from each diagnosis and control step through to the final results, which can be displayed, reported, and printed smoothly and simply.
4.3 Final Results of the Program The final results are divided into the first diagnosis and case information, then the second diagnosis, as well as emergency cases or medical prescriptions; each section has its own outcomes and outputs, as mentioned earlier, down to the final results of the sections, which can be extracted, reported, and printed. On this basis, the sections are evaluated separately. As a result of this work, we: 1. Learned about the existing clinic management systems. 2. Gathered ideas for a technique through research, tried them out, and checked whether they work. 3. Built a simple software program that shows how to use the system so that its operation can be seen. 4. Examined how well the system works and thought critically about its strengths and weaknesses. 5. Came up with new ideas and suggestions for how to improve this work.
5 Conclusion The Cardiac Clinic management system is a computerized way to keep track of patients' records. The primary goal of this project is to make it easier for hospital staff to keep track of patients' records and easier for clinics to provide good treatment. Our system includes clinical, scheduling, electronic medical record, charting, and data reporting components that make it easier for clinics to give patients great care. Consequently, the proposed system would benefit physicians and nurses, since much of their work and planning can be done more efficiently; it is meant to help them reach their aims and targets.
Acknowledgements I would like to express my great appreciation to Assist. Prof. Hiam Hatem, Assist. Prof. Raed Majeed and Sarah Rahim for their support of this research work. Their willingness to generously give us time was greatly appreciated. I would also like to thank the staff at Sumer University’s Faculty of Computer Science and Information Technology for their support and encouragement.
References
1. EIADS, Clinic management system (2006). http://PDFme.eiads.com
2. Elmetwaly HM (2011) Design and implementation of medical information systems for managing and following up work flaw in hospitals and clinics. J Comput Sci 7(1):27
3. Mistry R, Misner S (2014) Introducing Microsoft SQL Server 2014. Microsoft Press
4. Gousset M, Hinshelwood M, Randell BA, Keller B, Woodward M (2014) Professional application lifecycle management with visual studio 2013. John Wiley & Sons
5. Tutorials Point (I) Pvt. Ltd (2014) C# programming: object-oriented programming
6. Deitel H, Deitel P (2010) Visual C# 2010 How to program. Prentice Hall Press
7. Ariff M, BIN A (2006) Clinic management system. Univ Manage Syst
8. Petkovic D (2005) Microsoft SQL Server 2005: A Beginner's Guide, 3rd edn. McGraw-Hill Osborne Media
9. Barnett M, Leino KRM, Schulte W. The Spec# programming system: an overview. International workshop on construction and analysis of safe, secure,
10. Fähndrich M, Barnett M, Logozzo F (2010) Embedded contract languages. In: Proceedings of the 2010 ACM symposium on applied computing. ACM
11. Koshelev VK, Ignatyev VN, Borzilov AI (2016) C# static analysis framework. Proc Inst Syst Program RAS 28(1):21–40
12. Richter J (2012) CLR via C#. Pearson Education. ISBN 978-0-7356-6876-8
13. Home repository for .NET Core. Available from https://github.com/dotnet/core
14. Wijaya CAV, Azwir HH (2020) Information system development using microsoft visual studio to speed up approved sample distribution process. J Ind Eng 5(1):14–24
15. Jayampathi KTK, Jananjaya MAC, Fernando EPC, Liyanage YA, Pemadasa MGNM, Gunarathne GWDA (2021) Mobile medical assistant and analytical system for Dengue patients. In: 2021 3rd International conference on advancements in computing (ICAC), pp 371–376
16. Hewasinghe N (2021) Web based appointment and patient management system for Dermatology clinic at teaching hospital Ragama (Doctoral dissertation)
17. Fertig A, Park J, Toth R (2019) Clinic Management Information System (CMIS)
18. Ang HK (2015) Clinic management system (rapid clinic) (Doctoral dissertation, UTAR)
19. Teke A, Londh S, Oswal P, Malwade SS (2019) Online clinic management system. Int. J (Toronto, Ont.) 4(2)
20. Hatem H, Majeed R, Raheem S (2020) Electronic heart clinic management system. Int J Innov Eng Emerg Technol (IJIEET) 6(3):14–21
Event Detection on Social Data Streams Using Hybrid-Deep Learning Mohammed Ali Mohammed and Narjis Mezaal Shati
Abstract Event detection using social media has grown rapidly, since social networking services have become an active channel of communication for connecting with others and spreading news. In particular, the real-time properties of social media have created the possibility of supporting applications and systems in real time. This paper proposes a hybrid system for detecting temporal events through social data analysis. This information can be used by first responders, decision-makers, or news agents to understand the situation as it unfolds. The proposed approach uses deep learning with two algorithms (1DCNN and LSTM) that play fundamental roles in the key tasks of identifying useful information in a noisy environment and detecting temporal events. The first task is handled by a convolutional neural network model trained on labeled Twitter data; the second is addressed by the recurrent neural network component. We present our method and empirical results in a study on a hate speech dataset. Because deep learning allows features to be extracted from the data without spending a great deal of time engineering them manually, our system is more adaptive than systems based on traditional techniques, which makes the approach applicable to new contexts. The proposed system responds within delays of several minutes, which makes it a useful way to support news network agents. Keywords Event detection · Social media · Visualization · 1DCNN · LSTM · OSN · Hyper · RF
M. A. Mohammed (B) · N. M. Shati Department of Computer Science, College of Sciences, Mustansiriyah University, Baghdad, Iraq e-mail: [email protected] N. M. Shati e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_18
1 Introduction Social media networks have become an efficient tool for up-to-date information. Millions of people join social media and share the information they witness, experience, or receive from other sources about various aspects of everyday life. With this real-time quality, people effectively act as social sensors, while the posted messages serve as activity signals. As a result, social data can cover an event faster than traditional news media. We address the problem of disaster event detection in the context of social media. Since we work directly in a noisy environment such as a social network, our approach shows how to filter the incoming data so that only useful information remains for detection. Within event detection there are many traditional approaches to this task, but feature-based approaches always require detailed feature engineering. The remainder of the paper is organized as follows: Sect. 2 gives a brief discussion of the background and related work. Section 3 focuses on the proposed neural network technique, including the design of the hybrid CNN and LSTM model for identifying temporal events. The experiments we performed with each tool, and the corresponding discussion, are given in Sect. 4.
2 Literature Survey Twitter data has been used to detect earthquakes in real time and send warnings faster than the Japan Meteorological Agency. That system used an SVM classifier to remove meaningless tweets, and a probabilistic model was built for temporal analysis; a Poisson process is the core of the temporal model and is used to estimate the moment in time at which the earthquake occurred [1]. Twitter data covering the context of the three earthquakes that occurred on the Sumatra coast of Indonesia from 2010 to 2012 was also used. In [2], an earthquake information system was built for Italy. The authors first examined tweets and further actions from Twitter; to filter irrelevant data, a complex filter was applied based on a taxonomy over the URL, mentions, words, characters, spelling, and colloquial or aggressive words. For temporal analysis, they developed a burst detection approach that tracks the diversity of messages within time windows. The limitation of [3] and [2] is that a set of features must be defined for the input in advance; as a result, the effectiveness of these detection systems depends on how well the features are constructed. CNNs have now become a prominent technology for classification tasks. The intuition behind CNNs is that the convolutional layer can automatically find a better representation of the text data, followed by fully connected layers that ultimately classify this representation based on a set of tags. References [4] and [5] address the problem of detecting events at the sentence
level using CNNs built on word embeddings. In [4], the pooling layer operates according to event triggers and arguments in order to extract the most important information, while [5] uses multiple channels (words and positions in addition to entities) to select the appropriate trigger. More recent work has addressed related tasks such as joint event detection and summarization and hashtag-based sub-event discovery [6, 7]. Other approaches [8, 9] have also targeted the sub-event detection task; these strategies usually rely on graph-based structures or TF-IDF weighting schemes. In our research, after data processing and an additional cleaning procedure, we propose a hybrid architecture combining a CNN and an LSTM, report results after the training process, and compare them with some previous work [10]. In [10], experiments were evaluated on two different datasets involving the occurrence of an unreliable news event in a massive collection of tweets. The authors use variants of term frequency (TF) and TF-IDF, which, in short, group the repeated words, compute the frequency of every term, and give each term a specific weight; the main purpose is to extract features, which are then fed to SVM, KNN, DT, and LR classifiers and other machine learning algorithms. A theory-driven model is studied in [11], where the researchers examine propagation-based and content-based features to detect an entirely new event. In order to identify an event before it spreads, the informational content was evaluated at the levels of vocabulary, grammar, semantics, and discourse; features describing how a new event is detected and distinguished, as well as the effect of information on circulation, were also studied. They evaluated SVM, RF, LR, and NB on a dataset (Event News Net) adapted from an official website. A joint study is reviewed in [12], which discusses time-series data, where event detection refers to variations from common network behavior that reveal a new data point departing substantially from the learned model. The long short-term memory network is a recurrent neural network trained using backpropagation through time, and it addresses the gradient problems that limit plain recurrent neural networks; we used an LSTM event detection design. In [13], the authors explored machine learning models on the hate speech dataset released on Kaggle: an SVC achieved an accuracy of 0.81, Multinomial NB an accuracy of 0.90, and a Random Forest classifier an accuracy of 0.86, together with two other algorithms used for comparison. Using a BiRNN gave them the most reliable results compared with the other techniques they applied. Their results show that reasonable performance can be obtained with little data, and that deep learning gives good results when more data is used for the experiments. The authors of [14] examined the content-based and propagation-based characteristics of fake news; in order to detect fake news before it spreads, they provide a thorough assessment of the structures and qualities of content-based and propagation-based methods.
Features relevant to deception/disinformation and clickbait, as well as the effect of information circulation, were also researched. They evaluated SVM, RF, LR, and NB on the "Fake News Net" dataset [15]. Researchers have also proposed hybrid variants of an automated
approach for short-term flood forecasting in the Lena River using atmospheric variables, where a machine learning method based on an evolutionary algorithm was used to quickly select machine learning models, tune hyperparameters, and combine independent models into ensembles; their model was validated on 10 hydro gauges over two years [16]. Another work created an MLP and an extreme gradient boosting model to predict ice jams with data from 1983 to 2013 in the Warta River, Poland. They used water and air temperatures, river flow, and water level as inputs to their models, showing that both machine learning approaches give attractive results; in Canada, the results were also satisfactory.
3 Proposed Methodology Event detection requires a great deal of experimentation across multiple techniques and a wide range of datasets, and the detection setting should capture a broad understanding of the nature of a new event as well as the ways in which it spreads. The present work incorporates these trends by proposing a design based on recent techniques that demonstrate the value of deep learning for the task of event detection and classification. More specifically, it combines a one-dimensional convolutional neural network (1DCNN) with a recurrent neural network (LSTM), which improves performance on the social data stream considered here. Neural networks, a type of computational technique inspired by the neurons of the human brain, are among the most widely used models today and perform well in many recent classification tasks across fields. In this work, event detection in the social data stream is implemented as a classification task, which makes deep neural networks convenient; in particular, deep learning techniques built on 1DCNN networks have already proven remarkably reliable. In order to cover all facets of the study and validate its stability and impartiality, the work is devoted to the following.
3.1 Neural Network Several settings affect the efficiency of a neural network, such as the initialization of the weights and biases, the activation functions used in each layer, the optimizer, the loss function, and so on. The last layer is the fully connected layer: after the flattening operation, the result of the matrix multiplications is one-dimensional and is passed to the fully connected part, which produces the output. To reduce the error
between the computed output and the desired output, the neural network uses an optimizer.
3.2 Word Embedding When dealing with text classification and neural networks, an input text must be converted into a vector or numerical pattern of fixed size so that it can be fed to the network. The words in a tweet can be represented as vectors, called word vectors, with each word having a unique vector that distinguishes it from the rest of the words; these word vectors are also called word embeddings. Embeddings are typically trained on a huge corpus, usually language- or domain-specific, so that the embedding vocabulary captures the practical regularities of all the words in the corpus. Instead of training embeddings from scratch, it is also practical to use embeddings that are freely available pre-trained on words. In our research, we used such pre-trained vectors to represent words: the vocabulary of the training data was computed and then matched against the file glove.6B.100d, downloaded from the Kaggle website, which contains weights for many words. This, in turn, saves time compared with learning a weight for each word from scratch: the words are looked up, and their weights are fixed directly.
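A minimal sketch of this embedding step is given below (an assumed implementation, not the authors' exact code). It converts the cleaned tweets, assumed to be available as `train_tweets`, into integer sequences and looks up the pre-trained glove.6B.100d vectors to build an embedding matrix; the vocabulary size and sequence length are assumed values.

```python
# Illustrative embedding setup (assumed hyperparameters, not the authors' code).
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_WORDS, MAX_LEN, EMB_DIM = 20000, 50, 100   # assumed vocabulary/sequence sizes

tokenizer = Tokenizer(num_words=MAX_WORDS)
tokenizer.fit_on_texts(train_tweets)            # cleaned tweets (assumed to exist)
X = pad_sequences(tokenizer.texts_to_sequences(train_tweets), maxlen=MAX_LEN)

# Load GloVe vectors: each line is "word v1 v2 ... v100"
glove = {}
with open("glove.6B.100d.txt", encoding="utf8") as f:
    for line in f:
        parts = line.split()
        glove[parts[0]] = np.asarray(parts[1:], dtype="float32")

# Rows of this matrix become fixed word weights in the Embedding layer
embedding_matrix = np.zeros((MAX_WORDS, EMB_DIM))
for word, idx in tokenizer.word_index.items():
    if idx < MAX_WORDS and word in glove:
        embedding_matrix[idx] = glove[word]
```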
3.3 Convolutional Neural Network A one-dimensional convolutional neural network (1DCNN) performs convolutions of matrices whose results are combined for further training; that is why this type of network is called a convolutional neural network (Fig. 1). The word vectors described above are used to train the 1DCNN. Training is performed by selecting a kernel size and a number of filters. CNNs can be multidimensional, but one-dimensional convolution (Conv1D) is used most often for text classification and NLP, since it operates over the one-dimensional sequences of word vectors. In a CNN, the filter slides along a window over the training data; at each step the input is multiplied element-wise by the filter weights and the products are summed to produce the output. The filter size is defined by the kernel size, and the chosen number of filters determines how many feature maps will be produced. In this way, the 1DCNN can discover local features learned directly from the training data.
Fig. 1 Illustrating the operation of the 1D convolutional network
3.4 Recurrent Neural Networks (RNN) Because the output of each step is used as input for the following step, the method is known as recurrent. The internal loop is responsible for ensuring that no information is lost during this cascading process. A single LSTM cell operates in an integrated fashion from the input stage to the output stage: there is an input gate for the current input, an output gate for projecting the value, and a forget gate that is used to discard irrelevant details (Fig. 2).
Fig. 2 Explanation of the LSTM cell simplified
3.5 Proposed Hybrid Model (1DCNN and LSTM) The output of the CNN layer (i.e., the feature maps) is the input of the RNN layer, namely the LSTM cells that follow it. The RNN layer takes the local features extracted by the CNN and learns the long-term dependencies among the local features of posts that identify them as new events or not (Fig. 3). The combination benefits from the ability of the 1DCNN to detect local features and of the LSTM to model sequential features. For NLP tasks, the RNN-LSTM can capture temporal and contextual features from text, as well as long-range associations between text entities and important properties, which are found using the 1DCNN's ability to handle spatial relationships. The CNN handles the high weights while the LSTM handles the low weights, so the hybrid model benefits from a balance of weights. Despite these benefits, deep learning models have practical limitations, such as the difficulty of finding appropriate hyperparameters for each problem and dataset [17], the need for large training sets, and the lack of interpretability, which has a direct impact on their performance on new and unseen tasks and makes them behave like a black box. The success of the CNN-LSTM combination has nevertheless been verified in many classification and regression tasks, since it is able to capture both the local and the sequential properties of the input data. Recent developments in biology-inspired approaches make it possible to improve deep learning further and form the basis for the next generation of optimization methods. The proposed hybrid design can benefit greatly from hyperparameter optimization, and part of the work in this area is to consider the various techniques inspired by previous studies and find the most suitable one for the current task.
Fig. 3 Proposed design hybrid between (CNN and LSTM)
3.6 Pre-processing Stage Pre-processing is the next step of the proposed design. The most important operations at this stage are text normalization, stop-word removal, and lemmatization, followed by tokenization. The first step, text normalization, deletes all kinds of brackets, converts uppercase letters to lowercase, and converts numbers from digit form to letter form. The second step, stop-word removal, deletes the linking words between sentences. The last step is lemmatization, which essentially reduces words to their root or base form. Finally, we apply tokenization, the last step of pre-processing, which helps us break the textual material into individual words.
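A minimal sketch of this pre-processing pipeline is shown below, assuming an NLTK-based implementation (the digit-to-word conversion step is omitted for brevity, and the NLTK punkt, stopwords, and wordnet resources are assumed to be installed).

```python
# Assumed pre-processing sketch: normalization, stop-word removal,
# lemmatization and tokenization of a single tweet.
import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(tweet: str) -> list:
    text = tweet.lower()                             # uppercase -> lowercase
    text = re.sub(r"[\(\)\[\]\{\}]", " ", text)      # delete brackets of all kinds
    text = re.sub(r"[^a-z\s]", " ", text)            # keep letters only (simplified)
    tokens = word_tokenize(text)                     # break text into words
    tokens = [t for t in tokens if t not in stop_words]
    return [lemmatizer.lemmatize(t) for t in tokens]

print(preprocess("RT @user: These HATEFUL tweets (2021) must be flagged!!"))
```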
3.7 Feature Extraction Stage In the proposed model, the third stage is feature extraction; its purpose is to extract the most important words that help in classification and to separate hateful or prominent words from the entered tweets. In this work, several techniques were used to extract words, including CountVectorizer and word embeddings, with max_words defining the maximum number of words considered (the search is performed with word_tokenize and CountVectorizer). In order to use text data for predictive modeling, the text must be parsed to remove certain words, as described in the pre-processing section, and these words must then be encoded as integers, or floating-point values, to be used as inputs to the classification algorithms. The pre-trained embeddings were built from a dataset of 5 billion words or tokens with a vocabulary of 4,000,000 words and are available in a range of dimensions.
3.8 Attribute Extraction Stage The proposed model consists of five layers, comprising the input, embedding, and output layers among others. The LSTM layer maintains separate sets of parameters for the input gate, output gate, forget gate, and cell state. Ignoring the biases, the number of LSTM weights can be written as w = n_c × n_c × 4 + n_i × n_c × 4 + n_c × n_o + n_c × 3, where n_c is the number of memory cells (= 49) and n_i is the number of input units (= 64). The sequential model includes the following layers of neurons:
• The first layer of the network is the TensorFlow embedding layer. This is the input layer, where the pre-trained word embeddings are used by supplying the prepared embedding matrix, and the learned representation is fed directly from the training data. The next layer is a one-dimensional CNN layer (Conv1D) that extracts local features using 64 filters of size 3; the default rectified linear unit (ReLU) activation function is used. • The large feature vectors produced by the 1DCNN are then aggregated by passing them through a max-pooling layer with a window size of 2 to reduce the instance vectors and cut down unnecessary parameters and computations without affecting the performance of the network. The aggregated feature maps are fed directly into the following LSTM layer, which uses TensorFlow's default linear activation function (f(x) = x). Finally, a dense layer reduces the dimension of the output space to 1, which matches the class label; this layer uses the sigmoid activation function while keeping the weights low. A minimal Keras sketch of this stack is given below.
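The following Keras sketch shows this stack. It is illustrative rather than the authors' exact configuration: the vocabulary size, sequence length, optimizer, and LSTM width of 49 memory cells are assumed from the values mentioned above, and the embedding matrix is the one prepared in Sect. 3.2.

```python
# Illustrative hybrid 1DCNN-LSTM stack (assumed hyperparameters).
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, MaxPooling1D, LSTM, Dense

MAX_WORDS, MAX_LEN, EMB_DIM = 20000, 50, 100        # assumed, as in Sect. 3.2
embedding_matrix = np.zeros((MAX_WORDS, EMB_DIM))   # GloVe matrix from Sect. 3.2

model = Sequential([
    Embedding(MAX_WORDS, EMB_DIM, weights=[embedding_matrix],
              input_length=MAX_LEN, trainable=False),   # pre-trained GloVe weights
    Conv1D(filters=64, kernel_size=3, activation="relu"),  # 64 filters of size 3
    MaxPooling1D(pool_size=2),        # window size 2, shrinks the feature maps
    LSTM(49),                         # 49 memory cells (value taken from Sect. 3.8)
    Dense(1, activation="sigmoid"),   # hate / clean decision
])
model.compile(optimizer="adam",                       # optimizer assumed
              loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```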
4 Results of the Proposed Event Detection System The event detection system for the proposed social network aims to classify new tweets. In our study we used a hate speech dataset [17], in which tweets are labeled as hate or clean, and we selected a suitable feature extraction method to support the classification model. To diagnose tweets more accurately and achieve these goals, the proposed system first applied the vector-based feature extraction technique together with word embeddings, and we then compared and evaluated the results of the deep learning branches, based on the accuracy value, to identify the best classification design for the input tweets. The results of each step of the proposed event detection system are detailed in the subsections below:
4.1 Load Dataset The proposed event detection system uses the hate speech dataset downloaded from the official Kaggle website. It consists of two files: one contains 31,961 tweets used for training, and the other contains 17,197 tweets used for testing (Table 1).
Table 1 Description of the Hate_Speech_Dataset
Field name     | Training                                            | Testing
Id             | Serial number                                       |
Labels         | Tweet category: 1 = hate tweet and 0 = clean tweet  | No label for testing
Tweets         | The tweet material                                  | The tweet material
No. of records | 31,961                                              | 17,197
Table 2 Evaluation of the hybrid (1DCNN and LSTM) model
Model                  | Accuracy | Recall | Precision | F1 score | Confusion matrix
Hyper (1DCNN and LSTM) | 93.01    | 0.947  | 90.66     | 0.481    | 937 (TP), 0 (FP), 456 (TN), 0 (FN)
Table 3 Comparison with previous works
Model                         | Accuracy | F1 score
Jiang and Suzuki [13]:        |          |
  LSTM HD                     | 87.00    | 38.36
  LSTM LD                     | 89.22    | 40.62
  LSTM stacking               | 87.15    | 39.90
  GRU stacking                | 88.56    | 39.68
Proposed model:               |          |
  Hyper (CNN-LSTM)            | 93.01    | 48.15
4.2 Results of Classification Stage The proposed model is a hybrid of CNN and LSTM; its results after training and testing on the data are shown in Table 2, which reports the accuracy, F1 score, recall, and precision. A comparison with other works that used the same dataset is shown in Table 3. The accuracy of the hybrid model is 93.01, which shows that it gives better results than the previous work.
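The reported figures could be reproduced along the following lines (an illustrative sketch only; a labeled validation split `X_val`, `y_val` of the training file and a 0.5 decision threshold are assumed, since the test file itself carries no labels).

```python
# Illustrative evaluation sketch; `model`, `X_val` and `y_val` are assumed.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_prob = model.predict(X_val)
y_pred = (y_prob.ravel() >= 0.5).astype(int)

print("Accuracy :", accuracy_score(y_val, y_pred))
print("Precision:", precision_score(y_val, y_pred))
print("Recall   :", recall_score(y_val, y_pred))
print("F1 score :", f1_score(y_val, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_val, y_pred))
```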
5 Conclusions In this research, we present a hybrid model-based classifier (CNN and LSTM) in which the 1DCNN serves as the pre-processing of the raw data for event detection, exploiting the LSTM design's suitability for time-series data. To show
the potential power of the proposed method, the two neural network-based components of our system were additionally evaluated in two use cases (binary classification and multi-class classification) and on two kinds of time-series data (synthetic and real data), respectively. Our strategy uses deep learning techniques to adapt to extended domains where features are learned during training, and the accuracy of the extracted results for the hybrid design was 93.01, which is better than the previous work reviewed. In the future, we will integrate more details, such as geographical information and sentiment analysis, into event detection and extend our study to include various events. Acknowledgements The authors would like to thank Mustansiriyah University (http://uomustansiriyah.edu.iq), Baghdad, Iraq, for its support of the present work.
References
1. Chatfield AT, Brajawidagda U (2013) Twitter early tsunami warning system: a case study in Indonesia's natural disaster management. In: 2013 46th Hawaii international conference on system sciences. IEEE, pp 2050–2060
2. Avanti M, Cresci S, La Polla MN, Marchetti A, Tesconi M (2014) Earthquake emergency management by social sensing. In: 2014 IEEE international conference on pervasive computing and communication workshops (PERCOM WORKSHOPS). IEEE, pp 587–592
3. Sakaki T, Okazaki M, Matsuo Y (2012) Tweet analysis for real-time event detection and earthquake reporting system development. IEEE Trans Knowl Data Eng 4(25):919–931
4. Chen Y, Xu L, Liu K, Zeng D, Zhao J (2015) Event extraction via dynamic multi-pooling convolutional neural networks. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing, vol 1: Long Papers, pp 167–176
5. Nguyen TH, Grishman R (2015) Event detection and domain adaptation with convolutional neural networks. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing, vol 2: Short Papers, pp 365–371
6. Wang Z, Zhang Y (2017) A neural model for joint event detection and summarization. In: IJCAI, pp 4158–4164
7. Xing C, Wang Y, Liu J, Huang Y, Ma WY (2016) Hashtag-based sub-event discovery using mutually generative lda in twitter. In: Proceedings of the AAAI conference on artificial intelligence, vol 30, no 1
8. Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes twitter users: real-time event detection by social sensors. In: Proceedings of the 19th international conference on World wide web, pp 851–860
9. Meladianos P, Xypolopoulos C, Nikolentzos G, Vazirgiannis M (2018) An optimization approach for sub-event detection and summarization in twitter. In: European conference on information retrieval. Springer, Cham, pp 481–493
10. Ahmed H, Traore I, Saad S (2018) Detecting opinion spams and fake news using text classification. Secur Privacy 1(1):e9
11. Zhou X, Jain A, Phoha VV, Zafarani R (2020) Fake news early detection: a theory-driven model. Dig Threats Res Pract 2(1):1–25
12. Nguyen DT, Jung JE (2017) Real-time event detection for online behavioral analysis of big social data. Future Gen Comput Syst 66:137–145
13. Jiang L, Suzuki Y (2019) Detecting hate speech from tweets for sentiment analysis. In: 2019 6th International conference on systems and informatics (ICSAI). IEEE, pp 671–676
14. Zhou X, Jain A, Phoha VV, Zafarani R (2020) Fake news early detection: a theory-driven model. Dig Threats Res Pract 1(2):1–25
15. Sarafanov M, Borisova Y, Maslyaev M, Revin I, Maximov G, Nikitin NO (2021) Short-term river flood forecasting using composite models and automated machine learning: the case study of Lena river. Water 13:3482. https://doi.org/10.3390/w13243482
16. Graf R, Kolerski T, Zhu S (2022) Predicting ice phenomena in a river using the artificial neural network and extreme gradient boosting. Resources 11:12. https://doi.org/10.3390/resources11020012
17. https://www.kaggle.com/vkrahul/twitter-hate-speech (data link)
Automated Machine Learning Deployment Using Open-Source CI/CD Tool Ashish Singh Parihar, Umesh Gupta, Utkarsh Srivastava, Vishal Yadav, and Vaibhav Kumar Trivedi
Abstract This paper proposes applying the DevOps culture to machine learning in order to build and deploy models rapidly and seamlessly. Without leveraging and adopting the power of DevOps, no industry can expect to survive today. Industries depend on historical data, on extracting the important information from it, and on building and training models that help their companies grow; this practice is what we today call ML. There is therefore no possibility of surviving without being quick with ML; however, ML models take time to develop, and to overcome that time cost, MLOps plays a vital role in industry today, allowing models to be built fast so that companies can grow fast. Machine learning processes initially seem easy, but if they are not carefully handled and designed, creating and deploying such models may lead to a huge loss of time and resources, degrading the overall performance and efficiency of the system. This paper presents an applicable model of open-source continuous integration (CI) and continuous delivery (CD) principles and tools to minimize the time wasted on provisioning system resources. Through our methodological results, we observed that the model improves time efficiency, reduces the effort required from data scientists, and cuts costs by avoiding heavily priced MLOps tools. Keywords MLOps · DevOps · Jenkins · CI/CD · Machine learning · Automation
U. Srivastava · V. Yadav · V. K. Trivedi Department of Computer Science, KIET Group of Institutions, Delhi-NCR, Uttar Pradesh, Ghaziabad, India e-mail: [email protected] V. K. Trivedi e-mail: [email protected] A. S. Parihar (B) Department of Computer Science & Engineering and Information Technology, Jaypee Institute of Information Technology, Noida-Sector (62), Uttar Pradesh, India e-mail: [email protected]; [email protected]; [email protected] U. Gupta Department of Computer Science and Engineering, Bennett University, Noida, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_19
1 Introduction In today's era, machine learning (ML) [1] has become crucial for industries, along with the enormous amounts of data they must deal with. By replacing the older methods [2] of project development, for example the waterfall model, DevOps made project development fast. DevOps [3, 4] stands for development and operations. It is a culture that aims to bring development, quality assurance, and operations teams into a single, continuous set of processes. It is a combination of cultural philosophies, tools, and practices that joins development and operations to increase security, speed, and efficiency. MLOps [5–7] is short for ML and operations: a sequence of automated steps to deploy an ML model reliably and efficiently. By using an MLOps approach, data scientists and data engineers can increase the speed of model development and deployment through continuous integration and continuous deployment with proper validation, monitoring, and governance. A great deal of software has been developed in the last few decades; one example is Jenkins, which revolutionized automation methodology. Machine learning without automation would take a long time: every time we decide to change a small parameter, we would have to train and deploy the model again, manually. With Jenkins, the workflow can be automated in a snap, and the deployment process becomes a single button click away. DevOps continues to gain popularity, and every tech giant depends completely on integration and automation [8, 9]; the DevOps lifecycle is shown in Fig. 1. The figure shows the various phases of the lifecycle, from planning to creating software, packaging, releasing, and monitoring; this process keeps continuing, and hence the DevOps symbol looks like an infinity symbol. Think of Netflix or Meta: if they did not use automation and integration, they would lose their identity. These platforms, which are distributed in nature, have very powerful artificial intelligence (AI) and ML models that give us recommendations, and complex distributed resource-sharing schemes [10–12] are designed for such systems [13, 14]. Millions of users go online every second on Meta, and it handles all the load smartly and reliably by leveraging the power of DevOps and automation. Automation is therefore a crucial part of the growth of any company: since the market moves very fast, to capture the market one must get there before anybody else. MLOps is the tool that helps us move to market at a low operational cost and in very little time. Keeping the requirements of data scientists in mind, we developed this model to reduce their automation challenges. Hence, seeing how crucial automation is and how important it is for tech giants to move with speed, this project was done with a vision to accomplish automation in machine learning models. The objective is to make automation very smooth and easy for everyone, whether IT giants or individual developers.
Fig. 1 DevOps lifecycle
Increasing demands for speed make it very important for developers to know automation and leverage it. This project sets out how to automate and make the process fast and smooth without much investment or effort; it reduces development time and hence time to market. The rest of the article is organized as follows: Sect. 2 presents a compact literature survey of this area. Section 3 describes our methodology, covering the learning rate and other hyperparameters of the model along with a discussion of neural layers and the associated results. Finally, Sect. 4 presents the concluding remarks and future directions.
2 Background

Jenkins is a Continuous Integration (CI)/Continuous Delivery/Deployment (CD) tool. It is an open-source automation server that enables developers to build and deploy reliably and efficiently. It runs on default port 8080 and provides more than 1700 plugins that make it compatible with virtually any tool, and it continues to grow as a solution for software process automation through continuous integration and deployment. A pipeline is a collection of jobs interlinked in sequence; Jenkins pipelines are used for implementing CI/CD and are written in the Groovy language [15]. Kubernetes [16] is a container orchestration tool that automates the deployment, scaling up, scaling down, and management of containerized applications. Kubernetes is a great tool for deploying machine learning models to end users; it is very efficient and helps models scale independently of their dependencies. Using a web service for prediction is the simplest way to deploy a machine learning model, as sketched below. While exploring the existing work in this domain, we observed that researchers have applied various DevOps practices to machine learning development by addressing the machine learning lifecycle; we highlight three main works from the entire pool of studies for quick reference.
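To make this concrete, the following is a minimal sketch of such a prediction web service in Flask. It is not the code used in this work; the model.pkl artifact and the /predict route are illustrative assumptions.

```python
# Minimal sketch of a prediction web service (not the authors' code).
# Assumes a scikit-learn style model has been pickled to "model.pkl";
# the file name and the /predict route are illustrative choices.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:   # hypothetical artifact produced by the training job
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body such as {"features": [[5.1, 3.5, 1.4, 0.2]]}
    payload = request.get_json(force=True)
    prediction = model.predict(payload["features"])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    # A container running this service is what Kubernetes would scale behind a Service.
    app.run(host="0.0.0.0", port=5000)
```

Packaging such a service into a Docker image and placing it behind a Kubernetes Deployment is the deployment pattern assumed in the rest of this paper.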
Karamitsos et al. [17] proposed a model whose main objective is to bring together the various principles of continuous integration and continuous deployment in a coherent way for complex machine learning pipelines. They discussed the importance of integrating ML with DevOps and described a robust pipeline responsible for training and delivering a trained and tested ML model. It improves time to deployment, and hence time to market, and increases code and deployment quality, productivity, and visibility. Liu et al. [18] proposed an OCDL platform that is entirely built using open-source tools and provides both the runtime and an integrated development environment (IDE) for operations to develop the model, approve the result, evaluate and release models, and eventually deploy models to the host applications; with the help of a CI/CD approach they proposed both the runtime and the operation platform. Mysari and Bejgam [19] state that Jenkins with pipelines is a methodology for integrating and building. They discussed how crucial Jenkins is for big companies in the field of automation: it is a very powerful CI/CD tool, easy to configure, open-source, user-friendly, platform-independent, and very flexible. Based upon their conclusions, we decided to integrate machine learning with CI/CD to bring in automation. Their work discussed a pipeline to deploy ML code using Ansible, Jenkins, and JFrog, and they built a complete Jenkins job pipeline to automate the task of deployment (Table 1).

Table 1 Representation of already existing models

Algorithm | Methodology | Performance evaluation | Applications
[16] | In this, a model is built by integrating the principles of CI/CD tools coherently for ML pipelines | It is an OCDL tool that is built using CI/CD and provides a runtime and IDE (integrated development environment) for operations to build and approve the model | The model uses an open-source CI/CD tool and leverages its power
[17] | OCDL platform | Approving the result and releasing evaluation | Deployment of models
[18] | Used Jenkins as a CI/CD tool to build an automation process for deploying an ML model | It is open-source, user-friendly, platform-independent, saves a lot of time, and also helps the developer to build and test on the fly | The model uses Jenkins as an open-source tool for CI/CD and Kubernetes for the deployment
3 Methodology

The methodology we follow is CI/CD using Jenkins. The manual ML pipeline methodology is very popular and widely used today: every single step, from data analysis to data preparation, model training, and validation, is carried out by hand. A deep learning architecture, moreover, involves various hyperparameters. Hyperparameters are variables that need to be set before applying a learning algorithm to a dataset and are used to control the learning process; they are specified by data scientists. These hyperparameters are divided into two categories:

(1) Optimizer hyperparameters
• Learning rate
• Number of epochs
• Batch size

(2) Model-specific hyperparameters
• Number of hidden layers
• Number of nodes in each layer

(i) Learning rate [20, 21] is a parameter that controls how much the model is changed in response to the calculated error each time its weights are updated. Figure 2 depicts three zones: first, when the learning rate is extremely low, accuracy does not improve; second, in the optimal learning rate range, the loss drops quickly; and third, on increasing the rate further, the loss diverges. Hence, the best learning rate must be chosen for the best accuracy. It is the most important hyperparameter when configuring a neural network (NN). Let us first define the basic terms related to NNs.
Fig. 2 Variations with learning rate
(ii) An epoch refers to one full training cycle of the NN over the data. Figure 3 shows how train and test accuracy vary with the number of epochs: initially, accuracy increases with increasing epochs, then slowly decreases, so we can conclude that selecting the right number of epochs is crucial. One epoch may consist of one or more batches, and the batch size is the number of samples processed before the model is updated.

(iii) Batch size generally refers to the number of training samples utilized in one iteration. In this study, we looked into how the size of the batch affected the trained model. In recent deep learning systems, batch size is one of the most significant hyperparameters to consider. Most of the time, data scientists prefer to train their models with larger batches because it speeds up GPU computations. Larger batch sizes, though, might result in poorer generalization. Since NN systems are highly susceptible to overfitting, it is not fully understood how the "noise" present in smaller batches contributes to better generalization. Practically speaking, figuring out how many epochs are needed to get the highest accuracy is quite laborious, and data scientists have to rely on trial and error or their best guess.

Figures 3 and 4 demonstrate how the model's test and training accuracy vary with batch size, respectively. The y-axis displays accuracy, the x-axis displays the epochs, and the various batch sizes are displayed in different colors. In brief, they show the test accuracies of our NN model trained with various batch sizes, from which we conclude that choosing the ideal batch size is highly important.

• Blue curves: batch size of 512
• Orange curves: batch size of 128
• Purple curves: batch size of 2048

In light of Figs. 4 and 5, we came to the conclusion that larger batch sizes typically result in lower test accuracy (and higher loss). The number of training epochs is depicted on the graphs' x-axis.

Fig. 3 Variation with epochs
Fig. 4 MNIST dataset model training accuracy and loss (with batch size)
To train and test our model, we use the MNIST dataset [22], which is simple to work with. With just a very simple ML model at batch size 128, we could reach nearly 100% train accuracy and approximately 98% test accuracy. Here, the relationship between batch size and asymptotic test (and train) accuracy was quite easy to see: our initial finding was that larger batch sizes reduce asymptotic test accuracy. Table 2 shows how accuracy varies with batch size; initially it increases with batch size, but after passing a certain threshold it decreases. These patterns are most extreme for the MNIST dataset; in our case, we also tried a batch size of 2 and, surprisingly, achieved a test accuracy of 99%, even better than the 98% obtained with batch size 128.
Fig. 5 MNIST dataset model test accuracy and loss (with batch size)
Table 2 Representation of variation of accuracy with batch size
Batch size    Accuracy (%)
128           98
512           96
2048          94
3.1 Effect of Learning Rate with Batch Size

Figures 6 and 7 show training accuracy and loss, and test accuracy and loss, respectively. The x-axis shows the learning rate, the y-axis shows accuracy, and the variations with batch size are shown in different colors; through these figures we examine the importance of learning rate and batch size as hyperparameters together. The test accuracy lost with a larger batch size can be recovered by increasing the learning rate; compensating for larger batch sizes in this way is also supported by some of the optimization literature. Accordingly, to recover the lost asymptotic test accuracy, we ramp up the learning rate of our model.

• Orange curves: batch size of 128, learning rate 0.02
• Purple curves: batch size of 2048, learning rate 0.02
• Blue curves: batch size of 2048, learning rate 0.2

The purple and orange curves in Figs. 6 and 7 are copies of the curves in the preceding set of figures and are included for reference only. The blue curve trains with the same huge batch size of 2048 as the purple curve.
Fig. 6 MNIST model training accuracy, loss with learning rate and Batch Size
Fig. 7 MNIST model test accuracy, loss with learning rate and Batch Size
Table 3 Representation of variation of accuracy with batch size and learning rate

Learning rate    Batch size    Accuracy (%)
0.02             128           98
0.02             2048          95
0.2              2048          98
In contrast, the blue curve has a tenfold higher learning rate than the purple curve. Fascinatingly, we can restore the test accuracy that was lost due to a bigger batch size by speeding up learning: a batch size of 128 (orange) yields a test accuracy of 98%, whereas a batch size of 2048 yields only 96%. However, when we raise its learning rate, the batch size of 2048 also obtains 98% accuracy. Looking at Table 3, which illustrates how model accuracy varies with different combinations of learning rate and batch size, we conclude that choosing the optimum learning rate and batch size together is highly important.
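As a rough illustration of the kind of sweep summarized in Tables 2 and 3 (not the authors' exact network or training setup), the following Keras sketch trains a small dense model on MNIST for a few (learning rate, batch size) combinations; the optimizer (SGD), layer sizes, and epoch count are assumptions.

```python
# Sketch of a (learning rate, batch size) sweep on MNIST; the architecture,
# optimizer and epoch count are illustrative assumptions, not the paper's model.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

def build_model(learning_rate):
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),   # hidden layer (model-specific hyperparameter)
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# (learning rate, batch size) pairs mirroring the curves discussed above
for lr, batch_size in [(0.02, 128), (0.02, 2048), (0.2, 2048)]:
    model = build_model(lr)
    model.fit(x_train, y_train, epochs=10, batch_size=batch_size, verbose=0)
    _, test_acc = model.evaluate(x_test, y_test, verbose=0)
    print(f"lr={lr}, batch={batch_size}: test accuracy={test_acc:.4f}")
```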
3.2 Neural Layers

An artificial neural network (ANN) [23] is inspired by biological systems and is also useful in blockchain applications [24]. In a computer, it is realized as layers. Generally, neural networks have three types of layers: input, hidden, and output. Two important hyperparameters regulate the architecture of the network, namely the number of neural layers and the number of nodes in each hidden layer. Data scientists must specify these values when creating a model, and the best way to configure them is through systematic experimentation. More layers generally imply a greater predictive capability of the model. To obtain a good model with the desired accuracy, all the above-mentioned hyperparameters need to be configured based on this predictive approach and the experience of the data scientist. Each time the model is trained and its accuracy compared, it may or may not produce the desired accuracy; the data scientist then makes changes such as increasing the number of epochs, adding additional neural layers, or changing the learning rate. If all these tasks are done manually, they consume a lot of the data scientist's time. Figure 8 shows the traditional ML workflow: the dataset is analyzed, the model is created, and accuracy is monitored; if undesirable accuracy comes up, the model is remodeled and retrained, continuously, until the desired accuracy is achieved. Consequently, to resolve this issue of the manual approach, we propose our model to fully automate all the above processes with CI/CD tools.
Fig. 8 Traditional workflow of deep learning model
3.3 Our Proposed Model

As Fig. 8 shows, the traditional ML lifecycle is very time-consuming. To overcome the drawbacks of this manual approach, we have automated the pipeline using DevOps. Figure 9 shows our approach to automating the ML lifecycle, making every phase agile, from collecting the data and creating the model to training and re-training until the desired accuracy is achieved, using open-source tools like Git, Jenkins, and Docker. The model is built with Jenkins as a chain of jobs for training: the machine learning model is trained, tested, and governed continuously. If the model has low accuracy at the initial stage, its hyperparameters are reset and it is trained again. This process continues until the model reaches the desired accuracy, and finally the model is deployed.
3.4 Task of the Data Scientist

With reference to Fig. 9, the data scientist first writes a Dockerfile to create a customized Docker image; the image acts as an environment for training the model with all necessary libraries installed, so that no dependency problem occurs when the model is trained on another system. Second, the data scientist pushes the code to GitHub along with the Dockerfile and the tweak file. The tweak file contains predefined values of the hyperparameters; in case of undesired accuracy, the hyperparameter values in the original code are reset using this file.
Fig. 9 Our proposed workflow of machine learning using CI/CD
3.5 Task of CI/CD

When the data scientist creates an ML model and pushes it to GitHub, webhooks trigger Jenkins, and Jenkins starts processing its pipeline. The first job pulls the code from GitHub and stores it in a repository on the local system. The second job builds the Dockerfile and pushes the image to Docker Hub. The third job launches a Docker container using the previously created image. Upon completion of the third job, the fourth job is triggered in the queue: it trains the model inside the container and saves the accuracy in the accuracy.txt file; on its completion, the fifth job is triggered. The fifth job checks whether the accuracy is equal to our desired accuracy. If it is, the sixth job deploys the model using Kubernetes and sends a notification to the developer. Otherwise, the hyperparameters are reset according to the tweak file, additional NN layers are added to the model, and control returns to the fourth job to retrain. The process continues until a model with the desired accuracy is created. A sketch of the accuracy-check step is given below.
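The following is a hedged sketch of what the fifth job's script could look like. accuracy.txt is named in the text; tweak.json, params.json, and the 0.95 threshold are illustrative assumptions rather than the authors' actual implementation.

```python
# Sketch of the accuracy-check step (the fifth job in the pipeline above).
# accuracy.txt comes from the training job; tweak.json, params.json and the
# 0.95 threshold are illustrative assumptions.
import json
import sys

DESIRED_ACCURACY = 0.95

with open("accuracy.txt") as f:
    accuracy = float(f.read().strip())

if accuracy >= DESIRED_ACCURACY:
    print("Desired accuracy reached; proceed to the deployment job.")
    sys.exit(0)          # exit code 0 -> Jenkins triggers the Kubernetes deploy job

# Otherwise reset hyperparameters from the tweak file and signal a re-train.
with open("tweak.json") as f:
    tweaks = json.load(f)        # e.g. {"epochs": 20, "batch_size": 128, "extra_layers": 1}

with open("params.json", "w") as f:
    json.dump(tweaks, f)         # the training job reads this file on its next run

print(f"Accuracy {accuracy:.3f} below target; hyperparameters reset, re-training.")
sys.exit(1)                      # non-zero exit lets Jenkins loop back to the training job
```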
3.6 Industrial Significance of the Proposed Model

Automation has become a crucial part of every industry in one way or another, be it food, clothing, or IT. With the launch of CI/CD tools in the
market, the fate of IT automation changed. Releasing software can be painful and time-consuming, sometimes taking weeks of manual integration, configuration, and testing, but CI/CD enables organizations to release software more frequently without compromising quality. With pipelines, we no longer have to perform repetitive builds, tests, and deployments by hand. CI/CD is beneficial in various respects: it allows us to integrate small pieces of code at a time, which are easier to test than huge chunks; it provides a fast MTTR (mean time to resolution), a measure of the maintainability of repairable features that tracks the time needed to recover from failure; and it offers more reliable testing, faster release rates, smaller backlogs, easier maintenance, lower costs, and better customer satisfaction. With respect to our proposed methodology, we have created an MLOps platform using open-source CI/CD tools and built a pipeline in which a data scientist does not need to invest a huge amount in training and re-training an ML model. It also saves the data scientist a lot of time; development is fast, so time to market of the product is very short. It has so far been tested on small use cases, but it is efficient and viable to implement.
4 Conclusion

With the traditional approach to machine learning, training a single model took a huge amount of time and effort from developers. A machine learning model is quite complex and requires extensive work for data extraction, configuration, and training. Therefore, agile values and a DevOps culture are highly recommended to deliver customer value, improve model quality, and reduce wastage. As discussed in the paper, to train a model we need to define all the hyperparameters based on experience or trial and error, and we saw how crucial they are for an ML model; doing this manually is a tedious and highly time-consuming task for the developer. We realize how important it is to utilize the power of DevOps for the speedy development of machine learning models. After going through the tools available in the market that integrate ML with DevOps (MLOps tools), we decided to use open-source tools that are free and reliable and to build our own MLOps platform at no cost. In this paper, we presented an ML pipeline built on a CI/CD pipeline based upon the DevOps approach. It is a model that helps data scientists save both time and money. We integrate machine learning with the Jenkins pipeline methodology to train the model; it is easy to configure, reliable, user-friendly, platform-independent, and time-saving, and it is formed entirely from open-source tools. The growing demand for ML today gave birth to MLOps and makes us believe in the success of our proposed model. The model has been tested on a small scale; we expect it to work for large-scale models as well, and it is reliable, open-source, and user-friendly. For future research, we aim to make it reliable at large scale for large ML problems.
References 1. Jordan MI, Mitchell TM (2015) Machine learning: trends, perspectives, and prospects. Science (1979) 349(6245):255–260. https://doi.org/10.1126/science.aaa8415 2. Wen J, Li S, Lin Z, Hu Y, Huang C (2012) Systematic literature review of machine learning based software development effort estimation models. Inf Softw Technol 54(1):41–59. https:// doi.org/10.1016/j.infsof.2011.09.002 3. Ebert C, Gallardo G, Hernantes J, Serrano N (2016) DevOps. IEEE Softw 33(3):94–100. https:// doi.org/10.1109/MS.2016.68 4. Leite L, Rocha C, Kon F, Milojicic D, Meirelles P (2020) A survey of devops concepts and challenges. ACM Comput Surv 52(6):1–35. https://doi.org/10.1145/3359981 5. Alla S, Adari SK (2021) What Is MLOps? Beginning MLOps with MLFlow, Berkeley, CA: Apress, pp 79–124. https://doi.org/10.1007/978-1-4842-6549-9_3 6. Tamburri DA (2020) Sustainable MLOps: trends and challenges. In 2020 22nd International symposium on symbolic and numeric algorithms for scientific computing (SYNASC), pp 17– 23. https://doi.org/10.1109/SYNASC51798.2020.00015 7. Renggli C, Rimanic L, Gürel NM, Karlaš B, Wu W, Zhang C (2021) A data quality-driven view of MLOps 8. Bang SK, Chung S, Choh Y Dupuis M (2013) A grounded theory analysis of modern web applications. In Proceedings of the 2nd annual conference on research in information technology—RIIT ’13, p 61. https://doi.org/10.1145/2512209.2512229 9. de Bayser M, Azevedo LG, Cerqueira R (2015) ResearchOps: The case for DevOps in scientific applications. In: 2015 IFIP/IEEE international symposium on integrated network management (IM), pp 1398–1404. https://doi.org/10.1109/INM.2015.7140503 10. Parihar AS, Chakraborty SK (2021) Token-based approach in distributed mutual exclusion algorithms: a review and direction to future research. The Journal of Supercomputing 77(12):14305–14355. https://doi.org/10.1007/s11227-021-03802-8 11. Parihar AS, Chakraborty SK (2022) A simple R-UAV permission-based distributed mutual exclusion in FANET. Wireless Networks 28(2):779–795. https://doi.org/10.1007/s11276-02202889-y 12. Parihar AS, Chakraborty SK (2022) Handling of resource allocation in flying ad hoc network through dynamic graph modeling. Multimedia Tools Appl 81(13):18641–18669. https://doi. org/10.1007/s11042-022-11950-z 13. Parihar AS, Chakraborty SK (2022) A new resource-sharing protocol in the light of a tokenbased strategy for distributed systems. Int J Comput Sci Eng (In press) 14. Parihar AS, Chakraborty SK (2022) A cross-sectional study on distributed mutual exclusion algorithms for ad hoc networks. In: Gupta D, Sambyo K, Prasad M, Agarwal S (eds) Proceedings of international conference on advanced machine intelligence and signal processing. pattern recognition and data analysis with applications. Springer, Singapore. (In press). https://doi.org/ 10.1007/978-981-19-1520-8_3 15. King P (2020) A history of the Groovy programming language. Proc ACM Program Language 4(HOPL) :1–53. https://doi.org/10.1145/3386326 16. Bernstein D (2014) Containers and cloud: from LXC to docker to kubernetes. IEEE Cloud Comput 1(3):81–84. https://doi.org/10.1109/MCC.2014.51 17. Karamitsos I, Albarhami S, Apostolopoulos C (2020) Applying DevOps practices of continuous automation for machine learning. Information 11(7):363. https://doi.org/10.3390/info11 070363 18. Liu Y, Ling Z, Huo B, Wang B, Chen T, Mouine E (2020) Building a platform for machine learning operations from open source frameworks. IFAC-PapersOnLine 53(5):704–709. https:// doi.org/10.1016/j.ifacol.2021.04.161 19. 
Mysari S, Bejgam V (2020) Continuous integration and continuous deployment pipeline automation using Jenkins Ansible. In 2020 International conference on emerging trends in information technology and engineering (IC-ETITE), pp 1–4. https://doi.org/10.1109/ic-ETI TE47903.2020.239
20. Zhang R, Gong W, Grzeda V, Yaworski A, Greenspan M (2013) An adaptive learning rate method for improving adaptability of background models. IEEE Signal Process Lett 20(12):1266–1269. https://doi.org/10.1109/LSP.2013.2288579 21. Parihar AS, Chakraborty SK (2022) Token based k-mutual exclusion for multi-UAV FANET. Wireless Personal Communications 126: 3693–3714. https://doi.org/10.1007/s11277-022-098 86-6 22. Xiao H, Rasul K, Vollgraf R (2017) Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms 23. Abiodun OI, Jantan A, Omolara AE, Dada KV, Mohamed NA, Arshad H (2018) State-of-theart in artificial neural network applications: a survey. Heliyon 4(11):e00938. https://doi.org/10. 1016/j.heliyon.2018.e00938 24. Parihar AS, Prasad D, Gautam AS, Chakraborty SK (2021) Proposed end-to-end automated e-voting through blockchain technology to increase voter’s turnout. In: Proceedings of international conference on machine intelligence and data science applications, pp 55–71. https:// doi.org/10.1007/978-981-33-4087-9_5
Movie Recommendation System Using Machine Learning and MERN Stack Shikhar Gupta, Dhruv Rawat, Kanishk Gupta, Ashok Kumar Yadav, Rashmi Gandhi, and Aakanshi Gupta
Abstract The entertainment industry is booming, and machine learning is playing a vital role in the technical world. Content consumption habits are growing more complicated and evolving at a faster rate than ever before. Machine learning-based recommendation systems form self-sufficient systems which learn from their experiences and improve without having to be explicitly coded. A recommendation system is a mechanism that allows a user to find information relevant to him or her from huge amounts of data. Every entertainment company uses a complex recommendation algorithm to display meaningful content to a user based on their preferences; it helps them increase sales and retain their user base. Movie recommendation systems take various approaches, such as collaborative filtering (CF), which compares users for similarity of content consumption, or content-based filtering, which uses the movie's features such as year of release, genre, and actors. A hybrid approach incorporates two or more different approaches to movie recommendation. In this paper, we present a movie recommendation system architecture that uses the MERN stack and ML and handles the cold-start problem. Keywords Machine learning · MERN stack · Collaborative filtering · MongoDB
1 Introduction A recommendation system helps customers find the items they want from the countless possibilities available in a database. Predicting a user’s rating of a product is the fundamental goal of a recommendation system. It helps the user choose the best option from a selection of alternatives. Many companies, including Netflix, YouTube, and Amazon, employ recommendation systems to improve customer experience while boosting profits. Using machine learning, we can make our system improve over time and be flexible to the changes in a person’s preferences over time. S. Gupta · D. Rawat · K. Gupta · A. K. Yadav (B) · R. Gandhi · A. Gupta Department of Computer Science and Engineering, Amity School of Engineering and Technology, Amity University Uttar Pradesh, Noida, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_20
There are various approaches to building a recommendation system. Recommendations can be made based on the movie itself, which requires a lot of information such as its year of release, genre, actors, director, ratings, and reviews from users; this type of recommendation is called content-based filtering. A person can also be recommended new movies by comparing their previously watched movies with those of other people; this method is called collaborative filtering. In this case, only the users' historical preferences on a set of products are required, as opposed to content-based filtering. It is derived from the idea that users who have watched similar content in the past are likely to do the same in the future as well. Collaborative filtering can be done by comparing either users or items. If recommendations are generated for a user based on the ratings of other users with similar tastes, it is called user-based collaborative filtering. On the contrary, if recommendations are generated for a user based on how that user has rated similar items, it is called item-based collaborative filtering, as sketched below. A hybrid approach combines two or more different approaches.
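As a generic illustration of item-based collaborative filtering (a sketch, not this paper's system), the following computes movie-to-movie cosine similarities from a toy rating matrix and predicts an unknown rating as a similarity-weighted average; the numbers are made up.

```python
# Illustrative item-based collaborative filtering on a toy user-movie rating
# matrix (rows = users, columns = movies); ratings of 0 mean "not rated".
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

ratings = np.array([
    [5, 4, 0, 1],   # user 0
    [4, 5, 0, 2],   # user 1
    [1, 0, 5, 4],   # user 2
    [0, 1, 4, 5],   # user 3
])

item_sim = cosine_similarity(ratings.T)          # movie-to-movie similarity

def predict(user, movie):
    """Predict a rating as the similarity-weighted average of the user's other ratings."""
    rated = ratings[user] > 0
    weights = item_sim[movie, rated]
    return float(weights @ ratings[user, rated] / (weights.sum() + 1e-9))

print(predict(user=0, movie=2))   # estimated rating of user 0 for movie 2
```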
2 Applications

Recommendation systems are employed in a vast variety of businesses and are used in almost all industries, such as e-commerce, entertainment, banking, and social media. Personalized recommendations are given to customers after recommendation systems collect and automatically analyze customer data. For such systems, both implicit and explicit data are crucial, including the user's browsing history, past purchases, and supplied ratings. Additional profit, customer satisfaction, engagement, and share of mind are just a few of the advantages of a recommendation system. They help users find relevant items without having to squander time searching for them. Amazon recommends products a person might need based on their previous purchases, things that are typically purchased together, and things they have searched for (that is, browsing history). In entertainment, streaming services such as Netflix, Disney's Hotstar, Amazon Prime Video, and many others use recommendation systems to recommend movies and TV shows according to the user's taste.
3 Literature Review Yang et al. [1] researched the cold-start problem. When a new item has no user ratings, it cannot be included in the similarity estimation method and cannot, therefore, be recommended to its intended audience (cold-start problem). In their paper, they proposed an approach in which the similarities between directors and actors are computed in order to improve the movie similarity measure. The similarity measure is optimized by finding similarities in directors, actors, and film genres and mapping similarity among projects to label similarity to help expand the scope of search
for new movies by a nearest neighbor algorithm. Using movie labels for new movies to discover comparable sets, it increased the recommendation rate of new movies to target users in the cold-start situation. Zhao et al. [2] also worked on improving the original user-based movie recommendation system. In user similarity calculations for movie suggestions, only the preference value was taken into account, omitting the impact of additional variables like age and gender. It is confirmed that in user-based recommendation algorithms, user similarity depends only on users' preference values on items, computed using Euclidean distance similarity, ignoring the similarity of users' fundamental attributes such as age and gender. The suggested method extends the user-based recommendation algorithm, which typically emphasizes the preference value while ignoring other variables, by adding variables like age and gender. The proposed system gave better RMSE values than the original. Pal et al. [3] worked on a content-based CF model. In their paper, they proposed a hybrid approach that considers both content-based and CF algorithms. The algorithm contains a novel way of determining whether two objects have similar content. The proposed technique uses a set matching comparator for content-based prediction, which considers the tags and genres supplied in the dataset and returns the number of common objects between two movies. Tags and genres are combined into a single set referred to as "objects." The weight of each set for a movie is calculated after obtaining the set of common objects. These are then used to generate ratings for the unrated films based on the previously compared rated films. Each movie's tags, which various people have given it, are collected into a single list, and each film's genres are attached to the same set of tags; this final list is known as the items for a specific film. Each active movie's object set is compared to every other movie's object set in the dataset, and the number of matching items is assigned to a set. The proposed system outperforms the current system of pure CF and outperforms SVD by a small margin. Agrawal and Jain [4] proposed a hybrid approach that uses SVM as a classifier and evolutionary algorithms for content-based and collaborative filtering. Three distinct "MovieLens" datasets of various sizes are used to check scalability, that is, whether systems of various sizes operate well and provide good results when the hybrid technique is used. The paper measures the results on various parameters, which are presented as graphs (memory, computing time, recall, accuracy, precision, etc.). Result comparisons reveal that the suggested approach outperforms the pure alternatives in terms of accuracy, quality, and movie recommendation system scalability. Furthermore, the proposed solution takes less time to compute than the other two pure alternatives. Cami et al. [5] worked on a content-based approach for recommending movies. The suggested method uses a user-centered framework to infer user preferences and give a good suggestion list by incorporating the content attributes of rated movies (for each user) into a Dirichlet process mixture model. They use the MovieLens dataset for their research. They consider the user profile, which consists of movie information (such as plot and genre) that the
user rated over a period. IMDb was used to construct the data for each film. Individual profiles are incorporated into the interest extraction module, and user interests are discovered; each interest represents a collection of comparable films that the customer has already chosen. Following the creation of the user model, the prediction module is utilized to generate a list of recommendations. This module determines the possibility of each new item being assigned to each category in order to determine the item's selection probability. Gao et al. [6] note that the accuracy of collaborative filtering algorithms is influenced by the preparation of the user-item rating data. Because certain variances in user-item ratings are caused by the user's state, the paper presents a weighted calculation approach to correct the user-item ratings, resulting in a normalized user-item rating matrix. They calculate the similarity between users based on the updated user-item rating matrix to produce a rating prediction. The suggested method is tested on the MovieLens (100k) dataset, and the MAE and RMSE results show that it outperforms the classic user-based CF methodology; as a result, their strategy can greatly enhance recommendation accuracy. Uyangoda et al. [7] worked on a recommendation system using the user profile as a factor to deal with cold-start issues in personalized recommendations. In their work, they used memory-based CF. They tailor each user's preferences based on feature scores acquired from past item ratings; because each user's ratings on the items they consumed are independent, the pattern they form is unique to each user. The proposed model involves three key modules:
1. Create user feature profiles based on historical records.
2. Calculate the degree of similarity between profiles using feature scores.
3. Create recommendations based on the resulting similarity matrix.
Kharita et al. [8] worked on an item-based CF approach to recommendation. The paper proposed an item-based CF approach that is dynamic and learns from positive feedback. The primary goal of any recommendation system is to forecast which goods a user will be interested in based on their preferences. When there is adequate data, recommendation systems based on CF approaches can produce an approximately precise prediction. User-based CF algorithms have been shown to be quite effective in the past at recommending items based on the preferences of the users. However, there are significant limitations, such as scalability and data sparsity, which grow as the number of people and products grows, and it is difficult to obtain relevant information in a short amount of time on a vast website. The accuracy of the proposed model was somewhat lower than that of the traditional approaches; however, it offered real-time recommendations. Darshna [9] worked on content-based and collaborative approaches for recommendation and on reducing the cold-start problem. In this study, a recommendation system is proposed using content-based filtering and a CF approach. A content-based recommendation system makes predictions based on user or item information, in addition to the user's previous interests. The content-based filtering method looks at the user's previous interest in a specific item.
The collaborative filtering technique examines a huge amount of data gathered from previous user reactions to an item in the form of ratings and suggests items to the user. This system is focused on the user-item relationship as well as a rating feedback matrix, with each element reflecting a specific rating on a specific item. For content-based recommendation, when a query on a cluster centroid is entered into the database, the cluster centroid attribute value is matched with the track attribute value; if there is a match, the item is recommended to the user. If the user's playlist is empty, the user is recommended music based on its popularity, which is how they overcame the cold-start problem. Gaspar et al. [10] worked on improving personalized recommendations. Recommender systems generate items that buyers should find interesting; however, recommenders frequently fail in the cold-start scenario, which occurs when a new item or consumer appears. They investigated the cold-start problem for a new customer in any business: they discovered the most similar consumers for a cold-start client and made recommendations based on "their" pre-trained collaborative filtering algorithm. A hybrid recommendation mechanism was presented to overcome the cold-start problem. The first step was to create a matrix representation of customer-item interactions. When a new cold-start customer arises, his first interactions are utilized to identify the consumers most similar to him. A matrix factorization model is used to make proposals for the most similar consumers, and the outcomes of these recommendations are combined to produce the final recommendation. To assess accuracy and computational performance, multiple recommendation algorithms and similarity measures were evaluated. In the early stages of client acceptance, cold-start recommendation is critical and provides e-commerce with a competitive advantage. Gupta and Katarya [11] looked at both user-based and item-based data; collaborative filtering was performed on a MovieLens benchmark dataset. They gave the results, which included a breakdown of each algorithm's performance as well as an analysis of which method produces the best outcomes. The error metrics were computed using three benchmark datasets (MovieLens 100K, 1M, and 10M), whereas the implementation metrics were computed using two benchmark datasets (MovieLens 1M and 10M). They discovered that IBCF produced better results than UBCF. It demonstrates that when suggestions are produced based on products that the user has previously liked, the efficiency of such recommendations is higher than when recommendations are made based on users who all like the same item. Sahoo et al. [12] used singular value decomposition (SVD) to create a privacy-focused recommendation featuring item-based collaborative filtering. In their article, they proposed an optimized hybrid item-based collaborative filtering recommendation model employing binary rating matrices and Jaccard similarity, together with the basic SVD approach for privacy preservation. SVD is a filtering process in which a matrix M of size x * y is decomposed into three different matrices A, B, and V such that M = ABV. Here, A and V are two orthogonal matrices, and B is a diagonal matrix whose diagonal elements are the singular values of M. The normalized matrix is separated into A, B, and V using SVD. The matrix Bk was then obtained by picking only the k biggest singular values, reducing the dimensions of the matrices A and V.
Ak, Bk, and BkVT are determined after that. The final
matrices can be utilized to find the active user's forecast. SVD has the ability to reveal the matrix's hidden structure and is mostly utilized in CF recommender systems to recognize user and item features. When the number of users exceeds the number of items, an item-based collaborative filtering recommendation system is preferable. Data sparsity, cold-start problems, shilling attacks, and privacy concerns can all impair the system's performance. Ifada et al. [13] compared and contrasted various methods for movie recommendation systems. There are two main phases of CF: movie similarity and movie rating prediction. In a hybrid approach, the benefits of content-based filtering are combined with CF. The hybrid strategy makes use of both the rating data and the movie data and consists of four main phases, i.e., processing text, assigning weights to terms, clustering, and CF. According to their empirical findings, the performance of both systems is linearly proportional to the size of the movie neighborhood. However, because the hybrid approach employs a clustering algorithm, the required neighborhood size is inherently lower than in CF. CF outperforms the hybrid technique in NDCG metrics at any top-N position and in precision, indicating that the hybrid model does not inherently improve CF in movie recommendation systems. Gupta et al. [14] worked on a CF approach for movie recommendation. The approach in this research is based on the CF technique and utilizes k-NN with cosine similarity, aiming to improve the accuracy and performance of regular filtering techniques. The suggested method uses k-NN to calculate the distance between the target film and every other film in the dataset and then uses cosine angle similarity to rank the top k closest comparable movies. Usually, Euclidean distance is preferred, but cosine similarity is used here, as the cosine angle accuracy and the equidistance of movies remain almost the same. The paper proposes a collaborative (item-based) technique instead of a user-based one, as item-based filtering can be done offline and is non-dynamic in nature.
4 System Architecture

The system is divided into two units. The first is the machine learning recommendation server, built with the Flask framework and written in Python. The second part is built with the MERN stack, which constitutes the web server, the front end, and the database. We have used the MovieLens small dataset provided by GroupLens, which is sufficient for education and research purposes. This dataset resides within our ML server. Users can create an account through the user registration page, after which they can rate movies and write reviews. All user data is stored in MongoDB. As soon as the user requests a recommendation, the web server sends a request containing movie IDs and the ratings given by the current user to the recommendation server. The recommendation server then generates recommendations and returns a list of relevant movie IDs to the web server. The web server then sends a request containing the recommended movie IDs to the TMDB API to get movie thumbnails, which are then displayed to the user. The recommendation system problem can be divided into four subproblems:
Fig. 1 Architecture of the workflow
1. Matrix completion: The user-item rating matrix is generally very sparse, with most values missing. Numerous algorithms are available for completing such a matrix, such as alternating least squares, stochastic gradient descent, and singular value thresholding (see the sketch after this list).
2. Item/user recommendation: We create a correlation matrix of the item latent vectors to make recommendations.
3. Cold-start problem: To solve this, we created a content-based recommender. Using the tags in the dataset, we create documents of all tags per user; the TF-IDF values are then compressed using an autoencoder and used for recommendation via cosine similarity of the compressed vectors.
4. Retraining: As new information arrives, either through changes in ratings by known users or through the addition of new users or items, recommender systems need to be retrained periodically (Fig. 1). But this leads to the issue of training on old data again, and with more retraining the data just piles up. This issue will be addressed in future work.
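As a simplified sketch of the matrix-completion step referred to in item 1 (not the project's code), the following imputes missing ratings with 0 and uses a rank-k truncated SVD of the user-movie matrix to score unrated movies; the toy matrix and k = 2 are assumptions.

```python
# Sketch of matrix completion: impute missing ratings with 0 and take a
# rank-k truncated SVD of the user-movie matrix. Toy data; k is an assumption.
import numpy as np

R = np.array([           # 0 marks an unrated movie
    [5, 4, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # low-rank reconstruction

# Recommend, for each user, the unrated movies ranked by predicted score.
for user in range(R.shape[0]):
    unrated = np.where(R[user] == 0)[0]
    ranked = unrated[np.argsort(-R_hat[user, unrated])]
    print(f"user {user}: recommend movies {ranked.tolist()}")
```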
5 Conclusion

A recommendation system needs to be fast, reliable, and effective. It should fulfill its purpose of showing great recommendations to users and thus retaining the user base of the platform. Recommendation systems are only able to recommend movies that were present in the training dataset; therefore, we use the TMDB API, which helps us show new and popular movies to the user as they are released. The cold-start problem is one of the most important issues in
recommendation systems; it occurs due to insufficient data about an object, which results in poor recommendations. For example, there is no prior information about a customer who has recently registered in the application, so there is no data for the recommendation system to work on. Also, when a new movie is added there are no ratings for it, and the system will not be able to recommend it. The solution to this problem is as simple as collecting data and creating a hybrid recommendation system. Imputing 0 and then running SVD on the user-movie rating matrix gave better results than matrix completion using algorithms such as ALS and stochastic gradient descent.
6 Future Work

We plan to implement a number of quality-of-life updates to our system, including features such as a watching status, i.e., whether the user has completely watched the movie, dropped it in between, or plans to watch it in the future. We will also incorporate such statuses into our recommendations and check whether they improve the recommendation system, and we will consider other hybrid approaches. There are two main areas of research we will investigate: first, matrix completion using residual neural networks, and second, transferring knowledge from historical data to avoid retraining on huge amounts of old data.
References 1. Yi P, Yang C, Zhou X, Li C (2016) A movie cold-start recommendation method optimized similarity measure. In: 2016 16th international symposium on communications and information technologies (ISCIT), pp 231–234. https://doi.org/10.1109/ISCIT.2016.7751627 2. Zhao D, Xiu J, Yang Z, Liu C (2016) An improved user-based movie recommendation algorithm. In: 2016 2nd IEEE international conference on computer and communications (ICCC), pp 874–877. https://doi.org/10.1109/CompComm.2016.7924828 3. Pal A, Parhi P, Aggarwal M (2017) An improved content based collaborative filtering algorithm for movie recommendations. In: 2017 tenth international conference on contemporary computing (IC3), pp 1–3. https://doi.org/10.1109/IC3.2017.8284357 4. Agrawal S, Jain P (2017) An improved approach for movie recommendation system. In: 2017 international conference on I-SMAC (IoT in social, mobile, analytics and cloud) (I-SMAC), pp 336–342. https://doi.org/10.1109/I-SMAC.2017.8058367 5. Cami R, Hassanpour H, Mashayekhi H (2017) A content-based movie recommender system based on temporal user preferences. In: 2017 3rd Iranian conference on intelligent systems and signal processing (ICSPIS), pp 121–125. https://doi.org/10.1109/ICSPIS.2017.8311601 6. Gao X, Zhu Z, Hao X, Yu H (2017) An effective collaborative filtering algorithm based on adjusted user-item rating matrix. In: 2017 IEEE 2nd international conference on big data analysis (ICBDA), pp 693–696. https://doi.org/10.1109/ICBDA.2017.8078724 7. Uyangoda L, Ahangama S, Ranasinghe T (2018) User profile feature-based approach to address the cold start problem in collaborative filtering for personalized movie recommendation. In: 2018 thirteenth international conference on digital information management (ICDIM), pp 24– 28. https://doi.org/10.1109/ICDIM.2018.8847002
8. Kharita MK, Kumar A, Singh P (2018) Item-based collaborative filtering in movie recommendation in real time. In: 2018 first international conference on secure cyber computing and communication (ICSCCC), pp 340–342. https://doi.org/10.1109/ICSCCC.2018.8703362 9. Darshna P (2018) Music recommendation based on content and collaborative approach & reducing cold start problem. In: 2018 2nd international conference on inventive systems and control (ICISC), pp 1033–1037. https://doi.org/10.1109/ICISC.2018.8398959 10. Gaspar P, Kompan M, Koncal M, Bielikova M (2019) Improving the personalized recommendation in the cold-start scenarios. In: 2019 IEEE international conference on data science and advanced analytics (DSAA), pp 606–607. https://doi.org/10.1109/DSAA.2019.00079 11. Gupta G, Katarya R (2019) Recommendation analysis on item-based and user-based collaborative filtering. In: 2019 international conference on smart systems and inventive technology (ICSSIT), pp 1–4. https://doi.org/10.1109/ICSSIT46314.2019.8987745 12. Sahoo AK, Pradhan C, Prasad Mishra BS (2019) SVD based privacy preserving recommendation model using optimized hybrid item-based collaborative filtering. In: 2019 international conference on communication and signal processing (ICCSP), pp 0294–0298. https://doi.org/ 10.1109/ICCSP.2019.8697950 13. Ifada N, Rahman TF, Sophan MK (2020) Comparing collaborative filtering and hybrid based approaches for movie recommendation. In: 2020 6th information technology international seminar (ITIS), pp 219–223. https://doi.org/10.1109/ITIS50118.2020.9321014 14. Gupta M, Thakkar A, Gupta V, Rathore DP (2020) Movie recommender system using collaborative filtering. In: 2020 international conference on electronics and sustainable communication systems (ICESC), pp 415–420. https://doi.org/10.1109/ICESC48915.2020.9155879
A Modified Newman-Girvan Technique for Community Detection Samya Muhuri and Deepika Vatsa
Abstract Community detection has been a well-studied problem in the complex network domain for the last two decades. Newman-Girvan is one of the most popular methods in this field. The algorithm is based on the repetitive deletion of edges with high betweenness centrality from a complex graph. In several real-life networks, Newman-Girvan has failed to produce the desired results due to ambiguity, as more than one edge can have the same betweenness centrality. Also, the average run time complexity of the said algorithm is higher than expected. In this manuscript, we have modified the popular community detection technique by introducing the clustering coefficient metric. The clustering coefficient introduces the neighborhood contribution and produces stable communities even for a large network. The experimental results are satisfactory over the benchmark data, and the method can be used as an alternative solution in this domain. Keywords Social network · Community detection · Modularity
S. Muhuri Thapar Institute of Engineering & Technology, Patiala 147004, Punjab, India
D. Vatsa (B) Bennett University, Greater Noida 201310, UP, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_21

1 Introduction

With technological advancements, the understanding of complex systems has increased significantly. A complex system can be illustrated as a network of nodes and links. In real life, a network is present in many systems like the food web, the world wide web, communication networks, protein-protein networks, neural networks, social networks, etc. In today's digital era, the ease of communication over social platforms like Facebook, Instagram, and Twitter has enabled a rise in activities
and footfalls on social networks. Today, an individual has become part of multiple social network communities like family, office organization, friends, city, etc. All the members of a particular network form groups or communities based on their common interests and the relationships they share. Like general networks, social networks can be represented in the form of a graph containing nodes and links; nodes represent users, and links represent the association or relationship between a pair of users. Social networks can be studied to identify various characteristics among the participants, like weak and dominant nodes or subgroups, and to understand their roles. Social network analysis is an active research area explored by several researchers from the biology, physics, social science, and computer science communities. The research trends focus on analyzing networks and gaining valuable insights like the identification of dominant nodes, dense sub-modules, community structures, etc. The availability of large-scale real-world data in recent years has also encouraged researchers and has helped uncover different features of real networks like the existence of motifs, power-law distribution, scale-free nature, etc. Researchers have also focused on discovering efficient community detection algorithms, as these would help industries disseminate news, advertisements, and services to a target audience. A community is defined as a collection of nodes that are densely interconnected but sparsely connected to the rest of the network. A quick visual glance at the communities in a network can provide us with an overview of the arrangement of its nodes, and community detection helps us understand the organization and functionality of different subgroups in a network. Currently, the many types of available online data allow researchers to explore communities in different application domains like biology, literature [1], film [2], sports [3], education, and many more [4]. In this paper, Sect. 2 defines the community detection problem, Sect. 3 presents related work, Sect. 4 proposes the modified community detection methodology, Sect. 5 contains the experimental results and discussion, and finally Sect. 6 concludes the manuscript.
2 Community Detection Problem Community detection problem [5–7] is defined as the identification of subgroups of nodes in the network such that nodes in the subgroup have high-density cohesive strength and low density of edges among other subgroups. Figure 1 represents an illustration of communities within a network. Detecting communities from a network holds significant practical importance in the understanding of the network. Like on social media, subgroups could mean social units, in metabolic and regulatory processes, a subgroup could correspond to bio-molecules working for a specific function. Identification of groups in networks has been performed by a variety of researchers across different domains like computer science, biology, social science, etc. The researchers from the computer science community call the problem of discovering
Fig. 1 Visualization of two communities (encircled) in a network
communities a graph partitioning problem, while researchers from physics, biology, and social science call it block modeling or hierarchical clustering [8]. Communities in a network can be non-overlapping as well as overlapping. Overlapping communities are those where some nodes belong to more than one community. Given a network structure, suppose we want to determine whether there exists a natural separation of its nodes into non-overlapping communities. The most apparent solution to this problem is to seek a division of the nodes into two groups that reduces the number of edges running between them. In the graph partitioning literature, the "minimum cut" strategy is most commonly used to achieve this. The community structure problem, however, differs significantly from graph partitioning in that the sizes of the communities are often unknown in advance. If community sizes are unconstrained, we may, for example, choose a trivial partition that places all of the vertices in one of the two groups and none in the other, ensuring that there are no inter-group edges. In some ways, this split is ideal, yet it certainly does not give us anything useful. So, the issue is that counting edges is not an effective approach to quantifying the intuitive sense of community organization. Having only a few edges between communities does not by itself make a good community partition of a network; rather, a good partition is one where there are fewer edges between communities than predicted. If the number of edges between groups is substantially fewer than we would anticipate on the basis of chance, or if the number of edges inside groups is much larger, it is plausible to assume that something interesting is occurring. Thus, an unexpected organization of edges may correspond to the communities in a network. The community detection problem can be described as follows: Consider a network presented as a graph G = (V, E), where V = {v1, v2, ..., vn} denotes the set of vertices or nodes, and E ⊆ V × V denotes the set of edges or links between pairs of
vertices. The aim is to decompose G into a number of modules. A module M is called a community provided that it satisfies the below-mentioned conditions:
1. M should not be trivial, i.e., M ⊂ V and M ≠ ∅.
2. No two communities should contain the same nodes.
3. The union of all communities should return the set of vertices V.
Modularity is a well-known evaluation metric introduced by Newman [8] which assesses how nodes in a community share more edges than would be anticipated in a randomized network. Given a network with m edges, its modularity Q is defined by Eq. 1, where A_vw denotes the weight of the edge connecting nodes v and w, k_v denotes the sum of the weights of the edges attached to node v, c_v represents the community of node v, and δ(c_v, c_w) denotes the function that equals 1 when nodes v and w are in the same community and 0 otherwise.

Q = (1 / 2m) Σ_{vw} [ A_vw − (k_v k_w) / (2m) ] δ(c_v, c_w)    (1)

The modularity score given by Q can be either positive or negative. The presence of a plausible community structure is shown by a positive value, which indicates that the number of edges inside a group is more than the number of edges predicted by chance occurrence. The contributions of the paper are as follows:
1. A new modified version of the Newman-Girvan approach is proposed, introducing the clustering co-efficient for modularity computation.
2. The time complexity of the proposed approach is better than that of the original Newman-Girvan approach.
3. The proposed approach outperforms other traditional approaches in terms of the modularity score obtained over three popular real-world data sets.
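To illustrate the modularity measure defined in Eq. (1), the following is a minimal sketch (not part of the original paper) showing how Q could be computed for a given partition of an undirected graph using the networkx library; the helper name compute_modularity and the example two-community split of the Karate Club network are illustrative assumptions.

import networkx as nx
from networkx.algorithms.community import modularity as nx_modularity


def compute_modularity(G, communities):
    # Compute Newman's modularity Q (Eq. 1) for a partition given as
    # a dict mapping node -> community id, on an undirected graph G.
    m = G.number_of_edges()          # total number of edges
    degrees = dict(G.degree())       # k_v for every node v
    Q = 0.0
    for v in G.nodes():
        for w in G.nodes():
            if communities[v] != communities[w]:
                continue             # delta(c_v, c_w) = 0
            A_vw = 1.0 if G.has_edge(v, w) else 0.0
            Q += A_vw - degrees[v] * degrees[w] / (2.0 * m)
    return Q / (2.0 * m)


if __name__ == "__main__":
    G = nx.karate_club_graph()       # Zachary Karate Club network [19]
    # Illustrative two-community split based on the 'club' node attribute.
    partition = {v: G.nodes[v]["club"] for v in G.nodes()}
    print("Q =", round(compute_modularity(G, partition), 4))
    # Cross-check against the reference implementation shipped with networkx.
    groups = [{v for v in G if partition[v] == c} for c in set(partition.values())]
    print("Q (networkx) =", round(nx_modularity(G, groups), 4))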
3 Related Work As community detection is one of the most significant topics in complex network systems, many approaches have been proposed for the same. Approaches based on different methods, like similarity measures [9], random walk dynamics [10, 11], statistical methods [12, 13], have been proposed. Traditionally, hierarchical clustering was employed to find communities in a network. Here, weights are calculated for each pair of nodes which represent the connectedness or strength between the nodes. Then, in an empty network with no edges, edges are added based on their weights. The edge with the higher weight is added first. Subsequently, after each edge addition, a nested set of components is obtained.
In 2001, Newman et al. [7] had proposed a method based on the centrality index measure to find boundaries of community in a network. In 2006, Newman [8] gave the popular modularity metric that can be used to assess community partitions, thereby implying an optimization strategy. Community detection can be thought of as an optimization problem, given the functions for evaluating community structures. Modularity optimization is widely used for community partitioning in a network. Newman et al. proposed a modularity function Q to estimate the goodness of partitioning of network [6]. They propose a set of algorithms, one for removing edges using the “edge betweenness” measure and the other for measuring the strength of identified community. Edge betweenness measure gives a score to edges that represent how least central those edges are in the network. The edge with the highest score is considered to be the most between the communities or groups, thus removing that edge will result in forming communities in the network. Authors applied this approach to synthetic as well as real data sets and found to achieve significantly good results in terms of obtaining communities. Authors in [14] used the above-mentioned algorithm to identify sub-communities within communities and tested their approach on three real networks. They found 5 sub-communities in the two major communities detected in both Zachary Karate Club as well as Bottlenose Dolphins Network data sets. Zhang et al. [15] analyzed two modularity measures: modularity function Q and modularity density D for partitioning a network into communities. They found that apart from the resolution limit of Q, both measures deal with a limitation, that is, derived communities do not satisfy the criteria of even a weak community. They further examined the causes for limitations and come up with a solution regarding the choice of modularity measure to be used in applications. Chen et al. [16] proposed a set of global as well as local modularity functions for community detection in networks. Application on real networks revealed that local modularity works better than global modularity for community identification in large and heterogeneous networks. Lately, many researchers presented a survey of community detection approaches in complex networks based on techniques from statistics to deep learning [17, 18].
4 Data Sets We have used three real-world networks for the application of our proposed methodology. The details of each of the networks are given below: 1. Zachary Karate Club: Zachary [19] provided a friendship network of 34 individuals of a Karate club at a University in America observed over a period of 2 years. Zachary used various measures to estimate the strength of friendship among individuals. This network is interesting as shortly after its creation, the club got split into two due to some conflict.
2. American College Football Network: The network represents the interaction among football teams of different colleges in 2000 [7]. The network contains a total of 115 teams represented as nodes. The edges in the network represent games played between two particular teams. The teams are divided into two conferences comprising 8–12 teams in each conference. Games are played more frequently between intra-conference teams than between inter-conference teams. 3. Bottlenose Dolphin Network: Lusseau et al. [20] constructed the network of 62 Bottlenose Dolphins living in New Zealand by studying their behavioral pattern for over 7 years. The network constructed in 2003 consists of two large groups and contains 159 edges.
5 Proposed Methodology Here, we describe a novel community detection method based on graph partitioning. It is a modified version of the Girvan-Newman community detection technique [6]. We have tried to reduce the number of iterations; as a result, communities are identified much faster than with the state-of-the-art approach. Both the Girvan-Newman community detection method and our proposed method detect non-overlapping communities. The following steps are used to extract the communities from the complex network.
1. Calculate the clustering co-efficient of all the nodes.
2. Remove the edges associated with the node having the highest clustering co-efficient. If more than one node has the highest clustering co-efficient, remove the edges of all such nodes.
3. For the remaining nodes, calculate the clustering co-efficient again.
4. Repeat the above two steps until all the edges have been removed.
Algorithm 1 Community Detection Algorithm
1: Take graph G = (E, V), where E and V are the edge and vertex sets
2: for i = 0 to all v ∈ V do
3:   if clustering_coefficient[i] > max[clustering_coefficient] then max[clustering_coefficient] = clustering_coefficient[i]
4:   end if
5: end for
6: remove edge (u, v) associated with i from graph G
7: repeat
8: until number of edges in G is 0
The worst-case time complexity of the above-mentioned algorithm is O(n²). In the average case, however, several edges associated with high clustering co-efficients are removed in each pass, and hence the number of iterations is also reduced.
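A minimal Python sketch of the clustering-co-efficient-based edge-removal procedure described above is given below. It is an illustrative reading of Algorithm 1 rather than the authors' code; the tie handling and the tracking of the best partition by modularity are assumptions made for the example.

import networkx as nx
from networkx.algorithms.community import modularity


def clustering_based_communities(G):
    # Iteratively remove the edges of the node(s) with the highest clustering
    # co-efficient and keep the partition that achieves the best modularity.
    H = G.copy()
    best_Q, best_partition = float("-inf"), None
    while H.number_of_edges() > 0:
        cc = nx.clustering(H)                      # clustering co-efficient per node
        max_cc = max(cc.values())
        top_nodes = [v for v, c in cc.items() if c == max_cc]
        for v in top_nodes:                        # remove all edges of the top node(s)
            H.remove_edges_from(list(H.edges(v)))
        parts = [set(c) for c in nx.connected_components(H)]
        Q = modularity(G, parts)                   # evaluated on the original graph
        if Q > best_Q:
            best_Q, best_partition = Q, parts
    return best_partition, best_Q


if __name__ == "__main__":
    G = nx.karate_club_graph()
    communities, Q = clustering_based_communities(G)
    print(f"{len(communities)} communities, modularity = {Q:.3f}")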
Table 1 Comparison of modularity with different methods

Data set    Louvain method    Fast greedy    Local popularity method    Our method
Karate      0.41              0.17           0.40                       0.44
Dolphin     0.42              0.42           0.42                       0.40
Football    0.45              0.44           0.44                       0.47
Fig. 2 Performance of the proposed method on different data sets
6 Result We have shown the performance of the proposed algorithm on three popular publicly available data sets and compared the results with some popular existing methods such as the Louvain method [21], fast greedy method [22], and local popularity method [23]. From Table 1, we have seen that our method shows better results over most of the data sets. In Fig. 2, the modularity variation of the communities based on the proposed method is shown on different data sets. We have seen that with an increasing number of iterations, the community detection algorithm becomes stable for different realworld data sets.
7 Conclusion Community detection techniques are not only important for discovering the common interest, but also to reveal hidden relations among the stakeholders. In this manuscript, we have modified one of the popular community detection techniques and shown improved results on several real-world data sets. In future, we would try to apply
the proposed method to other real-world problems to exhibit its utility. Ensembling machine learning algorithms might improve the efficiency of our approach. We would also focus on modifying this technique to uncover overlapping nodes. The satisfactory results produced by our method can inspire interdisciplinary researchers to utilize the concept further.
References 1. Muhuri S, Chakraborty S, Chakraborty SN (2018) Extracting social network and character categorization from Bengali literature. IEEE Trans Comput Soc Syst 5(2):371–381 2. Chowdhury T, Muhuri S, Chakraborty S, Chakraborty SN (2019) Analysis of adapted films and stories based on social network. IEEE Trans Comput Soc Syst 6(5):858–869 3. Muhuri S, Chakraborty S, Setua SK (2020) Differentiate the game maker in any soccer match based on social network approach. IEEE Trans Comput Soc Syst 7(6):1399–1408 4. Liu W, Suzumura T, Chen L, Hu G (2017) A generalized incremental bottom-up community detection framework for highly dynamic graphs. In: 2017 IEEE international conference on big data (Big Data), pp 3342–3351. https://doi.org/10.1109/BigData.2017.8258319 5. Newman MEJ (2004) Detecting community structure in networks. Eur Phys J B 38(2):321–330 6. Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:026113. https://doi.org/10.1103/PhysRevE.69.026113 7. Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci 99(12):7821–7826. https://doi.org/10.1073/pnas.122653799 8. Newman MEJ (2006) Modularity and community structure in networks. Proc Natl Acad Sci USA 103(23):8577–8582 9. Pan Y, Li DH, Liu JG, Liang JZ (2010) Detecting community structure in complex networks via node similarity. Physica A: Stat Mech Appl 389(14):2849–2857. https://doi.org/10.1016/ j.physa.2010.03.006 10. Piccardi C (2011) Finding and testing network communities by lumped Markov chains. PLOS ONE 6(11):1–13. https://doi.org/10.1371/journal.pone.0027028 11. Jin D, Yang B, Baquero C, Liu D, He D, Liu J (2011) A Markov random walk under constraint for discovering overlapping communities in complex networks. J Stat Mech 2011(05):P05031 12. Reichardt J, Bornholdt S (2006) Statistical mechanics of community detection. Phys Rev E 74:016110. https://doi.org/10.1103/PhysRevE.74.016110 13. Karrer B, Newman MEJ (2011) Stochastic blockmodels and community structure in networks. Phys Rev E 83:016107. https://doi.org/10.1103/PhysRevE.83.016107 14. Choudhury D, Bhattacharjee S, Das A (2013) An empirical study of community and subcommunity detection in social networks applying Newman-Girvan algorithm. In: 2013 1st international conference on emerging trends and applications in computer science, pp 74–77. https://doi.org/10.1109/ICETACS.2013.6691399 15. Zhang XS, Wang RS, Wang Y, Wang J, Qiu Y, Wang L, Chen L (2009) Modularity optimization in community detection of complex networks. EPL (Europhys Lett) 87(3):38002. https://doi. org/10.1209/0295-5075/87/38002 16. Chen S, Wang ZZ, Tang L, Tang YN, Gao YY, Li HJ, Xiang J, Zhang Y (2018) Global vs local modularity for network community detection. PLoS One 13(10):e0205284 17. Jin D, Yu Z, Jiao P, Pan S, He D, Wu J, Yu P, Zhang W (2021) A survey of community detection approaches: from statistical modeling to deep learning. IEEE Trans Knowl Data Eng 18. Su X, Xue S, Liu F, Wu J, Yang J, Zhou C, Hu W, Paris C, Nepal S, Jin D, Sheng QZ, Yu PS (2022) A comprehensive survey on community detection with deep learning. IEEE Trans Neural Netw Learn Syst 1–21. https://doi.org/10.1109/TNNLS.2021.3137396
19. Zachary WW (1977) An information flow model for conflict and fission in small groups. J Anthropol Res 33(4):452–473. https://doi.org/10.1086/jar.33.4.3629752 20. Lusseau D, Schneider K, Boisseau OJ, Haase P, Slooten E, Dawson SM (2003) The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations. Behav Ecol Sociobiol 54(4):396–405. https://doi.org/10.1007/s00265-003-0651-y 21. Clauset A, Newman ME, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(6):066111 22. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech: Theory Exp 2008(10):P10008 23. Yazdani M, Moeini A, Mazoochi M, Rahmani F, Rabiei L (2020) A new follow based community detection algorithm. In: 2020 6th international conference on web research (ICWR). IEEE, pp 197–202
Mental Stress Level Detection Using LSTM for WESAD Dataset Lokesh Malviya, Sandip Mal, Radhikesh Kumar, Bishwajit Roy, Umesh Gupta, Deepika Pantola, and Madhuri Gupta
Abstract In the present time, stress is one of the major problems in human life. Person from every age group is facing stress and have increased major diseases such as distressing, heart diseases, and others. As we know “Prevention is better than cure” so, early recognition of mental stress can avoid many stress-related problems both physically and mentally including heart stroke and depression. When a person is stressed, many biological signals such as heat, electricity, impedance, acoustics, and optics change take place in their body and these biological signals can be used to evaluate stress levels. In this study, WESAD (Wearable Stress and Affect Detection) dataset is used, which is collected using wearable sensing devices such as wristworn. Further, multiple machine and deep learning models including support vector machine (SVM), Decision Tree (DT), k-nearest neighbors (k-NN), and long shortterm memory (LSTM) are used for detecting human stress. The result shows that the LSTM achieved high model accuracy in comparison with other models in terms L. Malviya · S. Mal School of SCSE, VIT, Bhopal, MP 466114, India e-mail: [email protected] S. Mal e-mail: [email protected] R. Kumar Department of CSE NIT, Patna, Bihar 800001, India e-mail: [email protected] B. Roy (B) School of Computer Science, University of Petroleum and Energy Studies, Dehradun 248007, India e-mail: [email protected] U. Gupta · D. Pantola · M. Gupta School of CSET, Bennett University, Greater Noida, UP 201310, India e-mail: [email protected] M. Gupta e-mail: [email protected]
of precision (97.01%), recall (97.00%), F1-score (97.03%), and accuracy (98.00%). So, from the result, it is confirmed that the LSTM model may be used as a stress prediction model.
Keywords Stress · Relax · Detection · Long short-term memory
1 Introduction Mental pressure is one of the most important causes of a number of fitness issues. Various measures have been created by scientists and medics to determine the depth of intellectual pressure in its early stages. In the literature, many brain activity strategies for checking intellectual pressure inside the place of work have been presented. The EEG signal is a powerful tool to collect brain information, providing a lot of information about the state of mind and its conditions. The nervous system and the pituitary-adrenocortical (PA) axis together represent the human psychological and physical stress state [1, 2]. Mainly acute, episodic, and chronic stress are found in the literature by researchers [3]. Anxiety is not harmful because it lasts only for the short term. Episodic stress is a condition in which a stimulus occurs more often for a brief period of time [4]. Inversely, chronic stress, which is brought on by persistent stressors, is the most dangerous type [5–8]. According to numerous experts, emotional and mental stress contributes to a number of medical conditions, such as depression, strokes, coronary heart disease, brain problems, and speech difficulties [9, 10]. Additionally, the human body is impacted by stress indirectly on a number of levels, including skin issues, eating patterns, sleep problems, and judgment [11–13]. As a response, research has provided a variety of techniques for assessing stress levels early on in order to prevent negative consequences on performance and health. In the past, subjective measurements have been used to measure stress. The most frequently used method is self-report surveys [14], including the perceived stress scale [15, 16]. The ground truth for determining psychological stress levels has been established in numerous studies using survey question scores, self-reported data, and interview assessments. Because such assessments are subjective, participants must complete them with extra care; moreover, many people are unaware of their actual stress levels. Cardiac variance, electro-dermal activity, EEG [17], electromyography, blood pressure, and salivary tests are some physiological measurements that have been identified as stress indicators in the technological era. In order to take the necessary actions to reduce the impact on human life, it is crucial that the stress level indicated by these signals is measured accurately. Table 1 shows the diseases predicted by stress in the human body.
Table 1 Diseases predicted by stress in the human body

Diseases                  Stress symptoms
Heart attack              Pain in the chest, weak feeling, pain in the jaw, shortness of breath, etc.
Depression and anxiety    Loss of interest, hope, and enjoyment
Stroke                    Blockage of blood vessels disturbs the supply of blood to the brain, and brain cells start to die
Asthma                    Chronic stress, shortness of breath, sleeping problems, and cold/flu
Diabetes                  The level of glucose decreases
Headaches                 Migraines, tension
2 Related Work The literature on stress detection using various machine learning models, as authored by various authors, is included in this section. It is pertinent to the study. Ciabattoni et al. [18] has been worked on concurrent mental stress monitoring through an individual’s mental tasks. The dataset contains individual’s information such as body temperature, skin reactions, and ECG that were measured using an electronic smart watch and tool. A classification accuracy of 89.8% was achieved using a k-NN approach. Can et al. [19] acquired information using wearable sensors and smartphones, studied the subjects’ everyday activities, and identified stress. The authors utilized PCA for dimensionality reduction and SVM and k-NN for classification. In order to collect data from physiological signals across three stimulus types; music, films, and games, Anderson et al. [20] acquired data in the form of EEG, ECG, GSR, PPG, EOG questionnaires. The arousal and stimulus categorization methods employing k-NN and SVM achieved classification accuracy of 80.61% and 88.91%, respectively. Sun et al. [21] used several sensors to collect data from 20 people during a variety of basic actions as sitting, standing, and walking. ECG and GSR dataset and DT, SVM, and NB machine learning models used for prediction, out of which NB outperformed with 92.40% accuracy. Ciabattoni et al. [18] recommended a process to look into how much HRV snippets may be compressed without losing their capacity to recognize mental stress and achieved an accuracy of 88%, by using an SVM classifier and a data-driven approach. Schmidt et al. [22] have used WESAD, dataset consists of 15 volunteers (13 male and 2 female) brain signal during a lab study. In order to improve human–computer interaction (HCI), the author developed a method for identifying an individual’s emotional state using observables. Bajpai and He [23] assessed the WESAD dataset’s performance using k-NN model and validated using a fivefold cross-validation procedure. The next-generation stress detector was developed by Lai et al. For first response and other professions, such as firemen, emergency medical technicians, and many others. The authors used SMA, which detects stress features from sensor data, using residual-temporal convolution network. For the
WESAD dataset, the SMA achieves accuracy in the stress recognition and detection modes of 86% and 98%, respectively. This research uses WESAD, a publicly available dataset. The PCA feature selection is applied after successful preprocessing of the data. The selected subset of features is used as input for LSTM model for efficient classification of mental stress and relax prediction.
3 Dataset Description This research uses data from the publicly available "WESAD dataset" repository [23]. The dataset consists of 15 volunteers (13 male and 2 female). The conditions under which data were collected are baseline, amusement, stress, and meditation, and these conditions are used as the classes for the categorization task. WESAD collects information from two types of sensors: wrist-worn devices (Empatica E4) and chest-worn devices (RespiBAN). Because a wrist-worn device is less invasive than a chest-worn device and is said to be highly promising for use in the classification process, we exclusively employ wrist-worn device data in this study.
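As an illustration only (not part of the paper), the snippet below sketches how the wrist-worn signals and labels of one WESAD subject could be loaded, assuming the standard per-subject pickle layout described in the dataset's documentation; the file path is a placeholder.

import pickle

import numpy as np

SUBJECT_FILE = "WESAD/S2/S2.pkl"   # placeholder path to one subject's pickle file

with open(SUBJECT_FILE, "rb") as f:
    data = pickle.load(f, encoding="latin1")   # WESAD pickles were written with Python 2

wrist = data["signal"]["wrist"]     # Empatica E4 channels
acc, bvp, eda, temp = wrist["ACC"], wrist["BVP"], wrist["EDA"], wrist["TEMP"]
labels = np.asarray(data["label"])  # 0=undefined, 1=baseline, 2=stress, 3=amusement, 4=meditation

print("ACC:", acc.shape, "BVP:", bvp.shape, "EDA:", eda.shape, "TEMP:", temp.shape)
print("label values:", np.unique(labels))

# For a binary stress-vs-relax task one would typically keep labels 1 (baseline/relax)
# and 2 (stress); the exact framing used by the authors is not restated here.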
4 Proposed Work This research work proposes a deep learning model, long short-term memory (LSTM), for the classification task on the WESAD dataset. LSTM has some extra features compared with the recurrent neural network (RNN). LSTM uses a deep layered architecture containing one input layer, many hidden layers, and a single output layer. Standard networks are feed-forward neural networks, whereas LSTM has feedback connections. It can process not only individual data points (such as images) but also entire sequences of data, and it is capable of learning long-term dependencies. This is possible because the model's recurring module is made up of four layers that interact with one another. Method 1 outlines the techniques we employed to predict a person's level of stress using the WESAD dataset. The suggested model flowchart is shown in Fig. 1.

Algorithm: WESAD Data Classification Using LSTM
Result: Classified data using the LSTM model, compared using confusion matrix parameters
Input: EEG signal from the WESAD dataset
Output: Identify subjects who are under stress or relaxed
Algorithm Steps:
1: Identify the features in the dataset: (a) remove useless data, (b) handle invalid or missing data, (c) normalize the data
2: Use PCA to select important features
3: Use the LSTM classification model and compare it with the existing models
4: Calculate and save the classification precision, recall, F1-score, and classification accuracy acquired
5: Repeat Steps 3 and 4
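The following is a hedged sketch (not the authors' implementation) of the algorithm above, using scikit-learn for PCA and Keras for the LSTM classifier; the number of PCA components, layer sizes, training settings, and the placeholder files holding preprocessed WESAD features X and stress/relax labels y are all illustrative assumptions.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.models import Sequential

# X: (n_samples, n_features) preprocessed WESAD features, y: 0 = relax, 1 = stress
X = np.load("wesad_features.npy")          # placeholder file names
y = np.load("wesad_labels.npy")

# Step 1: cleaning/normalization; Step 2: PCA feature selection
X = StandardScaler().fit_transform(X)
X = PCA(n_components=10).fit_transform(X)  # 10 components is an assumption

# Step 3: LSTM classifier; each sample is treated as a length-10 sequence of one value
X = X.reshape((X.shape[0], X.shape[1], 1))
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

model = Sequential([
    LSTM(64, input_shape=(X.shape[1], 1)),
    Dropout(0.2),
    Dense(32, activation="relu"),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_tr, y_tr, epochs=20, batch_size=64, validation_split=0.1, verbose=0)

# Step 4: evaluate on held-out data
loss, acc = model.evaluate(X_te, y_te, verbose=0)
print(f"test accuracy: {acc:.3f}")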
5 Empirical Result and Discussion A confusion matrix is useful to compare machine learning and deep learning performance [22].
Recall: It is determined as the ratio of True (+) to the sum of True (+) and False (−) and is calculated as follows:

Recall = True (+) / (True (+) + False (−))    (1)

Precision: It is the ratio that correctly predicts whether participants are stressed or relaxed, and it is determined as follows:

Precision = True (+) / (True (+) + False (+))    (2)

Accuracy: It is the ratio of the total number of correct predictions to all predictions and is calculated as follows:

Accuracy = (True (+) + True (−)) / (True (+) + False (+) + True (−) + False (−))    (3)

where (+) indicates positive and (−) indicates negative. To assess the efficiency of the proposed research technique on the testing data, various measures based on confusion matrix parameters, such as accuracy, recall, F1-score, and precision, are used. Table 2 depicts a comparison with different machine learning models in terms of performance. LSTM is compared with various ML models and achieved better prediction accuracy. Figure 2 illustrates a graphical comparison of k-NN, DT, SVM, and LSTM.
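To make Eqs. (1)-(3) concrete, a small illustrative sketch using scikit-learn (not from the paper) is shown below; y_true and y_pred are placeholder label vectors.

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Placeholder ground-truth and predicted labels (1 = stress, 0 = relax)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("recall    :", tp / (tp + fn))                    # Eq. (1)
print("precision :", tp / (tp + fp))                    # Eq. (2)
print("accuracy  :", (tp + tn) / (tp + fp + tn + fn))   # Eq. (3)

# The same values via scikit-learn's built-ins, plus the F1-score reported in Table 2
print(recall_score(y_true, y_pred), precision_score(y_true, y_pred),
      accuracy_score(y_true, y_pred), f1_score(y_true, y_pred))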
Fig. 1 Proposed model: input from dataset → data preprocessing (missing/NaN values) → feature extraction (PCA) → machine learning/ensemble model for classification (DT, RF, SVM, k-NN, LSTM) → mental load/relax output
Table 2 Model performances in percentage

Models                          Precision    Recall    F1-score    Accuracy
Decision tree (DT)              96.10        96.20     95.00       95.00
Support vector machine (SVM)    81.00        75.00     77.00       79.00
k-nearest neighbors (k-NN)      88.00        92.00     90.00       90.49
Proposed model (LSTM)           97.01        97.00     97.03       98.00
Fig. 2 Performances of proposed model with comparison
5.1 Comparative Analysis In this research paper, LSTM performed well as compared to k-NN, DT, SVM machine learning classification models. According to an earlier study, the k-NN algorithm is preferable than a Markov model in identifying the classical and skating styles jointly since it has lower error rates. SVMs are non-parametric models, as the number of training samples grows, so does the complexity. As a result, training a
non-parametric model can be more computationally expensive. We examined classifier training and discrimination performance in 50 independent iterations, comparing LSTM analysis with machine-learning-based techniques. LSTMs expose only a few parameters, such as learning rates and input and output biases; therefore, no fine-grained adjustments are needed. It is advantageous that LSTMs reduce the complexity of updating each weight to O(1), which is comparable to backpropagation through time. In terms of the confusion matrix parameters precision, F1-score, recall, and accuracy, LSTM performed better than the other machine learning algorithms. The first limitation of this model is that feature selection and extraction are done manually. The second is that hyperparameter tuning is done by a hit-and-trial method.
6 Conclusion and Future Recommendation Due to the current COVID situation, almost every person throughout the world is under stress, and it is affecting their social as well as economic life. In order to predict this stress, this paper uses the WESAD dataset, which is very useful for predicting stress in humans using machine and deep learning models. Several models, such as DT, SVM, k-NN, and LSTM, have been analyzed to predict stress on the above-mentioned dataset. The result shows that the LSTM model outperformed the other models in terms of precision, recall, F1-score, and accuracy (Table 2). Therefore, from the result, it is confirmed that the LSTM model may be used as a stress prediction model. The future scope for this research is to apply different feature selection and extraction models to improve the classification accuracy of stress detection. We may also use different datasets in order to check the compatibility of LSTM and the other models used.
References 1. Selye H (1965) The stress syndrome. Am J Nurs 97–99 2. Giannakakis G, Grigoriadis D, Giannakaki K, Simantiraki O, Roniotis A, Tsiknakis M (2019) Review on psychological stress detection using biosignals. IEEE Trans Affect Comput 3. Lazarus J (2000) Stress relief & relaxation techniques, McGraw Hill Professional 4. Bakker J, Pechenizkiy M, Sidorova N (2011) What’s your current stress level? Detection of stress patterns from GSR sensor data. In: 2011 IEEE 11th international conference on data mining workshops. IEEE, pp 573–580 5. Malviya L, Mal S, Lalwani P (2021) EEG data analysis for stress detection. In: 2021 10th IEEE international conference on communication systems and network technologies (CSNT). IEEE, pp 148–152 6. Gedam S, Paul S (2021) A review on mental stress detection using wearable sensors and machine learning techniques. IEEE Access 7. Al-Saggaf UM, Naqvi SF, Moinuddin M, Alfakeh SA, Azhar Ali SS (2022) Performance evaluation of EEG based mental stress assessment approaches for wearable devices. Front Neurorobotics 15:819448
8. Gedam S, Paul S (2020) Automatic stress detection using wearable sensors and machine learning: a review. In: 2020 11th international conference on computing, communication and networking technologies (ICCCNT). IEEE, pp 1–7 9. Colligan TW, Higgins EM (2006) Workplace stress: etiology and consequences. J Workplace Behav Health 21(2):89–97 10. O’Connor DB, Thayer JF, Vedhara K (2021) Stress and health: a review of psychobiological processes. Annu Rev Psychol 72:663–688 11. Thoits PA (2010) Stress and health: major findings and policy implications. J Health Soc Behav 51(1):S41–S53 12. Garg A, Chren M-M, Sands LP, Matsui MS, Marenus KD, Feingold KR, Elias PM (2001) Psychological stress perturbs epidermal permeability barrier homeostasis: implications for the pathogenesis of stress-associated skin disorders. Arch Dermatol 137(1):53–59 13. Adam TC, Epel ES (2007) Stress, eating and the reward system. Physiol Behav 91(4):449–458 14. Hou X, Liu Y, Sourina O, Tan YR, Wang L, Mueller-Wittig W (2015) EEG based stress monitoring. In: 2015 IEEE international conference on systems, man, and cybernetics. IEEE, pp 3110–3115 15. Holmes TH, Rahe RH (1967) The social readjustment rating scale. J Psychosom Res 16. Monroe SM (2008) Modern approaches to conceptualizing and measuring human life stress. Annu Rev Clin Psychol 4:33–52 17. Weiner IB, Edward Craighead W (eds) (2010) The Corsini encyclopedia of psychology, vol 4. Wiley 18. Ciabattoni L, Ferracuti F, Longhi S, Pepa L, Romeo L, Verdini F (2017) Real-time mental stress detection based on smartwatch. In: 2017 IEEE international conference on consumer electronics (ICCE). IEEE, pp 110–111 19. Can YS, Arnrich B, Ersoy C (2019) Stress detection in daily life scenarios using smart phones and wearable sensors: a survey. J Biomed Inform 92:103139 20. Anderson A, Hsiao T, Metsis V (2017) Classification of emotional arousal during multimedia exposure. In: Proceedings of the 10th international conference on pervasive technologies related to assistive environments, pp 181–184 21. Sun F-T, Kuo C, Cheng H-T, Buthpitiya S, Collins P, Griss M (2010) Activity-aware mental stress detection using physiological sensors. In: International conference on mobile computing, applications, and services. Springer, Berlin, Heidelberg, pp 282–301 22. Castaldo R, Melillo P, Bracale U, Caserta M, Triassi M, Pecchia L (2015) Acute mental stress assessment via short term HRV analysis in healthy adults: a systematic review with metaanalysis. Biomed Signal Process Control 18:370–377 23. Bajpai D, He L (2020) Evaluating KNN performance on WESAD dataset: In: 2020 12th international conference on computational intelligence and communication networks (CICN). IEEE, pp 60–62
Movie Tag Prediction System Using Machine Learning Vivek Mehta, Tanya Singh, K. Tarun Kumar Reddy, V. Bhanu Prakash Reddy, and Chirag Jain
Abstract Movies can be tagged with various details such as genre, plot structure, soundtracks, and visual and emotional experiences. This information can be used to build automatic systems to extract similar movies, enhance user experience, and improve the recommendations. In this paper, a machine learning-based approach is proposed to predict the tags which can be associated with a given movie. The problem is posed as a multi-label classification problem. To solve this, firstly, we created a fine-tuned set of various tags that exposed the varied characteristics of movie plots (a textual summary of a movie). Then, using different textual representation techniques such as TF-IDF, AVGW2V (averaged word vector), and several machine learning classifiers are used to predict the tags associated with a movie. It is believed that the proposed machine learning-based tag prediction can be useful in other tasks related to narrative analysis. Keywords Multi-label classification · TF–IDF · Word2Vec · Pattern recognition · Big data
1 Introduction In the modern world, consumption of multimedia content is possible over different types of devices. Movies are also one of them that are consumed by people over numerous platforms. Various properties of movies, such as genre, plot structure, soundtracks, and emotional responses, can be labeled into different classes which is done manually for extracting needed information from the data and assigning suitable tags [6] and because of this massive volume of multimedia is generated. It is challenging to automatically analyze this content to determine its validity and classification. As a result, tag quality is based on a subjective criterion that differs from person to person. This generated metadata makes it difficult to gain complete V. Mehta (B) · T. Singh · K. T. K. Reddy · V. B. P. Reddy · C. Jain School of Computer Science Engineering and Technology, Bennett University, Greater Noida, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_23
insights into major elements of a movie like plot, theme, and structure. Moreover, due to lack of precision, many irregularities get generated. Hence, data become less accurate which directly impacts user experience. This data may be utilized to create automated algorithms that extract related movies, dealing with the growing problem of information overload and improve the user experience. Folksonomyc [17] which is popularly known as collaborative tagging or social tagging which is a way to collect community feedback in form of tags about online items. Movie recommendation systems such as based on Internet Movie Database (IMDB) utilize user-generated tags to provide summarized tags related to the movie. This tag prediction has various applications such as object detection, automatic subtitles generation, and optimization of movie search engines. Based on the above references, we are motivated to address this problem by developing an automated engine that can extract tags from the plot of the movie which is a detailed description of the movie (synopsis of movie storyline) or summary of a movie. A movie can have more than one tag, hence, multi-label categorization can be utilized here, where each sample is given a set of target labels. For example, classifying a movie which may be adventure or action, comedy, horror, flashback, or combination of all. Machine learning’s remarkable breakthroughs have paved the road for discovering patterns in data with high accuracy. Machine and deep learning-based algorithms can efficiently complete such tasks. The key contributions in this paper are to propose a machine learning paradigm that utilizes movie plots to perform tags prediction related to a movie. The details related to data preprocessing, data representations, and implementation of various machine learning models are provided in Sect. 3. This paper mainly contains four sections. This section contains the introduction, Sect. 2 contains the work that directly relates to the proposed work. Section 3 contains the proposed methodology, and Sect. 4 contains the experiments and result analysis (Figs. 1 and 2).
2 Related Work We feel that there has been a low attention toward the tag prediction and categorization of movies in the literature. The relevant work in this sector is mainly focused on small-scale image, video, blog, and other content-based tagging. For example, the redundancy across YouTube videos was used by Siersdorfer et al. [14] to discover connections between videos and gives tags to similar ones. While Chen et al. [2] developed a video tagging approach in which a text-based representation of a video from various sources was generated on which a graph model was applied to find and score the important keywords that would serve as tags. This method was reliant on a written description that has to be generated manually. Xin et al. [18] investigated various free code information Web sites and suggested the “TagCombine” algorithm. The author took into account three factors: similarity-based ranking, multi-label
Fig. 1 Movies (Ajay kumar Selvaraj Rajagopal | Web Mining [IS688, Spring 2021])
Fig. 2 Tag cloud
ranking, and tag term ranking. According to Zhang et al. [19], correlations between labels should be explored to gain more insights into multi-label learning. On other Web sites, Lipczak et al. [10] applied collaborative tagging and used two different approaches: graph based and content based. In diverse areas such as music and images, automatic tag creation based on content-based analysis has received a lot of attention. For example, deep neural models have all been used to create tags for music tracks by Choi et al. [3]. Lyrics were used to generate tags for music by Van Zaanen and Kanters [16]; and Eck et al. [5], Dieleman, and Schrauwen [4] utilized acoustic features from the songs to generate various tags. There have been some studies for predicting tags for Web content such as AutoTag by Mishne et al. [12] described a model where given a blog post, various tags are suggested that appear to be relevant; the blogger then explores the ideas and selects the ones which seem helpful and a similar model by Sood et al. [15] called “TagAssist”
that generates tag ideas for new blog articles based on previously tagged posts. To generate tags, most of these systems used content-based resources such as user metadata and tags given to similar resources. There have been various similar works in the tag prediction domain using plot synopses like Kar et al. [8] used plot analysis for tag prediction. For each movie, the algorithm forecasts a limited number of tags. However, because the system could only capture a tiny fraction of the multi-dimensional characteristics of movie plots, the tag space produced by the system for the test data only covers 73% of the real set. On the other hand, Makita and Lenskiy [11] present a Naive Bayes model for predicting movie genres based on ratings given by users. The notion is that users favor some genres more than others. Tag prediction has been studied by one more researcher Kuo [9] who used a co-occurrence approach to generate tags based on the words in the post and their relationship to tags. The model was created for next-word prediction in big datasets and then modified by limiting the predicted next word to just tags. This co-occurrence algorithm correctly predicts one tag per post with a classification accuracy of 47%. Ho et al. [7] explored several techniques for categorizing movie genres based on plot. Parametric mixture model (PMM), one-versus-all support vector machines (SVMs), multi-label K-nearest neighbor (KNN), and neural network are some which utilized frequency-inverse document frequency in the study of their approach of the words as features. This experiment was conducted on a limited dataset consisting of 16k titles of movies for both testing and training datasets. It predicted only some limited genres like adventure, comedy, crime, documentary, drama, and family. The best F1 score obtained was 0.58. Blackstock and Spitz [1] conducted an experiment on a limited dataset of 399 scripts, with the best F1 score being 0.56. They attempted to categorize movies with the help of the logistic regression approach by retrieving features from scripts like the ratio of descriptive to nominals words. The model assesses the likelihood that the movie belongs to each genre based on extracted characteristics and picks the k best scores as its predicted genres. Schapire and Singer [13] provided two modifications, namely multi-class and multi-label text classification with as an extension to the AdaBoost algorithm (to solve classification, regression problems this was mainly developed). The conversion of the multi-label problem into separate binary classification problems is done by 1st one, and the ranking the labels in order for the correct one for achieving highest ranking is done by 2nd extension (Fig. 3).
3 Proposed Methodology There are various datasets available that contain movies and their plots, but our main concern is to retrieve a dataset with certain expected attributes, such as tags should be closely related to the plot. The metadata that is completely irrelevant to the plot should be eliminated. The redundancy in tags should be avoided because we need to assign unique tags, so having tags that represent the same meaning would be
Fig. 3 Flow diagram 1
ineffective. So, we have gathered data from various Internet sources that contain nearly 14,000 movies with unique tag set of 72 tags. Here, each data point has six attributes including IMDB id to get the tag association information for respective movies in the dataset, title of the movie, plot of the movie, tags associated with each movie, split attribute indicating whether the data belongs to test, or train set and source which is either IMDB or Wikipedia (Fig. 4). Moreover, plot synopses should not contain any noise such as HTML tags or IMDB alerts and include sufficient information because understanding stories from extremely short texts would be challenging for any machine learning system; each overview should include at least 10 sentences. Although the text is unstructured data, it is often created by individuals for the purpose of being understood by others. So, how can we handle a big volume of text and convert it into a representation that can
Fig. 4 Movie versus no. of tags per movie
Fig. 5 Percentage of tags in moves versus no. of tags
be used to predict and classify using machine learning models? There are a variety of methods for cleaning and preparing textual data, and we used a few of them here (Figs. 5 and 6). • Converting every word to lowercase and removing HTML tags or other irrelevant elements present in the dataset. • De-contraction of words like can’t to cannot and removing any stop words if present such as “the”, “a”, “an”, “in” as we would not want these terms to take important processing time or space in our database. • Lemmatization of words, which typically refers to performing things correctly with the help of a vocabulary and returning a word to its root form, for instance, converting words that are in 3rd person to 1st person and future and past tense verbs to present tense.
Fig. 6 Tag distribution
• Stemming refers to reducing words to their word stem that affixes to prefixes and suffixes like “-ing”, “-es”, “-pre”, etc. Exploratory analysis for data of tags data distribution and feature engineering. We created a SQL database file of the given source CSV file and delete the duplicate entries and modify the same by adding a new custom attribute (6 + 1) tag count which indicates the number of tags associated per movie so, we can find the exact count, how many movies are associated with how many tags. We also checked the number of unique tags present in the dataset using the bag of words (BOW) technique which is implemented using the count vectorizer method. We must transform it into vectors of numbers since machine learning algorithms do not accept the raw text as input data. Bag of words is the easiest way to represent text documents. In other words, it will determine how many times a specific word appears in a given document. This method yielded a tag cloud such as flashback, violence, murder, romantic, and cult. This way we produced a more generic version of the common tags relevant to the plot of the movie. On the other hand, tags such as entertaining and suspenseful are slightly less common got filtered out. TF–IDF Vectorizer Here, less frequent words are assigned comparatively more weight. TF–IDF is the product of term frequency (the ratio of the number of times a word appears in a document to the total number of terms in the document) and inverse document frequency (the log of the number of times a term appears in a document) (the total number of documents to documents with term present in it). We tested several approaches in the baseline model construction portion, including the multinomial Naive Bayes classifier, which is good for discrete feature classification, such as word counts for text categorization in a document. Integer feature counts are typically required for multinomial distributions, making them robust and simple to build. While logistic regression is used when the dependent variable (target) is categorical, and the ques-
Fig. 7 Sigmoid function
tion arises: why is linear regression not used for classification? Two things explain this. The first is that classification problems mandate discrete values, whereas linear regression only deals with continuous values, which makes it unsuitable for classification. The second is the threshold value: consider a situation where we need to determine whether an email is spam or not. If we use linear regression here, we will need to select a threshold by which we may classify the data. If the actual class is malignant with a predicted value of 0.45 and the threshold is 0.5, then it will be classified as non-malignant, which can have serious consequences in real time. So, it can be seen that linear regression is unbounded, which is why we need logistic regression: for instance, in a binary classification, we need to predict whether a data point belongs to a particular class or not, and the class with the highest probability is the one to which the data point belongs. To fit a mathematical equation of such type, we cannot use a straight line, as all the output values must be either 0 or 1. That is when the sigmoid function comes into the picture; it transforms any real-number input into a number between 0 and 1 (Fig. 7).
Sigmoid equation:

S(x) = 1 / (1 + e^(−x))    (1)

Hypothesis:

Z = W X + B,   h(x) = sigmoid(Z)    (2)

Here, if Z becomes close to infinity, Y will be predicted as 1, and if Z is close to negative infinity, Y will be predicted as 0. Sigmoid equation for multiple features: here, we compute output probabilities for all K − 1 classes; for the K-th class, the probability is 1 − the sum of the probabilities of the K − 1 classes. So, we can say that multinomial logistic regression uses K − 1 logistic regression models to classify data points for K distinct classes.
259
Fig. 8 Data-flow diagram
Another method that we tried is SGD classifiers which is an optimization method, while logistic regression is a machine learning model that defines a loss function, and the optimization method minimizes/maximizes it. In all the cases, our goal is to maximize the micro-averaged F1 score. We used micro-averaging as here, a movie might have more than two tags/labels associated with it. Micro-averaged F1 score is the harmonic mean of micro-recall and microprecision. Micro-precision is the sum of all true positives to the sum of all true positives and false positives. Micro-precision =
TP1 + TP2 + TP3 + · · · + (FP1 + FP2 + FP3 + · · · ) (3) (TP1 + TP2 + TP3 + · · · )
Micro-recall is calculated by first finding the sum of all true positives and false positives, over all the classes. Then, we compute the recall for the sums (Fig. 8).
micro-recall =
TP1 + TP2 + TP3 + · · · + (FN1 + FN2 + FN3 + · · · ) (TP1 + TP2 + TP3 + · · · )
(4)
Logistic regression with outliers was the model that provided us the highest microaveraged F1 score. One versus rest, sometimes known as one-vs-all, is a method that
260
V. Mehta et al.
involves fitting a single classifier to each class. The class is fitted against all the other classes for each classifier. Despite the fact that this technique cannot handle multiple datasets, it trains fewer classifiers, making it a faster and more popular option. AVGW2V By capturing semantic information, word embeddings have been proved to be successful in text classification problems. As a result, we average the word vectors of each word in the plot to capture the semantic representation of the plots. Thus, we get a 1D vector of features corresponding to each document. By using this model, also, logistic regression gave the highest micro-averaged F1 score. According to the exploratory data analysis, a movie is typically associated with three tags. As a result, we attempted to create a model that could predict the top three tags. We utilized the same set of features this time, but the number of tags was set to three. By using TF–IDF vectorizer, the highest F1 score was achieved by using SGD Classifier with log loss as well as accuracy score is also improving and by using AVGW2V (average word2vec) logistic regression model gave the highest F1 score. Similarly, for the next training of our model, we have set the tags to the top 5 and observed the F1 score for each model. We read the dataset and vectorize the tags using the BoW algorithm in the following training step to see which word appears how many times. Then, to construct a new data frame, we sorted these tags in decreasing order depending on how many times they appeared in the document. Out of the 71 distinct tags, we manually chose the top 30 tags based on their frequency. Then, except for the 30 tags present per movie, we eliminated all other tags along with their respective rows and repeated the procedure to train the model further. For the next stage, we employed Python’s topic modeling and latent Dirichlet allocation (LDA). We used latent Dirichlet allocation (LDA) to classify text in a document to a specific topic. Topic modeling is a statistical model for discovering the abstract “topics” present in a collection of documents, and it is a commonly utilized for the discovery of hidden semantic meaningful structures in the body of text. It generates a topic per document and word per topic model based on the Dirichlet distribution. (LDA) is a common topic modeling approach with great Python implementations in the Gensim package. The LDA model above is made up of ten separate topics, each of which is made up of several keywords and given a specific amount of weight to the subject that represent the importance of a keyword to the particular topic. It determines the dominant subject of a particular text and is one of the practical applications of topic modeling. To accomplish so, we search for the topic with the highest percentage contribution. Then, we saved these dominating topics to a CSV file and concatenated it with our original data and used the same method to train the model even further to improve accuracy.
Movie Tag Prediction System Using Machine Learning
261
Fig. 9 Top3-tag-prediction
4 Result and Analysis To solve this multi-label classification problem, we used one vs rest classifier combining with different types of binary classification algorithms. First time we took data containing all tags (complete preprocessed dataset). To make the model understand the plot synopsis, we trained the model with different word counts like 1 grams, 2 grams, and 3 grams. Here, firstly, TF–IDF vectorizer is used to convert the synopsis data into numeric data. After that we tried to train the model with one vs rest classifier multinomial NB, SGD classifier-log loss, SGD classifier-hinge loss, and logistic regression with 1 grams (initially), and the same process is repeated with 2, 3 ,4 grams. Our aim here is to maximize the F1 score. In case of multinomial NB, the score decreased from 1 to 2 grams and stayed same through 2, 3, 4 grams. And when it comes to SGD classifier-log loss, it decreased from 1 to 2 to 3 and increased in 4 grams (and this is the max). SGD classifier-hinge loss increased gradually, and highest is at 4 grams. And now comes the logistic regression it is score increased from 1 to 2 grams and remained unchanged later, and logistic regression showed highest scores in individual models of 1, 2, 3, 4 grams and also in the overall case (and it is 0.26). And on a overview 4 grams model gave the better results when compared with the rest as shown in Fig. 9. Now, to improve the model even further, we try to implement this procedure with top3 and 5 tags as we saw in analysis of data, we take maximum features as 3 and 5 in these respective trainings. And follow the above that we have done with data with all tags. Multinomial NB stays the same in all variants. The F1 of SGD classifier-log loss increases from 1 to 2 grams and decreases from 3 to 4 grams (but still > 1 gram).
262
V. Mehta et al.
Fig. 10 All-tag-prediction
SGD classifier-hinge loss doesn’t show much variation, and logistic regression also stays the same in all cases. But, this time SGD classifier-log loss shows highest score (of all variants) in case of 2 grams that is 0.568. Here, 2 grams performance is better compared to the rest as shown in Fig. 10. Let’s go a step further and try to predict top 5 tags Fig. 11. The F1 score of multinomial NB it increases from 1 to 2 grams and remains unaltered. SDG classifierlog loss F1 decreases in 3 and 4 grams where as equal in 1 and 2 grams. SDG classifier-hinge loss increases the score gradually in all grams. Logistic regression shows same scores with all 1, 2, 3, 4, grams. But, here also, logistic regression is the highest F1 score achiever, and the score is 0.535. Now, let’s compare among these models which try to predict all, top 5 and 3 tags. We can clearly see the significant increase in scores of F1 in top three and five tag prediction compared to all tags. And top 3 model is showing the best results among these. Additionally, we also tried to use word to vector as it converts the data into more meaningful sense compared to TF–IDF. The results are shown in Figs. 12, 13, and 14. The operations are performed just as above for all, 5 and three tag predictions, we also applied this for the dataset containing top 30 tags and got the following F1 scores as follows. All-tag prediction—logistic regression (gave highest score that is 0.214). Top 3—logistic regression (gave highest score that is 0.563). Top 5—logistic regression (gave highest score that is 0.516). Top 30—data tags logistic regression (gave highest score that is 0.32). As one can see this followed the same trend like when we used TF–IDF. And logistic regression gave the best model outcomes in case of both vectorizers. Also, top
Movie Tag Prediction System Using Machine Learning
263
Fig. 11 Top5-tag-prediction
Fig. 12 All-tag-prediction-w2v
Fig. 13 Top3-tag-prediction-w2v
3 prediction gave some what satisfying results in both. We also used topic modeling (which finds the topics with in our synopsis) and LDA (latent Dirichlet allocation) model to implement this. Here, also logistic regression gave the highest result (0.366).
264
V. Mehta et al.
Fig. 14 Top5-tag-prediction-w2v
5 Conclusion and Future Scope In this research work, a machine learning-based methodology for movie tag prediction is proposed using the movie plots. Firstly, it gives a way that allowed us to create a fine-tuned set of various tags that expose the varied characteristics of movie plots. We took on the task of extracting tags relating to movie plots from noisy and repetitive tag spaces which needed to be preprocessed. We present an analysis where we tried to predict movie tags from plot synopsis. Logistic regression model built using TF-IDF representation achieved the highest micro-averaged F1 score 0.583. As, it was observed in the EDA section that on an average a movie contains 3 tags. So, we tried to train our model multiple times and each time we set the tags to different values like top 3 and top 5, and we also selected the top 30 tags manually to analyze our model to a greater extent, and we also used various semantics vectorizations such as latent Dirichlet allocation (LDA) and word2vec to improve our model. Although we had a small dataset, we still got a decent F1 score. One of the key limitation of this work which can be taken in future is to test the use of more advanced text representation models such as fasttext, glove, and Elmo.
Online Recommendation System Using Collaborative Deep Learning S. B. Goyal , Kamarolhizam Bin Besah, and Ashish Khanna
Abstract Recommendations engines are information filtering technologies that employ various frameworks as well as data to inform a specific user/viewer of the most important aspects. Recommendation systems for content streaming platforms or OTT platforms are critical for responding to the changing demands of a large audience with high-quality, error-free content. Collaborative filtering and content-based filtering are two main ways of implementing a recommendation system that has been presented. Both strategies have advantages, yet they are ineffective in many situations. To boost performance, a variety of hybrid solutions are being examined. This research provides Hybrid User Profiling recommender systems for OTT platforms based on the Movie Lens Dataset and uses Machine Learning techniques as Collaborative learning. The result shows that the proposed method has an MSE of 0.67, a root mean squared error of 0.82, and a training time of 52.46 s. The proposed algorithms have better outcomes in terms of performance parameters like MSE, RMSE, and training time, indicating that they may assist individuals in discovering useful and relevant content as compared to other algorithms. Keywords Recommendation system · Content streaming · OTT · Collaborative learning · Deep learning
1 Introduction OTT and VOD services continue to reach hundreds of thousands of individuals as online media gains popularity. Digital media platforms are seeking for apps that can read user thoughts and present a list of products/shows/items to which they are most likely to be drawn because to the quick change in customer behavior and ideas. S. B. Goyal (B) · K. B. Besah City University, Petaling Jaya, Malaysia e-mail: [email protected] A. Khanna Maharaja Agrasen Institute of Technology (GGSIPU), New Delhi, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_24
Similar trends may be seen in the Internet OTT entertainment market [1–3]. Referrals support OTT growth and income in a number of ways. Recommendation algorithms are widely used by well-known OTT and VOD providers to entice users to return to their subscription-based, transactional, or ad-supported platforms. Today, user profiling goes beyond selling individualized content in recommendation systems. The hybrid recommendation approach improves site persistence and loyalty while also tailoring the Internet streaming experience for each unique user [4]. Recommendation engines are utilized in many different web applications nowadays, and their number is rising quickly. A recommendation engine or recommender system is a facility that predicts customer preferences and converts their responses into choices. Adopting this technology broadens the reach of new advertising and enhances customer interactions. Content-based recommendation and collaborative filtering systems are two of the best-known examples of recommendation engine implementations, as illustrated in Fig. 1 [5, 6].
Fig. 1 The basic recommendation approaches: in collaborative filtering, movies watched by similar users are recommended to each other; in content-based filtering, movies similar to those already watched by a user are recommended
The features of the suggested products are examined by content-based algorithms. When users search for a movie in a certain genre, for example, the system suggests all of the best-rated films in that genre. Items are recommended by collaborative filtering systems based on a resemblance between the searches/personality types of users. Shows and episodes, for example, are suggested based on the searches and interests of other individuals who watch the same program as the consumer. Several recommendation system models are employed throughout the various OTT verticals. The majority of the public has moved to OTT platforms instead of conventional media due to the current pandemic impact. It also heightened internal market competitiveness among the entertainment industry's leading online media
providers, with the majority of them believing the recommender system can be a game-changer [6]. The value of an AI-driven recommender system for OTT services is enormous, and data is the key that unlocks its full potential. Till-data implementations of recommendation algorithms in OTT are entirely based on consumer data insights. A user’s historical data history is analyzed, and future discoveries which are more inclined to interact with the user are predicted [7, 8]. Machine Learning-based recommender systems assist OTT service providers in determining whether or not a service meets the user’s needs. Reinforcement learning scales the recommendation funnel by carefully studying user preferences, interests, and tastes over time. Recommendation systems may be used in a variety of future ways. In the information age, customized and smooth streaming experiences are becoming the norm, thanks to the use of data and streamlined recommendation algorithms. In addition, by providing custom material, the streaming platform’s quality is enhanced. The use of AI in the recommendation sector is becoming more common. The future of hybrid audio or video streaming will be built on smart suggestions. And it has the ability to do feats that we had not anticipated. The following are some of the practical benefits of implementing a recommendation engine: faster and more efficient content discovery, binge streaming promotion, valuable advertising, and converting a non-subscriber user to a subscribed user [9]. Data exploration is reshaping technology and allowing recommendation systems to be built. Furthermore, the development of hybrid recommendation implementation fields is transforming OTT platforms, allowing them to provide better services and improve customer outcomes. And as the technology landscape evolves, video and audio streaming platforms will become more important in providing a consistent experience and retaining customers [10, 11].
2 Literature Review In [12], author presented a novel similarity algorithm, dubbed User Profile Correlation-based Similarities (UPCSim), that takes into account both genre and user-profile information, such as period, M/F, employment, and area. One of the most extensively used recommendation system methodologies is collaborative filtering. One problem with CF is determining how to apply a similarity algorithm to improve the recommending method’s accuracies. Lately, a similar framework had been presented that integrates the users’ ratings with the user’s behavior value. As a result, the weighted of the similarity of user rating value as well as user behavior value is calculated using all of the user-profile data. The UPCSim method beats the prior algorithm in terms of recommending accuracies, cutting MAE by 1.640% and Root MSE by 1.40% in an experiment. In [13], author presented a real-time multilingual tweet-based movie recommendation system for the movie domain. The LinqToTwitter Library was used to get these tweets from the Twitter API. Tweets are also subjected to sentiment analysis. Multilingual and real-time tweets are taken into account in this study. The Google
Convert API is used to translate these tweets into the target language. For preprocessing, the suggested study employed the Stanford library, and RNN was used to categorize the tweets. The tweets are divided into three categories: good, negative, and neutral. Tweets are preprocessed to eliminate undesirable words, URLs, emoticons, and other elements. Finally, the customer is recommended a movie depending on the categorization. This suggested approach is superior to existing techniques since it is implemented on various tweets and emotional analysis is used to get better outcomes. This method is accurate to 91.67%, precise to 92%, recall to 90.2%, and f -measure to 90.98%. In [14], author proposed k-Means clustering or a Hidden Markov model, or by using bagging and boosting approaches. They not just to increase their business but also improve customer experiences by using this technique of displaying movies or consumer items into the profile of a specific customer. However, there are several issues with the standard techniques, such as the cold start, shrill attack, and so on, which expands the field of view of research in the area. This project combines CF with CB Recommender system [15] to create a product and movie recommendation system for social networking sites that demonstrates the efficacy of collaborative filtering while also illustrating the problems of content-based filtering. The suggested work has an f -score of 91.85%, precision of 86.68%, recall of 97.68%, and accuracy of 86.68%. To study, analyze, and categorize products and save information depending on user experience, the creator of [16] employed machine learning techniques. Benchmark Unifying Computed System (UCS), a server for data-dependent computer products designed for evaluating hardware, assisting with visualization, and software administration, gathers product data together with user reviews. Machine learning algorithms have been proven to outperform other techniques based on the findings and comparisons. When compared to other current systems, the suggested system has higher MAPE of 96% and accuracy of approximately 98.0%. The suggested HRS system’s mean absolute error is almost 0.6, indicating that the system’s performance is very effective. In [17], author designed a collaborative filtering (CF)-based recommendation systems for cold-start issue that occurs when there are insufficient ratings for modeling user reference and discovering trustworthy comparable users and presents a novel content-based CF strategy based on item similarity to tackle this issue and increase the efficiency of recommendation systems. In the movie domain, uses the model to extract properties such as genre, producers, characters, and stories. Researchers use the Jaccard coefficients index to convert retrieved attributes like genre, producers, and actors to vectors while the plot feature is turned to semantic vectors. The experiment’s findings show that the accuracy, precision, and recall are each 0.6360, 83.44, and 77.300, respectively. The analysis of the F1—81.027 shows that the proposed system performs better than the baseline systems in terms of accuracy, precision, recall, and F1 scores under cold-start conditions. By using a CF framework to predict suggestions and ratings in this study, the authors of [18] take a preliminary look at how to determine consumers’ preferences for movies. Therefore, researchers may create two lists of movies for each person, one
for the movies they like and one for the movies they do not. Based on this selection of two movies, they generate user profiles for favorable and negative users. Therefore, this system will suggest movies to users that are least similar to their negative profile and most similar to its good profile. In the end, the results demonstrate that this approach improves the MAE index by 12.540%, the MAPE index by 17.680%, and the F1 score by 10.160% when compared to the conventional CF technique. A knowledge-based design recommendation system (IKDRS) for appropriate personalized stylish product development methods with virtual demos for a particular customer was developed by the author in [19]. In order to construct this system, 3D body scanning technology and a sensory evaluation technique are used to first collect anthropometric data and the designer’s appraisal of body forms. This technique operates by adhering to a recently proposed design process: determining customers’ emotional needs; developing schemes; suggesting; displaying and assessing 3D virtual prototypes; and modifying design variables. The designer is free to repeat this process till satisfied. The satisfaction rate is 91%, and the lower standard deviation is around 0.0914. The recommended solution has been tested using several effective real-world design scenarios. In [20], the author suggested using sequencing patterns mining (SPM) and a hybrid knowledge-based recommendation ontology in this research. In the suggested proposing approach, ontology is used to define and explain domain information about the learners and learning resources, while the SPM algorithm identifies the learners’ sequential learning patterns. The results show that the recommended hybrid recommender system performs better after being tested in a number of different ways. The outcomes of the suggested technique include sparsity of 66.7%, accuracy of 0.63, recall of 0.35–0.75, user satisfaction of 94%, and MAE of 0.66.
3 Proposed Methodology Figure 2 shows the step-wise recommender system proposed methodology. The recommendation system works in six steps consisting of data preprocessing, data exploration, Cosine similarity, Matrix generation, feature reduction, and lastly collaborative learning using machine learning. A detailed discussion for the above-given steps is given below.
3.1 Data Preprocessing Data preparation is the process of transforming unstructured data into a format that can be understood. One of the data preprocessing techniques is data cleansing. Duplicate data must be removed as part of the data-cleaning procedure. Getting rid of duplicate entries in today’s databases is increasingly difficult. Record deduplication is the
Fig. 2 Proposed methodology: data preprocessing (cleaning, removing duplicates, checking for NaN values) → data exploration (probability density function, cumulative distribution function) → cosine similarity (user-user, user-item, item-item similarity) → matrix generation (matrix factorization) → feature reduction (TruncatedSVD) → collaborative learning (machine learning)
process of finding and removing duplicates. Parsing, data transformation, duplicate reduction, and statistical approaches are all part of data cleaning. Not a Number (NaN) is a popular way to express a missing value in data, and the presence of NaN values is one of the most common issues in data analysis. In order to achieve the desired results, it is critical to deal with NaN. It is simple to find and handle NaN in an array, series, or data frame; approaches include using the pandas, NumPy, and math libraries, comparing a value with itself, and testing the value range.
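A minimal pandas sketch of the cleaning steps described above (duplicate removal, NaN detection, and the self-comparison trick); the file and column names are assumptions for illustration, not the exact code of this work.

import pandas as pd

# ratings.csv is a hypothetical export with columns: userId, movieId, rating, timestamp
ratings = pd.read_csv("ratings.csv")

# remove exact duplicate records
ratings = ratings.drop_duplicates()

# locate and handle NaN values
print(ratings.isna().sum())                  # count missing values per column
ratings = ratings.dropna(subset=["rating"])  # drop rows with a missing rating

# a NaN never equals itself, so self-comparison also exposes missing entries
mask = ratings["rating"] != ratings["rating"]
assert mask.sum() == 0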
3.2 Data Exploration Data scientists use data visualization and statistical tools to identify dataset properties in the initial step of data analysis, called data exploration, in order to better
comprehend the data. The probability density function calculates the probabilities of a continuous random variable's outcomes. It is also referred to as a probability distribution function or simply a probability function. The probability density function satisfies

P(a ≤ X ≤ b) = ∫_a^b f(x) dx    (1)

For continuous random variables, the probability density function (PDF) is defined, whereas for discrete random variables, the probability mass function (PMF) is defined. Another way to describe the distribution of a random variable is to use its cumulative distribution function (CDF). The cumulative distribution function (CDF) of the random variable X is defined as

F_X(x) = P(X ≤ x), for all x ∈ R    (2)
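As an illustration of Eqs. (1) and (2), the following sketch computes an empirical PDF and CDF for a synthetic rating column; the data here are random placeholders.

import numpy as np

ratings = np.random.randint(1, 6, size=10_000)   # stand-in for the rating column

# empirical PDF (normalised histogram) and CDF of the ratings
values, counts = np.unique(ratings, return_counts=True)
pdf = counts / counts.sum()
cdf = np.cumsum(pdf)

for v, p, c in zip(values, pdf, cdf):
    print(f"P(X = {v}) = {p:.3f}   F_X({v}) = P(X <= {v}) = {c:.3f}")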
3.3 Cosine Similarity Regardless of size, the cosine similarity metric is used to assess how comparable texts/documents are. Cosine similarity is the cosine of the angle between two vectors projected in a multi-dimensional space. The first stage is to build the model by evaluating how similar each item pairing is to the others. There are several ways to find similarities between item pairings; the usage of cosine similarity is one of the most prevalent. It is mathematically represented as

Similarity(A, B) = (A · B) / (‖A‖ × ‖B‖)    (3)
• User-based Collaborative Filtering (UB-CF): recommendations based on evaluating the commonalities between two users.
• Item-based Collaborative Filtering (IB-CF): recommendations based on evaluating similarities between two items using people's ratings.
• User-Item-based Collaborative Filtering: when dealing with large volumes of data, user-item-based filtering is likely to be more productive, while item-based collaborative filtering is required to perform effectively on small data sets.
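A small sketch of Eq. (3) applied to two item rating vectors; the vectors are toy values for illustration.

import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two rating/feature vectors, Eq. (3)."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# toy item-item example: two columns of a user-item rating matrix
item_a = np.array([5, 3, 0, 4, 4])
item_b = np.array([4, 0, 0, 5, 4])
print(cosine_similarity(item_a, item_b))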
3.4 Matrix Generation In coding theory, a generator matrix is a matrix whose rows form a basis for a linear code. The code words are the linear combinations of the rows of this matrix, and the linear code is the row space of its generator matrix. A type of CF method used in recommendation systems is matrix factorization. Using matrix factorization techniques, the user-item interaction matrix is decomposed into the product of two lower-dimensional rectangular matrices. Matrix factorization is a fundamental embedding model. The model learns the following from the feedback matrix A ∈ R^(m×n), where m is the number of users (or queries) and n is the number of items:
• A user embedding matrix U ∈ R^(m×d), with row i corresponding to the embedding for user i.
• An item embedding matrix V ∈ R^(n×d), with row j corresponding to the embedding for item j.
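The sketch below illustrates matrix factorization in the sense described above, learning U ∈ R^(m×d) and V ∈ R^(n×d) by stochastic gradient descent on the observed entries of a toy rating matrix; it is an illustrative baseline under assumed hyper-parameters, not the exact training procedure of this work.

import numpy as np

def factorize(A, d=8, lr=0.01, reg=0.02, epochs=50):
    """Learn U (m x d) and V (n x d) so that U @ V.T approximates the observed entries of A."""
    m, n = A.shape
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(m, d))
    V = rng.normal(scale=0.1, size=(n, d))
    rows, cols = np.nonzero(A)              # only observed (non-zero) ratings
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            err = A[i, j] - U[i] @ V[j]
            U[i] += lr * (err * V[j] - reg * U[i])
            V[j] += lr * (err * U[i] - reg * V[j])
    return U, V

A = np.array([[5, 3, 0], [4, 0, 2], [0, 1, 5]], dtype=float)  # toy user-item matrix
U, V = factorize(A)
print(np.round(U @ V.T, 2))                  # reconstructed ratings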
3.5 Feature Reduction There are a variety of techniques for minimizing the number of independent variables in the training data. Truncated Singular Value Decomposition (truncated SVD) is a matrix factorization approach comparable to principal component analysis (PCA). The number of columns in a truncated-SVD-factorized data matrix is equal to the truncation: only the specified number of components is kept and the rest are discarded. Singular value decomposition (SVD) is a collaborative movie recommendation filtering method. The solution aims to provide users with movie suggestions based on latent features of the item-user matrix. The SVD latent factor model is used to factorize matrices in the code. Singular value decomposition is defined as the factorization of any m × n matrix into a unitary matrix U (m × m), a rectangular diagonal matrix S (m × n), and a complex unitary matrix V* (n × n). For a given m × n matrix, the truncated SVD technique will generate matrices with the stated number of columns, while a standard SVD procedure would generate matrices with m columns. It means that all components except the specified number are removed. The components of a high-level (user-item-rating) matrix factorization are separated using singular value decomposition, which splits a matrix into three submatrices:

A = U S V^T    (4)

where U is the singular matrix of (user × latent factors), S is a diagonal matrix (showing the strength of each latent factor), and V is the singular matrix of (item × latent factors).
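A minimal TruncatedSVD sketch on a toy user-item matrix, assuming scikit-learn; the component count and the matrix values are illustrative.

import numpy as np
from sklearn.decomposition import TruncatedSVD

# user-item rating matrix (rows: users, columns: items); zeros mark unrated items
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 0, 5, 4]], dtype=float)

svd = TruncatedSVD(n_components=2, random_state=0)
user_factors = svd.fit_transform(R)          # U * S, shape (n_users, 2)
item_factors = svd.components_.T             # V,     shape (n_items, 2)

# low-rank reconstruction used to score unseen items
R_hat = user_factors @ item_factors.T
print(np.round(R_hat, 2))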
3.6 Collaborative Learning In collaborative learning, several classifier heads of the same network are trained on the same training data at the same time to increase generalization and resilience to label noise without incurring additional inference cost. Auxiliary training, multi-task learning, and knowledge distillation are some of its advantages. To put it another way, the proposed framework allows you to create a shared prediction system while keeping all of your data on the system. Machine learning is the basis of this concept as used in the recommendation system. Machine learning (ML) is a type of AI technique that enables software programs to improve their prediction accuracy without being explicitly programmed to do so. ML algorithms use historical information as input to anticipate expected future outputs.
4 Results and Discussions 4.1 Data Set Description The GroupLens project compiled the MovieLens 1M dataset used in this research. The data collection comprises roughly one million ratings from 6000 users on roughly 4000 films. Users.dat, movies.dat, and ratings.dat are the three files that make up the dataset. The user ID, gender, age, occupation ID, and zip code are all included in the user data. Moreover, age is separated into seven age categories, occupation is divided into twenty groups, and zip codes are not utilized. Table 1 includes the initial user information. The parameters movie ID, movie title, and film genre make up the movies data. The ratings data contains columns for user ID, movie ID, rating, and timestamp.
Table 1 Example of original users data
User ID | Gender | Age (year) | Occupation ID | Zip code
1 | Female | 1 | 11 | 65412
2 | Male | 53 | 16 | 63452
3 | Male | 26 | 15 | 78625
4 | Male | 39 | 16 | 20498
5 | Male | 34 | 8 | 23675
Table 2 Initial code parameters
Variables | Values
Num-epochs | 100
Batch size | 64
Dropout keep | 0.5
Learning rate | 0.00001
4.2 Experimental Parameters Before the training stage, several important parameters of the technique are defined; they are displayed in Table 2. To begin, the dataset is split into a training and a test set in a 70:30 ratio. Python is used for the training and then validation of the results. The CNN model's learning rate is fixed as follows: as the learning rate is reduced, the loss first decreases and then increases once the learning rate reaches 0.00005, so the learning rate is finally selected as 0.00001.
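A hedged sketch of such a training setup, assuming TensorFlow/Keras: the 70:30 split, learning rate of 0.00001, batch size of 64, 100 epochs and dropout of 0.5 follow the text and Table 2, while the data placeholders and layer sizes are assumptions, since the exact architecture is not listed here.

import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 20).astype("float32")    # placeholder feature matrix
y = (np.random.rand(1000) * 5.0).astype("float32")  # placeholder ratings

# 70:30 split as stated in the paper
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X.shape[1],)),
    tf.keras.layers.Dense(128, activation="relu"),   # layer sizes are illustrative
    tf.keras.layers.Dropout(0.5),                    # dropout keep value from Table 2
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),                        # predicted rating
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
model.fit(X_train, y_train, epochs=100, batch_size=64,
          validation_data=(X_test, y_test), verbose=0)
print(model.evaluate(X_test, y_test, verbose=0))     # [MSE, RMSE]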
4.3 Performance Parameters Two metrics, MSE and RMSE, are computed for performance evaluation. MSE (mean square error) is the mean of the squared differences between the actual and the estimated values. MSE is calculated using the equation given below:

MSE = (1/n) Σ_{i=1}^{n} (Y_i − Ŷ_i)²    (5)

where n is the number of observations, Y_i stands for the observed values, and Ŷ_i for the predicted values. RMSE (root mean square error) is the square root of the mean of the squared differences between the observed and predicted values:

RMSE = √( (1/N) Σ_{i=1}^{N} (Y_i − Ŷ_i)² )    (6)

where N is the number of non-missing points, Y_i represents the real data, and Ŷ_i the computed values.
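Equations (5) and (6) translate directly into a few lines of NumPy; the rating values below are placeholders.

import numpy as np

def mse(y_true, y_pred):
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)   # Eq. (5)

def rmse(y_true, y_pred):
    return np.sqrt(mse(y_true, y_pred))                              # Eq. (6)

actual    = [4.0, 3.5, 5.0, 2.0]
predicted = [3.8, 3.9, 4.6, 2.4]
print(mse(actual, predicted), rmse(actual, predicted))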
Fig. 3 Probability versus total number of ratings
4.4 Result Analysis Figure 3 shows the probability for the given dataset with a varying number of ratings. The probability is maximum at 0–20 ratings; as the number of ratings increases, the probability decreases gradually and then approaches zero. Figure 4 shows the loss versus epoch graph for both training and test losses. The training and test losses decrease from 0.78 to approximately 0.64 and then become constant at a particular epoch value. Table 3 shows the performance evaluation of the proposed technique considering MSE and RMSE. For K-nearest neighbors (K-NN), the MSE is 0.9546 and the RMSE is 0.9762.
Fig. 4 Loss versus epoch graph
Table 3 Performance evaluation
Method | MSE | RMSE
K-NN | 0.9546 | 0.9762
SVD | 0.8722 | 0.9362
NMF | 0.9279 | 0.9635
Fine-tuned CNN | 0.67 | 0.8227

Table 4 Comparative analysis
Techniques | MSE | RMSE | Time (in seconds)
CNN [21] | 0.777 | – | 69.42
LSTM-CNN [21] | 0.7724 | – | 64.25
DMGCF [22] | – | 0.923 | –
Ours | 0.67 | 0.8227 | 52.46
For SVD, the MSE is 0.8722 and the RMSE is 0.9362. Similarly, for NMF the MSE is 0.9279 and the RMSE is 0.9635. For the fine-tuned convolutional neural network, the MSE is the minimum, 0.67, and the RMSE is the minimum, 0.8227. Table 4 shows the comparative analysis of different techniques and the proposed one in terms of MSE, RMSE, and time. The convolutional neural network method described in [21] has an MSE of 0.777; only one parameter is considered in that study. In addition, the LSTM-based convolutional neural network discussed in [21–25] has an MSE of 0.7724 with a time of 64.25 s. The DMGCF-based work has an RMSE of 0.923; this method also reports only one performance parameter. Our proposed work has an MSE of 0.67 and an RMSE of 0.8227 with a time of 52.46 s.
5 Conclusion As the world becomes increasingly global, people have many options when it comes to movies. There are many different genres, civilizations, and languages available in the world of cinema. This highlights the issue of computer programs recommending movies to individuals. There has been a lot of work done in this field up to this point. There is, however, always room for development. A recommendation system is a program that, in response to particular data, suggests movies and web series across different OTT services. Typically, movie recommendation algorithms infer from the traits of previously liked films what movies a user would appreciate. Such recommendation systems are useful for businesses that gather data from a big number of clients and want to provide the best recommendations possible. Many criteria may be taken into account while creating a movie recommendation system, including the film’s genre, performers, and even the director. The systems may provide recommendations based on a single characteristic or a combination of two or more. The recommendation system in this study is based on a machine learning
system. The MovieLens dataset was utilized to create the system. The algorithms may provide movie recommendations based on one, two, or more features. The performance parameters are MSE and RMSE. The proposed work has an MSE of 0.67 and an RMSE of 0.82, and the training time is 52.46 s. According to the results, the suggested approach surpasses the current strategies and provides more adaptive, smart recommendation systems for diverse OTT platforms.
References 1. Reddy S, Nalluri S, Kunisetti S et al (2019) Content-based movie recommendation system using genre correlation. Smart Innov Syst Technol 105:391–397. https://doi.org/10.1007/978981-13-1927-3_42 2. Ahmed M, Imtiaz MT, Khan R (2018) Movie recommendation system using clustering and pattern recognition network. In: IEEE 8th annual computing and communication workshop and conference (CCWC), pp 143–147. https://doi.org/10.1109/CCWC.2018.8301695 3. Pattanayak S, Shukla VK (2021) Review of recommender system for OTT platform through artificial intelligence. In: 2021 9th international conference on reliability, Infocom technologies and optimization (trends and future directions) (ICRITO). https://doi.org/10.1109/ICRITO 51393.2021.9596297 4. Zhang J, Wang Y, Yuan Z, Jin Q (2020) Personalized real-time movie recommendation system: practical prototype and evaluation. Tsinghua Sci Technol 25:180–191. https://doi. org/10.26599/TST.2018.9010118 5. Kumar S, De K, Roy PP (2020) Movie recommendation system using sentiment analysis from microblogging data. IEEE Trans Comput Soc Syst 7:915–923. https://doi.org/10.1109/TCSS. 2020.2993585 6. Wu CSM, Garg D, Bhandary U (2019) Movie recommendation system using collaborative filtering. In: Proceedings of the IEEE international conference on software engineering and service science (ICSESS), Nov 2019, pp 11–15. https://doi.org/10.1109/ICSESS.2018.866 3822 7. Priadana A, Maarif MR, Habibi M (2020) Gender prediction for Instagram user profiling using deep learning. In: 2020 international conference on decision aid sciences and application (DASA), pp 432–436. https://doi.org/10.1109/DASA51403.2020.9317143 8. Rajarajeswari S, Naik S, Srikant S et al (2019) Movie recommendation system. Adv Intell Syst Comput 882:329–340. https://doi.org/10.1007/978-981-13-5953-8_28 9. Da’u A, Salim N (2019) Recommendation system based on deep learning methods: a systematic review and new directions. Artif Intell Rev 53(4):2709–2748. https://doi.org/10.1007/S10462019-09744-1 10. Roy S, Sharma M, Singh SK (2019) Movie recommendation system using semi-supervised learning. In: 2019 global conference for advancement in technology (GCAT). https://doi.org/ 10.1109/GCAT47503.2019.8978353 11. Chaaya G, Abdo JB, Demerjian J et al (2018) An improved non-personalized combinedheuristic strategy for collaborative filtering recommender systems. In: 2018 IEEE Middle East North Africa communications conference (MENACOMM), pp 1–6. https://doi.org/10.1109/ MENACOMM.2018.8371042 12. Widiyaningtyas T, Hidayah I, Adji TB (2021) User profile correlation-based similarity (UPCSim) algorithm in movie recommendation system. J Big Data 8:1–21. https://doi.org/ 10.1186/S40537-021-00425-X/TABLES/8 13. Singh T, Nayyar A, Solanki A (2020) Multilingual opinion mining movie recommendation system using RNN. Lect Notes Netw Syst 121:589–605. https://doi.org/10.1007/978-981-153369-3_44
14. Datta D, Navamani TM, Deshmukh R (2020) Products and movie recommendation system for social networking sites. Int J Sci Technol Res 9 15. Ali SM, Nayak GK, Lenka RK, Barik RK (2018) Movie recommendation system using genome tags and content-based filtering. Lect Notes Netw Syst 38:85–94. https://doi.org/10.1007/978981-10-8360-0_8 16. Yi S (2020) Liu X (2020) Machine learning based customer sentiment analysis for recommending shoppers, shops based on customers’ review. Complex Intell Syst 63(6):621–634. https://doi.org/10.1007/S40747-020-00155-2 17. Nguyen LV, Nguyen TH, Jung JJ (2020) Content-based collaborative filtering using word embedding: a case study on movie recommendation. In: ACM international conference proceeding series, pp 96–100. https://doi.org/10.1145/3400286.3418253 18. Chen YL, Yeh YH, Ma MR (2021) A movie recommendation method based on users’ positive and negative profiles. Inf Process Manag 58:102531. https://doi.org/10.1016/J.IPM.2021. 102531 19. Dong M, Zeng X, Koehl L, Zhang J (2020) An interactive knowledge-based recommender system for fashion product design in the big data environment. Inf Sci (NY) 540:469–488. https://doi.org/10.1016/J.INS.2020.05.094 20. Tarus JK, Niu Z, Yousif A (2017) A hybrid knowledge-based recommender system for elearning based on ontology and sequential pattern mining. Future Gener Comput Syst 72:37–48. https://doi.org/10.1016/J.FUTURE.2017.02.049 21. Wang H, Lou N, Chao Z (2020) A personalized movie recommendation system based on LSTMCNN. In: Proceedings—2020 2nd international conference on machine learning, big data and business intelligence (MLBDBI), pp 485–490. https://doi.org/10.1109/MLBDBI51377.2020. 00102 22. Tang H, Zhao G, Bu X, Qian X (2021) Dynamic evolution of multi-graph based collaborative filtering for recommendation systems. Knowl-Based Syst 228:107251. https://doi.org/10.1016/ J.KNOSYS.2021.107251 23. Goyal SB, Bedi P, Kumar J et al (2021) Deep learning application for sensing available spectrum for cognitive radio: an ECRNN approach. Peer-to-Peer Netw Appl 14:3235–3249. https://doi. org/10.1007/s12083-021-01169-4 24. Diwan TD, Choubey S, Hota HS, Goyal SB, Jamal SS, Shukla PK, Tiwari B (2021) Feature entropy estimation (FEE) for malicious IoT traffic and detection using machine learning. Mob Inf Syst 2021, Article ID 8091363, 13 pp. https://doi.org/10.1155/2021/8091363 25. Bedi P, Goyal SB, Kumar J (2021) Applied classification algorithms used in data mining during the vocational guidance process in machine learning. In: Suma V, Chen JIZ, Baig Z, Wang H (eds) Inventive systems and control. Lecture notes in networks and systems, vol 204. Springer, Singapore. https://doi.org/10.1007/978-981-16-1395-1_11
Image Encryption Based on Cyclic Chaos, PRNG and Arnold’s Cat Map Dibyasha Das and Chittaranjan Pradhan
Abstract At present, the exchange of digital images is ever more critical, so security issues during data exchange have become a major concern. Consequently, chaotic frameworks are extensively utilized in image encryption applications due to their randomness properties. Arnold's cat map is well suited for shuffling an image. An image encryption strategy utilizing the Arnold cat map is proposed; however, after a few cycles the transformed image returns to the initial image, so, keeping the security concerns in mind, Cyclic Chaos and a PRNG are combined with the Arnold cat map to provide a dual layer of security. This makes it difficult for an intruder to decode the content. Further, the security of the image encryption technique is analyzed using histogram analysis, correlation analysis, peak signal to noise ratio (PSNR) and mean square error (MSE). The security analysis demonstrates that the proposed encryption system is secure, as the cross-correlation value is nearly equal to 1, the PSNR value is on average 62, which is very high, and the MSE value is very low, which makes it very difficult for an intruder to break the proposed image encryption algorithm. Keywords Cyclic Chaos · PRNG · Arnold cat map · Image encryption · Security
1 Introduction With the fast development of web and the advanced data and the expanded utilize of interactive media information, exchange of data/information has become rapid and convenient. Unauthorized use of data/information becomes an important issue worth studying. Therefore, safety and security of information raised with rising technology. Data encryption is an efficient way to resolve this issue. It protects the ciphertext so that the intruder cannot get the actual plaintext. “Cover image” is the phrase used to introduce to an image before the secret information being embedded and “stego image” is an image found after secret data being embedded. D. Das (B) · C. Pradhan Kalinga Institute of Industrial Technology, Bhubaneswar, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_25
Chaotic systems present special characteristics that increase the security level of secret writing against statistical intrusions. Chaotic sequences are generated by mathematical equations. Chaos properties include randomness, unpredictability, nonlinearity, and sensitivity to the starting conditions and control parameters: a minor change in the starting condition or control parameter leads to a huge difference in the result [1, 2]. Therefore, chaotic systems are suitable for image encryption. Image transformation is one of the most widely utilized procedures to convert an image; Arnold's cat map is utilized to interchange the original image's pixel positions [3, 4]. In this paper, to conceal our confidential data, we use color images as cover images and use MSE, PSNR, histogram analysis and correlation analysis to evaluate the performance of the presented scheme. The rest of the article is organized as follows. Section 2 is the literature survey. Section 3 presents the fundamental concepts of Cyclic Chaos, PRNG and Arnold's cat map. Section 4 describes the proposed work. The result analysis is presented in Sect. 5. Finally, Sect. 6 concludes the paper.
2 Literature Survey Tao Zhang, Xinjian Ping, Ling Xi and Jaron Millikanian proposed a PRNG approach that is utilized to detect the positions for embedding and extraction with LSB matching (LSBM). In LSBM, 1 is added to or subtracted from the pixels of the cover image, reducing the chance of detection [5]. Guo and Yen proposed that, through a chaotic system, a random sequence can be generated by which the image can be transformed, and the rearrangement of image pixels is performed by taking every single bit of the picture and XORing it with the random key [6]. Zhang Ying-Qian, Gu Sheng-Xian and Wang Xing-Yuan discussed a new chaotic image encryption algorithm using rotation shift. As a result of the rotation shift, the security of the encryption approach is enhanced to a greater extent, making it resistant to known attacks [7]. Wang Xingyuan, Zhao Yuanyuan, Zhang Huili and Guo Kang proposed the use of an alternate chaotic mapping structure to enable color image encryption. In that paper, the R, G and B elements of the color image are used to acquire a matrix. Then one-dimensional and two-dimensional logistic chaotic mappings are used to create a matrix and to permute the matrix produced by them. Lastly, an exclusive-OR operation is applied to encrypt the picture [8]. Zhen Wei Shang, Honge Ren and Jian Zhang proposed an algorithm to scramble the pixel locations of digital images using the Arnold transformation, which thus meets the requirements of image encryption [9]. This paper combines both techniques: Arnold's cat map interchanges the image's pixel locations, the seeds of the PRNG are produced by Cyclic Chaos, and a sequence generated by the PRNG determines how the image is embedded.
3 Basic Concepts 3.1 Arnold's Cat Map Arnold's cat map is a two-dimensional invertible chaotic map [10, 11]. It shuffles the pixel positions of the entire image without changing the pixel values, so the picture becomes unrecognizable. The secret key to Arnold's cat map is based on its starting conditions and control parameters. Arnold's cat map has the significant feature that it rearranges the pixels of the picture, but after some iterations the pixels come back to their initial locations and thus the initial picture is regenerated. Arnold's cat map is defined as follows [12]:
[a_{n+1}, b_{n+1}]^T = [[1, x], [y, xy + 1]] · [a_n, b_n]^T mod M    (1)

where x and y are control parameters, M × M is the dimension of the image, (a_n, b_n) is the position of a pixel in the initial picture, and (a_{n+1}, b_{n+1}) is the new pixel position in the transformed image after applying Arnold's transformation.
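A direct implementation sketch of Eq. (1), assuming a square image stored as a NumPy array; as noted above, enough iterations return the original image.

import numpy as np

def arnold_cat_map(img, x, y, iterations=1):
    """Shuffle pixel positions of a square M x M image with Eq. (1)."""
    M = img.shape[0]
    out = img.copy()
    for _ in range(iterations):
        shuffled = np.empty_like(out)
        for a in range(M):
            for b in range(M):
                a_new = (a + x * b) % M
                b_new = (y * a + (x * y + 1) * b) % M
                shuffled[a_new, b_new] = out[a, b]
        out = shuffled
    return out

img = np.arange(16).reshape(4, 4)            # toy 4 x 4 "image"
print(arnold_cat_map(img, x=1, y=1, iterations=2))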
3.2 Cyclic Chaos Cyclic Chaos structures are chaotic structures whose objective is to enhance the security of the encryption process, because Cyclic Chaos structures have all the required chaotic properties, including sensitivity to their starting conditions and control parameters, so that if the control parameter or starting conditions are altered slightly, the result changes dramatically. A chaotic structure is described as:

x_i^{n+1} = f(x_i^n, γ_i)    (2)

where x_i = (x_i^1, x_i^2, …, x_i^k) ⊂ R^k represents the cyclic chaotic signal. Generally, a Cyclic Chaos structure is described as a three-dimensional signal:

x_{n+1} = λ_1 x_n − x_n³ − γ |y_n|^m x_n
y_{n+1} = λ_2 y_n − y_n³ − γ |z_n|^m y_n
z_{n+1} = λ_3 z_n − z_n³ − γ |x_n|^m z_n    (3)

where λ_1, λ_2, λ_3 are internal conditions, γ is the control parameter, m is a starting condition, and x_0, y_0, z_0 are transmitted as keys. x_0, y_0, z_0 can take any value in [−1.6, 1.6], m can vary within [0, 0.5], and γ can vary within [2.7, 3.0].
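A small sketch that iterates the system of Eq. (3); γ, m and the starting values are taken from the ranges quoted above, while the λ values chosen here are illustrative assumptions.

def cyclic_chaos(x, y, z, l1, l2, l3, gamma, m, steps=5):
    """Iterate the three-dimensional cyclic chaotic system of Eq. (3)."""
    for _ in range(steps):
        x, y, z = (l1 * x - x**3 - gamma * abs(y)**m * x,
                   l2 * y - y**3 - gamma * abs(z)**m * y,
                   l3 * z - z**3 - gamma * abs(x)**m * z)
        yield x, y, z

# starting conditions inside [-1.6, 1.6], m in [0, 0.5], gamma in [2.7, 3.0]
for state in cyclic_chaos(0.3, -0.8, 1.1, l1=2.0, l2=2.0, l3=2.0, gamma=2.8, m=0.4):
    print(state)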
3.3 PRNG (Pseudorandom Number Generator) Pseudorandom numbers are generated via PRNG, which utilizes the seed value generated by dynamic functions (i.e., external sources). The seed esteem of PRNG must be random and unpredictable. Hence, it is frequently generated from a random number generator (RNG). As PRNG results are deterministic, and each sequence can be produced from seeds, they are called pseudorandom numbers.
4 Proposed Work In image encryption, Arnold's cat map is applied to transform the image's pixel locations without altering the pixel values, so that the picture becomes unidentifiable. However, in Arnold's cat map, after a certain number of iterations the pixels come back to their initial locations, which returns the initial image. To make the encryption process stronger, Cyclic Chaos is used to generate seeds, and from these seeds the PRNG generates pseudorandom numbers by which the pixel positions are transformed again and the image is distorted; the pseudorandom sequence also decides in which plane of the cover image (R, G or B) the pixels of the hidden image are to be stored. The framework of the proposed work is shown in Fig. 1.
Algorithm 1 Cyclic Chaos
Input: The key (x_0, y_0, z_0, λ_1, λ_2, λ_3, γ, m), the interval number I, the secret message S and the cover image C.
Output: Stego image
1. Divide d ∈ [−2, 2] into I equal subintervals
2. x_n = x_0, y_n = y_0, z_n = z_0; Cyclic Chaos(x_0, y_0, z_0, λ_1, λ_2, λ_3, γ, m)
3. x_{n+1} = λ_1 x_n − x_n³ − γ |y_n|^m x_n; y_{n+1} = λ_2 y_n − y_n³ − γ |z_n|^m y_n; z_{n+1} = λ_3 z_n − z_n³ − γ |x_n|^m z_n; sorting_index = sort(x_{n+1}, y_{n+1}, z_{n+1}); for a = 1 : I − 1, if (d_a ≤ sorting_index and d_{a+1} > sorting_index) then S_e = a, break; end; x_n = x_{n+1}, y_n = y_{n+1}, z_n = z_{n+1}; return S_e
4. r_1 = randsqgen(S_e)
5. Cyclic Chaos(x_0, y_0, z_0, λ_1, λ_2, λ_3, γ, m)
6. r_2 = randsqgen(S_e)
7. p = r_1; q = r_2
8. A = a_1 ⊕ a_2 ⊕ b_1; B = b_1 ⊕ b_2 ⊕ c_1; C = c_1 ⊕ c_2 ⊕ a_1
9. [C_R, C_G, C_B] = embedding(A, B, C, m_0, m_1, m_2)
10. End
11. End
Algorithm 2 Embedding
Input: A, B, C and the 3-bit binary secret message m_0, m_1, m_2
Output: Pixel with the secret message embedded
1. if (A = m_0) && (B = m_1) && (C = m_2): no modification
2. if (A ≠ m_0) && (B = m_1) && (C = m_2): if C_G mod 2 = 0 then C_G = C_G − 1 else C_G = C_G + 1
3. if (A = m_0) && (B ≠ m_1) && (C = m_2): if C_B mod 2 = 0 then C_B = C_B − 1 else C_B = C_B + 1
4. if (A = m_0) && (B = m_1) && (C ≠ m_2): if C_R mod 2 = 0 then C_R = C_R − 1 else C_R = C_R + 1
5. if (A ≠ m_0) && (B ≠ m_1) && (C = m_2): if C_G mod 2 = 0 then C_G = C_G + 1 else C_G = C_G − 1
6. if (A = m_0) && (B ≠ m_1) && (C ≠ m_2): if C_B mod 2 = 0 then C_B = C_B + 1 else C_B = C_B − 1
7. if (A ≠ m_0) && (B = m_1) && (C ≠ m_2): if C_R mod 2 = 0 then C_R = C_R + 1 else C_R = C_R − 1
8. if (A ≠ m_0) && (B ≠ m_1) && (C ≠ m_2): if C_B mod 2 = 0 then C_B = C_B − 1 else C_B = C_B + 1; if C_R mod 2 = 0 then C_R = C_R + 1 else C_R = C_R − 1
Fig. 1 Proposed scheme
The pseudocode of the Cyclic Chaos framework is shown in Algorithm 1. Firstly, (x_0, y_0, z_0, λ_1, λ_2, λ_3, γ, m) is set as the key, I is the interval number, the confidential message is named S and the cover image is denoted C. Before starting the process, d ∈ [−2, 2] is discretized into I identical subintervals of width 4/I. The PRNG's seed is decided by the index of a subinterval. Then Cyclic Chaos sorts the sequence of x_{n+1}, y_{n+1} and z_{n+1}, giving the sorting index. After that, the discretized interval d is searched, and the first subinterval smaller than the sorting index is selected. The seed of the PRNG is given
by the subinterval's index value, and the seed is denoted by S_e. A sequence r_1 is generated pseudorandomly from the seed S_e, and by this sequence the image pixel locations are transformed. Cyclic Chaos is run again with the key, another seed is generated, and from that seed a sequence r_2 is generated pseudorandomly; r_2 decides in which plane the pixels of the hidden image are to be stored. The pixel C_{p,q} in which the secret message is to be embedded is determined by the sequences r_1 and r_2. The three RGB channels of the pixel are called CR, CG and CB, respectively. CR, CG and CB are converted to binary, i.e., CR = (a_8 a_7 a_6 a_5 a_4 a_3 a_2 a_1)_2, CG = (b_8 b_7 b_6 b_5 b_4 b_3 b_2 b_1)_2, CB = (c_8 c_7 c_6 c_5 c_4 c_3 c_2 c_1)_2. From Algorithm 1, A, B and C are obtained, as shown in Algorithm 2; by flipping the LSB a_1 of CR, bits A and C can be modified. In the same way, by flipping the second LSB a_2 of CR, bit A can be controlled. Specifically, the bit a_1 is modified by CR − 1 when CR's pixel value is an odd number, and by CR + 1 when CR's pixel value is an even number. Next, by changing the LSB b_1 of CG, bits A and B are changed at the same time. The second LSB of CG is also changed in order to control bit B. By modifying the LSB c_1 of CB, bits B and C can be modified, and by changing the second LSB c_2 of CB, bit C can be modified. With the technique raised by Wu et al. [13], the connection among A, B, C and the secret message m is established, as shown in Algorithm 2. The 3 bits of the secret message m_0, m_1, m_2 are compared with the extracted bits A, B and C of the initial image to see whether they are identical. If the condition (A, B, C)_2 = (m_0, m_1, m_2)_2 is fulfilled, it is not necessary to modify the pixels CR, CG and CB in the data embedding procedure. Otherwise, we must change the pixels CR, CG and CB until the condition (A, B, C)_2 = (m_0, m_1, m_2)_2 is satisfied.
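The following sketch illustrates the bit extraction of step 8 of Algorithm 1 and one branch of Algorithm 2 (the case where only A disagrees); the pixel values and message bits are placeholders, and the remaining branches are omitted for brevity.

def bits(v):
    """Return the eight bits of an 8-bit channel value, LSB first."""
    return [(v >> k) & 1 for k in range(8)]

def extract_abc(cr, cg, cb):
    """A, B, C as defined in step 8 of Algorithm 1."""
    a, b, c = bits(cr), bits(cg), bits(cb)
    A = a[0] ^ a[1] ^ b[0]
    B = b[0] ^ b[1] ^ c[0]
    C = c[0] ^ c[1] ^ a[0]
    return A, B, C

def embed(cr, cg, cb, m0, m1, m2):
    """One illustrative branch of Algorithm 2."""
    A, B, C = extract_abc(cr, cg, cb)
    if (A, B, C) == (m0, m1, m2):
        return cr, cg, cb                        # case 1: nothing to modify
    if (A != m0) and (B == m1) and (C == m2):
        cg = cg - 1 if cg % 2 == 0 else cg + 1   # flips A, leaves B and C unchanged
    # ... the remaining six mismatch cases follow the same pattern on CG, CB and CR
    return cr, cg, cb

print(embed(120, 200, 64, 1, 0, 0))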
5 Experimental Results and Analysis 5.1 Histogram Analysis Histogram analysis is used to demonstrate the quality of the stego image produced by the proposed method. The histograms of the cover images Lena, Airplane, Couple and Tree and of their stego images are shown in Fig. 2. It is clearly visible that the histogram of the cover image is practically indistinguishable from the histogram of the stego image, thereby confirming the effectiveness of the proposed technique.
5.2 Cross-correlation Analysis A correlation coefficient is utilized to examine the resemblance among the adjoining pixels of the original and the encrypted picture. Correlation coefficient (ρ) could be
Fig. 2 Histograms of the cover and stego images for Lena (256 × 256), Airplane (512 × 512), Couple (256 × 256) and Tree (256 × 256)
acquired using (4). The correlation coefficient (ρ) lies between zero and one: the maximum value is one and the minimum is 0. A correlation coefficient of 0 shows that the adjoining pixels of the image are completely different, while a correlation coefficient of 1 shows that the adjoining pixels are identical.
ρ = covar(a, b) / (√P(a) · √P(b))    (4)

covar(a, b) = (1/N) Σ_{i=1}^{N} (a_i − Q(a))(b_i − Q(b))    (5)

where covar(a, b) is the covariance and N is the number of pixel pairs.

P(a) = (1/N) Σ_{i=1}^{N} (a_i − Q(a))²    (6)

Q(a) = (1/N) Σ_{i=1}^{N} a_i    (7)

P(b) = (1/N) Σ_{i=1}^{N} (b_i − Q(b))²    (8)

Q(b) = (1/N) Σ_{i=1}^{N} b_i    (9)

where covar(a, b) is the covariance of adjoining horizontal image pixels a and b, P(a) and P(b) are given in (6) and (8), Q(a) and Q(b) are the means of the image pixels a and b, respectively, a and b are the adjoining pixels of the original or encrypted image, and N is the number of pixel pairs. The correlation coefficients of plain and encrypted images are shown in Table 1. The correlation coefficients of images encrypted using the proposed algorithm are close to zero, which means that the correlations of neighboring pixels in the plain image are removed by the proposed algorithm, so it can withstand statistical attacks.
Table 1 Correlation coefficient results
Lena (256 × 256) | Correlation coefficient
Iteration 1 | − 0.015129
Iteration 3 | − 0.012382
Iteration 5 | 0.0012919
Iteration 7 | − 0.0069666
Iteration 9 | 0.0034287
Iteration 11 | − 0.0035768
Iteration 13 | − 0.011893
Iteration 14 | − 0.015129
Table 2 MSE and PSNR results
Images | Sizes | MSE | PSNR | MSE [14] | PSNR [14]
Lena | 256 × 256 | 0.001386 | 68.2231 | 0.0057386 | 61.9162
Tree | 256 × 256 | 0.0015695 | 68.1955 | 0.006493 | 61.9346
Airplane | 512 × 512 | 0.00076803 | 68.2191 | 0.012874 | 55.907
Couple | 256 × 256 | 0.0023384 | 68.2423 | 0.0099941 | 61.9742
5.3 MSE (Mean Square Error) and PSNR (Peak Signal to Noise Ratio) MSE and PSNR are utilized to find the difference between the encrypted/stego image and the original image. PSNR is inversely proportional to MSE: the lower the MSE, or the higher the PSNR, the higher the image quality, and therefore the lower the probability of detection by the human visual system. Image distortion is measured using MSE and PSNR as given in the equations below:

MSE = (1/(W·H)) Σ_{p=1}^{W} Σ_{q=1}^{H} (X_{p,q} − X′_{p,q})²    (10)

PSNR = 10 log(255² / MSE) = 10 log((2^n − 1)² / MSE)    (11)

The height and width of the cover image are represented by H and W, respectively, X_{p,q} is the pixel value of the original image at position (p, q), and X′_{p,q} is the pixel value of the stego image at position (p, q). It can be seen from Table 2 that the proposed algorithm has the best performance, as its MSE is low and its PSNR is high among the considered algorithms, including those in [14].
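Equations (10) and (11) in NumPy form, computed between a random cover image and a stego copy with a single flipped LSB; the images here are synthetic placeholders.

import numpy as np

def mse(original, stego):
    return np.mean((original.astype(np.float64) - stego.astype(np.float64)) ** 2)   # Eq. (10)

def psnr(original, stego):
    err = mse(original, stego)
    return float("inf") if err == 0 else 10 * np.log10(255.0 ** 2 / err)            # Eq. (11)

cover = np.random.randint(0, 256, (256, 256), dtype=np.uint8)
stego = cover.copy()
stego[0, 0] ^= 1                      # flip a single LSB
print(mse(cover, stego), psnr(cover, stego))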
6 Conclusion In this paper, a modern image encryption method using Arnold's cat map, Cyclic Chaos and a PRNG is proposed. The foremost requirement is to keep confidential messages within the image without creating a noticeable difference from the authentic photograph. Arnold's cat map shuffles the image, Cyclic Chaos generates the seeds, and by using these seeds the PRNG generates pseudorandom numbers by which the pixel positions are transformed again and the image is distorted; the pseudorandom sequence also decides in which plane of the cover image (R, G or B) the pixels of the hidden image are stored. Arnold's cat map, Cyclic Chaos and the PRNG are employed at different stages, which makes the proposed method hard to break at any stage
in transmission and storage. The proposed strategy is sensitive to changes in the key values that are utilized for encryption; consequently, only the precise key values can decrypt the picture correctly. The experimental results shown in Sect. 5 prove that the proposed encryption scheme has high security. This has been verified by means of evaluation parameters such as MSE, PSNR, histogram analysis and cross-correlation analysis. The experimental results show the effectiveness of the encryption process. The twofold encryption strategy gives a greater level of security. These methods can be further enhanced to make them even more secure.
References 1. Juarez P (2002) Cryptography with cycling chaos. Phys Lett A 303(5):345–351 2. Baptista MS (1998) Cryptography with chaos. Phys Lett A 240(1–2):50–54 3. Chen G, Mao Y, Chui CK (2004) A symmetric image encryption scheme based on 3D chaotic cat maps. Chaos Solitons Fractals 21:749–761 4. Kwok W, Tang KS (2007) A fast image encryption system based on chaotic maps with finite precision representation. Chaos Solitons Fractals 32:1518–1529 5. Mielikainen J (2006) LSB matching revisited. IEEE Signal Process Lett 13(5):285–287 6. Cheng J, Guo J-I (2000) A new chaotic key-based design for image encryption and decryption. In: International symposium on circuits and systems, vol 4. IEEE, pp 49–52 7. Wang X-Y, Gu S-X, Zhang Y-Q (2015) Novel image encryption algorithm based on cycle shift and chaotic system. Opt Lasers Eng 68:126–134 8. Wang XY, Zhao Y, Zhang H, Guo K (2016) A novel color image encryption scheme using alternate chaotic mapping structure. Opt Lasers Eng 82:79–86 9. Shang Z, Ren H, Zhang J (2008) A block location scrambling algorithm of digital image based on Arnold transformation. In: The 9th international conference for young computer scientists 10. Guan Z-H, Huang F, Guan W (2005) Chaos-based image encryption algorithm. Phys Lett A 153–157 11. Heng-fu Y, Yan-peng W, Zu-wei T (2008) An image encryption algorithm based on logistic chaotic maps and Arnold transform. J Hengshui Univ 40–43 12. Fu C, Bian O, Jiang H, Ge L, Ma H (2016) A new chaos-based image cipher using a hash function. In: IEEE/ACIS 15th international conference on computer and information science (ICIS) 13. Wu N-I, Hwang M-S (2017) A novel LSB data hiding scheme with the lowest distortion. Imaging Sci J 65(6):371–378. ISSN 1368-2199. https://doi.org/10.1080/13682199.2017.135 5089 14. Deng J et al (2019) LSB color image embedding steganography based on cyclic chaos. In: 2019 IEEE 5th international conference on computer and communications (ICCC). IEEE
Heartbeat Classification Using Sequential Method Rajesh Kumar Shrivastava , Simar Preet Singh , Avishek Banerjee , and Gagandeep Kaur
Abstract Electrocardiogram (ECG) is a known system to monitor heart conditions. ECG collects heart pulse, which is divided into five different categories. This paper attempts to classify ECG signals into various categories and also verifies the accuracy of ECG data with the help of various machine learning algorithms. Our experiments show that the accuracy of data in training is 98%, and at the time of testing, it is 80%. We also discussed this variation of accuracy in the result section. This paper uses ECG data and performs classification. Keywords Machine learning · Data classification · Health care
1 Introduction Motivation A heart specialist uses electrocardiogram (ECG) to monitor the heart’s health. ECG machines draw the heartbeat pattern, and experts analyze patterns manually. These ECG patterns are like time-series data. However, for the commoner, it is impossible to understand this pattern. In some cases, the expert could not predict the result accurately. If we develop a simple system that helps us understand these patterns automatically with accuracy, it makes our life easier. ECG patterns are displayed in a waveform, and a complete understanding of this waveform is required. Another problem is that ECG tests are costly and need expert monitoring. If we have an automated system that will give results in the English language, then it is easier to understand. If we come up with a cheap solution, all needy people can afford R. K. Shrivastava · S. P. Singh (B) · G. Kaur School of Computer Science Engineering and Technology (SCSET), Bennett University, Greater Noida, India e-mail: [email protected] R. K. Shrivastava e-mail: [email protected] A. Banerjee Asansol Engineering College, Asansol 713305, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_26
it and use it in their daily lives. Our objective is to test ECG signals and predict the result accurately. We can go for a cheaper testbed if we achieve higher testing accuracy. In this paper, our focus is on accuracy and classification. Kachuee et al. [1] have already shown five different categories of heartbeats. We used the same categories and data for this paper. We downloaded the data from Kaggle's website. Contribution The objective of this paper is to analyze ECG data and categorize it. We also train on the ECG signals and convert them into different usable formats. This paper tries to reduce training loss and increase categorization accuracy. The deep learning model uses a sequential modeling approach trained to detect symptoms of arrhythmia, which is identified by the timing and duration of a heartbeat. Other heart-related problems that we can also consider are heart rate, heart rhythm, heart attack, blood and oxygen supply to the heart, and heart structure changes. Our ECG classification considers the symptoms of all these problems. Organization The paper is organized as follows: Section 2 discusses the related work, which helps to give a better understanding of the work and related progress. Section 3 explains the working model and also provides detail about the data. Section 4 explains the outcome of the proposed method. Section 5 concludes the paper.
2 Literature Review
Sharma et al. [2] classified heartbeat signals using optimal orthogonal wavelet filters and achieved 96% accuracy; their method is effective for classification but leaves room for improvement in accuracy. Niu et al. [3] proposed an adversarial domain adaptation-based classification method; they used deep learning for the classification and achieved 92.3% accuracy. Luz et al. [4] reviewed ECG-based heartbeat classification for arrhythmia detection, covering conventional and earlier search methods, segmentation techniques, anomaly detection methods, and the drawbacks of existing work; the major hurdle they identify for ECG-based classification is the limited availability of data, and they discuss the algorithms used and the scope for deep learning in this area. Rafie et al. [5] discussed the importance of the ECG and its manual interpretation process, along with the accuracy and drawbacks of existing methods; this work helped us understand ECG data and pointed us toward using AI for ECG analysis. Another study applies machine learning approaches toward the social economy for the well-being of society [6]. Serhani et al. [7] explored the research challenges of data collection using IoT, cloud, and artificial intelligence (AI) and classify the primary and supporting processes in ECG classification.
Fig. 1 Pattern of heartbeats
Kashou et al. [8] conducted a comparative study between AI-based systems and the traditional system. Clinical interpretation of ECG data is broadly divided into three categories: unacceptable, acceptable, or ideal; all results are based on the accuracy of the result and its interpretation by an expert. They reported that 12.5% of the computer-generated, 7.9% of the AI-ECG, and 5.99% of the clinical ECG interpretations were categorized as unacceptable, whereas 64.0% of the computer-generated, 70.6% of the AI-ECG, and 75% of the clinical ECG interpretations were ideal. The authors also claimed that the analysis shows the AI-ECG algorithm outperformed the traditional computer-generated interpretation and is a better aid for an expert cardiologist. Table 1 gives a comparison with other researchers' work. We used ML and deep learning models and obtained 98% accuracy.
Table 1 Compare with other work
Author               Method                                                       Accuracy (%)
Niu et al. [3]       Adversarial domain adaptation-based classification method    92.3
Luz et al. [4]       Deep learning model                                          90
Serhani et al. [7]   IoT and cloud                                                89
Proposed method      ML and deep learning                                         98
Fig. 2 Heat map of confusion matrix
3 Proposed Model
3.1 Data
This paper uses two different datasets, the PhysioNet MIT-BIH Arrhythmia and the PTB Diagnostic ECG databases [1]. Figure 1 shows the heartbeat patterns available in the datasets. The database contains a wide range of heartbeat patterns, which are further classified into five different categories; these categories are defined in Table 2 (Fig. 2).
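The paper uses the Kaggle versions of these databases; a minimal loading sketch is shown below. The file name, the label layout (last column holds the class 0-4), and the 80/20 split anticipated in Sect. 3.2 are assumptions, not details disclosed by the authors.

```python
# Hypothetical loading of the Kaggle ECG heartbeat CSV (file name and layout are assumptions).
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("mitbih_train.csv", header=None)   # assumed: last column is the class label 0-4
X, y = df.iloc[:, :-1].values, df.iloc[:, -1].values
X = X[..., None]                                     # add a channel axis for 1-D convolutions

# 80/20 split, as described in the methodology
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```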
3.2 Methodology
Our algorithm divides the whole dataset in an 80:20 ratio: 80% is used for training and the remaining 20% for testing. This paper uses a sequential model with four CNN layers. The sequential model is well suited to this approach because each layer
Table 2 Categories of heartbeat [2]
Category                             Explanation
Nonectopic beat (N)                  Normal condition
Supraventricular ectopic beat (S)    Occurs if the length of the cardiac cycle changes
Ventricular ectopic beat (V)         Heartbeats traveling in a different way through the heart
Fusion beat (F)                      Fusion of S and V heartbeats
Unknown beat (Q)                     Heartbeats other than the above
Fig. 3 Accuracy achieved in testing and validation
has exactly one input and one output. The first three layers of our sequential model use the Rectified Linear Unit (ReLU) activation function, and the output layer applies a softmax activation to generate the output. ReLU is primarily used to replace negative values with 0, which helps the model avoid overfitting, and the softmax function produces the final class output. We evaluated the model with metrics such as accuracy and loss.
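The paper does not list the exact layer configuration, so the following Keras sketch only illustrates the kind of sequential model described (1-D convolutions with ReLU and a softmax output over the five heartbeat classes); the filter counts, kernel sizes, and beat length are assumptions.

```python
# Minimal sketch of the sequential CNN described above (layer sizes are assumptions).
from tensorflow.keras import layers, models

def build_heartbeat_model(n_timesteps=187, n_classes=5):
    model = models.Sequential([
        layers.Input(shape=(n_timesteps, 1)),           # one ECG beat as a 1-D signal
        layers.Conv1D(32, 5, activation="relu"),         # first layers use ReLU
        layers.Conv1D(64, 5, activation="relu"),
        layers.Conv1D(64, 5, activation="relu"),
        layers.GlobalAveragePooling1D(),
        layers.Dense(n_classes, activation="softmax"),   # softmax output over the 5 classes
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```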
4 Result
As Fig. 3 shows, the results improve as the training runs for more epochs. We achieve 98.5% training accuracy and 97.5% validation accuracy. Figure 4 shows the total loss, which is below 0.5% in training and below 1% in validation. Figure 5 compares accuracy against loss. Our experiments clearly show that the proposed algorithm reaches better accuracy with minimal losses.
Fig. 4 Loss graph
Fig. 5 Accuracy versus loss
Table 3 Confusion metrics
Category       Precision   Recall   f1-score   Support
0              0.97        0.98     0.98       7283
1              0.95        0.56     0.71       240
2              0.78        0.93     0.85       569
3              0.81        0.29     0.43       59
4              0.99        0.93     0.96       604
Accuracy       0.94        0.93     0.96       8755
Macro avg      0.90        0.74     0.78       8755
Weighted avg   0.96        0.96     0.96       8755
Our program evaluated approximately 5000 heartbeats; of these, 1000 heartbeats were not used in the training program. The datasets were initially highly imbalanced, so data processing methods were applied to balance them. Figure 2 presents the heat map of the confusion matrix of the sequential model classifier over the test set, which supports our hypothesis, and Table 3 reports the corresponding statistics. The results show that our model classifies the datasets correctly.
5 Conclusion
This paper used ECG datasets and classified them into five different categories. We used a deep learning method (sequential classification) and achieved up to 95% accuracy. As future work, researchers can derive further insights and analyze specific heart issues, improve the learning rate, and experiment with Explainable AI (XAI). Data security is another aspect where researchers can contribute.
References 1. Kachuee M, Fazeli S, Sarrafzadeh M (2018) ECG heartbeat classification: a deep transferable representation. In: 2018 IEEE international conference on healthcare informatics (ICHI). IEEE, pp 443–444 2. Sharma M, Tan RS, Acharya UR (2019) Automated heartbeat classification and detection of arrhythmia using optimal orthogonal wavelet filters. Inform Med Unlocked 16:100221 3. Niu L, Chen C, Liu H, Zhou S, Shu M (2020) A deep-learning approach to ECG classification based on adversarial domain adaptation. In: Healthcare, vol 8, no 4. Multidisciplinary Digital Publishing Institute, p 437 4. Luz EJDS, Schwartz WR, Cámara-Chávez G, Menotti D (2016) ECG-based heartbeat classification for arrhythmia detection: a survey. Comput Methods Programs Biomed 127:144–164
5. Rafie N, Kashou AH, Noseworthy PA (2021) ECG interpretation: clinical relevance, challenges, and advances. Hearts 2(4):505–513 6. Singh SP, Sharma A, Kumar R (2020) Designing of fog based FBCMI2E model using machine learning approaches for intelligent communication systems. Comput Commun 163:65–83. ISSN 0140-3664. https://doi.org/10.1016/j.comcom.2020.09.005, https://www.sciencedirect.com/science/article/pii/S0140366420319198 7. Serhani MA, El Kassabi T, Ismail H, Nujum Navaz A (2020) ECG monitoring systems: review, architecture, processes, and key challenges. Sensors 20(6):1796 8. Kashou AH, Mulpuru SK, Deshmukh AJ, Ko WY, Attia ZI, Carter RE, Noseworthy PA (2021) An artificial intelligence-enabled ECG algorithm for comprehensive ECG interpretation: can it pass the 'turing test'? Cardiovasc Digital Health J 2(3):164–170
Deep Neural Ideal Networks for Brain Tumour Image Segmentation Sadeq Thamer Hlama, Salam Abdulabbas Ghanim, Hayder Rahm Dakheel, and Shaimaa Hadi Mohammed
Abstract The automated segmentation of brain tumours using multimodal magnetic resonance imaging (MRI) is crucial for studying and monitoring disease progression. Efficient and precise segmentation methods are needed to differentiate gliomas into their intratumoural sub-regions. Deep learning algorithms outperform classical context-based computer vision techniques in tasks that require segmenting objects into categories. Convolutional neural networks (CNNs) are extensively used in medical image segmentation and have significantly improved the accuracy of brain tumour segmentation. Specifically, this research introduces a residual network (ResNet), combined with a second segmentation network through a simple combinative method, to provide better and more accurate predictions. Each model was trained on the BraTS-20 challenge data and analysed to yield segmentation results. Among the methodologies examined, ResNet produced the most accurate results compared to U-Net and was therefore chosen and organized in several ways to arrive at the final prediction on the validation set. The ensemble acquired dice scores of 0.80–0.85 across the enhancing tumour, whole tumour, and tumour core sub-regions, demonstrating better performance than the techniques currently in use. Keywords Deep learning · Residual network · Medical images · Segmentation · U-Net · CNN · Ensemble
S. T. Hlama (B) College of Science, University of Sumer, Rifai, Iraq e-mail: [email protected] S. A. Ghanim Dhi Qar Education Directorate, Nasiriyah, Iraq H. R. Dakheel University of Sumer, Rifai, Iraq e-mail: [email protected] S. H. Mohammed College of Computer Science and Information Technology, Sumer University, Rifai, Iraq
1 Introduction
The brain tumour is one of the most severe diseases and has claimed and destroyed countless lives worldwide. Once it has spread to the brain, it can affect any region of the body, and tumours have the potential to damage all brain cells. Tumours are capable of causing cancer [1], and they must be recognized and treated as soon as possible. According to reports, brain tumours and other nervous system malignancies are the second most frequent condition. The five-year survival rate indicates how many people survive for at least five years after being diagnosed with cancer; it is 36% for females and 34% for males. According to the World Health Organization, 4000 persons worldwide suffer from brain tumours, and 120,000 people died in the past year; the WHO also reports that 86,970 patients will be diagnosed with a primary cancer. There are two types of tumours: primary and secondary. Tumours that arise in the brain are known as primary brain tumours and affect no other parts of the body; they may be malignant (containing tumour cells) or benign (free of cancerous cells). Benign brain tumours grow slowly, seldom metastasize, have well-defined boundaries, and can be removed surgically if required. Malignant brain tumours grow swiftly, spread quickly to other parts of the brain, and have irregular borders; the term "brain cancer" is commonly used to describe them. Unlike malignant tumours, benign tumours do not spread to the spinal cord or other parts of the brain. More people are developing secondary brain tumours than previously thought. The signs and symptoms of a brain tumour vary according to the tumour's size, location, and growth pattern. To make matters worse, tumours obstruct the flow of the fluid circulating in the brain; nausea, vomiting, and trouble walking are among the most prevalent effects. Medical imaging is used to analyse and diagnose such disorders, and brain tumours can be detected using magnetic resonance imaging (MRI). MRI segmentation of brain tumours is one of the most critical and information-demanding processes in medical imaging. On the other hand, brain tumours can be difficult to distinguish because of their soft tissue borders: their shape is ill-defined, and their position and size are ambiguous. Precise categorization of human brain tumours has therefore been challenging [1, 2]. Doctors often grade brain tumours from I to IV depending on the microscopic anatomy of the tumour. Figure 1 depicts the incidence statistics for the various types of brain tumours. The algorithms were tested on images from the database and found promising; however, a thorough analysis of the tumour type, location, and growth pattern is necessary to classify and identify brain tumours in real time. To address the vanishing gradient issue in CNNs, we propose a deep ResNet model with 152 layers that introduces skip connections (shortcuts). This approach achieves dice similarity between 0.802 and 0.852 and provides the best accuracy in ResNet's detection of tumours from multimodal magnetic resonance imaging (MRI) images with pixel-level clarity. More accurate predictions may also be made by combining the models' probability maps [3–5].

Fig. 1 Statistics on brain tumour types
2 Literature Survey
Artificial intelligence has been cited in numerous studies as a key factor in enhancing human performance, and numerous techniques have been created to help automate difficult operations, such as fusing ubiquitous computing with machine learning to find foreign objects. Gliomas need to be examined regularly and treated according to the diagnosis. Neuroradiologists can use machine learning techniques to improve both the formulation of therapy regimens and the tracking of a patient's condition. To achieve a high level of precision in tumour delineation, the data used by these algorithms should emphasize the different structures of tumours, from their infiltrative growth patterns to their heterogeneity. The BraTS challenge provides access to MRI scans, including HGG and LGG scans of individuals from numerous institutions, which supports the development of effective glioma delineation methods [6].
3 Architectures for Deep Learning
Deep learning models outperform the more common context-based strategies in computer vision when semantic segmentation is required. Deep convolutional neural networks are widely used in clinical image segmentation and occupy a unique niche because they can achieve high precision when delineating brain tumours. A 2D U-Net architecture was created for brain tumour segmentation; to address class imbalance in the dataset, a soft dice loss was combined with a number of data augmentation techniques [7]. Mukambika and Uma Rani use this kind of pipeline to determine the subsequent stages of the illness regardless of whether the tumour manifests; several of the MRI tumour identification algorithms they examine are based on the level set method, and SVM classification follows feature extraction. Pan, Yuehaov, and Huang used brain MRI pixels to gather the information needed to classify the brain tumour, employing a convolutional neural network (CNN) to model real brain tumours. Pereira and Pinto located the tumour in the affected area, testing their approach on multiple image sets of varying sizes, locations, and intensities; images pass through a screening process to remove any distracting elements detected in them, and the authors verified that their technology could track the tumour's progression in the brain automatically. Myronenko won the BraTS 2018 competition using a CNN solution based on an encoder-decoder [8, 9].
3.1 Dataset
The brain tumour segmentation challenge (BraTS) 2020 data are used for the training and validation sets. The collection contains 138 cases, ranging from mild to severe. Nineteen contributors provided the multi-institutional dataset, which comprises multimodal MRI scans of each patient (T1, T1ce, T2, and FLAIR), from which the tumoural subregions are segmented.
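The four modalities are typically distributed as per-case NIfTI volumes; a minimal loading sketch is shown below. The directory layout and file-name pattern are assumptions for illustration, not the naming used by the authors.

```python
# Hypothetical loading of one BraTS case's four modalities (file names are assumptions).
import numpy as np
import nibabel as nib

def load_case(case_dir, case_id):
    modalities = ["flair", "t1", "t1ce", "t2"]
    volumes = [nib.load(f"{case_dir}/{case_id}_{m}.nii.gz").get_fdata() for m in modalities]
    x = np.stack(volumes, axis=-1)                                   # H x W x D x 4 multimodal input
    seg = nib.load(f"{case_dir}/{case_id}_seg.nii.gz").get_fdata()   # ground-truth tumour labels
    return x, seg
```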
3.2 Methodology
The benefit of using an ensemble for segmenting brain tumours is that it improves results and performance. We propose a lightweight ensemble of ResNet networks judiciously trained on our training sets, and the final predictions are formed from their segmentation maps (Fig. 2).

Fig. 2 Block diagram

The workflow proceeds in four stages. First, the brain tumour dataset (BraTS) is loaded: 138 MRI images are included across the FLAIR, T1, T2, and T1ce modalities, and a .csv file containing this information is used as input to forecast the tumour. Second, the algorithms are selected to prepare the dataset for training. Third, the dataset is trained using the CNN and U-Net algorithms, and a ResNet model is introduced to provide pixel-level clarity. Finally, a test image is segmented: the input picture is passed to ResNet and all of the black regions are eliminated to reveal the tumour's exact position [10].
4 Comparison Graph
At this point, comparing the two algorithms yields precise values on which to base the selection, and the tumour's location can be seen more clearly at the pixel level (Fig. 3).
4.1 Convolution Neural Networks
Convolutional neural networks are extensively used for medical image processing, and several researchers have tried to create models that diagnose tumours more precisely. We made an effort to create a model that can accurately classify tumours
Fig. 3 Multimodal image (MRI) of a single patient (HGG) in the BraTS training set (panels: FLAIR, T1, T1ce, T2)
found in 2D brain MRI data. Although a fully connected neural network can also detect the tumour, we chose a CNN for our model because of its parameter sharing and sparsity of connections. A five-layer convolutional network is presented and put into practice for tumour recognition; the combined model, made up of seven stages of hidden layers, gives us one of the clearest results for finding tumours. The proposed methodology is summarized below. The ConvNet design gradually reduces the spatial dimension of the image in order to reduce the number of parameters and the network's computation time. Because overfitting is a risk when working with brain MRI images, a max-pooling layer is the best option here: 2-D max-pooling downsamples the spatial dimensions of the input image, taking a tuple of two integers to downscale in both directions, and produces a pooled feature map. An important step after pooling is flattening, since the feature maps must be converted into a single column vector before being passed on to the rest of the network. Fully connected layers are then used: the flattened vector is the input to the dense layer of the neural network (the Dense layer in Keras). The hidden layer contains 128 nodes; this choice matters because the number of dimensions or nodes is
proportional to the amount of computing resources needed to fit the model. ReLU was chosen as the activation function in this system because of its excellent convergence. The final layer of the design, after the first dense layer, is a second fully connected layer with a single node and a sigmoid activation function; keeping this layer small conserves computing resources, since a significantly larger number of nodes would increase the execution time. Reducing the number of nodes in this part of the deep network allowed us to employ the sigmoid activation function without hampering learning in the deep network [11–13]. U-Net was originally created by Olaf Ronneberger, together with Philipp Fischer and Thomas Brox, to help with the segmentation of medical images. The design is an encoder network followed by a decoder. Semantic segmentation requires discrimination at the pixel level as well as the ability to project the discriminative features learned at different stages of the encoder. The encoder is the first element of the architecture: as in an existing classification backbone such as VGG or ResNet, convolution blocks are followed by max-pool downsampling, which converts the image input to feature descriptions at different levels of fidelity. The decoder is the second part of the challenge: it must semantically project the lower-resolution features learned by the encoder back onto the pixel space to obtain a dense categorization. Decoding starts with upsampling and concatenation, followed by standard convolutions. As can be seen in the diagram, the first half of the network uses feedforward convolution layers; the black arrows indicate convolutions followed by an activation. Each such layer adds more channels while the overall height and width remain the same, after which max-pooling reduces the image's width and height, so the activations become smaller in size but richer in channels. The expanding path then builds the output using a few convolution layers and the ReLU activation, with transpose (upsampling) layers applied where indicated by the corresponding arrows.
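The classifier stack described above (convolutions, 2-D max-pooling, flattening, a 128-node ReLU dense layer, and a single sigmoid output node) maps directly onto a standard Keras model; the sketch below is illustrative only, with the input size and filter counts assumed rather than taken from the paper.

```python
# Illustrative Keras version of the classifier described above (sizes are assumptions).
from tensorflow.keras import layers, models

def build_tumour_classifier(input_shape=(128, 128, 1)):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),   # tuple of two integers: downscale both dimensions
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),                        # squash the pooled feature maps to one vector
        layers.Dense(128, activation="relu"),    # hidden layer with 128 nodes
        layers.Dense(1, activation="sigmoid"),   # single-node output layer
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```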
4.2 Image Enhancement
To reduce CNN overfitting, data augmentation with random shearing, flipping, grey-level perturbation, and shape disturbance is applied; the grey perturbation may affect every pixel in a restricted area. To assess the accuracy of semantic segmentation, the simplest measure is the pixel accuracy (PP), which counts the number of correctly labelled pixels against the total number of pixels:

PP = \frac{\sum_{p=0}^{k} N_{pp}}{\sum_{p=0}^{k}\sum_{q=0}^{k} N_{pq}}

The mean pixel accuracy (MPA) computes the percentage of correctly segmented pixels in each class and then averages over all classes:

MPA = \frac{1}{\gamma+1}\sum_{p=0}^{\gamma}\frac{N_{pp}}{\sum_{q=0}^{\gamma} N_{pq}}

The mean intersection over union (MIoU) computes the ratio of the intersection and union of two sets, averaging the per-class pixel intersection ratios:

MIoU = \frac{1}{\gamma+1}\sum_{p=0}^{\gamma}\frac{N_{pp}}{\sum_{q=0}^{\gamma} N_{pq}+\sum_{q=0}^{\gamma} N_{qp}-N_{pp}}

MIoU has become the standard assessment index for image segmentation because of its accuracy, efficiency, and conciseness; it therefore serves as the experiment's primary success metric.
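These metrics can be computed directly from a class confusion matrix N, where N[p, q] counts pixels of class p predicted as class q. The following NumPy sketch is a sanity check of the formulas above, not code from the original paper.

```python
# Pixel accuracy (PP), mean pixel accuracy (MPA) and mean IoU from a confusion matrix.
import numpy as np

def segmentation_metrics(conf):                      # conf[p, q]: pixels of class p predicted as q
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)                               # N_pp
    pp = tp.sum() / conf.sum()                       # pixel accuracy
    mpa = np.mean(tp / conf.sum(axis=1))             # mean per-class accuracy
    union = conf.sum(axis=1) + conf.sum(axis=0) - tp
    miou = np.mean(tp / union)                       # mean intersection over union
    return pp, mpa, miou

print(segmentation_metrics([[50, 2], [3, 45]]))      # toy 2-class example
```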
4.3 ResNet
Advancements in computer vision have occurred during the previous several years. Advanced convolutional neural networks can now provide top-notch results for image classification and identification tasks such as object detection and categorization. As a result, researchers have been building increasingly complex neural networks (with more layers) to tackle more challenging problems and increase classification and identification accuracy. However, research shows that as the number of layers increases, neural networks become harder to train and accuracy can degrade; ResNet was proposed to address this, and its design is covered in this section. In backpropagation, the gradients of the "front" layers are obtained by multiplying the partial derivatives of the error function across n layers: if the network is exceedingly deep, a product of n small values vanishes (goes to zero), while a product of n large values explodes. In the reference plots, a 56-layer plain network built on top of the 20-layer baseline shows a higher error on both the training and testing curves, illustrating this degradation as the network grows deeper.

Fig. 4 Residual building block
4.4 Residual Block
The defining feature is the direct connection between the input and the output of the block; models may differ in the details. The vanishing/exploding gradient issue is addressed by a skip (shortcut) connection, which adds the input back into the output after a few layers (Fig. 4). A collection of feature maps x is used as input, and the desired mapping is denoted H(x). Instead of learning H(x) directly, the block skips two convolution layers and adds the block's input to the output of those layers, so the output becomes H(x) = F(x) + x (Fig. 5):

H[x] = F[x] + x,    F[x] = H[x] − x.
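An identity residual block can be written directly from H[x] = F[x] + x. The Keras functional-API sketch below mirrors Fig. 4; the filter count is an assumption, and the identity shortcut assumes the input already has the same number of channels as the block's output.

```python
# Identity residual block: output = ReLU(F(x) + x), with F(x) built from two conv layers.
from tensorflow.keras import layers

def residual_block(x, filters=64):
    shortcut = x                                              # the skip/shortcut connection
    y = layers.Conv2D(filters, (3, 3), padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, (3, 3), padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])                           # H(x) = F(x) + x
    return layers.Activation("relu")(y)
```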
4.5 ResNet (Residual Network) Architecture
ResNet features 152 layers, eight times deeper than VGG nets, yet it has fewer parameters. VGG-19 was the state-of-the-art method at ILSVRC 2014. The 34-layer plain network (middle) is a deeper variant of VGG-19 obtained by adding more convolution layers, and the 34-layer residual network (ResNet) is obtained from it by adding shortcut links, i.e., skip connections.
Fig. 5 Proposed architecture
5 Results and Discussion
The brain tumour segmentation method we created has been validated using these common evaluation procedures, which cover the following subareas. To assess the effectiveness of the segmentation, a team of radiologists created a ground-truth segmentation of a comparable image. A common method for comparing two images is the dice similarity coefficient (DSC); sensitivity, specificity, and accuracy measurements are also applied when comparing images. The DSC determines the degree of overlap between the predicted brain tumour segmentation and the actual image. The true positives (TP) are the tumour pixels that have been correctly identified, the false positives (FP) are the non-tumour pixels that have been incorrectly identified as tumour, and the false negatives (FN) are the tumour pixels that have been incorrectly identified as non-tumour. The effectiveness of the suggested tumour segmentation technique is evaluated by performance criteria such as sensitivity and specificity; the precision criterion, i.e., the positive predictive value (PPV), is related to these metrics and expresses how likely a predicted tumour pixel is to be correct (Table 1).
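For reference, the standard formulas behind these criteria (not written out in the text, but consistent with the TP/FP/FN definitions above, with TN denoting correctly identified non-tumour pixels) are:

\mathrm{DSC} = \frac{2\,TP}{2\,TP + FP + FN}, \qquad
\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\mathrm{Specificity} = \frac{TN}{TN + FP}, \qquad
\mathrm{Precision\ (PPV)} = \frac{TP}{TP + FP}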
5.1 Comparative Study
For brain tumour segmentation based on MRI scans, we ran our model on the BraTS 2015 HGG dataset, reported further below. The model was compared with three other segmentation methods: CNN, U-Net, and U-Net with residual blocks (Unit-Res) (Table 2) (Fig. 6). In the dice similarity graph, ResNet scores higher than U-Net, reflecting a higher degree of similarity between the predicted and the original images.
Table 1 Performance comparison of proposed methodologies: dice score, specificity, accuracy, and precision, each reported for the enhancing tumour (En), tumour core (Core), and whole tumour (Whole) subregions, for CNN (Pereira, 2016), U-Net (Ronneberger, 2015), U-Net-res (Kermi, 2019), and the proposed ResNet (2020)
Table 2 Calculating average computation time
Technique   Computation time (m)
CNN         152
U-Net       345
Unit-Res    278
ResNet      64
Fig. 6 Dice similarity graph
In Fig. 6, the x-axis reflects the number of epochs/iterations required to train both models; the U-Net score is shown in green, while the ResNet score is shown in blue.
6 Conclusion
Tumour segmentation is a vital component of treating any cancer. Segmentation using deep neural networks is quite successful, but such networks face vanishing gradient issues throughout the learning process. The residual network is proposed in this paper as a way to deal with this issue: backpropagation of the gradient in ResNet is made possible via an "identity shortcut connection" in the network's architecture. Compared to the other CNN, FCN (U-Net), and Un-Res approaches, this one has the highest accuracy and the best computation time, achieving up to a threefold reduction in computing time compared to previous systems. Low-grade gliomas may be identified using the suggested method. Changes to the model structure or system settings are required for better segmentation outcomes in the feature extraction
technique for LGG brain tumours, to increase the accuracy, precision, and reliability of MRI-based tumour segmentation.
References 1. Sauli R, Akil M, Kachori R, et al (2018) Fully automatic brain tumour segmentation using endto-end incremental deep neural networks in MRI images. Comput Methods Programs Biomed 166:39–49 2. Goetz M et al (2015) DALSA: domain adaptation for supervised learning from sparsely annotated MR images. IEEE Trans Med Imaging 35(1):184–196 3. Farahani K, Menze B, Reyes M (2014) Brats 2014 challenge manuscripts. http//www. Brain tumour segmentation. Org 4. Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828 5. Hinton GE, Osindero S, Teh Y-W (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554 6. Bengio Y, Lamblin P, Popovici D, Larochelle H (2007) Greedy layer-wise training of deep networks. Adv Neural Inf Proc Syst 153–160 7. Lee H, Ekanadham C, Ng AY (2008) Sparse deep belief net model for visual area V2. Adv Neural Inf Proc Syst 873–880 8. Wang G, Li W, Vercauteren T, Ourselin S (2019) Automatic brain tumour segmentation based on cascaded convolutional neural networks with uncertainty estimation. Front Comput Neurosci 13:56 9. Mukherjee P, Mukherjee A (2019) Advanced processing techniques and secure architecture for sensor networks in ubiquitous healthcare systems. Sens Health Monit 3–29 (Elsevier) 10. Bauer S, Wiest R, Nolte LP, Reyes M (2013) A survey of MRI-based medical image analysis for brain tumour studies 11. Leece R, Xu J, Ostrom QT, Chen Y, Kruchko C, Barnholtz-Sloan JS (2017) Global incidence of malignant brain and other central nervous system tumours by histology, 2003–2007. Neuro Oncol 19(11):1553–1564 12. Dolecek TA, Propp JM, Stroup NE, Kruchko C (2012) CBTRUS statistical report: primary brain and central nervous system tumours diagnosed in the United States in 2005—2009. Neuro Oncol 14(5):1–49 13. Louis DN et al (2016) The 2016 World Health Organization classification of tumours of the central nervous system: a summary. Acta Neuropathol 131(6):803–820
Exploring Correlation of Deep Topic Models Using Structured Topic Coherence G. S. Mahalakshmi, S. Hemadharsana, K. Srividhyasaradha, S. Sendhilkumar, and C. Sushant
Abstract Correlated topic models are built upon Latent Dirichlet Allocation and are inherently probabilistic. This paper attempts to improve deep topic models by introducing correlation measures: the existing correlated topic model is coupled with deep topic modeling to produce better topic coherence. Structured topic coherence evaluation performed over research articles of the Journal of Bio-medical Semantics endorses the performance of the deep correlated topic model over LDA, HDP, CTM, deep LDA, and deep HDP. Keywords Deep learning · Correlation · Correlated deep topic model · LDA · HDP
1 Introduction
Topic models are used for identifying the essence of underlying text data. They are widely applied in bibliometric research to obtain a better interpretation of published research articles. Research on topic models dates back to 1983 (refer Fig. 1). Though the discussion of topic model research is increasing, there are only a few key contributors apart from notable applications of topic models. Table 1 lists various topic models and their abbreviations.
G. S. Mahalakshmi (B) · S. Hemadharsana · K. Srividhyasaradha · C. Sushant Department of Computer Science and Engineering, Anna University, Chennai, Tamil Nadu, India e-mail: [email protected] S. Sendhilkumar Department of Information Science and Technology, Anna University, Chennai, Tamil Nadu, India
Fig. 1 Scientific articles discussing "Topic Models"—Google Scholar (as on 22.04.2022)
Table 1 Abbreviations of various topic models
Topic model    Topic model—abbreviations
LDA [2]        Latent dirichlet allocation
HDP [4]        Hierarchical dirichlet process
CTM [1]        Correlated topic model
DLDA [6, 8]    Deep latent dirichlet allocation
DHDP [6, 8]    Deep hierarchical dirichlet process
dCTM           Deep correlated topic model
2 Related Works
Latent semantic indexing [5], followed by Latent Dirichlet Allocation [2], were the key contributions that started topic modeling research, followed by the Hierarchical Dirichlet Process (HDP), the Hierarchical Pitman-Yor Dirichlet language model [4], and nested HDP [9], to list a few. Structural topic modeling (STM) is a notable improvement in probabilistic topic models; STM uses an expectation–maximization technique to decide upon the topic distributions within a document [11]. Deep topic models [6, 8] learn topic distributions from contextual sentences, which are picked by an auto-encoder, a recurrent neural network with backpropagation comprising three hidden layers. The neural variational document model (NVDM) [7] uses a multivariate Gaussian distribution as the prior distribution for representing latent spaces. LDA-VAE and ProdLDA are also derived from LDA using the variational auto-encoder technique [12]; here, the multivariate Gaussian distribution is replaced by a logistic normal distribution for the latent space representations. The Local Latent Dirichlet Allocation (LLDA) model extracts words from overlapping windows; therefore, unlike LDA, the topic of a word only affects the topic proportions of words that are in local proximity to the given word [10]. Earlier, Biterm topic
models [3] used word co-occurrence patterns to produce topic distributions from a given short text. Correlated topic models [1] utilize the correlation between topic proportions and tend to use logistic normal distributions, unlike LDA. Of late, word embeddings have also been attempted with correlated topic models [13]. This paper proposes the improvisation of deep topic models by exploring deep topic correlations.
3 Correlated Deep Topic Models (dCTM)
Deep topic models [6, 8] like DLDA and DHDP generate contextual topic proportions from the underlying text via deep text, i.e., deep contextual sentences. Correlated deep topic models utilize the word correlations within and across the topic proportions of the underlying deep text as given by the deep auto-encoder of the deep topic models. Figure 2 shows the proposed workflow of dCTM, and the plate diagrams for CTM and dCTM are in Fig. 3. The topics are apportioned from the sentences S, which are derived from the original set of sentences of D; in other words, the N_ds topics are contextual representations of the N_d topics. The graphical model of dCTM (refer Fig. 3b) shall be interpreted as in Eq. 1:

\prod_{i=1}^{T} P(\phi_i \mid \beta)\; \prod_{d=1}^{D_i} P\big(\theta_d \mid f_N(\mu,\Sigma)\big)\; \prod_{di=1}^{N_d} P\big(Z_{di,d}\mid\theta_d\big) \left( \prod_{di\_s=1}^{N_{ds}} P\big(z_{di\_s,di}\mid Z_{di}\big)\, P\big(W_{si,di\_s}\mid\phi_{1:T}, Z_{di,d}\big) \right)    (1)

where D_i denotes the documents, N_d the number of documents, N_ds the number of summarized document topics, \mu a k-length vector of means, \Sigma a k x k covariance matrix, \beta the topic parameter, \phi_i the topics, \theta_d the per-document topic proportions, Z_di the per-word topic assignment, Z_dis the per-word topic distribution, and W_si the perceived word. Therefore, by utilizing the contextual topic-word correlations, the proposed dCTM better learns to project the deep correlations of the words instead of exploring
Fig. 2 Deep correlated topic model—proposed work (research articles → processed texts → deep stacked sparse auto encoder → deep texts → correlated topic model → deep correlated topics)
Fig. 3 Graphical model for a correlated topic model (CTM), b correlated deep topic model (dCTM)
all-word correlations. This reduces the processing time for large text data while equally preserving the underlying discussion context.
4 Results
The research articles of the Journal of Bio-medical Semantics from the years 2013–2020 were downloaded manually and labeled as the dataset. The research articles in PDF format were converted to text format, and the spelling errors introduced during conversion were corrected. The references of the articles were removed, and any figures were eliminated in the process of conversion, so the dataset contains only the text portion of each research article. This text-only dataset, comprising 335 articles spanning the years 2013–2020 (refer Table 2), is then sent for correlated deep topic modeling. The resulting topics were evaluated with structured topic coherence [6]. We assume LDA [2], HDP [9], DLDA [6], DHDP [6], and CTM [1] as baseline topic models for comparison. Structured topic coherence, which is derived from the intrinsic structured UMass measure as defined by [6], is given as follows:

Table 2 Journal of Bio-medical Semantics dataset
Year   Total articles downloaded   Year   Total articles downloaded
2013   52                          2017   57
2014   57                          2018   24
2015   41                          2019   24
2016   66                          2020   14
Table 3 Journal of Bio-medical Semantics—average structured topic coherence (close to zero, the better)
Articles Year   LDA       HDP       DLDA      DHDP      CTM       DCTM
2013            − 0.439   − 0.296   − 0.428   − 0.245   − 0.031   − 0.027
2014            − 0.435   − 0.298   − 0.430   − 0.260   − 0.031   − 0.027
2015            − 0.460   − 0.306   − 0.452   − 0.266   − 0.031   − 0.017
2016            − 0.446   − 0.322   − 0.437   − 0.255   − 0.034   − 0.026
2017            − 0.464   − 0.324   − 0.448   − 0.265   − 0.033   − 0.029
2018            − 0.426   − 0.318   − 0.416   − 0.266   − 0.031   − 0.026
2019            − 0.426   − 0.285   − 0.411   − 0.270   − 0.029   − 0.026
2020            − 0.404   − 0.291   − 0.393   − 0.268   − 0.027   − 0.018
\mathrm{struc\_score}_{UMass}(w_i, w_j) = \log\frac{D(w_i^{s}, w_j^{s}) + 1}{D(w_i^{s})}    (2)
where D(w_i^s) is the number of documents containing the word w_i in the respective section s, D(w_i^s, w_j^s) is the number of documents in which the words w_i and w_j co-occur in the respective section s, and D is the total count of documents in the corpus. Table 3 depicts the structured topic coherence comparison across topic models. It is clearly visible that CTM and dCTM have intrinsic structured topic coherence very close to zero, indicating better coherence of the derived topics. The word distribution is fixed at 10 across all topic models, and the topic distribution is fixed at 10 for LDA, DLDA, CTM, and dCTM. Table 4 presents the top topic-words across all topic models. Figure 4 depicts the close comparison of structured topic coherence between CTM and dCTM for JBS articles of the year 2020; it is clearly visible that dCTM filters the relevant topics well within the context in a much faster manner compared to traditional CTM. The dataset and the topics obtained for all topic models are available on GitHub.
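A direct implementation of Eq. 2 over per-section document sets might look as follows. This is a sketch under the assumption that each document is represented as a dict mapping section names to sets of words; it is not the authors' code.

```python
# Structured UMass coherence (Eq. 2): co-occurrence counted within a given section s.
import math
from itertools import combinations

def struc_umass(topic_words, docs, section):
    """topic_words: top words of one topic; docs: list of {section_name: set_of_words}."""
    score = 0.0
    for wi, wj in combinations(topic_words, 2):
        d_wi = sum(1 for d in docs if wi in d.get(section, set()))
        d_wi_wj = sum(1 for d in docs
                      if wi in d.get(section, set()) and wj in d.get(section, set()))
        if d_wi:                                   # skip words absent from this section
            score += math.log((d_wi_wj + 1) / d_wi)
    return score
```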
5 Conclusion Structured topic coherence captures the accurate topic modeling performed by deep CTM. Though deep CTMs are much faster compared to regular CTMs, it is a two-step process which involves harvesting the deep contextual sentences before feeding to CTM. Further, comparison of performance of deep CTMs across other neural topic models is reserved for future work.
Table 4 Top topic-words with probabilities across topic models
Topic-word #   LDA                  HDP                  DLDA
1              Disease    0.128     Disease     0.088    Phenotypes    0.447
2              Method     0.060     Bacterial   0.077    Classified    0.224
3              Words      0.039     Disease     0.077    Arousal       0.224
4              Subclass   0.027     Terms       0.058    Signs         0.002
5              Classes    0.024     Class       0.047    Amrf          0.002
6              First      0.024     Method      0.045    Leaman        0.002
7              Method     0.021     Diseases    0.041    Islamaj       0.002
8              Classes    0.021     Ontologies  0.032    Dogan         0.002
9              Neural     0.018     Infectious  0.030    Lu            0.002
10             Syndrome   0.015     Based       0.030    Dnorm         0.002

Topic-word #   DHDP                 CTM                  dCTM
1              Et         0.354     Disease     0.104    Normalization 0.216
2              Leaman     0.178     Text        0.049    Dog           0.216
3              Lu         0.178     Ontologies  0.043    Positive      0.216
4              Occur      0.178     Use         0.029    Abnormal      0.216
5              Sleep      0.002     Learning    0.029    Et            0.002
6              Bigdata    0.002     Whatizit    0.020    Text          0.002
7              Project    0.002     System      0.015    Bigdata       0.002
8              Run        0.002     Model       0.015    Project       0.002
9              Files      0.002     Anatomical  0.015    Run           0.002
10             Keywords   0.002     Technology  0.015    Files         0.002
Fig. 4 Structured topic coherence of manuscripts—journal of bio-medical semantics-2020
References 1. Blei DM, Lafferty JD (2007) A correlated topic model of science. Ann Appl Stat 1(1):17-35 2. Blei DM (2012) Probabilistic topic models. Commun ACM 55(4):77–84 3. Cheng X, Yan X, Lan Y, Guo J (2014) Btm: Topic modeling over short texts. IEEE Trans Knowl Data Eng 26(12):2928–2941 4. Chien JT (2015) Hierarchical Pitman-Yor-Dirichlet language model. IEEE Trans Audio Speech Lang Proc 23(8):1259–1272 5. Hofmann T (1999) Probabilistic latent semantic indexing. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval, pp 14 6. Mahalakshmi GS, Hemadharsana S, Muthuselvi G, Sendhilkumar S (2018) Learning deep topics of interest. In: International conference on computational vision and bio inspired computing, Springer, Cham, pp 1517–1532 7. Miao Y, Yu L, Blunsom P (2016) Neural variational inference for text processing. In: International conference on machine learning, PMLR, pp 1727–1736 8. MuthuSelvi G, Mahalakshmi GS, Sendhilkumar S, Vijayakumar P, Zhu Y, Chang V (2018) Sustainable computing based deep learning framework for writing research manuscripts. IEEE Trans Sustain Comput. ISSN: 2377-3782 9. Paisley J, Wang C, Blei DM, Jordan MI (2015) Nested hierarchical Dirichlet processes. IEEE Trans Pattern Anal Mach Intell 37(2):256–270 10. Rahimi M, Zahedi M, Mashayekhi H (2022) A probabilistic topic model based on short distance co-occurrences. Expert Syst Appl 116518 11. Roberts ME, Stewart BM, Tingley D (2019) STM: an R package for structural topic models. J Stat Softw 91(1):1–40. https://doi.org/10.18637/jss.v091.i02 12. Srivastava A, Sutton C (2017) Autoencoding variational inference for topic models. In: 5th international conference on learning representations (ICLR), Toulon, France 13. Xun G, Li Y, Zhao WX, Gao J, Zhang A (2017) A correlated topic model using word embeddings, In IJCAI, pp 4207–4213
Value-Added Tax Fraud Detection and Anomaly Feature Selection Using Sectorial Autoencoders Nasser A. Alsadhan
Abstract The intentional alteration of a tax return form with the intent to reduce one’s tax base is known as tax fraud. Underreporting, which entails filing a tax return with a lower tax base by either raising purchases or lowering sales, is one of the most widespread types of tax fraud. Such an action weakens government spending by reducing government revenues. Since there are not enough resources or auditors to handle the situation, tax authorities must come up with low-cost solutions. Therefore, one of their top priorities should be identifying tax fraud. The vast bulk of research on tax fraud detection is based on supervised machine learning techniques that make use of the findings of tax return audits. Unfortunately, access to audited and labeled tax returns is quite restricted because it is an expensive and time-consuming process. This places severe restrictions on supervised machine learning techniques. The work in this paper focuses on finding solutions to these constraints. We specifically outline our method for finding anomalies in tax returns using stacked autoencoders (SAEs), along with a probability distribution of the suspicious values for each field on the tax return form. By comparing the outcomes of our method with two existing anomaly techniques that have been utilized in the literature, we show how well our model can identify current tax fraud schemes.
1 Introduction
Tax fraud is a widespread issue that affects governments' revenue, which has a direct impact on the country's economy. The purposeful alteration of a tax return form with the intention of lowering one's tax base is known as tax fraud. According to recent estimates, tax fraud costs the government 500 billion dollars annually [1, 2]. To assure long-term revenue for the future, it is crucial for governments to create effective tactics for detecting tax fraud and to take steps to mitigate its effects. They need to construct efficient models that can detect tax fraud in order to efficiently verify the compliance of the taxpayers.
N. A. Alsadhan (B) College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia e-mail: [email protected]
As shown by Castellón in [3], the two main strategies used by tax authorities to combat tax fraud are rule-based systems and auditors' prior experience. The first technique entails picking tax declarations at random and determining fraud using the auditor's prior auditing experience and domain knowledge. The second makes use of approaches developed for rule-based systems, which can require a protracted process. Data mining techniques, which offer ways to extract and develop information from a sizable amount of data to support the discovery of fraudulent conduct and improve the use of tax authorities' resources, are a more contemporary method of detecting tax fraud [3–5]. The majority of studies in the literature use annotated data along with supervised learning approaches [6]. Because verifying tax statements is a time-consuming and expensive process, access to labeled data is severely constrained, which limits the use of supervised techniques for finding tax fraud. Unsupervised techniques, on the other hand, can harness the full potential of the data as they do not require any labels. Studies that have used unsupervised techniques in the literature are few, which motivates the work of this paper: an anomaly detection approach that labels each tax return with an anomaly score and gives each field of the tax return form an anomaly percentage, which helps the auditor examine questionable fields. To achieve this, we use stacked autoencoders, which have proven their success in multiple domains [7–10].
2 Literature Review
Using data mining, the literature on detecting VAT fraud can be divided into supervised and unsupervised methods; the choice of strategy depends on the availability of tax return labels. Labeled data, which is typically scarce in tax administration, is therefore a limitation of supervised approaches. An overview of the data mining techniques used by tax administrations around the world is provided by Castellón et al. in [3]. Although supervised classification techniques have proven beneficial in many fraud detection applications, in the case of VAT fraud detection the labeled data make up only a small portion of the population. Because the labeled population is not generally representative, a supervised model will exhibit sample selection bias. Additionally, because auditing takes time, the fraud techniques discovered may already be out of date as new fraud strategies emerge. Therefore, since the classifiers can only distinguish between known fraud patterns [11, 12], auditing based on supervised models may prove inappropriate in the long run due to the dynamic nature of fraud methods. In a tax administration setting, unsupervised models, such as anomaly detection (AD) approaches, which identify outliers that exhibit traits deviating noticeably from the majority of the population, are preferable. Instead of relying on a small, frequently biased labeled sample, AD approaches can make use of all the available data. They make it possible to identify novel fraud tactics that vary from accept-
able norms of behavior. AD approaches have demonstrated their usefulness for a number of fraud types, including customs fraud [13], credit card fraud [14], urban demarcation tax fraud [15], and healthcare fraud [16]. In order to detect outliers among regular data, AD approaches rely on assumptions, so the success of any AD strategy depends on how well-founded these assumptions are for the data at hand. AD approaches typically require two fundamental premises to work well [12, 17]. The first premise is that there are significantly fewer anomalies than typical occurrences; this premise is met in the scenario of tax evasion [4]. The second premise is that anomalies naturally exhibit behavior, in how their features present themselves, that distinguishes them from typical occurrences; although certain fraud tactics might try to hide their behavior and imitate legitimate conduct, the second premise appears to be at least somewhat supported. AD methods have rarely been used in the domain of VAT fraud detection [6]. One of the few studies on tax fraud using AD methods is the work of Vanhoeyveld [6], which uses Belgian VAT declarations as its data; the approach relies on sector profiling as a first step, followed by two AD techniques: fixed-width anomaly detection and the local outlier factor. Another study, by Assylbekov et al. [18], utilizes a statistical outlier identification technique that surpasses the Kazakhstani tax authority's supervised model.
3 Proposed Approach
3.1 Stacked Autoencoders
In our approach, we focus on using stacked autoencoders (SAEs) as an anomaly detection technique [19]. An SAE can be used to map the input features into a smaller set of features and back to a reconstructed value of the input features, as shown in Fig. 1. The mapping is not 100% accurate, as there is a loss of information when the number of features is reduced. We measure this loss by calculating the mean square error (MSE) between the inputs and the reconstructed inputs and use the MSE value to determine anomaly scores for the tax return forms. The idea is that if a feature's value is common across tax returns, the SAE learns to represent that value during the feature reduction, so the MSE for that feature will be low; when a value is rare, the SAE weights cannot map it as accurately as a common value, creating a higher MSE. We use this information in two ways: first, we sum the MSE scores over all features to obtain an anomaly score; second, we use the MSE score of each feature as a percentage of the total anomaly score to indicate which fields the auditor should focus on when auditing the tax return. The higher the MSE score for a certain feature, the more suspicious that feature's value is.
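A minimal sketch of this scoring scheme is shown below: train an autoencoder on the tax-return features, then use the per-feature squared reconstruction error both summed (anomaly score) and normalized (per-field suspicion percentage). The layer sizes and code dimension are assumptions; the paper does not disclose its configuration.

```python
# Sketch: autoencoder reconstruction error as an anomaly score (sizes are assumptions).
import numpy as np
from tensorflow.keras import layers, models

def build_sae(n_features, code_dim=8):
    inp = layers.Input(shape=(n_features,))
    h = layers.Dense(32, activation="relu")(inp)          # encoder
    code = layers.Dense(code_dim, activation="relu")(h)   # compressed representation
    h = layers.Dense(32, activation="relu")(code)         # decoder
    out = layers.Dense(n_features, activation="linear")(h)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")
    return model

def anomaly_scores(model, X):
    err = (X - model.predict(X)) ** 2                     # per-feature squared error
    total = err.sum(axis=1)                               # anomaly score per tax return
    share = err / total[:, None]                          # per-field contribution (percentage)
    return total, share
```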
Fig. 1 An SAE network with an encoder that learns the weights that best capture the relationship between the input (bottom layer) and the hidden layer h , and a decoder that decodes the hidden layer h to a reconstructed input (top layer)
3.2 Sectorial SAEs
It is crucial to take the sector and business size of the taxpayer into consideration in the above analysis: if we compare a small business with a very large one, the values of the features we are considering will have very different ranges, which has a significant impact on how we calculate the anomaly score. We therefore create a separate SAE model for each sector and business size, which ensures that the values of each feature are roughly in the same range. We can measure the anomaly score in two ways. The first is by comparing a taxpayer's tax return to their other tax returns, which measures whether any of the taxpayer's returns differs from the rest. The second is by comparing the taxpayer's tax return with the tax returns of their sector/business size in the same time period. This is achieved by calculating two modified z-score values for each field in the tax return. Unlike the traditional z-score, the modified z-score takes anomalies into consideration when calculating the score: it uses the median instead of the mean and the median absolute deviation instead of the standard deviation. This ensures that anomalies do not distort the scores of the population, while anomalies still receive a score that is far away from zero. Equation 1 gives the modified z-score formula. The first value is derived by calculating the modified z-score over all of the taxpayer's tax return fields; if one of the fields differs significantly from the rest, it will have a modified z-score further away from zero, and the SAE will have a higher error for this value since it is multiple median absolute deviations away
from the median. The second value is derived by calculating a modified z-score value between the taxpayer tax return and the tax returns of their sector/business size in the same time period. This is done for each of the taxpayer tax returns. In addition, this value will indicate if the taxpayer differs from their sector/business size, and in the same way, the SAE will have a higher error for this value if it is considered an anomaly.

y_i = \frac{x_i - \tilde{X}}{\mathrm{MAD}}; \quad \tilde{X} = \mathrm{median}(X), \quad \mathrm{MAD} = \mathrm{median}\big(|x_i - \tilde{X}|\big)    (1)
where X is a tax return feature, x_i is its value for tax return i, and y_i is the modified z-score value for x_i. The time complexity for calculating the score is O(n + m), where n is the number of taxpayers and m is the number of distinct time periods in each sector/business size. Using this approach gives us an anomaly score that captures whether a tax return is an anomaly when compared to the tax returns of the same taxpayer and when compared to the tax returns of the same sector/business size. In addition, we calculate the percentage of the MSE score contributed by each feature, which gives the auditor additional information when they audit the taxpayer: they know whether the value of a certain field is an anomaly compared to the taxpayer's own tax returns or compared to the same sector/business size. It is important to note that not every anomaly is considered risky. For example, if a taxpayer's tax return has a high anomaly score for the sales field, the anomaly could be on the positive side, meaning this tax return has a high sales value compared to the population; from the tax authority's point of view, this is not a risky case (higher sales produce more tax). On the other hand, a high anomaly score for the purchase field, where the value is higher than the population, is considered risky (higher purchases produce less tax). Therefore, for sales the risk lies in a modified z-score below zero, and for purchases the risk lies in a modified z-score above zero. For each field we therefore determine the risk direction (scores below or above zero): if the risk lies in scores below zero, all scores above zero are set to zero, and if the risk lies in scores above zero, all scores below zero are set to zero. This additional step ensures that any anomaly case produced by our model is considered risky for the tax authority.
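The two steps above (Eq. 1 plus the risk-direction clipping) can be written compactly as follows; this is an illustrative sketch, not the authors' implementation, and the function names are hypothetical.

```python
# Modified z-score (Eq. 1) plus the risk-direction step described above.
import numpy as np

def modified_zscore(x):
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return (x - med) / mad if mad else np.zeros_like(x)   # guard against MAD = 0

def keep_risky(scores, risk_below_zero):
    """Zero out scores on the non-risky side (e.g. sales: risk below zero; purchases: above)."""
    s = np.asarray(scores, dtype=float)
    return np.where(s < 0, s, 0.0) if risk_below_zero else np.where(s > 0, s, 0.0)
```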
4 Experiment and Results
4.1 Data
We use the Saudi VAT tax returns as our dataset, which includes 2m tax returns (120k audited tax returns). The dataset has 42 features which consist of tax return fields and financial ratios, as well as the business sector/size of the taxpayer (26 sectors/5
sizes). It is important to note that, due to confidentiality, we are severely limited in the amount of detail we can share about the features that we derive from the tax return form and the results obtained from them, as well as the code used. However, this does not affect the effectiveness of our approach; our model can be applied to any set of features. Based on the discussion in Sect. 3.2, for the tax returns in each sector/business size we calculate the modified z-score for the tax returns of each taxpayer, and the modified z-score for the tax returns of the same time period for all taxpayers in the same sector/business size. The result is 84 features: 42 features that capture the differences between the taxpayer's returns, and 42 features that capture the differences between the taxpayer and their sector/business size tax returns in the same time period.
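As an illustration of how a sectorial SAE could score such a feature matrix, the following is a minimal Keras sketch, assuming one 84-column matrix per sector/business size group; the layer sizes, optimizer and training settings are assumptions and not the configuration used in the paper.

```python
import numpy as np
import tensorflow as tf

def build_sae(n_features: int) -> tf.keras.Model:
    """Small stacked autoencoder; layer sizes are illustrative only."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(n_features, activation="linear"),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def anomaly_scores(features: np.ndarray):
    """Train one SAE on a sector/business size group and score its own returns."""
    sae = build_sae(features.shape[1])
    sae.fit(features, features, epochs=50, batch_size=64, verbose=0)
    reconstruction = sae.predict(features, verbose=0)
    per_feature_error = (features - reconstruction) ** 2
    mse = per_feature_error.mean(axis=1)  # anomaly score per tax return
    contribution = per_feature_error / (per_feature_error.sum(axis=1, keepdims=True) + 1e-12)
    return mse, contribution              # contribution = share of MSE per feature
```

The per-feature contribution corresponds to the percentage of the MSE score in each feature that is reported to the auditor.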
4.2 Results and Discussion
For each sector/business size, we obtain an anomaly score for each tax return. We noticed that the anomaly scores drop sharply around the top 3–4% across all sectors/business sizes. Based on that, we chose our cut-off for an anomaly to be at the top 4%. In addition, we noticed that our anomaly model captured 10% of the cases that have already been audited. This indicates that 10% of current auditing strategies actually capture anomalies. Based on that intersection, we are able to assess the strike rate for those tax returns (~11,000 cases) at 73% using the data from all sectors/business sizes (due to confidentiality we cannot disclose ZATCA's strike rate). Furthermore, we compare our results with two other techniques from the literature, isolation forests [20] and local outlier factor [21]. The results are reported in Table 1. Since each technique outputs a different set of anomalies, which contain audited and non-audited tax returns, and since we are constrained by the historically labeled data available to measure precision on those anomalies, the strike rate reported for each technique is based on a different audited population. The results vary between different sectors/business sizes, with the lowest match for our model being 42% and the highest match being 100%. It is important to note that these results are based on only a small sample of what we marked as an anomaly, and therefore, they are not reflective of the overall performance of our model. In order to have a representative strike rate for each sector/business size group, we only considered sector/business size groups that have 50 or more audited cases from the anomaly output. Besides detecting anomalies, our results were able to give us insights about each sector/business size. The first thing we noticed is that some sectors have a higher strike rate of 90–100% (based on the 10% audited sample). This indicates that some sectors/business sizes are more homogeneous in terms of their financial performance, making the identification of fraud through anomaly detection easier. This was further confirmed by examining in which features the anomalies happened. This examination helped us to detect a behavior in one of the tax return fields which led to a higher probability of fraud in that sector/business size. Other sectors/business
Table 1 Results obtained using the proposed model and state-of-the-art approaches on the ZATCA dataset

Group/algorithm                Proposed model (11k cases) (%)   Isolation forest [20] (12k cases) (%)   Local outlier factor [21] (9k cases) (%)
Historical data                73                               59.6                                    66
Lowest sector/business size    42                               25                                      32
Highest sector/business size   100                              100                                     100
Test set (200 cases)           100                              NA                                      NA
sizes had a lower strike rate (42–65%); this led us to two conclusions. The first is that the population is more heterogeneous, and therefore, anomalies are less likely to be considered fraud. The second, which emerged after further investigation, is that the sector is misclassified for some taxpayers. Based on the results above, some limitations were found in our proposed approach. The model assumes that the sector/business size under observation is homogeneous. Moreover, what is considered a suspicious tax return is different for each sector. Thus, using the model alone might not yield good results, as a case might be considered an anomaly by our model, but that anomaly is not considered suspicious behavior after further examination by a domain expert. We were able to test our model by sending 200 cases from a homogeneous sector with a high suspicion of fraud in one of two fields. Our strike rate was 100%, with the fraud happening in these fields. It is important to note that the 100% strike rate was possible because we were able to detect a fraudulent behavior in one of the sectors that guarantees a change in the VAT amount, and that such a strike rate is not the norm. Based on these results, it is important to take into consideration the characteristics of each sector/business size when choosing cases to be audited. The results obtained here are used as part of a bigger decision process that determines whether a tax return needs to be audited or not. Our results strongly suggest that studying the behavior of anomalies in homogeneous sectors/business sizes leads to discovering new tax evasion strategies deployed by taxpayers.
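For reference, the two baselines compared in Table 1 can be run with scikit-learn as in the minimal sketch below; the 4% contamination level mirrors the top-4% cut-off described above, while the remaining settings are defaults and therefore assumptions rather than the exact experimental setup.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

def baseline_anomalies(features: np.ndarray, contamination: float = 0.04):
    """Flag anomalies with the two baseline detectors (fit_predict returns -1 for anomalies)."""
    iso_flags = IsolationForest(contamination=contamination, random_state=0).fit_predict(features)
    lof_flags = LocalOutlierFactor(n_neighbors=20, contamination=contamination).fit_predict(features)
    return iso_flags == -1, lof_flags == -1
```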
5 Conclusion In this paper, we presented an anomaly detection technique that allows tax authorities to prioritize in a data-driven fashion the tax returns to be audited without requiring historically labeled data. In addition, we mark which fields are more suspicious in each tax return. This approach also leads to discovering new tax evasion strategies deployed by taxpayers.
Some limitations exist in our proposed approach. The model assumes that the sector/business size under observation is homogeneous. Moreover, what is considered a suspicious tax return is different for each group. Therefore, using the model alone might not yield good results, as a case might be considered an anomaly by our model, but that anomaly is not considered suspicious behavior after further examination by a domain expert. Extensions of our work include adding additional financial ratios and growth-related features, as well as determining the homogeneity of each sector and the features in it based on the opinion of tax experts, and incorporating that information when we calculate the anomaly score. Furthermore, a supervised auditing strategy on its own is insufficient because it is limited to detecting known fraud strategies. After a certain period of time, such a model may become outdated. However, on the known set of fraud strategies, supervised techniques consistently outperform unsupervised methods. As a result, future work should focus on auditing strategies that combine supervised and unsupervised detection methods, as well as other criteria. This strategy allows for the weighting of each detection method, resulting in an optimal auditing strategy in the short and long term.
References 1. Cobham et al (2018) Global distribution of revenue loss from corporate tax avoidance: reestimation and country results. J Int Dev 30(2):206–232 2. Crivelli et al (2015) Base erosion, profit shifting and developing countries. International Monetary Fund 3. González et al (2013) Characterization and detection of taxpayers with false invoices using data mining techniques. Expert Syst Appl 40(5):1427–1436 4. Dias et al (2016) Signaling tax evasion, financial ratios and cluster analysis. BIS Q Rev 5. Dastgir et al (2016) Using data mining techniques to enhance tax evasion detection performance. Iran Natl Tax Admin (INTA) 23(28) 6. Vanhoeyveld et al (2020) Value-added tax fraud detection with scalable anomaly detection techniques. Appl Soft Comput 86:105895 7. Alsadhan et al (2015) Comparing SVD and SDAE for analysis of Islamist forum postings. In: 2015 IEEE international conference on data mining workshop (ICDMW). IEEE, pp 948–953 8. Lin et al (2013) Spectral-spatial classification of hyperspectral image using autoencoders. In: 2013 9th international conference on information, communications & signal processing. IEEE, pp 1–5 9. Choi et al (2019) Unsupervised learning approach for network intrusion detection system using autoencoders. J Supercomput 75(9):5597–5621 10. Ferreira et al (2020) Recommendation system using autoencoders. Appl Sci 10(16):5510 11. Bolton et al (2002) Statistical fraud detection: a review. Stat Sci 235–249 12. Eskin et al (2002) A geometric framework for unsupervised anomaly detection. In: Applications of data mining in computer security. Springer, pp 77–101 13. Rad et al (2015) A novel unsupervised classification method for customs fraud detection. Indian J Sci Technol (8):35 14. Bolton et al (2001) Unsupervised profiling methods for fraud detection. Credit scoring and credit control VII, pp 235–255
15. de Roux et al (2018) Tax fraud detection for under-reporting declarations using an unsupervised machine learning approach. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, pp 215–222 16. Tang et al (2011) Unsupervised fraud detection in Medicare Australia. In: Proceedings of the ninth Australasian data mining conference, vol 121, pp 103–110 17. Chandola et al (2009) Anomaly detection: a survey. ACM Comput Surv (CSUR) 41(3):1–58 18. Assylbekov et al (2016) Detecting value-added tax evasion by business entities of Kazakhstan. In: International conference on intelligent decision technologies. Springer, pp 37–49 19. Hinton et al (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507 20. Xu et al (2017) An improved data anomaly detection method based on isolation forest. In: 2017 10th international symposium on computational intelligence and design (ISCID). IEEE, vol 2, pp 287–291 21. Cheng et al (2019) Outlier detection using isolation forest and local outlier factor. In: Proceedings of the conference on research in adaptive and convergent systems, pp 161–168
Hybrid Intrusion Detection System Using Machine Learning Algorithm N. Maheswaran, S. Bose, G. Logeswari, and T. Anitha
Abstract The random forest (RF) algorithm is utilized to develop intrusion patterns from training data in the hybrid intrusion detection system (HIDS). With the deployment of an outlier detection mechanism, intrusions are detected as anomalies. A hybrid detection system combines the advantages of both anomaly and misuse detection so as to enhance the detection performance. Various ML-based NIDS have been proposed earlier to safeguard the user from malicious online attacks. In this background, the focus of the current study is upon performance assessment of hybrid approaches such as K-means clustering with an RF classifier and Gaussian mixture clustering with an RF classifier for intrusion detection. The proposed framework was evaluated for its performance with the help of NSL-KDD, an intrusion detection dataset. As per the study outcomes, there was a significant reduction achieved by the proposed model in terms of feature set size (20%) and required training sample size (up to 80%). In this study, the researchers proposed a novel hybrid ML-based NIDS framework to mitigate the computational complexity without compromising the performance of intrusion detection. Among all kinds of attacks, the suggested system demonstrated excellent performance with high accuracy and a low false positive rate. Keywords Network intrusion detection system · Hybrid intrusion detection system · Machine learning algorithm
N. Maheswaran (B) · S. Bose · G. Logeswari · T. Anitha Department of Computer Science and Engineering, College of Engineering Guindy, Anna University, Chennai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_30
1 Introduction
In recent years, the growth experienced in Information and Communication Technologies (ICT) has been enormous in volume while, in parallel, cyber-attacks too have increased at a phenomenal rate. Preventing cyber-attacks on computers and ensuring data privacy and data security is a primary challenge for both organizations and individuals. Various investigations have been conducted on the development of techniques and approaches to safeguard the system and data from attacks. It is possible to provide data protection by blocking data access, capturing data and preventing data transfer or data corruption by unauthorized persons. An intrusion detection system (IDS) evaluates network traffic data on computer networks and checks for any malicious activities. In case of any malicious activities, it quickly alerts the user. In recent years, cyber-attacks have evolved and increased upon fully integrated servers, communication networks and applications due to the high prevalence of IoT. For a certain period of time, the attacks on the IoT network remain unnoticed or unobserved. This scenario reduces the effectiveness of the devices and harms the end users. Further, it incurs heavy costs, impacts revenue generation and increases cyber-threats and identity misuse. In order to ensure safety and security in the IoT network, interface assaults must be periodically verified on a real-time basis. In this scenario, the current research work introduces a smart intrusion detection system that is applicable for IoT-based attacks. To be specific, a deep learning algorithm is deployed in this study to identify malicious traffic on the IoT network. In this system, the identity solution makes sure that the operations are secure while simultaneously promoting the inter-operational activities of IoT connectivity protocols. The intrusion detection system (IDS) is the most popular network security technique deployed to safeguard the network. The study results infer that the proposed architecture has the ability to detect and identify the real global intruders. The application of neural networks in cyber-attack detection is found to be exceptional. Further, user-focused cyber-security solutions should be given high priority since there exists a need to collate, process and analyze big data for future-ready network connections such as 5G [1]. The auto-encoder model provided excellent performance in terms of reduced detection time and enhanced detection performance with 99.76% accuracy. The intrusion detection system classifies the usual and unusual network traffic and also tries to protect the system from being attacked [2]. The inconsistencies in the system are identified by anomaly detection systems which assess the compatibility of the system under normal traffic. On the other hand, misuse detection systems deal with established attack symptoms. But misuse detection systems are vulnerable to attacks as per the literature [3–6] since they show effectiveness against established attacks only. These systems achieve a low error rate and successfully detect the inconsistencies if the attack is known. Network traffic is constantly monitored by anomaly detection systems, which in turn help in developing a profile of normal traffic so as to identify unknown attacks. An attack can be described as unusual traffic observed
in a network. IDSs make use of different ML methods in which the computer is trained on network traffic data. On encountering any unusual situation, the system is designed to alert the user [7]. The remainder of the paper is organized as follows. An overview of the experiments done on the NSL-KDD dataset, feature selection (FS) on ML algorithms and the data used in the system is given in Sect. 2. The proposed hybrid layer IDS model is detailed in Sect. 3 along with data pre-processing and FS techniques. Further, the evaluation criteria are explained in Sect. 4 in addition to performance analysis.
2 Related Works
The studies conducted earlier on IDS design are discussed in this section. The section further covers information on feature selection techniques. From the assessment of the studies conducted earlier, it can be understood that the systems have been developed using different machine learning techniques together. The most widely used datasets in experiments with the developed systems were KDD-CUP99 and NSL-KDD, as per the literature. As per the review of literature, it has been found that most of the studies did not include feature selection techniques. There is a drastic increase observed in computer technology and networking these days, thanks to the increasing penetration of ICT. On the other hand, attacks on networks have also increased since the online mode has become the preferred one for a number of tasks these days. DoS attacks pose a huge threat to the stability of the network [5]. DDoS is one of the established and deadly attacks that has the potential to collapse a computer network. In this attack type, the attacker attacks the server using DDoS and interrupts the traffic on the network. There exist subclasses in this attack type, which include distributed denial of service too. To identify and counter DDoS attacks, numerous research studies have been carried out, and machine learning and deep learning approaches play a significant part in this research [8]. For secure and reliable communication and data transmission in today's world of ever-increasing communication needs, we want comprehensive and efficient network systems with equally efficient and trustworthy security measures integrated [5]. In the mixed approach proposed by Puela and St. John's, the authors integrated the best aspects from various FS techniques. This study determined six characteristics for the NSL-KDD dataset, and the outcomes were contrasted with the help of different classification methods [9]. The hybrid FS algorithm, proposed by Sethuramalingam and Naganathan [10], was tested on the NSL-KDD dataset after combining information gain and a genetic algorithm. In this study, a layered as well as a hybrid IDS was proposed with the help of ML algorithms to compensate for the shortcomings found earlier. The aim of the proposed system is to forecast the types of attacks using two distinguishable feature selection methods. High accuracy rate and false positive
rate were set as thresholds for the best-performing algorithm among the ML techniques, according to the type of attack, and the resultant algorithm was used in the system design.
3 NSL-KDD Dataset
Network infrastructures experience a lot of attacks on a daily basis. The target of such attacks is to disrupt the integrity, availability and confidentiality of the network. One such attack type is distributed denial of service (DDoS), which tends to create a negative impact on the availability of the network. This attack is executed through a command and control (C&C) mechanism. Various methods have been proposed earlier by researchers to overcome these attacks using machine learning techniques. The current research paper uses the WEKA tool to detect DDoS attacks using a machine learning technique, while the attack was made by following the ping of death technique. In this study, the NSL-KDD dataset was used to evaluate the model, and the random forest technique was used to distinguish between attacks and normal samples. The suggested model correctly categorized the attacks with 99.76% accuracy [2]. A better IDS is necessary for a secure system's data connection process without sacrificing performance metrics. Monitoring node movements and data transfer events that occur in the system for potential intrusions is called intrusion detection [1]. The KDD-CUP99 dataset is one of the prominent datasets used for assessing the effectiveness of testing systems in terms of detecting network traffic inconsistency. There are many types of attacks recorded in this dataset. But, as per the literature, the dataset contains some adverse cases which tend to degrade the performance of the proposed system whenever it is used for validation. In order to overcome this challenge, some entries in the KDD-CUP99 dataset are removed and the result is used as a new dataset, NSL-KDD [8], to validate the proposed system. From NSL-KDD, it is essential to remove the unwanted samples so that the dataset can be used for training purposes. Further, the dataset size is also fixed at a reasonable value to detect the anomalies. For the current study, the researchers used 20% of the NSL-KDD dataset, with training attributes described by attribute name, type of data, description, and min and max values. Denial of Service Attack (DoS) In this type of attack, the network traffic gets drastically increased, due to which the system becomes unavailable for rendering the demanded service (Apache 2, Land, Worm, Neptune, Pat, Smurf, Teardrop, Back and Understory). User to Root Attack (U2R) In this type of attack, a normal user account is misused to gain access to the root account (buffer overflow).
Remote to Local Attack (R2L) In this type of attack, the machine is delivered with a packet in the network to find the vulnerabilities so as to gain access to the user account (send mail, write password, guess named). Probes Attack In this type of attack, the network is scanned for misuse so as to collect information about the vulnerabilities in the network (Satan, Ipsweep, Nmap, Portsweep, Mscan, Saint).
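To illustrate how these four categories are typically derived from the raw NSL-KDD labels, a minimal sketch follows; the column name and the subset of label spellings are assumptions based on common NSL-KDD distributions, not an exhaustive list taken from the paper.

```python
import pandas as pd

# Partial label-to-category map (assumed spellings; extend for the full label set).
ATTACK_CATEGORY = {
    "normal": "Normal",
    "neptune": "DoS", "smurf": "DoS", "teardrop": "DoS", "back": "DoS", "land": "DoS",
    "buffer_overflow": "U2R", "rootkit": "U2R",
    "guess_passwd": "R2L", "warezclient": "R2L", "ftp_write": "R2L",
    "satan": "Probe", "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe",
}

def add_attack_category(df: pd.DataFrame, label_col: str = "label") -> pd.DataFrame:
    """Add a coarse attack-category column next to the raw NSL-KDD label."""
    df = df.copy()
    df["attack_category"] = df[label_col].map(ATTACK_CATEGORY).fillna("Unknown")
    return df
```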
4 Proposed System The overall system architecture of the proposed hybrid intrusion detection system (HIDS) is shown in Fig. 1. In this system, a combination of multiple ML techniques and FS techniques is used to achieve high-performing intrusion detection irrespective of the type of attacks. At first, the proposed system preprocesses the data retrieved from NSL-KDD dataset. Afterward, multiple feature selection algorithms are used to reduce the size of the dataset.
4.1 Data Preprocessing
Unwanted observations, including duplicate or inappropriate observations, are removed from the NSL-KDD dataset. Duplicate observations often occur during data collection. Removing them makes the analysis more efficient, reduces distractions from the primary goal and creates a more manageable and more efficient dataset. Imbalanced
Fig. 1 Hybrid intrusion detection system
class involves the development of predictive models in classification datasets with severe class inequality. The challenge in working with an unbalanced dataset is that most machine learning techniques tend to ignore the minority class and perform poorly on it, although the performance on the minority class is very important. The SMOTE technique is therefore used to obtain a balanced dataset and avoid class inequality in the dataset.
Algorithm 1: Synthetic Minority Oversampling Technique
Begin
• A minority class set A is chosen; for each x ∈ A, the k nearest neighbors are determined by finding the Euclidean distance between x and every other sample in A.
• The sample rate N is fixed on the basis of the imbalance proportion. For every x, N neighbors are randomly selected and the set A1 is constructed.
• For every example xk in A1, a synthetic sample is generated as shown in Eq. 1:
x′ = x + random(0, 1) ∗ |x − xk| (1)
End
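In practice this balancing step can be performed with the SMOTE implementation in imbalanced-learn, as in the short sketch below; the variable names are illustrative and the default parameters are an assumption rather than the paper's exact settings.

```python
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

def balance_dataset(X: np.ndarray, y: np.ndarray):
    """Oversample minority classes so every class has roughly equal support."""
    smote = SMOTE(k_neighbors=5, random_state=0)
    X_balanced, y_balanced = smote.fit_resample(X, y)
    print("before:", Counter(y), "after:", Counter(y_balanced))
    return X_balanced, y_balanced
```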
4.2 Feature Selection
In this phase, feature selection is performed for the detection of intrusions with the help of the attribute ratio on the NSL-KDD dataset. This approach is followed for nominal variables too, since they are encoded as binary variables for attack selection using the feature selection algorithm, as shown in Figs. 2 and 3. Null values can occur here since binary features can have Frequency (0) = 0. Such null values are replaced with 1000.0 (a magic number). For the NSL-KDD dataset, this applies only to the 'protocol type tcp' variable. The vector assembler is used to combine a list of given columns into a single vector column. The vector indexer is then used to index the (binary) features. Indexing features allows algorithms to handle them properly and improves the performance.
Algorithm 2: Vector Assembler
Begin
• When the instances fall under a similar class, the leaf is also labeled under the same class.
• The potential information is computed for every attribute, whereas the information gain is obtained from a test on the attribute.
• At the end, the optimum attribute is chosen based on the current selection parameter.
End
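The vector assembler and vector indexer described here correspond to the feature transformers available in Spark ML; a minimal sketch under that assumption is shown below, with an illustrative file path and column names.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler, VectorIndexer

spark = SparkSession.builder.appName("nslkdd-features").getOrCreate()
df = spark.read.csv("nslkdd_preprocessed.csv", header=True, inferSchema=True)  # illustrative path

# Combine the numeric/binary columns into a single feature vector.
feature_cols = [c for c in df.columns if c != "label"]
assembler = VectorAssembler(inputCols=feature_cols, outputCol="raw_features")
assembled = assembler.transform(df)

# Index low-cardinality (e.g. binary) features so learners can treat them as categorical.
indexer = VectorIndexer(inputCol="raw_features", outputCol="features", maxCategories=2)
indexed = indexer.fit(assembled).transform(assembled)
```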
Fig. 2 Feature selection based on attack
Fig. 3 Four different type of attack
4.3 Feature Classification
The primary idea is to group the data into clusters, after which the clusters are trained using different random forest classifiers. Since the RF classifier returns class probabilities, the chances of improving the detection rate for new attack types also increase by making changes in the threshold values. During the clustering process, only the numeric features are utilized since K-means does not have the provision to manage binary features. Clusters are categorized into two types, of which the first category has clusters with more than 25 normal as well as attack connections. RF classifiers are applied to the first category.
On the other hand, the second category includes all other clusters, which are mapped to either the attack or the normal type on the basis of majority. The whole set of clusters with such connections is treated as 'outliers' and is mapped to the attack type. This is because the test data might be from different distributions and is expected to contain unexpected attack types. So, a probability threshold is set at 0.01 for attack connections (0.99 for normal connections). With this approach, a 98–99% detection rate is achieved with a false alarm rate of approximately 14–15%. A parallel mechanism is also used, in which the data is clustered through a Gaussian mixture model and every cluster is trained using a different RF classifier. The results achieved from both approaches are integrated together so as to enhance the performance.
Algorithm 3: K-means + Gaussian Mixture Clustering Algorithm
Begin
• All the data elements are grouped under a cluster numbered between 1 and k, where k denotes the count of desired clusters.
• The center of each cluster is located.
• For every data element, the nearest cluster center is located.
• The cluster centers are recalculated after the allocation of new elements.
• The 3rd and 4th steps are repeated until the clusters no longer change or a certain number of iterations is reached.
End
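A minimal scikit-learn sketch of the cluster-then-classify idea is shown below; the 25-connection cut-off and the 0.01 probability threshold follow the description above, while the number of clusters, model settings, variable names and the 0/1 label encoding are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

def train_cluster_rf(X_num: np.ndarray, y: np.ndarray, n_clusters: int = 8):
    """Cluster on numeric features, then train one RF per sufficiently mixed cluster.
    y: 0 = normal, 1 = attack (assumed integer encoding)."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_num)
    models, majority = {}, {}
    for c in range(n_clusters):
        mask = kmeans.labels_ == c
        yc = y[mask]
        # First category: clusters with more than 25 connections and both classes present.
        if mask.sum() > 25 and len(np.unique(yc)) > 1:
            models[c] = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_num[mask], yc)
        else:
            # Second category: map the whole cluster to its majority label (attack if empty).
            majority[c] = int(np.bincount(yc).argmax()) if len(yc) else 1
    return kmeans, models, majority

def predict(kmeans, models, majority, X_num, attack_threshold: float = 0.01):
    clusters = kmeans.predict(X_num)
    preds = np.ones(len(X_num), dtype=int)  # unseen clusters default to attack
    for c in np.unique(clusters):
        mask = clusters == c
        if c in models:
            p_attack = models[c].predict_proba(X_num[mask])[:, 1]
            preds[mask] = (p_attack >= attack_threshold).astype(int)  # low threshold favours detection
        elif c in majority:
            preds[mask] = majority[c]
    return preds
```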
5 Adaptive Ensemble Model
Ensemble approaches are utilized to enhance the detection rate. Figure 4 shows the best detection rate, i.e., 99.5–99.6%, with a false alarm rate of up to 16.1–16.6%. Still, a total of 12,833 attack connections, which were also unknown before, remain unrecognized.
Algorithm 4: Adaptive Ensembler
Begin
• Predictions are made by several models.
• The prediction of each model for a sample is considered as a 'vote'.
• The predicted class labels from the different models are integrated.
• The label with the highest vote count is used as the final estimate.
End
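A compact way to realise this voting step is scikit-learn's VotingClassifier, sketched below under the assumption that the models are combined by majority (hard) voting; the base estimators shown are placeholders, not the exact members used in the paper.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

def build_voting_ensemble():
    """Majority-vote ensemble; each member's prediction counts as one vote."""
    members = [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ]
    return VotingClassifier(estimators=members, voting="hard")

# Usage: ensemble = build_voting_ensemble(); ensemble.fit(X_train, y_train); ensemble.predict(X_test)
```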
Fig. 4 Comparison of other system with proposed system
6 Evaluation of the Performance Test Results
6.1 Data Preprocessing
The data preprocessing step, which converts the dataset from an unbalanced to a balanced form using the synthetic minority oversampling technique, is shown in Fig. 5.
Fig. 5 Before and after applying SMOTE technique
Fig. 6 Principal component analysis for changes of basis on data
6.2 Feature Selection
Random forest creates several decision trees (DT) and integrates the DTs together to achieve highly accurate and consistent forecasting. Many trees are built on different subgroups of a given dataset, and their average is taken to improve the predictive accuracy on that dataset, using PCA as shown in Fig. 6.
6.3 Feature Classification
The K-means algorithm attempts to divide a set of data into K pre-defined, distinct, non-interconnected subgroups (clusters), in which every data point belongs to only one group, as portrayed in Fig. 7. This section first deals with the assessment criteria utilized for performance measurement. In order to validate the proposed system for its performance, the authors conducted a performance evaluation and the outcomes were compared and contrasted against the literature. In Fig. 8, the results for Gaussian mixture clustering are shown.
6.4 Adaptive Ensemble Model
An ensemble model is not a single classifier but a set of classifiers, known as a group of classifiers, whose predictions are incorporated for classification using certain types of voting, as shown in Fig. 9.
7 The Evaluation Criteria
The commonly used assessment criteria are discussed in this section. As per the literature, these criteria are used in the evaluation of IDS effectiveness and for the performance evaluation of the proposed system.
Fig. 7 K-means data classification
Fig. 8 Gaussian mixture clustering attack and normal
Fig. 9 Adaptive ensemble model
TP (True Positive): The number of samples in an intrusion class of the dataset that are accurately predicted as that intrusion class. TN (True Negative): The count of samples in the normal class of the dataset that are appropriately predicted as the normal class. FN (False Negative): The number of samples in an intrusion class of the dataset that are inappropriately predicted as the normal class. FP (False Positive): The count of samples in the normal class of the dataset that are incorrectly predicted as an intrusion class. The following is an explanation of the rating criteria calculated using these values after obtaining the confusion matrix explained in Fig. 10:
Accuracy: Accuracy is calculated as the ratio of the count of correctly classified samples to the whole count of samples. Equation 2 shows how to calculate Accuracy.
Accuracy = (TP + TN)/(TP + TN + FP + FN) (2)
Detection Rate (DR): The DR is calculated from the true positive value as the ratio of correctly detected intrusions to the total number of samples predicted as intrusions. Equation 3 shows how to calculate DR.
DR = TP/(TP + FP) (3)
Fig. 10 Comparison of TN, FN, TP and FP
True Positive Rate (TPR): TPR is calculated as the ratio of correctly identified samples to the total number of samples in a particular class. Equation 4 shows how to calculate TPR.
TPR = TP/(TP + FN) (4)
False Positive Rate (FPR): FPR is the ratio of the number of samples incorrectly assigned to a class to the total number of samples that do not belong to that class. Equation 5 shows how to calculate FPR.
FPR = FP/(TN + FP) (5)
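The four criteria above can be computed directly from a confusion matrix; the short sketch below follows Eqs. 2–5 as stated in the text and assumes a binary labelling with 1 for intrusion and 0 for normal.

```python
from sklearn.metrics import confusion_matrix

def evaluation_criteria(y_true, y_pred):
    """Accuracy, DR, TPR and FPR as defined in Eqs. 2-5 (1 = intrusion, 0 = normal)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # Eq. 2
        "detection_rate": tp / (tp + fp),             # Eq. 3
        "tpr": tp / (tp + fn),                        # Eq. 4
        "fpr": fp / (tn + fp),                        # Eq. 5
    }
```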
8 Conclusion and Evaluation Results
In the current study, based on the type of attack, the researchers developed a hybrid layer intrusion detection system with the aid of several machine learning approaches. The current study makes a valuable contribution to the research community, i.e., a system has been developed to detect attacks with high performance rates on most of the performance criteria. Further, it also achieved a lower error rate than the studies
conducted earlier. Figure 10 shows the counts of TN, TP, FP and FN on the dataset for the proposed feature selection method. From the dataset used, the proposed system achieved successful results. When the study results were compared with those of the earlier studies, the proposed system was found to be superior to previous systems in terms of attack detection for all the types of attacks.
References 1. Yadav N, Pande S, Khamparia A, Gupta D (2022) Intrusion detection system on IoT with 5G network using deep learning. Internet Things Multimedia Commun Syst. https://doi.org/10. 1155/2022/9304689 2. Pande S, Khamparia A, Gupta D, Thanh DNH (2020) DDOS Detection using machine learning technique. In: Studies in computational intelligence. Springer 3. Injadat M, Moubayed A, Nassif AB, Shami A (2021) Multi-stage optimized machine learning framework for network intrusion detection. IEEE Trans Netw Serv Manag 18:1803–1816 4. Govindaraj L, Sundan B, Thangasamy A (2021) An intrusion detection and prevention system for DDoS attacks using a 2-player bayesian game theoretic approach. In: 4th International conference on computing and communications technologies, pp 319–324 5. Avalappampatty Sivasamy A, Sundan B (2015) A dynamic intrusion detection system based on multivariate Hotelling’s T2 statistics approach for network environments. Sci World J 6. Poongodi M, Bose S (2015) A novel intrusion detection system based on trust evaluation to defend against DDoS attack in MANET. Arab J Sci Eng 40(12) 7. Mladenic D, Grobelnik M (1999) Feature selection for unbalanced class distribution and naive bayes. ICML 99:258–326 8. Ulemale T (2022) Review on detection of DDOS attack using machine learning. Int J Res Appl Sci Eng Technol 10(3) 9. Online KDD-NSL Dataset (2009). http://nsl.cs.unb.ca/NSL-KDD/. Accessed Jul 2018 10. Online The KDD CUP 1999 Data (1999). http://kdd.ics.uci.edu/databases/kddcup99/kdd cup99.html. Accessed Jul 2018
A Comprehensive Review of CNN-Based Sign Language Translation System Seema
and Priti Singla
Abstract One of the most crucial tools for connecting with others is communication. Effective communication skills can smooth our path and improve our interactions with people in our daily lives by allowing us to understand and be understood by others. Many deaf and mute people rely on sign languages as their primary mode of communication. Recent research in sign language translation systems (SLTS) has yielded impressive results. The aim of the paper is to study the existing translation mechanism of sign language. The review starts with the classification of sign language systems and contemplates country-wise sign languages, different data sets used for the development of the sign language translation system, the architecture of convolution neural network (CNN)-based models, and their performances. It is intended that this study will serve as a road map for future research and knowledge development in the field of sign language recognition as well as translation system in the field of CNN. Keywords Sign language recognition · Convolution neural network · Sign language translation
1 Introduction Human beings use communication to send messages, and communicate their feelings, ideas, and facts to other people. The deaf society finds it challenging to fully comprehend oral communications. Sign language came into existence to aid the growth of deaf society [1]. Simple tasks such as accessing mobile, reading newspapers, etc. are quite challenging for deaf-mute people. However, if the material is displayed to Seema (B) · P. Singla Department of Computer Science and Engineering, Baba Mastnath University, Rohtak, Haryana, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_31
them in sign language, they will be able to comprehend and acquire knowledge. Sign language has been developed and matured organically in deaf and mute people like any other spoken language. The syntax and semantics of sign language vary from one area to another, with one common attribute that it is illustrated. Today, there are no well-defined standards for sign language (SL), and thus to communicate with people across the globe, one has to learn the native sign language. In this regard, to support the deaf-mute people, several nations have come up with applications that translate sign language to their regional language and vice versa. Sign language processing is an emerging research area that encompasses pattern matching, speech recognition, computer vision, image processing, natural language processing, and semantics. As per Fig. 1, we can classify sign language systems (SLS) into the following categories: Sign Language Validation—These types of systems take input in the form of sign language gestures and validates gestures in the output. Sign Language Generation (SLG)—These types of systems take text as input and generate corresponding sign language gestures. The gestures can be one-handed or two-handed. These gestures have the following variations. Static Gestures—It deals with gestures that are fixed in nature after being recorded such as stored images.
Fig. 1 Types of sign language system
Manual Features—Various attributes related to the hand (shape, orientation, location, movement, etc.) are considered while generating gestures for sign language. It is used in the static as well as the dynamic approach of sign language interpretation. Non-manual Features—Gestures other than the hands are used for communication, such as movements of the eyebrows, cheeks, shoulders and mouth, body posture, and facial expressions. It is used in the static as well as the dynamic approach of sign language interpretation. Dynamic Gestures—It deals with a real-world or continuously changing sequence of sign language gestures, such as the sequence of images in recorded videos or real-world videos. Symmetric sign—It is also known as a Type 0 sign. It uses both hands for the generation of signs, either concurrently or such that features like the shape, movement, and location of the hand are the same for both. Asymmetric Sign—It is also known as a Type 1 sign. It uses both hands for generating a sign, but the leading hand plays the key role in the gesture and the passive hand acts as a helper. Sign Language Translation (SLT)—These are also known as sign language recognition (SLR) systems. These types of systems take input as sign language gestures in their respective language and translate them into text. The text can be of the form alphabet, alphanumeric, word, sentence, or hybrid (combination of these). Sign language gestures can be made with a single hand (one-handed) or by both hands (two-handed). These gestures are represented through images or videos. An SLR system consists of the following steps. Input data of sign language gestures is collected through data acquisition. Data can be acquired through already available open access data set repositories or it can be self-made (SM). In an SM data set, various input devices such as a camera, web camera, mobile phone, and Kinect are used to acquire speech/images/videos for the data set [2]. Data preprocessing is an important step to improve the quality of the data set by resizing, standardising, amplification, etc. Segmentation is done for easier processing of input data and includes conversion of the colour model, background change, edge detection, etc. The process of translating raw data into numerical features that can be processed while keeping the information in the original data set is known as feature extraction. Feature extraction is the most crucial step in an SLR system and includes principal component analysis (PCA) [3], leap motion [4], scale invariant feature technique (SIFT) [5], linear discriminant analysis (LDA) [6], etc. Hidden Markov model (HMM) [7], artificial neural network (ANN), k-nearest neighbour (kNN) [8], support vector machine (SVM) [9], convolution neural network (CNN), transfer learning [10], recurrent neural network (RNN), etc. are used as classifiers (Fig. 2).
2 Background
The foundation of CNN was laid by Yann LeCun in 1988. CNN began to acquire popularity in 2012 as a result of AlexNet [11]. It is an intelligent and successful model
Fig. 2 Steps of sign language recognition [3]
for automatic image processing by extraction of spatial features. The goal of CNN is to use relatively high convolutions to acquire the attributes present in the input image. It is effective in recognising different people, medical images, faces, street signs, and other visual data elements. The architecture of CNN is explained through Fig. 3. If a neural network contains only one completely connected layer, it is said to be a shallow CNN, while convolution layers, pooling layers, and fully connected layers constitute a deep CNN [12, 13]. The convolution layer, pooling layer, and fully connected layer are the three basic layers of any standard CNN. The number of layers may vary depending on the application for which it is used. In addition to this, several other layers are used which we will discuss one by one. Convolution layer—It is based on the standard convolution operation of mathematics and helps in feature extraction from the given input, as shown in Fig. 4. It is the core building block of CNN; the importance of this layer can be judged from the fact that the name of CNN itself is derived from convolution. The input to the convolution layer can be an image or a video. Videos are treated as a sequence of images. The input image can be a coloured/greyscale/black and white image. Any image of the form 32 × 32 × 3 represents width 32, height 32, and 3 colour channels Red Green Blue (RGB). The kernel or filter is a matrix that traverses across the input data, performs a dot product with a sub-region of the input data, and outputs a matrix of dot products. The stride value moves the kernel on the input data. If the stride value is 1, the kernel advances the input matrix by one column of pixels. In a nutshell, the kernel is used to extract high-level features from an image, such as edges [14–16]. Zero padding is a simple process of padding the border of the input and is an effective method to give further control over the dimensionality of the output volumes. The output size of a convolution can be calculated using the following formula: (V − R + 2Z)/S + 1 [17], where V is the input volume (height * width * depth), R is the receptive field size, Z is the amount of zero padding set, and S is the stride. Pooling
Fig. 3 Architecture of convolution neural network
Fig. 4 Convolution operation
Layer—Pooling layers are used to reduce the dimensions of the feature maps. Thus, they reduce the number of parameters to learn and the amount of computation performed in the network. The features contained in a region of the feature map generated by a convolution layer are summed up by the pooling layer. As a result, subsequent operations are conducted on summarised features rather than on the precisely positioned features created by the convolution layer. This makes the model more resistant to changes in the position of features in the input image. There are many variations of pooling used in CNN, such as max pooling, average pooling, min pooling, stochastic pooling, max pooling dropout, and S3 pooling; some of these pooling techniques are depicted in Fig. 5. Fully Connected Layer—This layer receives a vector of numbers as input where each input is connected to every output, hence the name fully connected. It is the last layer in the CNN process, generally after the last pooling layer. The output of this layer is the N classes for image classification in vector form. It contains the maximum number of parameters for image classification. Since information flows from input to output, it is also called a feed-forward neural network [18].
Fig. 5 Different types of pooling [18]
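To tie the three basic layers together, here is a minimal Keras sketch of a convolution–pooling–fully connected stack with ReLU and softmax activations; the layer sizes and the 32 × 32 × 3 input follow the running example above, while the number of classes is an arbitrary placeholder rather than a value from any reviewed model.

```python
import tensorflow as tf

NUM_CLASSES = 26  # placeholder, e.g. one class per alphabet gesture

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),                 # 32 x 32 RGB input
    tf.keras.layers.Conv2D(16, kernel_size=3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),                # halves the feature-map size
    tf.keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),                             # dropout regularisation, as in most reviewed models
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"), # outer softmax layer
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```

With kernel size 3, one pixel of zero padding ("same") and stride 1, each convolution preserves the spatial size, consistent with the output-size formula (V − R + 2Z)/S + 1 given above.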
3 Methodology
This literature review is conducted by following the PRISMA guidelines [19], as shown in Fig. 6. A systematic study of 120 research articles from various open access databases such as PubMed, Google Scholar, ACM, Science Direct, ArXiv, IEEE Xplore, and ResearchGate has been performed for this review paper. Papers published up to March 2022 were included. A structured set of keywords was used, such as sign language recognition, sign language translation, convolution neural network, transfer learning, and deep learning. Since this strategy generated thousands of irrelevant results, the search was further limited by removing duplicates. In the screening stage, papers were scrutinised on the basis of the year of publication, title, and abstract of the research paper. For the eligibility criteria, non-English, out-of-scope papers and case studies were removed from the existing results.
4 Results and Discussion We have reviewed 25 papers on the basis of four criteria represented through Tables 1, 2, 3, and 4. 14 Journal and 11 conference research articles were selected for this paper. Table 1 summarises literature review of sign language translation models (SLTM) based on CNN. It includes parameters such as whether the paper is from conference (C)/Journal (J), architectural details of CNN such as number of convolu-
Fig. 6 Flow chart of study selection according to PRISMA guidelines
tion layers (C.L), number of filters at input level, input size of image/video, pooling technique used, normalisation, activation function at inner and outer fully connected layer, regularisation and optimisation. Podder et al. [11] developed real-time alphabet and numeric classification model for Bangla sign language (BdSL) using three pretrained CNN models in transfer learning. Two sets of self-made publicly available data set were developed to evaluate the performance of the system. Ankita and Parteek [20] developed isolated sign language recognition model for alphabets, digits, and words using convolutional neural network (CNN) based on deep learning. This model uses two convolution layer, max pooling, ReLU activation function, dropout regularisation and various optimisers (SGD, RMSProp, Adam), and two fully connected layers. The performance is measured with precision, recall, and F1score on coloured and greyscale images. Razieh et al. [21] implemented a Persian sign language deep learning-based pipeline architecture for efficient automatic hand sign language recognition using single shot detector (SSD), 2D convolutional neural network (2DCNN) for 3D hand key points hand skeleton projection, hand patches, and heatmap, 3D convolutional neural network (3DCNN) for feature extraction, and long short-term memory (LSTM) from RGB input videos. Li et al. [22] developed real-time recognition of signer independent SLR model using tactical CNN and bidirectional long short-term memory (LSTM). Sensors were used to input gestures in the Myotac data set of 37,500 words available in public domain. The system attained an accuracy of 92.67. Malhotra and Bajaj [23] designed gesture detection model in ASL based on hand track points. The input to the system is hand gestures converted into track point coordinates using MediaPipe framework in Python. The coordinates are processed using
C/J
J
J
J
J
C
J
J
C
J
C
C
J
C
J
C
C
C
C
C
C
J
J
J
Refs.
[11]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]
[28]
[29]
[30]
[31]
[32]
[33]
[34]
[6]
[35]
[36]
[37]
[38]
[39]
[40]
3CL
2CL
5CL
6CL
6CL
6CL
3CL
3CL
10CL
4CL
6CL
2CL
27CL
8CL
3CL
2FC
3CL
7
6CL
25
9
Filters
16
64
32
32
32
32
64
32
32
32
32
64
16
16 (3 × 3)
Transfer learning
C.L
Table 1 SLTM based on CNN architecture
Stochastic Max Max Max Max Max Average Max Max Stochastic Max Max Max Max, GAP Stochastic Max Max
256 × 256 128 × 128 64 × 64 1080 × 1920 64 × 64 100 × 78 224 × 224 28 × 28 28 × 28 16 × 16 200 × 200 200 × 100 64 × 64 64 × 64 256 × 256 64 × 64 224 × 224
Max
17 × 80 Max
Average
224 × 224
227 × 227
Max
Global
331 × 331 128 × 128
Pooling
Input size
Batch
Batch
Batch
LCN
Batch
Batch
ZCN
Batch
Normalisation
ReLU
ReLU
ReLU
ReLU
ReLU
ReLU
ReLU
ReLU
ReLU
ReLU
ReLU
LeakyReLU
ReLU
ReLU
ReLU
ReLU
ReLU
ReLU
ReLU
ReLU
ReLU
Softmax
Softmax
Softmax
Softmax
Softmax
Softmax
Softmax
Softmax
Softmax
Softmax
Softmax
Softmax
Softmax
Softmax
Softmax
BLSTM
Softmax
Softmax
Outer
Activation function Inner
Dropout
Dropout
Dropout
Dropout
Dropout
Dropout
Dropout
Dropout
Dropout
Dropout
Dropout
Dropout
Dropout
Dropout
Dropout
Regularisation
ADAM
NAG
SGD
AdaDELTA
ADAM
ADAM
Adam gradient descent
SGD
Adam
DiffGrad
SGD with momentum
Adam
AdaGrad
SGD
ADAM
Optimisation
2020
2022
2022
2022
2022
2022
2020
2011
2022
2019
2022
2019
2020
2018
2019
2019
2015
2019
2020
2021
2021
[23]
[24]
[25]
[26]
[27]
[28]
[41]
[30]
[31]
[22]
[32]
[33]
[34]
[6]
[35]
[36]
[37]
[38]
[39]
[40]
A
N
H
A+W
W
A
W
W
A
A
W
A
W
A
W
A
A
A
A
A
H(23A + 10N + 67W)
H
2022
[20]
H
2022
[11]
A/N/AN/W/S/H
Year
Refs.
Table 2 SLTM based on data set
N
N
N
Y
Y
N
Y
N
Y
Y
Y
N
N
Y
N
Y
N
N
N
N
N
Y
Y
Yes/No
30
10
30
29
20
24
50
200
24
24
30
26
7
24
20
26
26
26
29
24
100
87
87
Class
10
21
44
27
7
20
5
5
38
5
3
4
20
20
Total persons
8300
20,000
1320
87,000
6600
2500
10,000
60,000
27,455
47,445
37,500
52,000
7700
95,697
6000
104,000
6500
62,400
261,000
2880
35,000
2300
132,061
Samples
Akash
CLAP14
Signet
MNIST
MNIST
MyoTac
Mudra
BdSLHD2300
BdSL-D1500
Data set name
I I V I I V I I I, V I, V
28 × 28 128 × 128 200 × 200 200 × 100 64 × 64 200 × 200 256 × 256 64 × 64 640 × 480
I
I
V
I
I
I
V
I
I
I
I
Image/Video
28 × 28
Sensor
224 × 224
100 × 78
1080 × 1920
Resolution
30
30
60
40
50
Fps
Table 3 SLTM based on countries

S. No.   Country    References
1        Indian     [6, 20, 25, 28, 31, 34–36]
2        Persian    [21]
3        Tactical   [22]
4        American   [23, 26, 27, 29, 32, 33, 37, 42]
5        Turkish    [24]
6        Thai       [30]
7        Bangla     [11]
8        Chinese    [38]
9        Bhutan     [39]
10       Spanish    [40]
various machine learning algorithms k-nearest neighbours, random forests, and a neural network on unprocessed data. Yirtici and Yurtkan [24] designed regional CNN model for recognition of Turkish sign language at alphabet level with transfer learning implemented through AlexNet. It uses max pooling, dropout layer, ReLU, SoftMax layers along with 2D convolution layers. Nandi et al. [25] implemented alphabet level static Indian sign language recognition technique based on fingerspelling method using stochastic pooling, batch normalisation, dropout, SoftMax, ReLU activation function, DiffGrad optimiser of convolution neural network. The model is compared with existing pooling (Max, Average), optimisers (SGD, RMSProp, Adam), architecture (Inception V3, ResNet 18, ResNet 50), and accuracy of the proposed architecture is measured by training and validation accuracy. Kumar et al. [26] developed alphabet level fingerspelling-based American sign language recognition using ReLU activation function, max pooling, dropout regularisation, SoftMax layer. The performance of proposed model is evaluated and compared using cross entropy loss function optimiser (SGD, Adam). Kasapba¸si et al. [27] developed real-time application for alphabet-based recognition of ASL using proposed 3 convolution layered CNN model. Variable conditions such as lighting and distance were taken into consideration while creating data set. Jayadeep et al. [28] designed Mudra, i.e. visionbased translation tool for recognition of dynamic ISL in the field of bank using CNN inception V3 and recurrent neural network (RNN)-based LSTM. The input data set consists of 1100 videos categorised into bank and everyday signs. Rajan and Selvi Rajendran [29] compared nine optimisers on finger spelling-based American sign language data set with and without augmentation using ASLNET model of CNN. Gedkhaw [30] designed word level SLR model for Thai sign language using CNN with Nvidia development kit with an accuracy of 0.9914 and loss value of 0.03537. Self-made data set was prepared for 7 words and a total of 7700 images. Intwala et al. [31] developed real-time Indian sign language translation model using transfer learning. MobileNet CNN was used as pretrained CNN model and self-made alphabet level data set was create with a total of 52,000 samples. The system achieved
A Comprehensive Review of CNN-Based Sign Language Translation . . . Table 4 SLTM based on performance Refs. Static/Dynamic Technique [11]
[20]
Static
[21]
Dynamic
[22]
Static
[23] [24]
Static Static
[25]
Static
[26] [27] [28]
Static Static Dynamic
[29]
Transfer learning with 3 pretrained CNN (ResNet18, MobileNet_V2, EfficientNet_B1) CNN
Epoch
357
Performance 99.9
20
Pipeline 20,000 architecture (SSD + 2D CNN + 3D CNN + LSTM + Heatmap) Tactical CNN + BidirectionalLSTM Neural network 28 Region based 10 CNN CNN 10 26 100
Static
CNN CNN CNN (Inception V3) + LSTM CNN
[30] [31]
Static Static
CNN with kit Transfer learning
50
[32] [33] [34] [6]
Static Static Video Static
3301 50 100 50
[35] [36] [37]
Static Video Static
CNN CNN CNN CNN with hybrid SIFT implementation CNN CNN with ANN CNN
[38] [39] [40]
Static Real time Static
CNN CNN CNN
30
50 30 34 50
T.A 99.17, V.A 98.8 99.8
92.67
90.95 99.7P T.A 99.7, V.A 99.64 98 99.38 85 With aug = 98.733, Without aug = 97.174 0.9914 87.69 (Real-time accuracy) 96 (T.A) 99.7 97.62 92.78
98.64 95.68 Alphabet (99.9), Digit (99.2) 89.32 97.62 96.42
87.69 accuracy in real time and 96 in testing accuracy. Ahuja et al. [32] designed real-time s alphabet level American sign language recognition model for open access data set of MNIST to achieve an accuracy of 99.7. It only covers 24 static alphabets in the data set with total samples of 47,445. Mehedi Hasan et al. [33] proposed deep CNN-based alphabet level recognition of ASL with accuracy of 97.62. It uses standard open access MNIST data set for training and testing of the model. A selfie-based continuous ISL recognition model was developed by Anantha Rao et al. [34] using CNN with recognition rate of 92.88. Self-made data set of 200 words was developed to train and test the system. Dudhal et al. [6] designed ISL recognition system with hybrid SIFT as feature extractor in CNN classifier. AdaDelta optimiser was used in this model to achieve the accuracy of 92.78. Also, word level data set of 10,000 samples were made with 50 classes and 20 signers. Ahuja et al. [32] introduced signer independent ISL translation model for 24 alphabets using open access data set (Signet) and stochastic gradient descent optimiser (SGD) in CNN-based model. The model achieved validation accuracy of 98.64. Pigou et al. [36] developed sign language recognition model for open access Italian sign language data set with local contrast normalisation (LCN) in hybrid model of CNN and artificial neural network (ANN). The open access CLAP14 data set consists of 6600 videos of 27 signers and 20 classes of word. The system achieved final score of 0.788804. Moklesur Rahman et al. [37] developed ASL recognition system for alphanumeric data taken from publicly available data sets to evaluate recognition accuracy of 99.99 8 layered CNN-based Chinese SLR model was developed by Jiang et al. [38], in which gingerspelling approach-based data set is created with 1320 images of 30 classes. The system attained accuracy of 89.32. Wangchuk et al. [39] proposed CNN-based Bhutanese SLR system in real mode with precision, recall, accuracy, F1-score of 98. The system works only for numeric data. Self-made data set of 10 classes was created with 20,000 samples in image and video mode. Martinez-Martin and Morillas-Espejo [40] developed model for translation of Spanish sign language using CNN and recurrent neural network (RNN). Alphabet level self-made data set of 30 classes with 10 signers and 8300 images was used to train and test the model. The system achieved accuracy of 96.42. Table 2 briefs SLTM based on data set. It includes parameters such as year of publication, recognition level of data set A (Alphabet)/N (Numeric)/AN (Alphanumeric)/W (Word)/S (Sentence)/H (Hybrid), whether it is publicly available Yes/No, number of classes, number of persons used for data set creation, total number of samples in data set, name of data set (if any), resolution of image/video in data set, type of data set Image/Video, frames per second (fps) in case of videos. Table 3 summarises SLTM based on countries. We have covered sign languages of 10 countries while focussing on Indian sign language (ISL) as mostly research articles are taken from ISL. Table 4 summarises SLTM based on performance. It includes technique, input modality (static/dynamic), number of epoch used, and performance as parameters. It has been observed that majority of work has been done in static recognition of sign language gestures using CNN. 
All of the surveyed articles use Softmax as the output layer for classification, and most of them use the Rectified Linear Unit (ReLU) activation. The key benefit of employing the ReLU function over other activation functions is that it does not activate all of the neurons simultaneously: only non-negative activations are passed through, while negative activations are inhibited to zero, although this can cause some neurons to stop learning. Another very popular pooling technique used in most articles is max pooling, which has recently been outperformed by stochastic pooling. Considerable research has also been done on fine-tuning the models using optimisers, and the DiffGrad optimiser has shown better performance. A lot of data sets have been created in different countries, but there is a need for standard benchmark open-access data sets, since the data sets available publicly have certain constraints such as background, lighting, distance, number of persons involved, number of classes and total number of samples.
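As a generic illustration of this common pattern (not any specific surveyed model), a minimal Keras-style CNN for static sign classification with ReLU activations, max pooling and a Softmax output layer might look as follows; the input size, filter counts and class count are assumptions.

```python
# Illustrative sketch only: a small CNN with ReLU, max pooling and a Softmax
# output layer, the pattern common to most of the surveyed SLR models.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 24          # e.g. static alphabet signs; assumed value
model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),            # assumed grayscale input size
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),                      # max pooling, as in most articles
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),  # Softmax output layer
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```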
5 Conclusion and Future Scope A comprehensive analysis of sign language in general and Indian sign language in particular has been done in this article. The review has been done on well-defined criteria including accuracy, data set, country, technique, and architecture of CNN. The state-of-the-art PRISMA approach has been used for the review. It has been observed that CNN is the most popular choice for static and alphabet-level sign language recognition and translation across the world, with some variation in the number of convolution layers, filters, and input size, while transfer learning and RNN are mostly used in dynamic sign language gesture recognition and translation. Further, open issues in the field of SLTS include bridging the gap between the existing lexicon of sign language and natural language, efficiently translating word- and sentence-level gestures, the lack of a standard open-access data set for SLTS integrating manual and non-manual features, and the need for a universal sign language and a two-way communication model for deaf-mute people, all of which provide areas of interest for future researchers.
References 1. Sharma S, Singh S (2021) Recognition of Indian sign language (ISL) using deep learning model. Wirel Pers Commun 123:671–692. https://doi.org/10.1007/s11277-021-09152-1 2. Suharjito RA, Wiryana F, Ariesta MC, Kusuma GP (2017) Sign language recognition application systems for deaf-mute people: a review based on input-process-output. Procedia Comput Sci 116:441–448 3. Ardiansyah A, Hitoyoshi B, Halim M, Hanafiah N, Wibisurya A (2021) Systematic literature review: American sign language translator. Procedia Comput Sci 179:541–549 4. Sawant SN, Kumbhar MS (2014) Real time sign language recognition using PCA. In: 2014 IEEE international conference on advanced communications, control and computing technologies, Ramanathapuram, India, May 2014. IEEE, pp 1412–1415
5. Chuan C-H, Regina E, Guardino C (2014) American sign language recognition using leap motion sensor. In: 2014 13th international conference on machine learning and applications, Detroit, MI, Dec 2014. IEEE, pp 541–544 6. Dudhal A, Mathkar H, Jain A, Kadam O, Shirole M (2019) Hybrid SIFT feature extraction approach for Indian sign language recognition system based on CNN. In: Pandian D, Fernando X, Baig Z, Shi F (eds) Proceedings of the international conference on ISMAC in computational vision and bio-engineering 2018 (ISMAC-CVB), vol 30. Lecture notes in computational vision and biomechanics. Springer International Publishing, Cham, pp 727–738 7. AlQattan D, Sepulveda F (2017) Towards sign language recognition using EEG-based motor imagery brain computer interface. In: 2017 5th international winter conference on braincomputer interface (BCI), Gangwon Province, South Korea, Jan 2017. IEEE, pp 5–8 8. Guo D, Zhou W, Li H, Wang M (2018) Online early-late fusion based on adaptive HMM for sign language recognition. ACM Trans Multimedia Comput Commun Appl 14(1):1–18 9. Al Rashid Agha RA, Sefer MN, Fattah P (2018) A comprehensive study on sign languages recognition systems using (SVM, KNN, CNN and ANN). In: Proceedings of the first international conference on data science, E-learning and information systems, Madrid, Spain, Oct 2018. ACM, pp 1–6 10. Imran A, Razzaq A, Baig IA, Hussain A, Shahid S, Rehman T (2021) Dataset of Pakistan sign language and automatic recognition of hand configuration of Urdu alphabet through machine learning. Data Brief 36:107021 11. Podder KK, Chowdhury MEH, Tahir AM, Mahbub ZB, Khandakar A, Shafayet Hossain Md, Kadir MA (2022) Bangla sign language (BdSL) alphabets and numerals classification using a deep learning model. Sensors 22(2):574 12. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25 13. Sitender, Bawa S (2021) Sansunl: a Sanskrit to UNL enconverter system. IETE J Res 67(1):117– 128 14. Bawa S et al (2020) Sanskrit to universal networking language enconverter system based on deep learning and context-free grammar. Multimedia Syst 1–17 15. Bawa S, Kumar M et al (2021) A comprehensive survey on machine translation for English, Hindi and Sanskrit languages. J Ambient Intell Humanized Comput 1–34 16. Bawa S et al (2021) A Sanskrit-to-English machine translation using hybridization of direct and rule-based approach. Neural Comput Appl 33(7):2819–2838 17. Ba J, Caruana R (2014) Do deep nets really need to be deep? In Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger KQ (eds) Advances in neural information processing systems, vol 27. Curran Associates, Inc., USA 18. Nirthika R, Manivannan S, Ramanan A, Wang R (2022) Pooling in convolutional neural networks for medical image analysis: a survey and an empirical study. Neural Comput Appl 34(7):5321–5347 19. Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gotzsche PC, Ioannidis JPA, Clarke M, Devereaux PJ, Kleijnen J, Moher D (2009) The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ 339(1):b2700 20. Wadhawan A, Kumar P (2020) Deep learning-based sign language recognition system for static signs. Neural Comput Appl 32(12):7957–7968 21. Rastgoo R, Kiani K, Escalera S (2020) Hand sign language recognition using multi-view hand skeleton. Expert Syst Appl 150:113336 22. 
Li H, Zhang Y, Cao Q (2022) MyoTac: real-time recognition of tactical sign language based on lightweight deep neural network. Wirel Commun Mobile Comput 2022:1–17 23. Malhotra P, Bajaj Y (2022) International conference on innovative computing and communications proceedings of ICICC 2021, vol 1, OCLC: 1282251679 24. Yirtici T, Yurtkan K (2022) Regional-CNN-based enhanced Turkish sign language recognition. SIViP 16:1305–1311. https://doi.org/10.1007/s11760-021-02082-2
25. Nandi U, Ghorai A, Singh MM, Changdar C, Bhakta S, Pal RK (2022) Indian sign language alphabet recognition system using CNN with diffGrad optimizer and stochastic pooling. Multimedia Tools Appl. https://doi.org/10.1007/s11042-021-11595-4 26. Kumar A, Kumar S, Singh S, Jha V (2022) Sign language recognition using convolutional neural network. In: Fong S, Dey N, Joshi A (eds) ICT analysis and applications, vol 314. Lecture notes in networks and systems. Springer, Singapore, pp 915–922 27. Kasapba¸si A, Elbushra AEA, Al-Hardanee O, Yilmaz A (2022) DeepASLR: a CNN based human computer interface for American sign language recognition for hearing-impaired individuals. Comput Methods Programs Biomed 2:100048 28. Jayadeep G, Vishnupriya NV, Venugopal V, Vishnu S, Geetha M (2020) Mudra: convolutional neural network based Indian sign language translator for banks. In: 2020 4th international conference on intelligent computing and control systems (ICICCS), Madurai, India, May 2020. IEEE, pp 1228–1232 29. Rajan RG, Selvi Rajendran P (2022) Comparative study of optimization algorithm in deep CNN-based model for sign language recognition. In Smys S, Bestak R, Palanisamy R, Kotuliak I (eds) Computer networks and inventive communication technologies, vol 75. Lecture notes on data engineering and communications technologies. Springer, Singapore, pp 463–471 30. Gedkhaw E (2022) The performance of Thai sign language recognition with 2D convolutional neural network based on NVIDIA Jetson nano developer kit. TEM J 411–419 31. Intwala N, Banerjee A, Meenakshi, Gala N (2019) Indian sign language converter using convolutional neural networks. In: 2019 IEEE 5th international conference for convergence in technology (I2CT), Bombay, India, Mar 2019. IEEE, pp 1–5 32. Ahuja R, Jain D, Sachdeva D, Garg A, Rajput C (2019) Convolutional neural network based American sign language static hand gesture recognition. Int J Ambient Comput Intell 10(3):60– 73 33. Mehedi Hasan Md, Srizon AY, Sayeed A, Al Mehedi Hasan Md (2020) Classification of sign language characters by applying a deep convolutional neural network. In: 2020 2nd international conference on advanced information and communication technology (ICAICT), Dhaka, Bangladesh, Nov 2020. IEEE, pp 434–438 34. Anantha Rao G, Syamala K, Kishore PVV, Sastry ASCS (2018) Deep convolutional neural networks for sign language recognition. In: 2018 conference on signal processing and communication engineering systems (SPACES), Vijayawada, Jan 2018. IEEE, pp 194–197 35. Sruthi CJ, Lijiya A (2019) Signet: a deep learning based Indian sign language recognition system. In: 2019 international conference on communication and signal processing (ICCSP), Chennai, India, Apr 2019. IEEE, pp 0596–0600 36. Pigou L, Dieleman S, Kindermans P-J, Schrauwen B (2015) Sign language recognition using convolutional neural networks. In: Agapito L, Bronstein MM, Rother C (eds) Computer vision—ECCV 2014 workshops, vol 8925. Lecture notes in computer science. Springer International Publishing, Cham, pp 572–578 37. Moklesur Rahman Md, Shafiqul Islam Md, Hafizur Rahman Md, Sassi R, Rivolta MW, Aktaruzzaman Md (2019) A new benchmark on American sign language recognition using convolutional neural network. In: 2019 international conference on sustainable technologies for industry 4.0 (STI), Dhaka, Bangladesh, Dec 2019. IEEE, pp 1–6 38. 
Jiang X, Lu M, Wang S-H (2020) An eight-layer convolutional neural network with stochastic pooling, batch normalization and dropout for fingerspelling recognition of Chinese sign language. Multimedia Tools Appl 79(21–22):15697–15715 39. Wangchuk K, Riyamongkol P, Waranusast R (2021) Real-time Bhutanese sign language digits recognition system using convolutional neural network. ICT Express 7(2):215–220 40. Martinez-Martin E, Morillas-Espejo F (2021) Deep learning techniques for Spanish sign language interpretation. Comput Intell Neurosci 2021:1–10 41. Pugeault N, Bowden R (2011) Spelling it out: real-time ASL fingerspelling recognition. In: 2011 IEEE international conference on computer vision workshops (ICCV workshops), Barcelona, Spain, Nov 2011. IEEE, pp 1114–1119
42. Varghese RM, Siddharth S, Biju J, Dutta S, Aggarwal A, Vaegae NK (2021) Sign language recognition using convolutional neural networks. In: Choudhury S, Gowri R, Paul BS, Do D-T (eds) Intelligent communication, control and devices, vol 1341. Advances in intelligent systems and computing. Springer, Singapore, pp 415–425
HMM-Assisted Proactive Vulnerability Mitigation in Virtualization Datacenter Through Controlled VM Placement J. Manikandan
and Uppalapati SriLaskhmi
Abstract Virtualization is the tool used to offer data center resources to remote users. Virtualization brings higher resource utilization by sharing a large physical resource with multiple users in the form of virtual machines. The advantages of virtualization are overshadowed by various attacks like hyperjacking, intrusion, data theft, etc. Co-location is the security loophole most often exploited by attackers to launch such attacks. This work proposes a hidden Markov model (HMM)-assisted proactive vulnerability mitigation mechanism that defends against co-location attacks through effective control of VM placement. The mechanism monitors VM/user behavior continuously and classifies the behavior of a VM into security risk labels. Based on the risk label, VM placement is adapted to reduce the probability of vulnerability. Keywords Virtualization · Co-location attack · Hidden Markov model
1 Introduction A virtual data center is an infrastructure that allows sharing the physical resources of a data center with remote users in the form of virtual machines (VMs) [1]. A stack of virtualization software at the data center enables virtualization. The most important components facilitating virtualization in a data center are shown in Fig. 1. The hypervisor is the key component, which creates VMs of varied capabilities and hosts them on the physical machines [2]. A VM looks like a physical machine to the user, with the capabilities they have requested, and it can be accessed from anywhere over the Internet. The user can also scale up or scale down the capabilities of the VM on demand. This on-demand resource adaptability, together with cost savings in infrastructure and maintenance, makes virtualization an attractive value proposition for enterprise and retail users [3]. Enterprises are rapidly adopting virtualization. For data center service providers, the profitability lies in sharing the available resources with the maximum number of users and
increasing the utilization. With the goal of maximizing profit, VM placement algorithms try to co-locate as many VMs as possible on the same physical host. Though co-location increases system utilization and profitability for the data center service providers, it exposes the VMs to various security vulnerabilities in the presence of a malicious VM. Some virtualization-specific internal and external attacks are listed in Table 1. Most of the internal attacks, and the triggers for external attacks, are due to co-location of a malicious VM with other VMs on the same host.
Fig. 1 Virtualization components
Table 1 Virtualization attacks
Attacks | Description
Internal attacks | Communication tampering between VMs, host-based VM monitoring, communication side channel monitoring, etc.
External attacks | Attack on hypervisor or VMs, VM and hypervisor tampering from external sources, botnets, injection of attack code, etc.
Though many works have been proposed to defend against co-location attacks (discussed in the survey section), they have many issues, such as reduction of resource utilization by limiting the users, ineffective user categorization, and no dynamic monitoring of the user and his VMs to detect attacks at a finer granularity. This work proposes a proactive vulnerability mitigation scheme using HMM-assisted security risk labeling of VMs and controlling the VM placement in such a way as to reduce the probability of malicious co-locations. The behavior of the VM owner and the VMs is monitored continuously to extract various features. These features are used to train the HMM to provide security labels. The PMs are grouped into three categories: secure, in-secure and undecided. VM placement over the PM categories is adapted based on the security label. The following are the novel contributions of this work. (i) A novel security risk labeling algorithm for VMs based on owner and VM behavior. The user and his VM behavior are monitored continuously, features are extracted from the events and classified to a security risk by a machine learning algorithm, and from the sequence of security risks the user's malicious behavior is detected. (ii) A novel multi-objective VM placement with the constraint of minimizing malicious VM co-location, thereby proactively mitigating security vulnerabilities. The multi-objective VM placement is without any limitation on users per PM; thereby, host utilization is increased. The rest of the paper is organized as follows. Section 2 presents the survey on mitigation of malicious VM co-location attacks in existing works. Section 3 presents the proposed HMM-assisted proactive vulnerability mitigation in a virtualization datacenter. Section 4 presents the results and a comparison to existing works. Section 5 presents the concluding remarks and scope for future work.
2 Related Works Liang et al. [4] attempted to mitigate co-location attacks using a VM grouping-based placement strategy. VM placement on hosts is done in an unpredictable way, such that the probability of co-location with a target VM cannot be learnt by a malicious VM. But an attacker can launch multiple VMs spread across a time interval to learn the whereabouts of the target VM, and the approach did not consider co-location learning by a malicious VM during the migration stage. Agarwal et al. [5] adopted a strategy called previously co-located users first to mitigate co-location attacks. Users are categorized into two types: new users and already known users. VMs of already known users are not co-located with VMs of new users; this way, only VMs of users with proven reputation are co-located together. But the method cannot prevent selective attacks. Qiu et al. [6] proposed a deployment strategy based on two metrics, VM co-residency coverage probability and user co-residence coverage probability, to reduce the chances of the same set of VMs being co-resident. Also, the number of users whose VMs are co-located on the same host is controlled; thereby, it becomes easy to spot the
malicious users. But the data center utilization is degraded by a large factor in this method. Berrima et al. [7] reduce the probability of co-location of the same set of VMs by queuing the incoming requests and making assignments in a completely random order, so that the chances of a malicious VM co-locating with the target VM are reduced. But this scheme works only when the malicious VM is launched within a small time difference from the target VM. Natu et al. [8] attempted to mitigate VM co-location attacks by placing the VMs according to their trust profile. The user defines the placement policies in terms of the trust of users with whom co-location can be done. Trust computation, in the form of a reputation score, is done for all the users in the datacenter, and VM placement is based on the trust policy set by the users. It becomes cumbersome for the users to define the trust policy, and the policy adherence reduces the data center utilization. Han et al. [9] attempted to mitigate co-location attacks through a VM placement policy called previously selected servers first (PSSF). In this policy, the number of users allocated to a host is limited, and a user's VMs are allocated to the same host. In case of host unavailability, the hosts with the least number of allocated VMs are preferred. The solution underutilizes the datacenter. Aldawood et al. [10] attempted to mitigate co-location attacks using a security-aware VM allocation policy. VM allocation is solved as a bin packing problem with the constraint of minimizing the co-resident physical machines. Han et al. [11] proposed a co-location mitigation VM placement algorithm based on user categorization. Users are categorized into three types: low, medium and high. The categorization is based on the past behavior of the users, and only VMs belonging to users of the same category can be co-located. In this way, users with a high risk profile are never allowed to co-locate. The risk profile is calculated based on the user's session active times and how they managed their VMs in the past; dynamic behaviors like APIs used, memory regions accessed, etc. were never considered. Saxena et al. [12] proposed a multi-objective VM placement algorithm with security as one of the important constraints. The multi-objective optimization of VM placement is solved using a hybrid metaheuristic algorithm combining whale optimization with a genetic algorithm. The security constraint placed during VM allocation was minimization of the number of users allocated to a host, but limiting the users accessing the host alone was not sufficient to prevent co-location attacks. Chhabra et al. [13] attempted to mitigate co-location attacks by allowing only VMs belonging to the same user on a host. Also, before allocating a VM to a physical machine, it is evaluated by an intrusion detection system for any security risks. The solution is insecure against selective attacks. Long et al. [14] mitigated co-location attacks using a group-based strategy: the users are grouped, and the grouped users are allocated on the same host. Though the mechanism can detect perpetual attackers, it misses selective attackers. The grouping strategy proposed does not have any security constraints, and it also does not consider resource utilization. From the survey, most of the works for preventing co-location attacks were found to reduce the data center utilization. Trust is not computed over the long term, and trust computations are not based on temporal VM behavior.
3 Proactive Vulnerability Mitigation The proposed HMM-based proactive vulnerability mitigation adopts co-location-resistant VM placement. The placement decision is modeled as a multi-objective optimization problem of maximizing the resource utilization of hosts, satisfying the VM capacity, and minimizing the probability of co-location with a malicious VM. The architecture of the proposed solution is given in Fig. 2. The user's temporal characteristics in terms of VMs created, inactivated, etc. and the VMs' access patterns are collected. Features extracted from them are classified into three different types of events: malicious, not-malicious and undecided. The event sequences over a window length are passed to the HMM to categorize the users into three categories of security risk. The physical machines (PMs) in the data center are split into three pools, and the VMs of a user are moved to a pool based on his security risk. The VM requests in each pool queue are allocated to PMs in the pool using a multi-objective placement optimization with the particle swarm optimization algorithm. Based on load, PM rebalancing across pools is done to maximize the data center utilization. The VMs are placed on a host in such a way that its capacity in terms of CPU cycles is satisfied. It is expressed as

$$\text{VM}_i.C.\text{CPU} - R_q\text{VM}.C.\text{CPU} \ge 0 \qquad (1)$$
The security score for a VM is calculated based on the past behavior of the owner of the VM and the VM's dynamic characteristics. The variables considered for modeling the security risks are listed in Table 2. The total utilization of hosts in the data center is calculated as the sum of the utilization of the individual hosts:

$$U = \sum_{i=1}^{|PM|} U_i \qquad (2)$$

where the individual utilization $U_i$ of each PM is calculated as

$$U_i = \sum_{q=1}^{|R|} \sum_{i=1}^{|R_q VM|} \frac{R_q.VM_i.C.\text{CPU} \times R_q.VM_i.B}{U_i.\text{CPU}} \qquad (3)$$
where R is the set of requests within an interval, |R_q VM| is the number of VM demands in a request R_q, R_q.VM_i.C.CPU is the CPU cycles needed for the VMs of R_q, and R_q.VM_i.B takes a value of 0 or 1 depending on whether the VM is placed on the server or not. A dataset D is created with N rows, each row being the vector of values V1, V2, V3, V4, V5, V6. The dataset is partitioned into K clusters, with the value of K found using the elbow method, using a modified K-means clustering algorithm. Differing from the usual Euclidean distance-based clustering, a density-specific grouping metric is proposed.
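To make this step concrete, the following is a minimal illustrative sketch (not the authors' implementation) of building the feature dataset D from the six behavioral variables and clustering it; it uses scikit-learn's standard KMeans as a stand-in for the paper's density-based modified K-means, and the feature values are synthetic placeholders.

```python
# Hypothetical sketch: cluster per-VM behavior features [V1..V6] so a domain
# expert can label clusters as malicious (M), non-malicious (N) or undecided (U).
# Standard KMeans is used here as a stand-in for the paper's density-based variant.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Each row: [V1 VMs started, V2 frequent VMs, V3 session-time variance,
#            V4 inactive/total VMs, V5 syscall ratio, V6 OOM-access ratio]
D = rng.random((500, 6))

X = StandardScaler().fit_transform(D)

# Elbow method: inspect inertia for candidate K and pick the "knee".
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(2, 8)}
K = 3  # assume the elbow suggested three clusters

kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)
cluster_of_row = kmeans.labels_

# The domain expert then maps each cluster id to an event label.
cluster_label = {0: "M", 1: "N", 2: "U"}   # illustrative mapping only
event_labels = [cluster_label[c] for c in cluster_of_row]
print(event_labels[:10])
```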
Fig. 2 Proactive vulnerability mitigation (architecture blocks: the user's temporal characteristics undergo feature extraction [V1, ..., V6]; events are classified by KNN against labelled clusters from the modified K-means events database; event sequences feed HMM-based user categorization; the user VM request handler routes requests to the secure, in-secure or undecided pool queue; PSO placement maps each queue onto the PMs of its pool, with pool rebalancing across pools)
Table 2 Features
Variable | Description
V1 | Number of VMs started in a time window
V2 | Number of frequent VMs
V3 | Variance in active session times
V4 | Ratio of inactive VMs to total VMs
V5 | Ratio of system call accesses to total function calls
V6 | Ratio of average number of out-of-memory accesses to total accesses
The density metric between two points i, j is calculated as

$$D_{ij} = \min_{p \,\in\, P_{i,j}} \sum_{k=1}^{p-1} L(p_k, p_{k+1}) \qquad (4)$$

The cluster centers $c_k$ are calculated based on the density metric as

$$\min z = \sum_{d^{(j)} \in S_k} \left\lVert x - d^{(j)} \right\rVert, \quad x = (x_1, x_2, \ldots, x_m) \in \mathbb{R}^m \qquad (5)$$

where $\lVert \cdot \rVert$ represents the 2-norm, with $x_i$ calculated as

$$x_i = \frac{\sum_{d^{(j)} \in S_k} d_i^{(j)} / q_j}{\sum_{j=1}^{n} 1 / q_j}, \quad i = 1, 2, \ldots, m \qquad (6)$$

where

$$q_j = \left\lVert x - d^{(j)} \right\rVert \qquad (7)$$
Once the K clusters are created, they are manually labeled as malicious event (M), non-malicious event (N) or undecided (U) by the domain expert. For every VM, the variables V1, V2, V3, V4, V5, V6 are extracted and classified using the KNN classification algorithm to find the corresponding label M, N or U. The past event sequences for the user are maintained, and from them, the security risk score for the user is calculated using a hidden Markov model (HMM). The HMM is used to predict the security risk score for a user based on his past event sequence. An HMM is characterized by hidden states X = {x1, x2, x3}, observation states Y = {y1, y2, y3}, transition probabilities A = a_ij = P[q_{t+1} = x_j | q_t = x_i] and emission probabilities B = b_ij. The HMM can be represented as λ = (π, A, B)
(8)
Fig. 3 Events over window
Each entry in the state transition matrix A is the probability of a transition from one state to another. Each entry in the emission matrix B provides the probability of observing event Y_t, referred to as b_j(Y_t). π is the initial state distribution. The observation symbols are the events in the system, given as O_1 = {e1, e2, e3, ..., en}. Events are provided as inputs to the HMM, and the model transitions to malicious, not-malicious or undecided. The sequence of events over a sliding window of length t is shown in Fig. 3. In Fig. 3, a vulnerability occurs at point F in the time window. State transitions occur till a state of absorption is reached. The value selected for the time step decides the learning rate and accuracy. The default values in the state transition matrix are 0.5. The model parameters and the transition sequence are learnt in the training stage; the optimal values for the parameters are found by maximizing the likelihood of the sequence. Training of the HMM is done using the expectation maximization (EM) algorithm. The algorithm runs in iterations from a random starting seed till the optimal values for the parameters are achieved. In the training stage, effective representation of error sequences and failure state transitions is made. This work applies the Baum-Welch algorithm and gradient descent to increase the effectiveness of the training. By observing the probability of pairs of observations O_t, O_{t+1}, the model parameters λ are learnt with the EM algorithm. The Viterbi algorithm assists in finding the optimal state sequence. The state sequence optimality is calculated in the Viterbi algorithm as S = argmax_S P(S, O | λ)
(9)
where S is the sequence of states. The Viterbi algorithm runs in steps: the optimal path with N states is found at step t and is improved at step t + 1. At the end of training, the optimal state sequences and probabilities are established, and the trained model is used to decide attack (A), not attack (NA) or undecided (U).
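As an illustration of this decoding step, the following is a minimal sketch (not the authors' code) of Viterbi decoding over a window of classified events; the state set, event alphabet and all probability values are assumed placeholders, since the paper learns them with Baum-Welch/EM.

```python
# Hypothetical sketch: Viterbi decoding of a user's event window into risk states.
# States and probabilities are illustrative placeholders, not the trained values.
import numpy as np

states = ["attack", "not_attack", "undecided"]
events = {"M": 0, "N": 1, "U": 2}             # malicious / non-malicious / undecided events

pi = np.array([1 / 3, 1 / 3, 1 / 3])          # initial state distribution
A = np.full((3, 3), 0.5)                      # default transition values per the paper
A = A / A.sum(axis=1, keepdims=True)          # normalise rows to valid probabilities
B = np.array([[0.7, 0.1, 0.2],                # P(event | state), rows = states
              [0.1, 0.7, 0.2],
              [0.3, 0.3, 0.4]])

def viterbi(obs, pi, A, B):
    """Return the most likely state sequence for the observed event indices."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(A)      # (from_state, to_state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return list(reversed(path))

window = [events[e] for e in ["N", "N", "M", "M", "U", "M"]]
risk_states = [states[s] for s in viterbi(window, pi, A, B)]
print(risk_states[-1])   # label used to route the user's VMs to a PM pool
```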
The physical machines (PMs) are divided into three pools: secure, in-secure and undecided. The pool proportions are initially set in the ratio of attack, not-attack and undecided users as decided by the HMM; later, they are rebalanced based on the load in each category of attack, not-attack and undecided users. There are three queues for pooling the incoming VM requests. The first queue is for the secure PM pool, the second queue is for the in-secure PM pool and the third queue is for the undecided pool. On arrival of VM requests, depending on the security risk label of the user, the requests are added to a queue: VM requests from not-attack category users are added to the secure queue, VM requests from attack category users are added to the in-secure queue and VM requests from undecided category users are added to the undecided queue. The requests in the secure queue are allocated only to PMs in the secure pool, the requests in the in-secure queue only to PMs in the in-secure pool and the requests in the undecided queue only to PMs in the undecided pool. From each queue, the requests are allocated to PMs in the corresponding pool using the particle swarm optimization (PSO) algorithm. PSO is a nature-inspired optimization algorithm based on the foraging behavior of swarms. The method is quite popular due to its simplicity and versatility. Each particle (or candidate solution) initially has random values, and the algorithm runs in iterations. At each iteration, the next position of a particle is found based on its current position, its current velocity $V_i$, its distance to its local best solution $p_{best}$ and its distance to the globally best solution $g_{best}$. The position is updated at every iteration t as

$$X_i(t+1) = X_i(t) + V_i(t+1) \qquad (10)$$

$$V_i(t+1) = w V_i(t) + c_1 r_1 \left(p_{best_i}(t) - X_i(t)\right) + c_2 r_2 \left(g_{best_i}(t) - X_i(t)\right) \qquad (11)$$
In the above equations, the constant c1 together with r1 controls the degree of influence of the local best solution, and the constant c2 together with r2 controls the degree of influence of the globally best solution. Let m be the number of particles and n the dimension of the particle's search space. Each PM is treated as a particle and each VM is denoted as a dimension element of the particle. The ith particle at iteration t is denoted as X_i^t = (x_{i1}^t, x_{i2}^t, ..., x_{in}^t), where each x^t ∈ {0, 1}, with 1 indicating the VM is placed on the PM and 0 indicating the VM is not placed on the PM. Initially, m random particles are formed (m random solutions for VM placement). Fitness is calculated for each particle, and the individual best position and global best position are adjusted using the value of the fitness function FF. The particle X_i(t + 1) and the speed of the particle V_i(t + 1) are adjusted based on p_best and g_best. When the PSO algorithm converges, meeting the termination criterion, the approximate solution for placement of the VMs onto the corresponding PMs, satisfying the multiple objectives and constraints, is obtained as the result. When the PSO algorithm drops VMs to allocate beyond a certain loss percentage, pool rebalancing is triggered. The rebalancing policy is different for each pool. For drops happening in the secure pool, the rebalancing procedure is as follows. An idle PM in the in-secure pool is moved to the secure pool. In case there is no idle PM, the PM with the least number of VMs is selected and allocated to the safe pool. If still a PM
satisfying this condition cannot be found, an idle PM in the undecided pool is selected and moved to the secure pool. In the absence of an idle PM in the undecided pool, the PM with the least number of VMs is selected and allocated to the safe pool. For drops happening in the in-secure pool, no rebalancing is done. But for drops happening in the undecided pool, an idle PM, or the PM with the least number of VMs allocated, is selected from the unsafe pool and allocated to the undecided pool. If this fails, an idle PM in the secure pool is moved to the undecided pool.
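The placement step described above can be illustrated with a small binary-PSO sketch; this is a simplified, assumed implementation (a single pool, CPU capacity only, and a toy fitness that rewards placed CPU demand) rather than the authors' full multi-objective formulation.

```python
# Hypothetical sketch: binary PSO that decides which queued VMs to place on one PM pool.
# Fitness rewards placed CPU demand and penalises exceeding the pool's CPU capacity.
import numpy as np

rng = np.random.default_rng(1)
vm_cpu = rng.integers(1, 8, size=12)      # CPU demand of each queued VM (toy values)
pm_capacity = 30                          # total CPU capacity of the pool (toy value)

def fitness(bits):
    used = np.dot(bits, vm_cpu)
    return used if used <= pm_capacity else pm_capacity - used  # infeasible => negative

def binary_pso(n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    dim = len(vm_cpu)
    X = rng.integers(0, 2, size=(n_particles, dim)).astype(float)  # particle positions
    V = rng.uniform(-1, 1, size=(n_particles, dim))                # particle velocities
    pbest, pbest_fit = X.copy(), np.array([fitness(x) for x in X])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((n_particles, dim)), rng.random((n_particles, dim))
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)  # velocity update, Eq. (11)
        # binary-PSO variant: sigmoid of velocity gives placement probability
        X = (rng.random((n_particles, dim)) < 1 / (1 + np.exp(-V))).astype(float)
        fit = np.array([fitness(x) for x in X])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = X[improved], fit[improved]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest, fitness(gbest)

placement, score = binary_pso()
print("placed VMs:", np.flatnonzero(placement), "used CPU:", int(np.dot(placement, vm_cpu)))
```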
4 Results The CloudSim environment was used for simulation of the proposed proactive vulnerability mitigation algorithm. The simulation was conducted with the configuration given in Table 3. Performance was measured in terms of VM co-location probability with a malicious user's VM; accuracy, sensitivity and specificity of attack detection; and host utilization. The performance is compared against the VM placement algorithm of Aldawood et al. [10], the multi-objective VM placement algorithm of Saxena et al. [12] and the group-based VM placement algorithm of Long et al. [14]. The VM co-location probability with a malicious user is measured while varying the number of users, and the result is given in Table 4. The average VM co-location probability of the proposed solution (Fig. 4) is at least 3.9% lower compared to Aldawood et al., 3.25% lower compared to Saxena et al. and 3.4% lower compared to Long et al.

Table 3 Simulation configuration
Parameter | Values
Physical machine count | 500
Configuration of host | 20 GB RAM, 100 GB disk space, 8 CPU cores
No. of users | 100–500
No. of malicious users | 20%
Table 4 VM co-location probability
Users | Proposed | Aldawood et al. | Saxena et al. | Long et al.
100 | 0.01 | 0.05 | 0.06 | 0.07
200 | 0.02 | 0.07 | 0.08 | 0.09
300 | 0.03 | 0.09 | 0.1 | 0.11
400 | 0.03 | 0.12 | 0.13 | 0.12
500 | 0.03 | 0.14 | 0.14 | 0.14
Average | 0.024 | 0.094 | 0.102 | 0.106
Fig. 4 Comparison of average co-location probability
This reduction is due to the split of PMs into three pools of secure, in-secure and undecided and the allocation of users' VMs into these pools based on their security risk categorization. Though Long et al. proposed user grouping, their grouping is based only on session timeouts, compared to the categorization based on six different attributes in the proposed solution. This security risk categorization was not available in Saxena et al. and Aldawood et al.; instead they relied on randomness in allocation and on reducing the number of users on a host. The inter-arrival time of the malicious user's VM requests is varied, and the VM co-location probability is measured. The results are given in Table 5. As the arrival time increases, the VM co-location probability increased in Aldawood et al., Saxena et al. and Long et al. But the standard deviation is very low in the proposed solution compared to the others. This is due to the placement of VMs based on user security risk in the proposed solution. In the proposed solution, PMs are split into three categories based on risk, the user is also categorized based on security risk level and allocated to a PM accordingly, and the events of the user's VMs are continuously monitored to find the risk level. Due to this continuous grading of the user, placing an attacker's VM in the safe PM pool is not possible in the proposed solution. It is possible in the undecided pool, but once attack behavior is known, any VM of an attack user is never co-located in the safe PM pool. This combination of PM and user categorization and dynamic event assessment has increased the security of the proposed solution and reduced the attack VM co-location probability.

Table 5 Co-location probability against inter-arrival time
Arrival time (s) | Proposed | Aldawood et al. | Saxena et al. | Long et al.
30 | 0.01 | 0.05 | 0.07 | 0.08
60 | 0.027 | 0.08 | 0.10 | 0.09
90 | 0.031 | 0.10 | 0.14 | 0.12
120 | 0.032 | 0.15 | 0.16 | 0.14
150 | 0.034 | 0.18 | 0.19 | 0.16
Std. deviation | 0.009 | 0.05 | 0.04 | 0.03
Table 6 Average host utilization
Users | Proposed | Aldawood et al. | Saxena et al. | Long et al.
100 | 32 | 23 | 26 | 24
200 | 38 | 26 | 31 | 26
300 | 43 | 32 | 35 | 29
400 | 47 | 35 | 38 | 36
500 | 52 | 39 | 42 | 40
Average | 42.4 | 31 | 34.4 | 31
Fig. 5 Comparison of average host utilization
The average host utilization was measured by varying the number of users, and the result is given in Table 6 (Fig. 5). The average host utilization of the proposed solution is 26.8% higher compared to Aldawood et al., 34.4% higher compared to Saxena et al. and 26.87% higher compared to Long et al. The average host utilization has increased in the proposed solution due to grouping the PMs into three pools and allocating VMs within each pool using PSO optimization. Aldawood et al.'s bin packing underutilized the hosts due to the constraint on co-location on each host. Saxena et al. used hybrid metaheuristics for VM placement but constrained the number of users allocated to a host; as a result, host utilization reduced. Long et al. grouped users and allocated the VMs of a group to the same host, but due to variations in group density, host utilization reduced. The effectiveness of the proposed solution in terms of attack detection is measured via accuracy, sensitivity and specificity by varying the number of users, and the result is given in Table 7. The proposed HMM-based detection, with malicious event detection on six features, is able to achieve an accuracy of 93%, sensitivity of 94% and specificity of 93.4%. These values were also measured for varying event sequence window lengths, and the performance is given in Table 8. The peak accuracy of 93.5% is achieved for an event sequence window length of 20 in the proposed solution.
Table 7 Effectiveness of attack detection
Users | Accuracy | Sensitivity | Specificity
100 | 93 | 94 | 93.4
200 | 93.2 | 95 | 94
300 | 93.5 | 95.1 | 93.8
400 | 93.1 | 95.2 | 93.7
500 | 93.3 | 95.1 | 93.7
Average | 93 | 94 | 93.4
Table 8 Effectiveness of attack detection vs window length
Event window sequence length | Accuracy | Sensitivity | Specificity
5 | 89 | 87 | 84
10 | 90.2 | 89 | 86
15 | 91 | 90 | 89
20 | 93.5 | 95.2 | 93.8
As the event length increases, more event correlations can be learnt, and this has increased the accuracy of attack detection at higher window sequence lengths. The novelties of the proposed solution compared to existing works, which have contributed to its better attack resilience, are given in Table 9. The proposed solution's features in terms of PM categorization, user labeling, no resource limitations, very few VM migrations, multi-objective optimization to maximize host utilization and attack detection based on the joint behavior of the user and his VMs have reduced the VM co-location probability. A discussion of how the proposed solution fared better than existing solutions is also presented in Table 9.
5 Conclusion A proactive vulnerability mitigation scheme using HMM is proposed in this work. The user's events are classified based on six dynamic features. Based on the sequence of events, the users are classified into three security risk categories. The physical hosts are pooled into three categories, the allocation of user requests to a physical host pool is decided by their security risk category, and within each pool, multi-objective optimization based on PSO is used to find the optimal VM placement. The proposed solution reduced the co-location probability by 3.25% and increased host utilization by 26.87% compared to existing works. Even with varied arrival patterns of malicious users, the proposed solution is able to provide a lower VM co-location probability compared to existing works.
Table 9 Solutions comparison
Features | Proposed | Aldawood et al. | Saxena et al. | Long et al.
PM categorization | PM is categorized into three groups and allocation done accordingly | Not available | Not available | Not available
User labeling | User is labeled based on risk levels continuously by monitoring the VM events | Not available | Not available | Users are categorized, but categorization has higher false positives and is not dynamic
Resource limitations | There is no limitation, due to which host utilization is increased | No limitation on users per host | Limits the number of users per host to reduce security risk, due to which host utilization is reduced | Limits the number of users per host to reduce security risk, due to which host utilization is reduced
VM migrations | Very few VM migrations | Frequent VM migrations | Few VM migrations | Few VM migrations
Multi-objective optimization | PSO-based multi-objective optimization without any constraints limiting PM utilization | Bin packing algorithm, but limits the number of PMs, resulting in sub-optimal placements | Whale optimization genetic algorithm is used, but limits the number of users on a PM, resulting in a sub-optimal solution | Only optimization criterion is reducing the number of users per host; has poor host utilization
Attack detection | Does not depend on timing to detect attacks; instead uses events of the user and events of the user's VMs jointly to detect attacks | Depends on timing of arrival of VMs to detect attacks | Depends on timing of arrival of VMs to detect attacks | Depends on timing of arrival of VMs to detect attacks
Evaluating the solution against a large user base and extending the event classification model for higher accuracy are in the scope of future work.
References 1. Bari MF et al (2013) Data center network virtualization: a survey. IEEE Commun Surv Tutorials
15(2), pp 909–928(Second Quarter) 2. Alouane M, El Bakkali H (2016) Virtualization in cloud computing: NoHype vs HyperWall new approach. In: 2016 International conference on electrical and information technologies (ICEIT), pp 49–54. 3. Bari MF, Boutaba R, Esteves R, Granville LZ, Podlesny M, Rabbani MG, Zhang Qi, Zhani MF (2013) Data center network virtualization: a survey. IEEE Commun Surv Tutorials 15:909–928 4. Liang X, Gui X, Jian AN, Ren D (2017) Mitigating cloud co-resident attacks via grouping-based virtual machine placement strategy 1–8. https://doi.org/10.1109/PCCC.2017.8280448 5. Agarwal A, Binh Duong TN (2018) Co-Location resistant virtual machine placement in cloud data centers. In: IEEE 24th international conference on parallel and distributed systems (ICPADS) pp 61–68 6. Qiu Y, Shen Q, Luo Y, Li C, Wu Z (2017) A secure virtual machine deployment strategy to reduce co-residency in cloud. In: Trustcom/BigDataSE/ICESS, 2017. IEEE, pp 347–354 7. Berrima M, Nasr AK, Ben Rajeb N (2016) Co-location resistant strategy with full resources optimization. In: Proceedings of the 2016 ACM on cloud computing security workshop, pp 3–10 8. Natu V, Duong TN (2017) Secure virtual machine placement in infrastructure cloud services. In: 10th IEEE Conference on service-oriented computing and applications, pp 26–33 9. Han Y, Chan J, Alpcan T, Leckie C (2017) Using virtual machine allocation policies to defend against co-resident attacks in cloud computing. IEEE Trans Dependable Secure Comput 14(1):95–108 10. Aldawood M, Jhumka A, Fahmy SA (2021) Sit here: placing virtual machines securely in cloud environments. https://doi.org/10.5220/0010459202480259 11. Han Y, Alpcan T, Chan J, Leckie C, Rubinstein BI (2015) A game theoretical approach to defend against co-resident attacks in cloud computing: Preventing co-residence using semi-supervised learning. IEEE Trans Inf Forensics Secur 11(3):556–570 12. Saxena D, Gupta I, Kumar J, Singh AK, Wen X (2021) A secure and multi-objective virtual machine placement framework for cloud data centre. 13. Chhabra S, Singh AK (2020) A secure VM allocation scheme to preserve against co-resident threat. Int J Web Eng Technol 15(1):96–115 14. Long VD, Duong TN (2020) Group instance: flexible co-location resistant virtual machine placement in IaaS clouds. In: 2020 IEEE 29th international conference on enabling technologies: infrastructure for collaborative enterprises (WETICE), pp 64–69
ViDepBot: Assist People to Tackle Depression Due to COVID Using AI Techniques Jiss Joseph Thomas and D. Venkataraman
Abstract Taking proper care of one's mental health is very important as we try to get past the effects of the COVID pandemic era, especially since the rate of COVID spread is still persistent. Many organizations, universities, and schools are continuing an online mode of learning or a work-from-home arrangement to tackle the spread of the coronavirus. In these situations, users may work on electronic gadgets like laptops for long hours, often without breaks in between, which has eventually affected their mental health. 'ViDepBot', the Video-Depression-Bot, aims at helping users maintain their mental health by detecting their depression level early, so that appropriate actions can be taken by faculty/counselors, parents, and friends to help them come back to normalcy and maintain a strong mental life. In this work, a system is proposed to determine the depression level from both the facial emotions and the chat texts of the user. The FER2013 dataset is trained using the deep learning architecture VGG-16 as the base model with additional layers, which acquired an accuracy of around 87% for classifying live face emotions. Since people tend to post their feelings and thoughts (when feeling down, depressed, or even happy) on social media such as Twitter, the sentiment140 Twitter dataset was taken and trained using a Bayes' theorem-based classifier, which acquired an accuracy of around 80% for classifying the user's input texts. The user is monitored through a webcam and the emotions are recognized live. The ViDepBot regularly chats with the user and takes feedback on the mental condition of the user by analyzing the chat texts received. The emotions and chat texts together help to find the depression level of the user. After determining the depression level, the ViDepBot framework provides ideal recommendations to improve the user's mood. ViDepBot can be further developed to keep track of each student's/subject person's depression level when they are physically present in classrooms, once the pandemic situation subsides.
Keywords COVID-19 · Coronavirus · Depression · Emotion · Deep learning · VGG-16 · MTCNN · Video monitoring · Chatbot
1 Introduction It is a commonly known fact that the coronavirus, or COVID-19, has changed the lives of people around the world. It affected the mental stability of people when governments implemented lockdown procedures. The lockdown situation prevented people from having social contact or doing social/physical activities outside their homes. Companies, universities, and schools implemented Work-From-Home (WFH) methods, where electronic gadgets like laptops or smartphones consume a major part of daily life. The hybrid work mode or complete work-from-home situation continues in various regions. For students and employees, hours of continuous online learning or working have increased their concerns over the workload. Also, having limited access to medical and counseling services has affected their mental stability. This has eventually caused changes in their sleeping habits and patterns of having a timely diet, leading to a depressive mentality and sometimes even to suicidal thoughts [1, 2]. Various emotions like being happy, sad, or angry could play an important role in expressing one's feelings, and these emotions could also be used for detecting different levels of depression [3]. In the online mode of learning, since there is no direct physical interaction between a student and teacher, it is difficult to understand whether the student is experiencing any stress or anxiety; the same holds for an employee working from home. Proper detection of the depression level experienced by them at an early stage, by monitoring their emotions, would help in supporting them to achieve good mental health and recover from depression in both pandemic and post-pandemic situations. Even in offline situations, like being in classrooms, finding the depression level and giving attention to their mental health could help them improve their lives. Since the depression level experienced by a person is most likely unknown to the outside world in the current hybrid mode of learning/work, the sources to identify whether one is experiencing it could be their social media posts or close monitoring through video. With the help of advancements in the field of artificial intelligence, appropriate deep learning architectures could be applied, and the emotions expressed by a user can be detected from their facial expressions. Such architectures can classify emotions with the help of labeled facial images of people expressing different emotions. Retrieving the text contents that people tend to post on their social media pages such as Twitter whenever they are feeling down or depressed, and training them with the latest machine learning algorithms, may help in classifying the current mood of the user from input texts. The predicted emotion and the classified sentiment text could be further used to determine the level of depression experienced by the user. A chatbot could be useful to release stress
and improve work efficiency. Using the proposed ‘ViDepBot’ system, an attempt is made to identify the depression level and prevent the user from falling into a deep depression state.
2 Background Study From the real-time prediction done in [4], it can be noted that there is a chance of variation in the number of people affected by COVID-19 day by day. The work done in [5] analyzed the spread of the COVID-19 virus across a specific district in India, performing time series forecasting for the prediction of the infection rate and the recovery rate. However, it remains unclear when governments will ease restrictions so that people can come back to a normal life of social interaction, especially for students going back to school or college. The study in [1] investigated the mental health of students during the COVID-19 pandemic in the US as a survey. Depression and suicidal thoughts were shown by a large proportion of respondents compared to studies conducted in non-pandemic times. Most of the students also had concerns over their health and the health of their loved ones, followed by changes in sleeping habits, eating patterns, and depressive thoughts. The authors of [6] mention that recognition of emotions from facial expressions and body postures plays an important role in detecting depression and diagnosing mood disorders in potential patients. Emotions can be recognized from different sources like facial expressions, speech [7], postures, social media texts, electroencephalogram (EEG) signals [8], etc. In facial expression recognition, the initial stage is acquiring the facial image data, in the form of images or videos. The work done in [9, 10] uses the FER2013 data set for the classification of facial emotions. The authors of [11] examined language usage and its relation to the psychological characteristics of the subject person, focusing on detecting depression using analysis of texts from interview transcripts. The key concepts used in [12] are the recognition of emotions expressed by a college student and the prediction of depression levels to help overcome psychological problems, using image processing through computer vision and the support vector machine (SVM) algorithm. The work done in [13] is related to the recognition of emotions from facial expressions using the VGG-16 architecture model, which produced 54% test accuracy, while the custom model they used produced around 69% test accuracy.
3 Proposed System In the proposed ‘ViDepBot’ system, referring to Fig. 1, the image data ‘FER2013’ [14] is taken into consideration for the classification of facial emotions expressed.
Fig. 1 Proposed system architecture
Fig. 2 Convolutional neural network architecture for face emotion classification
Duplicate values are checked and removed as part of the preprocessing of the FER2013 image data to prevent bias in the model. The pixel values present in each row are then converted to their corresponding images. According to the label in the raw file, the data is split into train, test, and validation sets. The training data is then fed into the VGG-16 model architecture [15, 16] with some additional layers, such as a flatten layer and dense layers (as shown in Fig. 2), which are used to classify the emotions. After the model training, it is validated using the validation data. The face region present in the live video collected from the user's side is recognized using the MTCNN model [17] after going through preprocessing techniques such as greyscale conversion and resizing. It is then fed into the trained model for predicting the emotions expressed by the user in real time. The 'happy' emotion is categorized as a positive emotion, while the emotions 'anger', 'sad', 'fear', and 'disgust' are categorized as negative emotions.
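For illustration, the following is a minimal sketch of the kind of VGG-16-plus-dense-layers classifier described above, assuming TensorFlow/Keras; the input shape, layer sizes and training settings are assumptions rather than the authors' exact configuration.

```python
# Hypothetical sketch: VGG-16 base with additional flatten/dense layers
# for 7-class facial emotion classification on 48x48 FER2013-style images.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 7            # angry, disgust, fear, happy, sad, surprise, neutral
INPUT_SHAPE = (48, 48, 3)  # FER2013 images replicated to 3 channels for VGG-16

base = tf.keras.applications.VGG16(include_top=False,
                                   weights="imagenet",
                                   input_shape=INPUT_SHAPE)
base.trainable = False     # keep the pretrained convolutional base frozen

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# At inference time, face regions can be cropped with an MTCNN detector before prediction.
# model.fit(train_images, train_labels, validation_data=(val_images, val_labels), epochs=30)
```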
Table 1 Category and weights assigned for the face emotions
Emotion | Category | Weight (E_w)
Happy | Positive | +1.5
Neutral, Surprise | Neutral | −0.05
Angry, Disgust, Fear, Sad | Negative | −0.95
The 'neutral' and 'surprise' emotions fall into the neutral category, as they could be associated with both positive and negative emotions. Specific weights are given to the categorized emotions to calculate the depression coefficient, as shown in Table 1. The depression coefficient from face emotions is calculated as the sum of the weights of the emotions present in each frame, divided by the total number of times the face region is detected in the video, as given in Eq. 1. The depression coefficient from the video is given as

$$D_{vc} = \frac{\sum_{i=1}^{N_v} E_{w_i}}{N_v} \qquad (1)$$

where $D_{vc}$ is the depression coefficient obtained from video, $E_{w_i}$ the emotional weight in the ith frame and $N_v$ the total number of frames in which a face is detected. For classifying the input text to the 'ViDepBot' system from the user, the 'sentiment140' dataset [18] is used. It is preprocessed using data cleaning processes such as removing duplicate rows, expanding contractions, and removing user mentions (words starting with '@'), links (words starting with 'http', 'https', 'www', etc.), and special characters such as #, $, %, etc. Tokenization and lemmatization are also done as part of the text preprocessing, but without removing the stop words. The processed data is then split into train and test data. The training data is then fit and transformed using 'TfidfVectorizer', the term frequency-inverse document frequency (TF-IDF) vectorizer, to extract the features. It is then trained using a Bayes' theorem classifier, which is used for the problem of sentiment classification from the text dataset. The classification is determined by calculating the conditional probability of the depression class, as given in Eq. 2:

$$P(\text{depression} \mid \text{sentiment data}) = \frac{P(\text{sentiment data} \mid \text{depression}) \cdot P(\text{depression})}{P(\text{sentiment data})} \qquad (2)$$

where P represents the corresponding probability. A chatbot (similar to [19]) is implemented in the 'ViDepBot' framework for receiving input texts from the user. The text input from the user can be in the same form as they would post on social media.
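A minimal sketch of this text pipeline, assuming scikit-learn and a multinomial naive Bayes classifier as the Bayes' theorem-based model (the cleaning steps and training corpus are simplified placeholders), is shown below.

```python
# Hypothetical sketch: TF-IDF features + multinomial naive Bayes sentiment classifier
# in the style of the sentiment140 pipeline described above.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

def clean(text):
    """Light cleaning: drop mentions, links and special characters."""
    text = re.sub(r"@\w+|https?://\S+|www\.\S+", " ", text)
    text = re.sub(r"[^a-zA-Z\s]", " ", text)
    return text.lower()

# Toy training data standing in for the preprocessed sentiment140 tweets.
texts = ["I feel so hopeless and tired of everything",
         "had a wonderful day with my friends",
         "nothing matters anymore, I can't sleep",
         "so happy about my results today"]
labels = ["negative", "positive", "negative", "positive"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(preprocessor=clean)),
    ("nb", MultinomialNB()),
])
clf.fit(texts, labels)

print(clf.predict(["I am feeling really down and stressed lately"]))  # expected: negative
```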
Table 2 Sentiment and corresponding weights assigned
Sentiment | Weight (S_w)
Positive | +0.75
Negative | −0.95
The live face emotion recognition is performed initially, and following that, the user can chat with the bot about their feelings or their day's experience. The corresponding depression levels are calculated at the end of each session and finally combined to measure the overall depression level. The depression coefficient from the text is calculated as

$$D_{tc} = \frac{\sum_{i=1}^{N_t} S_{w_i}}{N_t} \qquad (3)$$
where $D_{tc}$ is the depression coefficient obtained from the input text, $S_{w_i}$ the corresponding sentimental weight (refer to Table 2) of the ith text, and $N_t$ the total number of times sentiments were analyzed from the user's input text. After determining the respective depression coefficients from video and text using Eqs. 1 and 3, their average is found and multiplied by −100 to convert it into the corresponding depression level. The calculation of the depression level DL is simplified and given in Eq. 4. The level of depression found in the subject (user) is given as

$$DL = -50\,(D_{vc} + D_{tc}) \qquad (4)$$
where DL is the calculated depression level. The depression level (DL) of the student can be further classified into five levels: 'no depression', 'low depression', 'moderate depression', 'high depression', and 'severe depression'. If the depression level value is less than 40, the student can be categorized as having no depression; their mental health is good, and activities such as playing music could be suggested to enlighten their mood. If the value of the depression level is between 40 and 60, the depression level detected is low, and suggestions such as listening to music, playing some simple and engaging games, or talking to their friends could be made. If the depression level value is between 60 and 80, the categorized depression level is moderate; talking to their parents and friends could be suggested, so that they can get some comfort from the stress or overload situation and improve their mental health. When the value of the depression level is between 80 and 90, the probability of depression experienced by the student is high.
Fig. 3 Sample output from the prediction of emotion through live video
so that they can also provide support in academics. If the value is greater than 90, the student is experiencing a high possibility of a negative mood and must be considered a serious (severe) case of depression. In such cases, it is highly important that the parents, faculty, and a professional counselor are informed about the situation and give the student their immediate attention, so that any activities which may lead to life-threatening situations can be avoided. Figure 4 shows sample screenshots of the bot, where the user has communicated their feelings as input texts and the recommendations are given by the bot, based on the data from [20].
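A minimal sketch of Eqs. 1–4 and the five-level mapping described above is given below; the weight lists are placeholders, and the handling of values falling exactly on a cut-point is an assumption, since the text does not specify it.

```python
# Sketch of Eqs. 1-4 and the five-level mapping described in the text.
# The example weight lists are placeholders, not data from the paper.
def depression_level(frame_weights, text_weights):
    d_vc = sum(frame_weights) / len(frame_weights)   # Eq. 1: coefficient from video frames
    d_tc = sum(text_weights) / len(text_weights)     # Eq. 3: coefficient from chat texts
    return -50 * (d_vc + d_tc)                       # Eq. 4: depression level DL

def categorize(dl):
    # Thresholds follow the ranges given in the text; exact boundaries are assumed.
    if dl < 40:
        return "no depression"
    if dl < 60:
        return "low depression"
    if dl < 80:
        return "moderate depression"
    if dl <= 90:
        return "high depression"
    return "severe depression"

dl = depression_level([-0.95, -0.95, 0.0, 0.75], [-0.95, 0.75, -0.95])
print(round(dl, 1), "->", categorize(dl))
```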
4 Results and Analysis

This section discusses the results achieved by implementing the proposed system. Referring to Fig. 3a, the predicted emotion is 'sad', and the actual emotion is 'sad' as well. In Fig. 3b, the actual emotion expressed is 'surprise', and the model was able to predict it correctly. From Fig. 3c, it can be noted that the actual emotion expressed was 'happy', whereas the model predicted the emotion as 'sad'; this is a sample case of misclassification by the model. In Fig. 3d, the model predicted the emotion as 'neutral', even though the actual emotion could be 'angry'. If a larger number of sample outputs were considered, the accuracy in predicting the emotion expressed by the user could vary. From the short sample video test, which captured around 29 frames, the emotional weight sum produced by the system was −23.3 and Dvc was −0.8. From the text input, where the user gave input text 6 times, the sentiment weight sum calculated by the system was −0.8 and Dtc was −0.13. The system was thus able to detect the presence of depression in the text input, and it categorized the depression level DL, with a value of 46.5, as 'low'. From the calculated depression level, it can be inferred that most of the emotions expressed by the user during the sample input were negative. The system could work more effectively if the depression level were calculated over a longer period of live face emotion recognition together with regular chat text input from the user. The standard and commonly used performance metrics of accuracy, loss, precision, recall, and F1-score are used for evaluating the performance of the models.
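As a quick check, the sample values reported above are consistent with Eqs. 1, 3 and 4 (rounded as in the text):

$$D_{vc} = \frac{-23.3}{29} \approx -0.80, \qquad D_{tc} = \frac{-0.8}{6} \approx -0.13, \qquad D_L = -50\,(-0.80 - 0.13) = 46.5$$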
Fig. 4 Sample outputs from the chatbot with the user’s text input
Accuracy is a general metric that describes how the model has performed across all the classes in the given data; the higher the accuracy, the better the overall performance of the model. The accuracy score usually ranges from 0 to 1 and gives the ratio of correct predictions made by the model to the total number of predictions. The overall accuracies for the training and validation sets on the VGG-16 model with additional layers were 87.76% and 87.58%, respectively. Adding extra layers to the CNN might also help to improve the model's accuracy, but one should take care to prevent over-fitting. The deep learning model used here for face emotions gives higher accuracy than the model used in [12], and the framework predicts the emotions in real time rather than capturing a video and processing it later. The accuracy of the Bayes' theorem implementation was 79.92%. Loss is a number that indicates how bad the predictions of the model were; the lower the loss, the better the model's performance. Here it is defined as the average of the squared differences between the predicted and actual values. The training loss was 1.42 and the validation loss was 1.44 for facial emotion recognition. Precision gives the proportion of positive class predictions that actually belong to the positive class; precision becomes greater when the predicted emotions belong to the actual class. The precision was 74.83% for the image data and 80% for the textual data. Recall denotes the proportion of positive samples in the dataset that are correctly predicted as positive. The recall was 21.63% for FER2013, which is a low value, while the Bayes classifier achieved 79.5% on sentiment140. Since the recall value for the facial emotion recognition model is low, the ability to predict or capture negative emotions is greater than the ability to predict positive ones, which tends to make the overall emotion negative every time.
Table 3 Model performance for face emotion classification

Performance metric     Value
Training accuracy      87.76%
Validation accuracy    87.58%
Training loss          1.42
Validation loss        1.44
Precision              74.83%
Recall                 21.63%
F1-score               33.55%

Table 4 Model performance for text emotion classification

Performance metric     Value (%)
Accuracy               79.92
Precision              80.0
Recall                 79.5
F1-score               80.0
To balance this out, more weight should be given to positive emotions. However, in the case of text sentiments, both positive and negative sentiments are detected almost equally well, which would lead to an overall neutral outcome if the numbers of detected sentiments were equal and the same weight were assigned to both the positive and negative cases. Since the motive of the work is to find the negative feelings of the user that are associated with depression, more weight is given to negative text sentiments (a weight of −0.95) and less weight to positive ones (a weight of +0.75), allowing negative feelings to be detected even at low levels so that the person can be kept from falling into a depressive state. The F1-score is calculated as the harmonic mean of precision and recall; the F1-scores obtained were 33.55% and 80% for the image and text data, respectively. The performance values are given in Tables 3 and 4.
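For reference, the metrics listed above can be computed directly from predicted and true labels; the sketch below uses scikit-learn, with placeholder label arrays rather than the paper's data.

```python
# Sketch of computing the evaluation metrics listed above with scikit-learn.
# y_true and y_pred are placeholder arrays, not the paper's data.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1-score :", f1_score(y_true, y_pred))
```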
5 Conclusion and Future Work

Maintaining mental stability is a crucial task in daily life, especially now that the coronavirus has affected people's lives and they are advised to stay at home. This work aims to help users boost their mental health during and after the COVID-19 era. In the proposed 'ViDepBot' system, the live video captures the facial emotions expressed by the user, and the negative ones are captured easily, which plays a vital role in determining the depression level. 'ViDepBot' would be able to detect depression more precisely when the accuracy of the deep learning
model VGG-16 with additional layers and the machine learning model are improved further. A chatbot is also deployed to take input from the user; the user's mood is further analyzed from the chat texts and recommendations are given based on it. Appropriate suggestions could improve the mood of the user. Most of the time, the user is sitting idle in front of electronic gadgets such as laptops or smartphones. By monitoring users' social media activities and the time spent in front of these gadgets, suggestions can be provided for maintaining good physical health and mental stability. For example, getting sunlight in the early morning or late evening can increase Vitamin D levels in the body. Furthermore, activities such as following the 20-20-20 vision principle and simple exercises like sit-ups, pull-ups, push-ups, squats, walking, etc. could be suggested. When the situation switches to an offline mode such as classrooms, the depression levels of multiple subject users can be found by recognizing each of them. ViDepBot could also be developed to keep track of each student's depression level while they are physically present in the classroom once the pandemic situation subsides. The future scope of the ViDepBot system is to consider these additional factors and merge them into the current work.
References

1. Wang X, Hegde S, Son C, Keller B, Smith A, Sasangohar F (2020) Investigating college students' mental health during the COVID-19 pandemic: an online survey study (preprint). J Med Internet Res 22. https://doi.org/10.2196/22817
2. Ilieva G, Yankova T, Klisarova-Belcheva S, Ivanova S (2021) Effects of COVID-19 pandemic on university students' learning. Information 12:163. https://doi.org/10.3390/info12040163
3. Cheng X, Wang X, Ouyang T, Feng Z (2020) Advances in emotion recognition: link to depressive disorder. Neurol Ment Disord. https://doi.org/10.5772/intechopen.92019
4. Kiran SR, Kumar P (2021) Real-time statistics and visualization of the impact of COVID-19 in India with future prediction using deep learning. Adv Intell Syst Comput 1393:717–731. https://doi.org/10.1007/978-981-16-2712-5_56
5. Vennela GS, Kumar P (2021) Covid-19 pandemic spread as growth factor using forecasting and SIR models. J Phys: Conf Ser 1767:012014. https://doi.org/10.1088/1742-6596/1767/1/012014
6. Vaishya R, Javaid M, Khan IH, Haleem A (2020) Artificial intelligence (AI) applications for COVID-19 pandemic. Diabetes Metab Syndr: Clin Res Rev 14:337–339. https://doi.org/10.1016/j.dsx.2020.04.012
7. Kalpana Chowdary M, Jude Hemanth D (2021) Deep learning approach for speech emotion recognition. Data Anal Manag 367–376. https://doi.org/10.1007/978-981-15-8335-3_29
8. Malathi M, Aloy Anuja Mary G, Senthil Kumar J, Sinthia P, Nalini M (2022) An estimation of PCA feature extraction in EEG-based emotion prediction with support vector machines. Proc Data Anal Manag 651–664. https://doi.org/10.1007/978-981-16-6289-8_53
9. Gautam KS, Thangavel SK (2019) Video analytics-based facial emotion recognition system for smart buildings. Int J Comput Appl 1–10. https://doi.org/10.1080/1206212x.2019.1642438
10. Arun Kumar K, Koushik M, Senthil Kumar T (2021) Human annotation and emotion recognition for counseling system with cloud environment using deep learning. In: Peter J, Fernandes S, Alavi A (eds) Intelligence in big data technologies-beyond the hype. Advances in intelligent systems and computing, vol 1167. Springer, Singapore. https://doi.org/10.1007/978-981-15-5285-4_3
11. Kalyan S, Ravishankar H, Arunkumar C (2021) Distress-level detection using deep learning and transfer learning methods. Smart Comput Tech Appl 225:407–414. https://doi.org/10.1007/978-981-16-0878-0_40
12. Namboodiri SP, Venkataraman D (2019) A computer vision based image processing system for depression detection among students for counseling. Indones J Electr Eng Comput Sci 14:503. https://doi.org/10.11591/ijeecs.v14.i1.pp503-512
13. Siam SC, Faisal A, Mahrab N, Haque AB, Suvon MdNI (2021) Automated student review system with computer vision and convolutional neural network. In: 2021 international conference on computing, communication, and intelligent systems (ICCCIS). https://doi.org/10.1109/icccis51004.2021.9397164
14. Goodfellow IJ et al (2013) Challenges in representation learning: a report on three machine learning contests. Neural Inf Process 8828:117–124. https://doi.org/10.1007/978-3-642-42051-1_16
15. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. https://arxiv.org/abs/1409.1556v6
16. Thakur R (2019) Step by step VGG16 implementation in Keras for beginners. Medium, 06 Aug 2019. https://towardsdatascience.com/step-by-step-vgg16-implementation-in-keras-for-beginners-a833c686ae6c
17. Zhang K, Zhang Z, Li Z, Qiao Y (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process Lett 23:1499–1503. https://doi.org/10.1109/lsp.2016.2603342
18. Go A, Bhayani R, Huang L (2009) Twitter sentiment classification using distant supervision. CS224N project report. Stanford 1(12):2009
19. Seelam T (2022) AI-chatbot-GUI-using-tkinter. GitHub. https://github.com/Trinadhreddy1184/AI-chatbot-GUI-using-tkinter
20. Melinda (2019) Coping with depression. HelpGuide.org. https://www.helpguide.org/articles/depression/coping-with-depression.htm
A Review on Prevalence of Worldwide COPD Situation

Akansha Singh, Nupur Prakash, and Anurag Jain
Abstract COPD is the third deadliest disease globally, causing millions of deaths every year. It has often been seen as a self-inflicted disease; hence, not much importance is given to it, despite being recognized on a global scale. Millions of people have COPD without even realizing it. The aims of this systematic review are twofold: (i) to determine the prevalence of COPD globally based on its association with other respiratory diseases, different causes, and income status, and (ii) to review the previous AI models designed for diagnosing COPD. A systematic review was conducted on articles published from 2020 to 2021. The key search terms include "DL in COPD," "COPD in LMICs," "air pollution," "respiratory disease," etc. The results showed that most COPD cases occur in air-polluted and low-middle-income countries (LMICs). Also, the overlap between COPD and other diseases with similar symptoms has caused the misdiagnosis of COPD. Although methods such as spirometry are available for diagnosing COPD, they are very expensive. The lack of diagnostic tools and radiology experts and the non-awareness of COPD in LMICs are also leading to an increase in COPD cases. In order to avoid the use of expensive tools for diagnosing COPD, there is a need for a non-invasive system that can predict COPD in an efficient and timely manner. However, only a few artificial intelligence (AI)-based studies have been conducted on the detection of COPD because of the lack of publicly available COPD data. Studies performed on respiratory diseases have used some popular open datasets, but these rarely contain any information on COPD. The majority of studies were based on ML rather than DL because of the small size of the data. As the data increases day by day, it would be better to use DL models, which can provide quick and robust results. Keywords COPD · Artificial Intelligence · Deep Learning · Air Pollution
A. Singh (B) · N. Prakash · A. Jain USCIT, GGSIPU, New Delhi, Delhi, India e-mail: [email protected] A. Jain e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_34
1 Introduction

COPD is a group of chronic lung diseases that cause airway inflammation and make it difficult for a person to breathe. According to a WHO report [1], it comes in the third spot on the list of the top ten deadliest diseases in the world. Despite getting recognition on a global scale and being ranked as the third deadliest disease globally, COPD is neglected in various parts of the world. One of the major reasons is that COPD is seen as a self-inflicted disease caused by smoking. However, there are also several other factors contributing to the development of COPD, such as air pollution. The Global Initiative for Chronic Obstructive Pulmonary Disease (GOLD) report has defined COPD as a preventable and treatable disease if it is detected at an early stage [2]. It cannot be cured completely, but the progression of symptoms can be reduced to a certain extent. Morbidity and mortality caused by COPD vary from country to country. In terms of income status, 90% of COPD deaths occur in low-middle-income countries (LMICs) [3]. Although the major cause of COPD is tobacco smoking, it can also be caused by various irritants like dust, chemical fumes, secondhand smoke, etc. There is also a genetic condition known as alpha-1 antitrypsin deficiency which can cause this condition. For early diagnosis, artificial intelligence can greatly help radiologists, especially secondary care clinicians who have little experience in identifying a respiratory disease. Artificial intelligence (AI) is a branch of computer science concerned with developing machines that can reason and act like a human. Machine learning (ML) and deep learning (DL) are subsets of artificial intelligence. ML uses statistical and mathematical models to generate algorithms that can help machines learn without explicit programming and improve themselves from experience. ML has been used in the healthcare system for a very long time. But in recent years, DL, especially the convolutional neural network (CNN), has shown tremendous growth in the field of image analysis, whether it is a CT scan, chest X-ray, or MRI. With these techniques available, it will be a lot easier to diagnose such chronic diseases, especially COPD, at an early stage. The objectives of this study are:

• to determine the prevalence of COPD globally based on its association with other respiratory diseases, causes, and income status
• to review the previous AI-based models designed for diagnosing COPD.

This review will help researchers to understand the importance and severity of COPD and to develop better models that can correctly predict COPD. The paper is further divided into the following sections: Sect. 2 describes the methods used to select the relevant parameters for the research. Section 3 describes COPD and other respiratory diseases. Section 4 discusses the techniques and datasets used in various studies in the literature. The limitations of the study are listed in Sect. 5. Section 6 concludes the paper.
2 Materials and Methods

Search strategy and selection criteria

A systematic search was conducted on the databases Google Scholar and PubMed. The keywords used for searching relevant published articles were "COPD," "respiratory disease," "air pollution," "prediction of COPD using ML," "DL in COPD," "ML in COPD," etc., combined with the Boolean operators "OR" and "AND". The search was refined by selecting the year of publication from 2020 to 2021. Studies that showed the relationship between COPD and other respiratory diseases, morbidity and mortality due to COPD, and COPD prevalence in countries with different air quality and income statuses were explored. Studies based on exposure to air pollution and smoking were considered. All papers published in languages other than English were excluded. Studies conducted on animals were also excluded, as were studies diagnosing COPD using methods other than DL and ML. Original research articles, editorials, letters, and observational research conducted on human participants were included in this review. It includes all studies conducted on humans of all age groups, on people living in low-middle-income countries (LMICs), high-middle-income countries (HMICs), low-income countries (LICs), and high-income countries (HICs), and on air-polluted cities. In this systematic review, apart from tobacco exposure, exposure to both indoor and outdoor pollution was also considered.

Study selection and data abstraction

All the studies from the initial search went through title/abstract screening and then full-text screening. Editorials, reviews, letters, and original articles were included. The full-text screening was then performed on the selected studies, and data were extracted from the studies that passed full-text screening. The following information was extracted from the screened papers: type of exposure (air pollution, smoke), study design (study based on detecting COPD using ML/DL methods), study setting (urban/rural areas), statistical approach, type of participants (all age groups, male and female), type of disease (COPD), study period (2020–2021), and health outcome (mortality and morbidity from COPD).

The tools used in the study

This study has used the QGIS 3.18 tool and the Miro tool [4]. The QGIS tool has been used for generating maps showing the income status of different countries and the quality of air in different countries. The data for the concentration of PM2.5 in different countries were collected from the World Air Quality Report 2020 [5]. The Miro tool is used to show the link between similar respiratory diseases through a Euler diagram.
3 COPD and Similar Diseases

COPD consists of two diseases: emphysema and chronic bronchitis. Chronic bronchitis is a chronic lung condition that mainly damages the airways, also known as bronchi. It destroys the cilia (tiny hairs) present inside the bronchi, which help in removing mucus. This condition leads to the accumulation of mucus on the lining of the lungs; the mucus irritates the lungs, and removing it causes coughing. Emphysema is a chronic lung condition that causes an irreversible enlargement of the alveoli and ruptures their walls; it also damages the septa between the air sacs. Normally, the air sacs are elastic. When a person breathes in, the air travels through the airway and enters the air sacs, causing them to fill with air. These air sacs then help oxygen diffuse into the bloodstream. When a person breathes out, the air sacs deflate. With this condition, it becomes difficult to breathe air in and out of the lungs as the air sacs lose their elasticity, and the accumulated mucus also makes it difficult to breathe. Some of the symptoms of COPD are shortness of breath, coughing up a lot of mucus, wheezing, and chest tightness. COPD is permanent and progressive; it stays over a lifetime. Although emphysema and chronic bronchitis come under the umbrella term "COPD," they are not synonymous. A person could have only emphysema or only chronic bronchitis. There are four categories of COPD, which are determined by the forced expiratory volume (FEV1), the amount of air the lungs displace during forced expiration. In the first stage, the forced expiratory volume is above 80%, and in the final stage, this value falls below 30%. The four stages defined in the GOLD study (2019) are shown in Table 1. There are some other respiratory diseases which are similar to COPD based on their symptoms. These diseases can be categorized based on the area of the lungs that they affect. They are classified into different categories, that is, lung diseases affecting (a) the air sacs, (b) the airways, and (c) the interstitium. The classification of these similar diseases is shown in Fig. 1.

Table 1 COPD stages defined by GOLD [2]; FEV1 stands for forced expiratory volume, the amount of air the lungs displace during forced expiration
COPD stages    Status         Value
GOLD 1         Mild           FEV1 ≥ 80%
GOLD 2         Moderate       50% ≤ FEV1 < 80%
GOLD 3         Severe         30% ≤ FEV1 < 50%
GOLD 4         Very severe    FEV1 < 30%
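The GOLD staging in Table 1 is a simple threshold rule on FEV1 (as a percentage of the predicted value); a minimal sketch of that rule is given below.

```python
# Threshold rule corresponding to the GOLD stages in Table 1 (FEV1 as % of predicted).
def gold_stage(fev1_percent):
    if fev1_percent >= 80:
        return "GOLD 1 (mild)"
    if fev1_percent >= 50:
        return "GOLD 2 (moderate)"
    if fev1_percent >= 30:
        return "GOLD 3 (severe)"
    return "GOLD 4 (very severe)"

print(gold_stage(65))  # -> GOLD 2 (moderate)
```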
Pneumonia, tuberculosis, and emphysema are generalized to the diseases which affect the air sacs (alveoli). Pneumonia fills the alveoli with pus, hindering the diffusion of oxygen. When a person breathes in the Mycobacterium tuberculosis bacteria, they settle in the lungs and multiply, eventually causing the person to cough up blood. Emphysema ruptures the air sacs, resulting in the formation of one big air pocket instead of several small air sacs. This damaged portion traps a large amount of air inside, which further reduces the lung surface area; this prevents oxygen from diffusing into the blood vessels, hence causing difficulty in breathing. Idiopathic pulmonary fibrosis (IPF) is a serious chronic lung condition that affects the interstitium. It damages the tissue lining the air sacs, or alveoli; the scarring does not allow the alveoli to expand as they should, making it difficult to breathe. Asthma, bronchiectasis, and bronchitis are generalized to the diseases which affect the airways. Asthma is a chronic (long-term) lung disease that causes inflammation in the airways and makes them much narrower, leading to breathing
Fig. 1 Classification of respiratory diseases based on the area of the lungs that they affect
Fig. 2 Euler diagram showing the common and distinguishable symptoms among the similar respiratory diseases
difficulty. It is a chronic lung disease that causes the thickening of the bronchi and hence damages them.

Link between COPD and other similar respiratory diseases

This section describes the link between different respiratory diseases, based either on similarity of symptoms or on comorbidity, through a Euler diagram as shown in Fig. 2. Although every lung disease has its own symptoms, there are certain lung diseases whose symptoms might overlap. It also happens that, during the occurrence of a certain disease, some comorbidities might exist along with that disease; when two or more diseases exist at the same time in the body, they are known as comorbidities. For example, a person with COPD might also have pneumonia, bronchiectasis, or a different type of disease like coronary artery disease. COPD is one of the top 5 deadliest diseases in the world, and its detection at an early stage is very important. It might be the case that a person has pneumonia or another respiratory disease, but along with it some other comorbidity like COPD is also progressing in the patient. In such cases, it is necessary to diagnose this disease.
4 Analysis and Result

Prevalence of COPD worldwide based on income status

The World Bank [6] defines the income classification of countries in four categories: low income, low-middle income, high-middle income, and high income. This income classification among countries is shown in Fig. 3. In terms of a country's income status, it is estimated that 90% of COPD deaths occur in LMICs. One of the reasons is that in most high-income countries (HICs) the air quality is quite good, and the healthcare infrastructure is impressive compared to that of LMICs. WHO has estimated the number of COPD deaths in low-income countries (LICs), LMICs, high-middle-income countries (HMICs), and high-income countries (HICs). According to WHO, LICs have very few COPD deaths; the top 10 deadliest diseases in LICs do not include COPD. However, COPD is one of the top 5 diseases in LMICs, HMICs, and HICs. One of the main reasons is that most of the specialists with expertise in performing spirometry and pulmonary function tests are available only in urban areas. In LMICs, nearly 3 billion people rely on biomass fuel, especially people in rural areas.
Fig. 3 Estimation of income status worldwide using QGIS
In LMICs, 25–45% of COPD patients are non-smokers. The therapy for COPD is inhaler-based, which is neither available nor affordable in LMICs. There should be proper guidelines on the prevention and treatment of COPD.

Prevalence of COPD worldwide based on air pollution status

Air pollution is one of the major health issues affecting people all over the world. Globally, 7 million people die every year from air pollution, of which 3.8 million deaths are because of indoor pollution and 4.2 million because of ambient air pollution [7]. The air quality status of different countries is shown in Fig. 4. Of the total 7 million deaths, 19% are associated with COPD. COPD prevalence, mortality, morbidity, and hospital admissions vary across different countries. According to WHO, the air quality of Ghana is quite unsafe as it does not meet the WHO air quality guidelines; the concentration of PM2.5 is 49.47 µg/m3. According to China Pulmonary Health (CPH), the prevalence of COPD in individuals over 40 years old was 13.7%. In the Chengdu region of China [8], the major air pollutants causing more COPD admissions are PM2.5 and PM10 in both long-term and short-term exposure. It is observed that short-term exposure to PM10, SO2, and
Fig. 4 Air quality status worldwide based on WHO defined PM2.5 levels using QGIS
NO2 causes a high mortality rate from COPD. The daily average concentrations of PM2.5, PM10, SO2, NO2, and O3 were 59.03, 90.48, 12.91, 48.84, and 91.77 µg/m3, respectively. In America [9], the prevalence of COPD is relatively high because of the rising amounts of major COPD-causing pollutants in certain areas. It was estimated that in 2020, 37,310,657 people had COPD. The study predicted that by 2050 this number would rise to 65,524,526, which is almost a 75.6% increase in COPD patients. It was also estimated that, globally, 10% of adults over age 40 have COPD. India has the most cases of COPD in the world, and it ranks second in COPD deaths. PM2.5 is considered the most dangerous among all the pollutants, and its concentration is extremely high in most cities. More precisely, the concentration of PM2.5 is highest in the Indo-Gangetic plains [10], consisting of Bihar, Delhi, Haryana, Uttar Pradesh, and West Bengal. Of the total deaths that occurred in India in 2019, the majority of the deaths were due to COPD (32.5%).

AI-based models used in the detection and diagnosis of COPD

AI plays an important role in helping clinicians in the prompt decision-making and management process. The applications of AI, that is, ML and DL, have been used for a long time in the field of respiratory disease. Different researchers have provided different models for the diagnosis of various respiratory diseases. Most of the work has been done on the diagnosis of pneumonia, asthma, and tuberculosis, but there have been very few studies on COPD compared to the other diseases. One of the reasons might be the absence of data available for making a prediction or diagnosis of COPD. Most of the studies which have used AI models for the prediction of COPD have used private data collected from private hospitals in collaboration with them; such data is not available publicly, hence limiting further research on COPD. Some of the popular datasets used in the respiratory field are shown in Table 2.

Table 2 Some of the popular respiratory disease datasets used in previous studies
Paper Id | Dataset | Year | Attribute
11 | MIMIC CXR | 2019 | 377,110 images
12 | PADCHEST | 2019 | 160,868 images, 174 different radiographic findings
13 | CheXpert | 2019 | 224,316 chest radiographs, 14 classes
14 | NIH Chest X-Ray | 2017 | 100,000 de-identified images
15 | Exasens dataset | 2020 | Contains demographic information of 4 groups of saliva samples; 399 instances, 4 attributes
Table 3 Studies on diagnosing various aspects of COPD using ML/DL techniques

Paper Id | Task | Number of subjects/participants | Type of data | ML/DL technique used | Performance measures
16 | Prediction of the GOLD stage in patients hospitalized with COPD exacerbations | 155 patients | Blood neutrophils and demographic parameters | Support vector machines (SVM) | Accuracy—90.24%; ROC—0.84
17 | Classification of normal lungs and COPD | 6749 samples | Chest radiograph samples | CNN (pre-trained with ImageNet), also compared with an NLP model | AUC—0.814
18 | Diagnosis of acute exacerbations in COPD patients (AE-COPD) | 28 patients | Cough sounds | Alert system and its comparison with a questionnaire | Questionnaire false rate—0.101; alert system false rate—0.012 (very low, making it a clinically relevant tool)
19 | Normal and COPD patients | – | Serum metabolic biomarkers | Least squares support vector machines (LS-SVM) | Accuracies—80.77% (linear) and 84.62% (polynomial)
20 | Predicting COPD exacerbations | 9 patients | Oxygen saturation, pulse rate, and blood pressure | Comparison of one- and two-layer probabilistic models (9 classifiers were also used) | AUC increased by a mean value of 0.11 with the two-layer probabilistic model
21 | Early detection of exacerbations in COPD patients in the upcoming 7 days | 67 patients | Wearable device data collected from a tech-sensing device (EDIMAX tool) | Classifiers—RF, decision trees, KNN, LDA, adaptive boosting, DNN | Accuracy—92.1%; sensitivity—94%; specificity—90.4%; AUC—0.9
22 | Detection of COPD | 128 patients | Respiratory sound data | CNN, Librosa ML library features such as MFCC, MEL-spectrogram, Chroma CENS | Accuracy—93%
23 | Classification of COPD subjects | 596 | Parenchymal functional variables of functional small airway disease (fSAD%) and emphysema percentage (Emph%) | 3D-CNN and parametric response mapping (PRM) | Accuracy—89.3%; sensitivity—88.3%
24 | Classification of COPD and pneumonia | 920 recordings | Lung sound recordings | Quadratic discriminant classifier | AUC—0.997
25 | Identification of structural phenotypes of COPD | 8980 | Data points from expiratory flow-volume curves | Neural network | AUC—predominant emphysema/airway phenotype (0.80); predominant emphysema/small airway phenotype (0.91)
AI has many applications in COPD, such as detecting the stage of COPD, predicting upcoming exacerbations in COPD patients, and classifying COPD from emphysema, chronic bronchitis, etc. Table 3 shows the studies that diagnosed COPD or its aspects using either ML or DL techniques. It can be seen from Table 3 that most of the papers have diagnosed COPD using various ML and statistical models; only a few papers have diagnosed COPD using DL. The reason for this could be the small size of the available datasets. Deep learning models such as the convolutional neural network (CNN) and deep neural network (DNN) give better performance when the training sample size is very large, which is generally not the case with the publicly available datasets.

Challenges encountered in predicting COPD using AI-based models

• Most of the studies were based on a particular type of data, either structured (electronic medical records) or unstructured (images, sound). None of the studies has used heterogeneous data.
• Most of the datasets used in the literature are region-specific. Results from such studies cannot be generalized to different regions.
• AI models developed for detecting diseases are not robust yet. There is always a risk of negative consequences.
• The sizes of most of the datasets are very small. The more training data that can be given to the models, the better the performance will be.
• A correct and precise definition of COPD has not yet been agreed upon, which results in the under- or over-diagnosis of COPD.
5 Discussion

The primary objective of this study was to assess the association between COPD and other respiratory diseases, to find the prevalence of COPD in air-polluted countries based on the country's income status, and to analyze the previous AI-based models designed for diagnosing COPD. This review showed the prevalence of COPD in countries with different income levels and in air-polluted countries. From the various studies considered in this review, it was estimated that most COPD deaths occur in LMICs. There is not enough equipment available for the diagnosis of COPD, and even where equipment is available, expert doctors are often not there to correctly diagnose COPD. One of the reasons is that in LMICs the effect of indoor pollution is greater compared to HMICs and HICs; most cases occur due to exposure to biomass fuel, which is heavily used in rural areas of LMICs. The other reason is the growing population: with the increased population, AQI levels are also rising. In Asian and African countries, the majority of people use these domestic fuels for cooking and other domestic purposes. This affects 2.45 billion people in developing countries and contributes to 7.7% of global mortality, of which more than 33% of deaths are from COPD. As discussed in Sect. 3, there are some serious lung diseases that contribute majorly to deaths worldwide. It can be seen from the discussion that almost all of these diseases show symptoms years after settling in the lungs. It is also possible for two or more lung diseases to coexist at the same time. In such cases, rather than waiting for the symptoms to show up, it would be better if such diseases were detected at the earliest. Section 3 also describes the link between some of these diseases. Because of similar symptoms, one disease could easily be mistaken for another. Hence, along with an early diagnosis, it is also crucial to compare such similar yet different diseases. Just by looking at the symptoms, it is not easy to tell whether a person has COPD, emphysema, IPF, or asthma. Hence, other medical
procedures are needed to correctly diagnose the disease. However, this process sometimes takes a lot of time, as it requires conducting different tests to confirm a particular disease. Hence, there is a need for a single system that can easily identify the type of lung disease. The chest X-ray is one of the inexpensive methods which can be used to examine the disease. Other tests available for diagnosing COPD are spirometry and the pulmonary function test; however, these require high expertise, and COPD sometimes remains underdiagnosed during the tests. Also, spirometers are expensive medical tools which are not available in most clinics. In such cases, the diagnosis is simply made on the basis of symptoms and demographic information. Hence, there is a need for an AI-based system which can eliminate the necessity of using expensive tools and accurately diagnose COPD. AI has been increasingly used in the medical field and has shown great performance in many medical applications, such as the prognosis and diagnosis of coronary artery disease (CAD), classifying skin lesions, detection of diabetes, etc. As discussed in Sect. 4, there are only a few datasets available for the detection of respiratory diseases, especially COPD. As mentioned above, COPD is a disease which shows symptoms later in life; hence, most of the time it has been declared as asthma or bronchiectasis. That is why there is not much data available on COPD, and even where it exists, it does not get recorded, especially in developing countries. There were not many studies available on COPD, but after the availability of public datasets such as MIMIC and NIH Chest X-ray in recent years, a lot of COPD work can now be seen. However, those studies also have various challenges, as mentioned in Sect. 4. COPD studies based on AI models should use both structured and unstructured data. Most of the time, unstructured data such as clinical notes, which might contain important information, are left out of such studies. The data should be collected on a large scale because the larger the data, the better the performance of the model. Although GOLD has defined COPD, the definition only lists the symptoms of COPD and cannot be used to differentiate it from other similar diseases such as asthma. Hence, there is a need for a proper definition and guidelines regarding COPD so that a correct diagnosis can be made. The models designed for predicting disease should be implemented on real-world platforms so that their efficiency and robustness can be proved. These AI systems should be developed in such a way that the diagnosis of heterogeneous diseases such as asthma and COPD can be done accurately. Such AI systems would be of great help, especially in rural areas or in secondary care clinics, where there is a shortage of experienced doctors and equipment. These systems can provide an early, timely, and accurate diagnosis, which can further reduce the risk of death.
6 Conclusion

This study has assessed the severity of COPD and how neglect of it and non-awareness among people can cost millions of lives. The severity of COPD has been assessed based on evidence collected from various parts of the world, especially air-polluted countries and LMICs. Asian countries are the ones with most of the COPD cases. This study has also shed light on the link between COPD and other respiratory diseases. It also provides information on how this disease can be mistaken for others and how this could lead to either the underdiagnosis or overdiagnosis of COPD, which can have lethal consequences. This study also assessed previous COPD studies based on AI models and concluded that there is not much data available on COPD, as most of the time it is misdiagnosed as asthma or bronchiectasis. One of the reasons for this misdiagnosis is the non-availability of a correct definition of COPD which can help in making a correct diagnosis. Also, the available datasets are of small size, which greatly affects the efficiency and robustness of the AI models. Hence, there is a need for proper guidelines in every country and a system that can provide an early and accurate diagnosis of COPD, even in underdeveloped areas. In the future, we aim to review in detail the various ML and DL models used for diagnosing COPD. We also aim to develop a novel method that can detect COPD in an accurate and timely manner.
References

1. The burden of COPD, WHO. https://www.who.int/respiratory/copd/burden/en/. Accessed 26th Feb 2022
2. Global Initiative for obstructive pulmonary disease report (2020). https://goldcopd.org/gold-reports/. Accessed 7th Mar 2022
3. Air pollution fact-sheets WHO. https://www.who.int/news-room/fact-sheets/detail/chronic-obstructive-pulmonary-disease-(copd). Accessed 7th Mar 2022
4. Miro Tool. https://miro.com/app/board/uXjVO8Wqjtg=/. Accessed 18th Mar 2022
5. World Air Quality Report. https://www.iqair.com/world-air-quality-report. Accessed 6th Mar 2022
6. World Bank Report (2020). https://www.worldbank.org/en/country/mic/overview. Accessed 8th Mar 2022
7. Air pollution, WHO. https://www.who.int/health-topics/air-pollution#tab=tab_1. Accessed 26th Feb 2022
8. Zhang Y, Wang Z, Cao Y, Zhang L, Wang G, Dong F, Deng R, Guo B, Zeng L, Wang P, Dai R (2021) The effect of consecutive ambient air pollution on the hospital admission from chronic obstructive pulmonary disease in the Chengdu region, China. Air Qual Atmos Health 18:1–3
9. Tellez D, Gondalia R, Barrett M, Benjafield A, Nunez CM, Malhotra A (2021) An estimate of the Americas' prevalence of chronic obstructive pulmonary disease in 2050. In: TP41. TP041 Diagnosis and risk assessment in COPD, American Thoracic Society, pp A2274–A2274
10. Manojkumar N, Srimuruganandam B (2021) Health benefits of achieving fine particulate matter standards in India–a nationwide assessment. Sci Total Environ 1(763):142999
Optimal Decision Making to Select the Best Suppliers Using Integrating AHP-TOPSIS

Zahra M. Nizar, Watheq H. Laith, and Ahmed K. Al-Najjar
Abstract Any productive or service organization's success is largely dependent on its ability to choose the best suppliers. The decision-making process affects the price, quality, and delivery time of the products produced by these organizations, allowing businesses to produce innovative, high-quality goods at reasonable prices and gain a competitive edge in the market. The problem addressed in this paper is that decision makers in most manufacturing or service institutions have relied solely on their own experience or on one factor, particularly cost, when comparing providers, which means ignoring the other criteria. The goal of this article is to assist decision makers in manufacturing and service organizations in selecting the best suppliers based on a variety of criteria. The best suppliers are identified using a three-stage process that integrates the analytic hierarchy process (AHP) with the Technique for Ordering Preferences by Similarity to the Ideal Solution (TOPSIS). The first stage is to choose the criteria that will be used to compare providers, and the second is to calculate the relative weights of these criteria using AHP. The third stage involves comparing the options and ranking them from best to worst using the TOPSIS technique, which makes use of the weights that were obtained using AHP. This approach is implemented at Thi Qar Oil Company, utilizing an Excel sheet for quick and precise calculations. Keywords Supplier selection · Multi-objective · Decision making · AHP · TOPSIS
Z. M. Nizar · W. H. Laith (B) University of Sumer, Thi-Qar, Rifai, Iraq e-mail: [email protected]
A. K. Al-Najjar Ministry of Higher Education and Scientific Research, Bagdad, Iraq
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_35

1 Introduction

In recent years, researchers have used multi-criteria decision making while choosing suppliers [1]. Finding the supplier with the best likelihood of continuously delivering a company's demand at an affordable price is the process's main
objective. This choice is reached after a careful comparison of suppliers according to a set of criteria [2]. However, the choice of the best suppliers requires consideration of other variables such as quality, durability, and delivery, in addition to the lowest price offered by suppliers. To assist in solving complicated problems with contradictory criteria in decision making, Thomas Saaty created a well-known approach called the analytic hierarchy process (AHP). AHP allows the creation of a hierarchy tree with goals, criteria, sub-criteria, and alternatives at various levels. Many businesses have used AHP to choose the best supplier in the past. AHP is a powerful support tool that is used in a variety of sectors, including manufacturing, layout design, and supplier selection. AHP, on the other hand, allows a maximum of nine items per level; as a result, AHP is unable to deal with very large problems. TOPSIS treats this shortcoming [3]. TOPSIS is another multi-criteria decision-making method used in a variety of sectors and decision-making processes. TOPSIS chooses the best solution among ideal and counter-ideal alternatives based on two distance functions: the best solution should be closest to the Positive Ideal Solution (PIS) and farthest from the Negative Ideal Solution (NIS). After determining the distance of each option from the PIS and NIS, a closeness coefficient is calculated for each, and the alternatives are ranked using the closeness values. In many problems, the combined use of MCDM approaches has yielded promising results with robust solutions [4]. The computation via AHP alone cannot utilize precise performance results; therefore, TOPSIS is utilized to assess supplier performance, while AHP is used to calculate the weights of the criteria. Pairwise comparisons are computationally demanding for decision makers [5]. The best decision-making processes have been developed through the efforts of many scholars. The proposed methodology is designed to make the most of MCDM techniques: AHP and TOPSIS, two distinct methods, are integrated to rank the solutions in accordance with the criteria. The TOPSIS method is used to rank the supplier options, while the AHP approach is utilized to form the hierarchy and determine the relative weights of the criteria. The main contribution of this paper addresses the fact that decision makers have made decisions based only on their own experience or on a single criterion when comparing providers, which implies that they have ignored other criteria that might have changed their minds and led them to pick a different option. The paper is organized as follows: the first section is an introduction to our work, while the second and third sections explain the techniques presented in this paper. The integration of these techniques is given in the fourth section. The fifth section introduces the case study for this paper, and important conclusions are presented in the sixth section.
2 Analytic Hierarchy Process (AHP)

Saaty invented AHP (1977 and 1994), which is an MCDM technique. AHP has attracted the interest of many academics due to the method's appealing mathematical features and the ease with which the essential input data may be obtained [6]. AHP is the best-known mathematical approach for structuring a multi-criteria choice, comparing criteria in a natural pairwise manner, and generating real or approximate total weights to help with decision making and the ranking of suitable supplier alternatives [7]. The analytic hierarchy process consists of three levels: the goal, the criteria, and the alternatives. The goal of the supplier selection problem is to choose the overall best supplier. Quality, pricing, service, and delivery are examples of criteria that might be employed. The alternatives are the various proposals provided by the suppliers [8]. The steps are as follows:

Step 1. Build a hierarchy for the decision, as described in Fig. 1 [9].

Step 2. Create the pairwise comparison matrix. The pairwise comparison matrix is defined as part of the problem structuring as follows [10].
Fig. 1 Generic hierarchic structure [9]
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nj} & \cdots & a_{nn} \end{bmatrix}$$

where $a_{ij} = w_i/w_j$; $i, j = 1, 2, \ldots, n$; $a_{ij} = 1/a_{ji}$; $a_{ij} = 1$ when $i = j$; n is the number of criteria to be evaluated; and $a_{ij}$ is the importance of the ith criterion relative to the jth criterion. The basic Saaty scale is mentioned in Table 1 as the most common form of grading [11].

Step 3. Focus on consistency leads to the eigenvalue formulation as follows:

$$\begin{pmatrix} w_1/w_1 & w_1/w_2 & \cdots & w_1/w_n \\ w_2/w_1 & w_2/w_2 & \cdots & w_2/w_n \\ \vdots & \vdots & \ddots & \vdots \\ w_n/w_1 & w_n/w_2 & \cdots & w_n/w_n \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} = n \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}$$
Step 4. Estimate the relative weights. The relative weights (W) of matrix A are obtained from the following equation [13]:

$$A \times W = \lambda_{\max} \times W$$

Table 1 Standard scale for making pairwise comparisons [12]
Importance | Definition | Description
1 | Equally important | Two parameters have equal importance
3 | Moderately important | One parameter is slightly preferred over another
5 | Strongly important | One parameter is strongly preferred over another
7 | Very strongly important | One parameter is very strongly preferred over another
9 | Extremely important | Evidence preferring one attribute is of the highest preference
(2, 4, 6, 8) | Intermediate values | Intermediate (halfway) weights between the above judgments
Table 2 Values of the random index (RI) [15]

n  | 1 | 2 | 3    | 4   | 5    | 6    | 7    | 8    | 9    | 10
RI | 0 | 0 | 0.58 | 0.9 | 1.12 | 1.24 | 1.32 | 1.41 | 1.45 | 1.49
where λmax is the largest eigenvalue of matrix A and W_i is the weight vector of the individual elements of the hierarchical structure.

Step 5. Compute the Consistency Ratio (CR). The Consistency Ratio (CR) must be less than 0.1 for the judgments of the decision makers to be accepted as consistent; otherwise, the decision makers repeat the pairwise comparison until the judgments become consistent [14]. The consistency ratio (CR) is calculated using

$$CR = \frac{CI}{RI}$$

where the random index (RI) is the random consistency index, whose value changes with the dimension as shown in Table 2, while the consistency index (CI) is calculated by the following equation:

$$CI = \frac{\lambda_{\max} - n}{n - 1}$$

Step 6. The final step is to find the relative weights for all alternatives and arrange the alternatives from best to worst [16].
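A minimal numerical sketch of Steps 2–5 is shown below, using NumPy's eigen-decomposition for the principal eigenvector and the RI values of Table 2; the 3 × 3 pairwise matrix is an illustrative example, not data from the paper.

```python
# Sketch of AHP Steps 2-5: weights from the principal eigenvector and consistency check.
# The 3x3 pairwise matrix here is an illustrative example, not data from the paper.
import numpy as np

RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.9, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def ahp_weights(A):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)                  # principal eigenvalue lambda_max
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                              # normalized weight vector
    lam_max = eigvals[k].real
    ci = (lam_max - n) / (n - 1)                 # consistency index
    cr = ci / RI[n] if RI[n] > 0 else 0.0        # consistency ratio, accept if < 0.1
    return w, cr

A = [[1, 2, 4],
     [1/2, 1, 2],
     [1/4, 1/2, 1]]
w, cr = ahp_weights(A)
print("weights:", np.round(w, 3), "CR:", round(cr, 3))
```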
3 TOPSIS Technique

Yoon and Hwang created TOPSIS, which is one of the main MCDA techniques [17]. This method is based on a straightforward premise: the optimum option must be geometrically closest to the positive ideal solution while being as far as possible from the negative ideal solution [18]. TOPSIS estimates the outcomes by comparing the Euclidean distances between the actual and proposed alternatives [19]. Let a standard decision problem consist of n criteria $C_1, \ldots, C_n$ and m alternatives $A_1, \ldots, A_m$. These evaluations form an $m \times n$ decision matrix $X = (X_{ij})$. Let $W = (w_j)$, $j = 1, \ldots, n$, be the vector of the criteria weights, where $\sum_{j=1}^{n} w_j = 1$ [20]. The TOPSIS algorithm consists of the following six steps:

Step 1: Build a standard decision matrix as shown in Table 3 [21].
Table 3 Standard decision matrix [21]

Alternatives | C1  | C2  | … | Cj  | … | Cn
A1           | X11 | X12 | … | X1j | … | X1n
A2           | X21 | X22 | … | X2j | … | X2n
⋮            | ⋮   | ⋮   |   | ⋮   |   | ⋮
Ai           | Xi1 | Xi2 | … | Xij | … | Xin
⋮            | ⋮   | ⋮   |   | ⋮   |   | ⋮
Am           | Xm1 | Xm2 | … | Xmj | … | Xmn
Wi           | W1  | W2  | … | Wj  | … | Wn
Step 2: Normalize the decision matrix using the following equation [22]:

$$r_{ij} = \frac{X_{ij}}{\sqrt{\sum_{i=1}^{m} X_{ij}^2}}, \quad i = 1, 2, \ldots, m; \; j = 1, 2, \ldots, n$$

where $r_{ij}$ is the normalized value, $X_{ij}$ the value of alternative i on criterion j, i the index of the alternatives, and j the index of the criteria.

Step 3: Create the weighted normalized decision matrix, $V_{ij} = W_j \cdot r_{ij}$, where $V_{ij}$ is the weighted normalized value and $W_j$ the criterion weight.

Step 4: Determine the positive ideal solution (A+) and the negative ideal solution (A−), where $V_j^+$ denotes the maximum values of $V_{ij}$ and $V_j^-$ denotes the minimum values, as shown below [23]:

$$A^+ = \{V_1^+, \ldots, V_n^+\} = \left\{ \left(\max_i v_{ij} \mid j \in I\right), \left(\min_i v_{ij} \mid j \in J\right) \right\}$$
$$A^- = \{V_1^-, \ldots, V_n^-\} = \left\{ \left(\min_i v_{ij} \mid j \in I\right), \left(\max_i v_{ij} \mid j \in J\right) \right\}$$

where I is associated with benefit criteria and J is associated with cost criteria.

Step 5: Calculate the Euclidean distances $D_i^+$ and $D_i^-$ as follows [24]:

$$D_i^+ = \sqrt{\sum_{j=1}^{n} \left(v_{ij} - v_j^+\right)^2}, \quad i = 1, 2, \ldots, m$$
$$D_i^- = \sqrt{\sum_{j=1}^{n} \left(v_{ij} - v_j^-\right)^2}, \quad i = 1, 2, \ldots, m$$

Step 6: Calculate the performance score as shown in the equation:

$$P_i = \frac{D_i^-}{D_i^- + D_i^+}$$

where $P_i$ is the performance score.
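A compact sketch of Steps 1–6 is given below for the case where all criteria are benefit criteria (higher is better); the decision matrix and weights are illustrative values, not data from the paper.

```python
# Sketch of TOPSIS Steps 1-6 for benefit-type criteria (higher is better).
# The decision matrix X and weights w are illustrative, not the paper's data.
import numpy as np

def topsis(X, w):
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    norm = np.sqrt((X ** 2).sum(axis=0))
    norm[norm == 0] = 1.0                        # avoid division by zero for constant columns
    R = X / norm                                 # Step 2: vector normalization
    V = R * w                                    # Step 3: weighted normalized matrix
    v_pos, v_neg = V.max(axis=0), V.min(axis=0)  # Step 4: ideal and negative-ideal solutions
    d_pos = np.sqrt(((V - v_pos) ** 2).sum(axis=1))   # Step 5: distances to the ideals
    d_neg = np.sqrt(((V - v_neg) ** 2).sum(axis=1))
    return d_neg / (d_neg + d_pos)               # Step 6: closeness coefficient P_i

X = [[7, 9, 9], [8, 7, 8], [9, 6, 8], [6, 7, 8]]
w = [0.5, 0.3, 0.2]
scores = topsis(X, w)
print("scores:", np.round(scores, 3), "best alternative index:", int(np.argmax(scores)))
```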
4 Decision Making Using Integrating AHP-TOPSIS

Scientific methods are now widely used in the decision-making process; rather than relying on the personal experience of decision makers alone, this experience is exploited within mathematical models, which help to make decisions quickly and accurately. Over the past decades, a number of researchers have studied multi-criteria decision making for the supplier selection process using multiple methods, for example AHP and TOPSIS [1, 5, 25–28]. Based on the above, a methodology has been proposed to integrate the two methods, AHP and TOPSIS, for supplier selection:

1. Determine the criteria used in the comparison process between suppliers through personal interviews with decision makers.
2. Determine the relative weights of these criteria using AHP.
3. Compare the alternatives and arrange them from best to worst using the TOPSIS technique, which uses the weights that were extracted using AHP.

This methodology is shown in Fig. 2.
5 Case Study

A general tender for the delivery of drilling fluids testing devices was announced by Thi Qar Oil Company, and five offers were received. The proposed methodology described in Fig. 2, which consists of three steps, is applied as follows:

Step 1. Determine the criteria used in the comparison process between suppliers through personal interviews with decision makers. The eight criteria are as follows:
1. Technical offer
2. The cost
3. The origin
4. Financial capacity
Fig. 2 Description of the integrated AHP-TOPSIS methodology
5. Specialized experience
6. Delivery period
7. After-sales services
8. The security

For ease of use, these criteria will be represented by the symbols (C1, C2, …, C8), respectively.

Step 2. Determine the relative weights of these criteria using AHP, based on the pairwise comparisons given in Table 4. Using the data in Table 4, applying AHP, and using an Excel sheet to obtain the results quickly, the weights of the criteria are as shown in Table 5. The technical offer criterion has the highest relative importance among the criteria when using AHP. The value of the Consistency Ratio (CR) is 0.03311; since this value is less than 0.1, the judgments of the decision makers can be accepted as consistent.

Step 3. Use the weights in Table 5 to compare the alternatives and rank them using the TOPSIS technique.
Table 4 Pairwise comparison for criteria

Criteria | C1  | C2  | C3  | C4  | C5  | C6  | C7  | C8
C1       | 1   | 2   | 2   | 2   | 3   | 5   | 5   | 5
C2       | 1/2 | 1   | 2   | 2   | 2   | 4   | 4   | 4
C3       | 1/2 | 1/2 | 1   | 1   | 2   | 3   | 3   | 3
C4       | 1/2 | 1/2 | 1   | 1   | 2   | 3   | 3   | 3
C5       | 1/3 | 1/2 | 1/2 | 1/2 | 1   | 2   | 2   | 2
C6       | 1/5 | 1/4 | 1/3 | 1/3 | 1/2 | 1   | 2   | 1
C7       | 1/5 | 1/4 | 1/3 | 1/3 | 1/2 | 1/2 | 1   | 2
C8       | 1/5 | 1/4 | 1/3 | 1/3 | 1/2 | 1   | 1/2 | 1
Table 5 Weights for criteria in Thi Qar Oil Company

No.  Criteria                 Weight
1    Technical offer          0.27
2    Cost                     0.20
3    Origin                   0.14
4    Financial capacity       0.14
5    Specialized experience   0.09
6    Delivery period          0.05
7    After-sales services     0.06
8    Security                 0.05
     Sum of weights           1
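The weights in Table 5 and the consistency ratio quoted in Step 2 can be cross-checked with a short NumPy sketch. The geometric-mean approximation of the principal eigenvector and the random index value used below are standard textbook choices rather than details taken from the paper, so the reproduced weights are only approximately equal to those in Table 5.

```python
import numpy as np

# Pairwise comparison matrix from Table 4 (rows/columns C1..C8)
A = np.array([
    [1,   2,   2,   2,   3,   5,   5,   5],
    [1/2, 1,   2,   2,   2,   4,   4,   4],
    [1/2, 1/2, 1,   1,   2,   3,   3,   3],
    [1/2, 1/2, 1,   1,   2,   3,   3,   3],
    [1/3, 1/2, 1/2, 1/2, 1,   2,   2,   2],
    [1/5, 1/4, 1/3, 1/3, 1/2, 1,   2,   1],
    [1/5, 1/4, 1/3, 1/3, 1/2, 1/2, 1,   2],
    [1/5, 1/4, 1/3, 1/3, 1/2, 1,   1/2, 1],
])

# Priority vector via the geometric-mean (approximate eigenvector) method
gm = A.prod(axis=1) ** (1.0 / A.shape[0])
weights = gm / gm.sum()

# Consistency check: lambda_max, consistency index and consistency ratio
n = A.shape[0]
lam_max = (A @ weights / weights).mean()
ci = (lam_max - n) / (n - 1)
ri = 1.41                      # Saaty's random index for n = 8 (values near 1.40 are also used)
cr = ci / ri

print(np.round(weights, 2))    # close to the weights reported in Table 5
print(round(cr, 3))            # should be well below the 0.1 acceptance threshold
```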
Their data with respect to these criteria, scored as described in Table 6, are given in Table 7.
Table 6 Description for criteria

No.  Criteria                 Description
1    Technical offer          Match = 1, not match = 0
2    Cost                     Much less than = 1, little less than = 0.75, match = 0.5, little more than = 0.25, much more than = 0
3    Origin                   Match = 1, not match = 0
4    Financial capacity       Strong = 1, medium = 0.5, weak or not mentioned in offer = 0
5    Specialized experience   High = 1, medium = 0.5, weak or not mentioned in offer = 0
6    Delivery period          Match = 1, not match = 0
7    After-sales services     Available = 1, not available = 0
8    Security                 Match = 1, not match = 0
Table 7 Data for suppliers

Alternatives  Technical offer  Cost  Origin  Financial capacity  Specialized experience  Delivery period  After-sales services  Security
A             1                1     0.5     0.5                 1                       0                1                     –
B             1                0.75  0.75    1                   1                       1                1                     1
C             0                0.25  0.25    0                   0.5                     1                1                     1
D             1                0.5   0.5     0.5                 1                       1                0                     1
E             0                0     0       1                   1                       0                0                     1
Weight        0.27             0.20  0.14    0.14                0.09                    0.05             0.06                  0.05
Table 8 Ranking of alternatives in Thi Qar Oil Company

Alternatives  Ranking
A             2
B             1
C             5
D             3
E             4
The data for the suppliers of drilling fluids testing devices are analyzed with respect to the eight criteria described in Tables 5 and 6; the resulting data are shown in Table 7. Applying the TOPSIS technique (implemented in an Excel sheet) to the data in Table 7 gives the ranking of alternatives shown in Table 8. From Table 8, the optimal supplier is supplier B, and the remaining suppliers are ranked as (A, D, E, C), respectively.
6 Conclusion and Future Work The real criteria and their relative importance for selecting the best suppliers were determined in a transparent and unbiased manner; eight criteria were approved (cost, specialized experience, technical offer, delivery period, origin, financial capacity, security, after-sales services). Using AHP, the technical offer criterion has the highest relative importance among the criteria; TOPSIS is then used for the supplier selection itself. Based on the proposed methodology, and using tools such as Microsoft Excel, supplier B is the optimal supplier for Thi Qar Oil Company. As future work, this paper can be extended by integrating fuzzy AHP-TOPSIS, because it operates in an uncertain environment. The limitation of this paper is that the comparisons established in the presented approach depend on the opinions of experts expressed as crisp numbers, i.e., in a deterministic environment.
References
1. Marzouk M, Sabbah M (2021) AHP-TOPSIS social sustainability approach for selecting supplier in construction supply chain. Cleaner Environ Syst 2:1–9
2. Tooranloo HS, Ayatollah AS, Iranpour A (2018) A model for supplier evaluation and selection based on integrated interval-valued intuitionistic fuzzy AHP-TOPSIS approach. Int J Math Oper Res 13(3):401–417
3. Chi HTX (2016) Supplier selection by using AHP-TOPSIS and goal programming—a case study in Casumina Rubber Company, Vietnam. In: MATEC Web of conferences, vol 68. EDP Sciences, pp 1–5
4. Salehi S, Amiri M, Ghahramani P, Abedini M (2018) A novel integrated AHP-TOPSIS model to deal with big data in group decision making. In: Proceedings of the international conference on industrial engineering and operations management, pp 1043–1053
5. Liu Y (2017) A decision tool for supplier selection that takes into account power and performance. Open University (United Kingdom)
6. Pathan TZ, Kazi MM, Kharghar NM (2015) Evolution of AHP in manufacturing industry 6(12):123–127
7. Vasina E (2014) Analyzing the process of supplier selection. The application of AHP method
8. Benyoucef L, Ding H, Xie X (2003) Supplier selection problem: selection criteria and methods. Diss. INRIA
9. Mu E, Pereyra-Rojas M (2017) Practical decision making using super decisions v3: an introduction to the analytic hierarchy process. Springer
10. Tavana M, Soltanifar M, Santos-Arteaga FJ (2021) Analytical hierarchy process: revolution and evolution. Ann Oper Res 1–29
11. Saaty TL (1980) The analytic hierarchy process. McGraw-Hill, New York
12. Fentahun TM, Bagyaraj M, Melesse MA (2021) Seismic hazard sensitivity assessment in the Ethiopian Rift, using an integrated approach of AHP and DInSAR methods. Egypt J Remote Sens Space Sci 735–744
13. Thomas LS, Vargas LG (2012) Models, methods, concepts and applications of the analytic hierarchy process, 2nd edn. Springer
14. Marimuthu G, Ramesh G (2016) Comparison among original AHP, ideal AHP and moderate AHP models. Int Res J Eng IT Sci Res 2(5):29–35
15. Kolios A, Mytilinou V, Lozano-Minguez E, Salonitis K (2016) A comparative study of multiple-criteria decision-making methods under stochastic inputs. Energies 9(7):415–422
16. Flynn JR (2016) A business process reengineering framework using the analytic hierarchy process to select a traceability technology for spare parts management in capital-intensive industries. Diss. Stellenbosch: Stellenbosch University, pp 1–200
17. Bera B et al (2021) Susceptibility of deforestation hotspots in terai-dooars belt of Himalayan foothills: a comparative analysis of VIKOR and TOPSIS models. J King Saud Univ Comput Inf Sci 1–13
18. Effatpanah SK et al (2022) Comparative analysis of five widely-used multi-criteria decision-making methods to evaluate clean energy technologies: a case study. Sustainability 14(3):1–33
19. Verdu FM, Bernabeu G (2016) Project finance and MCDM financial models: an application in renewable energy projects. Diss. Universitat Politècnica de València
20. Papathanasiou J, Ploskas N (2018) Multiple criteria decision aid: methods, examples and Python implementations, 136
21. Onder E, Sundus DAG (2013) Combining analytical hierarchy process and TOPSIS approaches for supplier selection in a cable company. J Bus Econ Finance 2(2):56–74
22. Al-Zubaidy SS, Al-Bayati EI (2019) Applying the TOPSIS approach for selecting the best small project. Wasit J Eng Sci 7(2):19–23
23. Kengpol A, Rontlaong P, Tuominen P (2013) A decision support system for selection of solar power plant locations by applying fuzzy AHP and TOPSIS: an empirical study, pp 470–481
24. Maghableh GM, Mistarihi MZ (2022) Applications of MCDM approach (ANP-TOPSIS) to evaluate supply chain solutions in the context of COVID-19. Heliyon 1–14
25. Bhutia PW, Phipon R (2012) Application of AHP and TOPSIS method for supplier selection problem. IOSR J Eng 2(10):43–50
26. Hanine M, Boutkhoum O, Tikniouine A, Agouti T (2016) Application of an integrated multicriteria decision making AHP-TOPSIS methodology for ETL software selection. Springerplus 5(1):1–17
27. Azimifard A, Moosavirad SH, Ariafar S (2018) Selecting sustainable supplier countries for Iran's steel industry at three levels by using AHP and TOPSIS methods. Resour Policy 57:1–15
28. Muhammad J, Rahmanasari D, Vicky J, Maulidiyah WA, Sutopo W, Yuniaristanto Y (2020) Pemilihan supplier biji plastik dengan metode analytical hierarchy process (AHP) and technique for order preference by similarity to ideal solution (TOPSIS) [Selection of plastic pellet suppliers using AHP and TOPSIS]. J INTECH Teknik Industri Univ Serang Raya 6(2):99–106
Drug Discovery Analysis Using Machine Learning Bioinformatics S. Prabha, S. Sasikumar, S. Surendra, P. Chennakeshava, and Y. Sai Mohan Reddy
Abstract Bioinformatics is defined as the application of tools, computation and analysis to capture bioactivity data and interpret biological data. To develop or discover a drug, the biological information of proteins, cells, RNA and DNA is required in order to analyze the functional behavior of compounds, their structures, and their physical and chemical properties. For example, in the process of discovering a new protein sequence, the known/existing sequences are used to compare the similarities or features of the newly discovered sequence, from which its functioning can be roughly inferred. Drugs are chemical substances that can change the way our body and mind work. Drug discovery aims to find compounds that are classified, based on threshold values, as active or inactive, and then analyzes drug-likeness properties of compounds such as absorption, distribution, metabolism and excretion (ADME). The molecular features and properties of every compound are then compared once the salts, impurities and organic acids are removed from the molecular structures. This comparison helps in preparing the dataset, with PubChem fingerprints as X (input variables) and pIC50 values as Y (output variables). In the proposed system, the decision tree regressor achieved good performance in its R-squared value and the lowest RMSE, and the time taken for its evaluation is also very low. These values indicate that the proposed dataset is fit for regression prediction based on the values achieved in this work. Finally, a scatter plot is built between the experimental and the predicted values of pIC50 to obtain the line of regression. In general, this application can be helpful for doctors and researchers in developing a drug.

Keywords Bioinformatics · Drug · Canonical smiles · Molecular features · Lipinski descriptors · pIC50 · PADEL descriptors · Machine learning · IC50 · Compounds

S. Prabha · S. Sasikumar · S. Surendra (B) · P. Chennakeshava · Y. S. M. Reddy
Department of ECE, Hindustan Institute of Technology and Science, Chennai, India
1 Introduction Bioinformatics is the study of the combination of computers and biology, with the analysis of biological information such as proteins, genes and cells. To develop or discover a drug, this biological information has to be interpreted and analyzed. When interpreting biological information, we first need to know the functioning of proteins and genes. Such functioning can be analyzed from biological structures, and those structures are obtained from raw sequence data, because the function of a protein, or of any biological entity, is determined by its structure.

Bioinformatics can also build mathematical models of processes such as DNA replication, DNA-to-RNA transcription and RNA-to-protein translation. With the help of these processes, a predictive model can be built by inferring the relationships between the components of complex biological systems. Biological systems consist of millions of different types of proteins, DNA and RNA, and bioinformatics helps explain how they interact to sustain life and how they coordinate with each other.

Bioinformatics does not act on living matter directly; rather, it creates a virtual image of a living cell. A living cell consists of proteins, DNA and RNA that work and coordinate with each other and exchange information. Virtual cell models can therefore be created with computer-based simulation: the known biological information, sequence information and structural information are fed into a computer to obtain a virtual cell simulation, using mathematical modeling and statistical analysis to derive the governing principles and to test hypotheses.

To analyze compounds and compound structures (canonical SMILES), their drug-likeness properties are essential. PaDEL descriptors are used to calculate the molecular features of compounds. Lipinski descriptors describe the global features of a molecule and provide a molecular description, whereas PaDEL descriptors use the local features of a molecule, which define its building blocks. As each molecule comprises several building blocks connected together, compounds with similar building blocks share similar properties and structures, which is the basis of drug discovery analysis with respect to biological activity. Generally, the connectivity of different building
blocks gives rise to the unique structure of the molecule and to its physical and chemical properties, which can inform the hypothesis-driven design of the drug.
2 Related Works There is immense progress in the field of machine learning, artificial intelligence and bioinformatics in the analysis of biological science by using the mathematical models and statistical tools to solve the real time problems and computational approaches [1]. With the evolving scenario of AI and big data, the discovery of the drug helps in calculating the molecular features and its properties from the database of various compounds that helps in building the ML algorithms for various evolutions [2]. With the help of ML techniques, different phases are used in the discovery of drugs without having much computational complexity and performance [3]. Such phases include target identification (Identifying what to be achieve for further analysis), testing the compounds to analyze drug interactions, compounds get filtered until tested in cell system, trying to tested on animals like rats, clinical trials based on the animals deriving the last phase, trials done on human volunteers, FDA approval (final stage after the success in all the phases of trials) to use the drug in public. Based on the prediction methods of machine learning, typical steps play a role for the implementation such as data preprocessing, learning phase, final step involving the evaluation methods based on the performance [2]. Big data plays an important role in developing the opportunities of machine learning methods in order to handle the four Vs: Volume, Velocity, Variety and Veracity [4]. Volume presents many challenges with the ML algorithms in the case of preprocessing time and requirement of memory [5]. Variety includes different structures of data. Velocity includes the speed with which the dataset is to be processed, and veracity includes the reliability of the data. To find the activity of a new drug, the molecular features and PubChem fingerprints of various compounds are necessary in search of its physical and chemical properties based on the compound structure modeling [6]. Motive is to reduce the cost analysis of new drug preparation and ability to deal with the complex high-dimensional structure of various compounds [6]. The prediction of interactions between drugs and its targets [7] describes the data required for the task of drug interaction followed by usage of ML methods for prediction. The drug target interaction prediction uses the recent methods of algorithms based on matrix factorization in terms of efficiency, such that these methods also use both physical and chemical information for the DTI prediction. The challenges in the predictions of DTI are mainly classified into two categories, such as the challenge based on the
database and the concerning computational difficulties based on the problem definition. The implementation techniques using artificial intelligence and deep learning which helps during the analysis and development of the drug may include research, designing and manufacturing [8] in trying to find out the drug likeness with respect to the human body. The implementations of many novel techniques are more than sufficient to encounter the challenges in computational approaches [9]. Artificial intelligence, machine learning and deep learning are emerging with possible solutions to overcome the difficulties in drug discovery and development. Generally, in designing there are many complex steps like selecting the target, drug screening, validation, pre-clinical trials, etc. All these steps have massive challenges in identifying the medicine against a disease [10]. Hence, finally the AI answered all these questions in a simple and scientific manner by reducing the time consumption and cost. Such AI implementation motivates the health sector to increase data digitization to overcome the difficulties [11]. Many of the ML algorithms are used to develop the models for predicting the physical and biological characteristics of various compounds in the discovery of drugs [12]. Those algorithms include to find out the new use of drugs, drug target interactions, to maintain the efficiency and to optimize the molecules bioactivity [13]. The algorithm’s usage and techniques makes the drug discovery to be continued success and allows for the development [14] with the real applications of ML makes the development more intelligent, costeffective and time-efficient to boost the efficiency. Many of the classical methods are used for drug discovery and found to be having false negative results and also time-consuming [15]. Along with the identification of target drugs, repurposing of drugs provides new molecular targets which are used based on their properties and the prediction of new drugs in the field of bioinformatics.
3 Methods A blood sample is collected from the coronavirus affected person. In general, blood consists of RNAs, DNAs, proteins and some other biological data, such that our proposed work is completely about the analysis of protein functional behavior. So, the bioactivity data can be retrieved from it with the help of various entities present in it. The entities are amino acids, compounds, compound structures, atoms and molecules, etc. The bioactivity data which was retrieved in the functioning of protein analysis consists of different compounds with different CheMBL IDs, compound structures standard type (IC50), standard values, i.e., concentration values. Once the bioactivity data is collected, preprocessing techniques are applied on the retrieved data. Such techniques remove the null values based on the standard type of various compounds and remove the duplicates in the data.
Fig. 1 Block diagram to analyze the drug by various methods
In the preprocessing stage, our objective is to find out which compounds are active and which are inactive, and to label the compounds according to a bioactivity threshold. Compounds whose potency (IC50) values are less than 1000 nM (nanomolar) are considered active, compounds with values greater than 10,000 nM are considered inactive, and compounds with values between 1000 and 10,000 nM are considered intermediate. The unique features of the various compounds are then selected from the bioactivity data, and the further analysis is carried out in the stages shown in the block diagram of Fig. 1.
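A minimal sketch of this retrieval and labeling step is shown below. The paper only mentions ChEMBL IDs, so the use of the chembl_webresource_client package and the target ID in the call are assumptions made for illustration; the labeling thresholds are the ones stated above.

```python
import pandas as pd
from chembl_webresource_client.new_client import new_client

# Retrieve IC50 bioactivity records for a target protein.
# "CHEMBL3927" is a placeholder target ID used only for illustration.
activities = new_client.activity.filter(
    target_chembl_id="CHEMBL3927", standard_type="IC50"
).only(["molecule_chembl_id", "canonical_smiles", "standard_value"])

df = pd.DataFrame(activities)

# Preprocessing: drop missing standard values and duplicate compounds
df = df.dropna(subset=["standard_value", "canonical_smiles"])
df = df.drop_duplicates(subset="canonical_smiles")
df["standard_value"] = df["standard_value"].astype(float)

# Label the bioactivity class using the thresholds described above
def bioactivity_class(ic50_nm):
    if ic50_nm < 1000:       # potent compounds
        return "active"
    if ic50_nm > 10000:      # weak compounds
        return "inactive"
    return "intermediate"

df["class"] = df["standard_value"].apply(bioactivity_class)
print(df["class"].value_counts())
```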
3.1 Lipinski Descriptors Lipinski descriptors are used to describe the global features of a molecule for evaluating the drug-likeness of compounds. Drug-likeness is based on the absorption, distribution, metabolism and excretion (ADME) of compounds. The Lipinski descriptors are calculated from the rule stating that molecular weight < 500 Dalton, octanol–water partition coefficient (Log P) < 5, hydrogen bond acceptors < 10 and hydrogen bond donors < 5. Based on Lipinski's rule, these four parameters can be calculated for the different compound structures in the bioactivity data. Those four
parameters help to find whether the molecular compound absorbs into the body, distributes to the tissues and organs or excrete from the body. Hence, such properties help in the development of drugs for further analysis in studying their physical and chemical properties. Threshold values of the four parameters helps in calculating the lipinski descriptors. Combining all the above four parameters helps in finding the molecular features and their properties.
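The four descriptors can be computed directly from a canonical SMILES string; the sketch below uses RDKit for this, which is an assumption about tooling (the paper does not name the library), and the example molecule is only illustrative.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def lipinski_profile(smiles):
    """Return the four Lipinski descriptors for one canonical SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return {
        "MW":   Descriptors.MolWt(mol),       # molecular weight (Da)
        "LogP": Descriptors.MolLogP(mol),     # octanol-water partition coefficient
        "HBD":  Lipinski.NumHDonors(mol),     # hydrogen-bond donors
        "HBA":  Lipinski.NumHAcceptors(mol),  # hydrogen-bond acceptors
    }

def passes_rule_of_five(p):
    """Check the thresholds stated above (MW < 500, LogP < 5, HBD < 5, HBA < 10)."""
    return p["MW"] < 500 and p["LogP"] < 5 and p["HBD"] < 5 and p["HBA"] < 10

profile = lipinski_profile("CC(=O)Oc1ccccc1C(=O)O")   # aspirin, used only as an example
print(profile, passes_rule_of_five(profile))
```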
3.2 pIC50 IC50, the half-maximal inhibitory concentration, is the standard type used here; a low concentration is enough for these compounds to affect protein functioning. Ideally, the potency value is as low as possible: the lower the number, the better the potency of the drug. To distribute the IC50 data more uniformly, IC50 is converted to the negative logarithmic scale, −log10(IC50 in molar). Hence, with IC50 given in nM, pIC50 is obtained as

$$\text{pIC50} = -\log_{10}\left(\text{IC50[nM]} \times 10^{-9}\right)$$
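The conversion itself is a one-liner; a small sketch (function name and sample values are illustrative) is:

```python
import numpy as np

def ic50_nm_to_pic50(ic50_nm):
    """Convert IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    ic50_molar = np.asarray(ic50_nm, dtype=float) * 1e-9   # nM -> M
    return -np.log10(ic50_molar)

print(ic50_nm_to_pic50([1000, 100, 10]))   # -> [6. 7. 8.]
```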
3.3 Padel Descriptors To proceed with the analysis for drug development, the molecular features and their properties must be known. The PaDEL descriptors are applied after removing salts such as sodium and chloride from the canonical SMILES, i.e., after cleaning the chemical structures of impurities and small organic acids. After the removal of unwanted impurities, molecular features can be computed based on the physical and chemical properties of the compounds. The dataset is then prepared by comparing every compound against the PubChem fingerprints (binary substructure patterns). With a one-to-one comparison of all the PubChem fingerprints with each compound, a compound containing the substructure represented by a fingerprint is encoded as 1, and otherwise as 0. Based on this, the dataset is prepared.
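One way to compute PaDEL fingerprints from Python is through the padelpy wrapper; this choice of wrapper, the sample SMILES and the variable names are assumptions for illustration (the paper does not name a specific interface), and the exact fingerprint block produced depends on the PaDEL descriptor-types configuration.

```python
import pandas as pd
from padelpy import from_smiles   # thin Python wrapper around the PaDEL-Descriptor tool (requires Java)

# Example SMILES list; in the described workflow these are the cleaned canonical
# SMILES of the bioactivity data after salt/impurity removal.
smiles = ["CCO", "CC(=O)Oc1ccccc1C(=O)O"]

# Request fingerprints rather than the full continuous descriptor set
fps = from_smiles(smiles, fingerprints=True, descriptors=False)

# One row per compound, one 0/1 column per fingerprint bit
X = pd.DataFrame(fps).astype(int)
print(X.shape)
```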
3.4 Performance Analysis In the proposed method, the PubChem fingerprints are the X variables and the pIC50 values are the y variable. From the total of 881 PubChem fingerprints, the low-variance features have been removed. The model is built with two subsets, a training set and a testing set, with 70% of the data used for training and 30% for testing. The LazyRegressor from the lazypredict package is then used to build models with various algorithms and to record the R-squared, RMSE and time taken by each algorithm during evaluation. In our work, the decision tree regressor obtained good performance in its R-squared value
Fig. 2 Bar plot of r-squared values by various regression algorithms
and the lowest RMSE, and the time taken for its evaluation is also very low. Finally, a scatter plot is built of the experimental values against the predicted values of pIC50. These results indicate that the proposed data are fit for regression prediction based on the values achieved in our work.

Figure 2 gives a graphical comparison (bar plot) of the R-squared values of every algorithm in the model. In general, R-squared values lie in the range 0–1 and indicate how well the proposed dataset fits the regression prediction. Looking at the graph of R-squared values, the decision tree regressor obtained a better value than the other algorithms because it builds the regression in the form of a tree structure by breaking the dataset down into smaller and smaller subsets. Compared with the other models, the decision tree has further advantages in achieving a good score: it requires less effort for data preparation during preprocessing, it does not require normalization or scaling of the data, and it is fast, efficient and easy to understand, interpret and visualize. Other algorithms, such as the random forest regressor and Lars regression, sometimes produce unpredictable results due to slow training and overfitting problems; Lars regression is intended for high-dimensional data and is highly sensitive to noise. The gradient boosting regressor shows some variation in its results because it can optimize different loss functions and provides several hyperparameter tuning options that make the fitted function very flexible.

In general, R-squared is a statistical measure representing the proportion of variance of the dependent variable (the pIC50 values) explained by the independent variables (the PubChem fingerprints):

$$R^2 = 1 - \frac{\text{RSS}}{\text{TSS}}$$

where $R^2$ is the coefficient of determination, RSS is the residual sum of squares, and TSS is the total sum of squares.

Figure 3 shows that many of the algorithms obtained low RMSE values during the computation. Figure 4 shows the time taken by each algorithm; many of the algorithms, including the decision tree regressor, took very little time. The decision tree regressor is fast and efficient and can handle any type of data, whether numerical, categorical or Boolean.
Fig. 3 RMSE values comparison by various algorithms
Based on the type of data, the algorithms proceed with the computation to obtain the results. ElasticNetCV is the algorithm that took the most time in the computation because it combines lasso and ridge regression and, during the computation, it randomly and arbitrarily selects a few variables from different subgroups.
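The comparison of algorithms described above can be reproduced with a short sketch. The use of LazyRegressor is taken from the text; the variance threshold, random seed and the small synthetic stand-in for the fingerprint matrix and pIC50 values are assumptions for illustration (in the real workflow X and y come from Sects. 3.2 and 3.3).

```python
import numpy as np
from lazypredict.Supervised import LazyRegressor
from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import train_test_split

# Stand-in data so the sketch runs on its own; replace with the real
# PubChem fingerprint matrix (881 bits) and pIC50 vector.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 881)).astype(float)
y = rng.normal(5.0, 1.0, size=200)

# Drop near-constant fingerprint bits (low-variance features)
selector = VarianceThreshold(threshold=0.8 * (1 - 0.8))
X_reduced = selector.fit_transform(X)

# 70/30 train/test split as described in the text
X_train, X_test, y_train, y_test = train_test_split(
    X_reduced, y, test_size=0.3, random_state=42)

# Fit many regressors at once and collect R-squared, RMSE and time taken
reg = LazyRegressor(verbose=0, ignore_warnings=True, custom_metric=None)
models, predictions = reg.fit(X_train, X_test, y_train, y_test)

# 'models' is a DataFrame from which comparison plots such as Figs. 2-4 can be drawn
print(models.sort_values("R-Squared", ascending=False).head())
```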
Fig. 4 Time taken computation by various ML algorithms
3.5 Exploratory Data Analysis The range of pIC50 values of active compounds and inactive compounds is distributed in an efficient manner to proceed in the analysis of drug discovery by knowing and comparing each of those molecular features and their properties with the help of Fig. 5. pIC50 values are the potency values of different active and inactive compounds. From the Fig. 5, active compounds are having the pIC50 values in the range of 6–7.6 molar and for inactive compounds are having the range in between 1 and 5 molar. The following distribution graph helps in better understanding in studying the bioactivity class and their properties.
Fig. 5 pIC50 value analysis for active and inactive compounds from the drug
Figure 6 shows the line of regression between the experimental and predicted values of pIC50 which helps in knowing how far the dataset points are fitted for the line of regression and the behavior of the data. It is the graphical approach of a regression equation expressing the relationship between pIC50 and PubChem fingerprints. Fig. 6 Line of Regression b/w experimental and predicted pIC50 values
4 Conclusion and Discussion In the proposed drug discovery analysis, decision tree regression with r-squared value as 0.74 got better performance when compared to the other algorithms. Initially, the preprocessing is done on the bioactivity data of proteins to analyze the functional behavior and then finding the bioactivity class of each and every compound from the bioactivity data. Then, Lipinski descriptors are used to calculate the MW, solubility, hydrogen bond donors and acceptors in order to check the functioning of druglikeness properties as absorption, distribution, metabolism and excretion (ADME). Hence, finally applying the padel descriptors on the compound structures to remove the salts, organic acids and impurities in order to clean the chemical structure to calculate the physical and chemical properties of compounds. From the already existing methods, our work reduces the complexity of the project in each and every stage and improves the efficiency of various algorithms in terms of RMSE, R-squared values. In general, this application of approach can be helpful for doctors and researchers in developing a drug. Based on the analysis of the drug designing process, it helps further to develop the drug to get it into existence by making trials on animals like rats. Then, based on the successful attempts in the development, trials will be done on humans. Finally, we get the drug to be released to the outside world.
References
1. Quazi S (2021) Role of artificial intelligence and machine learning in bioinformatics: drug discovery and drug repurposing. https://doi.org/10.20944/preprints202105.0346.v1
2. Tripathi MK, Nath A (2021) Evolving scenarios of big data and artificial intelligence in drug discovery. Mol Diversity 1–23
3. Manne R (2021) Machine learning techniques in drug discovery and development. Int J Appl Res IJAR 21–28
4. Kitchin R, McArdle G (2016) What makes big data, big data? Exploring the ontological characteristics of 26 datasets. Big Data Soc. https://doi.org/10.1177/2053951716631130
5. Bhadani A, Jothimani D (2017) Big data: challenges, opportunities and realities. CoRR abs/1705.0
6. Carracedo-Reboredo P, Liñares-Blanco J (2021) A review on machine learning approaches and trends in drug discovery. Comput Struct Biotechnol J 4538–4558
7. Bagherian M, Najarian K (2021) Machine learning approaches and databases for prediction of drug-target interaction. Oxford, Department of Computational Medicine and Bioinformatics, 247–269
8. Gupta R, Srivastava D (2021) Artificial intelligence to deep learning: machine intelligence approach for drug discovery. Mol Diversity 1315–1360
9. Duch W, Swaminathan K, Meller J (2007) Artificial intelligence approaches for rational drug design and discovery. Curr Pharm Des. https://doi.org/10.2174/13816120778076595
10. Zhang L, Tan J, Han D, Zhu H (2017) From machine learning to deep learning: progress in machine intelligence for rational drug discovery. Drug Discov Today. https://doi.org/10.1016/j.drudis.2017.08.010
11. Jordan AM (2018) Artificial intelligence in drug design—the storm before the calm? ACS Med Chem Lett. https://doi.org/10.1021/acsmedchemlett.8b00500
12. Lo YC, Rensi SE, Torng W, Altman RB (2018) Machine learning in chemoinformatics and drug discovery. Drug Discov Today 23:1538–1546
13. Rifaioğlu AS, Atas H, Martin MJ, Cetin-Atalay R, Atalay V, Doğan T (2019) Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. Brief Bioinform. Int J Appl Res IJAR 1878–1912
14. Patel L, Shukla T, Huang X, Ussery DW (2020) Machine learning methods in drug discovery. Molecules (MDPI). https://doi.org/10.3390/molecules25225277
15. Chen SH, Jakeman AJ, Norton JP (2008) Artificial intelligence techniques: an introduction to their use for modeling environmental systems. Math Comput Simul 78(2–3):379–400
Effect of GloVe, Word2Vec and FastText Embedding on English and Hindi Neural Machine Translation Systems Sitender, Sangeeta, N. Sudha Sushma, and Saksham Kumar Sharma
Abstract One of the most useful applications of natural language processing is language translation. With the invention of machine translation, users who are comfortable in their local languages can understand, study, or search for content produced in any language of the world. Neural machine translation (NMT) is a machine translation approach that was developed recently. NMT can benefit from the use of dense word representations, known as word embeddings. In this paper we study the effect of three pre-trained word embeddings, GloVe, Word2Vec and FastText (for the languages English and Hindi), on English and Hindi neural machine translation systems. The performance of the models has been evaluated using the BLEU metric. In the end, the proposed models are also compared with Google Translate. Keywords Natural language processing · Neural machine translation · Word embeddings · GloVe · Word2Vec · FastText · Dense distributed representations
Sitender (B) · S. K. Sharma
Information Technology, Maharaja Surajmal Institute of Technology, Delhi, India
Sangeeta · N. S. Sushma
Computer Science and Engineering, Maharaja Surajmal Institute of Technology, Delhi, India

1 Introduction Automated translation is known as machine translation (MT). It is a way of transforming a document from one language to another language using computer software. The original language's essence must be fully recreated in the target language. While it appears simple on the surface, it is far more complicated under the hood. Translation is more than just a word-for-word replacement: a translator must be able to evaluate and analyze all of the text's aspects, as well as understand how each word influences the others. This necessitates considerable knowledge of the source and target languages' grammar, syntax (sentence construction), semantics (meanings), etc.
There is a present demand for English to Hindi (or the other way around) translators because many brands aspire to join the Indian market due to its commercial viability. Introducing English to Hindi translators in the market is the greatest strategy to capture the attention of the Indian audience and increase sales. Hindi is among India’s primary languages, alongside English. More than 400 million people speak Hindi, which includes diverse dialects and pronunciation variants. All industries employ English to Hindi translation. It is required for all types of communication, including books and client communication. This even helps native Indian language speakers to interact with the technologies being used around the globe. For example, a person who knows Hindi can easily read an English article because of automatic language translation. Approaches to Machine Translation (MT) fall into two types methodology wise: rule-based methods and corpus-based methods. Since the introduction of the concept of Machine Translation in the 1990s, rule-based techniques have been extensively put into practice by researchers. However, this technique was extremely labor intensive because it was difficult to write and maintain all of the rules [1]. Even though this technique was quite popular for almost a decade after its first demonstration, with the report of the Automatic Language Processing Advisory Committee (ALPAC) in 1966 [2] the boom came to an abrupt halt. The report, which was extremely suspicious of MT and resulted in a significant reduction in funding for MT research. Meanwhile, researchers worked to increase the quality of translations. Beginning in the 1970s, RBMT techniques got more refined. However, corpus-based approaches became popular after the availability of bilingual corpora in the 2000s. Statistical approaches (SMT), example-based approaches (EBMT) and neural network based approaches (NMT) are the three types of corpus-based MT approaches. For a long time SMT dominated the industry but then with the advancements in deep learning, researchers started exploring and applying neural networks to machine translation. In 2014, [3, 4] proposed the first neural translation models and coined the phrase “neural machine translation” [5]. In this research, we will concentrate on how neural machine translation systems perform with different kinds of word embeddings. Most, if not all, Natural Language Processing (NLP) activities require the representation of words and documents. These representations are what are called word embeddings. In general, it has been discovered that representing them as vectors is advantageous since they have an appealing, intuitive interpretation, may be the subject of valuable operations, and lend themselves well to numerous Machine Learning (ML) methods and strategies. Modern research on word embeddings is motivated by attempts to improve the efficiency and accuracy of language modeling [5].
2 Literature Review Word embeddings are learned text representations of words in which words with related meanings are represented similarly. An embedding can be visualized as a
dictionary that contains words and their corresponding n dimensional vectors that are mathematically supposed to cluster similar words together and thus, make a system understand their semantic meanings. Since each word is mapped to a single vector and the vector values are acquired in a manner that resembles a neural network, the technique is frequently grouped with deep learning. The main idea is to represent each word with a densely distributed representation. Therefore, each word is represented by a vector of numerous dimensions (50, 100, 200, etc.). Word embeddings were first introduced in 2003 [6]. We have seen a great deal of progress in this area since then. Mikolov et al. [7] first demonstrated how to train word vectors, inventing the Word2Vec model. They solved problems like Word Sense Disambiguation (WSD) with the models they introduced. GloVe [8] and FastText [9] improved on [7] results. In this study, we mainly focus on Word2Vec, GloVe and FastText word embeddings. Word2Vec. In W2V embeddings [7], the skip gram and CBOW architectures have been built using the gensim library [10]. Here unknown (out of vocabulary) words are those with a frequency of less than 2 in the total corpus. Lilleberg et al. [11] used Word2Vec for text classification. Ma and Zhang [12] clustered the similar words together and used the generated clusters to fit into a new data dimension so as to decrease the big text data dimensions. Xue et al. [13] used Word2Vec for sentiment classification. GloVe. They have been used for sentiment analysis [14]. Lauren et al. [15], Bawa et al. [16, 17] has made use of GloVe embedding for sequence labeling tasks. They have also found their applications in the field of biomedical natural language processing [18]. FastText. These word embeddings have been used for text classification [19]. Le et al. [20] used a combination of continuous FastText ngrams for reading DNA sequences.
3 Preliminaries 3.1 Word Embeddings Contextual information can be stored in a low-dimensional vector using word embeddings. Words having synonymous or interchangeable meanings usually are seen to appear in comparable contexts. Models learn the vectors using unsupervised learning on a large amount of data corpuses available online, such as Wikipedia, articles, news stories. Similarity between words can be calculated using two formulas: cosine similarity and Euclidean distance. The cosine similarity is given in (1):

$$\text{Similarity} = \cos(\theta) = \frac{\alpha \cdot \beta}{\|\alpha\|\,\|\beta\|} = \frac{\sum_{s=1}^{T} \alpha_s \beta_s}{\sqrt{\sum_{s=1}^{T} \alpha_s^2}\,\sqrt{\sum_{s=1}^{T} \beta_s^2}} \tag{1}$$
The cosine similarity is between −1 and 1. Through a model, two vectors having a larger similarity are likely to provide similar results. We get word embeddings with techniques like GloVe, FastText, and Word2Vec, which produce more accurate results than models fed with One-Hot technique. Word2Vec first showed that a shallower and more accurate model can be trained on significantly more data. The hidden layer, which was present in prior models, is removed, resulting in increased efficiency. The Continuous Bag-of-Words (CBOW) and skip gram variants of Word2Vec were proposed (SG). CBOW tries to predict one target central word by looking at its surrounding words, whereas the latter does the opposite: it predicts the entire group of surrounding words (context) by looking at the central word. Their efficient training is ensured by the presence of one hidden layer in their model infrastructure. Stanford developed GloVe [8], which was an abbreviation for Global Vectors for Word Representation. Its goal is to reconcile models that predict words with word statistics across the whole corpus. They propose a model that considers the corpus’s co-occurrence statistics as well as the effectiveness of prediction-based methods. Facebook developed Fasttext [9]. It’s a technique for learning word representation that uses the Word2Vec skip gram model but also enhances its performance and efficiency.
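As a concrete illustration of how such vectors are obtained and how they can later be injected into a translation model, the sketch below trains Word2Vec and FastText vectors with Gensim on a toy corpus and packs them into an embedding matrix of the kind a Keras Embedding layer can be initialized with (as is done later in Sect. 4.2). The toy sentences, dimensions and variable names are illustrative assumptions, and the Keras/TensorFlow usage is one possible implementation rather than the paper's exact code.

```python
import numpy as np
import tensorflow as tf
from gensim.models import Word2Vec, FastText

# Toy tokenized corpus; in the paper the corpus is the Hindi Visual Genome captions
corpus = [["birds", "flying", "over", "water"],
          ["cat", "sitting", "on", "a", "laptop"]]

w2v = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1)   # skip-gram variant
ft = FastText(corpus, vector_size=100, window=5, min_count=1)

print(w2v.wv.similarity("birds", "cat"))    # cosine similarity between two word vectors

# Pack the pretrained vectors into a matrix indexed by a tokenizer-style word index
word_index = {w: i + 1 for i, w in enumerate(w2v.wv.index_to_key)}      # index 0 kept for padding
emb_matrix = np.zeros((len(word_index) + 1, 100))
for word, idx in word_index.items():
    emb_matrix[idx] = w2v.wv[word]

# Non-trainable Embedding layer initialized with the pretrained vectors
embedding_layer = tf.keras.layers.Embedding(
    input_dim=emb_matrix.shape[0], output_dim=100,
    embeddings_initializer=tf.keras.initializers.Constant(emb_matrix),
    trainable=False)
```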
3.2 Deep Neural Networks Deep neural networks (DNNs) are complex data processing systems that use advanced mathematical models [21]. Feed Forward Neural Networks. This considers each input feature to be distinct and unrelated to the others. The final layer for probability outputs is a common scenario. It’s worth noting that the multilayer perceptron (MLP) is a type of FFNN that has one or more hidden layers [21]. Recurrent Neural Networks. The previously discussed model (FFNN) has a significant flaw: it can only consider a limited amount of previous words when predicting the future word as FFNNs lack any kind of memory, this constraint is inherent in their design. The next word can only be predicted using the words that are supplied through the set number of input neurons. So as to keep track of the words in previous iterations also RNN came into the picture [22]. The data that traveled through the architecture is looped back on itself. For decision-making, each input is reliant on the prior one. For each layer in the network, equal weights and biases are assigned by RNN. As a result, all of the independent variables become dependent variables. The RNN’s loops ensure that the information in its memory is retained [23]. Long Short Term Memory. However RNNs faced something called the vanishing gradient descent problem. That means, RNNs were not capable of being able to remember information over long sequences. So to handle this problem, LSTMs were
introduced as a special kind of RNN [24, 25]. LSTM consists of three parts, the first part is called the forget gate. The forget gate decides whether the information which is coming from the earlier timestamp is to be forgotten or remembered. In the second part, which is called the input gate, the new information which is received from the input, the cell tries to learn it. In the third part which is called the output gate, the updated information is passed by the cell from the current timestamp to the next timestamp. In addition to these, LSTM also has hidden states and cell states. Encoder Decoder Model. The encoder decoder framework [26] handles sequential data, wherein the source sentence’s length might be different from the target sentence’s length. The encoder is in charge of converting the sentence into a fixedlength vector called c. While, decoder is in charge of translating c into the target sequence. The sequence-to-sequence model, or Seq2Seq for short, is another name for the encoder decoder model [21]. Attention Mechanism. While NMT’s promising performance has shown that it has a lot of potential for identifying relationships of words inside a sequence, in practice, it still suffers a significant performance loss when the source sequence gets too long. The original NMT Encoder’s fundamental flaw, when compared to other feature extractors, is that it has to compress one sentence into a fixed-length vector. When the input sentence becomes longer, the network’s performance degrades since the output of the last layer of the model is a fixed-length vector, which may be limited in its ability to represent the entire sentence and result in some information loss. This information loss frequently includes the long-range dependencies of words due to the limited length of the vector [27]. Attention Mechanism was developed to solve this problem. Specifically, the Attention Mechanism is a component that sits between the Encoder and the Decoder and can aid in the dynamic determination of word association (word alignment information). Human behavior in reading and translating text data inspired the application of the Attention Mechanism to NMT. People usually read a text several times in order to extract the reliance inside the sentence, which means that each word has a variable dependency weight [27].
4 Experimental Setup The proposed architecture is depicted in Fig. 1.
4.1 Dataset The dataset used for this piece of work is the Hindi Visual Genome Dataset. Hindi Visual Genome is a multimodal dataset including text and images that can be used for multimodal machine translation and research from English to Hindi. English snippets (captions) from Visual Genome are mechanically translated to Hindi. The total
Fig. 1 Proposed architecture
number of English-Hindi pairs were 28930. The word clouds of the two languages of the dataset are shown in Figs. 2 and 3.
4.2 Preprocessing Both the english and hindi datasets were thoroughly preprocessed and cleaned before being fed into the models. The english dataset was normalized and then cleaned using regular expressions, the punctuation, numerical values and other such unwanted char-
Fig. 2 Word cloud representation of English dataset Table 1 Word embedding sources Language Embedding Hindi Hindi Hindi English English English
GloVe Word2Vec FastText GloVe Word2Vec FastText
Source [28] [28], used CBOW [28] Gensim Gensim
acters were removed. The hindi dataset was cleaned and preprocessed using regular expressions and the Indic-nlp library. After cleaning both the datasets, (or ) and (or ) tokens were appended accordingly. Then the entire dataset was shuffled and divided into train and test sets. After this step, they were tokenized and padded. Now the data was ready to be fed into the models. Adding Word Embeddings. All the models have been tested with 3 word embeddings (GloVe, Word2Vec and FastText). All the embeddings for hindi language were extracted from [28]. For english, GloVe representations were taken from Stanford’s official website. Word2Vec and FastText were adapted from the standard Gensim models. The embedding dimension used was 100 for all the models. All these sources have been shown in a tabular format in Table 1.
440
Sitender et al.
Fig. 3 Word cloud representation of Hindi dataset
4.3 Model In this research work we have constructed two base models for each for the tasks of english to hindi and hindi to english machine translation. Model M 1 is a simple encoder decoder model and Model M 2 is an encoder decoder model with attention mechanism. They have been described in detail in the following paragraph. Then we make an English to Hindi translator and a Hindi to English translator copies out of those two baseline models. Now, each of these models are further divided into 3 subgroups, each containing a different type of word embedding. Hence we have 12 resulting models. A clearer pictorial representation of this and the nomenclature used for the models is shown in Fig. 4. Decipher the nomenclature as follows. For example, A Simple Encoder Decoder Model, converting English to Hindi using the GloVe embedding is named Model M 1.1.1. Model 1. Model M 1 is a simple encoder decoder with 1 layer of LSTM nodes. Two copies of this model were made. One of which (M 1.1) takes English as the input (source) language and Hindi as the output (target) language (see Fig. 5). The other one (M 1.2) takes Hindi as the source and English as the target (see Fig. 6). Then, three copies of each of those two models (M 1.1.1, M 1.1.2, M 1.1.3 and M 1.2.1,
Effect of GloVe, Word2Vec and FastText Embedding …
Fig. 4 Nomenclature of all the proposed models
Fig. 5 The architecture of model M 1.1
Sitender et al.
Fig. 6 The architecture of model M 1.2
Model 1.2.2, Model 1.2.3) were made and trained respectively with 3 different word embeddings, GloVe, Word2Vec and FastText. Model 2. Model 2 is an encoder decoder model with attention with 1024 LSTM nodes each in encoder and decoder. Two copies of this model also were made. Just like in the earlier case, the fist copy of model 2 (M 2.1) takes English as input language and Hindi as output language (see Fig. 7). While the other one (M 2.2) takes Hindi as the source and English as the target (see Fig. 8). Three copies of those two models (M 2.1.1, M 2.1.2, M 2.1.3 and M 2.2.1, M 2.2.2, M 2.2.3) were made and were trained respectively with 3 different word embeddings, GloVe, Word2Vec and FastText.
5 Results and Discussion We calculated the 1-gram, 2-gram and n-gram BLEU scores for all the models. They have been tabulated in the table below. Table 2 shows the comparison of BLEU scores of model 1 for English to Hindi NMT (M 1.1). Table 3 shows Model 1 for Hindi to English NMT (M 1.2). Table 4 contains the results of model 2 for English to Hindi NMT (M 2.1) and Table 5 contains results of Model 2 for Hindi to English NMT. In the first model, GloVe seems to be performing the best in all the scores calculated. However, for the second one, Word2Vec seems to have come quite to the par with GloVe. We have also compared the translations predicted by our models to each other and to standard google translate. Tables 6 and 7 contain predictions made by all the submodels of Model M 1 and google translate. Tables 8 and 9 contain predictions made by all the sub models of Model M 2 and google translate.
Effect of GloVe, Word2Vec and FastText Embedding …
Fig. 7 The architecture of model M 2.1 Table 2 BLEU scores of model 1: English to Hindi NMT Word embeddings Model-1 1-gram Glove W2V FT
Model M 1.1.1 Model M 1.1.2 Model M 1.1.3
0.59 0.70 0.64
Table 3 BLEU scores of model 1: Hindi to English NMT Word embeddings Model-1 1-gram Glove W2V FT
Model M 1.2.1 Model M 1.2.2 Model M 1.2.3
0.62 0.69 0.65
2-gram
n-gram
0.47 0.38 0.41
0.25 0.19 0.12
2-gram
n-gram
0.46 0.49 0.41
0.22 0.23 0.22
Fig. 8 The architecture of model M 2.2 Table 4 BLEU scores of model 2: English to Hindi NMT Word embeddings Model-2 1-gram Glove W2V FT
Model M 2.1.1 Model M 2.1.2 Model M 2.1.3
0.69 0.69 0.68
2-gram
n-gram
0.52 0.54 0.51
0.24 0.27 0.24
Effect of GloVe, Word2Vec and FastText Embedding … Table 5 BLEU scores of model 2: Hindi to English NMT Word embeddings Model-2 1-gram Glove W2V FT
Model M 2.2.1 Model M 2.2.2 Model M 2.2.3
0.72 0.72 0.75
2-gram
n-gram
0.54 0.55 0.61
0.32 0.33 0.39
Table 6 A comparison of translations predicted by our model 1 and google translate (English– Hindi) Model Source sentence Translation Model 1—GloVe (Model M 1.1.1) Model 1—W2V (Model M 1.1.2) Model 1—FT (Model M 1.1.3) Google translate
Birds flying over surface of water Birds flying over surface of water Birds flying over surface of water Birds flying over surface of water
Table 7 A comparison of translations predicted by our model 1 and google translate (Hindi– English) Model Source sentence Translation Model 1—GloVe (Model M 1.1.1) Model 1—W2V (Model M 1.1.2) Model 1—FT (Model M 1.1.3) Google translate
Cat is sitting on a laptop Cat is sitting on a laptop Cat is sitting on a laptop Cat is slipping on a laptop computer
6 Conclusion In this paper, two machine translation models, English–Hindi and Hindi–English, have been used to test the effect of different word embeddings on performance. This work can be extremely useful for selecting the appropriate word embedding while building a neural machine translation model. To extend this work, one can try building models with Bi-LSTM or GRU units and see how these word embeddings affect their efficiencies.
Table 8 A comparison of translations predicted by our model 2 and google translate (English– Hindi) Model Source sentence Translation Model 2—GloVe (Model M 2.1.1)
Birds flying over surface of water Black suit jacket on man
Model 2—W2V (Model M 2.1.2)
Birds flying over surface of water Black suit jacket on man
Model 2—FT (Model M 2.1.3) Birds flying over surface of water Black suit jacket on man Google translate
Birds flying over surface of water Black suit jacket on man
Table 9 A comparison of translations predicted by our model 2 and google translate (Hindi– English) Model Source sentence Translation Model 2—GloVe (Model M 2.1.1)
Cat snoozing on a laptop A man coming out of ski
Model 2—W2V (Model M 2.1.2)
Cat snoozing on a laptop computer A skier with right out
Model 2—FT (Model M 2.1.3)
Cat snoozing on a laptop computer Cat is sleeping on a laptop computer A runner overtakes first base
Google translate Google translate
References
1. Wang H, Wu H, He Z, Huang L, Church KW (2021) Progress in machine translation. Engineering
2. John H (2003) ALPAC: the (in)famous report. In: Readings in machine translation, vol 14, pp 131–135
3. Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473
4. Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Advances in neural information processing systems, vol 27
5. Almeida F, Xexéo G (2019) Word embeddings: a survey. arXiv preprint arXiv:1901.09069
6. Bengio Y, Ducharme R, Vincent P (2000) A neural probabilistic language model. In: Advances in neural information processing systems, vol 13
7. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: 1st international conference on learning representations, ICLR 2013—workshop track proceedings. MIT Press
8. Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543
9. Bojanowski P, Grave E, Joulin A, Mikolov T (2017) Enriching word vectors with subword information. Trans Assoc Comput Linguist 5:135–146
10. Řehůřek R, Sojka P (2010) Software framework for topic modelling with large corpora. In: Proceedings of LREC 2010 workshop new challenges
11. Lilleberg J, Zhu Y, Zhang Y (2015) Support vector machines and Word2Vec for text classification with semantic features. In: 2015 IEEE 14th international conference on cognitive informatics and cognitive computing (ICCI*CC). IEEE, pp 136–140
12. Ma L, Zhang Y (2015) Using Word2Vec to process big text data. In: Proceedings—2015 IEEE international conference on big data, IEEE Big Data 2015, Oct 2015, pp 2895–2897
13. Xue J, Liu K, Lu Z, Lu H (2019) Analysis of Chinese comments on Douban based on Naive Bayes, pp 121–124
14. Rezaeinia SM, Rahmani R, Ghodsi A, Veisi H (2019) Sentiment analysis based on improved pre-trained word embeddings. Exp Syst Appl 117:139–147
15. Lauren P, Qu G, Yang J, Watta P, Huang GB, Lendasse A (2018) Generating word embeddings from an extreme learning machine for sentiment analysis and sequence labeling tasks. Cogn Comput 10(4):625–638
16. Bawa S et al (2021) A Sanskrit-to-English machine translation using hybridization of direct and rule-based approach. Neural Comput Appl 33(7):2819–2838
17. Bawa S et al (2020) Sanskrit to universal networking language enconverter system based on deep learning and context-free grammar. Multimedia Syst 1–17
18. Chiu B, Baker S (2020) Word embeddings for biomedical natural language processing: a survey. Lang Linguist Compass 14(12):1–54
19. Zhou Y (2020) A review of text classification based on deep learning. In: ACM international conference proceeding series, pp 132–136
20. Le NQK, Yapp EKY, Nagasundaram N, Yeh H-Y (2019) Classifying promoters by interpreting the hidden information of DNA sequences via deep learning and combination of continuous FastText n-grams. Front Bioeng Biotechnol 305
21. Hou SL, Huang XK, Fei CQ, Zhang S-H, Li Y-Y, Sun Q-L, Wang CQ (2021) A review of text classification based on deep learning. J Comput Sci Technol 633–663
22. Draye J-P (2001) Recurrent neural networks: properties and models. In: Plausible neural networks for biological modelling, pp 49–74
23. De Mulder W, Bethard S, Moens M-F (2015) A survey on the application of recurrent neural networks to statistical language modeling. Comput Speech Lang 30(1):61–98
24. Yong Y, Si X, Changhua H, Zhang J (2019) A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput 31(7):1235–1270
25. Appiah AY, Zhang X, Ayawli BBK, Kyeremeh F (2019) Long short-term memory networks based automatic feature extraction for photovoltaic array fault diagnosis. IEEE Access 7:30089–30101
26. Cho K, Merriënboer BV, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078
27. Yang S, Wang Y, Chu X (2020) A survey of deep learning techniques for neural machine translation
28. Saurav K, Saunack K, Kanojia D, Bhattacharyya P (2021) "A passage to India": pre-trained word embeddings for Indian languages, vol 2
Resource Provisioning Aspects of Reserved and On-Demand VMs in Cloud Computing Environments Yogesh Kumar, Jitender Kumar, and Poonam Sheoran
Abstract On-demand resource availability and pay-as-you-go are the key benefits of cloud computing. It eases the cloud service users to adjust the computing demands and pay accordingly. However, cloud service providers offer different provisioning plans for virtual machines (VMs): reservation-based plans and on-demand access plans. The key premise of reserved VMs is the upfront fee paid by the users; i.e., the higher the upfront payments by users, the more significant the discounts. However, these benefits can only be availed by instance usage. In contrast, on-demand VMs can be leased with comparatively higher time quanta fees. Under such a scenario, leasing cloud VMs from an infrastructure service provider would be problematic. The variations in the arriving workloads can render either under-provisioning or overprovisioning for the Software as a Service (SaaS) provider. Under-provisioning would cause Service-Level Agreements (SLA) violations. In contrast, over-provisioning would inflict higher leasing costs at the SaaS provider. So, this paper chronologically investigates all such issues under high varying workloads as well as low varying workloads. Unlike other studies, this paper investigates VMs leasing plan for mobile device users by considering waiting time as a quality of service (QoS) metric. Simulation results show that purchasing a mixed blend of reserved and on-demand VMs yields better monetary cost-savings even in low varying workloads. We also outline the open challenges for the mixed resource leasing plans. Keywords Cloud computing · SaaS · SLA · Leasing plans · Virtual machines · QoS
Y. Kumar (B) · J. Kumar Computer Science and Engineering Department, DCRUST Murthal, Sonipat, Haryana, India e-mail: [email protected] J. Kumar e-mail: [email protected] P. Sheoran Biomedical and Engineering Department, DCRUST Murthal, Sonipat, Haryana, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_38
1 Introduction
Cloud computing has rejuvenated the computing world. It delivers IT resources as a service and frees the users from the low-level implementation details. It allows cloud service providers (CSPs) and cloud service users (CSUs) to adjust the computing capacity according to the requirements [1]. Cloud computing has eased users' life in many aspects; for instance, users do not need to rely on lengthy setup procedures for new hardware or software [2]. One of the significant cloud services is virtual machine (VM) provisioning. CSUs can purchase these VM instances through reservation, on-demand access, or spot bidding. Reserved VM instances can be acquired through long-term contracts (typically 1–3 years) and require a high upfront cost, but subsequent usage of reserved VMs is available at significant discounts [3]. With reserved VMs, a maximum of 38% of the total price can be saved for one year's heavy utilization. Similarly, 60% of the total price can be saved if the VMs are reserved for three years. However, purchasing all VMs on a reserved basis may be sub-optimal in case of varying workload and randomness in the inter-arrival times of requests. On the other hand, on-demand VM instances can be acquired according to the requirements. However, the per-time-quantum leasing cost of on-demand VMs is higher than that of reserved VM instances. It is a good option to use on-demand instances if the VM instance has to run for a couple of hours per day or a few days per month [4]. Spot instances can be acquired by online bidding, but spot instances do not provide any availability guarantee [4], so this paper does not consider such instances. Since a CSU may not be using the reserved VMs all the time, we have considered a Software as a Service (SaaS) provider to investigate the implications of leasing plans. Moreover, either CSUs or CSPs would have to bear the costs of under-utilized reserved VMs [5]. The SaaS provider leases VMs from an infrastructure service provider, e.g., Amazon [6], and provides hosted services to mobile devices. Such services relieve enterprises from the investment in and maintenance of expensive hardware. Unlike other studies, this paper investigates VM leasing plans for mobile device users by considering waiting time as a Quality of Service (QoS) metric. Waiting time violations would increase the energy consumption on mobile devices [7]. So, selecting an optimal purchase plan is the key challenge for SaaS providers. The rest of the paper is organized as follows: Sect. 2 covers the related work, Sect. 3 discusses the system architecture, Sect. 4 elaborates the problem statement, Sect. 5 presents the simulation scenario, Sect. 6 presents the results and discussions, Sect. 7 discusses the open challenges, and Sect. 8 presents the concluding remarks of the paper.
2 Related Work
The related work discussion has two subsections. The first subsection covers the studies that argue for using purely on-demand VMs. The second subsection discusses the studies that advocate for hybrid VM leasing strategies.
Purely On-Demand VM Leasing Strategies
Since workloads may vary during the time of the day, Ferber et al. [8] argue that keeping VMs according to the arriving workloads can save monetary costs for the CSPs. So, the authors proposed an auto-scaling algorithm for increasing/decreasing the number of running on-demand VMs. The proposed auto-scaling strategy keeps 10% extra VMs for maintaining the QoS in all circumstances. However, keeping 10% extra VMs can cause over/under-provisioning. Similarly, Calheiros et al. [9] also exploit all on-demand VMs to maintain the QoS. However, the authors used a threshold rule-based auto-scaling policy for maintaining the number of running VMs according to the time of the day, and threshold rule-based systems are application-specific [10]. Shi et al. [11] proposed a response time-based system for maintaining the QoS for CSUs. Nevertheless, determining the intended response time is a tedious task. So, Kumar et al. [7] proposed a service cost and waiting cost tradeoff-based auto-scaling system. However, using all VMs as on-demand VMs for long periods can result in higher monetary costs for the SaaS provider because reserved VMs are available at discounted rates.
Hybrid VM Leasing Strategies
In contrast to purely on-demand strategies, Chaisiri et al. [4] proposed a stochastic IPP-based algorithm for determining the number and type of reserved VMs, which continuously updates the number of reserved VMs using reservation and expanding phases. However, this work is based on the artificial costs of reserved VMs. Also, this model performs well for low-uncertainty loads, but a higher-uncertainty workload would increase the problem space of the IPP. Similarly, Shen et al. [12] also proposed an IPP-based mechanism for optimally scheduling the incoming requests on the reserved and on-demand VMs. Instead of using IPP, Hu et al. [13] exploit a mixed-integer programming approach for optimally scheduling the incoming tasks on the reserved and on-demand VMs. Instead of continuously updating the reserved VMs, Ambati et al. [14] exploit an offline approach for determining the optimal requirement of reserved and on-demand VMs. However, determining the long-term demand is virtually impossible. Moreover, these works do not consider waiting time as a QoS metric, even though violations in QoS would increase the energy consumption on mobile devices. Table 1 presents a summary of the different proposals on VM leasing.
Table 1 A summary of VM leasing works

Ref | Purely on-demand | Hybrid | Scaling | Workload variations | QoS
[4] | No | Yes | No | No | No
[7] | Yes | No | Yes | Yes | Yes
[8] | Yes | No | Yes | Yes | Yes
[9] | Yes | No | Yes | Yes | Yes
[12] | No | Yes | No | No | No
[13] | No | Yes | No | No | No
[14] | No | Yes | No | No | No
3 System Architecture
The proposed structure contains three major entities: CSUs, the SaaS provider, and the public cloud, as depicted in Fig. 1 (a small dispatch sketch follows the figure).
CSUs: For the current work, all the users are assumed to be registered users who send their software service requests to the SaaS provider.
SaaS Provider: This entity contains four components, namely Admission Control, Scheduler, Performance Analyzer, and Service Provisioner. The admission control acts as the entry point to the cloud; in addition, it also acts as a database repository for the CSUs. The scheduler takes the users' requests and schedules them on the service provisioner VMs on a first-come-first-serve basis. The performance analyzer tracks the status of the service provisioner VM queues. It decides to start new on-demand VMs if the currently running VMs are insufficient to cope with the current workload; in addition, it also helps in deciding which VM to shut down. The service provisioner acts as the hosted software service provider component.
Public Cloud: This entity provides the infrastructure as a service (IaaS) facility to the SaaS provider in the form of virtual machines (VMs).
Fig. 1 System architecture
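To make the request flow of Fig. 1 concrete, the following is a minimal sketch of the scheduler and performance analyzer roles described above. The class names, attribute names, and queue-length threshold are hypothetical illustrations, not part of the authors' implementation.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class VM:
    vm_id: int
    reserved: bool                      # reserved VMs stay leased for the whole contract
    queue: deque = field(default_factory=deque)

def schedule_fcfs(requests, vms):
    """Scheduler: assign each arriving request to the shortest VM queue (FCFS per VM)."""
    for req in requests:
        target = min(vms, key=lambda vm: len(vm.queue))
        target.queue.append(req)

def performance_analyzer(vms, max_queue_len=5):
    """Performance analyzer: flag overload and list idle on-demand VMs (threshold is illustrative)."""
    overloaded = any(len(vm.queue) > max_queue_len for vm in vms)
    idle_on_demand = [vm.vm_id for vm in vms if not vm.reserved and not vm.queue]
    return overloaded, idle_on_demand

# Example: three reserved VMs serving ten requests
vms = [VM(i, reserved=True) for i in range(3)]
schedule_fcfs(range(10), vms)
print(performance_analyzer(vms))   # (False, []) -- no queue exceeds the illustrative threshold
```

In a real deployment, the performance analyzer would use these signals to ask the service provisioner to lease additional on-demand VMs or to release idle ones.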
4 Problem Statement
Let the simple bag-of-tasks (BoT) application be modeled as a set of tasks denoted by $S = \{T_0, T_1, \ldots, T_{n+1}\}$. A task is modeled as a 3-tuple $T_i = \{Send_i, MI_i, Receive_i\}$, where $Send_i$ denotes the amount of input data, $Receive_i$ denotes the amount of output data, and $MI_i$ denotes the amount of CPU workload required by the service, measured in million instructions (MI). Let $V^{R} = \{v_1^{R}, v_2^{R}, \ldots, v_n^{R}\}$ denote the set of reserved VMs and $V^{od} = \{v_1^{od}, v_2^{od}, \ldots, v_m^{od}\}$ denote the set of on-demand VMs. A VM is modeled as a 2-tuple $V_m = \{CF_{mips}^{type}, price_{mips}^{type}\}$, where $CF_{mips}^{type}$ denotes the clock frequency of the reserved/on-demand VM type and $price_{mips}^{type}$ denotes the per-time-quanta price of a VM. The objective is to investigate the optimal leasing plan over a period while maintaining the QoS, i.e., to minimize the leasing costs:

$$\min_{t \in T} \left( \sum_{vm=1}^{n} price^{r}_{mips} + \sum_{vm=1}^{m} price^{od}_{mips} \right) \quad (1)$$

where the price of the reserved VMs can be estimated as

$$\sum_{vm=1}^{n} price^{r}_{mips} = \sum_{T} price^{r}_{mips}\big|_{vm_1} + \sum_{T} price^{r}_{mips}\big|_{vm_2} + \cdots + \sum_{T} price^{r}_{mips}\big|_{vm_n} \quad (2)$$

and the price of the on-demand VMs can be estimated as

$$\sum_{vm=1}^{m} price^{od}_{mips} = \sum_{t=init_{vm_1}}^{term} price^{od}_{mips} + \sum_{t=init_{vm_2}}^{term} price^{od}_{mips} + \cdots + \sum_{t=init_{vm_m}}^{term} price^{od}_{mips} \quad (3)$$
The reserved instances are kept running until the end of the contractual period. For the current work, all VMs to be purchased are assumed to be of a single type. The price of the on-demand VMs comprises the sum of the prices of the on-demand instances from their initiation time to their termination time (the termination time is rounded up to the nearest integer hour because these instances are charged hourly).
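As a worked illustration of Eqs. (1)–(3), the sketch below adds up the reserved and on-demand components of the leasing cost. The prices and session times are illustrative placeholders, and the hourly rounding follows the charging rule stated above.

```python
import math

def reserved_cost(price_r_per_hour, contract_hours, n_reserved):
    # Eq. (2): reserved VMs are billed over the whole period T, whether used or not
    return n_reserved * price_r_per_hour * contract_hours

def on_demand_cost(price_od_per_hour, sessions):
    # Eq. (3): each on-demand VM is billed from initiation to termination,
    # with the duration rounded up to whole hours (hourly charging)
    return sum(price_od_per_hour * math.ceil(end - start) for start, end in sessions)

def total_leasing_cost(price_r, price_od, contract_hours, n_reserved, sessions):
    # Eq. (1): objective value of a given leasing plan
    return reserved_cost(price_r, contract_hours, n_reserved) + on_demand_cost(price_od, sessions)

# Illustrative numbers: 20 reserved VMs for one day plus three on-demand sessions (hours)
print(total_leasing_cost(price_r=0.12, price_od=0.196, contract_hours=24,
                         n_reserved=20, sessions=[(0.0, 2.4), (5.0, 5.2), (10.0, 13.0)]))
```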
5 Simulation Scenario
A. Simulation Setup
The simulations are carried out using a single data center containing 500 hosts using CloudSim [15]. The configuration of each host is set as one octa-core processor with 32 GB of RAM. Each newly initialized VM is kept with a dual-core processor with a 2.5 GHz clock speed, similar to Amazon's m3.large Linux VM instance, along with a space-shared policy. As the current work focuses on the SaaS provider's perspective, we opted for the simple VM allocation policy of CloudSim for the creation of new VMs on the data center hosts. All VMs that need to be purchased are assumed to be of a single type for the current work. Similarly, we considered the clock frequency of the SMD as 1.5 GHz. The output metrics collected for each simulation are the per-day costs of reserved and on-demand VMs, the service-level agreement (SLA) violation rate of requests, and the utilization rate of cloud resources. An SLA violation occurs if the user's mobile application waits longer than the SLA-negotiated waiting time for allocating a VM. The utilization rate stands for the busy hours of the VM. There are various techniques for acquiring and releasing on-demand VMs, e.g., threshold-based [9] and tradeoff-based [7]. In the tradeoff-based system, penalties are imposed on the SaaS provider if any SLA violation occurs. Since the tradeoff-based system has the scope of imposing penalties on the SaaS provider, we adopted a tradeoff-based scaling policy with a slight modification. In the modified scaling policy, whenever some VMs need to be shut down, only on-demand VMs are considered for shutting down. Meanwhile, an on-demand VM's start-up time is kept at 150 s [8]. However, the penalty model of [7] is kept unchanged.
B. Workload Scenario
As no such workload and its characteristics are available for such a simulation, two synthetic types of workloads are examined (Fig. 2). Moreover, it is also impossible to predict the demands of a complete year, so the simulation is carried out for one complete day of requests. Unlike works using IPP models, the current work focuses on the latest analysis of data available through Amazon [3]. The first scenario represents a simple, uniformly distributed workload (shown by a straight line in Fig. 2). In contrast, the second scenario consists of a more dynamic and fluctuating workload obtained using a Weibull distribution similar to [9] and is shown by the varying curve. The task arrival rate is 2/min in the uniform distribution, whereas the task arrival rate parameters in the Weibull distribution are kept as 1.79 and 24.16 (a short script for reproducing both workloads is sketched after Fig. 2). The average computation time of each task is kept at 840 s, whereas the maximum waiting time is kept at 10 s.
C. Cloud Instances and Pricing
Amazon m3.large instances are used for carrying out the whole simulation. The price of on-demand VMs is kept at $0.196/h [6]. The price of reserved instances can be calculated on the basis of the 1–3-year utilization of the VMs and the data of Table 2, abstracted from Amazon [3].
Fig. 2 Number of requests received by the service provisioner
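The two synthetic workloads shown in Fig. 2 can be approximated with a few lines of code. This is only a sketch: treating 1.79 and 24.16 as the Weibull shape and scale of the per-minute request count is an assumption about the setup, not a statement of the authors' exact generator.

```python
import random

def fixed_workload(minutes=1440, rate_per_min=2):
    # Uniform scenario: the same number of requests in every minute of the day
    return [rate_per_min] * minutes

def variable_workload(minutes=1440, shape=1.79, scale=24.16, seed=1):
    # Variable scenario: Weibull-distributed requests per minute
    # (interpreting 1.79/24.16 as shape/scale is an assumption)
    rng = random.Random(seed)
    return [round(rng.weibullvariate(scale, shape)) for _ in range(minutes)]

print(sum(fixed_workload()), sum(variable_workload()))   # total requests per day
```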
Let the VMs be purchased on a one-year heavy utilization basis and their yearly utilization be denoted by 'U'. The price of a reserved VM instance can be calculated as

$$price^{r}_{mips} = price^{od}_{mips} - \frac{price^{od}_{mips} \times U}{100} \quad (4)$$
where yearly utilization U denotes % of time an instance is running; e.g., if yearly utilization is more than 25%, then the VM instance runs for more than three months in the year. Table 2 shows that if the yearly utilization is 100%, reserved VM instances can save a maximum of 38% price. For the current work, VMs are assumed to be purchased under one year of heavy utilization or on-demand basis. Table 2 Savings comparison of 1 year reserved VMs over on-demand VMs abstracted from [3]
Yearly utilization rate (%) | Savings by 1 year heavy utilization (%)
10 | −525
20 | −212
30 | −108
40 | −56
50 | −25
60 | −4
70 | 11
80 | 22
90 | 31
100 | 38
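Table 2 can be turned into a small lookup that converts a yearly utilization level into an effective hourly cost of a reserved VM relative to the $0.196/h on-demand price quoted in Sect. 5; a minimal sketch is given below.

```python
# Savings of 1-year heavy-utilization reserved VMs over on-demand VMs (Table 2)
SAVINGS_BY_UTILIZATION = {10: -525, 20: -212, 30: -108, 40: -56, 50: -25,
                          60: -4, 70: 11, 80: 22, 90: 31, 100: 38}

ON_DEMAND_PRICE = 0.196  # $/h for an m3.large instance (Sect. 5)

def effective_reserved_price(utilization_pct):
    """Effective hourly cost of a reserved VM at a given yearly utilization level."""
    savings = SAVINGS_BY_UTILIZATION[utilization_pct]
    return ON_DEMAND_PRICE * (1 - savings / 100)

for u in (50, 70, 100):
    print(f"utilization {u:3d}% -> effective price ${effective_reserved_price(u):.3f}/h")
```

At low utilization the effective price exceeds the on-demand rate, which is exactly why the paper argues that only a fraction of the VMs should be reserved.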
6 Results and Discussions
Simulation results collected for the two workloads are shown in Figs. 3, 4, 5, 6, 7 and 8.
(1) Fixed workload: The results in Fig. 3 depict the effect of the static reserved VM purchasing strategy on the utilization and rejection rates for the complete simulation. Although using 25 VMs can lower the rejection rate, it also reduces the VMs' utilization. Figure 4 shows that a hybrid strategy can achieve a comparatively higher utilization rate with almost negligible rejection. The utilization rate when using 15 or 20 VMs as reserved VMs and the rest as on-demand VMs is higher than when using a flat 25 VMs as reserved VMs. Figure 5 shows the real significance of the hybrid purchase strategy. It shows that the use of 20 VMs as reserved VMs and the rest as on-demand VMs results in the least amount of payments, even in a homogeneous workload, with an almost negligible rejection rate. The price savings in this hybrid case is 9.16%.
Fig. 3 Utilization and rejection rates with all reserved VMs for uniform distribution
Fig. 4 Utilization and rejection rates with hybrid VMs purchase strategy for uniform distribution
Fig. 5 Prices in hybrid VMs purchase strategy with uniform distribution
Fig. 6 Utilization and rejection with all reserved VMs for Weibull distribution
Fig. 7 Utilization and rejection rates with hybrid VMs purchase strategy for Weibull distribution
Fig. 8 Prices in hybrid VMs purchase strategy for Weibull distribution
(2) Variable workload: Figs. 6, 7 and 8 show the impact of leasing when workloads are highly variable throughout the day. Figure 6 shows that the rejection rate can only be minimized by using 70 VMs as reserved VMs, but its corresponding utilization rate is also the lowest (less than 50%). Figure 7 shows that all hybrid strategies result in higher utilization and almost 2% rejection rates. Figure 8 shows that using 70 reserved VMs results in total over-provisioning: the purchasing costs in this strategy are the highest with the least utilization. However, the purchasing costs are minimum when 20 reserved VMs are used in combination with on-demand VMs, and the maximum savings in this hybrid policy is 31.9%. So, the effects of hybrid purchase plans are more significant when the workload is highly variable.
Alternate benefits of using a hybrid purchase plan
A hybrid plan reduces the complexity of another higher-level problem, i.e., which VM instances need to be shut down in case of a scale-down action. The scale-down action itself is crucial as it contains three sub-problems: either shut down the recently activated VMs, or shut down the VMs which are nearing the completion of their one-hour tenure, or use a combination of them, keeping in mind that if scaling up is required then a stopped VM would take extra time for restarting, and that on-demand VMs are charged on an hourly basis. However, by using a combination of reserved and on-demand VMs, the scope of this decision problem reduces to only the on-demand VMs rather than the reserved instances (a small sketch of this reduced decision follows below). Similarly, it also reduces the problem space of the IaaS provider, as the IaaS provider can group the reserved VMs on fixed servers and needs to deal with the repacking of only the on-demand VMs.
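The reduced scale-down decision discussed above can be sketched as a simple filter: only idle on-demand VMs are candidates, and among them those that have nearly used up their current billed hour are preferred. The attribute names below are hypothetical; this is an illustration of the idea, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class OnDemandVM:
    vm_id: int
    reserved: bool
    queue_len: int
    start_time: float   # seconds since the simulation started

def pick_vms_to_stop(vms, now, n_to_stop):
    """Pick idle on-demand VMs to shut down, preferring those nearest a billed-hour boundary."""
    candidates = [vm for vm in vms if not vm.reserved and vm.queue_len == 0]

    def minutes_into_hour(vm):
        # On-demand VMs are billed per started hour, so a VM that has almost
        # consumed its current hour wastes the least already-paid capacity
        return ((now - vm.start_time) / 60.0) % 60.0

    return sorted(candidates, key=minutes_into_hour, reverse=True)[:n_to_stop]

# Example: two idle on-demand VMs started at different times, stop one of them
vms = [OnDemandVM(1, False, 0, start_time=0.0), OnDemandVM(2, False, 0, start_time=1800.0)]
print([vm.vm_id for vm in pick_vms_to_stop(vms, now=3500.0, n_to_stop=1)])   # -> [1]
```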
7 Open Challenges
This section highlights the issues that further need to be resolved for determining the optimal combination of reserved and on-demand VMs.
Determination of long-term demand: Since the application's demand can change with the time of the day and the year, long-term forecasting of the application's demand is virtually impossible.
VM startup overhead: Placing a new VM demand does not imply that a new VM would start immediately. A VM's start-up time can be significant, e.g., 150 s [8], so the QoS would degrade during the VM start-up time.
Workload variation prediction: It is not easy to strictly predict the demand of the arriving workloads over the day. Consequently, prediction errors may lead to the under-provisioning of VMs and SLA violations.
Newer on-demand pricing policies: In recent years, infrastructure providers have adopted different time-quanta policies for on-demand VMs, e.g., per-minute and per-hour billing. These circumstances further complicate the decision on the optimal purchasing plans.
8 Conclusion
Cloud platforms are beneficial in many aspects, but selecting an optimum purchase plan is vital. Auto-scaling techniques can minimize the prices for SaaS providers, but the prices can be further minimized by selecting an optimal hybrid purchase plan. However, hybrid purchase plans are sensitive to workload variations and QoS constraints such as the SLA-negotiated waiting time. This paper systematically investigated the implications of hybrid VM purchasing plans as well as purely on-demand VM purchasing plans. In addition, this paper also highlighted the associated challenges that need to be resolved for determining the optimal purchasing plans.
References
1. Mell P, Grance T The NIST definition of cloud computing (Draft). http://csrc.nist.gov/publications/drafts/800-145/Draft-SP-800-145_cloud-definition.pdf 2. Armbrust M, Fox A, Griffith R, Joseph AD, Katz RH, Konwinski A, Lee G, Patterson DA, Rabkin A, Stoica I, Zaharia M (2009) Above the clouds: a Berkeley view of cloud computing. EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2009-28 3. Amazon Web Services-How AWS Pricing Works, July 2014. Available at http://aws.amazon.com/whitepapers/ 4. Chaisiri S, Bu-Sung L, Niyato D (2012) Optimization of resource provisioning cost in cloud computing. IEEE Trans Serv Comput 5(2):164–177. https://doi.org/10.1109/TSC.2011.7 5. Kumar J, Rani A, Dhurandher SK (2020) Convergence of user and service side perspectives in mobile cloud computing environment: taxonomy and challenges. Int J Commun Syst 33(18). https://doi.org/10.1002/dac.4636
6. Amazon EC2. http://aws.amazon.com/ec2/ 7. Kumar J, Malik A, Dhurandher SK, Nicopolitidis P (2017) Demand-based computation offloading framework for mobile devices. IEEE Syst J 12(4):3693–3702. https://doi.org/10.1109/JSYST.2017.2706178 8. Ferber M, Rauber T, Torres MHC, Holvoet T (2012) Resource allocation for cloud-assisted mobile applications. In: Proceedings of the 5th IEEE conference on cloud computing, pp 400–407. https://doi.org/10.1109/CLOUD.2012.75 9. Calheiros RN, Ranjan R, Buyya R (2011) Virtual machine provisioning based on analytical performance and QoS in cloud computing environments. In: Proceedings of the international conference on parallel processing, pp 295–304. https://doi.org/10.1109/ICPP.2011.17 10. Botran TL, Alonso JM, Lozano JA (2014) A review of auto-scaling techniques for elastic applications in cloud environments. J Grid Comput 12(4):559–592. https://doi.org/10.1007/s10723-014-9314-7 11. Shi C, Habak K, Pandurangan P, Ammar M, Naik M, Zegura E (2014) COSMOS: computation offloading as a service for mobile devices. In: Proceedings of the 15th ACM international symposium on mobile ad hoc networking and computing, pp 287–296. https://doi.org/10.1145/2632951.2632958 12. Shen S, Deng K, Iosup A, Epema D (2013) Scheduling jobs in the cloud using on-demand and reserved instances. In: Euro-Par 2013, LNCS 8097:242–254 13. Hu M, Luo J, Bhardwaj V (2012) Optimal provisioning for scheduling divisible loads with reserved cloud resources. In: ICON, pp 204–209 14. Ambati P, Bashir N, Irwin D, Hajiesmaili M, Shenoy P (2020) Hedge your bets: optimizing long-term cloud costs by mixing VM purchasing options. In: 2020 IEEE international conference on cloud engineering (IC2E), Sydney, Australia, pp 105–115. https://doi.org/10.1109/IC2E48712.2020.00018 15. Calheiros RN, Ranjan R, Beloglazov A, De Rose CAF, Buyya R (2011) CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. In: Software: Practice and Experience (SPE), vol 41(1). Wiley Press, New York, USA, pp 23–50
Open-Source Simulators for Drone-Assisted Vehicular Ad Hoc Networks Santosh Kumar, Amol Vasudeva, and Manu Sood
Abstract Intelligent transportation systems (ITS) are a collection of cutting-edge technologies that are intended to offer a variety of road traffic management and safety services. The number of automobiles on the road and flying in three-dimensional (3D) space has increased dramatically, increasing the risk of accidents and other security issues. A new technology based on the network composed of flying drones, mobile vehicles, and ad hoc infrastructure known as UAV-assisted vehicular ad hoc network (VANET), a subtype of mobile ad hoc network (MANET), is designed to deal with security issues observed in VANETs. The applications or protocols designed for vehicular safety in such a network should be thoroughly tested before deployment. In many scenarios, simulations with the help of readily available tools play a significant role within the limited scope of testing in research at various levels. Therefore, such simulation tools and their appropriate selection for specific purposes are essential for validating any research concept before being implemented in various real-world network scenarios. This paper explores available open-source simulators that can be used to simulate UAV-assisted VANETs. Additionally, a comparison based on various features of simulators has also been presented aimed at helping in the selection of a particular tool for specific problems. Keywords Simulators · Network simulator · NS-2 · NS-3 · OMNet++ · SUMO · BonnMotion
S. Kumar (B) · M. Sood Department of Computer Science, Himachal Pradesh University, Shimla, India e-mail: [email protected] M. Sood e-mail: [email protected] A. Vasudeva Department of Computer Science and Engineering, Jaypee University of Information Technology, Waknaghat, Solan, Himachal Pradesh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_39
1 Introduction
In recent years, wireless communication technologies have significantly impacted our daily lives. Wireless communication has made life easier for users worldwide, from indoor to cellular networks outside the home. The area of applications of vehicular ad hoc networks (VANETs) is already blooming, gaining momentum and recognition. Intelligent transport systems (ITS) and VANETs are developed to enable vehicle communication to reduce traffic congestion and ensure safety [1, 2]. A VANET establishes a mobile network for communication by using moving cars as wireless routers (nodes). The vehicles equipped with onboard wireless transceivers can participate in the ad hoc network and communicate data with neighboring nodes. These act as wireless communication nodes and can join the network without being screened or knowing each other's presence. During communication with distant nodes outside the sender's communication range, data can be exchanged through road side units (RSUs) as one of the convenient communication modes [3]. For a safe and secure driving experience, numerous control messages must be exchanged among vehicles and static RSUs. These involve information exchanges such as traffic data, emergency message broadcasts, and road condition warnings. The data exchanged between wireless nodes can be altered by malicious attackers, resulting in traffic jams, network topology changes, or other serious lapses. Even though RSUs provide optimum network performance, they still have limited scope owing to their deployment and maintenance costs [4]. These issues can be countered by using unmanned aerial vehicles (UAVs) as mobile infrastructure nodes that can be integrated with an already established VANET to cooperate with vehicles on the ground [5]. The mobility of UAVs is generally predefined before deployment. Hence, they can enhance VANET security by connecting the RSUs and trusted authorities. Before implementing various protocols or security mechanisms in such UAV-assisted VANETs, evaluating their applicability and effectiveness is of utmost criticality. The most straightforward method for this evaluation is simulation. Network simulators are essential as they can be employed to identify intrusions and develop overlapping security frameworks virtually to analyze overall network security. Simulators are quick and affordable compared to the money and time required to set up a comprehensive testbed to evaluate a complete network. In this context, many researchers and practitioners have analyzed various simulation tools to evaluate network performance and security. The authors [6] proposed an approach to select simulation tools. They presented a comprehensive study of network simulators, including OPNET Modeler, QualNet, and NS-2. It was concluded that OPNET Modeler is the best choice owing to its distinct features, such as the best model library, a structured modeling approach that allows users to develop complex models in a limited amount of time, modeling of network-related objects based on the most current standards, and a user-friendly interface. Furthermore, the authors [7] studied different simulators and proposed criteria to select network simulation
tools. In [8], the authors were interested in helping users who want to select a simulation tool for wireless networks. They focused their work on simulators with a free license and examined three well-established tools (J-Sim, OMNeT ++ , NS-2). The authors [9] presented a comparative study of OPNET Modeler and NS-2 to simulate a routing-based network. In Ref. [10], the authors presented a critical overview of simulation tools suitable for research in network technologies and communication protocols. They highlighted some criteria, such as general requirements, modules, statistical capabilities, end reports, and support for end-users. Additionally, they proposed a network model by integrating OMNet ++ and SUMO. The extensive review of available literature found that most of the work focused only on the study of simulators for mobile ad hoc networks (MANETs), VANETs, and sensor-based networks. However, very few studies have discussed the simulation tools for UAV-assisted VANETs. So, this paper aims to study various open-source simulators that can be employed to simulate UAV-assisted VANETs. The essential criteria and features for evaluating simulators are also discussed based on which various simulators are compared. This paper can be used as a reference for those interested in simulating their research proposal in UAV-assisted VANETs. In the remainder of the paper, Sect. 2 presents an overview of UAV-assisted VANETs; Sect. 3 describes the simulator selection criteria. In Sect. 4, a brief introduction to open-source simulators is provided, and in Sect. 5, various mobility generator tools are discussed and compared. Section 6 focuses on the study and comparison of various network simulators. Finally, Sect. 7 concludes the article along with the future work.
2 UAV-Assisted VANETs
In VANETs, owing to ubiquitous hindrances like tunnels, buildings, and other obstacles, vehicle-to-vehicle (V2V) and vehicle-to-RSU (V2RSU) types of communications suffer from severe packet losses and shadowing effects [11]. Additionally, deploying RSUs at a regular distance imposes a considerable capital cost, resulting in security issues and degraded network performance. These issues can be resolved by deploying a group of UAVs to assist the pre-deployed VANET [12]. UAV-assisted VANETs follow a two-tier network architecture which consists of a network of UAVs flying or hovering in the air and a pre-deployed VANET in the ground layer [13]. UAVs monitor the ground vehicular networks for abnormal activities, supporting the detection and prevention of security breaches, traffic jams, and critical time-sensitive requirements. Their movements are controlled using a predefined mobility model [14]. As depicted in Fig. 1, the VANET part is the network's active layer, consisting of vehicles with onboard units (OBUs), RSUs, and cellular networks. Dedicated short-range communication (DSRC) radio waves are used to exchange information among the nodes in the network. There are four types of communications involved in the VANET layer: (1) V2V, (2) vehicle to RSU (V2RSU), (3) RSU to RSU (RSU2RSU), and (4) cellular base station (CBS) to RSU (CBS2RSU) [15]. The UAV layer is augmented to monitor the nodes and communication in the VANET. The flying nodes are equipped with directional antennas; if necessary, they can communicate through a flying ad hoc network (FANET). Additionally, they can communicate with vehicles, RSUs, and the CBS.
Fig. 1 Architecture of UAV-assisted VANET
3 Criteria for Simulators The simulators to be discussed in upcoming sections will be compared based on the following parameters.
3.1 License A simulator can be classified as either commercial or licensed under one of the open-source or general public agreements.
3.2 Timing Based on the chronological order of events that influence the system’s state, simulators may be either discrete or continuous events. In the case of discrete event simulation, events are scheduled in a queue and processed one by one, whereas simulators that simulate a set of equations representing a system along time zones are known as continuous simulators [16].
3.3 User Interface Based on the user’s interaction, simulators may have: (1) Graphical user interface (GUI) (2) Command line interface (CLI).
3.4 Scalability Network simulations are typically designed for a large number of nodes; hence the simulator tool must support scalability.
3.5 Statistics Another essential feature of a simulator is its output. In order to perform statistical analysis on the generated data and create graphs, the findings must be expressive and easy to edit. The simulator’s initial state should be preserved so that the simulation may be repeated to verify the consistency of the results [17].
3.6 Supported Platforms It validates the simulator’s source code’s usability across many platforms and operating systems.
3.7 Support for 2D and 3D Mobility Models The simulator tools should be able to accurately model various two-dimensional (2D) and three-dimensional (3D) mobility models like Gauss Markov, random waypoint, pursue mobility, and reference point group mobility.
3.8 Performance The main goal of the performance analysis is to get a basic sense of how successful the simulator is in terms of implementation time and resources utilized. The performance can be evaluated in terms of CPU utilization, execution time, and memory usage.
3.9 Support for Network Protocols and Latest Technology The simulator software should be able to simulate various media access layer (MAC) and network-layer protocols. Additionally, support for all recent communication technologies like 4G, LTE, and 5G must be incorporated [10].
4 Open-Source Simulation Software Simulation of a UAV-assisted VANET is different from that of MANET or VANET because an additional layer of flying nodes imposes new restrictions and requirements, such as a 3D mobility model to cover the entire road topology and energy constraint nature, and limited processing capabilities of UAVs. Additionally, in VANETs, issues like roadside obstacles, multi-path fading, trip models, varying vehicular speed, traffic flow models, realistic mobility, traffic congestion, traffic lights, and drivers’ behaviour in real-world scenarios must be considered. For the simulation of a VANET currently, mobility generator tools are used to generate a realistic nodes mobility trace; these output traces are given to network simulators for further processing. Finally, based on the output provided by the network simulator, the VANET simulator controls vehicular movements. Similarly, the following tools are selected for the simulation of UAV-assisted VANET: (a) A traffic simulation tool for generating realistic mobility scenarios for flying and VANET nodes. (b) A network simulation tool to build topologies between the nodes at both layers. In the upcoming sections, various potential mobility and network simulators with open access are discussed in detail.
5 Mobility Simulators To obtain realistic results from UAV-assisted VANET simulation, it is vital to use a realistic mobility model. The mobility simulators are required to improve the degree of realism in simulations. They provide realistic traffic traces of nodes’ mobility to be utilized by the network simulator [18]. The mobility generator inputs consist of scenario parameters, a road model, and a path for flying vehicles. The output traces have details like the position of each vehicle, their profiles on mobility, and simulation time at each instant. The potential open-source mobility simulators for 2D and 3D mobility simulations are as follows. The comparison of these tools has been summarized in Table 1.
5.1 SUMO
Simulation of urban mobility (SUMO) is an open-source 2D mobility generator tool. It has been available since 2001 and allows users to simulate multimodal traffic networks with vehicles, public transportation, and pedestrians. SUMO comes with several supporting tools, including route calculations, network import, visualization, emission calculation, execution, and evaluation of traffic simulations. SUMO may be extended using custom models, and the simulation can be managed remotely via many application programming interfaces (APIs); a minimal example is given after Table 1. SUMO-generated traffic files can be used for the VANET part in UAV-assisted VANETs [19].

Table 1 Available open-source mobility generators

Mobility simulator | Open source | 3D visualization | Realistic map | Graphical interface | Ease of use | Trace for NS-2 | Applicability
SUMO | Yes | No | Yes | Yes | Hard | Yes | VANET
MOVE | Yes | No | Yes | Yes | Moderate | Yes | VANET
FreeSim | Yes | No | Yes | Yes | Easy | No | VANET
CityMob | Yes | No | No | Yes | Easy | Yes | VANET
STRAW | Yes | No | Yes | Yes | Moderate | No | VANET
VanetMobiSim | Yes | No | Yes | Yes | Moderate | Yes | VANET
BonnMotion | Yes | Yes | Yes | Yes | Easy | Yes | VANET, FANET
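As noted in Sect. 5.1, SUMO can be managed remotely through its APIs. The sketch below uses SUMO's TraCI Python bindings to step a scenario and read the vehicle positions that would be passed on to a network simulator; the configuration file name is a placeholder, and a working SUMO installation is assumed.

```python
import traci   # TraCI ships with SUMO's Python tools

# "my_scenario.sumocfg" is a placeholder for an existing SUMO configuration file
traci.start(["sumo", "-c", "my_scenario.sumocfg"])

for step in range(60):                                  # advance the simulation 60 steps
    traci.simulationStep()
    for veh_id in traci.vehicle.getIDList():
        x, y = traci.vehicle.getPosition(veh_id)        # ground-plane position of the vehicle
        print(step, veh_id, x, y)                       # e.g., write to a trace consumed by NS-2/NS-3

traci.close()
```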
5.2 MOVE Mobility model generator for vehicular networks (MOVE) is also a 2D traffic simulator developed on top of SUMO which quickly generates real-world mobility traces for VANET simulations. MOVE generates a mobility trace file including information on realistic vehicle movements that may be utilized right away by network simulators like NS-2 or GloMoSim [20].
5.3 FreeSim FreeSim is an entirely adaptable microscopic, macroscopic, and free-flow 2D traffic mobility generator that provides the representation and loading of numerous highway systems as a network graph with edge weights dictated by current node speeds. The traffic data utilized by the network simulator may be user-generated or translated from real-time data obtained by a transportation system [10].
5.4 CityMob
CityMob is a 2D open-source traffic simulator. CityMob can generate different mobility models, such as the Manhattan model (MM), downtown model (DM), and simple model (SM). The DM can simulate multiple lanes in both directions for every street, with multiple downtowns and vehicle queues due to traffic jams. The output mobility traces generated by CityMob for VANETs are compatible with the NS-2 simulator [21].
5.5 STRAW Street random waypoint (STRAW) is a 2D vehicular traffic generation tool used to simulate vehicular traffic in actual US cities built as a part of the C3 (car-to-car cooperation) project. STRAW is only compatible with Java in simulation time/scalable wireless ad hoc network simulator (JiST/SWANS), a discrete event simulator. The mobility traces generated by STRAW cannot be directly submitted to other network simulators like NS-2 [22].
5.6 VanetMobiSim
VanetMobiSim is a 2D traffic generator that can simulate realistic vehicular motion at microscopic and macroscopic levels. At the macroscopic level, maps can be imported from the US Census Bureau's database known as "Topologically integrated geographic encoding and referencing" (TIGER) or arbitrarily generated using Voronoi tessellation. The TIGER database constitutes a digital database of geographic features, like railroads, roads, lakes, rivers, and legal boundaries, covering the entire USA [23].
5.7 BonnMotion
BonnMotion is a Java-based mobility generator tool to create and analyze multiple mobility scenarios and to simulate numerous 2D and 3D mobility models like the chain model, column mobility model, Gauss Markov model, nomadic community mobility model, random waypoint (RW), pursue mobility model, and reference point group mobility model (RPGM). Some models (e.g., RPGM and RW) support 3D movements. During the generation of 3D movement, the negative z option controls the simulation area's depth (in meters). The scenarios can also be exported for several network simulators [24].
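To illustrate the kind of 3D traces such models produce for the UAV layer, the sketch below implements a plain 3D random-waypoint generator; it is a generic stand-in written for this context, not BonnMotion's own code or command-line interface.

```python
import random

def random_waypoint_3d(n_nodes=5, duration=300.0, area=(1000.0, 1000.0, 150.0),
                       speed=(5.0, 15.0), seed=7):
    """Return, per UAV node, a list of (time, x, y, z) waypoints inside a 3D box."""
    rng = random.Random(seed)
    traces = []
    for _ in range(n_nodes):
        t = 0.0
        pos = [rng.uniform(0, d) for d in area]
        trace = [(t, *pos)]
        while t < duration:
            target = [rng.uniform(0, d) for d in area]          # next waypoint
            dist = sum((a - b) ** 2 for a, b in zip(pos, target)) ** 0.5
            t += dist / rng.uniform(*speed)                     # travel time at a random speed
            pos = target
            trace.append((round(t, 2), *pos))
        traces.append(trace)
    return traces

print(random_waypoint_3d(n_nodes=1, duration=60.0)[0][:3])      # first few waypoints of node 0
```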
6 Network Simulators
Network simulation tools are designed to simulate the data communication inside a network. The majority of network simulators provide discrete-event simulation. Preferably, all components participating in the network communication system must be simulated, and the outcome must include information for calculating critical network-level and performance metrics like link status, device status, packet delivery rate, packet drop rate, and signal-to-noise ratio (SNR). Additionally, trace and log files must be generated to record the event timeline; these files may be processed for further investigation [25] (a small post-processing sketch is given below).
Network simulators can be categorized as (1) open-source and (2) commercial simulators. Open-source network simulators are free to academics and researchers, with no fees for using the simulator software, whereas commercial simulators can only be used with a genuine license. Commercial simulators have benefits like continuous dedicated professional support and proper documentation. Open-source network simulators offer advantages as they are more adaptable to represent recent technologies and topologies than licensed simulators, but they lack documentation and user support [10].
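As a small example of the post-processing mentioned above, the sketch below computes two of the listed metrics, packet delivery rate and average end-to-end delay, assuming the trace file has already been parsed into send/receive timestamps keyed by packet id.

```python
def packet_delivery_rate(sent, received):
    """sent/received: dicts mapping packet id -> timestamp (seconds)."""
    return len(received) / len(sent) if sent else 0.0

def average_delay(sent, received):
    delays = [received[p] - sent[p] for p in received if p in sent]
    return sum(delays) / len(delays) if delays else 0.0

sent = {1: 0.00, 2: 0.10, 3: 0.20, 4: 0.30}
received = {1: 0.05, 2: 0.18, 4: 0.41}        # packet 3 was dropped
print(packet_delivery_rate(sent, received))   # 0.75
print(average_delay(sent, received))          # ~0.08 s
```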
Numerous network simulators are utilized to simulate ad hoc networks. Some open-source network simulators which can be employed for the simulation of UAVassisted VANETs are explored in this section. A comparative study of the same has been depicted in Table 2.
6.1 NS-2
Network simulator (NS) version 2 is a discrete-event simulator developed by a study group at the University of California, Berkeley, for a project known as "Virtual inter network testbed" (VINT). The simulator was later expanded by the Monarch research group at Carnegie Mellon University to include: (1) node mobility, (2) a radio propagation model, (3) communication with the radio network, and (4) support for media access control (MAC) layer protocols. NS-2 uses C++ to specify the internal structure of the simulation, and OTcl is used to control the external simulation environment during the configuration of the objects. In terms of software installation, NS-2 has several constraints while operating on Windows [26].
6.2 NS-3
NS-3 is a discrete-event simulator designed to simulate both internet protocol (IP) and non-IP-based systems. It is primarily intended for research and instructional purposes. NS-3 is open-source and highly modular. It uses the C++ and Python programming languages; networks may be constructed in pure C++, and some simulation portions can optionally be written in Python. Unlike NS-2, it does not use the OTcl APIs. NS-3 is highly scalable and supports wired and wireless technologies. Additionally, networks with a 3D mobility model can be easily simulated [27].
6.3 OMNeT++ OMNeT++ is an open-source network simulator developed by Andras Varga of the Budapest University of Technology. It uses a C++ simulation toolkit and framework that is flexible and robust. It is a discrete event simulation environment that represents significant communication networks. Because of its general and scalable design, it is widely employed for the simulation of various other applications like multiprocessor modeling, complex IT model, and queuing systems. This simulator bridges the gap between the open-source NS-2 and the commercial simulator optimum network performance (OPNET) by providing a free tool with various features, including modular, extendable, and component-based design. The components and modules
of OMNeT++ are developed in C++ utilizing the simulation kernel's class library [28].

Table 2 Available network simulators

Network simulator | Timing | Open source | Languages supported | Operating system | Graphical interface | Scalability | Visualization | Applicability
NS-2 | Discrete event | Yes | C++, OTCL | Linux, Windows | Limited | Limited | 2D | MANET, VANET, FANET, WSN, IP-based networks
NS-3 | Discrete event | Yes | C++, Python | Linux | Yes | Limited | 3D | MANET, VANET, FANET, WSN, and IoT
OMNeT++ | Discrete event | Yes | C++ | Linux, Windows, MAC | Yes | Moderate | 3D | MANET, VANET, FANET, NFV, SDN, WSN, body area networks
GlomoSim | Discrete event | Yes | C | Linux, Windows, Sun | No | Large | 2D | MANET, VANET
NCTUns | Kernel reentering method | Yes | C++ | FreeBSD, Fedora, Red Hat, Ubuntu, Debian | Yes | Moderate | 2D | Wired, wireless, ad hoc, and WSN
GrooveNet | Hybrid simulator | Yes | C++ | Linux, Ubuntu, SUSE | Yes | Large | 2D | Realistic network and traffic simulation
TraNS | Discrete event | Yes | C++, Java | Linux, Windows | Yes | Large | 2D | VANET
J-Sim | Diverse numerical method | Yes | Java | Windows, Linux | Yes | Moderate | 2D | WSN and VANET
6.4 GlomoSim
Global mobile information system simulator (GlomoSim) is an open-source network simulator frequently utilized after NS-2. It was created at the University of California, Los Angeles (UCLA), USA, with the primary goal of simulating wireless networks. GlomoSim was developed using Parsec; therefore, all new upgrades must also be defined in Parsec. GlomoSim originally had no support for a graphical interface; later, it was augmented with a Java-based interface. GlomoSim can work on a shared-memory symmetric processor (SMP): a memory accessible to all programs simultaneously, which assists in program splitting. GlomoSim adheres to the OSI layer paradigm and offers a variety of protocols and templates for each tier. GlomoSim includes two-ray as well as free-space radio propagation models. It was designed to accommodate millions of nodes in a single simulation, which is credited to its parallelism approach. After GloMoSim 2.0 in 2000, work on the freeware software was discontinued; hence QualNet, a commercial version of GloMoSim, was released [10].
6.5 NCTUns National Chiao Tung University network simulator (NCTUns) is an advanced tool that is a hybrid of both simulator and emulator applicable for both wired and wireless networks [29]. The NCTUns was developed to overcome the drawbacks of the Harvard network simulator [30].
6.6 GrooveNet GrooveNet is a hybrid open-source network traffic simulation tool developed for geographic routing in a realistic network scenario. It is intended to be an opportunistic broadcast protocol with minimum handshaking between transmitting and receiving vehicles. It enables communication between simulated and real vehicles [25].
6.7 TraNS Traffic and network simulation environment (TraNS) [31] is a hybrid GUI-based simulation tool that combines a mobility generator with a network simulator (SUMO
and NS-2) to create realistic VANET simulations. This design enables users to create mobility traces before running network simulations. TraNs is written in C++ and Java and can run on both Linux and Windows systems. It supports two modes of operation: network-centric and application-centric [10].
6.8 J-Sim
J-Sim is a fully Java-based open-source simulation platform [32]. J-Sim has two types of mobility models: random waypoint and trajectory-based. J-Sim is an excellent alternative to NS-2 because it is comparatively easier to use. However, it has not been updated since 2006 [33].
While selecting a mobility generator tool and network simulator for UAV-assisted VANETs, we must choose tools that can simulate a realistic mobility model for both VANETs and UAVs. The following factors must be considered during tool selection for accurate and realistic vehicular movements in 2D VANETs and 3D networks of UAVs. Node density, the variable speed of vehicles, obstacles to mobility, weather conditions, and traffic conditions such as weekends and rush hours play an essential role. Additionally, apart from static hindrances, drivers face many other complex hindrances like neighboring vehicles and pedestrians. As a result, the mobility model must be capable of managing all of the network's external factors. Table 3 presents a snapshot of the simulation tools used for respective applications found in the literature. Based on multiple features, the comparisons of the various mobility generator tools and network simulators are presented in Tables 1 and 2, respectively. Furthermore, some significant instances of the simulation of UAV-assisted VANETs found in the literature are presented in Table 3. It has been observed that, in UAV-assisted VANETs, SUMO and NS-2 are the most widely utilized tools for mobility generation and network simulation, respectively.
Table 3 Applications of simulators

Literature | Mobility simulator | Network simulator | Application(s)
[34] | BonnMotion | NS-2 | ITS for smart cities
[13] | SUMO | NS-2 | Collaborative-based drone-assisted VANET networking model
[35] | SUMO | NS-2 | Simulated a reactive routing protocol for UAV-assisted VANET
[12] | SUMO | NS-2 | Urban road safety using UAV-assisted VANET
[36] | NA | Opportunistic Network Environment | Relay selection protocol
[37] | SUMO (Manhattan area) | NS-3 | A lightweight and efficient framework for UAV-assisted VANET
[5] | SUMO | NS-2 | Detection of selfish and malicious nodes
[38] | VanetMobiSim and MobiSim | NS-2 | UVAR routing protocol for UAV-assisted VANET

7 Conclusion and Future Work
Owing to fixed roadside infrastructure and obstacles, VANETs suffer from numerous security issues. These issues can be detected and prevented by assisting a pre-deployed VANET with UAVs. Such systems can improve overall performance and provide security against various network attacks. The research proposals for such networks must be tested using simulation tools while designing communication or security protocols for UAV-assisted VANETs. Software-based simulations provide an alternative method of obtaining the desired outcomes. Many open-source and commercial simulators exist for VANETs and other ad hoc networks in the literature and in practice. Commercially licensed simulators are not considered in this study because of their copyright nature and resistance to changing their source code. For the selection of simulators for UAV-assisted VANETs, some of the potential open-source 2D and 3D mobility generator tools and network simulators having support for the simulation of UAV networks are studied and compared based on multiple criteria. After a detailed study, it has been concluded that SUMO is widely employed for 2D realistic mobility scenario generation and BonnMotion is the only open-source tool that can generate mobility traces based on a 3D mobility model, whereas NS-2 is the most widely utilized network simulator, with NS-3 and OMNeT++ being the most promising alternatives. This paper's purpose was to better understand the challenges faced during the selection of simulators while carrying out simulations of research proposals for UAV-assisted VANETs. In the future, we will extend the work on VANET security through UAV assistance. The tools selected based on this study shall be used to evaluate the efficacy and trustworthiness of the proposed countermeasures.
References 1. Gillani M, Niaz HA, Farooq MU, Ullah A (2022) Data collection protocols for VANETs: a survey. Complex Intell Syst 1–30 2. Pande SD, Bhagat VB (2016) Hybrid wireless network approach for QoS. Int J Recent Innov Trends Comput Commun 4:327–332
3. Zhang S, Lagutkina M, Akpinar KO, Akpinar M (2021) Improving performance and data transmission security in VANETs. Comput Commun 180:126–133 4. Jang H-C, Li B-Y (2021) VANET-enabled safety and comfort-oriented car-following system. In: 2021 International conference on information and communication technology convergence (ICTC), IEEE, pp 877–881 5. Kerrache CA, Lakas A, Lagraa N, Barka E (2018) UAV-assisted technique for the detection of malicious and selfish nodes in VANETs. Veh Commun 11:1–11 6. Gamess E, Veracoechea C (2010) A comparative analysis of network simulation tools. In: MSV, pp 84–90 7. Hentati AI, Krichen L, Fourati M, Fourati LC (2018) Simulation tools, environments and frameworks for UAV systems performance analysis. In: 2018 14th International wireless communications and mobile computing conference (IWCMC), IEEE, pp 1495–1500 8. Bakare BI, Enoch JD (2019) A review of simulation techniques for some wireless communication system. Int J Electron Commun Comput Eng 10:60–70 9. Alameri IA, Komarkova J (2020) A multi-parameter comparative study of manet routing protocols. In: 2020 15th Iberian conference on information systems and technologies (CISTI), IEEE, pp 1–6 10. Aljabry IA, Al-Suhail GA (2021) A survey on network simulators for vehicular Ad-hoc networks (VANETS). Int J Comput Appl 174:1–9 11. Zeng F, Zhang R, Cheng X, Yang L (2018) UAV-assisted data dissemination scheduling in VANETs. In: 2018 IEEE International conference on communications (ICC), IEEE, pp 1–6 12. Jobaer S, Zhang Y, Iqbal Hussain MA, Ahmed F (2020) UAV-assisted hybrid scheme for urban road safety based on VANETs. Electronics 9:1499 13. Lin N, Fu L, Zhao L, Min G, Al-Dubai A, Gacanin H (2020) A novel multimodal collaborative drone-assisted VANET networking model. IEEE Trans Wireless Commun 19:4919–4933 14. Oubbati OS, Chaib N, Lakas A, Lorenz P, Rachedi A (2019) UAV-assisted supporting services connectivity in urban VANETs. IEEE Trans Veh Technol 68:3944–3951 15. Choi J, Marojevic V, Dietrich CB (2018) Measurements and analysis of DSRC for V2T safetycritical communications. In: 2018 IEEE 88th Vehicular technology conference (VTC-Fall), IEEE, pp 1–5 16. Salem AOA, Awwad H (2014) Mobile ad-hoc network simulators, a survey and comparisons. Int J P2P Netw Trends Technol (IJPTT) 9:12–17 17. Bakni M, Cardinale Y, Moreno L (2018) An approach to evaluate network simulators: an experience with packet tracer. Rev Venezolana de Comput 5:29–36 18. Vahdatikhaki F, El Ammari K, Langroodi AK, Miller S, Hammad A, Doree A (2019) Beyond data visualization: a context-realistic construction equipment training simulators. Autom Constr 106:102853 19. Oliveira A, Vazão T (2021) Generating synthetic datasets for mobile wireless networks with sumo, In: Proceedings of the 19th ACM international symposium on mobility management and wireless access, pp 33–42 20. Aljeri N, Boukerche A (2022) Smart and green mobility management for 5G-enabled vehicular networks. Trans Emerg Telecommun Technol 33:e4054 21. Boucetta SI, Guichi Y, Johanyák ZC (2021) Review of mobility scenarios generators for vehicular ad-hoc networks simulators. In: Journal of physics: conference series, IOP Publishing, p 012006 22. Kezia M, Anusuya KV (2022) Mobility models for internet of vehicles: a survey. Wirel Pers Commun 1–25 23. Puttagunta H (2021) Simulators in vehicular networks research and performance evaluation of 802.11 p in vehicular networks (PhD Thesis), University of Cincinnati 24. 
Pandey MR, Mishra RK, Shukla AK (2022) An improved node mobility pattern in wireless ad hoc network. In: Applied information processing systems, Springer, pp 361–370 25. Walia AK, Chhabra A, Sharma D (2022) Comparative analysis of contemporary network simulators. In: Innovative data communication technologies and application, Springer, pp 369–383
26. Temurnikar A, Verma P, Choudhary JT (2021) Design and simulation: a multi-hop clustering approach of VANET using SUMO and NS2. In: Soft computing for problem solving, Springer, pp 555–569 27. Sokolova O, Rudometov S (2021) Simulation of data transmission among moving nodes. In: 2021 17th International Asian school-seminar optimization problems of complex systems (OPCS), IEEE, pp 117–120 28. Lakhwani K, Singh T, Aruna O (2022) Multi-layer UAV ad hoc network architecture, protocol and simulation. Artif Intell Tech Wirel Commun Netw 193–209 29. Babu S, Raj Kumar PA (2022) A comprehensive survey on simulators, emulators, and testbeds for VANETs. Int J Commun Syst e5123 30. Wang S-Y, Chou CL, Huang CH, Hwang CC, Yang ZM, Chiou CC, Lin CC (2003) The design and implementation of the NCTUns 1.0 network simulator. Comput Netw 42:175–197 31. Piorkowski M, Raya M, Lugo AL, Papadimitratos P, Grossglauser M, Hubaux J-P (2008) TraNS: realistic joint traffic and network simulator for VANETs. ACM SIGMOBILE Mob Comput Commun Rev 12:31–33 32. Sobeih A, Chen W-P, Hou JC, Kung L-C, Li N, Lim H, Tyan H-Y, Zhang H (2005) J-sim: a simulation environment for wireless sensor networks In: 38th Annual simulation symposium, IEEE, pp 175–187 33. Tarapiah S, Aziz K, Atalla S (2017) Analysis the performance of vehicles ad hoc network. Proc Comput Sci 124:682–690 34. Raza A, Bukhari SHR, Aadil F, Iqbal Z (2021) An UAV-assisted VANET architecture for intelligent transportation system in smart cities. Int J Distrib Sens Netw 17:15501477211031750 35. Sami Oubbati O, Chaib N, Lakas A, Bitam S, Lorenz P (2020) U2RV: UAV-assisted reactive routing protocol for VANETs. Int J Commun Syst 33:e4104 36. He Y, Zhai D, Wang D, Tang X, Zhang R (2020) A relay selection protocol for UAV-assisted VANETs. Appl Sci 10:8762 37. Sedjelmaci H, Messous MA, Senouci SM, Brahmi IH (2019) Toward a lightweight and efficient UAV-aided VANET. Trans Emerg Telecommun Technol 30:e3520 38. Oubbati OS, Lakas A, Lagraa N, Yagoubi MB (2016) UVAR: an intersection UAV-assisted VANET routing protocol. In: 2016 IEEE wireless communications and networking conference, IEEE, pp 1–6
A Comparative Analysis of Various Methods for Attendance Framework Based on Real-Time Face Recognition Technology A. M. Jothi, Sandeep Kumar Satapathy, and Shruti Mishra
Abstract Face recognition technology plays a major role in the identification of a person and is a subset of machine vision. Managing attendance manually becomes a time-consuming and challenging operation in our day-to-day life. Real-time face recognition can be used to build an automatic attendance system which can be utilized by schools, offices, and colleges to mark attendance without any human intervention. This paper provides a literature survey of papers associated with face recognition-based attendance technology. In this study, the major contribution of our work lies in the comparative learning between machine learning and deep learning approaches implemented to recognize human faces and give the attendance of candidates or employees based upon it; it also includes an in-depth analysis of different face recognition techniques, related discussions, and hints for future work. Keywords Face recognition · Machine learning · Local binary pattern histogram · Deep learning · VGG19 · FaceNet · Attendance
1 Introduction Attendance is one of the significant component in any organization based on which the performance of an individual is followed. Traditional way of giving attendance is a complicated task, and there may be chances of making errors while marking attendance manually. Hence to automate this type of attendance framework, face recognition automation can be used. It is the mix of machine vision and artificial intelligence. As a result of its tremendous testing advancement and wide application A. M. Jothi · S. K. Satapathy (B) · S. Mishra School of Computer Science and Engineering, Vellore Institute of Technology University, Chennai, India e-mail: [email protected] A. M. Jothi e-mail: [email protected] S. Mishra e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_40
possibilities, it has turned out to be the most challenging technical topic among the researchers. The main goal of this paper is to perform a comparative learning between machine learning and deep learning techniques so as to mark the attendance automatically. Face recognition using local binary pattern histogram (LBPH), VGG19 transfer learning model, and deep neural network FaceNet model was compared and evaluated using the own dataset created. The face detection process for all the models is carried out using Haar cascade classifier. In testing phase, the input test image is captured from the live Web camera for all the frameworks. Finally, attendance for all the candidates will be given based on the recorded video. This paper is separated into various divisions as follows: Related works of face recognition are explained in Sect. 2. A brief description of comparison of various approaches for attendance framework based on real-time face recognition technology is included along with its system architecture and process flow in Sect. 3. We have described about the experiments performed, performance analysis, results obtained, and discussions toward the results in Sect. 4. The summary of the key aspects of the paper provides conclusion along with hints for the upcoming work of the study which has been included in Sect. 5.
2 Related Work In this, we would like to discuss about the previous work done by the earlier researchers in order to detect and recognize the faces, so that we may come to know about their proposed methodologies and also the results observed. Yang and Han [1] proposed the attendance framework with real-time video proceeding and face recognition time. The system chooses Gabor features along with Fisher-based discriminant analysis strategy formed on equilateral basis to become a linear discrimination technique. Experimental results showed that this video face recognition framework has attained an accuracy of up to 82%. AlMaadeed, Mahmood, and Uzair [2] proposed a novel multi-order statistical signifiers and to reduce the dimensionality of the signifiers, and methods like MLDA and KLDA are performed with the polynomial kernels. Probes six benchmark datasets approve that the proposed technique accomplishes altogether better characterization exactness with minor computational intricacy than the previous procedures. Mahmood and Abuznied [3] proposed an improved system for human face recognition utilizing a BPNN neural network, LBPH signifier, and multi-KNN. Awais et al. [4] proposed a video surveillance system which utilizes histogram of oriented gradients (HOG) features and feedforward backpropagation neural network classifier. The main piece of this framework comprises of face localization, detection, and recognition. Zhao et al. [5] used the OpenCV face Haar-like features to recognize face part; for face features extraction, the principal component analysis (PCA) was utilized, and Euclidean distance was also included. The experimental outcomes indicated that
the framework has steady activity, and high recognition frequency can be utilized in movable and mobile ID and validation. Vázquez et al. [6] used the neural network modular framework, and the best identification was acquired with the highest accuracy of 98.2667%. Bah and Ming [7] proposed a new strategy utilizing local binary pattern (LBP) method [8] along with innovative image progressing techniques like contrast alteration, bilateral filter, histogram coordination, and image merging. Colmenarez and Huang [9] described a visual learning method for face detection and exact facial outline tracking/detection. A 2D- template matching is the fast technique for face recognition. Huang and Hu [10] proposed a novel face detection method formed on facial features and Hough transform. In the first place, it utilizes geometry similarity and computes the mouth area. Then, at that point, as indicated by the triangle connection among mouth and eyes, it make a check of face revolution. At last, it understands the human face detection precisely. Mei et al. [11] proposed a novel dimensionality reduction technique–multidimensional orthogonal subspace projection (MDSP), and the technique utilizes a fresh projection technique tensor to vector projection (TVP). Yuan [12] proposed a visual attention structure guidance module that utilizes the visual attention machine to direct the framework feature the clear region of the blocked face; the identification issue of face is streamlined into the significant level linguistic element recognition issue through the upgraded analytical network, and the position and proportion of the face are anticipated by the activation map to stay away from extra parameter settings. Xu and Wang [13] proposed an attendance detection framework that involves face recognition automation and network remote technology, that is, fed to the smart class management framework to mark candidates attendance. The framework involves AdaBoost algorithm based on Haar features and LBP features. In [14, 15], the authors utilized three pre-trained convolutional neural networks like SqueezeNet, GoogleNet, and AlexNet. The authors proposed a system which involves CNN, SVM, KNN, Gabor filters, and generative adversarial networks in [16].
3 Comparison of Approaches for Attendance Framework Based on Real-Time Face Recognition Technology In our work, we perform a comparative learning between machine learning and deep learning algorithms to recognize the human faces in real time and mark the attendance automatically. A. Face Detection Using Haar Cascade Classifiers and Face Recognition Using Local Binary Pattern Histogram This machine learning approach applies Haar cascade classifier for face detection combined with local binary pattern histogram (LBPH) technique for face recognition. Faces will be recognized utilizing a live stream video, and it will matched with faces
Fig. 1 Process flow diagram
available in the dataset folder. The attendance will be updated in the attendance_files folder when faces got matched. Figure 1 shows the process flow diagram of this approach. Dataset Generation: We generated our own dataset which included 50 images of each for 10 individuals. Pictures of individuals are caught utilizing a Web camera. Various pictures of an individual will be captured with different signs and positions. These pictures go through pre-processing. The face pictures are trimmed to acquire the region of interest (ROI). The upcoming stage is to rescale the trimmed pictures to specific pixel area. Then, these pictures will be changed over from RGB to gray scale. As a next step, these images will be stored in the dataset folder as shown in Fig. 2. Face Detection Process: The process is executed utilizing Haar cascade classifier along with Python language and OpenCV. Haar cascade method has to be trained first, and then, it would be utilized for face detection. This process is done to extract the face features. The training data of Haar cascade utilized here is haarcascade_frontalface_default XML file. The Haar features applied for feature extraction are shown in Fig. 3 [14]. Here, we have applied detectMultiScale function which creates a rectangle on all sides of the face in a picture. The specifications applied in this process are minNeighbors and scaleFactor with the values 5 and 1.3 accordingly.
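As an illustration of the detection step described above, a minimal OpenCV sketch is given below (not the authors' exact code); the image path and the 200 × 200 ROI size are assumptions, while scaleFactor = 1.3 and minNeighbors = 5 follow the values quoted in the text.

```python
import cv2

# Load OpenCV's bundled pre-trained frontal-face Haar cascade.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("sample_capture.jpg")            # hypothetical captured frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)      # detection works on grayscale

# Detect faces; the parameter values follow the text.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

for (x, y, w, h) in faces:
    roi = cv2.resize(gray[y:y + h, x:x + w], (200, 200))   # cropped, rescaled ROI
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```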
Fig. 2 Dataset generated in the dataset folder
Fig. 3 Haar features for feature extraction
Face Recognition Process: This cycle is separated into several stages—train_set data preparation, train our dataset with face recognizer, prediction of input test image. The training information is the pictures available in the collection_dataset folder. They will be allotted with a number tag of the individual it is associated to. These pictures are later utilized for recognition process. Recognizer utilized in this framework is local binary pattern histogram. At first, the record of local binary patterns of whole face is gained. These LBPs are changed over into decimal digit, and afterward, histograms of every one of those decimal values are built. Toward the conclusion, a histogram will be framed for each pictures in the training set. Then in the time of face recognition, histogram of the face to be matched is determined and later contrasted with the previously figured histograms and restores the most paired tag related with the individual it is associated to. Attendance Marking: Following face recognition cycle, attendance of the particular candidate will be saved in attendance_files folder as an excel sheet with columns like name and present status along with the current date on the top of it. In the attendance sheet, the matched faces will be updated as Yes, and the absentees will be marked as No under present column. B. VGG19 Transfer Learning Model The attendance face recognition framework starts with the training dataset phase. Before the dataset is handled utilizing transfer learning, the images from the dataset
are resized to change the size of the transfer learning model that has earlier been trained utilizing ImageNet information. The dataset is trained utilizing transfer learning VGG19 model only after data augmentation process. The outcome is an h5 model, which will be assessed to observe the best model in view of loss and accuracy values. In the testing stage, input test image is captured and resized. If the predicted value is greater than 0.5, the face image is matched, the recognized face with their name gets displayed on the video frame, and the attendance is given based on the recorded video. In case, if it is lower than 0.5, the image is unmatched, and no face found will be displayed on the video frame. Figure 4 shows the system flow diagram of this approach. Dataset Description: We generated our own dataset which included 200 images of each individual which was saved as train folder as well as 50 images were saved as test folder, all together were stored in the DatasetCollection folder. Pictures of individuals are caught utilizing a Web camera. For each individual, 80% of the data
Fig. 4 System flow diagram
are used as train_set, and the rest 20% were used as test_set. Prior to transfer learning process, images are resized to 224 × 224 pixels. Data Augmentation: ImageDataGenerator was used for data augmentation process. In the train_set, the parameters like rescale, horizontal flip, zoom range, and shear range with values 1./255, true, 0.2 and 0.2, respectively was used for the augmentation process. As a augmentation process, only the rescale parameter with value 1./255 was used in the test_set. Transfer learning process: The VGG19 architecture is a convolutional neural network (CNN) that is 19 layers deep, and it has proactively won the ImageNet 2014 contest. The top layers of the VGG19 model have been removed, and the additional layers have been connected above it, and then, the model was trained with our dataset. In this model, softmax layer is used for multi-classification of human face recognition. The VGG19 system architecture is shown in Fig. 5. C. Deep Neural Network Facenet Model In this approach, the first step is to produce the database of face-signatures for which the passport photos of all the candidates must be collected. The candidate photos are given as input into the Haar cascade classifier to capture their faces; then, each face is fed into the pre-trained network FaceNet model, and all the face-signatures will be stored into the database. In testing phase, the input test image is captured from the live Web camera. The input is fed into the Haar cascade classifier and then to FaceNet which gives a face-signature as a result. This face-signature is then compared with the signatures already in the database. Finally, the closest similarity will be found, and the attendance will be recorded. The system architecture of this approach is shown in Fig. 6.
Fig. 5 VGG19 architecture
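A minimal Keras sketch of the VGG19 transfer-learning setup described in subsection B is given below; it is an illustrative approximation rather than the authors' code, and the directory names, the frozen base, and the single softmax layer on top of the flattened VGG19 features are assumptions. The augmentation parameters, image size, batch size, and epoch count follow the values stated in the text.

```python
from tensorflow.keras.applications import VGG19
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import layers, models

# Augmentation parameters follow the text; directory names are illustrative.
train_gen = ImageDataGenerator(rescale=1./255, horizontal_flip=True,
                               zoom_range=0.2, shear_range=0.2)
test_gen = ImageDataGenerator(rescale=1./255)

train_data = train_gen.flow_from_directory("DatasetCollection/train",
                                           target_size=(224, 224), batch_size=32)
test_data = test_gen.flow_from_directory("DatasetCollection/test",
                                         target_size=(224, 224), batch_size=32)

# VGG19 without its top layers, frozen, with a new softmax head added on top.
base = VGG19(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(train_data.num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_data, validation_data=test_data, epochs=5)
model.save("face_vgg19.h5")      # the h5 model mentioned in the text
```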
Fig. 6 FaceNet system architecture
Face Detection Process: This process is executed utilizing Haar cascade classifier along with Python language and OpenCV. Haar cascade method has to be trained first, and then, it would be utilized for face detection. This process is known as extraction of face features. The training data of haar cascade utilized here is haar cascade_frontalface_default XML file. Here, we have applied detectMultiScale function which creates a rectangle on all sides of the face in a picture. The specifications
applied in this process are minNeighbors and scaleFactor with the values 5 and 1.3 accordingly. Attendance Based on Face Recognition using FaceNet Model: In face recognition, we use a pre-trained network called FaceNet which is a deep neural network that takes input as the picture of an individual’s face and delivers a vector embedding of 128 numbers, which are then projected in a high-layered Euclidean space. Here, the distance between focuses compares to a proportion of face comparability. To find the face similarity between the input image face-signature and signatures in the database, Frobenius Norm is used where the difference between signature and value, square of the difference, its root is calculated for all the 128 numbers, the summation from 1 to 128 is applied, and then, the average is calculated. Finally, if the smallest norm is found, the candidate is recognized by the framework, otherwise the candidate is not in the database. Attendance for all the candidates will be given based on the recorded video.
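The signature-matching step can be illustrated with the short sketch below; it only shows the comparison of 128-dimensional embeddings by the Frobenius (L2) norm described above, and the distance threshold used to declare a face unknown is an assumed value.

```python
import numpy as np

def best_match(test_signature, database, threshold=10.0):
    # `database` maps candidate names to stored 128-D face signatures.
    # `threshold` is an assumed cut-off for declaring a face unknown.
    best_name, best_dist = None, float("inf")
    for name, signature in database.items():
        dist = np.linalg.norm(test_signature - signature)  # Frobenius / L2 norm
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist < threshold else "unknown"
```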
4 Results and Discussion Experiments were performed with the machine learning framework utilizing both Haar cascade classifiers and local binary pattern histogram as described in the previous section. When generate_dataset file was run and executed, Webcam got opened and it automatically started capturing photos until 50 images of that individual are collected or q is pressed and showed the notification message ‘successfully captured’. The 50 samples of each individual were stored into the dataset folder along with the integer label. Then, the model was trained with the dataset. Finally, the face recognition step where the particular individual was recognized is shown in Fig. 7, and attendance gets updated accordingly. The Fig. 8 shows the excel sheet where the attendance is marked next to recognition procedure. Recognized candidates are marked as ‘Yes’, and rest are marked as ‘No’. In face recognition process, we examined 2 m as the distance of the candidate face to be recognized. In Table 1, the performance evaluation of the framework is shown where the recognition rate and false positive rate under brighter conditions are 80% and 20%, whereas under darker regions, it is resulted as 40% and 30%, respectively. If the confidence value is greater than 85, then that candidate is said to be an unknown individual. For experimenting the transfer learning VGG19 model, we have done the implementation utilizing libraries like keras, OpenCV, and PIL with the help of Python language and Google Colab GPU. We have set the parameters like batch size to be 32 and epoch to be 5 to check the accuracy and loss values for training and validation of the model. The training and testing process include total params to be 20,074,562, trainable params to be 50,178, and non-trainable params to be 20,024,384. The dataset collection output is shown in Fig. 9.
Fig. 7 Face recognition
Fig. 8 Excel sheet after attendance updation
Table 1 Performance analysis

Performance analysis | Rate (%)
Recognition rate (brighter conditions) | 80
False positive rate (brighter conditions) | 20
Recognition rate (darker regions) | 40
False positive rate (darker regions) | 30
Recognition rate for unknown faces | 60
False positive rate for unknown faces | 15
Fig. 9 Dataset collection for VGG19 model
Figures 10 and 11 show the training and validation loss and accuracy graphs. The results show that on the last epoch, training loss is 0.0317, accuracy is 1.0000, and validation loss and accuracy were 0.3320 and 0.8950. The model has been tested with new input test image captured from the live Web camera, the face got matched as the predicted value was greater than 0.5, and hence, the recognized face along with their respective name has been displayed on the video frame as shown in Fig. 12. The face recognition attendance framework using FaceNet model has been experimented utilizing Google Colab GPU and coding with Python. The libraries included are keras, OpenCV, PIL, and NumPy. We include FaceNet pre-trained model from keras, which is a combination for feature extraction and human face classification. As a first step, passport photos of all the candidates have been collected and stored in FaceNetDatabasePhotos folder along with their names as labels, and it also included a photo labeled as unknown. This dataset is then trained with the framework which results in face-signatures of each candidate being stored into the database file. In testing phase, a input test image has been captured from the Web camera, and the face-signature value has been
Fig. 10 Validation and training loss
Fig. 11 Validation and training accuracy
acquired for this input image. To find the closest similarity of faces, the Frobenius norm has been utilized as the distance calculation method. Finally, the attendance is marked based on the video recorded. The face recognition carried out under brighter lighting conditions is shown as a sample output in Fig. 13. An example output screenshot for more than one face recognized using the FaceNet model is shown in Fig. 14. The face recognition using the FaceNet network also performs well even if it is carried out in darker regions, as shown in Fig. 15.
Fig. 12 Face recognition output for VGG19 model
Fig. 13 Face recognition output for FaceNet model
5 Conclusion and Future Work Attendance is a significant factor in any association in light of which the performance of an individual is followed. Physical way of taking attendance is a timeconsuming task; hence, attendance can be given automatically by utilizing the face recognition technology which is a part of computer vision. In this paper, we have done a comparative learning between machine learning and transfer learning face recognition approaches to give the attendance for the candidates automatically.
Fig. 14 Example screenshot for group of faces
Fig. 15 Face recognition output for FaceNet model under darker conditions
According to the experiments followed and results obtained, face recognition method using FaceNet has outperformed other models like VGG19 and local binary pattern histogram as it was able to recognize similar faces correctly even though if the candidate was captured in different positions, expressions, or gestures, and it was also able to grasp with various lightning conditions and changes. The novelty of this research is to replace the manual attendance methods with pre-trained neural networks and transfer learning techniques for face recognition to mark attendance along with features like data augmentation and additional layers on the top using our own dataset so as to increase the performance of the framework. Upcoming research instructions for face recognition techniques are listed below:
(1) In bank lockers, for access control verification and identification of authentic users. (2) Monitoring candidates during competitive examinations. (3) To identify duplications in the voting systems. (4) In government and private sectors. This work can be stretched out by researching more pre-trained neural network frameworks and by involving more human facial picture information. It is intriguing to research assigning these frameworks to masked face human verification proof undertakings.
References 1. Yang H, Han X (2020) Face recognition attendance system based on real-time video processing. IEEE Access 8:159143–159150. https://doi.org/10.1109/ACCESS.2020.3007205 2. Mahmood A, Uzair M, Al-Maadeed S (2018) Multi-order statistical descriptors for real- time face recognition and object classification. IEEE Access 6:12993–13004. https://doi.org/10. 1109/ACCESS.2018.2794357 3. Abuzneid MA, Mahmood A (2018) Enhanced human face recognition using LBPH descriptor, multi-KNN, and back-propagation neural network. IEEE Access 6:20641–20651. https://doi. org/10.1109/ACCESS.2018.2825310 4. Awais M et al (2019) Real-time surveillance through face recognition using HOG and feedforward neural networks. IEEE Access 7:121236–121244. https://doi.org/10.1109/ACCESS. 2019.2937810 5. Zhao H, Liang XJ, Yang P (2013) Research on face recognition based on embedded system. Math Prob Eng 2013(6) Article ID 519074. https://doi.org/10.1155/2013/519074 6. Vázquez JC, López M, Melin P (2010) Real time face identification using a neural network approach. In: Melin P, Kacprzyk J, Pedrycz W (eds) Soft computing for recognition based on biometrics. Studies in computational intelligence, vol 312. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15111-8_10 7. Bah SM, Ming F (2020) An improved face recognition algorithm and its application in attendance management system. Array 5(100014) ISSN: 2590-0056. https://doi.org/10.1016/j.array. 2019.100014 8. Sarangi SK, Paul A, Kishor H, Pandey K (2021) Automatic attendance system using face recognition. In: 2021 International conference in advances in power, signal, and information technology (APSIT), pp 1–5. https://doi.org/10.1109/APSIT52773.2021.9641486 9. Colmenarez AJ, Huang TS (1998) Face detection and recognition. In: Wechsler H, Phillips PJ, Bruce V, Soulié FF, Huang TS (eds) Face recognition. NATO ASI Series (Series F: computer and systems sciences), vol 163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3642-72201-1_9 10. Huang H, Hu G (2009) A face detection based on face features. In: Cao B, Li TF, Zhang CY (eds) Fuzzy information and engineering, vol 2. Advances in intelligent and soft computing, vol 62. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03664-4_19 11. Mei M, Huang J, Xiong W (2018) A discriminant subspace learning based face recognition method. IEEE Access 6:13050–13056. https://doi.org/10.1109/ACCESS.2017.2773653 12. Yuan Z (2020) Face detection and recognition based on visual attention mechanism guidance model in unrestricted posture. Sci Program 2020(10) Article ID 8861987. https://doi.org/10. 1155/2020/8861987
13. Xu F, Wang H (2021) A discriminative target equation-based face recognition method for teaching attendance. Adv Math Phys 2021:11 Article ID 9165733. https://doi.org/10.1155/ 2021/9165733 14. Smitha PSH, Afshin (2020) Face recognition based attendance management system. Int J Eng Res Tehnol V9(05). https://doi.org/10.17577/ijertv9is050861 15. Alhanaee K, Alhammadi M, Almenhali N, Shatnawi M (2021) Face recognition smart attendance system using deep transfer learning. Proc Comput Sci 192:4093–4102, ISSN: 1877-0509. https://doi.org/10.1016/j.procs.2021.09.184 16. Dev S, Patnaik T (2020) Student attendance system using face recognition. Int Conf Smart Electron Commun (ICOSEC) 2020:90–96. https://doi.org/10.1109/ICOSEC49089.2020.921 5441
Identification of Efficient Industrial Robot Selection (IRS) Methods and Their Performance Analysis Sasmita Nayak, Neeraj Kumar, and B. B. Choudhury
Abstract The quick advancement of industrial robots and their utilization by manufacturing businesses for a variety of applications makes robot choice a critical task. As a result, the industrial robot selection (IRS) process for potential clients becomes greatly complicated, since they have to assess numerous parameters of the available robots. In this article, a new predictive model-based IRS technique is proposed, and six different optimization techniques are tested using industrial robot specifications. Partial least square regression (PLSR), principal component regression (PCR), the scaled conjugate gradient-based backpropagation method, gradient descent with momentum-based backpropagation, fuzzy topsis, and a case-based approach are the optimization models examined in this suggested study. As a whole, 11 distinct factors are taken into account as inputs in this suggested technique, and the Robot Rank (RR) is used to identify the best robot. Using the suggested method, the rank of the preferred industrial robot is determined relative to the best possible robot, providing an accurate benchmark for robot selection for the given application. Additionally, the mean square error (MSE), R-squared error (RSE), and root mean square error (RMSE) are used to evaluate the effectiveness of the robot selection methods. Keywords IRS · MSE · RMSE · RSE · PLSR · PCR · Gradient-based backpropagation · Scaled conjugate · Fuzzy topsis · Case-based approach
S. Nayak (B) Department of Mechanical Engineering, Government College of Engineering, Bhawanipatna, Odisha, India e-mail: [email protected] N. Kumar Department of Mechanical Engineering, Suresh Gyan Vihar University, Jaipur, India e-mail: [email protected] B. B. Choudhury Mechanical Engineering Department, Indira Gandhi Institute of Technology Sarang, Dhenkana, Odisha, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_41
1 Introduction Industrial robot systems mostly handle tasks that are difficult, although not impossible, for an individual robot to accomplish. To overcome the issues of individual industrial robot (IR) systems, researchers started investigating industrial robot selection approaches. Domains like welding, spray painting, cutting, and underwater and space exploration are among the few IRS application areas. It is believed that industrial robot systems provide redundancy and perform tasks in a reliable, faster, and cost-effective way as compared to IR systems. Many research articles address the concerns of the IRS and propose various IRS techniques. The robot selection process is complicated because robot performance is limited by many parameters and the tasks to be performed must be predicted in a dynamic environment. Coordination of robots, scheduling of tasks, and cooperation among robots are critical factors when designing multi-robot systems. The standard selection and task allocation of robots is a complicated and critical issue nowadays, since a huge number of robots with different attributes and varied performance are available in the market. So, many works of literature that address robot selection procedures are discussed here. In the article by Deng et al., weighted Euclidean distances are used against a weighted decision matrix [1]. Therefore, the ideal positive and ideal negative points do not relate to the weighted decision matrix. Another multi-criteria decision analysis (MCDM) method, proposed by Chen and Hwang for solving decision-making issues, stands on the concept that the preferred option requires the least distance from the ideal positive point and the farthest distance from the ideal negative point [2]. Similarly, Agrawal et al. [3] developed a method for robot selection in industrial applications; in this context, the industrial robot selection process is based on graph theory and matrix method concepts, where four attributes, including the degree of freedom (DOF), are identified and used. For this model, higher quantitative values and lower qualitative feature values are needed. In a similar context, Omoniwa et al. presented a multiple criteria robot selection problem (MCRSP) using gray relational analysis (GRA) [4]. Further, Offodile et al. prepared a mapping code and sorting scheme that was applied to preserve robot features and attributes in a robotic database and then picked out a robot with precise performance [5]. Rao and Padmanabhan suggested a process developed upon the digraph and matrix method for the valuation of suitable heavy-duty/industrial robots [6]. The digraph is developed upon the features/characteristics for the choice of robots along with their relative importance for the considered application. A step-by-step procedure for the valuation of a robot selection index was proposed through a graded process. Further, Boubekri et al. prepared an effective technique to choose the right kind of heavy-duty robots keeping in view the operational, structural, and economic elements of the choice procedure [7]. Agarwal et al. developed a multi-goal optimization for coalition formation by using evolutionary algorithms for the allocation of the task [8].
In the article by Lerman et al., dynamic task allocation is proposed as a requirement for multi-robot systems (MRS) functioning in a dynamic environment [9]. Multi-task choice, rather than multi-task work, was advocated by de Lope et al.,
which means that the agents or robots pick their own tasks or are assigned a task by a central supervisor [10]. Social-insect-style division of labor, the usefulness of a reinforcement learning algorithm based wholly on the learning automaton idea, and ant-colony-optimization-based deterministic algorithms all contribute to the power of the recommended methods. The purpose of this research article is to demonstrate the effectiveness of several robot selection algorithms for IRS. This article uses industrial specifications to assess the performance of six different optimization models, including partial least square regression (PLSR), principal component regression (PCR), the scaled conjugate gradient-based backpropagation algorithm, gradient descent with momentum-based backpropagation, fuzzy topsis, and a case-based approach. The robot selection approaches, result analysis, conclusion, and the future area of study are the first four components of this article, which are then followed by the reference section.
2 Robot Selection Techniques Utilizing common optimization techniques like partial least square regression (PLSR), principal component regression (PCR), the scaled conjugate gradient-based backpropagation algorithm, gradient descent with momentum-based backpropagation, fuzzy topsis, and a case-based approach, the proposed industrial robot selection scheme is validated. The effectiveness of each strategy was evaluated using a set of common inputs. The inputs and their ranges utilized to verify the effectiveness of the suggested optimization models are listed in Table 1 [11]. This collection of inputs is the primary requirement for selecting an industry-ready robot. The ranking strategy of the robots was recommended with regard to the essential parameters of the robot indicated in Table 2 [11].

Table 1 Primary criteria for choosing a robot fit for industry

Sl. No | Parameter | Values
1 | Minimum reach | Minimum 500 mm
2 | Minimum load | Minimum 10 kg
3 | Range of the repeatability | ± 0.1 mm
4 | Production rate per hour | ≥ 25 tasks/hour
5 | Minimum velocity | Minimum 255 mm/s
6 | Range of the degree of freedom | From 1 to 7
7 | Controller type | From 1 to 4
8 | Actuator type | From 1 to 3
9 | Arm geometry | ≤ 8
10 | Programming | ≤ 5
11 | Cost | 7.55–604 K USD
Some of the parameters are included, viz., degrees of freedom, arm geometry, controller type, actuator types, and programming types, as shown in Table 3 [11]. These parameters have been categorized into subcategories, each specified with a number. Some of these subcategories are arm geometry types (spherical, articulated, rectangular, cylindrical), controller types (explained below), actuator types (hydraulic, electric, pneumatic), and programming types (task-oriented, offline program, online program, etc.). The proposed IRS technique is evaluated by implementing and testing the six different predictive models discussed in the above section. The proposed strategy is a unique predictive model, and the algorithm utilized is an efficient one for the determination of a suitable industrial robot. In this work, 11 parameters of the robots are utilized for the selection of the industrial robots, and 10 wide categories of the robot are utilized to validate the proposed robot selection architecture. These individual categories are called robot ranks. The detailed architecture of the proposed IRS technique is shown in Fig. 1. Different errors such as the R-squared error, mean square error (MSE), and root mean square error (RMSE) are calculated to validate the robot selection strategy. Further, the proposed robot selection models are discussed with suitable mathematical analysis.

The partial least square regression (PLSR) model is an extension of the multivariate linear regression model. Before using traditional regression models, partial least squares regression can be used as an exploratory analytic method to establish an appropriate prediction model by formulating a mathematical relationship between the predictor variables and the dependent variable. Equation 1 shows a linear model between a set of independent variables (the X's) and the dependent variable (response) Y:

Y = w0 + w1X1 + w2X2 + ... + wpXp    (1)
where wi (for i = 1, ..., p) are the regression coefficients calculated from the data [12]. Partial least squares regression is used to build prediction functions using components extracted from the Y'XX'Y matrix [12–14]. The number of such prediction functions that may be developed often surpasses the ideal due to the diversity of the Y and X variables. Principal component regression (PCR) is a form of regression analysis based on principal component analysis (PCA). For each of the initial predictor variables, PCR produces one regression coefficient plus an intercept. Assume that our regression equation can be written in standard matrix notation, as in Eq. 2:

O = ZB + e    (2)
where O denotes the dependent variable, Z is the set of independent variables, B is the vector of estimated regression coefficients, and e is the vector of errors or residuals. The third model used to implement the IRS is the scaled conjugate gradient (SCG)-based backpropagation algorithm. Scaled conjugate gradient (SCG) is a supervised learning technique for feed-forward neural networks (NNs) that belongs to the conjugate gradient methods family [15–17].
Table 2 Essential parameters of the robot indicated with levels

Sl. No | Name of robot parameter | Unit | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 | Level 6 | Level 7 | Level 8 | Level 9 | Level 10
1 | Repeatability | ± mm | 5.6 | 5.0 | 4.6 | 4.0 | 3.6 | 3.0 | 2.6 | 2.0 | 1.6 | 1.0
2 | Reach | mm | 550 | 1050 | 1550 | 2050 | 2150 | 2250 | 2350 | 2450 | 2550 | 2650
3 | Load | kg | 11 | 21 | 31 | 41 | 51 | 61 | 71 | 81 | 91 | 100
4 | Velocity | mm/s | 510 | 1010 | 1510 | 2010 | 2510 | 3010 | 3510 | 4010 | 4510 | 5000
5 | Degree of freedom | Nos | 001 | 002 | 003 | 004 | 005 | 006 | 007 | 007 | 007 | 007
6 | Production rate | Task/hour | 101 | 201 | 251 | 301 | 351 | 401 | 451 | 501 | 550 | 600
7 | Robot arm geometry (AG) | Nos | 001 | 002 | 003 | 004 | 005 | 006 | 007 | 008 | 009 | 010
8 | Type of the controller (C) | Nos | 001 | 001 | 001 | 002 | 002 | 003 | 003 | 003 | 004 | 004
9 | Type of the actuator (A) | Nos | 001 | 001 | 002 | 002 | 002 | 002 | 003 | 003 | 003 | 003
10 | Type of programming (P) | Nos | 001 | 001 | 002 | 002 | 003 | 003 | 004 | 004 | 005 | 005
11 | Robot cost | USD ($) | 7.55K | 15.1K | 30.2k | 45.3k | 60.4k | 75.5k | 151k | 302k | 453k | 604k
Table 3 Industrial robot attributes

Sl. No. | Attributes | Subcategories | Subcategories with their indexes
1 | AG | 10 | 1: Light spherical, 2: medium spherical, 3: light articulated, 4: medium articulated, 5: light rectangular, 6: light cylindrical, 7: medium rectangular, 8: medium cylindrical, 9: heavy rectangular, 10: heavy cylindrical
2 | C | 4 | 1: Non-servo type, 2: servo type point-to-point type, 3: servo type continuous path, 4: combined types of PTP and CP
3 | P | 5 | 1: Task-oriented program type, 2: offline program type, 3: online program type, 4: teach-pendant program type, 5: lead through teach program type
4 | A | 3 | 1: Hydraulic type, 2: electric type, 3: pneumatic type
Fig. 1 Industrial robot selection architecture
The SCG is similar to the general optimization approach in that it uses second-order approximation information to better choose the search direction and step size [11, 18]. Each iteration of SCG calculates the ideal distance. As shown in Eq. 3, a line search is then used to estimate the best distance to advance along the current search direction:

Ok+1 = Zk + sk ∗ rk    (3)
The next search direction is then executed, which is conjugated with the previous search instructions. In reality, sk , its Hessian matrix, the error function, and the
matrix of the second derivatives are all functions of rk. SCG makes use of a scalar to manage the indefiniteness of the Hessian matrix. In a similar context, gradient descent with momentum-based backpropagation is quite similar to the first approach. The negative gradient of the objective is followed via stochastic gradient descent (SGD). As indicated in Eq. 4, the classic gradient descent strategy revises the parameters of the objective J(θ):

θ = θ − α∇θ E[J(θ)]    (4)
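For illustration, a minimal numerical sketch of these updates is shown below; the momentum form is the generic textbook variant, which reduces to the plain update of Eq. (4) when momentum = 0, and is not necessarily the exact rule used by the authors' training functions.

```python
import numpy as np

def sgd_step(theta, grad, lr=0.01):
    # Plain gradient descent update of Eq. (4).
    return theta - lr * grad

def momentum_step(theta, grad, velocity, lr=0.01, momentum=0.9):
    # Generic gradient descent with momentum: the velocity accumulates past
    # gradients and damps oscillations along the descent path.
    velocity = momentum * velocity - lr * grad
    return theta + velocity, velocity
```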
The fuzzy topsis method predicts some parameters as well as their quality. To revise and update the weight and bias values of the different training functions, a feedforward fuzzy rule is made and also analyzed for guiding robot selection prediction techniques. The linguistic description is connected with the level of certainty that affects the identification and selection of robots. So, fuzzy logic will help to select an industrial robot in a fashion that provides linguistic terms (highly accepted not accepted) with a considered degree of doubt [19–21]. Further, the case-based robot selection approach is used as an intelligent decision-making model. The case-based approach has a higher capability for similar query problems and confirmed cases result in the success of solving the problem [19, 22]. The performance analysis of all the above techniques is discussed in the results section of this article.
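As an illustration of how the two regression-based selectors could be set up, a minimal scikit-learn sketch is given below (PLSRegression for PLSR, and PCA followed by linear regression for PCR). The data file names, the number of components, and the use of in-sample predictions are assumptions for demonstration only.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score

# X: one row per robot with the 11 parameters of Table 1; y: robot rank (RR).
X, y = np.load("robot_params.npy"), np.load("robot_ranks.npy")

plsr = PLSRegression(n_components=5).fit(X, y)                 # PLSR model
pcr = make_pipeline(StandardScaler(), PCA(n_components=5),
                    LinearRegression()).fit(X, y)              # PCR model

for name, model in [("PLSR", plsr), ("PCR", pcr)]:
    pred = model.predict(X)
    mse = mean_squared_error(y, pred)
    print(name, "MSE:", mse, "RMSE:", np.sqrt(mse), "R2:", r2_score(y, pred))
```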
3 Results and Discussion An improvement-based study of systematic robot selection in industrial manufacturing is discussed in this section. The proposed approach utilizes a backpropagation-based scaled conjugate gradient (SCG) to select appropriate robots using a standard set of robot parameters as inputs. Further, a backpropagation-based gradient descent momentum (GDM) is configured and trained to select robots with appropriate parameters. Partial least square regression (PLSR) and principal component regression (PCR) models are used to assess the same robot selection procedure. The target and estimated robot ranks using the PLSR, PCR, SCG, GDM, case-based approach, and fuzzy topsis techniques are plotted in Fig. 2. It can be observed from Fig. 2 that the PLSR- and PCR-based robot selection models produce less error as compared to the other techniques. Further, the residual errors obtained for the proposed IRS techniques are shown in Fig. 3. It is observed from Fig. 3 that the residual errors for the SCG, GDM, case-based, and fuzzy topsis techniques are quite high as compared to the PLSR- and PCR-based IRS. The mean square error (MSE), root mean square error (RMSE), and R-squared error (RSE) are all used to evaluate performance. Finally, the regression models of the PLSR and PCR approaches are shown in Eqs. 5 and 6, respectively:

RT = (a1) + [(a2) × R] + [(a3) × W] + [(a4) × P] + [(a5) × V] + [(a6) × D] + [(a7) × PR] + [(a8) × IG] + [(a9) × C] + [(a10) × IA] + [(a11) × IP] + [(a12) × CS]    (5)

Fig. 2 Target and estimated robot ranks using the proposed IRS techniques

Fig. 3 Residual errors obtained for the proposed IRS techniques
where a1 = −20,341,754,789,398.5, a2 = 3,390,292,464,899.03, a3 = 3.14942297397367e−06, a4 = −67,294,028,522,664.6, a5 = 1,362,832,032,777.79, a6 = −0.856955362615008, a7 = 0.0144793395567071, a8 = −6,780,584,929,798.05, a9 = 0.776501617166275, a10 = 0.572737461733881, a11 = 0.343661894835220, and a12 = −0.00575152555285495.

RT = (a1) + {(a2) × R} + {(a3) × W} + {(a4) × P} + {(a5) × V} + {(a6) × D} + {(a7) × PR} + {(a8) × IG} + {(a9) × C} + {(a10) × IA} + {(a11) × IP} + {(a12) × CS}    (6)
where a1 = −0.0346067647579718, a2 = −1.87705e−06, a3 = 0.0001223898, a4 = 3.75210518e−05, a5 = 0.00187605, a6 = 1.42634638e−06, a7 = 0.000174122, a8 = 3.75210e−06, a9 = 1.4972054e−06, a10 = 6.39640e−07, a11 = 1.87304162794836e−06, and a12 = 0.000431045. The variables are R = Repeatability, W = Work envelope (reach), P = Payload, V = Velocity, D = Degrees of freedom, PR = Production rate, IG = Index of arm geometry, C = Controller type, IA = Index value (actuator type), IP = Index value (programming), and CS = Cost. It is observed from all these selection methodologies that the MSE, RMSE, and R-squared values, as well as the ease of implementation, are significantly better in the case of the PLSR- and PCR-based approaches. These two methodologies are also easy to implement on any platform, as each model is based on only a mathematical equation consisting of several robot parameters as inputs and the rank as output. Therefore, the PLSR and PCR models are the most approachable methods to perform the necessary robot selection by considering robot parameters as inputs (Table 4).
4 Conclusion and Future Scope An improvement-based study of systematic robot selection in industrial manufacturing is discussed, and the results were analyzed in this article. Firstly, the robot selection process is tested using PLSR and secondly PCR models. Thirdly, the proposed approach utilizes a backpropagation-based scaled conjugate gradient (SCG) to select appropriate robots using a standard set of robot parameters as inputs. Fourthly, a backpropagation-based gradient descent momentum (GDM) is configured and trained to select robots with appropriate parameters. Other two methods such as fuzzy topsis and the case-based approach are also implemented in this work. The MSE, RMSE, and RSE are estimated during the performance review. Finally, to construct rank-based robot selection, we thoroughly connect the task need and manipulator. The robot characteristics of a standard problem faced by the individual manufacturer are taken into account while generating the robot selection rank. The MSE, RMSE, R-squared values, and implementation of the PLSR and PCR-based techniques are far superior in all of these selection procedures. Because the model is
Table 4 Performance analysis of the proposed robot selection methodologies

Methodology | MSE | RMSE | R-squared error | Relative closeness | Computational time | Ease of implementation
Scaled conjugate gradient-based backpropagation | 3.6794e−04 | 11.3538e−07 | 1.0000 | 0.989 | Moderate | Moderately critical
Gradient descent with momentum | 3.6794e−04 | 11.3538e−07 | 1.0000 | 0.98 | Moderate | Very critical
PLSR | 9.3432e−29 | 9.6626e−15 | 0.9999998 | 0.999 | Low | Easy to implement
PCR | 11.9554e−12 | 12.5366e−9 | 0.8999977 | 0.899 | Low | Easy to implement
Case-based approach | NA | NA | NA | 0.864 | Very high | Moderately critical
Fuzzy topsis | NA | NA | NA | 0.666 | Moderate | Moderately significant
built on a mathematical equation with various robot parameters as input and rank as output, these two strategies are also straightforward to implement on any platform.
4.1 Future Scope In future, research work may be undertaken in the area of development of more efficient algorithms in a neural network, fuzzy logic-based methodology, linear programming, genetic algorithm, and ACO technique. The presented framework is a suitable approach for the selection and task allocation of industrial robots and can be applied to multiple complex requirements of an industry. The robot selection using an interactive Web application/framework has a tremendous potential that needs to be explored in future scope. Future exploration will include the two upgrades in arrangement strategies and augmentations to the current model.
References 1. Deng H, Yeh CH, Willis RJ (2000) Inter-company comparison using modified TOPSIS with objective weights. Comput Oper Res 27:963–973 2. Chen SJ, Hwang CL (1992) Fuzzy multiple attribute decision making methods. In: Fuzzy multiple attribute decision making. Lecture notes in economics and mathematical systems, vol. 375. Springer, Berlin, Heidelberg 3. Agrawal VP, Kohli V, Gupta S (1991) Computer aided robot selection: the multiple attribute decision making approach. Int J Prod Res 29:1629–1644 4. Omoniwa B (2014) A solution to multi criteria robot selection problems using grey relational analysis. Int J Comput Inf Technol 3(2):329–332 5. Offodile OF, Lambert BK, Dudek RA (1987) Development of a computer aided robot selection procedure (CARSP). Int J Prod Res 25(8):1109–1121 6. Rao RV, Padmanabhan KK (2006) Selection, identification and comparison of industrial robots using digraph and matrix methods. Robot Comput Integr Manuf 22(4):373–383 7. Boubekri N, Sahoui M, Lakrib C (1991) Development of an expert system for industrial robot selection. Comput Ind Eng 21:119–127 8. Agarwal M, Agrawal N, Sharma S, Vig L, Kumar N (2015) Parallel multiobjective multi-robot coalition formation. Expert Syst Appl 42(21):7797–7811 9. Lerman K, Jones C, Galstyan A, Mataric MJ Analysis of dyanamic task allocation in multi-robot systems, University of Southern California, Los Angeles, CA 90089-0781, USA 10. de Lope J, Maravall D, Quiñonez Y (2015) Self-organizing techniques to improve the decentralized multi-task distribution in multi-robot systems. Neurocomputing 163:47–55 11. Nayak S, Kumar N, Choudhury BB (2017) Scaled conjugate gradient backpropagation algorithm for selection of industrial robots. Int J Comput Appl 7(6):2250–1797 12. Neumann C, Förster M, Kleinschmit B (2016) SibylleItzerott, utilizing a plsr-based bandselection procedure for spectral feature characterization of floristic gradients. IEEE J Sel Top Appl Earth Observ Remote Sens 9(9):3982–3996 13. Nagaraja VK, Abd-Almageed W (2015) Feature selection using partial least squares regression and optimal experiment design. Int Joint Conf Neural Netw (IJCNN) 1–8 14. Kr¨amer N, Sugiyama M (2011) The degrees of freedom of partial least squares regression. J Am Stat Assoc 1–23
15. Efron B (2004) The estimation of prediction error: covariance penalties and cross-validation. J Am Stat Assoc 99(467):619–633 16. Frank I, Friedman J (1993) A statistical view of some chemometrics regression tools. Technometrics 35(2):109–135 17. Andrei N (2007) Scaled conjugate gradient algorithms for unconstrained optimization. Comput Optim Appl 38:401–416 18. Ceti¸sli B, Barkana A (2010) Speeding up the scaled conjugate gradient algorithm and its application in neuro-fuzzy classifier training. Soft Comput 14(4):365–378 19. Kutlu Gündo˘gdu F, Kahraman C (2020) Spherical fuzzy analytic hierarchy process (AHP) and its application to industrial robot selection. In: Kahraman, intelligent and fuzzy techniques in big data analytics and decision making, INFUS 2019. Advances in intelligent systems and computing, vol 1029. Springer, Cham. https://doi.org/10.1007/978-3-030-23756-1_117 20. Aktas A, Kabak M (2022) An integrated fuzzy decision making and integer programming model for robot selection for a baggage robot system, intelligent and fuzzy techniques in aviation 4.0. In: Studies in systems, decision and control, vol 372. Springer, Cham. https://doi.org/10.1007/ 978-3-030-75067-1_4 21. Ali A, Rashid T (2021) Best–worst method for robot selection. Soft Comput 25:563–583. https://doi.org/10.1007/s00500-020-05169-z 22. Büyüközkan G, Ilıcak O, Feyzio˘glu O (2021) An integrated QFD approach for industrial robot selection. In: Advances in production management systems. artificial intelligence for sustainable and resilient production systems. APMS 2021. IFIP Advances in information and communication technology, vol 632. Springer, Cham. https://doi.org/10.1007/978-3-03085906-0_61
Brain Tumor Diagnosis Using K-Means and Morphological Operations Jadhav Jaichandra, P. Hari Charan, and Shashi Mehrotra
Abstract The human brain is the central processing unit of the body. It is arduous to diagnose brain diseases due to the brain’s complex structure and the existence of a skull around the brain. Tumor is a lethal medical condition where uncontrollable tissue starts to grow inside the patient’s skull in an excess amount, leading to the inevitable demise of the patient if not diagnosed at the early stages. This paper presents an effective brain tumor diagnosis framework using K-means clustering algorithm. We use the median filter on the brain MRI scan to get rid of the noise or other unwanted artifacts, and we then separate the portion of the skull from the brain tissue in the MRI scan. We apply binary thresholding, which only retains the tumorous areas in the MR image, followed by morphological operations. Morphological operations like erosion increased the overall effectiveness of binary thresholding. The experimental outputs distinctly exhibit that the framework successfully detects the tumors given brain MR images. Keywords Brain tumor · Segmentation · K-means clustering · Morphological operations
1 Introduction The brain administers various body functions such as vision, speech, hearing, and response to stimulus. A brain tumor is referred to as an uncontrollable development of tissues in the skull. This growth of tumor cells inside the skull causes damage to various parts of the brain, leading to many health disorders such as severe headaches, problems in vision, difficulties in hearing, seizures, and balancing disorders [1]. Tumors in the brain are normally classified into two types, they are benign and malignant. If the tumor is benign, the tissue growth is non-cancerous, whereas the malignant tumor contains active cancer cells [2]. A brain tumor is a pre-eminent cause for the increase in mortality among adults and children [3]. The tumors in the J. Jaichandra · P. Hari Charan · S. Mehrotra (B) Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_42
modern day are detected using magnetic resonance imaging (MRIs). MRIs provide noteworthy particulars about the morphology of the brain [4]. The lifespan of a person who is suffering from a brain tumor could be increased if it is detected at the early stages. The latest technology allows us to visually focus on the region and lesion of the tumor [5]. In this study, we used clustering, thresholding cum morphological operations to efficiently detect tumors given an MR image of the brain. The clustering technique we follow here is K-means. K-means is mostly preferred by researchers because of its simplicity and efficiency [5]. Experiments in [6] demonstrate a better precision rate of K-means among all the different clustering algorithms used. The set of data points are segmented into K clusters using K-means [7]. The K-means algorithm requires K to be given in the beginning [8]. Therefore, it is normally preferred to use K-means for medical image examination because the K clusters in K-means are always predefined. After K-means, we applied binary thresholding over the MR image where a threshold value is applied over every pixel in the image, and the respective pixel is only retained when it is greater than or equivalent to the pre-defined threshold [9]. Finally, we used morphological operations on the thresholded image to avoid classifying the normal brain tissue as a tumor. Morphological operations give this approach an upper edge compared to other similar approaches. The tumor-affected regions are now much clear and selfdescriptive compared to other brain tumor detection techniques which did not use morphological operations. Using morphological operations, it prevents uncertainty in the pathologists or doctors when they try to figure out the healthy regions and the tumor-affected regions in the MRI scans.
1.1 Motivation MR images are mostly used in brain tumor diagnosis but detecting brain tumors in the MRI scans can be challenging due to the composite anatomy of the brain and varying image intensities in the MRI scans. Tumor detection at an early stage provides doctors the time to diagnose and treat the tumors effectively and precisely. Diagnosing brain tumors at the earliest stages might save the patient’s life. Our main contribution to the study is as follows: Brain MRI is segmented using K-means, and the segmentation performance is improved using morphological operations. The rest of the article is assembled in the following way. Section 2 delineates the literature survey we have executed while carrying out the proposed framework. Section 3 provides a brief intuition on the design and workflow of the study. Section 4 delineates the experiment cum result analysis where we come up with the final tumor detection mechanism. In Sect. 5, we conclude and discuss future work.
2 Literature Survey/Related Work Kabade and Gaikwad [3] proposed a model which detects brain tumors in MRIs using K-means followed by a fuzzy C-means clustering algorithm. Before the clustering begins, the data is first filtered (denoised) and K-means clustering is applied, which detects mass tumors, and then fuzzy C-means clustering is applied, which detects malignant tumors. Now, from the output of FCM, the tumor is derived using thresholding where all the pixels are compared with a threshold. If the intensity of a pixel is lesser compared to the threshold, it is made zero (black), and if the intensity of a pixel is greater when compared to the threshold, the pixel is made one (white). Vijay and Subhashini [5] proposed a tumor diagnosis model which takes brain MR images and detects the tumors using K-means clustering cum segmentation. The authors prove that K-means is the most efficient, simple, and easy to implement compared to other clustering techniques such as divisive hierarchical clustering and fuzzy C-means clustering. The authors also promise that only a small amount of data can be given to the framework to obtain reliable results. Wu et al. [10] presented a model which detects brain tumors using K-means and histogram clustering to precisely detect the lesion, size, and region of the tumor. This paper focuses on reducing the complexity of the K-means clustering algorithm. Here, unlike all the other tumor detection models where grayscale MRI images are used, color-based segmentation is used. The image detection method maintains accuracy by utilizing color-based segmentation using K-means on the MR images. Finally, to get rid of the unnecessary pixels in a selected cluster, histogram clustering is used, which derives the final segmented result. Mankikar [4] proposed an interesting brain tumor detection mechanism where he first applies image preprocessing on the brain MRI to turn it into grayscale, and he also removes film artifacts. Then, the image is enhanced where the noise is excluded by using a median filter, and the K-means algorithm is applied. Then, thresholding is applied, thus detecting the place, volume, and severity of the tumor. However, the drawback is that K-means can only detect mass or benign tumors and not malignant tumors. The author used fuzzy C-means clustering to detect malignant tumors, and he then applied thresholding to maximize the model's efficiency.
3 Design and Methodology Figure 1 presents an overview of the workflow for the tumor detection approach. The MR images are first denoised using a median filter. We then remove the skull portion from the MR image of the brain. After skull removal, we use K-means, which segments the brain image into K clusters. We then apply thresholding to separate the tumorous area from the healthy brain tissue in the segmented image. Finally, we apply morphological operations to avoid misclassifying normal brain tissue as a tumor.
Fig. 1 Workflow process of the study
4 Experiment and Result Analysis We use Python and OpenCV for the experiments, and the data set presented in Fig. 7 is used for the experiments.
4.1 Data Preprocessing Preprocessing is the primary step in image segmentation. To obtain clear segmentation outcomes, a successful preprocessing step plays a pivotal role [11]. We used a median filter to remove external noise from the MRI. The presence of noise in modern MRI scans is rare, but the thermal effect can sometimes cause noise [3]. The image after noise removal is then sent to the next phase, which is skull removal. To better illustrate the median filter, salt-and-pepper noise was added to the brain MR image. Figure 2a, b presents the brain MRI before and after noise removal, respectively.
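The paper does not include its preprocessing code; the following is only a minimal OpenCV sketch of the noise simulation and median filtering described above (the file name, noise level, and kernel size are assumptions):

```python
import cv2
import numpy as np

# Load a brain MR image in grayscale; the file name is a placeholder.
img = cv2.imread("brain_mri.png", cv2.IMREAD_GRAYSCALE)

# Simulate salt-and-pepper noise, as done in the paper for illustration.
rng = np.random.default_rng(0)
mask = rng.random(img.shape)
noisy = img.copy()
noisy[mask < 0.02] = 0      # "pepper" pixels
noisy[mask > 0.98] = 255    # "salt" pixels

# A median filter removes the impulsive noise while preserving edges.
denoised = cv2.medianBlur(noisy, 5)   # the 5 x 5 kernel size is an assumption
```

The median filter is chosen here because, unlike a mean filter, it replaces each pixel by the median of its neighborhood and therefore suppresses isolated salt-and-pepper pixels without blurring tissue boundaries.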
4.2 Skull Removal After the elimination of noise from the MRI, we removed the skull from the MR image, because the presence of the skull in the final output can make it ambiguous and lead to confusion. Skull removal helps the segmentation, thresholding, and morphological operations and minimizes the error rate [12]. Figure 3a, b presents the MR image with and without the skull, respectively.
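The paper does not describe the exact skull-stripping procedure, so the following is only a rough sketch of one common intensity-and-morphology approach in OpenCV; the Otsu threshold, kernel size, and iteration counts are assumptions rather than the authors' settings:

```python
import cv2
import numpy as np

def strip_skull(gray):
    """Rough skull removal: Otsu threshold, keep the largest foreground
    component (the brain), and use it as a mask on the original image."""
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = np.ones((7, 7), np.uint8)
    # Erode so the brain detaches from the thin skull rim.
    eroded = cv2.erode(binary, kernel, iterations=2)
    n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(eroded)
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])   # skip background label 0
    brain_mask = np.uint8(labels == largest) * 255
    # Grow the mask back to roughly its original size before applying it.
    brain_mask = cv2.dilate(brain_mask, kernel, iterations=2)
    return cv2.bitwise_and(gray, gray, mask=brain_mask)
```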
4.3 Segmentation After removal of the skull from the input MR images, we applied the K-means clustering for segmentation of MR image. We start with K clusters and n pixels in
Fig. 2 a Brain MR image with noise, b brain MR image after noise removal
Fig. 3 a Brain MR image with skull, b brain MR image without skull
the MR image. We assign each pixel to its closest cluster centroid by evaluating the Euclidean distance from the pixel to each centroid. The centroids are then updated to the average of the data points assigned to them. The distances are recomputed with the updated centroids, and the pixels (data points) are reassigned [13]. The algorithm stops only when the cluster assignments of all pixels in the image have converged (Fig. 4).
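As a hedged illustration of this step, intensity-based K-means segmentation can be done with OpenCV's cv2.kmeans; the value k = 3 below is only an assumption, since the paper does not state the number of clusters it used:

```python
import cv2
import numpy as np

def kmeans_segment(gray, k=3):
    """Cluster the pixel intensities into k groups and return the segmented image."""
    data = gray.reshape(-1, 1).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.2)
    _, labels, centers = cv2.kmeans(data, k, None, criteria, 10,
                                    cv2.KMEANS_RANDOM_CENTERS)
    # Replace each pixel by the intensity of its cluster centre.
    segmented = centers[labels.flatten()].reshape(gray.shape)
    return segmented.astype(np.uint8)
```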
4.4 Thresholding The output of the K-means clustering algorithm is given as input to the thresholding process. In thresholding, a threshold value is applied to every pixel in the image: if the pixel intensity is greater than or equal to the threshold, the pixel becomes one (white), and if the pixel intensity is less than the threshold, it becomes zero (black).
Fig. 4 Output images after the application of K-means
Binary thresholding thus maps every pixel at or above the threshold to white and every pixel below the threshold to black, converting the grayscale output image of the K-means clustering into a binary image [14]. Binary thresholding can also be applied block-wise, where only those pixels with magnitudes greater than or equal to the threshold are retained within each block [2]. The binary image g obtained from an input grayscale image x is given as

g(n) = 1 if x(n) >= Threshold, and g(n) = 0 if x(n) < Threshold,

where n is the pixel index, x is the input grayscale image, and g is the output binary image. The outputs after thresholding are shown in Fig. 5.
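In OpenCV this step corresponds to cv2.threshold with THRESH_BINARY (and THRESH_BINARY_INV for the inverse image in Fig. 5b); the threshold value 127 below is only a placeholder, since the paper does not report the value it used:

```python
import cv2

# 'segmented' is the grayscale output of the K-means step above.
_, binary = cv2.threshold(segmented, 127, 255, cv2.THRESH_BINARY)          # Fig. 5a
_, binary_inv = cv2.threshold(segmented, 127, 255, cv2.THRESH_BINARY_INV)  # Fig. 5b
```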
Fig. 5 a Binary thresholding, b inverse binary thresholding
4.5 Morphological Operations The output of the thresholding is given as input to the morphological operations, which extract image components based on the region or features of the image. We use morphological operations to avoid misclassifying normal brain tissue as a tumor. Morphological operations are generally performed on binary images and need two inputs: the original image and a kernel, where the kernel decides the type of operation. There are many morphological operations, but we focus on the two most important ones, erosion and dilation. The outputs after applying morphological operations are shown in Fig. 6. Result Analysis By analyzing Fig. 8, the tumor-affected tissue can be clearly noticed. The input MR images contain multiple early-stage tumors, as displayed in Fig. 7, and the tumors extracted from these input MR images are shown in Fig. 8.
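For reference, the erosion and dilation step described above can be written in OpenCV as follows; the 3 x 3 kernel and single iteration are assumptions, not values reported in the paper:

```python
import cv2
import numpy as np

kernel = np.ones((3, 3), np.uint8)   # structuring element; the size is an assumption

# Erosion removes small bright specks that are likely healthy tissue
# wrongly retained by thresholding.
eroded = cv2.erode(binary, kernel, iterations=1)

# Dilation restores the size of the surviving tumour region.
dilated = cv2.dilate(eroded, kernel, iterations=1)
```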
5 Conclusion and Future Work The paper presents an effective tumor detection framework using brain MR images. Initial experiments conducted on various MR images of the brain show promising results. The study used K-means clustering, binary thresholding, and morphological operations. Noise removal with the median filter and the removal of skull tissue effectively contributed to the segmentation process. It can be clearly claimed that the overall effectiveness of the model improved when we used
Fig. 6 a Output after erosion, b output after dilation
Fig. 7 Multiple early-stage brain tumor MR images
Fig. 8 Extracted tumors from the above brain MR image
the morphological operations on the thresholded binary images. The location and area of the tumor became clear and free from unwanted artifacts. The suggested system can help physicians make a better diagnosis of tumors for further treatment. In the future, we will work on improving the efficacy of the framework in detecting tumors, use larger data sets together with neural network techniques, and focus on further enhancing the effectiveness of the tumor segmentation process.
References 1. Vijayakumar T (2019) Classification of brain cancer type using machine learning. J Artif Intell 1(02):105–113 2. Singh G, Ansari MA (2016) Efficient detection of brain tumor from MRIs using K-means segmentation and normalized histogram. In: 2016 1st India international conference on information processing (IICIP). IEEE 3. Kabade RS, Gaikwad MS (2013) Segmentation of brain tumour and its area calculation in brain MR images using K-mean clustering and fuzzy C-mean algorithm. Int J Comput Sci Eng Technol 4(05):524–531 4. Mankikar SS (2013) A novel hybrid approach using Kmeans clustering and threshold filter for brain tumor detection. Int J Comput Trends Technol 4(3):206–209 5. Vijay J, Subhashini J (2013) An efficient brain tumor detection methodology using Kmeans clustering algorithm. In: 2013 international conference on communication and signal processing. IEEE 6. Mehrotra S, Kohli S (2015) Comparative analysis of K-means with other clustering algorithms to improve search result. In: 2015 international conference on green computing and internet of things (ICGCIoT). IEEE 7. Hooda H, Verma OP, Singhal T (2014) Brain tumor segmentation: a performance analysis using K-means, fuzzy C-means and region growing algorithm. In: 2014 IEEE international conference on advanced communications, control and computing technologies. IEEE 8. Mehrotra S, Kohli S, Sharan A (2019) An intelligent clustering approach for improving search result of a website. Int J Adv Intell Paradigms 12(3–4):295–304 9. Reddy D et al (2018) Brain tumor detection using image segmentation techniques. In: International conference on communication and signal processing, April 3–5, 2018, India 10. Wu M-N, Lin C-C, Chang C-C (2007) Brain tumor detection using color-based k-means clustering segmentation. In: Third international conference on intelligent information hiding and multimedia signal processing (IIH-MSP 2007), vol 2. IEEE 11. Sharif M et al (2018) Brain tumor segmentation and classification by improved binomial thresholding and multi-features selection. J Ambient Intell Humanized Comput 1–20 12. Preetika B et al (2021) MRI image based brain tumour segmentation using machine learning classifiers. In: 2021 international conference on computer communication and informatics (ICCCI). IEEE 13. Nasor M, Obaid W (2020) Detection and localization of early-stage multiple brain tumors using a hybrid technique of patch-based processing, k-means clustering and object counting. Int J Biomed Imaging 2020 14. Lakshmi GJ, Ghonge M, Obaid AJ (2021) Cloud based IoT smart healthcare system for remote patient monitoring. EAI Endorsed Trans Pervasive Health Technol e4
Anomaly Detection Techniques in Intelligent Surveillance Systems Viean Fuaad Abd Al-Rasheed and Narjis Mezaal Shati
Abstract Finding unusual behavior in busy places is important and a hot topic in the computer vision and information retrieval communities. Unlike the hand-crafted features often used in traditional anomaly detection methods, the features learned by deep neural networks are much more accurate representations of how things look and move in different situations. In this research, we propose a new way to detect abnormal events by using convolution kernels to learn feature representations. Our system is fully supervised. First, we feed our proposed CNN a set of raw frames after resizing the images to 64 pixels and compressing them. We then prepare the data for training and testing by splitting the labeled frames into training and testing sets. A multi-layer CNN, a type of deep learning algorithm, is used to train the system. Lastly, the method is tested on a large-scale video surveillance benchmark dataset, UCF-Crime. Compared with the most modern methods, it is also cost-effective and easy to use. Our model achieved 99.99% precision. Keywords Deep learning · Video surveillance system · UCF-crime · Anomaly detection · CNN · Supervised learning
1 Introduction However, even though there have been a lot of improvements in extracting features, modeling of human behavior, and detecting anomalies, the automated detection of unusual activities in video surveillance has become a major source of research in computer vision and pattern recognition. However, finding and localizing anomalies is still a hard problem in smart video surveillance. Also, tracking in public places is inefficient and takes a lot of time. So, it is important to have a smart surveillance V. F. A. Al-Rasheed (B) · N. M. Shati Department of Computer Science, College of Sciences, Mustansiriyah University, Baghdad, Iraq e-mail: [email protected] N. M. Shati e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_43
system that can both recognize and localize abnormal events [1, 2]. One of the biggest problems is that a lot of real-world surveillance footage makes it hard to decide what is abnormal. Anomalies are events that differ strongly from the normal events in the overall context. This means that abnormalities are defined not by categories or features but relative to normal events [3]: people may be running in one scene but not in another, so something that is unusual in one scene may not be strange in another. The sizes and similarities of anomalies are therefore not enough for them to be modeled well. Novelty detection, also framed as a one-class, semi-supervised learning problem, is the main challenge in finding odd behavior in a crowd [4, 5], since the available training data contains only normal human behaviors, while the test data to be validated contains both regular and odd human behaviors. Deep learning algorithms have lately demonstrated extraordinary effectiveness in a range of computer vision tasks, including object categorization and detection [6], with implementations based on labeled supervised learning. Meanwhile, techniques for feature extraction based on unsupervised learning, such as auto-encoders, have grown in prominence [7]. In comparison with traditional methodologies, these studies have revealed that a wide range of general as well as specific attributes can be learned. Because of this, auto-encoders are now used for anomaly detection instead of hand-crafted feature extractors. Some studies used a fully convolutional network (FCN) to reconstruct or predict a new frame based on these learned characteristics [8, 9], and the total difference between the predicted frame and the actual frame was used to indicate an anomaly [8]. Other studies predicted anomaly scores of learned features using probability evaluation models such as one-class SVM models [10] and Gaussian models [11]. We provide a basic architecture for an anomaly detection system for challenging surveillance scenes. Because it uses CNNs to learn visual patterns directly from image pixels, the approach is more general, as the feature extraction is automated. We focus on the UCF-Crime dataset [12], which contains a large number of deviant, criminal, and violent activities caught on public surveillance cameras, activities that may pose significant challenges for individuals and society as a whole. A CNN is used to extract features from the video dataset, and the model returns whether or not the input video contains suspicious behavior. We developed a supervised DL approach for irregularity detection in crowded scenarios in order to define and clarify the significance of crowd anomalies. This framework can recognize anomalies and outperforms existing systems in terms of accuracy. The core goal is to increase the efficiency of video anomaly detection using class labels. We therefore present a custom CNN evaluation structure for the proposed work on large-scale video data. In the remainder of this article, Sect. 2 reviews related work that employs different approaches for anomaly detection in security video cameras. We then define our proposed design in Sect. 3, assess our work through a series of experiments in Sect. 4, and finally present the conclusion.
2 Related Works The following articles offer various methods for detecting activity in videos, with the purpose of identifying unexpected events in a video security system. Alongside the increased need for automated anomalous event detection, a variety of approaches for dealing with different forms of anomaly detection in video datasets have been proposed [13]. Unsupervised approaches explain and interpret data features, whereas supervised methods differentiate data classes. In comparison, supervised anomaly detection strategies that use labeled data outperform unsupervised anomaly detection approaches [14]. In supervised anomaly detection, the dividing boundary is learned from training data, and test data is then classified as normal or abnormal using the learned model. In 2018, Singh and Pankajakshan [15] used a deep neural network (DNN) for modeling natural actions as well as predicting future frames from past frames of distortion-free data. The technique was tested on the publicly available UCSD anomaly detection datasets UCSDPed1 and UCSDPed2, achieving an area under the curve (AUC) of 74.8% on UCSDPed1 and 80.2% on UCSDPed2. Thakur and Kaur [16] proposed a CNN-based anomaly detection system (CNN-ADS) that organizes several hidden layers with the maximum MSER characteristic using a genetic algorithm (GA). The UCF-Crime dataset is used for training and testing the proposed system; the average error rate is 1.29%, and the average accuracy (AUC) is 98.71%. Mahdi et al. [17] published their findings in 2021. Their system classifies a video as abnormal or normal and, in the event of unacceptable behavior, notifies the authorities through SMS. They use CAVIAR, KTH, and YouTube scenes in their method, extracting features from frames with a CNN and classifying them as abnormal or normal with an LSTM structure. The precision was 95.3%. Habib et al. [18] trained a lightweight CNN model on their own pilgrims dataset to recognize pilgrims in surveillance video. In the second stage, the preprocessed salient frames are fed into a lightweight CNN model for spatial feature extraction, and in the third phase an LSTM is constructed to extract temporal features. The suggested model achieved accuracies of 81.05 and 98.00. In 2022, Vosta and Yow [12] used ResNet50 as a CNN for feature extraction; an RNN (ConvLSTM) was then included in the model architecture for working with the video dataset, and on the UCF-Crime dataset they obtained an AUC of 81.71% and an accuracy of 63.75% for ResNet101-ConvLSTM.
Fig. 1 Proposed system framework
3 Proposed System Framework Generally, the proposed system includes some basic stages to perform all relevant machine and verification tasks. Figure 1 shows the system framework, which requires the main stage to have some sub-stages or modules that require some steps or tasks to achieve their final goal. Our framework is divided into several major phases, including:
4 Proposed Method Samples of training video frames are first processed, and a label encoder is used to mark the frames as either abnormal or normal (positive class, negative class). Each group of frames is then sent to the custom CNN. After the input layer, our system is made up of a series of hidden layers. The output of each convolutional layer is passed through a ReLU activation. This is followed by a Flatten layer, which is very important when a long feature vector has to be formed from a multi-dimensional feature map: we flatten the output of the convolutional and max pooling layers into a single long feature vector. A dense, fully connected (FC) layer then leads to the output node; this is where the classification actually happens. The output of this layer is passed through a sigmoid activation function.
Table 1 Structure of CNN's network
Layer | Dimension | Convolution kernel | Stride | Output
Conv | 64 × 64 | 3 × 3 | 1 | 62 × 62 × 16
Maxpool | 62 × 62 | – | – | 31 × 31 × 16
Conv | 31 × 31 | 3 × 3 | 1 | 29 × 29 × 32
Maxpool | 29 × 29 | – | – | 14 × 14 × 32
Conv | 14 × 14 | 3 × 3 | 1 | 12 × 12 × 64
Maxpool | 12 × 12 | – | – | 6 × 6 × 64
Flatten | 6 × 6 × 64 | – | – | 2304
Dense | 2304 | – | – | 512
Dense | 512 | – | – | 1
Training and testing sets. We split our dataset into two parts: the training set, which has 100 samples of both normal and unusual videos, and the testing set, which also has 100 samples of both normal and unusual videos. Both the training set and the testing set contain all 13 anomalies at different points in time, and some of the videos contain more than one anomalous event. The length of each training video depends on how its content is distributed; the number of frames taken from each video was 70, so dividing the total number of frames by 70 gives the number of interval actions (Table 1).
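The paper does not state which framework it used; a Keras sketch that reproduces the layer dimensions in Table 1 (three 3 × 3 convolutions with 16, 32, and 64 filters, each followed by max pooling, then Flatten, a 512-unit dense layer, and a sigmoid output) might look as follows, where the channel count, optimizer, and loss are assumptions:

```python
from tensorflow.keras import layers, models

def build_cnn(input_shape=(64, 64, 3)):   # the channel count is an assumption
    """CNN with the layer dimensions listed in Table 1."""
    model = models.Sequential([
        layers.Conv2D(16, (3, 3), activation="relu", input_shape=input_shape),  # 62 x 62 x 16
        layers.MaxPooling2D((2, 2)),                                             # 31 x 31 x 16
        layers.Conv2D(32, (3, 3), activation="relu"),                            # 29 x 29 x 32
        layers.MaxPooling2D((2, 2)),                                             # 14 x 14 x 32
        layers.Conv2D(64, (3, 3), activation="relu"),                            # 12 x 12 x 64
        layers.MaxPooling2D((2, 2)),                                             # 6 x 6 x 64
        layers.Flatten(),                                                        # 2304
        layers.Dense(512, activation="relu"),
        layers.Dense(1, activation="sigmoid"),                                   # normal vs abnormal
    ])
    # Binary cross-entropy with Adam is an assumption; the paper does not specify them.
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```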
5 Dataset In this work, we apply the suggested approach to the UCF-Crime dataset, which contains a large amount of irregular, unlawful, and violent activity caught on public surveillance cameras in places such as campuses, businesses, and streets. The rationale for choosing this dataset is that it is derived from real day-to-day events that can occur every day all over the world. Furthermore, these types of abnormal behaviors can cause significant problems for both individuals and society. In this dataset, 13 classes of anomalous occurrences are represented by uncut surveillance camera feeds, along with a set of normal events. As shown in Fig. 2, in our studies we used 80% of the data for training and 20% for testing from the UCF-Crime dataset (Table 2).
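The frame extraction, resizing, and 80/20 split are described only at a high level; a possible sketch using OpenCV and scikit-learn is shown below (the directory layout and file pattern are assumptions, not the dataset's actual structure):

```python
import glob
import cv2
import numpy as np
from sklearn.model_selection import train_test_split

def video_to_frames(path, size=(64, 64)):
    """Read a video file and return its frames resized to 64 x 64."""
    frames, cap = [], cv2.VideoCapture(path)
    ok, frame = cap.read()
    while ok:
        frames.append(cv2.resize(frame, size))
        ok, frame = cap.read()
    cap.release()
    return frames

X, y = [], []
# The 'anomaly' and 'normal' directory names are placeholders.
for label, pattern in [(1, "ucf_crime/anomaly/*.mp4"), (0, "ucf_crime/normal/*.mp4")]:
    for path in glob.glob(pattern):
        for frame in video_to_frames(path):
            X.append(frame / 255.0)   # simple intensity compression to [0, 1]
            y.append(label)

X, y = np.asarray(X, dtype=np.float32), np.asarray(y)
# 80% of the data for training and 20% for testing, as stated for the experiments.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```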
6 Experimental Outcomes and Result Evaluation In this section, we compare our experimental outcomes to those of previous methodologies utilized on the “UCF-Crime” to determine how well the proposed model performs. For the evaluation, the AUC and accuracy measures are used. The accuracy
Fig. 2 Samples of UCF-crime dataset, a abnormal activity, b normal activity
Table 2 Number of UCF-crime dataset training and testing videos
Activity | No. of videos
Abuse | 16
Arrest | 16
Arson | 16
Assault | 16
Burglary | 8
Explosion | 8
Fighting | 8
Normal | 100
Fig. 3 The proposed system accuracy model
Table 3 The results of the proposed system
Metrics | CNN approach | CNN approach with grayscale | CNN approach with unbalanced number of samples (positive and negative)
Accuracy | 99.3 | 88.1 | 98.00
AUC | 99.00 | – | –
F1 score | 99.00 | – | –
of the design, which reached 97% in the initial epoch, can be enhanced further by increasing the number of iterations. For testing purposes, the frames are extracted from the scenes and saved in a single folder, and the algorithm classifies each frame as normal or abnormal using our trained model. As shown in Fig. 3, an accuracy of 99.00% was achieved, as indicated by the accuracy curve. Table 3 reports the results of all the metrics we used to evaluate the design. Figure 4 compares the achieved results with state-of-the-art models on the UCF-Crime dataset and illustrates that our accuracy rate is the highest compared with existing works. Table 4 compares the AUC of our proposed method with other methods on the UCF-Crime dataset.
7 Discussion We showed the results of all of the matrices we used in our evaluation model. The accuracy of our CNN approaches, those with grayscale, and those with an uneven number of samples (positive and negative) were 99.3, 88.1, and 98.00, respectively, and the AUC, F1 score was 99.00. Table 4 shows how the AUC of our suggested algorithm compares to other algorithms on the same dataset. We can say that our
Fig. 4 Accuracy of the proposed system compared with the algorithms discussed in the related works: Thakur (2019) 98.71, Vosta (2022) 63.75, ours 99.3
Table 4 Comparison of the AUC of our method with other algorithms on the UCF-crime dataset
Model | AUC (%)
Sultani et al. (loss without constraints) [19] | 74.44
Sultani et al. (loss with constraints) [19] | 75.41
Vosta and Yow [12] | 81.71
Dubey et al. (3D ResNet + constr. + new rank. loss) [20] | 76.67
Dubey et al. (3D ResNet + constr. + loss) [20] | 75.62
Our proposed | 99.00
proposed method has the highest AUC score of all methods. In Fig. 4, we show that, compared to current efforts, our accuracy value is the highest.
8 Conclusions and Limitations We offer a DL technique to discover anomalies in surveillance videos. We used a custom CNN to extract important information from each frame of the input video so that the model could focus on any abnormalities in the video file. The experimental results demonstrate that the proposed model outperformed other existing models by achieving 99.3% accuracy on the popular UCF-Crime dataset. A limitation of the proposed work is that the RAM cannot fully accommodate the clips, since each clip is longer than 30 s and memory fills up. The normal class has 100 videos, and the number of videos per class must be approximately equal to avoid overfitting and underfitting. For some events in the dataset, the event does not start at the beginning of the clip and may occupy only a few seconds. Our suggested strategy beats competing methods on the UCF-Crime dataset. Acknowledgements The authors would like to thank Mustansiriyah University in Baghdad, Iraq, for their cooperation with this study (http://uomustansiriyah.edu.iq).
References 1. Xu M, Yu X, Chen D, Wu C, Jiang Y (2019) An efficient anomaly detection system for crowded scenes using variational autoencoders. Appl Sci 9(16):3337 2. Ye R, Li X (2017) Collective representation for abnormal event detection. J Comput Sci Technol 32(3):470–479 3. Ramchandran A, Sangaiah AK (2020) Unsupervised deep learning system for local anomaly event detection in crowded scenes. Multimedia Tools Appl 79(47):35275–35295 4. Biswas S, Gupta V (2017) Abnormality detection in crowd videos by tracking sparse components. Mach Vis Appl 28(1):35–48 5. Afiq AA, Zakariya MA, Saad MN, Nurfarzana AA, Khir MHM, Fadzil AF, Faizari M (2019) A review on classifying abnormal behavior in crowd scene. J Vis Commun Image Represent 58:285–303. https://doi.org/10.1016/j.jvcir.2018.11.035 6. Alex K, Ilya S, Geoffrey EH (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60:84–90. http://doi.org/10.1145/3065386 7. Kingma DP, Welling M (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312. 6114 8. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431– 3440 9. Liu W, Luo W, Lian D, Gao S (2018) Future frame prediction for anomaly detection—a new baseline. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6536–6545 10. Xu D, Yan Y, Ricci E, Sebe N (2017) Detecting anomalous events in videos by learning deep representations of appearance and motion. Comput Vis Image Underst 156:117–127 11. Sabokrou M, Fayyaz M, Fathy M, Moayed Z, Klette R (2018) Deep-anomaly: fully convolutional neural network for fast anomaly detection in crowded scenes. Comput Vis Image Underst 172:88–97 12. Vosta S, Yow KC (2022) A CNN-RNN combined structure for real-world violence detection in surveillance cameras. Appl Sci 12(3):1021 13. Yu J, Yow KC, Jeon M (2018) Joint representation learning of appearance and motion for abnormal event detection. Mach Vis Appl 29(7):1157–1170 14. Görnitz N, Kloft M, Rieck K, Brefeld U (2013) Toward supervised anomaly detection. J Artif Intell Res 46:235–262 15. Singh P, Pankajakshan V (2018) A deep learning based technique for anomaly detection in surveillance videos. In: 2018 twenty fourth national conference on communications (NCC). IEEE, pp 1–6 16. Thakur D, Kaur R (2019) An optimized CNN based real world anomaly detection in surveillance videos. Int J Innov Technol Exploring Eng (IJITEE) 8(9S). ISSN: 2278-3075 17. Mahdi M, Mohammed A, Waedallah A (2021) Detection of unusual activity in surveillance video scenes based on deep learning strategies. J Al-Qadisiyah Comput Sci Math 13(4):1. http://doi.org/10.29304/jqcm.2021.13.4.858 18. Habib S, Hussain A, Albattah W, Islam M, Khan S, Khan RU, Khan K (2021) Abnormal activity recognition from surveillance videos using convolutional neural network. Sensors 21(24):8291 19. Sultani W, Chen C, Shah M (2018) Real-world anomaly detection in surveillance videos. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6479–6488 20. Dubey S, Boragule A, Jeon M (2019) 3D resnet with ranking loss function for abnormal activity detection in videos. In: 2019 international conference on control, automation and information sciences (ICCAIS). IEEE, pp 1–6
Effective Detection of DDoS Attack in IoT-Based Networks Using Machine Learning with Different Feature Selection Techniques Akash Deep and Manu Sood
Abstract Due to its high impact at a low cost of implementation, the Internet of Things (IoT) is being adopted at an exponential rate for a diverse range of applications based upon Industry 4.0 and 5.0 standards. It is helping to build a smarter world by associating smartness with objects or entities using the Internet and making them an integral part of various setups providing useful services. With this high rate of adoption, the number of IoT devices installed worldwide is also going up, increasing the vulnerability of such devices to security risks, including malicious attacks. One of the common attacks on such setups is the distributed denial of service (DDoS) attack, which disrupts the user's resources and services by targeting them from different servers within the network. This attack may last for hours without the user(s) knowing about it, making it a dangerous and harmful attack. Machine learning (ML) supports tools and techniques that make a machine learn from historical data collected from past experiences in order to make intelligent future choices. Supervised ML techniques are employed for classifying data into various predefined classes in real-time scenarios for purposes such as prediction, and they are used extensively for anomaly and attack detection/prediction, using their classification capability to predict the target class as normal or malicious. In this paper, the authors compare different supervised ML algorithms along with different feature selection techniques to predict DDoS attacks in an IoT setup after training the models on a DDoS attack dataset. From the analysis of the experimental results, it is concluded that the KNN classifier performs the best in predicting DDoS attacks, achieving more than 98% accuracy with all three feature selection techniques. Keywords DDoS attack · Machine learning · Classifiers · Accuracy · Feature selection · Prediction model
A. Deep (B) · M. Sood Department of Computer Science, Himachal Pradesh University, Shimla, India e-mail: [email protected] M. Sood e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_44
1 Introduction Internet of Things (IoT) is one of the latest sensations that is changing human lifestyles at a fast pace. Internet-based smart devices are being used with convenience to make day-to-day affairs faster and easier. The services rendered by merchandise outlets in malls and shopping complexes, drone deliveries, smart fire and smoke detection/extinguishing systems, smart libraries, smartwatches, smart cities, smart transportation, etc., are among the applications most familiar to everyone in urban areas, while GPS-based maps and related applications are used in rural as well as difficult terrains. The Industry 4.0 and 5.0 standards provide the basis for industry the world over to integrate this technology and related devices with existing standard industry practices. The main idea of IoT is to exchange meaningful data among real-world entities and ICT systems for ubiquitous and pervasive computing. The ubiquity of common IoT connectivity such as Bluetooth, Wi-Fi, and mobile phone data services has intensified the transformation of IoT development. In 2011, the number of interconnected devices had already overtaken the size of the actual population, and currently there are 9 billion interconnected devices, a number that was expected to reach 24 billion by 2020 [1]. The term 'things' in IoT refers to various objects/entities with a connection to the Internet, loosely called devices in the IoT. These 'things' use wireless sensors and software applications to perform the tasks of sensing, acquiring, processing, sharing, and communicating data. Technologies like BLE, ZigBee, WiFi, LPWAN, and cellular 3G/4G/5G/6G, and platforms like Arduino or Raspberry Pi, are used to convert these things into IoT devices. As the usage of interconnected devices increases, various security issues also come into play [2]. Vulnerabilities in IoT devices are not a hidden fact; for example, a patient monitoring system using an IoT device, an automobile, and personal IoT gadgets are all vulnerable to security threats resulting in poor services, disruption of services, and/or leakage of sensitive information. Even devices in a typical household, such as IP cameras, smart TVs, or security monitoring systems, are susceptible to the risk of getting hacked, increasing their vulnerability to malevolent actors. Though the exponential growth of IoT devices has resulted in faster delivery of services and has made life much simpler in various quarters, it has also resulted in security issues in IoT devices in equal measure, if not more. Various layers of a typical IoT architecture are open to different types of vulnerabilities, and different types of attacks take place on each layer [3]. Some of the common attacks on IoT devices include denial of service (DoS) attacks, Sybil attacks, wormhole attacks, and spoofing attacks. These attacks not only disrupt the quality of service and user resources but may also result in the manipulation or theft of sensitive information through illegitimate access by hackers/malevolent actors exploiting these vulnerabilities. Table 1 highlights the most common threats and vulnerabilities.
Table 1 Threats attached with vulnerabilities in IoT devices
Vulnerability | Threats
Alteration, corruption, deletion | Tracking DoS, spoofing
Packet manipulation | Hacking
Eavesdropping DOS | Bluesnarfing, bluejacking
The rogue access point, misconfiguration | Hacking, signal lost
Manipulation of the data, extortion hack | Hijacking of equipment
Smart city DOS | Security plagued
DOS, Sybil, exhaustion | Flooding
Distributed denial of services attack or DDoS attack is a type of DoS attack that feeds on the user resources attacking them from different sources on the bandwidth of the devices and consumes these resources to make the services being rendered by them unavailable [4]. A DDoS attack deploys multiple attacking units to disrupt the user resources, the goal being the depletion of infrastructure-based resources of the user(s) often using data flooding. It is estimated that over 7000 such attacks occur daily—a number that has grown rapidly in recent years [5]. Malicious hackers have successfully exploited these connected devices causing illegitimate access and services prevention [6, 7]. DDoS attack plays a major role in exploiting the vulnerabilities and threats to both hardware and software as mentioned in Table 1. Most attackers initiate DDoS attacks by deploying multiple machines for heavy impact and tying up all of the tellers at all of the active windows. In such cases, it can be difficult to detect and block attackers manually [4]. Therefore, it is crucial to deploy automated tools for detecting these DDoS attacks whenever they happen in a network environment. Machine learning (ML) being a subclass of artificial intelligence deals with data analysis and studying patterns, making the machine learn, de-learn, and re-learn from the data gathered in the past. Supervised learning is used when the data contains a target value and the data is labeled, whereas unsupervised ML is used in the cases where the target value is not known or the data is unlabeled. In the terms of accuracy, supervised learning algorithms are known to perform better [8]. In this paper, various supervised ML algorithms have been used on the ‘Network Logs’ dataset on DDoS attacks to analyze how well these algorithms perform in the detection of DDoS attacks. The classifiers when used on the datasets with all their features included, provided far from satisfactory results in classifying the data into two classes of attack data or non-attack data when various performance metrics were computed. To enhance the performances of various classification models to satisfactory levels or beyond, a need for selecting a suitable set of features was felt. It is clear from the literature that the selection of appropriate features that too in optimum numbers has a direct bearing on the performance of any supervised ML-based prediction model.
Hence, three different feature selection techniques have been employed to analyze which subsets of features of the dataset contribute toward the best performance of classifiers in the detection of the attack. The techniques used are (1) recursive feature elimination (RFE), a wrapper method based on logistic regression, (2) feature importance, another wrapper method based on extra tree classifier, and (3) univariate statistical method, a filter method based on chi-square. Only those feature sets that are selected by each of these feature selection techniques have been used for the evaluation of the results. Seven different classifiers have been deployed as prediction models, all of them being supervised machine learning classifiers. These are (a) support vector machine (SVM), (b) random forest (RF), (c) decision tree (DT), (d) Naïve Bayes (NB), (e) K-nearest neighbors (KNN), (f) linear discriminant analysis (LDA), and (g) logistic regression (LR) [8]. The objective of this research work is to find out (a) the supervised ML algorithms which perform better on the given dataset, (b) how well can these models predict the DDoS attack with their best performances, and (c) To analyze how does the receiver operator characteristics (ROC) perform on these models. The significant contribution of the paper is in presenting the objective evaluation of three different feature selection techniques on the performances of each of the prediction models selected for the given dataset. This paper has been organized into four sections. After the basic introduction to the topic under consideration in Sect. 1, Sect. 2 describes the research methodology used for achieving the objectives and includes a brief about the experimental setup, the dataset used, pre-processing carried out on the dataset, and an explanation of how different feature selection techniques and classifier algorithms have been utilized. Section 3 presents the results obtained through the specific experiments and their analysis. Section 4 has been included to highlight the conclusion and future scope.
2 Research Methodology This section provides a concise explanation of the various processes, subprocesses, methods, and techniques used in this research work to fulfill the objectives of the paper. The experimental setup uses the Python language for the practical implementation of the specified feature selection techniques as well as the classifier models. Google Colab has been used for the execution of the code for the various algorithms; it is a freely available cloud service that provides the necessary disk space and memory to users for implementation purposes, and it includes libraries such as Keras and TensorFlow which are of utmost necessity for implementing various algorithms.
Table 2 Number of features selected by different feature selection techniques
Name of the technique | Total number of features | Features selected | Dataset name
RFE | 28 | 13 | FS1
Feature importance | 28 | 13 | FS2
SelectKBest (chi2) | 28 | 13 | FS3
2.1 Dataset The dataset used in the experimental analysis is the ‘Network Logs’ on DDoS attack which has been taken from the Kaggle website. The dataset originally contained 28 features and 2,100,000 labeled network logs collected from various types of network attacks on different IoT devices. The dataset contained data pertaining to four different types of DDoS attacks: HTTP-FLOOD, UDP, SIDDoS, and Smurf along with data pertaining to normal traffic. In the data preparation stage, all these four types of attacks were merged to get encapsulated DDoS attacks data as the objectives of this research work could be obtained with binary classification only. This dataset now contained approximately 1,000,000 rows with 28 labeled features. The experiments have been performed on this dataset after applying three feature selection techniques. These three techniques resulted in three different sets of datasets named FS1, FS2, and FS3 as shown in Table 2.
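The merging of the four attack types into a single DDoS class is described but not shown; a hedged pandas sketch is given below, where the file name, the 'class' column name, and the label strings are assumptions about the Kaggle file rather than its documented schema:

```python
import pandas as pd

# The file name and the 'class' column name are placeholders.
df = pd.read_csv("network_logs.csv")

attack_types = {"HTTP-FLOOD", "UDP", "SIDDoS", "Smurf"}
# Collapse the four DDoS variants into one positive class; everything else is normal.
df["label"] = df["class"].apply(lambda c: 1 if c in attack_types else 0)
df = df.drop(columns=["class"])
```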
2.2 Data Pre-processing Data pre-processing is an essential subprocess of any process using ML techniques on any dataset. It normally includes cleaning, scaling, and normalizing the data and is a necessary part of the data preparation stage to take care of noisy data, irregular data, and not-a-number (NaN) values. All this stuff needs to be filtered or corrected before supplying the data to the machine for learning (training, testing, and validation) [8, 9]. The dataset used for the experiments here contained 28 features and the data pre-processing was carried out on all these features of the dataset.
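As an illustrative sketch of this cleaning and scaling step (not the authors' exact procedure), one might write:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def preprocess(df: pd.DataFrame, target: str = "label") -> pd.DataFrame:
    """Drop duplicates, fill NaN values, and scale the features to [0, 1]."""
    df = df.drop_duplicates().copy()
    features = df.columns.drop(target)
    # Replace NaN values in each feature column with that column's median
    # (assumes the feature columns are numeric after encoding).
    df[features] = df[features].fillna(df[features].median(numeric_only=True))
    # Min-max normalisation of all feature columns.
    df[features] = MinMaxScaler().fit_transform(df[features])
    return df
```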
2.3 Feature Selection Feature selection is the process of removing insignificant or unwanted and noisy features, and the process is expected to choose only the input variables which would contribute significantly to predicting the target variable [10–12]. It is necessarily employed for keeping only those features that contribute the most to predicting the target value. A feature selection technique helps us to select the best optimal set of features out of all the features in the dataset. Three different supervised feature
selection techniques (one belonging to the wrapper method and two belonging to filter methods) have been considered for experimental exploration in this work. The first technique which is used in selecting the features is the recursive feature elimination (RFE), RFE helps to improve the prediction accuracy by automatically eliminating the least important features. RFE eliminated 15 least important features from our dataset and returned a subset of 13 selected features. The second feature selection technique used is the feature importance which assigns weights to every feature, and important features are gathered by checking the highest weights. Feature importance is a filter method technique, and an extra tree classifier was used to get the important features from our dataset. This method also produced a subset of 13 features for the evaluation. The last technique which we used was the univariate statistical method using chi-square (SelectKBest). It is a filter method where the statistical test is applied to all the features to calculate the correlation between the features by using their frequency distribution. Again, a subset of a total of 13 features was filtered out of 28 features [8]. All three feature selection methods returned a different set of features. In all these selected subsets of features, 13 features were considered to get a fair comparison between the performances of different feature selection techniques.
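A scikit-learn sketch of the three feature selection techniques, each reduced to 13 features, is shown below; it assumes the pre-processed dataframe from the earlier sketch and is an illustration rather than the authors' code:

```python
import numpy as np
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import ExtraTreesClassifier

X, y = df.drop(columns=["label"]), df["label"]   # df from the preprocessing sketch
K = 13

# (1) Recursive feature elimination wrapped around logistic regression -> FS1.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=K).fit(X, y)
fs1 = X.columns[rfe.support_]

# (2) Feature importance from an extra-trees classifier -> FS2.
imp = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y).feature_importances_
fs2 = X.columns[np.argsort(imp)[::-1][:K]]

# (3) Univariate chi-square test (SelectKBest) -> FS3; requires non-negative features.
skb = SelectKBest(chi2, k=K).fit(X, y)
fs3 = X.columns[skb.get_support()]
```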
2.4 Classifiers Used After performing the feature selection technique, the data was split into 70:20:10 ratios. 70% of data was assigned for training the models, 20% of data was assigned for testing, and 10% of data was reserved for cross-validation. Seven classifiers were used to evaluate the model performance. The classifiers which were used are as follows. (1) SVM uses a support vector to classify the data into the points that are closer to the hyperplane than the points which are away from it [13, 14]. (2) RF is an ensemble learning algorithm that provides learning techniques for both classification and regression where its output depends on the target value picked up by the majority of the trees [15–17]. (3) DT classifier is a classification and regression algorithm that builds the classification model by using a decision tree [18]. (4) NB is a set of different classification algorithms which uses the Bayes theorem where all the features classified are recognized as independent features [18]. (5) KNN is also a classification and regression method that uses a nonparametric approach and classification is done by a plurality vote of its neighbors where (n) is the number of neighbors [8]. (6) LDA uses continuous independent variables and dependent categorical variables to perform the analysis [9]. (7) LR is a classification algorithm that predicts the output of a categorical dependent variable and returns the probabilistic values which lie between (0, 1) [19]. The metrics used for evaluating and comparing the performances of various classification models using the confusion matrix are accuracy, precision, recall, specificity, F1-score, area under the curve (AUC), and receiver operating characteristics (ROC). All these metrics have been computed after calculating the values of True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN).
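A minimal sketch of the 70:20:10 split and the seven classifiers follows; the hyper-parameters are scikit-learn defaults or assumptions, and X[fs1] and y come from the feature selection sketch above:

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

# 70% training, 20% testing, 10% cross-validation (two successive splits).
X_train, X_rest, y_train, y_rest = train_test_split(
    X[fs1], y, test_size=0.30, stratify=y, random_state=42)
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, test_size=1/3, stratify=y_rest, random_state=42)

models = {
    "SVM": SVC(probability=True),
    "RF": RandomForestClassifier(),
    "DT": DecisionTreeClassifier(),
    "NB": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "LDA": LinearDiscriminantAnalysis(),
    "LR": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "test accuracy:", model.score(X_test, y_test))
```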
3 Experimental Results (Simulation Process) The confusion matrix is used in supervised ML to compute the above-mentioned metrics, which in turn form the basis for evaluating the performances of the models. The confusion matrix is a 2 × 2 matrix and provides four values corresponding to TP, TN, FP, and FN [20]. TP is the number of actual target-class samples that are detected as the target class, TN is the number of correct predictions of normal-class samples as the normal class, FP is the number of false predictions of normal-class samples as the target class, and FN is the number of predictions that were classified as the normal class but actually belong to the target class.
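All of the reported metrics can be derived from these four counts; a sketch of the computation (using scikit-learn for the confusion matrix and the AUC) is shown below as an illustration, not the authors' code:

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

def evaluate(model, X_test, y_test):
    """Compute the confusion-matrix-based metrics used in the paper."""
    y_pred = model.predict(X_test)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    precision   = tp / (tp + fp)
    recall      = tp / (tp + fn)             # sensitivity
    specificity = tn / (tn + fp)
    f1          = 2 * precision * recall / (precision + recall)
    auc         = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    return accuracy, precision, recall, specificity, f1, auc
```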
3.1 Cross-Validation Results on the FS1 Table 3 presents the computed values of all the seven metrics for all the seven classifiers. As can be seen from this table models based on LR, LDA, KNN, and NB classifiers have performed better as compared to the other three models. Among these four too, KNN can be seen to perform the best. The receiver operator characteristics (ROC) which is the AUC under all possible threshold values have also been used to evaluate the performance of the models. The 10% data which was reserved for the cross-validation test was consumed for validation using the split train test. Cross-validation is performed to check how well the models perform on the unseen data after training. Table 4 presents the results of cross-validation for six classifiers. The pictorial representation of comparative performances of all classifiers is depicted in Fig. 1. The results for the SVM classifier could not be achieved even after repeated attempts, probably due to some resource restrictions/crunch in our experimental setup. It can be concluded from Tables 3, 4, and Fig. 1 that KNN is clearly ahead of other models with results in Tables 3 and 4 endorsing each other. Table 3 Results of all seven classifiers on FS1 Algorithms
Algorithms | Precision | Recall | Accuracy | F1-score | Specificity | AUC | ROC
Logistic regression | 0.999 | 0.980 | 0.982 | 0.990 | 0.992 | 0.986 | 0.936
LDA | 0.999 | 0.981 | 0.982 | 0.981 | 0.996 | 0.989 | 0.924
KNN | 0.999 | 0.985 | 0.986 | 0.992 | 0.998 | 0.991 | 0.936
Naïve Bayes | 0.999 | 0.980 | 0.940 | 0.990 | 0.999 | 0.999 | 0.936
Decision tree | 0.984 | 0.985 | 0.973 | 0.984 | 0.866 | 0.9259 | 0.928
Random forest | 0.980 | 0.985 | 0.977 | 0.985 | 0.902 | 0.943 | 0.936
SVM | 0.996 | 0.981 | 0.904 | 0.982 | 0.966 | 0.974 | 0.936
Table 4 Results of cross-validation using all seven classifiers on FS1 Algorithms
Algorithms | Logistic regression | LDA | KNN | Naïve Bayes | DT | Random forest | SVM
Accuracy | 0.982 | 0.982 | 0.986 | 0.973 | 0.972 | 0.977 | –
Fig. 1 Performance comparison of seven classifiers on FS1
3.1.1
Observations (Discussion)
It can be clearly observed from Table 3 that experimental setup 1 used 13 features in FS1 to evaluate the performance of each of the seven classifiers, KNN gave the best performance out of all classifiers with an accuracy of 98.6%, precision of 99.9%, recall of 98.5%, specificity of 99.8%, and ROC of 93.6%. The results of crossvalidation are also encouraging with Naïve Bayes showing better performance than the testing accuracy. The missing SVM results are still awaited and the authors are working on them.
3.2 Cross-Validation Results on the FS2 Table 5 presents the computed values of all the seven metrics for all the seven classifiers. As can be seen from this table models based on LR, LDA, KNN, and NB classifiers have performed better as compared to the other three models. Among these four too, KNN can be seen to perform the best. The receiver operator characteristics
Table 5 Results of all seven classifiers on FS2 Algorithms
Algorithms | Precision | Recall | Accuracy | F1-score | Specificity | AUC | ROC
Logistic regression | 0.999 | 0.980 | 0.981 | 0.989 | 0.992 | 0.986 | 0.936
LDA | 0.998 | 0.981 | 0.981 | 0.989 | 0.984 | 0.991 | 0.936
KNN | 0.999 | 0.985 | 0.986 | 0.992 | 0.998 | 0.992 | 0.936
Naïve Bayes | 0.999 | 0.972 | 0.974 | 0.985 | 0.999 | 0.992 | 0.936
Decision tree | 0.984 | 0.985 | 0.972 | 0.984 | 0.865 | 0.924 | 0.928
Random forest | 0.988 | 0.985 | 0.972 | 0.984 | 0.909 | 0.947 | 0.936
SVM | 0.996 | 0.981 | 0.904 | 0.989 | 0.966 | 0.974 | –
Table 6 Results of cross-validation using all seven classifiers on FS2 Algorithm
Algorithms | Logistic regression | LDA | KNN | Naïve Bayes | DT | Random forest | SVM
Accuracy | 0.982 | 0.981 | 0.986 | 0.974 | 0.973 | 0.978 | –
(ROC) which is the AUC under all possible threshold values have also been used to evaluate the performance of the models. The 10% data which was reserved for the cross-validation test was consumed for validation using the split train test. Cross-validation is performed to check how well the models perform on the unseen data after training. Table 6 presents the results of cross-validation for six classifiers. The pictorial representation of comparative performances of all classifiers is depicted in Fig. 2. The results for the SVM classifier could not be achieved even after repeated attempts, probably due to some resource restrictions/crunch in our experimental setup. It can be concluded from Tables 5, 6, and Fig. 2 that KNN is clearly ahead of other models with results in Tables 5 and 6 endorsing each other.
3.2.1
Observations
It can be clearly observed from Table 5 that experimental setup 2 used 13 features in FS2 to evaluate the performance of each of the seven classifiers, KNN gave the best performance out of all classifiers with an accuracy of 98.6%, precision of 99.9%, recall of 98.5%, specificity of 99.8%, and ROC with 93.6%. The results of cross-validation are also encouraging with Naïve Bayes showing better performance compared to the testing accuracy. The missing SVM results are still awaited and the authors are working on them.
Fig. 2 Performance comparison of seven classifiers on FS2
3.3 Cross-Validation Results on the FS3 Table 7 presents the computed values of all the seven metrics for all the seven classifiers. As can be seen from this table models based on LR, LDA, KNN, and NB classifiers have performed better as compared to the other three models. Among these four too, KNN can be seen to perform the best. The receiver operator characteristics (ROC) which is the AUC under all possible threshold values have also been used to evaluate the performance of the models. The 10% data which was reserved for the cross-validation test was consumed for validation using the split train test. Cross-validation is performed to check how well the models perform on the unseen data after training. Table 8 presents the results of cross-validation for six classifiers. The pictorial representation of comparative performances of all classifiers is depicted in Fig. 3. The results for the SVM classifier could not be achieved even after repeated attempts, probably due to some resource Table 7 Results of all seven classifiers on FS3 Algorithms
Algorithms | Precision | Recall | F1-score | Accuracy | Specificity | AUC | ROC
Logistic regression | 0.992 | 0.981 | 0.986 | 0.976 | 0.924 | 0.953 | 0.919
LDA | 0.996 | 0.981 | 0.988 | 0.980 | 0.968 | 0.974 | 0.906
KNN | 0.999 | 0.985 | 0.992 | 0.986 | 0.998 | 0.992 | 0.936
Naïve Bayes | 0.999 | 0.900 | 0.947 | 0.900 | 0.998 | 0.949 | 0.930
DT | 0.984 | 0.985 | 0.985 | 0.972 | 0.864 | 0.925 | 0.928
Random forest | 0.987 | 0.985 | 0.986 | 0.975 | 0.887 | 0.936 | 0.936
SVM | 1.00 | 0.896 | 0.945 | 0.896 | 0 | 0.44 | –
Table 8 Results of cross-validation using all seven classifiers on FS3 Logistic regression
Algorithms | Logistic regression | LDA | KNN | Naïve Bayes | DT | Random forest | SVM
Validation accuracy | 0.976 | 0.980 | 0.986 | 0.900 | 0.972 | 0.975 | –
Fig. 3 Performance comparison of seven classifiers on FS3
restrictions/crunch in our experimental setup. It can be concluded from Tables 7, 8, and Fig. 3 that KNN is ahead of other models with results in Tables 7 and 8 endorsing each other.
3.3.1
Observations
It can be clearly observed from Table 7 that experimental setup 3 used 13 features in FS3 to evaluate the performance of each of the seven classifiers, KNN gave the best performance out of all classifiers with an accuracy of 98.6%, precision of 99.9%, recall of 98.5%, specificity of 99.8%, and ROC with 93.6%. The results of cross-validation are also encouraging with Naïve Bayes showing better performance compared to the testing accuracy. The missing SVM results are still awaited and the authors are working on them.
4 Results Analysis Looking at the results of all the models of supervised ML based on three datasets from three feature selection techniques, it is observed that many of the classifiers exhibit good accuracy on all the datasets i.e., FS1, FS2, and FS3, while KNN performs best on every feature set with 98% accuracy. LR, LDA, KNN, and NB gave the best values for precision on all these datasets, while LR and KNN exhibit the best value for precision on the dataset FS2. It can be clearly observed from Table 3 that for the experimental setup involving FS1, 13 features were used to evaluate the performance of each of the seven classifiers. KNN showed the best performance out of all classifiers with accuracy of 98%, specificity of 99%, and ROC with 93%. The results of cross-validation are also encouraging with NB showing better performance than the corresponding accuracy of the testing data. KNN seems to work very well with the wrapper selection method, whereas other models also gave encouraging results. Table 5 presents the results of all the classifiers after implementation on dataset FS2. It was observed that most of the classifiers performed very well in detecting the attack. KNN gave the best accuracy, while NB has the best specificity out of all the classifiers. Comparing Tables 3 and 5, it seems that the specificity of LDA has improved in the FS2 technique which is also a wrapper selection method. Table 7 presents the results of all the classifiers after implementation on dataset FS3. It was observed that most of the classifiers performed very well in detecting the attack. KNN gave the best accuracy while having the best specificity out of all the classifiers. Comparing Tables 3, 5, and 7, it can be seen that the accuracy of NB seems to have increased in FS3 which is a statistical selection method. NB and KNN exhibited the best precision on the dataset FS3. KNN gave the best specificity on dataset FS1, whereas NB provided the best specificity on the dataset FS2, KNN, and NB showed the best specificity on the dataset FS3. The ROC is an important parameter that depicts the accuracy value of a classifier on every threshold value. All the classifiers on the dataset FS1 gave 93% ROC except LDA and DT, and all classifiers gave 93% ROC except SVM whose results are awaited and authors are working on it. KNN and RF showed the best ROC on dataset FS3. The missing results in Tables 3, 4, 5, 6, 7, and 8 are still awaited due to the high demand for computation for each dataset. The authors are still working to compute the missing results in these tables.
5 Conclusion and Future Work DDoS attacks in a typical IoT setup can play havoc with the services being rendered by this environment since its detection itself becomes very hard leaving aside the prevention or removal. As ML can make a machine learn from the data collected in the past from a similar environment, it comes in handy in such detection mechanisms. In this research work, one such dataset was downloaded from the Kaggle website which
contained DDoS attack data in a dataset named 'Network Logs', with data from various IoT devices in a typical environment. This dataset was pre-processed using data cleaning and data preparation, after which three different feature selection techniques were used to explore their effect on the performance of various supervised ML-based prediction models. These three feature selection techniques returned three different sets of features. Seven prediction models based on seven supervised ML classifiers were deployed on these three datasets to evaluate their performance, using seven standard metrics based on the confusion matrix. Although most of the prediction models gave encouraging results, KNN, NB, and RF outperformed all other classifiers on most of the parameters. Currently, the authors are working on obtaining the remaining results for the SVM-based model (ROC and cross-validation). In the future, they intend to propose and develop approaches for the effective prevention as well as removal of such DDoS attacks on IoT devices.
A Collaborative Destination Recommender Model in Dravidian Language by Social Media Analysis

Muneer V. K. and K. P. Mohamed Basheer
Abstract Data generation in social media has gone beyond imagination nowadays. It comes in many forms, such as images, videos, audio, other multimedia formats, and text, and the posts or text can be in different languages. Language processing has emerged as one of the most active research areas with the advancement of artificial intelligence, and social media is considered the largest and an ever-growing data repository. This paper focuses on information retrieval methods for travel and tourism content on Facebook.com written in Malayalam, one of the prominent Dravidian languages spoken in the south Indian state of Kerala. It then discusses an algorithm that recommends suitable locations for individuals by fetching their travel histories, personal choices, and preferences using unsupervised machine learning techniques. A customized dataset has been generated from the largest Malayalam Facebook travel group, "Sanchari" (www.facebook.com/groups/teamsanchari). With the help of collaborative filtering, the algorithm can suggest a set of suitable destinations for each user with an accuracy of 90%.

Keywords Data mining · Recommender system · Natural language processing · Collaborative filtering
1 Introduction

The concept of big data has become inevitable in every aspect of business organizations, real-time data processing, and the delivery of various services. Social media is considered one of the major contributors to big data. Social networking sites like Facebook, Twitter, Instagram, and YouTube are the most popular players in digital activities
worldwide, and hence their penetration across all regions is constantly increasing. People use online platforms to share their photos, reviews, and activities. Information retrieved from such sources, when properly processed, supplies value-added services for commercial purposes, branding, and item recommendations. Several organizations depend largely on social media sites as a source of information because current trends and customer preferences can be identified easily from them [1]. Facebook, Twitter, WhatsApp, Instagram, YouTube, and Snapchat are the lead players in social media data production. Collecting data from social media is considered the most crucial step [2]. The three V's of big data [3], namely the velocity, variety, and volume of the data being processed, need to be properly addressed, which calls for high-end machines, architectures, and algorithms; the quality of the output depends directly on how well the data is processed. Information retrieval from social media can be automated in three ways: (a) web services, (b) APIs, and (c) web scraping. Web scraping can be done either with built-in packages or by manually developing a tool for customized retrieval; it can be defined as extracting data from web sources and storing it for further processing. Recommendation is the process of suggesting an item, book, film, destination, music, or anything else to a person based on previous transactions or preferences. With the advancement of IT and AI, there are enormous opportunities for social media recommender systems (RS) [4] in every domain, and the travel and tourism sector is one of the recently emerged domains. As technology advances along with artificial intelligence, another area that enjoys its advantage is language processing, and a large number of research works have been carried out in region-based language processing for speech, text, speech to text, and text to speech. A recommender system in the Malayalam language for the travel and tourism sector is the motive of this article.
2 Literature Review

Anandhan et al. examined the efficiency of recommendation models through a study of 61 research articles published in SCOPUS and WoS between 2011 and 2015 [5]. Safeek proposed recommending an appropriate career path by preprocessing Facebook data and analyzing user posts, engagements, and sentiments [6]. He and Chu introduced methods to design more efficient RS by evaluating how the past choices and reviews of users correlate with those of their friend circle. Tiwari et al. carried out a study to improve the accuracy of a recommender system by processing data generated from Twitter [7]. The hurdles and challenges in conducting social media analytics were studied by Stieglitz et al., who concluded that data discovery, collection, preparation, and analysis are the four distinct steps in analytics [2].
There are several notable works completed in natural language processing. Remiyya Devi used the structured skip-gram model to extract useful information from social media articles written in the Malayalam language [8]. SUMMARIST is a tool proposed by E. Hovy for automatically summarizing text into two models as extractive summarization and abstractive summarization [9]. A multi-model language processing model implementing the algorithm for sentiment analysis in Malayalam and Tamil languages [10]. Jayan proposed methodologies for annotating tokenized words in the Malayalam language that undergo chunking and then tagged with parts of speech tagger with the help of statistical approach; TNT-tagger [11, 12]. A finite state transducer model for deep-level morphology analyzer for Malayalam language [13] discussed by Santhosh. Y. Babu investigated code-mixing issues between Malayalam and English languages for sentiments analysis using sentence level BERT and sentiment features [14, 15]. S. Thara discussed about transformer-based language identification for Malayalam-English code-mixed text [16]. Text summarization with pretrained encoders, Liu proposed a unique document-level encoder based on the BERT model that was able to express the semantical details of a document and obtain representations for its sentences and claimed an excellent performance in both extractive and abstractive summarization [17]. A deep-level tagger for Malayalam has been implemented with the help of word embedding and support vector machine algorithms [18]. Recent advancements of recommender models in Arabic languages evaluated by Srifi et al. [19]. Prominent types of data filtering such as collaborative filtering (CF), content-based filtering (CBF), and hybrid filtering techniques (HF) discussed in [20]. A multi-criteria decision making method based on deep encoder by Yahya and Team to study the nonlinear relation between users and proposed correlation coefficient and standard deviation approach [21]. A personalized recommendation developed by using hybrid method of multi-attribute collaborative filtering by analyzing social media sites [22]. Travel recommendations for different users through deep learning techniques under global information management discussed in [23]. Jia-Li Chang proposed a personalized travel RS by using hybrid method with CF and social network analysis [24].
3 Dataset

As aforementioned, Facebook is one of the largest social media platforms where people express their opinions, upload images, and share reviews, with thousands of groups and pages in various domains and languages. For this experiment, the largest Malayalam Facebook travel group, "Sanchari" (translated as "Traveler" in English), was chosen. The total member count of this group is 7.2 lakh, and it has more than 50,000 travel posts about various locations around the globe. The lengthy, unstructured travelogues are written in the Malayalam language. The tool scraped 12,500 posts from different users along with their likes, comments, and shares, as well as the users' personal preferences, public check-ins, academics, and jobs.
4 Methodology

The overall process is categorized into seven segments:
1. Data scraping
2. Language preprocessing and dataset preparation
3. Part of travelogue tagger (POT tagger) creation
4. Travel DNA formation
5. Location DNA preparation
6. Traveler–location mapping
7. Personalized location recommendation
1. Data Scraping

Facebook plays an important role in data generation and user management, and the processing directly depends upon the quality and quantity of the data. Facebook itself provides a few methods to retrieve data through its own native APIs [2]; the Graph API explorer can collect a few details from pages and groups, but the amount of data it provides to external sources is not sufficient to carry out research and development. Another possibility is to create a custom tool for scraping, and the tool built here succeeded in fetching sufficient data with the help of various scripting languages and supporting tools. The activities of the scraping tool are listed below:
• Entering the desired group/page with Facebook login authorization.
• Identifying and locating DOM elements.
• Collecting users' posts and travelogues and counting comments, likes, and reactions.
• Fetching the personal preferences and check-in history details of each user.
• Transforming the data from JSON to a CSV or Excel sheet.

2. Language Preprocessing of Malayalam Text and Dataset Preparation

To remove noise and unwanted details from the extracted text, a couple of methods are commonly used. Data cleansing is crucial in a recommender system, as the efficiency of the application directly depends on the quality of the input data. As Malayalam is considered one of the most agglutinative languages in India, the processing involves a few additional steps and functional modules (a minimal code sketch follows this list):
(a) Removal of punctuation and special characters.
(b) Processing of code-mixing and numerical values.
(c) Sentence tokenization and word tokenization.
(d) Malayalam stop word removal.
(e) Stemming, through the root-pack extractor.
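A minimal Python sketch of steps (a)-(e) is given below. The regular expression for the Malayalam Unicode block, the tiny stop-word set, and the placeholder stemmer are illustrative assumptions; the authors' actual root-pack extractor and stop-word resources are not described in the paper.

```python
import re

# Illustrative (not exhaustive) Malayalam stop words; real lists are much larger.
STOP_WORDS = {"ആണ്", "ഒരു", "എന്ന", "അത്"}
MALAYALAM_BLOCK = r"\u0D00-\u0D7F"   # Unicode range of the Malayalam script

def stem(word: str) -> str:
    # Placeholder for the paper's root-pack extractor (rule-based suffix stripping).
    return word

def preprocess(travelogue: str) -> list:
    # (a) remove punctuation/special characters, (b) drop code-mixed Latin text and numbers
    cleaned = re.sub(rf"[^{MALAYALAM_BLOCK}\s\.]", " ", travelogue)
    sentences = [s for s in re.split(r"[.\n]", cleaned) if s.strip()]   # (c) sentence tokenization
    tokens = []
    for sentence in sentences:
        words = sentence.split()                                        # (c) word tokenization
        words = [w for w in words if w not in STOP_WORDS]               # (d) stop-word removal
        tokens.append([stem(w) for w in words])                         # (e) stemming
    return tokens
```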
Table 1 Additional tag set of part of travelogue tagger for Malayalam texts

S. No.  Category (top level)  Sub level  Label  Annotation convention  Example
1       Location              1          L      L_N
2       Travel type           1          TT     TT_T
3       Travel mode           1          TM     TM_N
4       Climate               1          LC     LC_T
5       Location type         1          LT     LT_N
The main tags used in the POT tagger are:
Location (L): London, Manali
Travel type (TT): Train, road, bike
Travel mode (TM): Solo, family, friends
Location climate (LC): Summer, winter, rainy
Location type (LT): Historical, pilgrimage, natural, adventurous
3. Part of Travelogue Tagger Creation

The unavailability of a tagger for annotating Malayalam text in the travel domain was resolved by creating a new tagger. The part of travelogue tagger (POT) contains tags for annotating the essential details related to the domain and is created by integrating additional fields and tags into an existing part of speech tagger for the Malayalam language [25, 26] (Table 1).

4. Travel DNA Formation

As the name DNA implies, the Travel DNA is one of the most important features of each travelogue. The descriptive write-ups from the dataset are taken as input and annotated with the POT tagger, and among the hundreds of annotated words, the most important tags in each domain are used to constitute the Travel DNA of the travelogue. The essential components of the Travel DNA are the location, travel mode, travel type, location climate, and type of location (Table 2). As the table shows for two specimen travelers, A and B, the nature and tastes of their travel can be summarized with the help of the POT tagger from the features extracted from their travelogues and posts. Each traveler may or may not prefer different modes in different situations, every traveler may visit multiple destinations, and each trip may be in a different season, travel mode, and travel type. All these details are mapped to form the individual Travel DNA. These individual Travel DNAs are used to create user clusters whose members ideally have similar tastes and preferences. Moreover, these clusters can be dynamically created and updated based on the parameters in different situations (Fig. 1).
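As an illustration of how the POT tag set in Table 1 might be represented and a Travel DNA assembled from tagged tokens, a hedged sketch follows; the tuple format for tagged tokens and the field names are assumptions made for this example, not the authors' implementation.

```python
# Top-level POT categories mapped to the Travel-DNA fields they populate.
POT_TAGS = {
    "L":  "location",          # e.g. London, Manali
    "TT": "travel_type",       # e.g. train, road, bike
    "TM": "travel_mode",       # e.g. solo, family, friends
    "LC": "location_climate",  # e.g. summer, winter, rainy
    "LT": "location_type",     # e.g. historical, pilgrimage, natural, adventurous
}

def extract_travel_dna(tagged_tokens):
    """tagged_tokens: list of (word, tag) pairs such as [("Manali", "L_N"), ("bike", "TT_T")]."""
    dna = {field: [] for field in POT_TAGS.values()}
    for word, tag in tagged_tokens:
        top_level = tag.split("_")[0]          # "L_N" -> "L"
        if top_level in POT_TAGS:
            dna[POT_TAGS[top_level]].append(word)
    return dna

print(extract_travel_dna([("Manali", "L_N"), ("bike", "TT_T"), ("winter", "LC_T")]))
```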
Table 2 Sample of Travel DNAs

Traveler  Location  Tr_Type  Tr_Mode    Climate   Purpose
A         Manali 1  Bike 1   Friends 3  Winter 1  Adventure 1
          Hampi 3   Train 4  Solo 1     Summer 2  Historic 2
B         Munnar 4  Car 2    Family 2   Winter 1  Nature 5

A and B are specimen users
5. Location DNA Preparation

Along with the Travel DNA created for each user, another set of data related to the individual destinations is created, which is helpful for clustering the locations. This list is named the Location DNA. As the length of the Location DNA increases, the accuracy of prediction through clustering and collaborative filtering increases. The list is built while processing the POT-tagged travelogues by filtering the five most relevant features of each location; it is consolidated information about every destination, covering who reached it, by which mode, with which type of travel, in which climate, and the total number of visits (Fig. 2).

6. Traveler–Location Mapping
Fig. 1 Travel DNA of travelers, a 3D view
Fig. 2 Location DNA—a summarized list of destinations
During this phase, an extensive comparative study is conducted between locations and between travelers. The correlation between the similarities of locations helps to group them into clusters, while the preferences and prior travel histories of travelers, along with their travel modes in various situations, are grouped to form different user clusters (Fig. 3). These data are processed with an unsupervised method and clustered automatically with the help of collaborative filtering (CF). CF is a method of producing automatic suggestions or predictions based on the interests of a traveler by collecting preference or choice information from many travelers (collaborating). There are commonly two models of CF: user-based CF, which measures the resemblance between the target traveler and other users, and item-based (location-based) CF, which
measures the connection between the locations that travelers rate or interact with and other locations.

Fig. 3 Traveler–location mapping and rating matrix

Once the list of travelers similar to a traveler U has been determined, the next step is to calculate the rating R that U would give to a particular location L (or item I). As with similarity, this can be done in multiple ways; one option is to predict U's rating for a location L as the average of the ratings given to L by the top five travelers most similar to U. For n similar travelers, the average rating is

R_U = \frac{\sum_{u=1}^{n} R_u}{n}

that is, the sum of the ratings given by the n most similar travelers divided by the number of similar travelers, n.

7. Personalized Destination Recommendation

By processing the Travel DNA, the Location DNA, and the rating matrix, the implicit correlations between travelers and, likewise, between locations can be calculated. This comparison is helpful for the internal clustering that resolves the traveler–destination mapping in many ways. Once the model has been developed and trained enough to produce suggestions, it is designed to display two modes of recommendation: primary recommendations and secondary recommendations.
(a) The primary recommendation set is a list of locations visited by the user or by similar travelers who have the same preferences and tastes.
(b) The secondary recommendation set is a list of locations that are well suited to the user but that the user has never visited. The list contains five locations ranked in descending order of their match with the input given by the user in that context.
During the training phase of the model, all the above-mentioned procedures were completed, and the algorithm is ready to suggest the most suitable
five destinations to that user. While performing the testing phase, the recommendation model prompts the user to enter a few details such as preferred travel mode, travel type, and traveling season. By processing this input, the user is able to see the primary recommendation list and secondary recommendation list (Fig. 4).

Fig. 4 Recommendation output of most suitable destinations
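The user-based collaborative-filtering step described above (similarity over the traveler–location rating matrix, prediction of an unseen location's rating as the mean rating of the most similar travelers, and a ranked top-five list) can be sketched roughly as follows. The toy rating matrix, the cosine similarity measure, and the library choices are assumptions for illustration only.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Toy traveler x location rating matrix (0 = not visited/rated).
ratings = pd.DataFrame(
    [[5, 0, 3, 0], [4, 2, 0, 1], [0, 5, 4, 0], [5, 1, 3, 2]],
    index=["A", "B", "C", "D"],
    columns=["Manali", "Hampi", "Munnar", "Wayanad"],
)

def recommend(user, k=5, top_n=5):
    # Similarity of the target traveler to every other traveler.
    sim = pd.Series(
        cosine_similarity(ratings.loc[[user]], ratings)[0], index=ratings.index
    ).drop(user)
    neighbours = sim.nlargest(k).index
    unseen = ratings.columns[ratings.loc[user] == 0]
    # Predicted rating R_U for each unseen location: mean rating of the k most similar travelers.
    predicted = ratings.loc[neighbours, unseen].mean(axis=0)
    return predicted.sort_values(ascending=False).head(top_n)

print(recommend("A"))
```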
5 Conclusion

Social media has become a principal source of data, as it provides a public platform to share comments, photos, reviews, and updates. This paper discussed two different algorithms. The first is a customized data extraction algorithm to scrape all relevant information from the largest Malayalam Facebook travel group, "Sanchari". In the initial phase, 12,500 full-length Facebook posts, each containing an average of 40 sentences, were extracted; these lengthy travelogues underwent data cleansing and preprocessing through natural language processing, leading to the creation of the dataset. The tool also scraped the public personal information of 3781 travelers and their 84,463 check-in details from around the globe. Second, the paper discussed an algorithm for recommending personalized travel destinations to individual travelers: by answering the questions posed by the recommendation model, the user receives the five most suitable destinations. In manual testing, the secondary recommendation achieved 80% accuracy. The result can be improved by optimizing the part of travelogue tagger and enlarging the corpus capacity of the Travel DNA and Location DNA. There are no similar personalized travel recommender systems in the Malayalam language, which makes the algorithm significant.
References 1. Rieder B (2013) Studying facebook via data extraction: the Netvizz application. In: Proceedings of 5th annual WebSci’13, pp 346–355. http://doi.org/10.1145/2464464.2464475 2. Stieglitz S, Mirbabaie M, Ross B, Neuberger C (2018) Social media analytics—challenges in topic discovery, data collection, and data preparation. Int J Inf Manage 39:156–168. http://doi. org/10.1016/j.ijinfomgt.2017.12.002 3. Shabana M (2021) A study on big data advancement and big data. J Appl Sci Comput 4099 4. He J, Chu WW (2010) A social network-based recommender system (SNRS)
5. Anandhan A, Shuib L, Ismail MA, Mujtaba G (2018) Social media recommender systems: review and open research issues. IEEE Access 6:15608–15628. http://doi.org/10.1109/ACC ESS.2018.2810062 6. Safeek I, Kalideen MR (2017) Preprocessing on facebook data for sentiment analysis, vol 2017, pp 69–78 7. Tiwari S, Saini A, Paliwal V, Singh A, Gupta R, Mattoo R (2020) Implicit preferences discovery for biography recommender system using Twitter. Procedia Comput Sci 167(2019):1411–1420. https://doi.org/10.1016/j.procs.2020.03.352 8. Devi GR, Veena PV, Kumar MA, Soman KP (2016) Entity extraction for Malayalam social media text using structured skip-gram based embedding features from unlabeled data. Procedia Comput Sci 93:547–553. https://doi.org/10.1016/j.procs.2016.07.276 9. Hovy E, Lin C-Y (1999) Automated text summarization in SUMMARIST. Adv Autom Text Summ 81–97 [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1. 21.2103 10. Chakravarthi BR et al (2021) Dravidianmultimodality: a dataset for multi-modal sentiment analysis in Tamil and Malayalam [Online]. Available: http://arxiv.org/abs/2106.04853 11. Jayan JP (2014) Parts of speech tagger and Chunker for Malayalam—statistical approach, vol 1719, no 3, pp 6–11 12. Mubarak DMN, Shanavas SA (2018) Malayalam text summarization using graph based method, vol 9, no 2, pp 40–44 13. Thottingal S (2019) Finite state transducer based morphology analysis for {M}alayalam language. In: Proceedings of 2nd workshop on technologies for MT of low resource languages, pp 1–5 [Online]. Available: https://www.aclweb.org/anthology/W19-6801 14. Babu YP, Eswari R, Nimmi K (2020) CIA_NITT@Dravidian-CodeMix-FIRE2020: Malayalam-English code mixed sentiment analysis using sentence BERT and sentiment features. In: CEUR workshop proceedings, vol 2826, pp 566–573 15. Chakravarthi BR et al (2022) DravidianCodeMix: sentiment analysis and offensive language identification dataset for Dravidian languages in code-mixed text. Springer, Netherlands 16. Thara S (2021) Transformer based language identification for Malayalam-English code mixed text. IEEE Access 9. http://doi.org/10.1109/ACCESS.2021.3104106 17. Liu Y, Lapata M (2020) Text summarization with pretrained encoders. In: EMNLP-IJCNLP 2019—2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing proceeding conference, pp 3730– 3740. http://doi.org/10.18653/v1/d19-1387 18. Ajees AP, Abrar KJ, Sumam MI, Sreenathan M (2020) A deep level tagger for Malayalam, a morphologically rich language. J Intell Syst 30(1):115–129. https://doi.org/10.1515/jisys2019-0070 19. Srifi M, Oussous A, Ait Lahcen A, Mouline S (2021) Evaluation of recent advances in recommender systems on Arabic content. J Big Data 8(1). http://doi.org/10.1186/s40537-021-004 20-2 20. Malik S, Rana A, Bansal M (2020) A survey of recommendation systems. Inf Resour Manag J 33(4):53–73. https://doi.org/10.4018/IRMJ.2020100104 21. Bougteb Y, Ouhbi B, Frikh B, Zemmouri EM (2021) A deep autoencoder based multi-criteria recommender system, pp 56–65. http://doi.org/10.1007/978-3-030-76346-6_6 22. Chang JL, Li H, Bi JW (2021) Personalized travel recommendation: a hybrid method with collaborative filtering and social network analysis. Curr Issues Tour 1–19. http://doi.org/10. 1080/13683500.2021.2014792 23. Zhang, Song Y (2021) Research on the realization of travel recommendations for different users through deep learning under global information management. 
J Glob Inf Manag 30(7) 24. Chang J-L, Li H, Bi J-W (2021) Personalized travel recommendation: a hybrid method with collaborative filtering and social network analysis. Curr Issue Tour. https://doi.org/10.1080/ 13683500.2021.2014792
25. Francis M, Nair KNR (2014) Hybrid part of speech tagger for Malayalam. In: Proceedings of 2014 international conference on advances in computing, communications and informatics, ICACCI 2014, pp 1744–1750. http://doi.org/10.1109/ICACCI.2014.6968565 26. Anish A (2008) Part of speech tagging for Malayalam. Amrita Vishwa Vidyapeetham
Analysis of Influential Features with Spectral Features for Modeling Dialectal Variation in Malayalam Speech Using Deep Neural Networks

Rizwana Kallooravi Thandil and K. P. Mohamed Basheer
Abstract Over the past few decades, research has focused heavily on automatic speech recognition (ASR). Although ASR for a few languages is close to reality, ASR for low-resource languages like Malayalam is still in its infancy. In this work, the authors discuss an experiment conducted on accented Malayalam speech data using two approaches: one models spectral features using a deep convolutional neural network, and the other models influential features of the speech signals using an LSTM-RNN. The proposed methodology comprises the distinct stages of dataset preparation, feature extraction, and classification, leading to deep learning models that recognize accented spoken sentences in the Malayalam language. The mel-frequency cepstral coefficient (MFCC) algorithm, the short-term Fourier transform (STFT), and the mel spectrogram are used for feature engineering, and the resulting features that represent the speech signals are used to construct the accented ASR system for Malayalam using long short-term memory (LSTM), a recurrent neural network (RNN). A spectrogram dataset has also been constructed from the speech dataset and used to build the ASR model with a deep convolutional neural network (DCNN). The results show that the LSTM-based RNN outperforms the DCNN on the proposed dataset, which was constructed in a natural recording environment.

Keywords Automatic speech recognition (ASR) · Long short-term memory (LSTM) · Deep convolutional neural network (DCNN) · Mel-frequency cepstral coefficients (MFCCs) · Malayalam speech recognition
1 Introduction Speech is the most natural way of expressing thoughts and ideas for effective communication. Enabling a machine to mimic the human hearing system and thus understanding the spoken words comprises the ASR. ASR for a few languages has significant advancements in research. But for a low-resourced language like Malayalam, only a few works have been recorded, and hence efforts should be invested in research in the area of ASR for the language. Accent-based automatic speech recognition system is still an unexplored area in the Malayalam language. Malayalam is a language spoken in the Indian state of Kerala and the Lakshadweep islands. When moving from the south to the north across Kerala, there are many different varieties of accents spoken in the Malayalam language. Malayalam is spoken with many dialects that vary according to geographical distribution and socio-cultural and religious disparities. Accented speech recognition is yet to be an addressed problem in the Malayalam language. Developing a speech recognition system that can recognize both spoken and nonspoken words in Malayalam is very challenging since the language has a rich vocabulary and many consonant letters. Speech-to-text processing for the language is difficult since it is an agglutinative and highly inflected language. The unavailability of a benchmark dataset for conducting the research also poses a great challenge for the researchers in the area. Few works that are happening in parallel unaware of each other if properly monitored and compiled together might contribute to a faster ASR for the language. ASR is very useful for handicapped individuals, illiterate individuals, hands-free driving, and the list of advantages goes on. Speech is influenced enormously by the elements like articulations, accent, noise, pitch, volume, amplitude, echoes, and gender. The method of explanation, nasality, enthusiastic state, use of words, and speed fluctuation in speech make ASR a difficult task as well.
2 Related Work

Research on Malayalam speech recognition is advancing at a slower rate, whereas in the past few years research on speech recognition and ASR technology has made great advances in languages like English and Arabic. Teferra et al. [1] proposed DNN-based ASR frameworks for four Ethiopian languages and found that DNN-based ASR outperformed GMM-based ASR for all of them. El-Moneim et al. [2] proposed text-independent speaker recognition under various conditions using MFCC, spectrum, and log-spectrum features with an LSTM-RNN classifier, which yielded good results. Palaz et al. [3] compared a CNN-based methodology against a traditional ANN-based methodology on the Wall Street Journal corpus. Sasikuttan et al. [4] illustrated an ASR for Malayalam words using Python's speech recognition libraries. It recognizes any combination of formal
Malayalam words pronounced with a gap between the words, and it provides a simple and basic interface for the conversion, as it is mainly intended for illiterate people. Abdel-Hamid et al. [5] propose CNNs for building ASR, present a succinct description of the basic CNN, explain how it can be used for speech recognition, and propose a limited weight-sharing scheme that can better model speech features. Issa et al. [6] proposed methods for speech emotion recognition using a one-dimensional deep CNN with a combination of five different audio features; unlike conventional methods, the proposed models work directly with the speech data without requiring a conversion to visual representations. Passricha et al. [7] discuss a CNN-based direct speech recognition model that learns a relevant representation of the speech signal in a data-driven way and computes the conditional probability for every phoneme class; it shows performance comparable to the MFCC-based traditional model. Senior et al. [8] proposed an LSTM-RNN architecture that outperforms standard LSTM networks and DNNs when trained with the parameters that yield the best results. Yi et al. [9] proposed an adaptation method for ASR on multi-accented Mandarin speech, constructing an ASR model using LSTM-RNN with the connectionist temporal classification loss function; by adding a regularization term to the training process, the authors obtained better outcomes without overfitting the model. Ghule et al. [10] developed a phonetic database of isolated Marathi words and an ASR system for it, and concluded that artificial neural networks gave the best results for constructing ASR. Shanthi and Chelpa [11] proposed an isolated-word speech recognition system for the Tamil language using the hidden Markov model (HMM) approach; they used the mel-frequency cepstral coefficient (MFCC) algorithm to extract the significant speech components and chose a triphone-based acoustic model for Tamil digits, yielding a good accuracy of about 90%. Radzikowski et al. [12] proposed a method to modify the accent of a non-native speaker so that it closely resembles the accent of a native speaker; they experimented with the spectrogram, a graphical representation of the speech signal, and obtained better outcomes with a CNN-based autoencoder.
3 Proposed Methodology and Design The principal goal of this paper is to come up with a better approach to constructing ASR for multi-accented Malayalam speech. This paper centers around proposing a word-based ASR for the Malayalam language utilizing deep learning algorithms. The authors have constructed a dataset using crowdsourcing from people hailing from different localities. Though the speech data recorded in the studio environment gives better results the authors of this paper preferred to collect speech data recorded in a
natural recording environment. The authors hereby discuss the outcomes of the experiments conducted using LSTM-RNN and DCNN on the accented speech. Different approaches were taken for developing two models with two different techniques throughout the experiment. We shall discuss the entire process of the experiment in detail in the coming sessions. We propose a comparative analysis of two different approaches for constructing accent-based ASR for the Malayalam language. Both approaches have yielded better outcomes when experimented with so many low-resourced languages. The performance outcome of an experiment is highly dependent on the nature of the dataset and the methodologies adopted to address the problem that varies with languages. The approach suitable for one language may not suit the other. It is by constantly experimenting with a language with different approaches that concludes a better approach for a particular language. This experiment used sound datasets that belong to 20 classes. The spoken words are collected using crowdsourcing methods recorded in the natural recording environment as discussed earlier. A corpus of 4000 data has been collected for conducting this experiment. Input from speakers of various age groups and different localities is considered for conducting this experiment. Speech signals are preprocessed to retrieve the features which thusly is utilized for preparing the model. The model is trained using the LSTM-RNN and CNN algorithms and mapped onto the word models. The test set is split arbitrarily from among the dataset which is preprocessed and the features extracted. These features are then used to build the models in both approaches. The features of the speech signals are then analyzed and compared against the target classes and weights get updated likewise during the training stage. The features of the test signals are fed to the networks, and the results are predicted accordingly.
3.1 The Proposed Methodology

1. Construct the dataset.
2. Extract the audio features using MFCC.
3. Extract the audio features using short-term Fourier transform (STFT).
4. Extract the audio features using mel spectrogram.
5. Construct accent-based ASR using LSTM-RNN.
6. Construct accent-based ASR using DCNN.
7. Model evaluation and prediction.
8. Comparative analysis.
9. Conclusion and future scope.
3.2 Dataset

Here, we constructed a dataset that contains utterances of words in the Malayalam language. The dataset was constructed in Kasaragod, Kannur, Kozhikode, Wayanad, and Malappuram, five districts in north Kerala whose spoken dialects deviate strongly from the original dialect. The samples were collected from all age groups to represent the signals in the dataset, and the speech corpus was built from 30 distinct native speakers belonging to different age groups. Multiple utterances of the words by different speakers (both male and female) were recorded, and every recording was sampled at a frequency of 16,000 Hz and converted into the .wav format. A speech corpus of about 1.25 h was constructed for this experiment (Table 1).

Kasaragod lies at the northern end of Kerala and shares its border with Karnataka, where people speak Kannada, so the Malayalam spoken by the natives has a strong Kannada influence; we collected 760 recordings in that accent. Kannur shares borders with Kasaragod on one end, Kozhikode on the other, and Karnataka on another, which again makes the dialect influenced by the Kannada language; we collected 760 samples from this locality. In Kozhikode, the language is less influenced than in the other two accents, and we collected 1090 samples from there. Wayanad shares borders with Karnataka and Tamil Nadu on two sides, so the dialect spoken here mixes the accent with the Tamil and Kannada languages; we collected 630 samples from this district. Finally, we collected 760 samples from the Malappuram district. Each of these districts also has many sub-dialects spoken in different localities. Table 2 contains the statistics of the speech recordings based on the different age groups; the dataset contains recordings from speech donors of all ages, with the majority collected from the 20–45 age group to represent clear and good-quality data. Table 3 shows the 20 random classes of isolated words used to build the dataset; multiple recordings of audio samples belonging to the same class were collected from different speakers.

Table 1 Statistics of data collected from different districts based on dialects

Dialects based on districts  Number of samples collected
Kasaragod                    760
Kannur                       760
Kozhikode                    1090
Malappuram                   760
Wayanad                      630
Total                        4000
Table 2 Statistics of the dataset based on age groups

Age-wise category  Size of the data collected
5–12               660
13–19              660
20–45              1500
46–65              690
66–85              490
Total              4000
Table 3 Example classes used in the experiment
4 Methodology for LSTM-RNN

The experiment is conducted with two different approaches. The first approach uses the LSTM-RNN methodology. The steps involved in the process are:
1. Feature extraction
2. Model building using LSTM-RNN
3. Result analysis
4.1 Feature Extraction Feature extraction is the most crucial step in developing machine learning models. Selecting the significant features for the experiment yields a better-trained model. Insignificant features if selected would highly affect the quality of the model generated. Here, in this experiment, we have adopted three different methodologies for feature extraction. The different approaches are 1. Mel-frequency cepstral coefficients (MFCC)
Fig. 1 Forty MFCC features extracted from the speech signal
2. Short-term Fourier transform (STFT)
3. Mel spectrogram

4.1.1 Feature Extraction Using MFCC
Forty prominent features have been extracted from each input speech signal using the MFCC algorithm. The MFCC algorithm applies the following steps to the input signal:
1. Pre-emphasis
2. Framing
3. Windowing
4. Discrete Fourier transform
5. Apply mel filter bank to obtain the mel spectrum
6. Apply log to obtain the log mel spectrum
7. Discrete cosine transformation
The speech signal flows through the above steps and the most prominent features or the speech coefficients that range from 13 to 40 are obtained. Here, in this experiment, we use 40 prominent frequency values from the signal (Fig. 1).
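A minimal sketch of extracting the 40 MFCCs with librosa is shown below; summarizing the per-frame coefficients by their mean over time is an assumption, since the paper does not state how the frame-level values were aggregated.

```python
import numpy as np
import librosa

def mfcc_features(wav_path: str, n_mfcc: int = 40) -> np.ndarray:
    # Load the recording at the 16 kHz sampling rate used for the corpus.
    signal, sr = librosa.load(wav_path, sr=16000)
    # librosa internally performs framing, windowing, the Fourier transform,
    # mel filtering, log compression, and the DCT described in the steps above.
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)   # shape: (40, n_frames)
    return mfcc.mean(axis=1)                                      # one 40-d vector per file

# vector = mfcc_features("recordings/kasaragod/word01_spk05.wav")  # hypothetical path
```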
4.1.2 Feature Extraction Using STFT
After extracting the 40 features from the speech signal using the MFCC, we used the short-term Fourier transform method to retrieve the amplitude of the frequency of the signal at any point in time. We have extracted 12 features that correspond to the amplitude of the signal for this experiment. Figure 2 shows the visualization of the twelve features extracted from the input speech signal that corresponds to the amplitude of the uttered word.
Fig. 2 Twelve features extracted from the speech signal using STFT
4.1.3 Feature Extraction Using Mel Spectrogram
After extracting the 52 features mentioned above, we extracted 128 features from each speech signal using the mel spectrogram method. The mel scale returns the features that closely correspond to the human hearing system. The human ear is sensitive to low frequencies rather than high frequencies where the loudness is perceived logarithmically. It can be accounted as a decibel scale. A total of 180 features are used in this experiment. Mel spectrogram is intended to use a logarithmic scale composed of mel scale and decibel scale to represent the amplitude and frequency of the speech signal. Figure 3 shows the mel spectrogram of a speech signal used in the experiment. The Y-axis is the mel scale instead of frequency and can be efficiently used in deep learning rather than a simple spectrogram. The entire feature set contains 180 features extracted for each speech signal. A comma-separated file has been constructed comprising an entry of these 180 features
Fig. 3 Features extracted using mel spectrogram and the total 180 speech signal features
Fig. 4 Unfolded RNN and the gated architecture
for each signal labeled appropriately. These are then encoded where each class of the speech has been enumerated at random.
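One plausible way to assemble the 180-dimensional feature vector (40 MFCCs, 12 STFT-derived values treated here as chroma bins, and 128 mel-spectrogram bands) and write the labeled comma-separated file is sketched below; the time-averaging of each feature and the exact librosa calls are assumptions rather than the authors' documented pipeline.

```python
import numpy as np
import pandas as pd
import librosa

def feature_row(wav_path: str, label: str) -> dict:
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40).mean(axis=1)            # 40 values
    chroma = librosa.feature.chroma_stft(y=y, sr=sr).mean(axis=1)              # 12 STFT-based values
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128).mean(axis=1)  # 128 mel bands
    features = np.concatenate([mfcc, chroma, mel])                             # 180 features in total
    row = {f"f{i}": v for i, v in enumerate(features)}
    row["label"] = label
    return row

# Illustrative usage: 'files' would be a list of (path, word_class) pairs, e.g.
# files = [("kasaragod/word01_spk03.wav", "word01"), ...]
# pd.DataFrame([feature_row(p, lbl) for p, lbl in files]).to_csv("features.csv", index=False)
```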
4.2 Model Building

The model is built using LSTM-RNN due to its capacity to forget unwanted information through the forget gate and to remember the information needed for future predictions. Since speech is sequential data, the LSTM gates can be used to remember the sequence of occurrences, which makes LSTM well suited to processing speech signals. A common LSTM is composed of a cell with three gates: an input, a forget, and an output gate. The cell can store information over arbitrary intervals of time, and the three gates regulate the flow of information through the cell. The cell takes X_0 as input and produces h_0 as output, which, along with X_1, is the input to the next step; the output of the previous step together with the input of the current step forms the input to the current step. This procedure is repeated at every step, which lets the RNN remember the sequence while building the model [13]. An unfolded RNN structure and the gated architecture are shown in Fig. 4. The sigmoid function decides which values to let through, and the tanh function determines the weightage of the values to be passed based on their relevance [14].
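A hedged Keras sketch of an LSTM classifier over the 180-dimensional feature vectors is given below. The number of LSTM units, the dropout rate, and the reshaping of each feature vector into a length-180 sequence are assumptions; the paper does not report these hyperparameters.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

NUM_CLASSES = 20
TIMESTEPS, FEATURES_PER_STEP = 180, 1   # treat the 180 features as a sequence of length 180

model = Sequential([
    LSTM(128, input_shape=(TIMESTEPS, FEATURES_PER_STEP)),
    Dropout(0.3),
    Dense(64, activation="relu"),
    Dense(NUM_CLASSES, activation="softmax"),   # one output per word class
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Illustrative training call on random placeholder data; X has shape (n_samples, 180, 1)
# and y holds integer class labels between 0 and 19.
X = np.random.rand(32, TIMESTEPS, FEATURES_PER_STEP).astype("float32")
y = np.random.randint(0, NUM_CLASSES, size=32)
model.fit(X, y, epochs=2, validation_split=0.2)
```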
5 Methodology for Deep CNN

The second approach in the experiment uses a deep CNN for accented speech recognition in the Malayalam language. The steps involved in the process are:
1. Construct a spectrogram dataset that corresponds to the speech signals.
2. Model construction using DCNN.
3. Result analysis.
The speech features are plotted as spectrograms and used as the input set for building the model. The model construction using DCNN involves the following steps:
1. Initialize the model
2. Add the CNN layers
3. Add the dense layers
4. Configure the learning process
5. Train the model
6. Evaluate the model by making predictions
The same dataset has been used for both studies; in this approach, the signals are plotted as spectrograms, so the speech data is represented as image data and speech processing can be accomplished with image processing methods. The model is initialized as a sequential model, to which the three kinds of CNN layers are added: convolutional layers, pooling layers, and a flattening layer. The input is resized to (224, 224, 3) and fed to the convolutional layer, producing an activation of size (222, 222, 32); max-pooling then downsamples the feature maps to a width of 111, a height of 111, and a depth of 32. This is fed to a convolutional layer with activation size (109, 109, 64), followed by max-pooling to (54, 54, 64) and a dropout layer to prevent overfitting. The output is fed to a convolutional layer of activation size (52, 52, 64), a max-pooling layer giving (26, 26, 64), and another dropout layer. The result is fed to a further convolutional layer with activation size (24, 24, 128), max-pooled to (12, 12, 128), and then flattened into a one-dimensional array of size 18,432. This is passed to a dense layer of 64 neurons followed by a dropout layer, and finally into a dense layer of 20 neurons corresponding to the 20 classes of data in our experiment. The model is trained for 4000 epochs with a total of 56,000 training steps on 4000 spectrograms as input; eighty percent of the input is used for training and the remaining 20% for testing.
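The activation sizes quoted above are consistent with a stack of 3 × 3 convolutions and 2 × 2 max-pooling applied to 224 × 224 × 3 spectrogram images (for example, 12 × 12 × 128 flattens to exactly 18,432 units). A hedged Keras reconstruction of that layer stack is sketched below; the dropout rates, activation functions, and optimizer are assumptions not stated in the paper.

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(224, 224, 3)),  # -> (222, 222, 32)
    MaxPooling2D((2, 2)),                                              # -> (111, 111, 32)
    Conv2D(64, (3, 3), activation="relu"),                             # -> (109, 109, 64)
    MaxPooling2D((2, 2)),                                              # -> (54, 54, 64)
    Dropout(0.25),
    Conv2D(64, (3, 3), activation="relu"),                             # -> (52, 52, 64)
    MaxPooling2D((2, 2)),                                              # -> (26, 26, 64)
    Dropout(0.25),
    Conv2D(128, (3, 3), activation="relu"),                            # -> (24, 24, 128)
    MaxPooling2D((2, 2)),                                              # -> (12, 12, 128)
    Flatten(),                                                         # -> 18,432 units
    Dense(64, activation="relu"),
    Dropout(0.5),
    Dense(20, activation="softmax"),                                   # 20 word classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```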
6 Result and Discussion

Two models were constructed using LSTM-RNN and CNN for this experiment. Both models were built using the 4000 speech samples collected across five districts in Kerala; the dataset was constructed in a natural recording environment. The main focus of this experiment was to identify a better approach to ASR for multi-accent speech in the Malayalam language. In both cases, the model was constructed using 80% of the data for training and the remaining 20% for testing, and the performance of the two models was then compared. The LSTM-RNN model produced a training accuracy of 95% over 98,000 steps; it used 1.25 h of speech data as input and took 15 h of training time to build. The training features, which are split randomly, are one-hot encoded and fed
Fig. 5 Total loss, accuracy, and validation versus computational steps
to the LSTM-RNN. The prominent width and height components of the input audio signals are extracted using MFCC, STFT, and mel spectrogram. This in turn is used for constructing the model which predicts the test features into any of the twenty classes of data mentioned above. The visualization of the accuracy, loss, and validation of the model across the training steps is shown below. Figure 5 is the visualization of the overall accuracy, loss, and validation in constructing the model versus computational steps. The loss was 2.5 at the beginning which has reduced to 0.24 toward the end of training the model at step 98,000. The faded line in the above graph is the original classification, whereas the darker line is obtained with a smoothing of 0.5 on TensorBoard. The model is constructed with a validation accuracy of 82% over 98,000 steps. A model has been constructed using CNN with 4000 epochs. The model was trained for 12 h of training hours. The model has been constructed with 4000 spectrograms where 3020 samples are used for training and the remaining 800 samples were used for testing. The train and test set were split randomly. The model resulted in 74% of train accuracy and 39% test accuracy. Figure 6 visualizes the test-train accuracy and test-train validation of the model constructed using CNN. The training loss is 9% and the validation loss is 17% over 56,000 steps. Given, are the results of recognition by both LSTM-RNN and CNN, using MFCC, STFT, and mel spectrogram feature values for model construction using LSTM-RNN and spectrogram images for model construction using CNN. The model constructed with LSTM obtained 95% of train accuracy and 82% test accuracy, with a validation loss of
Fig. 6 Training and testing accuracy, loss versus epochs
0.24% over 98,000 steps and 3000 epochs. The model constructed with CNN obtained 74% train accuracy and 39% test accuracy, with a validation loss of 17% and a train loss of 9%. Here, LSTM has shown better recognition accuracy. On evaluating the overall performance of both LSTM and CNN, it is evident that LSTM-RNN is far ahead of DCNN in performance for the multi-accented Malayalam dataset we have constructed.
7 Conclusion and Future Scope

We proposed two approaches to accented ASR for the Malayalam language. The first approach extracted the prominent and influential features from the speech signals and constructed a multi-dialect model using LSTM-RNN. The model yielded good results both on the test data with which it was validated and on live input data outside the dataset on which it was built. In the second approach, the features were extracted from spectrograms and the multi-dialect model was constructed using DCNN; this model also performed well on the different test data, but compared with the approach that used influential features, the spectrogram-based model performed worse. The major challenges we faced in this experiment were the construction of the dataset and its normalization; the limited availability of data was a major obstacle, since ASR needs a huge amount of data to yield better results. The authors plan to develop an accented speech dataset for continuous speech and to propose better approaches for building the ASR.
References 1. Teferra S, Tachbelie MY, Schulkz T (2020) Deep neural networks based automatic speech recognition for four Ethiopian languages. In: ICASSP 2020—2020 IEEE international conference on acoustics, speech and signal processing (ICASSP) 2. El-Moneim SA, Nassar MA, Dessouky MI, Ismail NA, El-Fishawy AS, Abd El-Samie FE (2020) Text-independent speaker recognition using LSTM-RNN and speech enhancement. Multimedia Tools Appl. https://doi.org/10.1007/s11042-019-08293-7 3. Palaz D, Magimai-Doss M, Collobert R (2015) Convolutional neural networks-based continuous speech recognition using raw speech signal. In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP). https://doi.org/10.1109/icassp.2015.717 8781 4. Sasikuttan A, James A, Mathews AP, Abhishek MP, Sebastian K (2020) Malayalam speech to text conversion. Int Res J Eng Technol (IRJET) 7(6) 5. Abdel-Hamid O, Mohamed AR, Jiang H, Deng L, Penn G, Yu D (2014) Convolutional neural networks for speech recognition. In IEEE/ACM transactions on audio, speech, and language processing, vol 22, no. 10, pp 1533–1545. https://doi.org/10.1109/TASLP.2014.2339736 6. Issa D, Fatih Demirci M, Yazici A (2020) Speech emotion recognition with deep convolutional neural networks. Biomed Signal Process Control 59:101894
7. Passricha V, Kumar Aggarwal R (2018) Convolutional neural networks for raw speech recognition. From Book-natural to artificial intelligence algorithms and applications. https://doi.org/ 10.5772/intechopen.80026 8. Sak H, Senior A, Beaufays F (2014) Long short-term memory recurrent neural network architectures for large scale acoustic modeling. 10.48550/arXiv.1402.1128 9. Yi J, Wen Z, Tao J et al (2018) CTC regularized model adaptation for improving LSTM RNN based multi-accent mandarin speech recognition. J Sign Process Syst 90:985–997 10. Ghule KR, Deshmukh RR (2015) Automatic speech recognition of Marathi isolated words using neural network. (IJCSIT) Int J Comput Sci Inf Technol 6(5):4296–4298 11. Shanthi T, Chelpa L (2014) Isolated word speech recognition system using Htk. Int J Comput Sci Eng Inf Technol Res (IJCSEITR) 4(2):81–86. ISSN(P): 2249-6831; ISSN(E): 2249-7943 12. Radzikowski K, Wang L, Yoshie O et al (2021) Accent modification for speech recognition of non-native speakers using neural style transfer. J Audio Speech Music Proc 2021:11 13. https://aditi-mittal.medium.com/understanding-rnn-and-lstm-f7cdf6dfc14e. 14. https://www.analyticsvidhya.com/blog/2021/03/introduction-to-long-short-term-memorylstm.
Effect of Feature Selection Techniques on Machine Learning-Based Prediction Models: A Case Study on DDoS Attack in IoT-Based Networks

Shavnam and Manu Sood
Abstract The Internet of Things (IoT) is one of the leading platforms which has developed at a rapid pace in the past few years. As a concept, it is a collection of different interconnected devices and things over the Internet. The IoT devices are constantly proliferating into our daily lives be it smartwatches, smart TVs, intelligent transport systems, security cameras, commercial activities, or entry gates to secured premises. The faster and seamless delivery of various services through the applications supporting these interconnected devices is responsible for the exponential growth in the number of IoT devices. This increase in the number of devices has given hackers and users with ulterior motives the opportunity to breach device security through attacks on IoT devices by using specific vulnerabilities to their advantage. Though, IoT devices are vulnerable to many attacks, one most common attack is the distributed denial-of-services (DDoS) attack. It is initiated from different servers with a motive to drain out the system resources disrupting the fulfillment of desired goals of the system and can cost a lot to the genuine stakeholders in the network. Machine learning (ML) is being used over the years to detect the anomaly and/or predict errors or results by using its power to learn patterns and behaviors. In this paper, we have used supervised ML techniques to train the machine for the detection of DDoS attacks and evaluate the performance of seven different models for this purpose. We have used the DDoS attack scoreboard dataset which we took from the Internet. The dataset contained 5923 rows and 21 features. Three different feature selection techniques have been deployed to study the significance of the selection of appropriate features for a dataset and its impact on the overall performances of the classifiers. Seven models based on seven different ML techniques were used to train the machine using the dataset and their performances have been compared based on seven evaluation parameters. It has been found that the model based on random forest algorithms produces the best performance for all the parameters as far as the
Shavnam (B) · M. Sood Department of Computer Science, Himachal Pradesh University, Shimla, India e-mail: [email protected] M. Sood e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_47
Keywords DDoS attacks · Classifiers · Machine learning · LDA · KNN · RF · NB · DT · SVM · Supervised learning
1 Introduction
The emergence of the Internet of Things (IoT) as an exponential technology has given rise to applications that are based on the extensive use of interconnected things. These things are the entities or objects supported by embedded technology for the purposes of (a) collection of data, often through sensors, (b) local processing, and (c) Internet-based communication. Their features, like the collection of data, sharing of data, low-cost implementation/management/maintenance, and convenient integration with different services/applications, have played a very significant role in their widespread usage. The number of interconnected devices has grown exponentially, so much so that it is soon going to cross 24 billion devices [1]. Whereas the number of devices of such a magnitude has made the adoption of IoT things into many aspects of the daily life of so many humans easier, with the support of faster communication and efficient delivery of services, it also has a significant bearing on the security of such interconnected devices. The security risks have made interconnected devices vulnerable to attacks. These attacks on IoT devices can cause harm not only to the quality of services and intended outcomes but to the supported services too, even leading to their disruption altogether. Different types of attacks can occur on an IoT device in a given environment, the most common of them being the distributed denial-of-service (DDoS) attack, the spoofing attack, and the man-in-the-middle (MITM) attack [2]. Table 1 highlights the threats and attacks associated with the networks of various IoT devices. The threats and attacks have been grouped into four categories: (a) hardware, (b) network infrastructure, (c) smart applications, and (d) attack surface. It is clear from the table that DDoS is one of the attacks that is preferred the most by various attackers. The DDoS attack is one of the attacks, on any type of network, wireless or wired, that finds favor with unscrupulous hackers/malevolent entities in the network. These attacks often target a single server through different sources, the goal mostly being to drain out the user resources and choke their availability to the intended user(s). DDoS attacks are dangerous, but they have become even more dangerous concerning IoT devices since the opportunities for attack have also increased multifold with IoT growing at a massive pace [3]. The DDoS attack is a subtype of the denial-of-service (DoS) attack which not only intends to disrupt the system resources but also affects the quality of services for the user(s) of an IoT setup. DDoS attacks not only affect individual devices or services but can equally affect big businesses by causing security breaches in IoT-based applications. Large-scale industries use a large variety of IoT devices in their infrastructure, devices such as security cameras,
Table 1 Threats and attacks associated with the IoT devices

Group | Type | Threat | Attacks
Hardware | RFID | DoS, spoofing | Eavesdropping
Hardware | Sensor node | DoS, DDoS, Sybil | Jamming, tampering, collision
Hardware | Bluetooth | Eavesdropping, DoS | Bluebugging, car whisper
Network infrastructure | Wireless | Rogue access point | DoS/DDoS, MITM
Network infrastructure | Wired | Extortion hack, manipulation of the data | Weak link, DoS/DDoS, malicious attacks
Smart application | Smart city | DoS, manipulation of data | DoS/DDoS on mobile apps
Smart application | Health care | Data theft and misuse, DoS | DoS/DDoS, internal and cyber attacks
Attack surface | 6LoRaWAN link layer protocol | Traffic flow, DoS | Hello flood, DoS/DDoS
Attack surface | ZigBee node | DoS, node energy consumption | DoS/DDoS, impersonating, and frame counter
thermostats, or fire alert systems are used mostly in every large-scale industry [4]. Individual devices such as smartwatches, smart TVs, or those found in smart homes can also be the victim of DDoS attacks. Compromising security can cause a plethora of problems both, for individuals as well as large businesses. A report from Cloudflare shows that DDoS attacks on the application layer have grown to 641% in the year 2021 [5]. The report also shows that the DDoS attacks often last under 4 h (the average time for the attack) and within this short period only the security of the devices can get violated completely. It is therefore important to prevent such an attack. But if somehow, the attack gets initiated due to the failure of preventive measures in place, it becomes crucial to detect and remove the attack with no or little harm caused to the resources and the quality of services. Many researchers have tackled the applicability of machine learning (ML) algorithms to find security breaches and different attacks on IoT networks [6]. Under such circumstances, machine learning (ML) approaches to train the machines for the detection or prediction [7] of DDoS attacks on IoT devices play quite a significant role. ML as technology is today dominating the data science world because of its ability to make the machines learn from past experiences helping the users to make better-informed decisions. It supports powerful algorithms that can be used to train the machines and then used for predictions, classifications, clustering, or imitating certain behaviors. ML uses known data in its feature space to find results for unlabeled data. Hence, a successful ML model can refer to its experiences and understandings to find outputs. The accuracy of such a model depends on the accuracy of its output
as well as on model training [8]. The learning here can be viewed from a statistical perspective that frames the data in the context of a hypothetical function (f). Out of the various categories of ML techniques, supervised learning ML techniques are more in use since these are fast and produce better results than other categories of ML. Various types of supervised learning techniques can further be used to perform classification and regression. Classification is used when the target class or target value is known. For example, a student could be a graduate or an undergraduate, tall or short, good or bad, or a student can be put into A, B, C, D, or E grades based on his aggregate marks in a test; such types of problems are labeled as classification problems, whereas regression is used when knowledge about the target value is not available. Problems such as stock market predictions and the rate of Nifty or Sensex are known as regression problems. Several supervised ML classification techniques can be used to predict DDoS attacks in an IoT environment. For making the machine learn from a supervised learning technique, an appropriate dataset holding labeled data from past experiences is the first prerequisite. The data in such datasets may not be proper for use as such, and this data needs to be cleaned and preprocessed to make it suitable for further processing. Also, it is important to clean and refine the data before applying any ML method to achieve high accuracy and speed up the learning process [9]. Further, as a dataset may contain many features which may not contribute positively to the learning process of the machine, there is a need for extracting, from the complete feature set, the subset of features which has a significant contribution toward the better performance of the classification techniques. So, an appropriate feature selection technique out of the many available techniques needs to be applied for such extractions. Feature selection is the process of choosing the optimal features from the dataset. The frequency of a feature is measured in training for every positive and negative class instance independently [10]. Appropriate feature selection is also very crucial for expecting the best results from any classification technique. In this paper, the authors have selected a dataset on the DDoS attack scoreboard from a website, cleaned and preprocessed it, and then, before classification, used three different feature selection techniques to study their significance in the performances of various supervised classification techniques. After feature selection, seven classifiers have been used for classification to evaluate the prediction performance of all the models based on these classifiers. The classifiers used are as follows: (1) support vector machine (SVM), (2) random forest (RF), (3) decision tree (DT), (4) Naïve Bayes (NB), (5) K-nearest neighbors (KNN), (6) linear discriminant analysis (LDA), and (7) logistic regression (LR). The dataset originally contained 5923 rows and 21 features, which have been filtered by three feature selection techniques, resulting in three different datasets. All the classifiers are used on the same number of instances of these three datasets, and the results have been evaluated. The standard classification metrics are used to analyze the performances of different models. The main purpose of the research paper is to analyze the effect of various feature selection techniques on the performances of classification models.
The main objectives of this research work include (a) to study the effect of various feature selection techniques on the performances of classification techniques and (b) to analyze how accurately different classification models perform on the given data to detect DDoS attacks in an IoT network environment. There are five sections in this research paper. Section 2 highlights the research methodology used for our research work. Section 3 gives a brief about the experimental setup (simulation process) and results, Sect. 4 discusses the limitations of the proposed study, and Sect. 5 presents the conclusion and future scope. The contributions of this research work are as follows: (a) The DDoS attack scoreboard dataset has been used for training and testing the prediction model. (b) Three different feature selection techniques, one each from three different categories of feature selection techniques (RFE—wrapper method, random feature importance—embedded method, and univariate statistical method—filter method), have been used separately to find the best optimal features on the dataset. (c) Prediction models based on seven different ML classifiers, namely LR, LDA, RF, DT, NB, KNN, and SVM, have been used for training the machine for detection of the DDoS attack. (d) The seven evaluation parameters, namely precision, recall, F1-score, accuracy, specificity, ROC, and AUC, have been used to evaluate and compare the performances of each of the seven models.
2 Research Methodology This section presents a brief about the overall methodology we have followed in this research work. The dataset that is used in this research is the DDoS attack scoreboard dataset, taken from the Mendeley website and it originally contained 21 features and 5923 rows. The features have been reduced using three feature selection techniques as mentioned below.
2.1 Experimental Setup (Simulation Process)
The Python language has been used for simulating the experimental setup that includes feature selection, classification, and evaluation of the performances of the classifier models. The main libraries used are Keras and Scikit-learn. We have used Google Colab for the implementation of the various algorithms. In the data pre-processing stage, the data was cleaned, rescaled, standardized, and normalized. The dataset which is used contained irrelevant features which were removed by the feature selection techniques. Feature selection helps in removing irrelevant features [11]. This technique helps in selecting those features which contribute most toward the output variables and helps to make the model more accurate and precise [11]. Three feature selection techniques (a wrapper method, an embedded method, and a filter method) are used in this research work, named recursive feature elimination (RFE), random feature importance, and
univariate statistical method [12]. Recursive feature elimination (RFE) is a feature selection method that selects features by recursively considering smaller and smaller sets of properties/features. Random feature importance is a feature selection technique that uses a random forest algorithm to select the important features; it is used as it generalizes better and provides good accuracy. The third feature selection technique used is the univariate statistical (SelectKBest) method using chi-square (chi2); this technique takes only one dependent variable, and the statistical test is applied based on the dependent variable. This technique, a filter method, uses the chi2 method for assigning weights to the features based on the correlations between the features. A study by Mathew has proposed that the accuracy of a model can be improved using RFE [13]. We used these methods to obtain the best optimal feature sets out of the original feature set. We have used RFE to remove the features which do not play a very high role in the prediction of the target class. Next, we used random feature importance, which assigned weights to the input features based on their importance. Lastly, we used the univariate statistical method using chi2 (SelectKBest). The purpose of using these three different feature selection techniques is to observe which technique performs best on data where a dependent variable or a target value is given. All the feature selection techniques gave different feature sets, and finally, the ten best features were selected in each feature set. After feature selection, the best features were retained for the evaluation, and the dataset was split in a 70:20:10 ratio: 70% of the data was reserved for training, 20% of the data was used for testing, and the remaining 10% of the data was reserved for cross-validation. After using the feature selection techniques, we used different classifiers on our datasets. A classifier is an algorithm that is used to map the data into some category for the purpose of classification [14]. We have used seven classification algorithms under supervised ML: logistic regression (LR), linear discriminant analysis (LDA), K-nearest neighbor (KNN), Naïve Bayes (NB), decision tree (DT), random forest (RF), and support vector machine (SVM). The experimentation done in this research was conducted to observe which model is most accurate in detecting the DDoS attack. LR is used for predicting a binary dependent variable [15]; LDA uses Fisher's discriminant to reduce the dimensionality of the features and is also used for feature selection; the KNN classifier is one of the most used classifiers in pattern recognition. Naïve Bayes is used for finding out the dissimilarity of the features; it finds out the variance between the features and then gives a prediction [16]. The decision tree is used for classification purposes; a decision tree sorts the instances of the data on their feature values [17]. Random forest is vastly used in the field of machine learning and is considered a type of ensemble method that employs several decision trees during training [18]. Finally, we used SVM, which divides the different classes on the basis of a decision boundary [19]. For the evaluation process, the confusion matrix was used for all the seven classifiers on all the feature sets.
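To make this setup concrete, the following is a minimal sketch of the three feature selection techniques and the 70:20:10 split described above, using scikit-learn. The file name ddos_scoreboard.csv, the label column name, and k = 10 are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of the feature selection + classification pipeline (not the authors' exact code).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

data = pd.read_csv("ddos_scoreboard.csv")               # assumed file name
X = MinMaxScaler().fit_transform(data.drop(columns=["label"]))  # rescaling also keeps chi2 inputs non-negative
y = data["label"]                                        # assumed target column

# 70% train, 20% test, 10% held out for cross-validation
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.70, random_state=42)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, train_size=2/3, random_state=42)

k = 10  # ten best features retained in each feature set

# (1) RFE -- wrapper method
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=k).fit(X_train, y_train)
set_rfe = rfe.get_support(indices=True)

# (2) Random (forest) feature importance -- embedded method
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
set_imp = rf.feature_importances_.argsort()[::-1][:k]

# (3) Univariate statistical method (SelectKBest with chi-square) -- filter method
skb = SelectKBest(chi2, k=k).fit(X_train, y_train)
set_chi = skb.get_support(indices=True)

# Each feature subset is then used to train the seven classifiers (LR, LDA, KNN, NB, DT, RF, SVM).
clf = RandomForestClassifier(random_state=42).fit(X_train[:, set_chi], y_train)
print("test accuracy:", clf.score(X_test[:, set_chi], y_test))
print("validation accuracy:", clf.score(X_val[:, set_chi], y_val))
```

The same loop over feature subsets and classifiers yields the seven models per dataset that are compared in the next section.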
All the classifiers are used in our experiment, and the simulation results for every classifier are calculated using the confusion matrix, also known as
the error matrix [20]. A confusion matrix is a classification metric used in supervised ML classification; for binary classification it is an (n × n) matrix with n = 2. The matrix contains the following entities: (1) TP or True Positive, (2) FP or False Positive, (3) FN or False Negative, and (4) TN or True Negative. These entities are used for calculating the parameters of the model's performance; the parameters selected are precision, recall, F1-score, specificity, accuracy, receiver operating characteristic (ROC), and area under the curve (AUC). The experiments in this research work are conducted to find out the model which is most accurate in detecting the DDoS attack when trained with an appropriate dataset.
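As a worked illustration of how these parameters follow from the confusion matrix entities, the short sketch below derives precision, recall, F1-score, specificity, and accuracy from TP, FP, FN, and TN using scikit-learn; the example labels are made up for demonstration only.

```python
# Illustrative only: deriving the evaluation parameters from a 2x2 confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical ground truth (1 = DDoS, 0 = benign)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # hypothetical model output

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision   = tp / (tp + fp)
recall      = tp / (tp + fn)                 # also called sensitivity / detection rate
f1_score    = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp)
accuracy    = (tp + tn) / (tp + tn + fp + fn)

print(precision, recall, f1_score, specificity, accuracy)
```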
3 Results and Analysis
This section highlights the results obtained through various simulation experiments for the evaluation parameters. The results are presented in six tables and three illustrative diagrams. Table 2 represents the results obtained from feature set 1 (with RFE), Table 4 represents the results obtained from the feature set 2 (with random feature importance), and Table 6 represents the results obtained from feature set 3 (with the univariate statistical method). Table 3 represents the cross-validation accuracies of all the classifiers for the feature set 1, and Table 5 represents the cross-validation accuracies of all the classifiers used in feature set 2. Table 7 represents the cross-validation accuracies of all the seven classifiers from feature set 3. It is observed from Table 2 that experimental setup 1 uses ten features in dataset 1 (with RFE) in order to evaluate the performance of each of the seven prediction models. The model based on the RF classifier shows the best performance out of all prediction models as in its computations, accuracy is 99.9%, precision is 99.3%,

Table 2 Results of seven prediction models on dataset 1 (with RFE)

Classifiers | Precision | Recall | F1-score | Accuracy | Specificity | AUC | ROC
LR | 0.416 | 0.720 | – | – | – | – | –
LDA | 0.429 | 0.703 | – | – | – | – | –
KNN | 0.872 | 0.890 | – | – | – | – | –
NB | 0.288 | 0.430 | – | – | – | – | –
DT | 0.979 | – | – | – | – | – | –
RF | 0.993 | 1.00 | 0.994 | 0.999 | 0.999 | 0.999 | 0.999
SVM | 0.436 | 0.802 | 0.564 | 0.915 | 0.923 | 0.862 | 0.937
Table 3 Cross-validation accuracies of seven prediction models on dataset 1 (with RFE)

Classifiers | LR | LDA | KNN | NB | DT | RF | SVM
Accuracy | 0.909 | 0.908 | 0.976 | 0.864 | 0.993 | 0.999 | 0.917
Table 4 Results of seven prediction models on dataset 2 (with random feature importance)

Classifiers | Precision | Recall | F1 support | Accuracy | Specificity | ROC | AUC
LR | 0.281 | 0.711 | 0.402 | 0.895 | 0.904 | 0.850 | 0.805
LDA | 0.38 | 0.651 | 0.479 | 0.897 | 0.916 | 0.836 | 0.783
KNN | 0.989 | 0.993 | 0.984 | 0.997 | 0.998 | 0.999 | 0.995
NB | 0.315 | 0.431 | 0.359 | 0.861 | 0.861 | 0.878 | 0.645
DT | 0.993 | 0.993 | 0.996 | 0.998 | 0.999 | 0.999 | 0.996
RF | 0.993 | 1.00 | 0.994 | 0.999 | 0.999 | 0.999 | 0.999
SVM | 0.633 | 0.865 | 0.726 | 0.943 | 0.951 | 0.975 | 0.908
Table 5 Cross-validation accuracies of seven prediction models on dataset 2 (with random feature importance)

Classifiers | LR | LDA | KNN | NB | DT | RF | SVM
Accuracy | 0.886 | 0.902 | 0.998 | 0.85 | 0.999 | 0.999 | 0.936
Table 6 Results of seven prediction models on dataset 3 (with the univariate statistical method)

Classifiers | Precision | Recall | F1 support | Accuracy | Specificity | AUC | ROC
LR | 0.825 | 0.854 | 0.832 | 0.956 | 0.975 | 0.914 | 0.952
LDA | 0.825 | 0.831 | 0.824 | 0.956 | 0.974 | 0.902 | 0.942
KNN | 0.973 | 0.953 | 0.959 | 0.990 | 0.996 | 0.974 | 0.998
NB | 0.013 | 0.076 | 0.017 | 0.875 | 0.875 | 0.475 | 0.750
DT | 0.973 | 0.986 | 0.974 | 0.995 | 0.996 | 0.991 | 0.993
RF | 0.986 | 0.986 | 0.983 | 0.996 | 0.998 | 0.992 | 0.999
SVM | 0.825 | 0.891 | 0.840 | 0.975 | 0.975 | 0.933 | 0.991
Table 7 Cross-validation accuracies of seven prediction models on dataset 3 (with the univariate statistical method)

Classifiers | LR | LDA | KNN | NB | DT | RF | SVM
Accuracy | 0.963 | 0.959 | 0.997 | 0.868 | 0.999 | 0.999 | 0.966
recall is 100%, specificity is 99.9%, and ROC is 99.9%. This table enlists the values of all the metrics employed for all the prediction models. It also concludes that the prediction models based on RF and DT classifiers are exhibiting better performances in comparison with the other five models. Also, RF seems to be performing the best. The ROC (AUC for all possible threshold values) is also computed to highlight the performances. It is concluded that the ROC of the RF is also found to be the best.
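Since the ROC and AUC values reported in these tables summarize classifier behavior over all possible decision thresholds, the following hedged sketch shows how such values can be computed with scikit-learn; the probability scores and labels are placeholders, not the study's data.

```python
# Illustrative ROC/AUC computation for a binary DDoS classifier (placeholder data).
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]                    # hypothetical labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.3, 0.7]   # predicted probability of the attack class

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", roc_auc_score(y_true, y_score))
# Each (fpr, tpr) pair corresponds to one decision threshold along the ROC curve.
```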
Fig. 1 Comparative performances of seven prediction models on dataset 1 (with RFE)
3.1 Cross-Validation Results for the Dataset 1
The results are cross-validated employing the 10% data kept separately for the cross-validation test in order to confirm that the performance of the models does not change with the unseen data post-training. Table 3 shows these results for all seven prediction models. Figure 1 shows the diagram of values of all the metrics for all the prediction models for the purpose of comparison. It is inferred from Fig. 1 and Tables 2 and 3 that RF performs better than other models, and the results presented in Tables 2 and 3 can be seen supporting each other. It is observed from Table 4 that experimental setup 2 uses ten features in dataset 2 (with random feature importance) in order to evaluate the performance of each of the seven prediction models. The model based on the RF classifier shows the best performance out of all prediction models as in its computations, accuracy is 99.9%, precision is 99.3%, recall is 100%, specificity is 99.9%, and ROC is 99.9%. This table enlists the values of all the metrics employed for all the prediction models. It also concludes that the prediction models based on RF, DT, and KNN classifiers are exhibiting better performances in comparison with the other four models.
3.2 Cross-Validation Accuracy for the Dataset 2
Also, RF seems to be performing the best. The ROC (AUC for all possible threshold values) is also computed to highlight the performances. It is concluded that the ROC of the RF is also found to be the best. The results are cross-validated employing the 10% data kept separately for the cross-validation test in order to confirm that the performances of the models do not change with the unseen data post-training. Table 5 shows these results for all seven prediction models. Figure 2 shows the diagram of values of all the metrics for all the prediction models for the purpose of comparison. It is inferred from Fig. 2 and Tables 4 and 5 that RF performs better than other models, and the results presented in Tables 4 and 5 can be seen supporting each other.
Fig. 2 Comparative performances of seven prediction models on dataset 2 (with random feature importance)
It is observed from Table 6 that experimental setup 3 uses ten features in dataset 3 (with the univariate statistical method) to evaluate the performance of each of the seven prediction models. The model based on the RF classifier shows the best performance out of all prediction models as in its computations, accuracy is 99.6%, precision is 98.6%, recall is 98.6%, specificity is 99.8%, and ROC is 99.9%. This table enlists the values of all the metrics employed for all the prediction models. It also concludes that the prediction models based on RF, DT, and KNN classifiers are exhibiting better performances in comparison with the other four models.
3.3 Cross-Validation Results on Dataset 3 Also, RF seems to be performing the best. The ROC (AUC for all possible threshold values) is also computed to highlight the performances. It is concluded that the ROC of the RF is also found to be the best. The results are cross-validated employing the 10% data kept separately for the cross-validation test in order to confirm that the performances of the models do not change with the unseen data post-training. Table 7 shows these results for all seven prediction models. Figure 3 shows the diagram of values of all the metrics for all the prediction models for the purpose of comparison. It is inferred from Fig. 3, Tables 6, and 7 that RF performs better than other models, and the results presented in Tables 6 and 7 can be seen supporting each other.
Fig. 3 Comparative performances of seven prediction models on dataset 3 (with the univariate statistical method)
3.4 Analysis
Table 2 provides an insight into the performance of each of the seven models on dataset 1 for all seven parameters. It is observed from Table 2 that random forest and decision tree give the best accuracy, while random forest performs best on all the parameters on dataset 1. It can be observed from Table 4 that the random forest gave the best accuracy of 99.9% and also performed best on all other parameters on dataset 2. Table 6 shows that random forest and KNN gave the best accuracies on dataset 3, while random forest performed best on all the seven parameters. Validation accuracies from Table 3 show that random forest has the highest validation accuracy of 99.9% for dataset 1, and results from Table 5 show that random forest has the highest validation accuracy on dataset 2. Random forest and decision tree have shown the best validation accuracy in Table 7 on dataset 3. From the results, it can be concluded that random forest performs best with all the feature selection techniques among all the classifier models.
4 Limitations of the Proposed Study
This study focuses on the effect of feature selection techniques on a machine learning-based prediction model. Data is like fuel for machine learning; only if the data is reasonable can the models be expected to perform better. The dataset used in this study was a medium-sized dataset because only a very few datasets on DDoS attacks
are available. It becomes difficult to get a large-sized dataset for a DDoS attack. Imbalanced data can also affect some parameters for evaluation such as precision and recall, and hence, it is important to have a large-sized balanced dataset. Besides, there are some other feature selection techniques other than the three used for the experimentation here in this work and these have not been explored as of now.
5 Conclusion and Future Work Using the classification techniques of supervised ML for the binary classification of DDoS attacks without using an appropriate feature selection technique can result in the below-par performance of these classification models. The feature selection techniques if chosen appropriately have a direct impact on the performance of any supervised ML classifier. This work evaluates the significance of three feature selection techniques on a dataset and presents a comparative study on the final performances of classification models. A dataset downloaded from the Mendeley website that includes the DDoS attack scoreboard data from various IoT devices has been used for evaluating this comparison. Data cleaning and pre-processing have been done in the initial stages of the data preparation. After pre-processing, we choose three different feature selection techniques which are used on the original dataset to filter out the irrelevant features, all the three feature selection techniques returned three different sets of features. All in all, seven classifiers of supervised machine learning are used for checking the performance of the models and seven standard parameters have been chosen for evaluating these performances. It is concluded that random forest gives the best results in all the datasets corresponding to all the feature selection techniques with higher accuracy and ROC as it outperforms all other classifiers. In addition, it can also be seen that the RFE method for feature selection has been able to provide the best performance for the RF-based model. Out of all the feature selection techniques, the univariate statistical method (filter method) has outperformed all other feature selection techniques. The data contained a dependent variable (target value) which may have produced the outcomes which are gathered after experiments, as the univariate statistical method works on a single dependent variable. In the future, more feature selection techniques can be explored for even better results.
References 1. Swan M (2012) Sensor mania! The internet of things, wearable computing, objective metrics, and the quantified self 2.0. J Sens Actuator Netw 1(3):217–253 2. Xiao L, Wan X, Lu X, Zhang Y, Wu D (2018) IoT security techniques based on machine learning: how do IoT devices use AI to enhance security? IEEE Signal Process Mag 35(5):41–49 3. Vandana TS, Ravi KS (2018) A survey overview: on wireless body area network and its various applications. Int J Eng Technol 7(2.7):936
4. Petrenko A, Petrenko S, Makoveichuk K, Chetyrbok P (2018) The IIoT/IoT device control model is based on narrow-band IoT (NB-IoT). In: 2018 IEEE conference of Russian young researchers in electrical and electronic engineering (EIConRus) 5. [Online]. Available at: https://blog.cloudflare.com/ddos-attack-trends-for-2022-q1/#:~:text= In%20the%20last%20quarter%2C%202021,YoY%20and%2052%25%20decrease%20QoQ. Accessed 2 May 2022 6. Liu J, Kantarci B, Adams C (2020) Machine learning-driven intrusion detection for ContikiNG-based IoT networks exposed to the NSL-KDD dataset. In: Proceedings of the 2nd ACM workshop on wireless security and machine learning, pp 25–30 7. Ray S (2019) A quick review of machine learning algorithms. In: 2019 international conference on machine learning, big data, cloud, and parallel computing (COMITCon). IEEE, pp 35–39 8. Aldahiri A, Alrashed B, Hussain W (2021) Trends in using IoT with machine learning in health prediction systems. Forecasting 3(1):181–206 9. Alsaedi A, Moustafa N, Tari Z, Mahmood A, Anwar A (2020) TON_IoT telemetry dataset: a new generation dataset of IoT and IIoT for data-driven intrusion detection systems. IEEE Access 8:165130–165150 10. Gad AR, Nashat AA, Barkat TM (2021) Intrusion detection system using machine learning for vehicular ad hoc networks based on ToN-IoT dataset. IEEE Access 9:142206–142217 11. Sonkhla D, Sood M (2019) Performance examination and feature selection on Sybil user data using recursive feature elimination. Int J Innov Technol Exploring Eng 8(9S4):48–56 12. Roy P, Sood M (2020) Implementation of ensemble-based prediction model for detecting Sybil accounts in an OSN. Adv Intell Syst Comput 709–723 13. Mathew TE, Kumar KSA (2020) A logistic regression based hybrid model for breast cancer classification. Indian J Comput Sci Eng 11(6):899–906 14. Bindra N, Sood M (2018) Data pre-processing techniques for boosting performance in network traffic classification. In: Proceeding of the first international conference on computational intelligence and data analytics, ICCIDA-2018, 26–27 Oct 2018. Springer CCIS series, Gandhi Institute for Technology (GIFT), Bhubaneswar, Odisha, India 15. Kurt I, Ture M, Kurum A (2008) Comparing performances of logistic regression, classification and regression tree, and neural networks for predicting coronary artery disease. Expert Syst Appl 34(1):366–374 16. Nasteski V (2017) An overview of the supervised machine learning methods. Horizons B 4:51–62 17. Kotsiantis S, Zaharakis I, Pintelas P (2006) Machine learning: a review of classification and combining techniques. Artif Intell Rev 26(3):159–190 18. Svetnik V, Liaw A, Tong C, Culberson J, Sheridan R, Feuston B (2003) Random forest: a classification and regression tool for compound classification and QSAR modeling. J Chem Inf Comput Sci 43(6):1947–1958 19. Pavlidis P, Wapinski I, Noble W (2004) Support vector machine classification on the web. Bioinformatics 20(4):586–587 20. Cresci S, Di Pietro R, Petrocchi M, Spognardi A, Tesconi M (2015) Fame for sale: efficient detection of fake Twitter followers. Decis Support Syst 80:56–71
Optimal Feature Selection of Web Log Data Using Optimization Techniques Meena Siwach and Suman Mann
Abstract In today’s era, more than 60% peoples are web log data. But most of the web log data is unsecured with different attacks. In this research, a multi-stage filter is proposed, contingent on the distribution and analysis of various types of network assaults in the KDD and real web log datasets. An extended GOA algorithm with a decision tree algorithm is used in the first stage of the filter to detect frequent attacks, and an enhanced GOA algorithm with a genetic algorithm is used in the second stage to detect moderate attacks. An upgraded GOA algorithm using Naive Bayes as a base learner has been utilized in last step of the filtration to detect the rare attacks. Using the KDDCup99 dataset, we can see that this design has the highest detecting speed with the least false alarming rates. Using the GOA technique, we describe a new approach to the selection of features and KDD dataset classification for intrusion detection. As a primary goal, it is to reduce the number of features used in training data for intrusion classification. Features are selected and eliminated in supervised learning in order to improve classification accuracy by focusing on the most significant input training features and eliminating those that are less important. A number of input feature subset required to train KDD 99 datasets are used as classifier for different experiment. Keywords Intrusion detection · Ensemble learning · GA algorithm · Features selection · GAO algorithm
M. Siwach (B) Guru Gobind Singh Indraprastha University, Delhi 110078, India e-mail: [email protected] M. Siwach · S. Mann Department of Information Technology, Maharaja Surajmal Institute of Technology, Delhi 110078, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_48
1 Introduction According to the National Vulnerability Database, the web-based program has 18,500 vulnerabilities, including XSS, SQL, and buffer overflow. Hackers can take advantage of these flaws to launch a variety of attacks against web apps. As a result, assaults like brute force, XSS, SQL, and others must be detected intelligently. Attacks on web servers, network devices, or the network as a whole are all too easy for hackers to launch once they get past the firewall. Network security is enhanced by the use of an Intrusion Detection System (IDS) [1] that helps prevent intrusions. Host-based IDSs and network-based IDSs are the most common types of IDS. Based on attack detection mechanisms, IDSs are further divided into abuse and anomaly categories. A misuse IDS has a set of criteria for detecting attacks. The anomalous IDS verified the incoming n/w traffic’s behavior profiles against the stored behavior profile and generates intrusion alarms. It is possible for anomaly-based IDS to identify new attacks, but their false alarm rate is considerable. It is common in Internet security studies to talk about “anomalies.” In web detection, log data analysis is employed. When a system is running, log files can expose a great deal of information, and they can be used to trace the vast majority of assaults. Log systems, on the other hand, generate a large volume of data, and it is possible that important details could be omitted in the chaos. Furthermore, due to the everchanging nature of assaults and hacking techniques, gathering anomaly data has become increasingly complex, leading to the current problem that manual log file analysis is inadequate to meet log testing standards. Conventional intrusion detection methods, on the other hand, require operators or programmers to manually remove the attack features and use keyword and rule matching to identify typical attack patterns. Unidentified attacks cannot be detected using the normal approach, and this leads to a loss of data. In this case, the traditional method of collecting suspicious log entries from these systems using keyword search has become ineffective, which may result in a large number of false positives, including information unrelated to actual failure. Manual testing will become much more difficult as a result of this. As an outcome, automated log analysis technologies are critical for anomaly detection. Anomaly detection based on logs has been extensively researched over the previous few decades. However, we have discovered that there is a disconnect among academic research and industry practice. On the one hand, because there is presently no systematic evaluation of the issue, many developers are ignorant of the methods for detecting state-of-theart technology. To gain a complete picture of current object-tracking approaches, they must read a substantial body of literature. This is a time-consuming operation, but it does not ensure that the best way will be discovered, because each study project is generally focused on providing a thorough approach to a certain target system. The challenge may be increased if developers lack the machine learning background knowledge necessary to comprehend these approaches. The main motive of the research work is to detect different attacks in web log data.
The organization of the rest of the paper is as follows. The web log data literature review is discussed in Sect. 2. Section 3 discusses several attacks on web log datasets. The pseudo code for the suggested algorithm is discussed in Sect. 4. The method's execution and analysis are discussed in detail in Sect. 5. Section 6 concludes with the future scope.
2 Related Work Intrusion detection system filter-based feature selection techniques are proposed by Hu et al. [2]. Features are selected using Information Gain (IG), Correlation (CR), and Gain Ratio (GR). A combination of filter feature selection techniques is used to choose the best features for use in the system. DoS, probe, R2L, and U2R assaults are all detected using the KDD Cup dataset. When compared to Naive Bayes and Multilayer Perceptron, the system’s classification accuracy using K-Nearest Neighbor is the greatest at 98.9%. An ensemble core vector machine technique with feature selection in network IDS is proposed by Amor et al. [3]. For feature reduction, the system employs chi-square tests and weighted functions. Using the concept of a minimum enclosing ball, the core vector machine technique predicts attacks. On the KDD Cup’99 dataset, this technique detects DoS with 99.05% accuracy while using fewer characteristics. For intrusion detection, Farid et al. [4] present an ensemble classifier and feature reduction method. To choose the best features, the system makes advantage of both information gain and PCA feature selection. It is proposed by Natesan et al. [5] that ANN classifiers can be used in intrusion detection to identify features that are less significant. With the help of information gain and correlation feature reduction methods, the system obtains reduced features based on the ranking of features. Union and intersection procedures on the subset created by IG and CR are used to reduce the number of features from 25 to 15. Feature reduction is compared to no feature reduction on the KDD Cup’99 dataset. An intrusion detection system with a better GOA classifier is proposed by Xiang et al. [6]. For feature reduction, R Studio’s PCA and ensemble feature selection packages are used in conjunction with R. The CICIDS 2017 dataset is used to test the system’s output of 25 reduced features. The system’s accuracy is improved to 81.83% by reducing the number of characteristics in GOA. Gupta et al. [7] offers a mixed machine learning approach for detecting cyber intrusions. KNN, C4.5, MLP, SVM, Linear Discriminant Analysis, and a hybrid machine learning model based on these techniques are used in the system (LDA). Testing is done using the KDDcup99 dataset. To detect intrusions, Khor et al. [8] employs an enhanced PCA and Probabilistic Neural Network (PNN). An evaluation of the system using the KDDcup99 dataset and a comparison with the standard PCA approach are also performed. In terms of accuracy, the system outperforms the standard PCA.
Natesan et al. [9] propose an intrusion detection method based on hybrid feature reduction. To reduce the number of features, the system makes use of kernel PCA and binary gray wolf optimization. SVM is used to test the system on the KDDcup99 dataset. The probe attack had the highest accuracy, with a score of 96.821%. Taking inspiration from the relevant literature, we present an ensemble feature reduction strategy.
3 Attacks in Web Log Data Research on computer network intrusion detection systems is being evaluated by MIT Lincoln Laboratory with data collected and disseminated by DARPA and the Air Force Research Laboratory (AFRL). Data from the KDDCup99 dataset are derived from the DARPA benchmark [10]. KDDCup99 is a four-gigabyte dataset of compressing binary TCP dumped data culled from last few weeks of traffic networks and broken down into around 5 million recording, each containing about 100 bytes of information on the connections that were made. The two weeks of test data include roughly two million records of connections. To differentiate between a normal and an attack connection, every KDD training connecting record has 41 attributes and is designated as such [10]. The KDDCup’99 training set has 494,020 records, whereas the KDDCup’99 test set has 311,029 records. As a result of this categorization of the dataset’s numerous attack types, the detection rate of comparable attacks can be improved. Among the test set’s 38 assault types, 14 are brand new. The training set comprises 24 of these. Dos, Probes, Remote to Local (R2L), and User to Root (U2R) attacks are all included in the dataset (U2R). A breakdown of the assaults detected on the training and test sets for the KDDCup’99 can be seen in Table 1. Table 1 Attacks in KDD dataset S. No. Type of attack Attacks in KDDCupp’99 training set
Additional attacks in KDD test set
1
DOS
Back, Neptune, smurf, teardrop, land, pod
Apache2, mail bomb, process table
2
Probe
Satan, portsweep, upsweep, Nmap
Mscan, saint
3
R2L
Warezmaster, Warezclient, Send email, named, SnmpGet ftpwrite, guess password, IMAP, attack, snmpguess, xclock, xsnoop, multihop, phf, spy worm
3
U2R
Rootkit, buffer overflow, load module, Perl
Httptunnel, ps, sql attack, xterm
Fig. 1 Pseudo codes of the GA algorithm
4 Proposed Algorithm 4.1 Genetic Algorithms It is the crossover and mutation operators that are utilized to preserve population diversity and avoid localized optimum in genetic optimization. To make matters worse, as populations evolve, crossover and mutation probabilities remain constant, delaying algorithm convergence until much later in the process, which in turn leads to the lengthy training time of GOA. Since the crossover and mutation probabilities of GA are influenced by the fitness value of the population, this paper’s strategy develops a population that speeds up search in early evolution and increases convergence in later evolution by adjusting these probabilities. (A) Selection Operators. (B) Optimized Crossover Probability. (C) Optimized Mutation Probability (Fig. 1).
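To illustrate the idea of fitness-adaptive crossover and mutation probabilities described above, the following is a minimal sketch of such a GA over binary feature masks. The fitness function, probability bounds, and population settings are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a GA with fitness-adaptive crossover/mutation probabilities.
import numpy as np

rng = np.random.default_rng(0)

def fitness(mask):
    # Placeholder fitness: favors smaller feature subsets; a real fitness would be
    # the detection accuracy of a classifier trained on the selected features.
    return 1.0 / (1.0 + mask.sum())

def adaptive_prob(f_ind, f_avg, f_max, p_low, p_high):
    # Fitter-than-average individuals get a lower probability (they are preserved),
    # below-average individuals get the higher probability (they are perturbed more).
    if f_ind >= f_avg and f_max > f_avg:
        return p_low + (p_high - p_low) * (f_max - f_ind) / (f_max - f_avg)
    return p_high

n_features, pop_size, generations = 41, 20, 30
pop = rng.integers(0, 2, size=(pop_size, n_features))

for _ in range(generations):
    fits = np.array([fitness(ind) for ind in pop])
    # Selection: fitness-proportional (roulette wheel)
    parents = pop[rng.choice(pop_size, size=pop_size, p=fits / fits.sum())]
    p_fits = np.array([fitness(ind) for ind in parents])
    f_avg, f_max = p_fits.mean(), p_fits.max()
    children = parents.copy()
    # Adaptive single-point crossover
    for i in range(0, pop_size - 1, 2):
        pc = adaptive_prob(max(p_fits[i], p_fits[i + 1]), f_avg, f_max, 0.6, 0.9)
        if rng.random() < pc:
            cut = rng.integers(1, n_features)
            children[i, cut:] = parents[i + 1, cut:]
            children[i + 1, cut:] = parents[i, cut:]
    # Adaptive bit-flip mutation
    for i in range(pop_size):
        pm = adaptive_prob(p_fits[i], f_avg, f_max, 0.01, 0.10)
        flip = rng.random(n_features) < pm
        children[i, flip] ^= 1
    pop = children

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected feature indices:", np.flatnonzero(best))
```

Adapting the probabilities to the population's fitness, as sketched here, lets the search move aggressively early on and converge more gently later, which is the motivation given above.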
4.2 Grasshopper Optimization Algorithm (GOA) Grasshoppers are little, arachnid-like creatures. They are regarded as a pest because of the damage they cause to crops and farming. Pseudo code depicts the life cycle of
a grasshopper. However, in the wild, grasshoppers form one of the largest swarms known to man [11] even though they are typically seen alone. The swarm's size could be comparable to that of an entire continent, which would be a disaster for farmers. Swarming behavior is documented in both the nymph and adult stages of the grasshopper swarm [12]. There are millions of nymph grasshoppers that move like rolling cylinders when they jump. They consume nearly all of the plant life that gets in their way. Adults who exhibit this behavior form a flying swarm in the air. They travel long distances in this manner. The larval swarm is characterized by the grasshoppers' sluggish pace and short steps. Adult swarms, on the other hand, travel quickly and over great distances. Grasshopper swarms are notable for their constant search for new food sources. There are two distinct modes of search that can be found in nature-inspired algorithms: exploration and exploitation. It is recommended in exploration to move quickly, whereas it is more common in exploitation for search agents to proceed more cautiously. Grasshoppers naturally accomplish these two tasks, as well as target searching. As a result, if we can mathematically represent this behavior, we can create a new algorithm that takes inspiration from nature. In order to model grasshopper swarming behavior, the following mathematical model was used [13]:

X_i = S_i + G_i + A_i

where X_i is the grasshopper's position, S_i represents its social interaction, G_i is the gravitational force acting on it, and A_i is its advection in the wind. To introduce random behavior, the equation can be written as

X_i = r1 * S_i + r2 * G_i + r3 * A_i

where r1, r2, and r3 are random numbers in [0, 1] (Figs. 2 and 3).
Fig. 2 Pseudo codes of the GOA algorithm
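The following is a minimal numerical sketch of the swarm update X_i = r1*S_i + r2*G_i + r3*A_i described above. The social-interaction function s(r), the decreasing coefficient c, the sphere objective, and all parameter values are common GOA-style choices assumed here for illustration, not taken from the paper; the update is deliberately simplified.

```python
# Simplified grasshopper-style position update (illustrative sketch only).
import numpy as np

rng = np.random.default_rng(1)

def s(r, f=0.5, l=1.5):
    # Social force: attraction/repulsion between two grasshoppers at distance r.
    return f * np.exp(-r / l) - np.exp(-r)

def sphere(x):
    return np.sum(x ** 2)          # placeholder objective (lower is better)

n_agents, dim, iters = 15, 5, 50
lb, ub = -5.0, 5.0
X = rng.uniform(lb, ub, size=(n_agents, dim))
target = X[np.argmin([sphere(x) for x in X])].copy()

for t in range(iters):
    c = 1.0 - t * (1.0 - 1e-4) / iters          # shrinking comfort-zone coefficient
    X_new = np.empty_like(X)
    for i in range(n_agents):
        S_i = np.zeros(dim)
        for j in range(n_agents):
            if i == j:
                continue
            d = np.linalg.norm(X[j] - X[i]) + 1e-12
            S_i += c * (ub - lb) / 2.0 * s(d) * (X[j] - X[i]) / d
        G_i = -0.01 * X[i]                       # weak gravity-like pull toward the origin
        A_i = target - X[i]                      # wind advection drifting toward the best solution
        r1, r2, r3 = rng.random(3)
        X_new[i] = np.clip(X[i] + c * (r1 * S_i + r2 * G_i + r3 * A_i), lb, ub)
    X = X_new
    best = X[np.argmin([sphere(x) for x in X])]
    if sphere(best) < sphere(target):
        target = best.copy()

print("best position:", target, "fitness:", sphere(target))
```

In the intrusion-detection setting, the objective would instead score a candidate feature subset or classifier parameters, while the social, gravity, and wind terms drive exploration and exploitation exactly as described above.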
Fig. 3 Proposed algorithm pseudo code
5 Experiment and Result Analysis
The KDD [14] web-attack dataset was tested using the open-source machine learning program Weka [10]. The experiments were run on an Intel i5 machine with a processor speed of 3.6 GHz and 32 GB of memory. We also use the CICIDS 2017 web-attack dataset to see how well our strategy performs in practice. A total of 170,366 records have been categorized as benign or web attack based on the study's 78 features [15]. Of these, 168,186 records are benign (non-malicious), and the remainder are brute-force, SQL injection, and XSS web attacks.
5.1 Performance Measures
A classifier's performance measures are used to evaluate its accuracy (as a whole, both training and testing). There are four quantities that may be used to evaluate the learning classifier's performance: true positive (TP), true negative (TN), false positive (FP), and false negative (FN), where a false negative is a positive instance misclassified as negative by the learning algorithm. The percentage of instances that are correctly
classified refers to the proportion of correctly identified test cases in a batch of test data that can be attributed to a learning system.
5.2 DoS-Attacks Detection We carried out two different sets of experiments to compare the two different hypotheses. In the first experiment, we tested the detection rate of the DoS assaults that occur often in the network using all of the dataset’s 41 attributes. Initially, a Decision Tree is built, and then, an updated GOA algorithm is utilized to improve its classification accuracy. To replicate experiment, we use Enhanced GOA via DT as its basic learning tool to select the 15 characteristics. When 15 features are taken into account, the system takes only 6.2 s to train. Additionally, the incoming network connection is tested in just 0.26 s (Fig. 4 and Table 2).
5.3 Probe-Attacks Detection We carried out two different sets of experiments to compare the two different hypotheses. In the first experiment, we analyzed the detection rate of Probe attacks, which occurred on a regular basis in the network, using all of the dataset’s 41 attributes. In the beginning, the Naive Bayes classification method is utilized, and then, the GOA algorithm is applied to improve its classification accuracy. Enhanced GOA with Naive Bayes as its base learner is used to conduct the same experiment (Fig. 5 and Table 3). The proposed algorithm is compared with another previous work. As discussed in the literature survey last decade years, different classifiers like cascade classifiers and multilayer hybrid classifier are more popular but grasshopper optimization algorithm has more detection rate as compared to the previous algorithm. Table 4 clearly indicates that detection rate for normal, Dos, and Probe attack is 99.76, 99.76, and 98.64%, respectively, using grasshopper optimization algorithm.
6 Conclusion For a human-centered smart IDS, this study presents an alarm intrusion detection algorithm (GA-GOA), which is built on the GA and GOA algorithms. Using the GA population search technique and the capacity of individuals to exchange information by maximizing the crossover and mutation probabilities of GA, this research first and foremost makes efficient use of these features. Convergence of the algorithm and GOA training speed are both improved as a result of the new algorithm. The
Fig. 4 a No. of features and accuracy for each step for DoS attack, b classification accuracy, c comparative analysis of accuracy, sensitivity, and specificity
Table 2 Comparative analysis of genetic algorithm and grasshopper optimization algorithm

S. No. | Parameter | GA selected | GOA selected
1 | Accuracy | 0.9923 | 0.9976
2 | Sensitivity | 0.9721 | 0.9972
3 | Specificity | 0.9968 | 0.9999
GOA error rate can be reduced while the true-positive rate can be increased using a novel fitness function that has been developed. As a result, the accuracy of SVM is enhanced while also optimizing the kernel parameter, and feature weights all at the same time. Improvements in intrusion detection are discussed in this work that boost detection rates, accuracy, and the real rate of intrusion detection while decreasing false positives and shortening SVM training time. These findings are supported by
Fig. 5 a No. of features and accuracy for each step for probe attack, b classification accuracy, c comparative analysis of accuracy, sensitivity, and specificity
Table 3 Comparative analysis of genetic algorithm and grasshopper optimization algorithm

S. No. | Parameter | GA selected | GOA selected
1 | Accuracy | 0.6538 | 0.9864
2 | Sensitivity | 0.9898 | 0.8126
3 | Specificity | 0.5932 | 0.9997
simulations and experiments. The scope of the work is also improved, as all the attacks are classified in a single phase. The detection time might be calculated in future work as well.
Table 4 Comparison with other algorithms

S. No. | Name of technique | % detection rate (Normal) | % detection rate (DoS) | % detection rate (Probe)
1 | Cascade classifier J48-BN | 97.4 | 97.8 | 73.3
2 | Multilayered hybrid classifier [6] | 96.8 | 98.6 | 93.4
3 | KDD'99 winner [16] | 99.5 | 97.1 | 83.3
4 | Layered approach using CRFs [7] | NA | 97.4 | 98.62
5 | Two stage filter using enhanced AdaBoost [9] | 99.2 | 98.7 | 92.4
6 | Multistage filter using enhanced AdaBoost | 98.8 | 98.9 | 93.8
7 | Proposed GOA | 99.76 | 99.76 | 98.64
References 1. Garcia-Teodoro P, Diaz-Verdejo J, Macia-Fernandez G, Vazquez E (2009) Anomaly-based network intrusion detection: techniques, systems and challenges. Comput Secur 28:18–28 2. Hu W, Hu W, Maybank S (2008) AdaBoost-based algorithm for network intrusion detection. IEEE Trans Syst Man Cybern 38:577–583 3. Amor NB, Benferhat S, Elouedi Z (2004) Naïve Bayes vs. decision trees in intrusion detection systems. In: Proceedings of the 2004 ACM symposium on applied computing, New York, pp 420–424 4. Farid DMd, Harbi N, Rahman MZ (2010) Combining Naive Bayes and decision tree for adaptive intrusion detection. Int J Netw Secur Appl 2(2):12–25 5. Natesan P, Balasubramanie P, Gowrison G (2011) Adaboost with single and compound weak classifier in network intrusion detection. In: Proceedings of international conference on advanced computing, networking and security, vol 1, pp 282–290 6. Xiang C, Yong PC, Meng LS (2008) Design of multiple-level hybrid classifier for intrusion detection system using Bayesian clustering and decision trees. Pattern Recogn Lett 29(7):918– 924 7. Gupta KK, Nath B (2010) Layered approach using conditional random fields for intrusion detection. IEEE Trans Dependable Secure Comput 7(1):35–49 8. Khor K-C, Ting C-Y, Phon-Amnuaisuk S. A cascaded classifier approach for improving detection rates on rare attack categories in network intrusion detection. J Appl Intell. http://doi.org/ 10.1007/s10489-010-0263-y 9. Natesan P, Balasubramanie P, Gowrison G (2012) Design of two stage filter using enhanced Adaboost for improving attack detection rates in network intrusion detection. J Comput Sci Inf Technol Secur 2(2):349–357 10. KDDCup99 dataset (1999). http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html 11. Friedman N, Geiger D, Goldsmidt M (1997) Bayesian network classifiers. Mach Learn 29:131– 163 12. Freund Y, Schapire RE (1997) A decision theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55:119–139 13. Tan PN (2006) Introduction to data mining. Addison-Wesley, Reading, MA 14. Pawlak Z (1982) Rough sets. Int J Comput Inform Sci 11(5):341–356 15. Shi L, Zhang L, Ma X, Hu X (2009) Rough set based personalized recommendation in mobile commerce. In: 2009 international conference on active media technology. Lecture notes in computer science, pp 370–375 16. Pfahringer B (2000) Winning the KDD99 classification cup: bagged boosting. SIGKDD Explor 1(2):65–66
Colorizing Black and White Images Using Deep ConvNets and GANs Savita Ahlawat, Amit Choudhary, Chirag Wadhwa, Hardik Joshi, and Rohit Shokeen
Abstract In this paper, a technique to colorize black-and-white images combining both localized and global features has been presented. The technique is based on Deep Convolutional Neural Networks and Generative Adversarial Networks (GANs), which merge localized scene information derived from small area with the global scene information of sample image. The whole framework, including the colorization model, is trained in an E2E manner, i.e., end-to-end, because of its ability to perform well irrespective of the knowledge of the problem. Unlike other convolution-based approaches, the model is able to process images of varied resolutions. To train the model, an existing large-scale scene classification dataset has been used. The class labels about the types and scenes represented in various images have been utilized for efficient training. In this work, the adversarial networks on top of deep convolutional nets have been used as a generalized approach toward I2I translation. What makes these networks generic is the way their algorithm learns loss function mapping along with mapping from input to output layers, which otherwise would have required separate approaches. At last, we have compared the proposed model with some existing models used for grayscale image colorization. The proposed model demonstrated promising result on many different types of black-and-white images and delivered realistic colorization, even on random images taken from the Internet. Keywords Image colorization · Generative adversarial networks · Deep convolutional networks
S. Ahlawat (B) · C. Wadhwa · H. Joshi · R. Shokeen Maharaja Surajmal Institute of Technology, GGSIP University, New Delhi, India e-mail: [email protected] A. Choudhary Maharaja Surajmal Institute, GGSIP University, New Delhi, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_49
1 Introduction Many applications of computer vision, image processing, and computer graphics can be constituted as ‘translating’ an input matrix of numbers (pixels) into a corresponding output matrix of numbers (pixels). A scene may be proffered as a RGB/LAB image, a grayscale image, an edge map, a heat map, etc. In image-to-image (I2I) translation, a scene is converted or represented in another scene. It is similar to automatic language translation where one language is translated in other languages. The ample amount of training data guarantees the accuracy of these types of translations. Earlier, each of these translation tasks had to be implemented separately, despite it being the same task of predicting output pixel values from input pixel values. The proposed system is based on the LAB color space. More specifically, considering the lightness ‘L’ channel is present, the proposed model will predict the ‘a’ and ‘b’ channels of the image. An existing large-scale dataset known as ‘Places 365 dataset’ has been used in the present work. The problem of predicting color has an added advantage that the training data is easily available. The training set images must consist of colored photos. The ‘L’ channel of image is taken as input, whereas ‘a’ and ‘b’ channels of images as supervisory signals (just to strip off the brightness which distinguishes color). Many of the available approaches have made use of this easy availability of training data. CNNs have been used by many researchers as a main building block to predict color in such scene recognition tasks. But one thing seems to be common as most of these result images tend to look desaturated. One reason can be the use of loss functions, which tend to give conservative predictions. The loss generated from these functions is taken from the standard regression problem. The aim is to minimize the Euclidean error between the estimated value and the ground truth. Convolutional networks work toward minimizing a chosen loss function which is used to grade and compare the quality of results, and although the learning process is self-regulating, the loss functions need to be designed effectively to take account of each era of losses. In the present work, we have used generative adversarial networks (GANs) [1] in I2I translations. The GANs try to learn loss metric that classifies the output image as real or fake (0/1 binary result). Out of focus or blurry images, since they look obviously fake, will not be tolerated. Since generative models calculate loss that is adaptable on the same data they are trained on, but they can also be implemented in wide variety of tasks that otherwise would require different loss formulations as shown in Fig. 1.
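To make the color-space setup concrete, the following is a minimal sketch (assuming scikit-image; the file name is a placeholder) of splitting an image into the lightness channel L, which the model receives as input, and the a/b channels it must predict.

```python
# Illustrative LAB split for colorization (placeholder file name, not the authors' pipeline).
import numpy as np
from skimage import io, color

rgb = io.imread("example.jpg") / 255.0        # H x W x 3, values in [0, 1]
lab = color.rgb2lab(rgb)                       # L in [0, 100], a/b roughly in [-128, 127]

L = lab[:, :, 0:1]                             # network input (grayscale lightness)
ab = lab[:, :, 1:3]                            # supervisory signal the model must predict

# Recombining a predicted ab with the original L gives the colorized result.
ab_pred = np.zeros_like(ab)                    # stand-in for the model's prediction
recombined = np.concatenate([L, ab_pred], axis=2)
rgb_out = color.lab2rgb(recombined)
```

Working in LAB keeps brightness fixed and reduces the prediction problem to the two chrominance channels, which is why training data is so easy to obtain from ordinary color photos.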
1.1 Motivation Computational image colorization is an active research topic as it forms an essential basis for many computer vision and light processing tasks. Due to the high cost associated with manually labeling each particular object present in the image as well
Fig. 1 Training a generative adversarial net. To classify between fake and real pictures, discriminator has been used, while generator is used to synthesize fake images
as very high possibility of the same objects having different colors altogether, where they lack adequate annotated resources, the need arises to find a unique approach for the task of image colorization, especially for images consisting multiple scenes. The main goal is to examine the success of generative algorithms for the task of supervised image colorization when applied to wide range of real-life scenes where the colorization is synthetically complex (not simply one color), and where resources are minimal.
1.2 Paper Organization The organization of paper is as follows: In the next section, a literature review with theoretical aspects of the colorization has been provided. The third section describes proposed solution setup, its practical aspects, and outline of the processes going on behind the scene. The concluded research points along with the scope for future work have been discussed in the last section.
2 Related Work The most common point of difference between colorization algorithms is the way the data is obtained and used for creating and plotting the connection between input and output images. What happens in non-parametric methods is for an input grayscale image, a set of colored reference images are defined (can be more than one). These can be provided either by the user or retrieved automatically. The color from parallel regions of the reference images is translated into the input image [2–5]. On the other hand, parametric methods, in order to estimate the set of parameters, at the time of
training, learn prediction functions [6] from large datasets of colored pictures, posing the problem as either regression or classification. Regression is done on a continuous color space, while classification is done on quantized color values. Different architectural styles as well as procedures have been researched in the past to tackle the problem, and a few of these are mentioned below: • Zhang et al. [3] approached the problem as a classification task but faced problems due to the uncertainty in color usage (e.g., vehicles of different colors will appear to be exactly the same). • Iizuka et al. [7] pose colorization as a regression task and use both local as well as global features to determine the pixel value. Depending too much on global features makes it difficult to detect significant in-object boundaries. • Anwar et al. [8] aim to remove unrealistic settings by replacing the abstract background of each image with a white background and then dealing with the objects present in the given scene. • Guadarrama et al. [9] use conditional generative nets to produce a low-resolution color image, which incurs a significant loss for high-resolution scenes. Colorization methods can be roughly divided into three categories: • Scribble-based colorization: Levin et al. [10] used a simple colorization method that requires neither image segmentation nor region tracking. It was based on a simple premise: neighboring pixels that have similar intensities should have similar colors. Huang et al. [5] improved Levin's cost function to be more sensitive to edge information. • Reference image-based colorization: This work was inspired by color transfer techniques that are widely used for recoloring a color image. Gupta et al. [11] used feature-matching methods and space voting for the colorization problem. The authors matched super-pixels between the input image and the reference image and reported good outcomes. Similarly, Chia et al. [12] implemented an approach that automatically segments the foreground objects and automatically labels them. Liu et al. [13] worked on reference images obtained directly from the web for the colorization task. However, its applicability is limited to famous landmarks where exact matches can be found. Going back to earlier methodologies, Welsh et al. [14] implemented a technique based on luminance and texture information between images, aiming to minimize the human labor required in the process. The authors suggested the idea of matching areas of the two images with rectangular swatches in their future work. • Automatic colorization: The main aim is to remove user interaction. Cheng et al. [15] adaptively groups images into different clusters and uses multiple existing image features to compute chrominance via a shallow neural network. It depends on the performance of semantic segmentation and is only able to handle simple outdoor scenes. Deshpande et al. [16] takes in a single image and produces multiple possible outputs by taking advantage of a mixture density network.
Conversion of one pixel to another can also be thought of as a classification or regression problem. However, using this approach, we are treating the output pixel to be independent of each surrounding pixel, which will lead to structural information loss. Therefore, in the proposed work, we have used ‘U-Net’-based architecture since U-Net is focused toward solving the problem of localizing the region of difference. The intrinsic reason behind this is the ability to localize and differentiate outlines by doing classification on every pixel, which makes input and output of equal size.
3 Methodology In the present work, a loss function is used which is customized for the colorization problem. As pointed out by Welsh et al. [14], the task of color prediction is multimodal in nature, i.e., many objects can take on several possible colorizations with all of them being equally suitable. For example, consider a rose. A rose is typically red, white, yellow, pink, or purple, but unlikely to be brown or green. Here, the colors red, white, yellow, pink, and purple are equally probable. Therefore, to appropriately model this multimodal possibility, for each pixel there is a need to predict and plot a distribution of potential colors. Using the architecture shown in Fig. 2, we train a convolutional neural network, through which a distribution over quantized color values is mapped from the input. It is proposed to use the LAB color space instead of the original RGB color space to take into account the lighting conditions of the scene, as this has a significant impact on the final output image. Afterward, the objective function is designed and point estimates are inferred from the predicted color distribution. GANs are generative models [1] that tend to learn the distribution of the training data and try to generate new samples or data points from this distribution (along with some variations). The generator learns a mapping from the observed image x and a random noise vector (represented by z) to the output y,
Fig. 2 Deep convolutional net architecture. The model used can be applied to each use case with the same architecture and objective, but with its own goal specific training dataset
G : {x, z} → y
The functioning of generative networks can be elucidated as follows:
• The architecture consists of two neural networks.
• The Generator (G) is initialized with a random data distribution and then proceeds to replicate the distribution of the training data.
• With more and more training, the Discriminator (D) gets better at distinguishing the artificially generated distribution from the real one.
• The generator and the discriminator try to outsmart each other by playing a min–max game.
The min–max loss, which is the standard GAN loss function, was first described in [1]:

E_x[log D(x)] + E_z[log(1 − D(G(z)))]

The discriminator tries to maximize this loss function, while the generator tends to minimize it. In practice, this objective can saturate for the generator: when the generator lags far behind the discriminator, its gradient becomes very small and training frequently stalls. The standard GAN loss function can therefore be split and examined in two parts, the generator loss and the discriminator loss, which are covered more thoroughly in the upcoming section.
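The min–max objective above can be written down directly with binary cross-entropy. The sketch below is a PyTorch illustration, assuming a generator G and a discriminator D (returning logits) are already defined elsewhere; it also shows the non-saturating generator loss that is commonly used to sidestep the saturation problem just mentioned. It is a sketch under these assumptions, not the authors' implementation.

```python
# Sketch of the standard adversarial (min-max) loss with binary cross-entropy.
# G and D are assumed to be existing torch.nn.Module networks; D outputs logits.
import torch
import torch.nn.functional as F

def discriminator_loss(D, real_images, fake_images):
    # D pushes D(x) -> 1 for real and D(G(z)) -> 0 for fake,
    # i.e., it maximizes E_x[log D(x)] + E_z[log(1 - D(G(z)))].
    real_logits = D(real_images)
    fake_logits = D(fake_images.detach())          # do not backpropagate into G here
    real_loss = F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
    fake_loss = F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    return real_loss + fake_loss

def generator_loss(D, fake_images):
    # Non-saturating form: instead of minimizing log(1 - D(G(z))),
    # maximize log D(G(z)), which gives stronger gradients early in training.
    fake_logits = D(fake_images)
    return F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
```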
3.1 Objectives A contrasting feature of I2I translations is the mapping of a high-resolution input grid into a high-resolution output grid. In the present work, the input and output differ in surface appearance, but both are manifestations of the same intrinsic structure. The underlying input's structure is roughly lined up with that of the output. Therefore, these considerations have to be kept in mind while designing the architecture. Already available approaches [17–19] have used an encoder-decoder network. The input in the encoder-decoder network is continuously down-sampled by passing it through a sequence of layers until a bottleneck layer is encountered, after which the process is reversed. These networks require all the data to pass through all the layers. In image-processing applications, there is a large amount of low-level intrinsic detail (details of edges, etc.) which needs to be shared between the input and output. Passing such details directly across the net [20–22] is advantageous for performance. In image colorization, for example, the position of edges and outlines between the objects present is shared by the input and output images.
3.2 Network Architecture The output layer of the deep convolutional network that we built is pipelined to the input layer of the generative network. The advantage of this approach is that the generative network can filter out the discrepancies that have crept into the proposed model by using its intrinsic generator and discriminator, which filter out the extreme outliers as well as low-quality output images labeled as fake. This model combines the advantages of both and thus produces a resultant image which is more vivid and accurate compared to either one alone (similar trends can be seen correspondingly in many linguistic models like [23]). Generator Architecture. Intrinsic architecture of U-Net. U-Net is able to differentiate outlines by doing classification on every pixel, which makes the output size equal to the input size, as shown in Fig. 3.
Encoder: C64-C128-C256-C512-C512-C512-C512-C512
Decoder: CD512-CD512-CD512-C512-C256-C128-C64
(where 'C' refers to a Convolution-BatchNorm-LeakyReLU block and the number signifies the number of filters). As shown in Fig. 4, a convolution is applied after the last layer of the decoder to map to the number of output channels (three in general, but two for colorization), followed by a 'tanh' activation function. In the encoder, batch normalization is not applied to the first C64 layer. All ReLUs in the encoder are leaky with a slope of 0.2, and all ReLUs in the decoder are plain, i.e., non-leaky, as shown in Fig. 5. In the proposed model, skip connections have been used to connect the activations from layer i to layer (n − i). Here, n is the total number of layers. Discriminator Architecture. A convolution is applied after the last layer to map to a one-dimensional output, which is then connected to a sigmoid layer. Here also, batch normalization is not applied to the first C64 layer. Also,
Fig. 3 Intrinsic architecture of U-Net. U-Net is able to differentiate outlines by doing classification on every pixel, which makes output size as equal to input size
Fig. 4 Image normalization using encoder-decoder architecture
Fig. 5 Intrinsic view of connected layers forming the encoder-decoder architecture
similar to the generator architecture, leaky ReLUs with a slope of 0.2 have been used as activation functions in the proposed model.
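The following is a compact sketch of a U-Net-style generator following the filter progression listed above (C64–…–C512 encoder, CD512–…–C64 decoder), with no batch normalization on the first C64 block, leaky ReLUs (slope 0.2) in the encoder, plain ReLUs in the decoder, a tanh output, and skip connections from layer i to layer n − i. It is written in PyTorch under the assumption of a 1-channel L input and a 2-channel a/b output; kernel sizes, strides, and dropout placement are assumptions in the spirit of this family of models, not the authors' exact code.

```python
import torch
import torch.nn as nn

def down(cin, cout, norm=True):
    # 'C' block: stride-2 Convolution -> (BatchNorm) -> LeakyReLU(0.2)
    layers = [nn.Conv2d(cin, cout, kernel_size=4, stride=2, padding=1)]
    if norm:
        layers.append(nn.BatchNorm2d(cout))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return nn.Sequential(*layers)

def up(cin, cout, dropout=False):
    # decoder block: stride-2 transposed convolution -> BatchNorm -> (Dropout) -> plain ReLU
    layers = [nn.ConvTranspose2d(cin, cout, kernel_size=4, stride=2, padding=1),
              nn.BatchNorm2d(cout)]
    if dropout:
        layers.append(nn.Dropout(0.5))
    layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)

class UNetGenerator(nn.Module):
    def __init__(self, in_ch=1, out_ch=2):
        super().__init__()
        enc = [64, 128, 256, 512, 512, 512, 512, 512]       # C64-C128-...-C512
        dec = [512, 512, 512, 512, 256, 128, 64]            # CD512-CD512-CD512-C512-...-C64
        self.downs = nn.ModuleList()
        prev = in_ch
        for i, f in enumerate(enc):
            self.downs.append(down(prev, f, norm=(i != 0)))  # no BatchNorm on the first C64
            prev = f
        self.ups = nn.ModuleList()
        for i, f in enumerate(dec):
            cin = prev if i == 0 else dec[i - 1] * 2         # x2 from skip concatenation
            self.ups.append(up(cin, f, dropout=(i < 3)))     # dropout in the three CD512 blocks
        self.final = nn.Sequential(
            nn.ConvTranspose2d(dec[-1] * 2, out_ch, kernel_size=4, stride=2, padding=1),
            nn.Tanh())

    def forward(self, x):
        skips = []
        for block in self.downs:
            x = block(x)
            skips.append(x)
        skips = skips[:-1][::-1]          # skip from encoder layer i to decoder layer n - i
        for i, block in enumerate(self.ups):
            x = block(x)
            x = torch.cat([x, skips[i]], dim=1)
        return self.final(x)

# Usage: a 1-channel 256x256 lightness image in, a 2-channel a/b prediction out.
net = UNetGenerator().eval()              # eval mode so BatchNorm accepts a single sample
with torch.no_grad():
    y = net(torch.randn(1, 1, 256, 256))
print(y.shape)                            # torch.Size([1, 2, 256, 256])
```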
3.3 Experiment Dataset. In the present work, we have used the MIT Places Dataset [24] for training purposes. According to the website, the design of the Places dataset is based on basic principles of human visual cognition. The main goal while designing the dataset was to find essential visual knowledge, which plays an important role in training an artificial system for scene recognition tasks. A category in the Places dataset is named by its label. For example, it has several different bedroom/street categories, since we cannot predict what will be present in a home bedroom, a hotel bedroom, or a nursery. In total, the Places dataset consists of more than 10 million images with more than 400 unique scene categories. It has nearly 5000–30,000 training images per class, which is somewhat indicative of their respective frequency in the real world. Research in the field of scene recognition shows that the Places dataset has been widely used to learn deep scene features with the aim of establishing new state-of-the-art results. Generator and Discriminator Training. Generative networks are difficult to train. The training is based on a zero-sum game, which means that the improvement of one model comes at the expense of the other. The aim of training both models simultaneously is to find a point of equilibrium. This also leads to a dynamic change in the optimization problem being
Fig. 6 Generator loss (Epoch 1, 50,000 iterations)
solved, since the model's parameters change at every update, which leads to the creation of a dynamic system. Thus, training two competing neural networks simultaneously can lead to a failure in convergence, as shown in Fig. 6. While G is being trained, random noise is sampled and an output is generated by G from that noise. This output is then passed to D, which classifies it as either 'Real' or 'Fake' based on its ability to distinguish between the two. The generator loss is calculated from the discriminator's classification, i.e., the generator gets rewarded if the discriminator fails to determine the image's true nature; otherwise it gets penalized. The quantity minimized for training the generator is:

∇_{θ_g} (1/m) Σ_{i=1}^{m} log(1 − D(G(z^(i))))
• The logarithm of D(x) is the discriminator's estimate of the probability that a real image is real.
• Maximizing the logarithm of (1 − D(G(z))) helps the discriminator correctly label the fake images; the corresponding loss is shown in Fig. 7.
Evaluation Metrics. Hyperparameters: Using a random seed value of 100 and a batch size of 128 images per batch, we perform 200 epochs. On our local machine in a non-GPU environment, it took nearly 17–18 h for a single epoch with 50k iterations. The final learning rate we used after a lot of experimentation was 1e−6 with a learning rate decay step of 1e4. We switch between leaky ReLU and standard ReLU activation functions as required, along with the Adam optimizer (β
Fig. 7 Discriminator loss (Epoch 1, 50,000 iterations)
= 0). Our whole model setup works in the Lab color space instead of the RGB color space due to its advantage in handling lighting conditions. In the present work, all networks have been trained from scratch. Here, images of size 286 × 286 are taken as input and re-sized to 256 × 256. The node weights have been initialized from a Gaussian distribution with mean 0 and standard deviation 0.02. The comparative loss is shown in Fig. 8 and the loss metrics in Table 1.
• L1 loss shows the loss for the deep convolutional net.
• Generative loss shows the loss for the GAN architecture (both generator and discriminator).
• Combined loss is the loss of the complete model against the training dataset.
Inference. The proposed deep convolutional network model performs well, given the variety of colors and textures in the training set. The algorithm and model successfully shade our environmental features differently; as an example, the trees and their reflection in the second image are shaded with different colors. However, there was still a major need for improvement (which was handled later on by the generative network):
• Similar colors. For example, brown and dark yellow: certain light shades of brown were mixed with yellow, and dark shades of yellow were treated as brown, which created ambiguity in the final outcome.
• Clear boundary lines among intrinsic objects. For example, scenes such as yellowish sunlight over yellowish farms show a high degree of mixing, and the boundaries appear very blurred.
• Hues of the same colors. For example, blue, deep blue, turquoise blue, purple, etc. Scenes containing such close hues altogether tend to combine both hues into
Fig. 8 Comparative plot of losses
Table 1 Loss metrics in different spaces (L, a, b)

Loss              L      a      b
L1 loss           0.81   0.69   0.70
Generative loss   0.87   0.74   0.84
Combined loss     0.86   0.84   0.82
one which is not a high priority problem but also cannot be neglected in certain unforeseeable conditions. This is where generative network comes in. The output of the Deep ConvNets served as input for the generative network training. For quick and more accurate initialization of weights, we initialized the weights of GAN layers with the output model of this trained deep convnet model, thus having the combined benefits of both models. Some advantages of our model were observed under the following conditions: • Our model was able to differentiate among similar shades of different colors. • It was able to differentiate between the boundary lines of similar-looking and similar-colored objects. • Hue effects depend very much on the lighting conditions also but the difference between them is more subtle in this case. Case of Legacy Black and White Photos. The proposed model has been trained using fake grayscale images. These images are generated from color photos by stripping a and b channels. The main aim is to demonstrate the performance of proposed model on real legacy black-and-white photographs (even randomly from the Internet), as shown in Fig. 9. It can easily be seen that our model is able to colorize these types of images quite effectively even though the lower-level characteristics of legacy pictures
differ drastically from those of the modern photographs on which the proposed model has been trained. Also in Fig. 10, we have done a qualitative comparison of our model’s final output with different state-of-the-art models already present in the industry and the available ground truth. Image colorization as a research problem has been around for quite a while now. Deep learning techniques and their noteworthy achievement have resulted in promising results in this application. The following trends in image colorization have been observed: • Our generative model delivers diverse colorization visually compared to other only convolutional network-based methods. • Most of the presently available models highlighted sub-optimal results for complex scenes (i.e., scene which is full of large number of objects in it). • Model having a dense architecture have higher complexity and are therefore able to synthesize image features more strongly and as a result have little improvement over other models. • The variety of networks performing excellent in image colorization tasks as compared to other image restoration is very much important.
Fig. 9 Visual representation showing flow of images inside our model on range of scenes
Fig. 10 Visual comparison of other colorization models with our model on natural-color dataset
In future, techniques such as attention mechanism and loss functions can be researched for achieving good results in the field of I2I translation. There were several observed limitations in the proposed approach. Most significant of them is the model finding it difficult to output rich colorful images with respect to the ground truth. The proposed approach also cannot be used to restore exact colors in case of same objects occurring in multiple possible colors.
4 Conclusion and Future Scope The results in this paper suggest that combining deep convolutional networks with generative adversarial networks can be a very rewarding approach for tasks involving highly structured input–output like image-to-image translation. These networks are generic in nature and can be adopted for wide variety of application as these networks learn a loss adapted to the task and data at hand. Supervised approaches based on generative networks show promising results in the area of image colorization of black-and-white images. The proposed setup achieves the state-of-the-art results on multiple scenic photographs. We showed that even with a very small amount of unlabeled data (training data), the proposed setup is able to produce promising results. Also, the proposed model allows us to work on images of any resolution and is able to run in near real time.
In the future, to further improve the accuracy of the proposed work, the performance of the classification layer needs to be enhanced. We can also retrain and improve the model if we wish to evaluate significantly different types of images, for example, human-created images.
References
1. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, pp 1–16
2. Ozbulak G (2019) Image colorization by capsule networks. In: IEEE conference on computer vision and pattern recognition workshops
3. Zhang R, Isola P, Efros AA (2016) Colorful image colorization. In: European conference on computer vision. Springer, Berlin
4. Xiao J, Hays J, Ehinger KA, Oliva A, Torralba A (2010) Sun database: large-scale scene recognition from abbey to zoo. In: 2010 IEEE Computer Society conference on computer vision and pattern recognition
5. Huang J-B, Su J-W, Chu H-K (2020) Instance-aware image colorization. In: ACM conference on computer vision and pattern recognition
6. Bohra N, Bhatnagar V (2021) Group level social media popularity prediction by MRGB and Adam optimization. J Comb Optim 41:331–337
7. Iizuka S, Simo-Serra E, Ishikawa H (2016) Let there be color! Joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification. ACM Trans Graphics 35(4), Article-110
8. Anwar S, Tahir M, Li C (2020) Image colorization: a survey and dataset. In: IEEE winter conference on applications of computer vision (WACV), pp 2–12
9. Guadarrama S, Dahl R, Bieber D, Norouzi M, Shlens J, Murphy K (2017) Pixcolor: pixel recursive colorization. In: 28th British machine vision conference
10. Levin A, Lischinski D, Weiss Y (2004) Colorization using optimization. In: ACM SIGGRAPH 2004 Papers (SIGGRAPH '04), pp 689–694
11. Gupta RK, Rajan D, Yong A (2012) Image colorization using similar images. In: 20th ACM international conference on multimedia
12. Chia AY, Zhuo S, Gupta RK, Tai Y-W, Cho S-Y, Tan P, Lin S (2011) Semantic colorization with internet images. ACM Trans Graph 30:1–8
13. Liu R, Freund Y, Spraggon G (2009) Image-based crystal detection: a machine-learning approach. Acta Crystallogr D Biol Crystallogr 64:1187–1195
14. Welsh T, Ashikhmin M, Mueller K (2002) Transferring color to greyscale images. In: 29th annual conference on computer graphics and interactive techniques, pp 1–15
15. Cheng Z, Yang Q, Sheng B (2015) Deep colorization. In: IEEE international conference on computer vision, pp 1–6
16. Deshpande A, Lu J, Yeh M-C, Chong MJ (2017) Learning diverse image colorization. In: Conference on computer vision and pattern recognition (CVPR)
17. Bindra J, Rajesh B, Ahlawat S (2021) Deeper into image classification. In: International conference on innovative computing and communications, advances in intelligent systems and computing, vol 1166. Springer, Singapore, pp 74–78
18. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: IEEE conference on computer vision and pattern recognition
19. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations
20. Ahlawat S, Choudhary A, Nayyar A, Singh S, Yoon B (2020) Improved handwritten digit recognition using convolutional neural networks (CNN). Sensors 20:3344
21. Bindra J, Rajesh B, Ahlawat S (2020) Deeper into image classification. In: Gupta D, Khanna A, Bhattacharyya S, Hassanien AE, Anand S, Jaiswal A (eds) International conference on innovative computing and communications. Advances in intelligent systems and computing, vol 1166. Springer, Singapore
22. Choudhary A, Ahlawat S, Gupta H, Bhandari A, Dhall A, Kumar M (2021) Offline handwritten mathematical expression evaluator using convolutional neural network. In: Gupta D, Khanna A, Bhattacharyya S, Hassanien AE, Anand S, Jaiswal A (eds) International conference on innovative computing and communications. Advances in intelligent systems and computing, vol 1166. Springer, Singapore
23. Sitender, Bawa S, Kumar M, Sangeeta (2021) A comprehensive survey on machine translation for English, Hindi and Sanskrit languages. J Ambient Intell Humaniz Comput pp 1–34
24. Zhou B, Lapedriza A, Xiao J, Torralba A, Oliva A (2014) Learning deep features for scene recognition using places database. In: Advances in neural information processing systems (NIPS), vol 27
Water Potability Prediction on Crops Considering pH, Chloramine, and Lead Content Using Support Vector Machine V. Varsha, R. Shree Kriti, and Sekaran Kripa
Abstract The agricultural field has seen many improvements with the advancements in technology. The demands on yield are constantly increasing. It is important to maintain some specifications of the irrigation water based on certain parameters so that the water continues to be suitable for the purpose of irrigation. Manually checking every sample of water and recommending approval is arduous and risky. Considering these points, a model is proposed that uses the support vector machine technique to predict whether a water sample is potable for irrigation or not by considering the chloramine and lead content of the water sample collected along with its pH. These variables are selected using the Gini index for identifying the best features that can be chosen for classifying the data points. The collected dataset is analyzed, and the outliers are removed by visualization using boxplot and violin plot. Based on the accuracy of the model, it can be deduced whether a particular water sample is acceptable for the purpose of irrigation or not. Keywords SVM · Pre-processing · Gini index · Visualization
1 Introduction Water scarcity is seen as a major barrier to intensifying agriculture in a sustainable way to meet the food requirements of an ever-growing population. The water quality is also being deteriorated due to disposal of untreated industrial wastewater and agricultural saline effluents directly to groundwater and canal water. The saltwater intrusion in fresh groundwater areas from saline water zones due to the over drafting of water with tube wells also caused the deterioration of groundwater quality. And due to this deterioration, soil problems such as salinity, alkalinity, toxicity, and low water infiltration rates occur very frequently. Unfit water, if used for the purpose of irrigation of crops, results in major health problems as well. The available water’s quality must be tested to ensure its fitness prior to use. Water used for irrigation, V. Varsha · R. Shree Kriti · S. Kripa (B) St Joseph’s College of Engineering, Chennai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_50
whether it originates from streams or is pumped from tube wells, contains significant quantities of harmful substances that will reduce crop yield and degrade the soil fertility. Therefore, it is very important to know whether a particular sample of water is suitable for the purpose of irrigation or not.
2 System Overview In many areas, impaired water and soil quality have led to poor crop productivity. Water quality is very important for the health of the crops. If the water content is not at a prescribed level, it will affect the crop as well as the outcome of the crop. This will affect the lives of farmers and, indirectly, the lives of every consumer of the crops. Hence, by analyzing the water content beforehand, we can help maximize the crop yield and improve its benefits. In this work, the proposed model is used to predict water potability for crops based on the pH, lead, and chloramine content of the water. For optimal water quality, the pH of the water must be within the range of 5.5–6.5, and the lead and chloramine must be below 50 ppb and 4.5 mg/L, respectively. A dataset containing the values of pH, lead, and chloramine is used for testing and training purposes. The water quality is predicted using a machine learning technique that determines whether the water sample is potable for irrigation based on the previous records of the water samples collected. Visualization of the data is also carried out using box plot and violin plot. Based on the accuracy of the model, it can be deduced whether a particular water sample is acceptable to use for the purpose of irrigation.
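The quality limits quoted above (pH within 5.5–6.5, lead below 50 ppb, chloramine below 4.5 mg/L) can be expressed directly as a simple rule-based check. The short sketch below only illustrates these stated criteria; the learned SVM model described in the later sections replaces this hand-written rule.

```python
# Illustrative rule-based check of the water-quality limits stated above.
# This is not the trained SVM model, only a restatement of the thresholds.
def meets_irrigation_limits(ph, lead_ppb, chloramine_mg_l):
    ph_ok = 5.5 <= ph <= 6.5
    lead_ok = lead_ppb < 50.0
    chloramine_ok = chloramine_mg_l < 4.5
    return ph_ok and lead_ok and chloramine_ok

print(meets_irrigation_limits(ph=6.1, lead_ppb=12.0, chloramine_mg_l=3.2))  # True
print(meets_irrigation_limits(ph=7.4, lead_ppb=60.0, chloramine_mg_l=5.0))  # False
```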
2.1 Scope • The main objective of this work is to develop a machine learning model to predict water potability considering the pH, lead, and chloramine content of the water. • By these considerations, there will be a huge amount of reduction in unhealthy crops due to unfit water. • This model will benefit the farmers greatly due to its ease of function. • The results are predicted within seconds of entering the details. • No prior knowledge of any chemical parameters of the water is needed to predict the water potability using this model. • This approach has been proposed to inform the agriculturalist and the farmers to achieve better outcomes.
3 Methodology The overall mechanism starts with dataset collection for predicting water potability. The dataset is loaded from the web resource Kaggle, with which the analysis is planned to be performed using various statistical measures. The dataset that is gathered is then pre-processed using the pre-processing techniques in machine learning. Here, the data is analyzed and the features or the required columns for our model are separated. The data correctness is then checked. Using this dataset after the pre-processing, we apply the support vector machine algorithm. The data model which was created using SVM is applied on the training set. This is where the algorithm is trained. Test set prediction is done based on the test result accuracy across various parameters. The model is then deployed to predict water potability considering the specific data or features like lead, pH, and chloramine, thereby giving accurate analysis of whether the water is fit for the purpose of irrigation (Fig. 1).
Fig. 1 Methodology
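A minimal sketch of the pipeline in Fig. 1 is shown below using pandas and scikit-learn: load the dataset, keep the selected features, impute missing values, train a linear-kernel SVM, and report test accuracy. The file name and the column names (ph, Chloramines, Lead, Potability) are assumptions about the Kaggle dataset and may need to be adapted, and the conventional 80/20 split is used here purely for illustration.

```python
# Sketch of the methodology in Fig. 1. File and column names are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

df = pd.read_csv("water_potability.csv")              # assumed file name
features = ["ph", "Chloramines", "Lead"]               # assumed column names
X, y = df[features], df["Potability"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=100)

model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),      # mean/median imputation as in Sect. 6.1
    ("scale", StandardScaler()),
    ("svm", SVC(kernel="linear")),                     # linear kernel as in Sect. 6.3
])
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```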
4 Existing Work In the existing system, random forest and decision tree algorithm are used [1]. They considered random columns and not specific columns which are only required for making prediction. Many values or data were randomly considered in the dataset like organic carbon, conductivity, and trihalomethane, thereby not making it very specific to particular important compounds like lead and chloramine. Also in the existing systems, there are separate different models present for the water potability prediction using lead and water potability prediction using pH for irrigation purpose [2]. The accuracy rate is considerably low for most of the existing system as well [3].
4.1 Drawbacks of the Existing Work • The existing system has only separate models for water potability prediction considering pH and water potability prediction considering lead. • The accuracy of existing system is minimal compared to the proposed model [4]. • The time taken for the model to classify the dataset and to predict the outcome is comparatively more as the RFT checks for the positive and negative results at every end of the node. • Since in existing system random forest and decision tree algorithm is used, it requires much computational power as well as resources as it builds numerous trees to combine their outputs [5]. • In the existing system, calculations can become very complex if many values are uncertain and/or if many outcomes are linked.
5 Proposed Work In the proposed system, the dataset is loaded first. We select specific features like the pH value, lead, and chloramine values of water sample. This is then analyzed, and pre-processing is performed where the system detects and corrects the corrupt or inaccurate records and garbage values. The outliers are observed using violin plot and box plot, and then, it is removed. After data collection and data pre-processing, the support vector machine model is trained [6]. Using the support vector machine learning algorithm, we can categorize the water as potable and non-potable by considering the pH and lead, chloramine content of the water which is taken as the input from the user. Graphs are plotted by considering the value of lead, chloramine, and pH as x- and y-axis values in different combinations. The water that is being categorized is then used for predicting the irrigation purpose. Similarly, considering the elements in the water we come to a conclusion whether the water is suitable for irrigation and other purposes. Since the dataset that is collected is a quantitative dataset
Fig. 2 View of the model classification
and the target variable is binary, which makes this a classification problem, we use the support vector machine to classify the data points, since the best-fit line divides the data points linearly, better than the tree algorithm, as shown in Fig. 2.
5.1 Merits of the Proposed Work • Predicts between two different scenarios by considering pH and considering lead and chloramine. • Minimum columns are considered efficient. • Time for execution is very less. • The accuracy of the proposed system is more comparable to the existing system. • Since SVM algorithm is used, the model can handle many features.
6 System Analysis The various modules involved in our project are as follows, where each module has a separate process to complete. 1. Data Pre-processing 2. Data Visualization 3. Data Analysis and Model Creation.
6.1 Data Pre-processing The dataset is gathered from the web resource Kaggle, which is a CCO domain dataset that is collected on 3276 water bodies, which contains the records of the content of the water sample collected based on units such as ppm: parts per million, microgram per liter, and milligram per liter. The dataset has 1072 records and contains 11 features such as pH, lead, chloramines, sulfate, conductivity, organic carbon, trihalomethanes, turbidity, potability, hardness, and solids. The dataset is a quantitative dataset. Following this, data pre-processing takes place. Data pre-processing plays a very important role in building the model for prediction. Data cleaning takes place when the system detects and corrects the corrupt or inaccurate records from the database and refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or detecting the dirty or coarse data. In our dataset, the missing values fall under the type of missing at random and since the dataset is a numeric one, imputation by mean, median, or mode is carried out. In data processing, the system converts data from a given form to a much more usable and desired form, i.e., makes it more meaningful and informative. In this, we divide 80% of the data for testing and the rest of the data is given for training the model.
6.2 Data Visualization The collected dataset is analyzed, and the outliers are removed by visualization using a boxplot and violin plot. A boxplot is a type of diagram used in descriptive statistics for exploratory data analysis. It is a visual representation of numerical data and its skewness, showing the data quartiles and averages [7]. Violin plots help to visually show the comparative distributions of numerical data across different levels of categorical variables. The mean, mode, and median are identified for the features that are to be considered. The distribution of the data points is visualized, and if it is a normal distribution, the identified mean and mode are cross-verified. Considering the values of lead and pH as x- and y-axis, graphs are plotted. Figure 3 is a box plot generated for pH, which has a normal distribution where the mean, mode, and median values are equal. For lead, since the distribution is neither right-skewed nor left-skewed nor normal, the analysis cannot be made with a boxplot, and hence a violin plot is used, as shown in Fig. 4. The graph plotted for chloramine, shown in Fig. 5, has a positively skewed (right-skewed) distribution, a type of distribution in which most of the values are clustered around the left tail while the right tail is longer. The mean value is 0.2, and in a positively skewed distribution mean > median > mode [8]. The distribution of the pH is a normal distribution; hence, the mean, mode, and median of the feature are going to be the same. The identified mean can be used to replace the values missing at random through the imputation method.
Fig. 3 Data visualization for pH
Fig. 4 Data visualization for lead
Fig. 5 Data visualization for chloramine
Mean = Sum of Observations / Number of Observations

Hence, the missing values that we have in our model are replaced with the mean value. It is not a hard-and-fast rule to replace missing values only with the mean; they can also be replaced with the median or mode. Since for the pH feature all three remain the same, it would not make a big difference. But for chloramine the distribution is skewed (right-skewed, as noted above, so mean > median > mode). To find the median as well, the violin plot was plotted for the chloramine feature; the values were found to be very close, and hence the replacement was done with the median.

Position of the median = (n + 1)/2

The above relation is used here since the missing values occur at random, were found around the middle row of the data, and the total number of rows is odd.
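The visualization and imputation steps described above can be sketched briefly with pandas and seaborn, as shown below. The column names (including Lead) are assumptions about the dataset, and the choice of mean versus median imputation follows the reasoning in this subsection.

```python
# Sketch of the visualization and imputation described above.
# Column names are assumptions about the dataset.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("water_potability.csv")

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
sns.boxplot(y=df["ph"], ax=axes[0]).set_title("pH")               # near-normal: mean ~ median ~ mode
sns.violinplot(y=df["Lead"], ax=axes[1]).set_title("Lead")        # violin plot for the irregular distribution
sns.violinplot(y=df["Chloramines"], ax=axes[2]).set_title("Chloramine")
plt.tight_layout()
plt.show()

# Impute: mean for the (roughly normal) pH column, median for the skewed chloramine column.
df["ph"] = df["ph"].fillna(df["ph"].mean())
df["Chloramines"] = df["Chloramines"].fillna(df["Chloramines"].median())
```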
6.3 Data Analysis and Model Creation Out of the many compounds present as features in the dataset, three are considered, namely the lead, chloramine, and pH of the water. These specific contents are chosen using the Gini index for identifying the best features for classifying the data points. The confusion matrix generated for the model helps us analyze the true positives, true negatives, false positives, and false negatives [9]. The F1 score is also measured using the precision and recall values, which are calculated from the true positives, false negatives, and false positives. The precision score identifies the number of accurate predictions out of the total number of positive predictions made.

Precision Score = True Positive / (True Positive + False Positive)

The recall score is the ability of the model to detect the positive samples.

Recall Score = True Positive / (True Positive + False Negative)

With the calculated recall and precision values, we can come to a conclusion regarding the model's accuracy and performance. The accuracy of the model determines the model's ability in making the prediction (Fig. 6).

Accuracy = (True Positive + True Negative) / (True Positive + True Negative + False Positive + False Negative)

Considering the calculated precision and recall values to be good enough for the model to predict with better accuracy, we conclude the analysis. The accuracy is then calculated using the confusion matrix, from which a value of 87% is derived using the support vector machine (SVM) algorithm, which is primarily used for classification problems in machine learning. The SVM algorithm is best used for creating a decision boundary or best line that divides the n-dimensional space into classes so that a new data point can be placed in the correct category in the future. In SVM, we mainly focus on separating the data points, and because of this, we opted for a linear kernel, with consideration of the parameters below:
• HYPERPLANE—SVM creates the best line or decision boundary that can categorize the n-dimensional space into classes so that new data points can be placed in the correct category easily in the future. The hyperplane denotes the best possible decision boundary. This can also be extended to p-dimensional space. In our model, the best hyperplane was found to be a linear line that divides the
Fig. 6 Data analysis
data into two classes: one where the water is safe for irrigation and the other where it is not. • MARGIN—SVMs define the criterion as finding a decision surface that is maximally far from any data point. It is this distance between the decision surface and the closest data point that determines the margin of the classifier. Considering the support vectors, the margin we use is the hard margin, since the data is linearly separable. • LINEAR KERNEL—Kernels are functions used in SVM to help solve problems. They provide shortcuts to avoid complicated calculations, and by using a kernel we can go to higher dimensions and perform smooth calculations; it even allows us to work in an infinite number of dimensions. As the data points are linearly separable, we use the linear kernel for separating the data points. The linear kernel is used when the data is linearly separable, that is, when it can be separated using a single line. Its most common use is when there is a large number of features in a particular data set.
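The precision, recall, and accuracy definitions given above can be computed directly from the confusion matrix. The short sketch below uses scikit-learn with placeholder labels and predictions purely for illustration; in practice they would come from the trained SVM on the held-out test set.

```python
# Computing the confusion matrix and the metrics defined above.
from sklearn.metrics import confusion_matrix, precision_score, recall_score, accuracy_score

y_test = [1, 0, 1, 1, 0, 0, 1, 0]        # placeholder labels, for illustration only
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]        # placeholder predictions

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
precision = tp / (tp + fp)                # TP / (TP + FP)
recall = tp / (tp + fn)                   # TP / (TP + FN)
accuracy = (tp + tn) / (tp + tn + fp + fn)

# The same values via scikit-learn's helpers:
assert abs(precision - precision_score(y_test, y_pred)) < 1e-9
assert abs(recall - recall_score(y_test, y_pred)) < 1e-9
assert abs(accuracy - accuracy_score(y_test, y_pred)) < 1e-9
print(precision, recall, accuracy)
```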
7 Results and Discussion Comparing the detection of water potability in terms of accuracy among the existing random forest classifier and SVM classifier, it is found that the SVM classifier outperforms the random forest classifier in terms of precision value, recall value, and accuracy value. The precision value of random forest classifier is found to be 0.447 whereas it is found to be 0.667 for SVM. The recall value is found to be around 0.56 for random forest and around 1.00 for SVM classifier. The accuracy percentage
Table 1 Performance comparison

S. No.  Classifier       Precision value   Recall value   Accuracy (%)
1       Random forest    0.447             0.56           76.04
2       SVM              0.667             1.00           87
Fig. 7 Performance comparison
stood out at 87% for SVM against random forest classifier’s accuracy of 76% (Table 1). The graphical representation of the accuracy performances of random forest classifier and SVM classifier is shown in Fig. 7.
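The comparison in Table 1 can be reproduced as a grouped bar chart similar to Fig. 7. The sketch below simply plots the values reported in Table 1 with matplotlib; it is only an illustrative reconstruction of the figure.

```python
# Grouped bar chart of the values reported in Table 1 (similar in spirit to Fig. 7).
import numpy as np
import matplotlib.pyplot as plt

metrics = ["Precision value", "Recall value", "Accuracy (%)"]
random_forest = [0.447, 0.56, 76.04]
svm = [0.667, 1.00, 87.0]

x = np.arange(len(metrics))
width = 0.35
plt.bar(x - width / 2, random_forest, width, label="Random Forest")
plt.bar(x + width / 2, svm, width, label="SVM")
plt.xticks(x, metrics)
plt.legend()
plt.title("Performance comparison")
plt.show()
```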
8 Conclusion The developed model is planned to be utilized as a web application where the user can manually type in the values of particular features like the content of lead, chloramine, and the pH of the water [10]. Considering the inputs, the model helps the user in predicting the target as to whether the water is safe for irrigation or not. The safety of the water quality for a better yield of crops can be assured when this model is used. Furthermore, the accuracy of the model will be worked on to improve and even more statistical analysis is planned to be made with the dataset to make useful insights.
References
1. Madhumithaa S, Mannish S, Jean Justus J (2021) Descriptive and predictive analytics of groundwater. IEEE
2. Davis SN, Whittemore DO, Fabryka-Martin J (2021) Uses of chloride/bromide ratios in studies of potable water. NGWA
3. Leong WC, Bahadori A, Zhang J, Ahmad Z (2020) Prediction of water quality index (WQI) using support vector machine (SVM) and least square-support vector machine (LS-SVM). Taylor and Francis, UK
4. Chauhan AS, Badwal T, Badola N (2020) Assessment of potability of spring water and its health implication in a hilly village of Uttarakhand, India
5. Al-Sulami A, Al-Taee A (2021) Potability of drinking water in Basra-Iraq
6. Malakar A, Snow DD, Adugna CRG (2020) A review on impact of compost on soil properties, water use and crop productivity. MDPI
7. Arulnangai R, Mohamed Sihabudeen M, Vivekanand PA, Kamaraj P (2021) Influence of physio chemical parameters on potability of ground water in Ariyalur area of Tamil Nadu
8. Guan L, Li Q, Jin B (2019) Research of the healthy drinkable water—Bama recreate water. IEEE
9. Ayers RS (2019) Quality of water for irrigation. ASCE
10. Hatch CE, Fisher AT, Revenaugh JS, Constantz J, Ruehl C (2006) Quantifying surface water-groundwater interactions using time series analysis of streambed thermal records in method development. Water Resources Research, vol 42
CRACLE: Customer Resource Allocation in CLoud Environment Siya Garg , Rahul Johari , Vinita Jindal , and Deo Prakash Vidyarthi
Abstract As is well known, cloud computing follows a dual pricing approach: it offers a free-to-use as well as a pay-per-use model, in which a user pays for the quantum of resources they consume over a period of time. Depending on the demand, a customer requests the resources and the Cloud Service Provider (CSP) allocates the resources matching the customer's requirements. In the past, in simulations conducted using the CloudSim software, researchers have usually worked with the pre-loaded and default values of variables for resource allocation without considering the end-user requirement. In the current research work, CRACLE: Customer Resource Allocation in CLoud Environment has been proposed, which prompts customers for their resource requests, and the subsequent allocation of the same is done by the CSP through the framework. Keywords Cloud · CloudSim · Cloud computing · Cloudlet · Virtual machine · CSP · CRACLE
1 Introduction Cloud computing is a revolutionizing technology that realizes our dream of computing as a utility. It has become an essential part of the digital world as it is used in day-today life in one form or another (Gmail, OneDrive, etc.). It offers on-demand access to R. Johari (B) SWINGER: Security, Wireless, IoT Network Group of Engineering and Research, University School of Automation and Robotics (USAR), Guru Gobind Singh Indraprastha University, East Delhi Campus, Delhi, India e-mail: [email protected] S. Garg · V. Jindal Department of Computer Science, Keshav Mahavidyalaya, University of Delhi, Delhi, India e-mail: [email protected] D. P. Vidyarthi Parallel and Distributed System Lab, School of Computer and System Sciences, JNU, Delhi, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_51
computing resources from anywhere in the world. Cloud computing presents numerous advantages, like cost efficiency, scalability, and accessibility. NIST states, “The cloud model is composed of five essential characteristics, three service models, and four deployment models” [1]. The following are the five characteristics of the cloud computing model: • On-Demand Self-Service: Cloud computing services are used on-demand basis with or without human interaction. • Broad Network Access: Computing services can be accessed on the Internet anytime and anywhere through multitude of homogeneous and heterogeneous IT devices. • Resource Pooling: Servers, storage, applications, and other resources are pooled to serve several customers. • Rapid Elasticity: Customers’ computing needs can be addressed by dynamically provisioning and releasing resources. • Measured Service: Cloud computing relies on a pay-for-what-you-use model, in which resource utilization is measured and monitored, and consumers are charged according to their usage. Cloud computing can be perceived from two different perspectives—deployment models and service models as depicted in Fig. 1. Cloud computing consists of three service models based on the various types of services provided [2]:
Fig. 1 Service and deployment models
1. Software as a Service (SaaS) is a service model that allows software applications to be delivered over the Internet on-demand and typically via subscription. 2. Platform as a Service (PaaS)—Consumers can use PaaS to develop, test, and deploy real-time applications over the cloud. 3. Infrastructure as a Service (IaaS) provides basic computing infrastructure to users. The vendor provides resources such as storage, network, servers, and virtual machines. The user, on the other hand, is in charge of data, applications, and middleware. It provides the highest level of flexibility and management control over the IT resources. As shown in Fig. 1, depending on the requirements of the consumers, cloud services can be deployed in four ways: 1. Public cloud: In this, resources are dynamically provisioned to the public over the internet and are owned by a CSP. It generally follows a pay-per-use model, wherein a customer only pays for the resources used by him, similar to a prepaid electricity metering system. 2. Private Cloud: In this case, cloud infrastructure is only available to a single organization via a private network. The organization manages cloud resources and applications. Because only authorized personnel has access to the cloud, and private clouds are more secure than public clouds. 3. Hybrid Cloud: A hybrid cloud provides IT solutions by combining public and private clouds. It allows some parties to access information over the Internet while providing secure control over cloud applications and data. 4. Community Cloud: It is a cloud infrastructure shared by multiple organizations that serve a specific community and is managed internally or by a third party. Rest of the paper is organized as follows. In Sect. 2, the related works are analyzed. Section 3 presents the methodology adopted. In Sect. 4, experimental setup is discussed. In Sect. 5, simulation and results are analyzed followed by the conclusion in Sect. 6.
2 Literature Survey In this section, we present the related works on the CloudSim [3] platform. CloudSim is a cloud simulation tool that allows users to simulate the cloud environment. To our knowledge, the default values set in the CloudSim examples have been used by the majority of researchers working in the domain of resource allocation. In this paper, we propose to dynamically allocate resources as per the demand of the customers. Yu [4] introduced an improved version of Particle Swarm Optimization (IPSO) to improve the efficiency of resource scheduling. Traditional Particle Swarm Optimization was hampered by the premature phenomenon and was unable to find the global optimal solution. In the velocity variation formula, constant cognition and social items of PSO were changed to coefficients that could vary with the number of iterations. Both algorithms’ operation times increased as the number of tasks increased,
but the IPSO algorithm increased more steadily. The loads on the virtual machines in the two algorithms were different, with the IPSO algorithm having a more balanced load on the virtual machine (VM). Furthermore, the VM with greater processing capacity was assigned more tasks. In [5], author(s) proposed an algorithm which takes task length and deadline as input. Each user request’s completion time was calculated and compared to the deadline. If there was any violation, VM was reconfigured based on the host capacity and the workload was shifted to another VM. Thereby, creating a workload balance among VMs. The approach considered an important Quality of Service (QoS) parameter such as deadline and ensured efficient resource utilization; however, algorithm can further be improved to incorporate task requirements like priority. Another attempt at efficient task allocation has been made in [6] using Genetic Algorithm (ETA-GA). The approach improves the system’s decision-making abilities by assessing the fitness value of a proposed solution (chromosome) in order to efficiently divide work over multiple VMs while reducing the makespan and overall execution time. An improved Antlion optimization algorithm was proposed in [7]. In the paper, two experimental series on synthetic and real-world trace datasets were used. MALO surpassed other well-known task scheduling algorithms, with superior performance. When compared to other algorithms, MALO produced significantly better results. The authors of [8] presented this problem as a multi-objective optimization with the goal of minimizing Energy Consumption (EC) while maximizing Service Level Agreement (SLA). They created a VM placement model with predicted utilization as the heuristic and compared their results to AntAc (Ant-based solution with available capacity as the heuristic) and Power-aware Best Fit Decreasing (PABFD) heuristic methods. Their findings revealed that Antpu outperformed the competition. Deep Reinforcement Learning (DRL) was used in [9] to solve multi-objective problems in data centers in order to solve the VM placement problem. The authors managed concurrent workloads presented to an IaaS while pursuing multiple and conflicting goals such as used power and quality. The primary goal of the approach was to choose the best heuristic from a variety of feasible alternatives for placing each VM requested by users at every time step. Significant results were obtained, demonstrating the effectiveness of the DRL-VMP, particularly when used to handle workloads with significant fluctuations. The authors of [10] presented an improved version of the ABC algorithm called as Hybridizing Artificial Bee Colony (HBAC) for tasks allocation systems in virtual machines. Both the employed bee and the onlooker bee used the same equation in the earlier algorithm to find the candidate solution that made an unbalanced choice. As a result, the authors used new update rules derived from the Bat algorithm. The results showed that the algorithm outperformed other state-of-the-art algorithms in makespan time, overall accuracy, and response time. Another hybrid approach using Simulated Annealing (SA) was proposed in [11] to further optimize the task scheduling utilizing the ABC algorithm. The method increased overall efficiency by allocating resources more efficiently and reducing the makespan time. The authors of [12] presented an evolutionary algorithm called EVMC for minimizing energy consumption and optimizing QoS. Unlike traditional greedy algo-
rithms, EMVC considered multiple resources for VM consolidation under varying load. The Support Vector Machine (SVM) method was used to trigger VM migration, Modified Minimization of Migrations (MMM) was used for VM selection, and Modified Particle Swarm Optimization (MPSO) was used for VM reallocation. In comparison with MBFD, EVMC increased energy efficiency by 30%, reduced VM migration, and balanced all PM resources. The authors of [13] designed and developed an application that aims to combine an individual’s various identity proofs to generate a UID number that contains all of the information about the individual’s identity proofs. The efficacious simulation was carried out using Visual Studio and Manjrasoft’s ANEKA platform. In this paper, we have used CloudSim for performing the experiment and resource allocation, and our methodology has been explained in Sect. 3.
3 Methodology Adopted
Cloud computing works on the principle of sharing [14], wherein the available physical resources are shared among different users across organizations irrespective of platform and geo-location. To achieve this, logical instances of these resources are created, which are then shared by multiple users. This process is called virtualization, and the logical instance is called a VM. Since more than one VM can exist on a single server, a software process known as a broker is required to manage these VMs. Similarly, to authenticate and grant resources to multiple users, cloudlets are used. A cloudlet (CL) is a software process that exists between an end-user and a broker to facilitate resource allocation. CloudSim provides an excellent platform for cloud simulation. However, it has only been used with fixed parameter values in the past, and user interaction was never given enough weightage. Thus, for the first time, a customer-driven framework for accepting inputs from the customers has been incorporated. To trigger the simulation in the CRACLE framework, the user needs to load the CloudSim tool and then initialize parameters such as the number of users, date, and time. If initialization fails, re-initialization is required. Further, array lists are created for the broker, VM, CPU, and CL. A user is then asked to input values for the required resources, which are listed below:
1. CPU speed
2. Image size
3. Memory size
4. Bandwidth
5. Number of CPUs
6. VM name
7. Length
8. CL Id
9. File size
10. Output size
Fig. 2 Flowchart of the proposed work
If the user fails to provide the input, the user is prompted to input again; otherwise, a connection with the database is established. The parameters are then archived into the database, and the simulation is successful. The flowchart depicting the above process is shown in Fig. 2.
4 Experimental Setup
CloudSim is an open-source framework for modeling and simulating cloud infrastructure. It is preferred for simulation and resource selection because of the following advantages:
1. Easy to download and setup
2. Flexibility to define configurations
3. Open source and free of cost
4. Java-based environment
5. Secure and reliable.
The hardware and software used in simulation have been mentioned in Table 1.
5 Result
The user was prompted to enter values for the input parameters. The total time required to execute the program with the input parameters specified in Table 2 is 659 s. These values were passed to the broker for VM creation. The VM was created in the data center, and the CL was then sent to the VM. All CLs were successfully executed in the CRACLE framework. The Java code for the user input-driven CloudSim tool-based simulation is showcased in Fig. 3. The performance execution chart with three virtual machines is represented in Fig. 4, wherein the blue bar indicates CPU utilization and the red bar indicates memory consumption.
Table 1 Hardware and software deployed in simulation setup
S. No.  Hardware and software needs     Description
1       OS                              Windows 10
2       CloudSim tool                   Version 3.0
3       Java development toolkit        Version 1.8.0
4       CPU processor used              Intel i5
Table 2 Input parameters to CloudSim program
S. No.  Input parameters   Values
1       CPU speed          1000 cycles per second
2       Number of CPUs     02
3       Bandwidth          100 bits per second
4       Image size         4 MB
5       File size          300 KB
6       Memory             200 MB

Fig. 3 Simulation snapshot of CloudSim tool
Fig. 4 Performance execution chart
6 Conclusion
Cloud computing works on the principle of sharing, wherein resources are rapidly provisioned as per the demands of the customer. This paper presents a CRACLE framework for resource selection that considers users' requirements and provides them with a platform to interact with the environment. The experiments conducted in the paper show that CRACLE is able to handle the dynamic needs of the customers.
References 1. Danish J, Hassan Z (2011) Security issues in cloud computing and countermeasures. Int J Eng Sci Technol (IJEST) 3(4):2672–2676 2. Yang J, Chen Z (2010) Cloud computing research and security issues. In: 2010 international conference on computational intelligence and software engineering. IEEE, pp 1–3 3. http://www.cloudbus.org/cloudsim/ 4. Yu H (2020) Evaluation of cloud computing resource scheduling based on improved optimization algorithm. Complex Intell Syst 1–6 5. Shafiq DA, Jhanjhi NZ, Abdullah A, Alzain MA (2021) A load balancing algorithm for the data centres to optimize cloud computing applications. IEEE Access 9:41731–41744 6. Rekha PM, Dakshayini M (2019) Efficient task allocation approach using genetic algorithm for cloud environment. Cluster Comput 22(4):1241–1251 7. Abualigah LM, Diabat A (2021) A novel hybrid antlion optimization algorithm for multiobjective task scheduling problems in cloud computing environments. Cluster Comput 24(1):205–223 8. Varun B, Man Mohan Singh R (2021) AntPu: a meta-heuristic approach for energy-efficient and SLA aware management of virtual machines in cloud computing. Memetic Comput 13(1):91– 110
9. Caviglione L, Gaggero M, Paolucci M, Ronco R (2020) Deep reinforcement learning for multiobjective placement of virtual machines in cloud datacenters. Soft Comput 1–20 10. Ullah A, Nawi NM (2021) An improved in tasks allocation system for virtual machines in cloud computing using HBAC algorithm. J Ambient Intell Humanized Comput 1–14 11. Muthulakshmi B, Somasundaram K (2019) A hybrid ABC-SA based optimized scheduling and resource allocation for cloud environment. Cluster Comput 22(5):10769–10777 12. Zolfaghari R, Sahafi A, Rahmani AM, Rezaei R (2021) An energy-aware virtual machines consolidation method for cloud computing: simulation and verification. Softw Pract Experience 1:194–235 13. Gupta S, Johari R (2017) UID C: cloud based UID application. In: 2017 7th international conference on cloud computing, data science and engineering-confluence. IEEE, pp 319–324 14. Koushal S, Johri R (2013) Cloud simulation environment and application load monitoring. In: 2013 international conference on machine intelligence and research advancement. IEEE, pp 554–558
Detection SQL Injection Attacks Against Web Application by Using K-Nearest Neighbors with Principal Component Analysis

Ammar Hatem Farhan and Rehab Flaih Hasan
Abstract Web applications are exposed to many attacks, including SQL injection attacks, cross-site scripting, etc. This study focuses on attacks related to SQL-i. SQL injection leads to loss of confidentiality, integrity, and availability of data for users or organizations, as a result of which unauthorized persons are able to access, update, and delete the user's database, which leads to many risks at the individual or institutional level. Methods for detecting SQL injection attacks include static analysis, dynamic analysis, and machine learning techniques. As a result, preventive measures must be implemented to combat the increased risk of SQL injection. This paper proposes a model to detect these threats by applying machine learning algorithms, specifically an improved K-Nearest Neighbor algorithm, as the primary injection detection mechanism. Experiments show that applying the optimized K-Nearest Neighbor model with principal component analysis produces a dataset with significant advantages that improve model accuracy. The proposed model achieved a good accuracy of 96.75% and low time complexity, even with a difference in the number of features in the dataset used. Keywords SQLIA · K-Nearest neighbors · Principal components analysis · CountVectorizer
1 Introduction
Organizations may now use web-based apps as the foundation for their daily operations thanks to advancements in technology and network connectivity. However, security for online applications is often weak and demanding in several sectors such as Intelligent Transportation Systems, Healthcare Systems, Industrial Technologies, E-commerce, and social activities. All of these are now accessible through web-based
database-driven apps. The data is processed by these apps and saved in a back-end database server, where it is linked to the organization's data. Most communications with consumers and users are conducted via the organization's services, depending on the application. Since anybody throughout the globe may use these programs, they attract attackers who want to exploit their vulnerabilities. A SQL injection attack (SQLIA) is a method that may be used to take advantage of web-based, database-driven applications. When an attack modifies a programmer-approved query to access restricted data or make illegal data changes, it is known as SQLIA. SQLIA may occur in several forms depending on what the attacker is trying to do. Still, the most common source of SQLIA is poor user input validation, which the programmer should keep in mind when creating the application [1]. According to the statistics of the Open Web Application Security Project, it is one of the ten most dangerous types of attacks that database-based web applications are exposed to. The attack mechanism sends a request crafted by the attacker that includes specific data, keywords, and SQL operators. As a result, the web application's dynamic SQL query no longer conforms to the logic, semantics, or syntax intended by the application; nevertheless, the application accepts it, executes it, and returns the requested data to the attacker. This happens due to not taking appropriate measures to sanitize user input and not using tools and techniques to curtail malicious attacks [2]. The structure of this paper is as follows. The second section describes the relevant works. The third section presents the proposed approach to detecting SQL attacks using the KNN algorithm with principal component analysis. The fourth and fifth sections deliver the results and conclusions about the model.
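To make the injection mechanism concrete, the following minimal Python sketch (using a hypothetical in-memory users table, not code from the proposed system) contrasts a query built by string concatenation, which the classic ' OR '1'='1 payload subverts, with a parameterized query that binds the same input as plain data.

```python
import sqlite3

# Hypothetical demo database with a single users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice', 'secret')")

user_input = "' OR '1'='1"  # classic SQL-i payload

# Vulnerable: the input is concatenated into the SQL string, so the payload
# rewrites the WHERE clause and the query returns every row in the table.
unsafe = "SELECT * FROM users WHERE name = '" + user_input + "'"
print("concatenated query:", conn.execute(unsafe).fetchall())

# Safer: a parameterized query treats the input as a literal value,
# so the payload matches no user name and nothing is leaked.
safe = "SELECT * FROM users WHERE name = ?"
print("parameterized query:", conn.execute(safe, (user_input,)).fetchall())
```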
2 Related Work
Numerous research articles have advocated the use of machine learning, static analysis, or dynamic analysis methodologies to detect and prevent SQLIA (Table 1).
3 SQLIA Proposed System

3.1 K-Nearest Neighbor (KNN)
According to experts in this field, KNN is one of the top ten machine learning approaches. It relies on the well-known idea, attributed to Cicero, that pares cum paribus facillime congregantur (birds of a feather flock together). An unknown sample is classified using the known classification of its neighbors as a basis for classification. Let us
assume that we have a training set of samples that have already been categorized. Each sample should be categorized in the same way as its neighbors. Because of this, it is possible to anticipate the classification of a sample based on the classifications of the samples that are closest to it. An unknown sample may be used to calculate the distances between it and all the other samples in the training set, given enough information from the training set.

Table 1 Related work on SQLIA

[3] Neuro-fuzzy technique: The suggested method for detecting and preventing SQLIA is as follows:
• It uses the adaptive neuro-fuzzy inference system (ANFIS) model
• The model has been improved by including Fuzzy C-Means (FCM) to resolve the uncertainty in SQL properties
• The Scaled Conjugate Gradient (SCG) algorithm significantly boosted the suggested system's speed
• The suggested method was assessed using a well-known dataset, and the findings indicate that it significantly improves the identification and prevention of SQLIAs

[4] Ensemble classification algorithm: The proposed model showed the ability to distinguish between malicious and legitimate queries; it can classify malicious SQL injections into three categories (simple, unified, or lateral) to determine the severity of the cyber-attack. If the query is not malicious, it is added to the flow redirection scheme; otherwise, it is blocked. If the query is invalid, the entry is blocked when a simple SQL injection attempt is detected

[5] CNN-BiLSTM: The paper presented a machine learning-based hybrid CNN-BiLSTM approach for detecting SQLI attacks. In terms of accuracy and performance, the proposed CNN-BiLSTM model surpassed current machine learning algorithms by a wide margin

[6] Black box testing: Suggests an automated black-box testing program that automates the process of eliminating SQL Injection Vulnerabilities (SQLIV). When such attacks occur, the SQLIV is analyzed instantly. Additionally, a SQLIV scanner was created employing an object-oriented paradigm to reduce the number of false-positive and false-negative discoveries

[7] Deep learning: By finding patterns in the input, the model determines whether the supplied data was SQL-injected or not. The system's benefit is that it is capable of detecting all types of injection procedures. The model performs all feature extraction and selection on its own; only text input is required from the user. Additionally, it is scalable, allowing it to be used for many applications

[8] Pattern-based neural network model: Existing techniques relied on a restricted collection of SQL query signatures, keywords, and symbols to detect injected queries. This study aims to extract SQL injection patterns using current parsing and tagging methods. A Multi-Layer Perceptron is used to train and model pattern-based tags, and it performs very well at query categorization

[9] BiLSTM-attention: The research employs neural networks to analyze SQL injection strings and identify their injection features. With the help of multi-string analysis and word2vec, the query string is vectorized using a loss layer (long short-term memory and gated recurrent unit). Furthermore, a BiLSTM-Attention detection model is developed by adding an attention mechanism to the BiLSTM model

[10] Standard intrusion detection system in network forensics: Snort log files, which include information about attackers, are used in this study. The system may also send out notifications through email or other digital notification systems when an attack is about to occur. The Snort intrusion detection system (IDS) is used to develop a web server-based network system that can recognize many methods of SQL injection attack. The method uses a step-by-step process to evaluate risk based on NIST standards. There are five stages to this research project: exploiting the site, simulating an attack, installing the IDS, compiling the information collected, and finally analyzing the results. This effort developed a web server-based Snort IDS that can detect multiple SQL injection attacks and notify the system in real time via digital notifications
The sample in the training set with the smallest distance to the unknown sample is its nearest neighbor, so the unknown sample may be classified using the classification of this closest neighbor [11]. Classification is the process of sorting unlabeled data into groups based on various criteria, and KNN is often used as a classifier. It categorizes data based on the most similar samples in a specific area. This approach is popular due to the ease with which it may be implemented and the little computing time it requires. It determines its closest neighbors for discrete data by using the Euclidean distance. KNN conducts two procedures when encountering a new unlabeled tuple in the dataset. First, it analyzes the K points closest to the new data point, i.e., the K nearest neighbors. Second, KNN decides into which class the new data should be categorized based on its neighbors' classifications. As a result, the Euclidean, Manhattan, or Minkowski distance between the test sample and the provided training samples must be determined [12].

Algorithm K-Nearest Neighbor
Step 1: Determine the value of k, which represents the number of nearest neighbors.
Step 2: Calculate the distance between the new data point and all training points using the following equation:

D(X, Y) = \sqrt{\sum_{i=1}^{n} (X_i - Y_i)^2}    (1)

Step 3: Choose the K nearest neighbors according to the value of k.
Step 4: Assign the new data point to the category to which the majority of these neighbors belong.
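As a minimal illustration of these steps, the NumPy sketch below (with tiny made-up data rather than the paper's dataset, and k chosen arbitrarily) computes the Euclidean distances of Eq. (1), picks the k nearest training points, and assigns the majority label.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Step 2: Euclidean distance from the new point to every training point (Eq. 1).
    distances = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))
    # Step 3: indices of the k nearest neighbours.
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote among the neighbours' labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Illustrative data: two numeric features, labels 0 = benign, 1 = SQL-i.
X_train = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.85, 0.85])))  # -> 1
```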
3.2 Data Pre-processing
In data mining, preprocessing is critical since it involves modifying and preparing the data. Data preparation aims to minimize the quantity of data, build correlations between data, normalize data, and eliminate outliers from the dataset. Data cleaning, integration, reduction, and transformation are among the approaches used [13]. The data are frequently not in a processable state when gathered; they may be encoded in detailed logs or sheets, and in many circumstances a free-form document might include data from several sources. To process the data, they must be transformed into a format that data mining algorithms can work with, such as multidimensional, time series, or semi-structured. The most popular design is multidimensional, where distinct data fields correspond to different measurable traits called features, attributes, or dimensions. Extracting beneficial characteristics for mining is critical.
Feature extraction is commonly conducted along with data cleaning, which estimates or corrects missing or incorrect data. In many circumstances, data is taken from numerous sources and then consolidated for processing. This approach yields an organized data collection that a computer application may utilize. After feature extraction, the data may be processed in a database. This phase begins after the collection of the data, and it consists of the following steps.
Feature extraction. This step primarily relies on the analyst's ability to abstract away the essential aspects of a particular application. For instance, in a credit-card fraud detection program, the dollar amount of a charge, the regularity with which it occurs, and the location are often excellent fraud markers. On the other hand, numerous other characteristics may be less reliable markers of fraud. As a result, extracting the appropriate characteristics is often a talent that needs a grasp of the application area in question.
Feature selection and transformation. Several machine learning techniques do not operate properly on very high-dimensional data. Additionally, many higher-dimensional attributes are noisy and may introduce mistakes into the data processing. Therefore, multiple strategies are utilized to either delete unnecessary attributes or convert the present collection of features into a new data space that is more conducive to analysis. The data cleaning procedure necessitates statistical techniques for missing data estimation, and inaccurate data entries are often deleted to obtain more precise results from the data analytics [14].
CountVectorizer. To overcome the issue of machine learning algorithms not being able to interpret text directly, numerical vectors are used to represent the text. For example, the words indicate category characteristics in the articles, and each phrase is represented by a single vector. This technique is called vectorization. CountVectorizer and TF-IDFVectorizer are two of the most often used approaches for text vectorization, and both are used to create vector representations of textual data. In contrast to CountVectorizer, TF-IDFVectorizer records the weighted frequency of each token relative to a document's total number of tokens [15]. CountVectorizer is a widely used technique for calculating class features and extracting text characteristics numerically. It evaluates just the words that occur often in the training text. Additionally, CountVectorizer turns the text into a word-frequency matrix, which is then used to determine the number of repetitions of each word via the fit_transform function. The CountVectorizer class has many arguments, and its operation may be separated into three processing steps: preprocessing, tokenization, and n-gram generation [16].
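A brief scikit-learn sketch of this vectorization step is given below; the example queries and the max_df, min_df, and stop-word settings are illustrative placeholders rather than the exact configuration used in the paper.

```python
from sklearn.feature_extraction.text import CountVectorizer

queries = [
    "select name from users where id = 1",
    "select name from users where id = 1 or 1 = 1 --",
    "admin' or 1 = 1 #",
]

# max_df drops terms that appear too frequently, min_df drops terms that
# appear too infrequently, and stop_words removes common English words.
vectorizer = CountVectorizer(max_df=0.95, min_df=1, stop_words="english")
X = vectorizer.fit_transform(queries)  # word-frequency (count) matrix

print(vectorizer.get_feature_names_out())  # tokens kept in the vocabulary
print(X.toarray())                         # one count vector per query
```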
3.3 Principal Component Analysis (PCA)
Feature extraction is the process of extracting the feature properties of the original signal. The extraction of acceptable and effective characteristics is critical for the subsequent data analysis operation. PCA is a helpful mathematical technique for reducing the number of variables in a dataset to a manageable number, while the new variables retain the majority of the old variables' information. PCA is an excellent method for reducing data dimensions and removing correlations between them [17, 18]. The goals of PCA are to:
• retrieve and extract the pertinent data in a data table;
• retain just the most essential data to reduce the dataset's overall size;
• simplify the description of the dataset to make it easier to understand;
• find out how variables and observations are related to one another [19].
The algorithm below illustrates the stages involved in implementing the PCA reduction approach.

Principal Component Analysis (PCA) [20]
Input: dataset
Step 1: Standardize the dataset by calculating the mean

\mu = \frac{\sum_{i=1}^{N} X_i}{N}    (2)

where X_i is the value of each attribute and N is the number of values, the standard deviation

\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}    (3)

and the standardized value

X_{new} = \frac{X_i - \mu}{\sigma}    (4)

Step 2: Compute the covariance matrix for the whole dataset:

\mathrm{COV}(X, Y) = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{N}    (5)

Step 3: Compute the eigenvalues E_v and eigenvectors of the covariance matrix from the characteristic equation

\det(E_v I - \mathrm{COV}(D)) = 0    (6)

where I is the identity matrix and det is the determinant of the matrix.
Step 4: Sort the eigenvalues and the eigenvectors that correspond to them.
Step 5: Select the top k eigenvalues and construct an eigenvector matrix.
Step 6: Transform the data onto the selected components: Transformed Data = Feature matrix × top k eigenvectors.
Output: new dataset after reduction.
After doing PCA, the variables in the dataset were reduced from 10,321 to 15.
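The NumPy sketch below mirrors Steps 1–6 on small random data; in the paper the same procedure reduces the vectorized dataset from 10,321 variables to 15 components (the equivalent scikit-learn call would be PCA(n_components=15)).

```python
import numpy as np

def pca_reduce(X, k):
    # Step 1: standardize each attribute (Eqs. 2-4).
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: covariance matrix of the standardized data (Eq. 5).
    cov = np.cov(X_std, rowvar=False)
    # Step 3: eigenvalues and eigenvectors of the covariance matrix (Eq. 6).
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Steps 4-5: sort by decreasing eigenvalue and keep the top-k eigenvectors.
    top_k = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    # Step 6: project the standardized data onto the selected components.
    return X_std @ top_k

# Illustrative data: 100 samples with 10 features reduced to 3 components.
X = np.random.rand(100, 10)
print(pca_reduce(X, k=3).shape)  # (100, 3)
```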
3.4 SQL-i Datasets
The study conducted in this paper needs a dataset comprising both malicious and benign SQL commands as SQL-i payloads. The dataset, comprising 33,762 cases, was separated into two groups for the learning and testing phases of the SQL-i attack classifier. The learning phase has 27,008 instances, and the dataset used for the testing phase has 6753 instances. The SQL-i and benign instances were collected from the Kaggle dataset; Table 2 shows sample instances of the SQL-i dataset.
3.5 Proposed (SQLIA) Approach Design
This section describes the design procedures for the proposed (SQLIA) strategy, which combines a dimensionality reduction approach (PCA) with a supervised machine learning algorithm (KNN) to increase model accuracy and efficiency. The proposed (SQLIA) technique also includes a preparation step that extracts features from the text using CountVectorizer. A mathematical technique, PCA, is then used to keep the features that have the most significant influence on the prediction model's accuracy. Data splitting is then performed using the holdout technique. Following the data-splitting step, the PCA results are fed into the supervised machine learning algorithm (KNN) to increase the accuracy and efficiency of the prediction model (Fig. 1).

Table 2 Sample instances of the SQL-i dataset
select * from users where id = ‘1’ or @@1 = 1 union select 1,version () – 1’,1
select * from users where id = 1 or 1#” ( union select 1,version () – 1,1
‘ select name from syscolumns where id = ( select id from sysobjects where name = tablename’) –,1
select * from users where id = 1 + $ + or 1 = 1 – 1,1
1; ( load_file ( char ( 47,101,116,99,47,112,97,115,115,119,100))),1,1,1;,1
select * from users where id = ‘1’ or ||/1 = 1 union select 1,version () – 1’,1
select * from users where id = ‘1’ or \. < \ union select 1,@@VERSION – 1’,1
? or 1 = 1 –,1
) or ( ‘a’ = ‘a,1
admin’ or 1 = 1#,1
select * from users where id = 1 or “ (]” or 1 = 1 – 1,1
or 1 = 1 –,1
Fig. 1 Proposed (SQLIA) approach design: the conditional attribute (query text) is converted to numeric vectors with CountVectorizer (using max_df, min_df, and stop-word removal), reduced with PCA, merged with the decision attribute into a new dataset, split into learning and testing phases, and used to train and evaluate the KNN prediction model
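A compact scikit-learn sketch of the flow summarized in Fig. 1 is shown below; the file name, column names, value of k, and 80/20 split ratio are assumptions (the split roughly matches the 27,008/6753 partition described in Sect. 3.4).

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

# Hypothetical CSV with a 'query' text column and a 0/1 'label' column.
data = pd.read_csv("sqli_dataset.csv")

# Conditional attribute: turn the raw query strings into count vectors.
X_counts = CountVectorizer().fit_transform(data["query"]).toarray()

# Reduce the vectorized features to 15 principal components.
X_reduced = PCA(n_components=15).fit_transform(X_counts)

# Holdout split into learning and testing phases (about 80%/20%).
X_train, X_test, y_train, y_test = train_test_split(
    X_reduced, data["label"], test_size=0.2, random_state=42)

# Train the KNN classifier and evaluate the prediction model.
model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```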
Table 3 SQL-i KNN classifier performance
Accuracy   Precision   Recall   F1-score
95.80      99.35       88.15    95.80

Confusion matrix
                  Benign (predicted)   SQL-i (predicted)
Benign (actual)   4461                 13
SQL-i (actual)    270                  2009

Table 4 SQL-i KNN with PCA classifier performance
Accuracy   Precision   Recall   F1-score
96.75      97.11       93.15    96.75

Confusion matrix
                  Benign (predicted)   SQL-i (predicted)
Benign (actual)   4411                 63
SQL-i (actual)    156                  2123
4 Result and Discussion
This section summarizes the findings of the tests undertaken to assess the effectiveness of the classifier (using 15 features). The KNN classifier was used to classify payloads as SQL injection or benign, and the technique was evaluated using a holdout procedure. Tables 3 and 4 summarize the holdout assessment; the entire training dataset was used to develop the classifier, which was then tested using the complete testing dataset. The data present a comparison between the two models: the first model achieved an accuracy of 95.80%, while the improved model achieved an accuracy of 96.75%. The confusion matrix depicts correctly classified and incorrectly classified events. Compared with the related works described in the second section of this study, the presented model showed high accuracy and a shorter implementation time due to the application of principal component analysis with the K-Nearest Neighbors algorithm, which reduced the dimensions of the dataset from 10,321 to 15 and thereby had a good effect in terms of both accuracy and speed.
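As a quick sanity check, the headline figures in Table 4 can be recomputed from its confusion matrix; the snippet below treats SQL-i as the positive class.

```python
# Confusion matrix from Table 4 (rows: actual class, columns: predicted class).
tn, fp = 4411, 63    # benign predicted as benign / as SQL-i
fn, tp = 156, 2123   # SQL-i predicted as benign / as SQL-i

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
# accuracy ~ 0.9676, precision ~ 0.971, recall ~ 0.932, matching Table 4;
# the reported F1 of 96.75 appears to be a weighted average over both classes
# rather than this single-class F1 (~0.951).
```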
5 Conclusion
Detecting attacks against web applications is very important to protect the back-end database from SQL injection attacks. This paper presented a hybrid technique for detecting SQL injection attacks. The proposed hybrid technique is implemented in two main stages. The first stage uses the principal component analysis method to reduce the dimensions of the dataset while maintaining the excellent performance of the proposed model. In the second stage, after the dimensions of the dataset have been minimized
in the first stage, machine learning algorithms are used to defend online applications against SQLIA, specifically the KNN algorithm, which classifies the type of input provided by application users. As a result, the proposed model produced accuracy rates higher than 96%. These results were obtained by training the classifiers with a structured dataset containing 33,761 samples of malicious and benign scripts collected from different reliable sources. In addition, the features recovered using PCA, and the resulting reduction of the dataset dimensions, are the main reason for the excellent accuracy of the approach.
References 1. Aliero MS, Qureshi KN, Pasha MF, Ghani I, Yauri RA (2020) Systematic review analysis on SQLIA detection and prevention approaches. Wireless Pers Commun 112(4):2297–2333. Springer. https://doi.org/10.1007/s11277-020-07151-2 2. Das D, Sharma U, Bhattacharyya DK (2019) Defeating SQL injection attack in authentication security: an experimental study. Int J Inf Secur 18(1):1–22. https://doi.org/10.1007/s10207017-0393-x 3. Nofal DE, Amer AA (2021) SQL injection attacks detection and prevention based on neuro— fuzzy technique. In: Machine learning and big data analytics paradigms: analysis, applications and challenges, pp 93–112. Springer 4. Kasim Ö (2021) An ensemble classification-based approach to detect attack level of SQL injections. J Inf Secur Appl 59:102852. https://doi.org/10.1016/j.jisa.2021.102852 5. Gandhi N, Patel J, Sisodiya R, Doshi N, Mishra S (2021) A CNN-BiLSTM based approach for detection of SQL injection attacks. In: 2021 International conference on computational intelligence and knowledge economy (ICCIKE), pp 378–383 6. Thombare BM, Soni DR (2022) Prevention of SQL injection attack by using black box testing. In: 23rd International conference on distributed computing and networking, pp 266–272 7. Jothi KR, Pandey N, Beriwal P, Amarajan A et al. (2021) An efficient SQL injection detection system using deep learning. In: 2021 International conference on computational intelligence and knowledge economy (ICCIKE), pp 442–445 8. Arock M et al. (2021) Efficient detection Of SQL injection attack (SQLIA) Using pattern-based neural network model. In: 2021 International conference on computing, communication, and intelligent systems (ICCCIS), pp 343–347 9. Wen P, He C, Xiong W, Liu J (2021) SQL injection detection technology based on BiLSTMattention. In: 2021 4th International conference on robotics, control and automation engineering (RCAE), pp 165–170 10. Bhardwaj S, Dave M (2021) SQL injection attack detection, evidence collection, and notifying system using standard intrusion detection system in network forensics. In: Proceedings of international conference on computational intelligence, data science and cloud computing, pp 681–692 11. Seidl T (2009) Nearest neighbor classification. In: Encyclopedia of database systems, vol 1, pp 1885–1890. https://doi.org/10.1007/978-0-387-39940-9_561 12. Taunk K (2019) A brief review of nearest neighbor algorithm for learning and classification. In: 2019 International conference on intelligent computing and control systems, ICCS 2019, pp 1255–1260 13. Alasadi SA, Bhaya WS (2017) Review of data preprocessing techniques in data mining. J Eng Appl Sci 12(16):4102–4107 14. Aggarwal CC et al. (2015) Data mining: the textbook, vol 1. Springer 15. El Rifai H, Al Qadi L, Elnagar A (2022) Arabic text classification: the need for multi-labeling systems. Neural Comput Appl 34(2):1135–1159. https://doi.org/10.1007/s00521-021-06390-z
16. Yang JS, Zhao CY, Yu HT, Chen HY (2020) Use GBDT to predict the stock market. Procedia Comput Sci 174(2019):161–171. https://doi.org/10.1016/j.procs.2020.06.071 17. Elameer AS (2017) Feature extraction techniques on facial images: an overview. Int J Sci Res 6(9):2015–2018. https://doi.org/10.21275/ART20176682 18. Chen H et al. (2020) Multi-fault condition monitoring of slurry pump with principle component analysis and sequential hypothesis test. Int J Pattern Recogn Artif Intell 34(7). https://doi.org/ 10.1142/S0218001420590193 19. Abdi H, Williams LJ (2010) Principal component analysis. Wiley Interdisc Rev Comput Stat 2(4):433–459 20. Swain D, Laishram M, Taraphder S (2017) Multivariate statistical data analysis- principal component analysis PCA. https://doi.org/10.5455/ijlr.20170415115235
Certain Investigations on Ensemble Learning and Machine Learning Techniques with IoT in Secured Cloud Service Provisioning

S. Sivakamasundari and K. Dharmarajan
Abstract Malicious attacks are common among the Internet of Things (IoT) devices that are installed in several locations like offices, homes, healthcare facilities, and transportation. Due to massive amounts of data created by IoT devices, machine learning is frequently used to detect cyber-attacks on these devices. The fact that fog devices may not have the computing or memory capability to identify threats in a timely manner is a source of concern. According to this article, machine learning model selection and real-time prediction can both be offloaded from the cloud, and both jobs can be performed by fog nodes, which are distributed computing devices. A cloud-based ensemble machine learning model is constructed using the provided method, and subsequently, attacks on fog nodes are identified in real time using the model. The NSL-KDD dataset is used to evaluate the performance of this approach. The results indicate that the proposed technique is effective in terms of a variety of performance criteria, including execution time, precision, recall, accuracy, and the F1 measure. Keywords Ensemble learning · Machine learning · IoT · Cloud service
1 Introduction
The Internet of Things (IoT) is a broad term that refers to numerous services that connect various types of devices in organizations or with individuals in previously unimaginable ways. The IoT is a huge network of interconnected devices, each having a distinct function, design, and owner. Smart
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_53
643
644
S. Sivakamasundari and K. Dharmarajan
devices include things that are all part of various application systems [1]. IoT is powered by embedded systems that are found across the network. These systems use a combination of hardware and software to perform specialized functions as part of a more comprehensive system [2]. Meanwhile, cyber-attacks are growing more sophisticated and ubiquitous even as the IoT continues to rise [3]. It has, as a result, evolved into a top priority and a major effort [4] to secure devices in such a diversified environment. On the other hand, traditional security solutions are too taxing on IoT devices because of their low CPU capacity [5]. In FFRFDTC, final strong classifier result is obtained by training each base classifier (i.e., FFDT) on a training set sampled with replacement from the original training set. The FFRFDTC aggregates the individual FFDT prediction results to combine into a final prediction based on a majority vote and thereby design strong classifier for user authentication in cloud [6]. An Internet service provider in the United States was targeted by a DDoS attack in which a large number of IoT devices were used to launch the attack [7]. Researchers are growing increasingly interested in intrusion detection systems (IDS) as a means of protecting IoT devices from externally launched attacks [8]. It is now possible to detect and report network anomalies created by attacks with control nodes in the relevant network, which can subsequently take steps to restrict the flow of traffic [9]. Functional metrics, which demonstrated their ability to reliably predict outcomes, were the primary focus of these techniques, which overlooked operational considerations. ML is defined by the requirement for computer processing power at every stage of the process. It is necessary to clean and scale the data samples that have been obtained. For the purpose of developing a model, it is important to categorize, fit, and evaluate the attributes. In order to avoid harmful packets slipping through the gaps, all of these tasks must be executed in the proper sequence [10]. • The machine learning accuracy in identifying the abnormalities of traffic and attack types has been improved, but researchers have paid little attention to whether or not those models will be put into practice. • Our solution for managing ML-IDS allows for the ML model adaptation in the operational contexts arising from the IoTs, thus bridging the functional and operational feasibility gap in the IoT. We are attempting to achieve a balance between machine learning and operational measures. In order to cope with a large number of IoT devices on the market, we have integrated cloud functionality into our framework. • The framework with two layers namely the cloud and IoT layer, both of which communicate with one another. The service might adapt the models that were developed to best fit the operating resources of each device by including training aspects into the device models that were already in place. At this tier, IoT devices are only capable of performing activities including attribute extraction from the network traffic and anticipating anomalies using intrusion detection systems (IDS). • A manageable group of IoT devices with similar roles and resources is created by the framework in order to lessen the strain on cloud services. This information
Certain Investigations on Ensemble Learning and Machine Learning …
645
is used to create batches of samples, which are then disseminated to devices that can perform incremental training on anomalies that are found by the devices themselves.
2 Related works The accuracy of anomaly intrusion detection systems must be evaluated and confirmed in order to assure reliable performance [11]. In the course of the research, the following features are included: network configuration, traffic labeling, capture, interactivity, attack diversity, available protocols, heterogeneity, anonymity, metadata, and feature set [12]. To achieve a high detection rate over entire attacks that are described within the dataset, machine learning approaches can benefit from a set of principles outlined. Machine learning algorithms do not allow for the use of null values. It is critical that the dataset includes the appropriate kinds of feature values for machine learning algorithms to function properly. In the dataset, there were feature-to-feature connections discovered. When correlations are high, it is likely that features will be duplicated, finding the most important characteristics and values to use in identifying different types of attacks. Certain features do not impact the detection of investigated attacks. The process of identifying the most effective approach for balancing the dataset and assessing how well balanced the dataset is if the dataset is uneven, the machine learning system will favor attacks with a larger sample size since they have a higher chance of success. Finding the most appropriate normalization function for your dataset is important because not all normalization approaches are appropriate for all datasets. IoT collects the information from an entity and sends the information to the cloud server. In the generation phase, the FSCHSTIMRSVRDA technique uses the Fast Syndrome-Cryptographic Hash function to generate the hash value for the cloud user data and send it to the cloud server for dynamic storage [13]. It is possible that there will be difficulties in establishing a training set. Because of the large number of web apps, endpoint devices and cookies should be considered as part of the overall protection plan [14]. In order to perform high-accuracy classification, the quality of a dataset must be increased by overcoming certain difficulties. The granularity of attribute extraction in a dataset can be divided into two categories: packet- and flow-based [15, 16]. Several datasets including UNSW-NB15 and the CICDDoS2019 dataset are designed to detect packet-based DoS attacks. There are labeled datasets that allow for both traffic granularities which are all available online [17]. It is possible that IoT devices are vulnerable to a wide range of attacks because of their inadequate computing capabilities, insufficient encryption [18], inefficient authentication, insecure web services and authorization methods, and heterogeneous nature. Implementing security solutions consistently across IoT devices is, as a result, difficult [19]. In the following sections, the study goes through some of the most
646
S. Sivakamasundari and K. Dharmarajan
recent research on IoT security. Performance indicators for both the IDS functionality as well as the underlying operating platform are highlighted in each reference, with the most essential features of each indicator being highlighted in each reference [20]. The cloud server classifies the collected information by designing Adaptive Discriminant Quadratic Boosting Ensemble Classifier (ADQBEC). After that, the classified data gets stored in the Radix Hash Tree-Based Secured Cloud Data Storage (RHT-SCDS) for easy data access. Bakhsh et al. [21] proposed an adaptive IDS onto IoT devices that make use of agent technology in order to offer portability, rigidity, and self-starting features in addition to other features. It was placed in the middle of the network with the intention of serving all the IoT devices. Soe et al. [22] tested the capabilities of the Raspberry Pi 3 Model B to design and execute an IDS with over 175,000 instances and 49 characteristics using the UNSW-NB15 dataset. Correlation-based Feature Selection (CFS) technique picks only seven features for each of the nine attack categories in order to train the J48. When compared to non-CFS models, CFS models significantly reduced training and testing times by 100%, whereas the metrics were reduced by 10% (0.8). CFS was used to speed up training and testing for the Naive Bayes Classifier (NB) algorithm by 300%, whereas ML metrics fluctuated and were reduced (0.1–0.8) for different attack types. For resource-limited IoT networks, Thamilarasu et al. [23] developed a Deep Learning model. The model is composed of 34 layered nodes and three hidden layers. The Raspberry Pi 1 Model B was used to train and test the model, which was trained and tested using 40,000 training and 20,000 test examples of a locally constructed dataset of six packet-based attributes. When measured using machine learning techniques, the model had a success rate of more than 90%. This study did not examine the testbed’s ability to forecast real network traffic; nonetheless, the results revealed that it was capable of detecting a number of specific attack scenarios such as a black hole, opportunistic service, and DDoS attacks, as well as wormhole and sinkhole attacks. A study conducted [24] looked at seven machine learning classification methods and evaluated how well they performed. Precision, specificity, and accuracy were some of the metrics used. For the sake of gathering our statistics, we simply looked at how long different classifiers took on average to categorize a single instance. This was a hybrid system that included both host-based and network-based capabilities for identifying misuse and anomalies in order to detect them. When an attack or intrusion occurs, the system analyzer employs detection criteria to halt all communications between the computer and the network. Security plays an essential role during the service provisioning process in cloud server. Secured cloud service provisioning is the allocation of the resources and services to the customer in secured manner. Many researchers introduced service provisioning techniques with better security in cloud environment. Anthi et al. [25] developed network-based IDS for IoT using anomaly and signature-based detection. DoS attacks were tested using SYN and UDP Flood Attacks, which were utilized in conjunction with one another. IDS, which was
Certain Investigations on Ensemble Learning and Machine Learning …
647
installed on a MacBook and used to conduct the testbed, kept an eye on the network for any anomalies. The use of several machine learning for network-based intrusion detection systems using the graphical processing unit (GPU) of a high-end desktop computer. The study’s main purpose was to find a way to boost the speed of MLP processing by adjusting the batch size. It was found that the computing technique was 1.4–2.6 times faster. Doshi et al. [26] examined machine learning using the integrated Python library for a deep neural network, and other libraries for each method. In the research, packet and flow-based instances from constructed datasets were explored, and all algorithms, with the exception of the LSVM, achieved ML metrics of 0.99% for all algorithms, with the exception of that algorithm (less recall score). According to the findings, in the context of specified IoT network behaviors, such as limited edge nodes [27] and specific intervals between packets between them, the researchers improved their capacity to detect DDoS in IoT network data by picking features based on these assumptions [28]. Yassin et al. [29] proposed a cloud-based intrusion detection service based on signatures, which would employ signature-based anomaly detection to detect attacks. Thus, captured traffic instances from the client’s own cloud are received by these services. They are then analyzed in the SaaS cloud, and the results are returned to the customer. They found that MapReduce apps are effective for doing several ML operations. Proposed Method The proposed architecture (as depicted in Fig. 1) is divided into two major layers: the cloud service layer, which is in charge of managing machine learning-based intrusion detection models, and IoT device layer. Figure 1 illustrates SOA architecture, and the details of the architectural levels and their relationships are provided in the following section. Cloud Service Layer It is necessary to employ Software as a Service (SaaS) in the cloud to develop and train models for each individual collection of devices, and this is done in the cloud. Because of the varied designs and roles played by the millions IoT devices, an IoT service will necessitate the use of huge cloud computing resources. Cloud computing can be done by a single cloud node or by a group of cloud nodes. This is determined according to the amount of node burden and the number of devices in each given device set. By distributing device sets among a large number of nodes, it is possible to achieve a balanced distribution of burdens. The components present in the cloud nodes are illustrated in Fig. 2. • Device Benchmark: It is required to understand the resources available on the edge device as well as the average network traffic in order to execute the IDS model. Attribute Weighing makes use of this component to install the benchmark
648
Fig. 1 Cloud-based adaptive IDS
Fig. 2 Cloud functional components
agent on a particular device set and collect the data gathered by the agent during the installation process.
• Attribute Weighing: It is used to assign an operational weight to each significant feature that is part of the model-building process (based on the storage size of each feature). Each of these weights represents the amount of work that goes into extracting the feature on the edge devices, and they should have an effect on the modeling process. The Model Calibration component makes use of the weighted features that were specified when performing calibration.
• Model Calibration: Using the operational weights of significant features, it adjusts model parameters and excludes features that overload devices. Consider the following scenario: if all 15 characteristics were assigned equal operational weights and the result would be a 50% overload, only 10 features would be picked for the model.
• Batch Creator: This component constructs batches of instances that can be used for incremental training purposes. Batches have a predefined size, and this size must be met before the batch can be applied in the Train Agent.
• Batch Cleansing: This component adapts the created batch sizes to the characteristics of each device. Replication is prevented by removing instances that have been reported by a device from the relevant set batch.
• Distribution Agent: This component is responsible for distributing new models to the devices that are part of a specific device set.
• Train Agent: When new models are created, they are trained with the appropriately assembled and updated data; otherwise, old models are incrementally trained with new batch data.
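To make the calibration rule concrete, the sketch below (function name, importance scores, and greedy policy are all assumptions, not code from the framework) keeps the most important features whose combined operational weight still fits the device's capacity budget, reproducing the 15-to-10 feature example above.

```python
def calibrate_features(features, weights, importance, capacity):
    """Greedily keep the most important features whose combined operational
    weight stays within the device's capacity budget (illustrative policy)."""
    kept, load = [], 0.0
    for name in sorted(features, key=lambda f: importance[f], reverse=True):
        if load + weights[name] <= capacity:
            kept.append(name)
            load += weights[name]
    return kept

# 15 equally weighted features against a budget covering only two thirds of
# the total load (a 50% overload) leaves 10 features in the calibrated model.
feats = [f"f{i}" for i in range(15)]
weights = {f: 1.0 for f in feats}
importance = {f: 15 - i for i, f in enumerate(feats)}
print(len(calibrate_features(feats, weights, importance, capacity=10.0)))  # 10
```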
IoT Device Layer According to their functions and operating systems, this layer divides IoT devices into groups or sets. This group includes the cloud service that we will be addressing in this article. An edge device would be able to foresee irregularities and attacks based on cloud service metrics, and it would also be able to report them back to the cloud service. The IDS that tests the IoT devices has five key functional components that must be addressed (see Fig. 3). It is possible to use either a packet-based or flow-based intrusion detection system. Model prediction is used to determine whether or not a packet is abnormal in network traffic, which is done on a packet-by-packet basis. Both kinds of operations require the use of a training dataset to be successful. The functional components of the system are substantially same in all modes of operation. The functions of each component are discussed in greater detail in the sections that follow: • Monitor Agent: When packets arrive at their destination, the Monitor Agent captures them and sends them to the Attribute Builder for further analysis, either individually or as part of a flow set.
Fig. 3 Proposed IDS structure for IoT devices
• Attribute Builder: This component is in charge of extracting the attributes from individual packets and generating what we refer to as instance data, which is analyzed by the machine learning model.
• Checker Agent: It collects data from each instance and analyzes it to determine whether or not there is an anomaly. The alerting instance is communicated to the cloud node, which then distributes the alarm to the other devices in the network.
• Train Agent: The model is progressively trained by a special agent known as the Train Agent, which uses locally observed anomalies to build batches of data that are applied to the model when the device is idle. Along with instances supplied by the device, this agent receives the tagged instances, which are used for model training.
• Block Agent: It is the Block Agent's responsibility to prevent traffic from suspected sources from being accepted. Such traffic can be detected and reported to either the device
Fig. 4 Ensemble classification model
or a firewall, with the proper action taken. Figure 3 depicts the architecture of the planned IDS and defines the major responsibilities of each of its constituent parts.
Ensemble ML as a Model
When combined with fundamental machine learning classifiers, ensemble approaches produce the best potential results in terms of precision, accuracy, and execution time. We also make use of the most basic machine learning classifiers because they are the easiest to implement. In this step, the data, base, meta, and selection layers are depicted in Fig. 4. The base classifiers (e.g., B2) and KNN (B3) are utilized in the base layer, and alternative combinations of the base classifiers are applied. Results from the diverse combinations are aggregated in the meta-layer using ensemble methods. The accuracy, precision, recall, receiver operating characteristic (ROC), and execution time of each ensemble technique are all examined. The optimal model is determined by the combination of base classifiers and ensemble approach that yields the best outcomes.
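A minimal scikit-learn sketch of this base-plus-meta arrangement follows; the choice of decision tree, naive Bayes, and KNN as base classifiers and of hard majority voting as the meta-combiner is an assumption based on the classifiers named in this paper, and the synthetic data merely stands in for the NSL-KDD features.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in data for labeled network traffic records.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Base layer: individual classifiers; meta layer: majority (hard) voting.
ensemble = VotingClassifier(
    estimators=[("dt", DecisionTreeClassifier(random_state=0)),
                ("nb", GaussianNB()),
                ("knn", KNeighborsClassifier())],
    voting="hard")
ensemble.fit(X_tr, y_tr)
print("ensemble accuracy:", accuracy_score(y_te, ensemble.predict(X_te)))
```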
3 Results and Discussions
A large feature set and a moderate-sized dataset were used to fit the models, and the outcomes of this part of the investigation are highlighted here. Given the computational demands of the different machine learning models, such activities should be delegated to the cloud layer. According to the findings of this study, researchers can estimate the size of the models that are frequently downloaded from the cloud to IoT devices, and the results suggest that this arrangement is practical and that machine learning-based prediction on the devices is feasible.
A cyber-attack detection system using machine learning models is depicted in this study. Machine learning algorithms were trained to predict cyber-attack scores using data from prior cyber-attacks published on an open-source website. In order to detect an attack at its earliest possible stage, this research also examined multiple linear machine learning algorithm-based categorization models. The investigation testbed consists of a Windows host computer with an Intel Core-i7 at 2.4 GHz and 16 GB of RAM, on which a Raspberry Pi 4 is emulated as a virtual guest. A virtual guest execution cap of 20% is set in order to match the benchmarks used in this study. Allowing the VirtualBox acceleration function to run, which makes use of four host CPU cores, enabled the simulation of the Raspberry Pi 4 quad-core CPU. With 4 GB of memory, 800,000 cases with 100 features, designated as benign and attack samples, were examined. Because it is not possible to mimic GPU acceleration on the Raspberry Pi in a virtual environment, the experiment was carried out in a Python notebook. In order to conduct our study, we employed the Random Forest Classifier (RFC) to identify 15 important characteristics; a sketch of this selection step follows this paragraph. Even with a Raspberry Pi 4, running a classifier for feature categorization is practically impossible due to the limited resources available. We looked at five machine learning models (DT, NB, KNN, LR, and DNN) that have recently been referenced by a large number of studies. To save time, we pre-selected four key features when fitting the models instead of relying on the Random Forest Classifier to select 15 critical features. Training the NB model on only four features takes 1.41 s of CPU time, whereas training on 15 features requires 124 s. In two different scenarios, the DNN was configured with 15 and 4 input layers, each with a hidden layer, and each was tested. The former ran six epochs and the latter four epochs of 40 ms each, in order to avoid overfitting. According to the results, the DNN placed last in terms of model size. In the current state of network technology, it is possible to download updated models (other than the KNN) of only a few kilobytes in size from the cloud (Figs. 5, 6, 7 and 8).
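The feature-selection step mentioned above can be sketched with scikit-learn's feature importances; the synthetic data here is only a stand-in for the benchmark traffic samples.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 5000 labeled samples with 100 candidate features.
X, y = make_classification(n_samples=5000, n_features=100, random_state=1)

# Fit a Random Forest and rank the features by their importance scores.
rfc = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)
top15 = np.argsort(rfc.feature_importances_)[::-1][:15]

X_reduced = X[:, top15]        # keep only the 15 most important features
print("selected feature indices:", top15)
print("reduced shape:", X_reduced.shape)  # (5000, 15)
```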
Fig. 5 Accuracy
Fig. 6 Precision
Fig. 7 Recall
Fig. 8 F1 score
Results from the DT, NB, LR, and KNN for the four labeled features differ from those obtained from the other models. Overall, the DT outperformed the competition in all four areas, while the LR had the worst results. Because it is the only model that allows for incremental training, the Neural Network model beats the other models in terms of both operational and functional efficiency. Optimizations of this type reduce the amount of time it takes to derive attack prediction features from gathered traffic samples.
4 Conclusions and Future Work

This study describes a successful ML-based IDS for resource-constrained IoT devices that makes use of a cloud-based ML security solution. A cloud framework has been built to organize IoT devices into manageable matching sets and to offload the activities of ensemble ML to dedicated cloud computing resources. Cloud-based updates are delivered to a distributed IDS model that has been optimized for the computing capabilities and workload of each device in the group. Ideally, the only functions of the on-device IDS are anomaly detection and feature extraction from network traffic. When abnormalities are discovered, they are sent to the cloud service, which then performs further analysis and model training. Testing of the cloud service algorithm on a range of IoT devices to ensure that it performs as intended is now in progress. The proposed ensemble model is tested on the NSL-KDD dataset, and the simulation results show that the proposed method achieves a higher rate of accuracy, precision, recall, and F1 measure. In future work, we will validate the proposed framework with real-time analysis.
CAPTCHA-Based Image Steganography to Achieve User Authentication
Jayeeta Majumder and Chittaranjan Pradhan
Abstract In recent years, steganography has become a major and promising data hiding technique. Data can be hidden by different techniques; similarly, attackers develop many tools to break these techniques and retrieve the data. In recent trends, CAPTCHA is used to ensure that the receiver is a human being and not a robot. To authenticate the user and to increase confidentiality, a randomized CAPTCHA code is created. Steganography is the method used to hide secret data from unauthorized access; it tends to mute the visibility of the secret data inside a cover file of some multimedia type. To pass a secret message over a medium in modern days, steganography techniques are used to embed the data into an original cover image, and an encryption and decryption mechanism is used to achieve secret and reliable data communication. The predefined procedure cannot be detected by the system. After establishing a secure connection between an authenticated sender and receiver, the original secret message is transmitted using an image steganography algorithm. Our proposed algorithm gives a better PSNR (~1.04%) compared with the existing method and also minimizes the MSE. Keywords CAPTCHA · Encryption · Decryption · Image steganography · PVD scheme
1 Introduction

1.1 CAPTCHA Codes

Completely Automated Public Turing Test to tell Computers and Humans Apart (CAPTCHA) has been invented to protect information from malicious intent. To determine the difference between a computer program and a human user, a protocol known as a human interaction proof (HIP) is used. A human user can easily solve the CAPTCHA, but a machine or robot cannot derive it, so a non-authenticated person cannot use the system. CAPTCHAs are found in many forms, such as numbers, images, and alphabets. To access the system, the human being retypes the code. In this paper, we use a text-based CAPTCHA, which is a combination of capital, small, numeric, and special characters.

J. Majumder (B) · C. Pradhan, KIIT University, Bhubaneswar, Odisha, India; e-mail: [email protected]

Fig. 1 Pixel value differencing process data hiding
1.2 Pixel Value Differencing (PVD)

In 2003, a new data hiding method built on the pixel value differencing (PVD) approach was proposed. The cover image is taken in grayscale and is separated into non-overlapping blocks of two consecutive pixels, defined as p_i and p_{i+1}. From each pixel block, we can measure the difference value d_i = |p_i − p_{i+1}|, where d_i varies from 0 to 255. If the d_i value is small enough, then the image block is positioned inside a smooth region and will carry a minimum amount of secret data (Fig. 1).
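To make the difference step concrete, the short sketch below computes d_i for consecutive pixel pairs and the number of secret bits each pair could carry. The range table is the standard Wu-Tsai one and is an assumption on our part, since the text above only fixes the difference computation.

```python
# Minimal sketch of the pixel-pair difference step in PVD, assuming the
# standard Wu-Tsai range table (the paper only defines d_i = |p_i - p_{i+1}|).
import numpy as np

RANGES = [(0, 7), (8, 15), (16, 31), (32, 63), (64, 127), (128, 255)]

def embedding_capacity(d):
    """Bits a pixel pair with difference d can hold: log2 of its range width."""
    for lo, hi in RANGES:
        if lo <= d <= hi:
            return int(np.log2(hi - lo + 1))
    raise ValueError("difference out of range")

def pair_differences(gray):
    """Differences of consecutive, non-overlapping pixel pairs of a grayscale image."""
    flat = gray.astype(int).ravel()
    flat = flat[: len(flat) // 2 * 2].reshape(-1, 2)   # drop a trailing odd pixel
    return np.abs(flat[:, 0] - flat[:, 1])

# Example: a smooth pair (small d) holds fewer bits than an edge pair (large d).
img = np.array([[120, 123, 60, 200]], dtype=np.uint8)
for d in pair_differences(img):
    print(d, embedding_capacity(d), "bits")
```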
1.3 Objective of the Work

The main objective of this work is to authenticate the user, i.e., the sender and the receiver, and then to apply the steganography technique. The work is therefore focused first on user authentication and then on message privacy; the PVD scheme is applied after a satisfactory authentication outcome is obtained.
2 Related Work

In modern days, steganography is one of the most significant data hiding techniques. When any steganographic scheme is proposed, its hiding capacity and undetectability properties have to be considered. One of the most common steganographic schemes is the LSB substitution method. Data hiding combined with data security is very useful. To differentiate between humans and computers, automated random challenges in the form of CAPTCHA codes are used in modern days [1, 2]. The main purpose of using a CAPTCHA code is to block the entry of vulnerable and malicious software [3, 4]. A text CAPTCHA should be computationally expensive and hard for character segmentation; non-character object recognition is less developed compared with character object recognition. The pixel value differencing method was first introduced by Wu and Tsai, where the edge regions of the image are used to hide more data in comparison with smoother regions. A change in an edge region cannot be detected by the human eye, and data hiding in the edge regions of an image is the main working principle of the PVD technique. Initially, a 2 × 2 image block was used for the PVD technique, and later the block size was expanded to increase the hiding capacity [5]; in a 2 × 2 image block, a total of three edges are considered diagonally. Balasubramanian et al. [6, 7] describe PVD with 3 × 3 pixel blocks. Khodaei and Faez [8] proposed LSB substitution and PVD in a 1 × 3 pixel block, which was extended to 2 × 2 and 2 × 3 blocks to achieve a higher PSNR. Kalaichelvi et al. [9–11] also elaborate a new steganography method with CAPTCHA with respect to human interaction.
3 Proposed Technique

Both LSB and PVD are existing algorithms for steganography, and with both techniques it is easier for attackers to extract the secret data, which makes them vulnerable. So, a new method is introduced in which a CAPTCHA code is implemented. The randomized CAPTCHA code and the embedded image are sent to the receiver end by using the LSB and PVD techniques to increase privacy. In Fig. 2, the working flowchart of the proposed technique is given; both the sender side and the receiver side are shown. The flowchart is in two parts: the first part is the user authentication, and the second part is the image steganography. In the first phase, a random CAPTCHA is generated and then encrypted, and this CAPTCHA is hidden inside a cover image.
Fig. 2 Working flowchart of the proposed technique
4 Implementation

4.1 Rule for Creation of CAPTCHA

i. The CAPTCHA should be a minimum of 8 characters and a maximum of 16 characters, so choose a random number in the range of 8 to 16.
ii. Initialize an empty string st = "".
iii. Produce a random number between 97 and 122 (the ASCII codes of a-z), i.e., the character stream must be built with at least one lower case alphabet.
iv. Then, calculate st = st + new character.
v. Produce a random number between 65 and 90 (the ASCII codes of A-Z), i.e., the character stream must be built with at least one upper case alphabet.
vi. Then, calculate st = st + new character.
vii. Produce a random number between 48 and 57 (the ASCII codes of 0-9), i.e., the character stream must be built with at least one numeric value.
viii. Then, calculate st = st + new character.
ix. Produce a random number between 64 and 91, or 33 and 47, or 58 and 96, or 96 and 123, or 126 (the ASCII codes of special characters), i.e., the character stream must be built with at least one special character.
x. Then, calculate st = st + new character.
xi. Repeat step iii to step x to generate the characters and keep merging them to create the code.
xii. The sequence of code must terminate with a dot (.), i.e., at the end of the character stream.

A sketch of this generation procedure is given below.
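The following Python sketch follows steps i-xii. Because the special-character ranges listed in step ix overlap with the letter ranges, the sketch approximates them with the standard ASCII punctuation blocks, which is an assumption on our part rather than the authors' exact rule.

```python
# Sketch of the CAPTCHA-generation rules above; special-character ranges are
# approximated with the standard ASCII punctuation blocks (an assumption).
import random

SPECIALS = (list(range(33, 48)) + list(range(58, 65)) +
            list(range(91, 97)) + list(range(123, 127)))

def random_captcha():
    length = random.randint(8, 16)                  # step i
    st = ""                                         # step ii
    while len(st) < length:
        st += chr(random.randint(97, 122))          # steps iii-iv: lowercase a-z
        st += chr(random.randint(65, 90))           # steps v-vi: uppercase A-Z
        st += chr(random.randint(48, 57))           # steps vii-viii: digit 0-9
        st += chr(random.choice(SPECIALS))          # steps ix-x: special character
    return st[:length] + "."                        # step xii: terminate with a dot

print(random_captcha())
```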
4.2 Algorithm for Encryption

i. The random CAPTCHA is terminated with a dot (.).
ii. For each character of the CAPTCHA, calculate the corresponding ASCII code.
iii. Convert it to its equivalent binary stream.
iv. Measure the number of '1's and the number of '0's in the binary value: if the number of '1's is odd, set the flag value to '1'; otherwise, if the number of '0's is odd, set the flag value to '0'; if both counts are even, set the flag value to '0'.
v. Count the length of the binary bit stream.
vi. If the number of '0's is equal to the number of '1's and both are equal to zero, then ignore the bit positions.
vii. If the flag value is equal to zero, then point out the bit locations of '0' in the binary bit stream; else point out the bit locations of '1'.
4.3 Decryption Algorithm

i. Determine the flag value from the sequence.
ii. Depending on the flag value, determine the number of '0's and the number of '1's.
iii. From the next digit of the sequence, determine the length of the binary stream.
iv. If the flag value is '1', set the indicated bit positions to '1' in the bit stream.
v. Convert the binary stream to its equivalent ASCII value.
vi. Convert the ASCII value to its equivalent character.
4.4 CAPTCHA Implementation and Encryption/Decryption Implementation

Let us consider the generated CAPTCHA to be h^?U6_>*. It is an 8-character random stream terminated with a dot (.).
Encryption Phase
i. Choose the first character from the CAPTCHA, i.e., 'h'.
ii. The equivalent ASCII code of 'h' is 104.
iii. The binary equivalent of 104 is 1101000.
iv. Here, the number of '1's is odd, so the flag value is set to '1'.
v. The length of the bit stream is 7.
vi. A flag value of '1' means the bit positions of '1' are indicated; here, '1' occurs at positions 3, 5, and 6 counted from the right of the bit stream (starting from 0).
vii. Now, the encrypted value for the character 'h' is 17356.
viii. Similarly, convert every character of the CAPTCHA to its equivalent encrypted code. The final encrypted version of the CAPTCHA h^?U6_>* is 17356 1312346 076 07135 07036 075 1712345 17135.

Decryption Phase
The obtained encrypted CAPTCHA is 17356 1312346 076 07135 07036 075 1712345 17135. Choose any one encrypted value; suppose here we choose 17135.
i. Here, the flag value is '1'.
ii. The length of the bit stream is 7.
iii. The next three digits indicate the bit positions in the binary sequence.
iv. So, positions 1, 3, and 5 (counted from the right, starting from 0) hold '1' in the binary stream, and the rest of the positions are filled with '0'.
v. So, the binary sequence obtained is 0101010.
vi. The equivalent ASCII value is 42.
vii. The equivalent character of 42 is '*'.
So, after decrypting 17356 1312346 076 07135 07036 075 1712345 17135, we get the CAPTCHA h^?U6_>*.
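The worked example can be reproduced with the short sketch below. It assumes 7-bit ASCII and bit positions counted from the right starting at 0, which is the indexing that makes the example above consistent ('h' encodes to 17356 and 17135 decodes to '*'); it is an illustrative reading of the algorithm, not the authors' code.

```python
# Sketch of the described encoding/decoding scheme (7-bit ASCII assumed;
# bit positions are zero-indexed from the right).
def encode_char(ch):
    bits = format(ord(ch), "07b")                  # 7-bit binary stream
    ones = bits.count("1")
    flag = "1" if ones % 2 == 1 else "0"           # odd number of 1s -> flag 1
    marked = "1" if flag == "1" else "0"
    positions = [str(i) for i, b in enumerate(reversed(bits)) if b == marked]
    return flag + str(len(bits)) + "".join(positions)

def decode_token(token):
    flag, length = token[0], int(token[1])
    positions = [int(d) for d in token[2:]]
    fill, mark = ("0", "1") if flag == "1" else ("1", "0")
    bits = [fill] * length
    for p in positions:
        bits[p] = mark                             # positions counted from the right
    return chr(int("".join(reversed(bits)), 2))

print(encode_char("h"))        # -> 17356
print(decode_token("17135"))   # -> *
```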
5 Results Analysis

The most common parameters used to compare the stego image and the cover image are the peak signal-to-noise ratio (PSNR) and the mean squared error (MSE) (Figs. 3 and 4).
Fig. 3 Random CAPTCHA generation
Fig. 4 Images taken for the experiment and the corresponding stego images: Lena, Boat, Moon, Baboon, Cameraman, and Flower (original and stego versions)
The comparison of the PSNR and MSE differences between the existing method and the proposed method is displayed in Table 1. When comparing the two images with respect to human-eye visibility, greater PSNR values indicate better invisibility of the hidden data.
Table 1 Comparison of PSNR and MSE difference between existing method and proposed method

Image (size 512 × 512) | Existing PSNR (dB) | Existing MSE | Proposed PSNR (dB) | Proposed MSE
Lena | 41.24 | 4.28 | 41.28 | 3.54
Baboon | 37.54 | 4.14 | 37.86 | 3.80
Boat | 36.33 | 4.74 | 37.48 | 4.16
Cameraman | 38.45 | 4.22 | 39.24 | 4.28
Moon | 39.22 | 4.36 | 40.14 | 4.16
Flower | 41.56 | 4.46 | 43.24 | 4.08
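For reference, the two metrics in Table 1 are conventionally computed as follows for 8-bit grayscale images; this is a generic sketch based on the standard definitions, not code taken from the paper.

```python
# Standard MSE and PSNR between a cover image and its stego version.
import numpy as np

def mse(cover, stego):
    return np.mean((cover.astype(float) - stego.astype(float)) ** 2)

def psnr(cover, stego, peak=255.0):
    m = mse(cover, stego)
    return float("inf") if m == 0 else 10 * np.log10(peak ** 2 / m)

cover = np.random.randint(0, 256, (512, 512), dtype=np.uint8)
stego = cover.copy()
stego[::7, ::7] ^= 1                      # flip a few LSBs to mimic embedding
print(f"MSE = {mse(cover, stego):.4f}, PSNR = {psnr(cover, stego):.2f} dB")
```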
Standard deviation: it provides a measure of the dispersion of the image gray-level intensities. The comparison of the SD and variance differences between the existing method and the proposed method is displayed in Table 2. Table 3 shows the comparison of the NCC, AD, and MD differences between the existing and proposed methods.

Table 2 Comparison of SD and variance difference between existing method and proposed method

Image (size 512 × 512) | Existing SD | Existing Variance | Proposed SD | Proposed Variance
Lena | 2.2 | 3.84 | 2.12 | 3.6472
Baboon | 2.4 | 3.66 | 2.02 | 3.2569
Boat | 2.3 | 3.78 | 2.16 | 3.6424
Cameraman | 2.6 | 4.26 | 2.58 | 3.9872
Moon | 2.8 | 4.18 | 2.46 | 3.0648
Flower | 2.1 | 4.42 | 1.989 | 4.1436
Table 3 Comparison of NCC, AD and MD between existing method and proposed method

Image (size 512 × 512) | Existing NCC | Existing AD | Existing MD | Proposed NCC | Proposed AD | Proposed MD
Lena | 0.9986 | 0.014 | 3.14 | 0.9992 | 0.012 | 2.98
Baboon | 0.9945 | 0.016 | 3.19 | 0.9954 | 0.012 | 3.01
Boat | 0.9978 | 0.018 | 3.27 | 0.9984 | 0.011 | 3.21
Cameraman | 0.9964 | 0.021 | 3.82 | 0.9988 | 0.016 | 3.42
Moon | 0.9998 | 0.019 | 3.74 | 0.99992 | 0.018 | 3.12
Flower | 0.9974 | 0.018 | 3.46 | 0.9982 | 0.011 | 3.22
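The paper does not spell out its formulas for the measures in Tables 2 and 3, so the sketch below uses the definitions that are commonly assumed in the image-quality literature for normalized cross-correlation (NCC), average difference (AD), maximum difference (MD), and standard deviation; treat it as an assumption rather than the authors' implementation.

```python
# Commonly used definitions of NCC, AD, MD, and SD for cover/stego comparison.
import numpy as np

def ncc(cover, stego):
    c, s = cover.astype(float), stego.astype(float)
    return np.sum(c * s) / np.sum(c ** 2)

def average_difference(cover, stego):
    return np.mean(cover.astype(float) - stego.astype(float))

def maximum_difference(cover, stego):
    return np.max(np.abs(cover.astype(float) - stego.astype(float)))

def standard_deviation(img):
    return np.std(img.astype(float))

cover = np.random.randint(0, 256, (512, 512), dtype=np.uint8)
stego = cover.copy()
stego[::9, ::9] ^= 1   # toy distortion standing in for embedding
print(ncc(cover, stego), average_difference(cover, stego), maximum_difference(cover, stego))
```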
Histogram Analysis: cover image and stego image histograms for Lena, Baboon, Boat, Cameraman, Moon, and Flower (figure)
6 Conclusions

Here, we have presented a new method to generate a CAPTCHA, encrypt it, and send it to the receiver, and the user authentication is verified. A third party has no clue how to fetch the CAPTCHA code. After embedding it into the cover image, the cover image is converted into a stego image, and hackers cannot work out how to retrieve the CAPTCHA code from the stego image. The algorithm to extract the CAPTCHA code does not require any encryption or decryption key; so, without the help of any shared key, the proposed algorithm generates the encrypted CAPTCHA, embeds it into the cover image, and, after extraction, decrypts the CAPTCHA. In future, we can use a new key generation technique along with this algorithm to provide more strength in the field of image steganography.
References
1. Arya A, Soni S (2018) A literature review on various recent steganography techniques. Int J Future Revolution Comput Sci Commun Eng 4(1)
2. Swain G (2016) A steganographic method combining LSB substitution and PVD in a block. In: Proceedings of the international conference on computational modelling and security, CMS 2016. India, pp 39–44
3. Pradhan A, Sekhar KR, Swain G (2016) Digital image steganography combining LSB substitution with five way PVD in 2×3-pixel blocks. Int J Pharm Technol 8(4):22051–22061
4. Kanimozhi R, Jagadeesan D (2014) Authenticating a web page using CAPTCHA image. Int J Adv Res Comput Sci 5(7)
5. Lee Y-P, Lee J-C, Chen W-K, Chang K-C, Su I-J, Chang C-P (2012) High-payload image hiding with quality recovery using tri-way pixel-value differencing. Inf Sci 191:214–225. https://doi.org/10.1016/j.ins.2012.01.002
6. Chelliah B, Subramanian S, Subbiah G (2013) High payload image steganography with reduced distortion using octonary pixel pairing scheme. Multimedia Tools Appl 73:2223–2245. https://doi.org/10.1007/s11042-013-1640-4
7. Kamdar NP, Kamdar DG, Khandhar DN (2013) Performance evaluation of LSB based steganography for optimization of PSNR and MSE. J Inf Knowl Res Electron Commun Eng
8. Khodaei M, Faez K (2012) New adaptive steganographic method using least-significant-bit substitution and pixel-value differencing. IET Image Proc 6(6):677–686
9. Ardhita NB, Maulidevi NU (2020) Robust adversarial example as captcha generator. In: 2020 7th International conference on advance informatics: concepts, theory and applications (ICAICTA), pp 1–4. https://doi.org/10.1109/ICAICTA49861.2020.9429048
10. Kalaichelvi T, Apuroop P (2020) Image steganography method to achieve confidentiality using CAPTCHA for authentication. In: 2020 5th International conference on communication and electronics systems (ICCES), pp 495–499. https://doi.org/10.1109/ICCES48766.2020.9138073
11. Ezhilarasi S, Maheswari PU (2020) Image recognition and annotation based decision making of CAPTCHAs for human interpretation. In: 2020 International conference on innovative trends in information technology (ICITIIT), pp 1–6. https://doi.org/10.1109/ICITIIT49094.2020.9071558
A Systematic Review on Deepfake Technology
Ihtiram Raza Khan, Saman Aisha, Deepak Kumar, and Tabish Mufti
Abstract The twenty-first century has seen technology become an integral part of human survival, and living standards have been affected as technology continues to advance. Deepfake is a deep learning-powered application that recently appeared on the market: a technique that allows swapping one identity for another within a single video, producing fake images and videos that people cannot discern from real ones. A wide range of areas, including communities, organizations, security, religions, democratic processes, and personal lives, are being impacted by deepfakes. If images or videos presented to an organization as evidence are altered, the entire truth is transformed into a lie. It is inevitable that every new and beneficial technology brings some adverse effects with it, causing problems around the world, and there have been several instances in which deepfakes have alarmed the entire network. During the last few years, the number of altered images and videos has grown exponentially, posing a threat to society. This paper explores where deepfakes are used, the impacts they make, as well as the challenges and difficulties associated with deepfakes in this rapidly developing society. Keywords Deepfake · Deepfake videos · Artificial intelligence
I. R. Khan · S. Aisha (B) · T. Mufti, Department of Computer Science and Engineering, School of Engineering Sciences and Technology, Jamia Hamdard, New Delhi, India; e-mail: [email protected]
D. Kumar, Amity Institute of Information Technology, Amity University, Noida, India
1 Introduction

Individuals' lives have changed due to advancements in technology. Today's society, especially the younger generation, firmly believes in the digital world through visual media. With the help of technology, images, videos, and audio are easily altered without any noticeable error. In particular, deepfakes can be explained as videos that are edited to replace the person in the original video with someone else in a way that makes the result look genuine [1]. Deepfake is a term that combines "deep learning" and "fake"; it has become increasingly popular with the rise of artificial intelligence and is also known as synthetic media, which denotes an image, sound, or video that appears to be produced through traditional methods but is actually composed by a sophisticated tool. Deepfake was originally conceived by a Reddit user of the same name in late 2017 [2]. Researchers from the University of Washington used deepfakes to circulate fake footage of President Barack Obama on the Internet, and the president was forced to explain its purpose [3], as shown in Fig. 1. The proliferation of deepfakes has kept increasing over the past few years, even though trust in photography has been declining ever since the development of image-editing technology [4]. Although the deepfakes found today are mostly derived from this original code, each one is an entertaining thought experiment, but none of them is reliable. In this article, the researcher has focused on:

Fig. 1 Example of deepfake of Ex-President of America Barack Obama. Source YouTube

• An overview of deepfakes and how they employ artificial intelligence to replace the likeness of one person with another in video and other media.
• What they are and how they work.
• Their social implications and impact on society.
• The challenges and difficulties they face and the concerns that have been raised.
2 Background Study

Deepfakes: what are they and how do they work? Deepfake is the most widely used and most efficient face-swapping technique for altering videos. There are different methods to create deepfake software, but generally the underlying machine learning algorithms are designed to generate content based on the personal data they are given as input. For instance, if a program is asked to create a new face or replace part of a person's face, the algorithm must be trained first. In simple words, when a person appears to commit an action that they did not actually commit, the person is being imitated by a program that mimics their actions, speech, and emotions. Almost all such programs are fed enormous amounts of data, which they use to create their own new data. They are primarily autoencoders and sometimes Generative Adversarial Networks (GANs) [5].
2.1 Architecture

One of the many applications of deep learning is dimensionality reduction of high-dimensional data, and this capability has been widely applied to image compression and deep autoencoding. In deepfake generation, an image is decomposed and a new image is produced based on the characteristics of a previous image [6]. Using this technique, the model can learn the latent features based on a cycle loss function; to put it another way, the model does not require the source and target images to be related in order to learn their features (see Fig. 2).
1. An autoencoder is a neural network trained with an unsupervised learning technique to ignore signal noise and learn efficient data representations (encodings). Denoising images, compressing images, and generating images are just some of its applications [7]. An autoencoder model consists of two neural networks, an encoder and a decoder, and its processing consists of three phases: the encoder, the latent space, and the decoder. In autoencoding, the data is compressed into a representation similar to the training data but not exactly the same as the input [8]. The encoded image is passed to the latent space, which is used to learn new patterns and relations between the data points. The number of nodes per layer decreases as the layers get deeper, and there is no limit to how deep an autoencoder architecture can be [9]. Finally, the decoder reconstructs the image based on its representation in the latent space (see Fig. 3), recreating the image as closely as possible to the original [10].
Fig. 2 Architecture model of deep fake
Fig. 3 Architecture of autoencoder
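As an illustration of the encoder / latent space / decoder structure in Fig. 3, the following is a minimal convolutional autoencoder written with Keras; the layer sizes and input shape are arbitrary choices, not an architecture taken from the paper. In face-swap deepfakes, two such decoders typically share a single encoder.

```python
# Minimal convolutional autoencoder: encoder -> latent space -> decoder,
# trained to minimize the reconstruction (MSE) loss.
from tensorflow.keras import layers, models

def build_autoencoder(shape=(64, 64, 3), latent_dim=128):
    inp = layers.Input(shape)
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    latent = layers.Dense(latent_dim, activation="relu")(x)          # latent space

    x = layers.Dense(16 * 16 * 64, activation="relu")(latent)
    x = layers.Reshape((16, 16, 64))(x)
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    out = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)  # reconstruction

    auto = models.Model(inp, out)
    auto.compile(optimizer="adam", loss="mse")
    return auto

autoencoder = build_autoencoder()
autoencoder.summary()
```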
2. Generative Adversarial Networks (GANs) were introduced by Ian Goodfellow in 2014 as a relatively new module for generating fake images [11]. With GANs, the fake image creation process is automated, and the results are better than with manual methods [12]. The conventional GAN model consists of two neural networks, a generator and a discriminator (see Fig. 4). The generator produces realistic images, while the discriminator uses convolutional layers to tell fake images from real ones [13]. To train the generator and discriminator, a min–max method is used, where the minimum (0) represents a fake output and the maximum (1) represents a real output. GANs are well suited for generating new image and video data, as the discriminator output must get as close to the maximum value as possible in order for the generator to produce a realistic-looking deepfake [14].
Fig. 4 Architecture of GANs
1. The generative model captures the distribution of the training set.
2. The discriminative model estimates the probability that a sample came from the training data rather than from the generative model above.

The adversarial aspect means that the two neural networks are trained against each other. A random sample is sent to the generator network, which turns it into a synthetic sample; after that, the sample is sent to the discriminator network to determine whether it is real or not [14]. GANs are specialized tools that can be used for various tasks:

• The creation of novel data samples, such as images of imaginary people, animals, objects, etc. This method can be used to produce not only images but audio and text as well.
• Inpainting, which helps restore the missing parts of images, and upscaling, which converts low-resolution images to high resolution without introducing visible artifacts.
• Adapting data from one domain to another while retaining its original content (e.g., making a normal photo appear like an oil painting) [15].

2.1.1 Do GANs Outperform Autoencoders?
• Autoencoding minimizes the image reconstruction loss, which allows the autoencoder to treat the task as a semi-supervised learning problem.
• GAN addresses the problem of supervised learning, but it can also be used for unsupervised learning. In this study, the main difference observed between the two methods was in training time.
• Although GAN training is often described as relatively fast, this is not always the case.
In summary, both methods produce sufficiently realistic images for less diverse datasets. We can get results faster with an autoencoder, but the image quality is lower than with the GAN.
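To make the generator-discriminator min-max training described above concrete, here is a minimal Keras/TensorFlow training step; the tiny fully connected networks and the 28 × 28 image shape are illustrative assumptions, not an architecture from the paper.

```python
# Minimal GAN training step: discriminator learns real (1) vs fake (0),
# generator learns to make the discriminator output "real" for its samples.
import tensorflow as tf
from tensorflow.keras import layers, models

latent_dim = 100
generator = models.Sequential([
    layers.Input((latent_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(28 * 28, activation="sigmoid"),
    layers.Reshape((28, 28, 1)),
])
discriminator = models.Sequential([
    layers.Input((28, 28, 1)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),        # 1 = real, 0 = fake
])
bce = tf.keras.losses.BinaryCrossentropy()
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_images):
    noise = tf.random.normal([tf.shape(real_images)[0], latent_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_out = discriminator(real_images, training=True)
        fake_out = discriminator(fake_images, training=True)
        d_loss = bce(tf.ones_like(real_out), real_out) + bce(tf.zeros_like(fake_out), fake_out)
        g_loss = bce(tf.ones_like(fake_out), fake_out)   # generator tries to look "real"
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return d_loss, g_loss
```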
3 Related Works

1. In paper [16], the researchers propose a method to detect deepfake images and videos and explain the advantages of GANs over autoencoders in deepfake detection. In reviewing several papers, the authors found that the SSTNet model achieved the highest accuracy level of approximately 90-95%.
2. For the purpose of detecting fake images, the author of [17] used 600,000 real images and 10,000 fake images during training. The proposed recurrent convolutional network (RCN) model, which covers 1,000 videos and includes GAN-generated content, reports results on the FaceForensics++ dataset. The results indicate that the proposed method can detect fake videos with promising performance, which can be enhanced by incorporating dynamic blink patterns.
3. The paper [18] discusses how deepfake videos can be detected and how such methods are used in digital media forensics. The methods in the first category are based on the physical or physiological signals exhibited in deepfake videos. It also explains the limits of deepfakes and forecasts that future breakthroughs will be even more realistic and effective.
4. The author of [19] presented a deep learning-based method for distinguishing AI-generated fake videos from real videos, based on an immense number of real and deepfake images. CNN classifiers were trained on a profusion of real and deepfake depictions. The work also shows, using two sets of deepfake video datasets, that deepfake videos can be effectively captured by deep neural networks.

In Table 1, additional related works are listed.
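A recurring building block in these works is a frame-level CNN classifier that labels face crops as real or fake; the sketch below shows what such a baseline might look like in Keras. The architecture, image size, and directory layout are placeholders and are not taken from any of the cited papers.

```python
# Baseline frame-level real/fake classifier for face crops (illustrative only).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_frame_classifier(shape=(128, 128, 3)):
    return models.Sequential([
        layers.Input(shape),
        layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"), layers.GlobalAveragePooling2D(),
        layers.Dense(1, activation="sigmoid"),    # probability that the frame is fake
    ])

model = build_frame_classifier()
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
# Hypothetical directory with "real/" and "fake/" subfolders of face crops:
# train_ds = tf.keras.utils.image_dataset_from_directory("frames/", image_size=(128, 128),
#                                                        label_mode="binary")
# model.fit(train_ds, epochs=5)
```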
4 Social Implications

With the evolution of technology, and as societies deal with deepfakes, most of the use cases will revolve around empowering communities and institutions.
4.1 Accessibility

It is possible to build tools that can see, hear, and, soon, reason at a higher level of accuracy thanks to artificial intelligence. AI-generated synthetic media can also augment the power of human agency [36]. In addition, AI tools can help make solutions more accessible by making them smarter, more affordable, and more customizable.
Table 1 Literature survey

S. No. | Year of publication | Author's name | No. of citations | Key findings
1 | 2020 | Somers [20] | 11 | Using deep learning and error-level analysis (ELA) to identify counterfeit features and developed to separate deepfake-generated images from real ones
2 | 2019 | Mach [21] | 32 | The article shuns the inescapability arguments in support of a critical approach to deepfakes to visual information literacy and that womanism approaches to artificial intelligence
3 | 2019 | Albahar and Almalki [22] | 45 | The current paper discusses some deep problems and how they may also be neutralized by addressing them with the help of deep fakes. By doing so, innovative solutions to other sticky social and political problems can be unlocked
4 | 2019 | Kietzmann et al. [23] | 249 | Analyzing, based on 84 online news articles, this work inspects what deepfakes are, who makes them, where they come from, what their advantages and disadvantages are, what examples exist, and how to combat them
5 | 2021 | Li and Lyu [24] | 9 | Deep fakes raise ethical concerns about blackmail, intimidation, and sabotage that can result in ideological incitement, as well as broader implications for trust and accountability
6 | 2021 | Bian et al. [25] | 148 | Provide a detailed look at how deepfakes work and how to detect them; methods for creating and detecting deepfakes
7 | 2022 | Botha and Pieterse [26] | 0 | Examines the governing, communal, and technical issues surrounding deepfakes
8 | 2020 | Hu et al. [27] | 1 | End-to-end deepfake video detection program can train and interpret frame sequences with an LSTM structure
9 | 2022 | Sami et al. [28] | 1 | An analysis of recent deep learning-based studies for detecting fake content and exploring the various categories of fake content detection
10 | 2020 | Buzz [29] | 14 | Emphasis on generating and in-depth fake news and counterfeiting detection
11 | 2014 | Kumar et al. [30] | 26 | Compared the resulting video sequences across four different sizes and two typical compression schemes (i.e., MPEG-2 and H.264/AVC) with older ones which are effective
12 | 2019 | Katarya and Lal [31] | 468 | Convolutional neural networks (CNNs) are effectively able to recognize the original face in the video, but deepfake can only produce images of limited resolution
13 | 2019 | Negi et al. [32] | 134 | As a strategic approach to risk management in depth, R.E.A.L. emphasizes documenting original content, exposing deep-seated mistakes, and protecting the rights of the law
14 | 2020 | Jayathilaka [33] | 56 | Video synthesized with realistic samples produced in one tool can resemble real videos more realistically
15 | 2021 | Jaiman [34] | 19 | With TensorFlow or Keras, open-source trained models, affordable computing infrastructure, and the rapid evolution of deep learning (DL) methods, especially Generative Adversarial Networks (GAN)
16 | 2021 | Yang Hui [35] | 1 | A wide range of high-quality deepfake videos; our method has proven to be highly efficient in detecting in-dataset pattern matches, detecting within-dataset pattern matches, and detecting cross-dataset pattern matches
4.2 Education

Teaching with deepfakes can extend beyond visuals and media formats to create engaging lessons. In the classroom, AI-generated virtual media can inspire students by bringing historical figures to life and giving them a more interactive way to learn [37]. Participants may become more engaged, and learning may be enhanced.
4.3 Art

In addition to democratizing expensive VFX, AI-generated synthetic media have great potential for realistically bringing to life the primary tenets of comedy and parody: reflections, stretches, distortions, and diversions of real-life experience. The entertainment industry may benefit from these capabilities, and we already see many independent YouTubers taking advantage of this opportunity. Video games can benefit from AI-generated graphics and images, and audio storytelling and book commentary can be enhanced with synthetic audio [38].
4.4 Autonomy and Expression

People who support human rights or report on atrocities through social or traditional media can gain a great deal of power by using synthetic media. Deepfakes can be used to mask people's identities to protect their privacy under dictatorial and oppressive regimes [39]. Deepfakes also create avatar experiences on the web that let individuals express their views, ideas, and beliefs, and synthetic avatars give people with disabilities a way to represent themselves online.
4.5 Message and Its Reach Enhancement

Voice fonts for podcasters can speed up the text-to-speech model and reduce podcaster errors. Deepfakes not only help influencers reach and expand their audience, but also help brands reach their target markets with targeted, personalized messages. Deepfakes and AI-created digital doubles are becoming a new trend in the fashion and brand marketing industry [40]. This will engage and amplify audiences, engage fans further, and deliver customized experiences to them.
4.6 Digital Reconstruction and Public Security

The reconstruction of a crime scene is both a science and an art, requiring both inductive and deductive reasoning as well as evidence. Synthetic media derived from artificial intelligence can help reconstruct a crime scene; for example, a team has used autopsy reports and surveillance videos to create a virtual crime scene.
4.7 Innovation

Deepfake technology is also attracting attention as an opportunity for customer engagement and value creation, especially in industries where data and AI are driving digital transformation and automation. Deepfakes help fashion retail shoppers turn into fashion retail models by allowing them to virtually try on clothes and accessories [41]. AI is used to enhance and create apps that simulate the latest trends and generate deepfakes based on the customer's face and body in a virtual try-on room.
5 Deepfake Impact on Individuals, Organizations, and Governments

This section illustrates, through examples, how deepfakes are used to deceive and exploit individuals, which paints a dark picture for society. It seems that technological progress brings forth both the good and the bad in people, pushing us from side to side at the same time [42]. As a result, deepfake technology has the potential to lead to both short-term and long-term consequences, and below we outline how this unethical practice can impact our society and the people who live in it and are threatened by it. By using a proper deepfake detection algorithm, however, such videos or images can be identified, minimizing the impact on the public [43]. A fake video or image circulating on social media or other online channels can affect voting patterns during a state or central government election. The economic harm posed to individuals and companies is no less noteworthy, including false advertisements, fraud, damage to the client experience, loss of creative control, extortion, persecution, and reputational damage. As well as posing systemic risks to social and political institutions, deepfakes can manipulate civil discourse, sabotage elections, and erode trust in government in general [44].
6 Deepfake Challenges

Deepfakes are being distributed increasingly widely, but there is no standard way to evaluate deepfake detection. The number of deepfake videos and images online has nearly doubled since 2018 [45]. A deepfake can be used to hurt individuals as well as societies; special techniques are also used for faking terrorism events, blackmail, defaming individuals, and creating political distress.

1. Deepfakes and similar technologies pose grave challenges for international diplomacy. First, the risks posed by emerging technologies like deepfakes elevate the importance of diplomacy.
2. Secondly, deepfakes might be used so frequently that individuals and governments eventually grow weary of being bombarded with manipulated videos and images. If one of these supposedly fake videos is actually authentic and the authorities fail to respond quickly, it becomes a problem.
3. As a third issue, analysts who analyze data and identify trends will be affected, because they now have to put in much more time and effort just to verify that something is true, leaving them with fewer resources to actually analyze it.
6.1 Generating Deepfakes: Challenges

Several efforts have been made to increase the visual quality of deepfakes, but numerous challenges still remain. This section discusses a few of those challenges.
6.1.1
Abstraction
Often, it is impossible to convincingly generate deepfakes for a specific victim because the data used to train the model is often insufficient. Programs that create deepfakes use generic models based on the data they use for training [46]. The distribution of driving content data is readily available, but finding sufficient data for a specific victim can prove difficult, requiring a generalized model to account for multiple targets that were not seen during training.
6.1.2
Identifier Condensation
In source identity-based reconstruction tasks, preserving the target identity can be difficult when the targets are clearly out of sync, especially if the matching is based on the same identity and training is based on multiple identities [47].
6.1.3
Training Pairs
Developing a high-quality output from a trained, supervised model requires pairing of data, which is a process of identifying similar input examples to yield similar outputs.
6.1.4
Occlusion
Occlusion, occurring when facial features of the source and victim are obscured by objects such as hands, hair, glasses, or any other item, is a major challenge in deepfake generation. Furthermore, the image can be distorted because of the hidden object or as a result of the occlusion of the face and eye area.
6.1.5
Artificially Realistic Audio
Despite recent progress, the quality of synthetic audio still needs to be improved. False emotions, unnatural pauses, breathiness, and the lack of a natural-sounding voice for the target make audio deepfakes challenging to pull off [48].
6.2 Detecting Deepfakes: Challenges

Despite impressive advances in deepfake detection techniques, several challenges remain for deepfake detection approaches; they are discussed in this section.
6.2.1
Dataset Standard
A major factor in the development of deepfake detection techniques is the availability of large databases of deepfakes. In these databases, different artifacts can be observed, such as temporal flickering during speech, blurriness around facial regions, over-smoothed texture or lack of detail in the facial texture, lack of head rotation, or the absence of face-obscuring objects. Additionally, low-quality manipulated content is barely persuasive and rarely creates a real impact [48].
6.2.2
Analyzing Performance
Today's deepfake detection methods are usually framed as binary classification problems, where each sample is either real or fake. However, videos can be manipulated in ways other than deepfakes, which makes it impossible to be 100% sure that content not detected as manipulated is genuine. Also, deepfake content can be altered in several components, such as audio and video, so a single label may not always be accurate.
6.2.3
Unfairness and Distrust
Existing deepfake datasets are biased, containing imbalanced data across races and genders, and the detection techniques built on them inherit this bias. So far, very little work has been done to fill this gap. Therefore, it is urgent for researchers to develop methods that improve the data and make the detection algorithms fairer.
6.2.4
Fraudulent Use of Social Media
Social networks such as Twitter, Facebook, and Instagram are the main online platforms used to disseminate audio-visual content on a global scale. To save bandwidth or to protect users' privacy, such content is routinely re-processed, a practice known as social media laundering. This obfuscation removes clues about the underlying manipulation and eventually leads to false positive detections.
7 Solution(s) to Deepfakes?

1. Detecting deepfakes is difficult. Simple or badly made deepfakes can of course be spotted by the naked eye, and some detection tools can even spot the faulty characteristics discussed previously. However, artificial intelligence is improving continuously, and soon we will have to rely on dedicated deepfake detectors. Since deepfakes can spoof common movements like blinks and nods, we must make sure that companies and providers use facial authentication software that offers certified liveness detection. In addition, authentication processes will need to adapt to guide users through a less predictable range of live actions.
2. Active methods can be combined with passive methods for high-risk or high-value transactions, such as money transfers or changes to account information. Active methods include two-factor authentication using a one-time token or verification through an alternate channel such as SMS.
3. It may also be necessary to reconfirm account holders' identities at various points after their accounts have been established. For instance, some companies request identity verification every time a dormant account is activated for a high-value transaction, or whenever passive analytics indicate a high fraud potential.
8 Conclusion and Future Scope

• Although "sticks and stones" might seem like a harmless aphorism, lies have always been capable of causing significant harm to individuals, organizations, and society as a whole. New developments in deepfakes are improving the quality of fake videos, and people are becoming distrustful of online media content as a result. Several techniques for manipulating images and videos have been developed thanks to advances in artificial intelligence. Although several legal or technological solutions could mitigate the threat, none will remove it. In this paper, we have focused on the real challenges of generating deepfakes, on detection methods, and on possible solutions to deepfakes. Introducing robust, scalable, and generalizable methods needs to be the focus of future research.
• A possible approach may be proposed: first, we test the videos using a method that focuses primarily on the altered faces. As soon as a video is processed by this classifier, it is passed on to another classifier, which tests its audio for changes. Based on the results from both classifiers, we can determine whether the video is fake or not. Although we have examined the topic only briefly and tried to present the important points, an extensive study would provide a more comprehensive analysis. This technology has been subjected to extensive analysis, and deep insights about it will help provide more opportunities for research in this area.
References 1. Zhang W, Zhao C, Li Y (2020) A novel counterfeit feature extraction technique for exposing face-swap images based on deep learning and error level analysis. Entropy 22:249. https://doi. org/10.3390/e22020249 2. Passos LA, Jodas D, da Costa KAP, Júnior LAS, Colombo D, Papa JP (2022) A review of deep learning-based approaches for deepfake content detection. arXiv. https://doi.org/10. 48550/arXiv.2202.06095 3. Gupta B, Mittal P, Mufti T (2021) A review on Amazon web service (AWS), Microsoft azure and Google cloud platform (GCP) services. Presented at the ICIDSSD 2020. Jamia Hamdard, New Delhi. https://doi.org/10.4108/eai.27-2-2020.2303255 4. Maurya S, Mufti T, Kumar D, Mittal P, Gupta R (2020) A study on cloud computing: a review. Presented at the ICIDSSD 2020. Jamia Hamdard, New Delhi. https://doi.org/10.4108/eai.272-2020.2303253 5. Dertat A (2022) Applied deep learning—part 3: autoencoders. Medium, 08 Oct 2017, https:// towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798. Accessed 05 May 2022 6. Mufti T, Kumar D Big data: technological advancement in the field of data organization, p 4 7. Mufti T, Saleem N, Sohail S (2020) Blockchain: a detailed survey to explore innovative implementation of disruptive technology. EAI Endorsed Trans Smart Cities 4(10). Art. No. 10. https:// doi.org/10.4108/eai.13-7-2018.164858 8. Sharma J, Sharma S (2021) Challenges and solutions in deepfakes. arXiv:2109.05397. Retrieved from, http://arxiv.org/abs/2109.05397. Accessed: 02 May 2022
9. Mufti T, Gupta B, Sohail SS, Kumar D (2021) Contact tracing: a cloud based architecture for safe covid-19 mapping. In: 2021 International Conference on Computational Performance Evaluation (ComPE), pp 874–877. https://doi.org/10.1109/ComPE53109.2021.9752314 10. Dey P (2021) Deep fakes one man’s tool is another man’s weapon. Int J Sci Res Manage 05:7 11. Citron D, Chesney R (2019) Deep fakes: a looming challenge for privacy, democracy, and national security. Calif Law Rev 107(6):1753 12. Mahmud BU, Sharmin A (2021) deep insights of deepfake technology : a review, p 12 13. Guei AC, Akhloufi M (2018) Deep learning enhancement of infrared face images using generative adversarial networks. Appl Opt 57(18):D98–D107. https://doi.org/10.1364/AO.57. 000D98 14. Nguyen T, Nguyen CM, Nguyen T, Duc T, Nahavandi S (2019) Deep learning for deepfakes creation and detection: a survey 15. Lyu S (2020) DeepFake detection: current challenges and next steps,” arXiv:2003.09234. Retrieved from http://arxiv.org/abs/2003.09234. Accessed 02 May 2022 16. Vaccari C, Chadwick A (2020) Deepfakes and disinformation: exploring the impact of synthetic political video on deception, uncertainty, and trust in news. Soc Media + Soc 6:205630512090340. https://doi.org/10.1177/2056305120903408 17. Gamage D, Chen J, Ghasiya P, Sasahara K (2022) Deepfakes and society: what lies ahead? 18. Almars A (2021) Deepfakes detection techniques using deep learning: a survey. J Comput Commun 09:20–35. https://doi.org/10.4236/jcc.2021.95003 19. Masood M, Nawaz M, Malik K, Javed A, Irtaza A (2021) Deepfakes generation and detection: state-of-the-art, open challenges, countermeasures, and way forward 20. Somers M (2020) Deepfakes, explained | MIT Sloan. https://mitsloan.mit.edu/ideas-made-tomatter/deepfakes-explained. Accessed 08 May 2022 21. Mach J (2019) Deepfakes: the ugly, and the good. Medium, 02 Dec 2019. https://towardsdatas cience.com/deepfakes-the-ugly-and-the-good-49115643d8dd. Accessed 05 May 2022 22. Albahar M, Almalki J (2005) Deepfakes: threats and countermeasures systematic review, no. 22, p 9 23. Kietzmann J, Lee L, McCarthy I, Kietzmann T (2019) Deepfakes: trick or treat? Bus Horiz 63. https://doi.org/10.1016/j.bushor.2019.11.006 24. Li Y, Lyu S (2019) Exposing deepfake videos by detecting face warping artifacts. arXiv:1811.00656. Retrieved from http://arxiv.org/abs/1811.00656. Accessed 02 May 2022 25. Bian S, Luo W, Huang J (2014) Exposing fake bit rate videos and estimating original bit rates. IEEE Trans Circuits Syst Video Technol 24(12):2144–2154. https://doi.org/10.1109/TCSVT. 2014.2334031 26. Botha J, Pieterse H (2020) Fake news and deepfakes: a dangerous threat for 21st century information security 27. Hu J, Liao X, Liang J, Zhou W, Qin Z (2022) FInfer: frame inference-based deepfake detection for high-visual-quality videos, p 9 28. Sami N, Mufti T, Sohail SS, Siddiqui J, Kumar D, Neha (2020) Future internet of things (IOT) from cloud perspective: aspects, applications and challenges. In: Alam M, Shakil KA, Khan S (eds) Internet of things (IoT): concepts and applications. Springer International Publishing, Cham, pp 515–532. https://doi.org/10.1007/978-3-030-37468-6_27 29. How deepfake technology impact the people in our society? | by Buzz Blog Box | Becoming human: artificial intelligence magazine. https://becominghuman.ai/how-deepfake-technologyimpact-the-people-in-our-society-e071df4ffc5c. Accessed 08 May 2022 30. 
Kumar D, Mufti T, Shahabsaquibsohail (2021) Impact of coronavirus on global cloud based wearable tracking devices. In: 2021 9th International conference on reliability, infocom technologies and optimization (trends and future directions) (ICRITO), pp 1–5. https://doi.org/10. 1109/ICRITO51393.2021.9596163 31. Katarya R, Lal A (2020) A study on combating emerging threat of deepfake weaponization. https://www.researchgate.net/publication/344990492_A_Study_on_C ombating_Emerging_Threat_of_Deepfake_Weaponization. Accessed 02 May 2022
32. Negi S, Jayachandran M, Upadhyay S (2021) Deep fake : an understanding of fake images and videos. https://www.researchgate.net/publication/351783734_Deep_fake_An_Understan ding_of_Fake_Images_and_Videos. Accessed 02 May 2022 33. Jayathilaka C (2021) Deep fake technology raise of a technology that affects the faith among people in the society—a literature review. https://www.researchgate.net/publication/351082 194_Deep_Fake_Technology_Raise_of_a_technology_that_affects_the_faith_among_peo ple_in_the_society_-A_Literature_Review. Accessed 02 May 2022 34. Jaiman A (2022) Positive use cases of deepfakes, Medium. https://towardsdatascience.com/ positive-use-cases-of-deepfakes-49f510056387. Accessed 30 May 2022 35. Yang Hui J (2020) Preparing to counter the challenges of deepfakes in Indonesia 36. Chowdhury SMAK, Lubna JI (2020) Review on deep fake: a looming technological threat. In: 2020 11th International conference on computing, communication and networking technologies (ICCCNT), pp 1–7. https://doi.org/10.1109/ICCCNT49239.2020.9225630 37. Veerasamy N, Pieterse H (2022) Rising above misinformation and deepfakes 38. Shahab S, Agarwal P, Mufti T, Obaid AJ (2022) SIoT (social internet of things): a review. In: ICT analysis and applications. Singapore, pp 289–297. https://doi.org/10.1007/978-981-165655-2_28 39. Mirsky Y, Lee W (2021) The creation and detection of deepfakes: a survey. ACM Comput Surv 54:1–41. https://doi.org/10.1145/3425780 40. Ruiter A (2021) The distinct wrong of deepfakes. Philos Technol 34. https://doi.org/10.1007/ s13347-021-00459-2 41. Westerlund M (2019) The emergence of deepfake technology: a review. TIM Rev 9(11):39–52. https://doi.org/10.22215/timreview/1282 42. Shakil et al (2021) The impact of randomized algorithm over recommender system. Procedia Comput Sci 194:218–223. https://doi.org/10.1016/j.procs.2021.10.076 43. The legal implications and challenges of deepfakes. https://www.dacbeachcroft.com/en/gb/art icles/2020/september/the-legal-implications-and-challenges-of-deepfakes/. Accessed 08 May 2022 44. Silbey JM, Hartzog W (2019) The upside of deep fakes. Social Science Research Network, Rochester, SSRN Scholarly Paper 3452633. Retrieved from: https://papers.ssrn.com/abstract= 3452633. Accessed 02 May 2022 45. Wagner T, Blewer A (2019) ‘The word real is no longer real’: deepfakes, gender, and the challenges of AI-altered video. Open Inf Sci 3:32–46. https://doi.org/10.1515/opis-2019-0003 46. Adee S (2020) What Are deepfakes and how are they created? IEEE Spectr, 29 Apr 2020. https://spectrum.ieee.org/what-is-deepfake. Accessed 05 May 2022 47. Johnson D (2021) What is a deepfake? Everything you need to know about the AI-powered fake media | Business Insider India. https://www.businessinsider.in/tech/how-to/what-is-a-dee pfake-everything-you-need-to-know-about-the-ai-powered-fake-media/articleshow/804111 44.cms. Accessed 08 May 2022 48. Words we’re watching:‘deepfake.’ https://www.merriam-webster.com/words-at-play/dee pfake-slang-definition-examples. Accessed 08 May 2022
Predicting Stock Market Price Using Machine Learning Techniques Padmalaya Nayak, K. Srinivasa Nihal, Y. Tagore Ashish, M. Sai Bhargav, and K. Saketh Kumar
Abstract Financial time-series predictions like stock and stock indexes have become the main focus of research because of their fluctuating and nonlinear nature in almost all advanced and developing countries. Predicting stock market prices is a crucial topic in the present economy as multiple factors like the global economy, political conditions, country’s performance, company’s financial reports, and many more affect the stock price. Hence, the inclination toward new opportunities to predict the stock market has increased dramatically among professionals. Thus, many predictive techniques are employed over the past few years to maximize the profit and diminish the losses from the stock market movements. With the advancement of artificial intelligence and increased computational capabilities, various methods with programming models have been proven to be more efficient in forecasting stock trends. Mostly, the data size in the stock market is huge and not linear. So, efficient models are required to deal with the complexity and nonlinearity of huge datasets and to find out the hidden pieces of information. Therefore, an effort has been made to forecast the future stock market prices by applying various machine learning techniques such as linear regression (LR), support vector machine (SVM), decision tree (DT), and long short-term memory (LSTM). Then the performance parameters of all ML models such as the root mean squared error, mean absolute error, and mean square error are computed. Our experimental results show that LSTM provides better accuracy in terms of forecasting stock prices compared to the SVM and decision tree algorithm. Keywords Stock market · Linear regression · SVM · DT · LSTM
P. Nayak (B) Gokaraju Lailavathi Womens Engineering College, Hyderabad, India e-mail: [email protected] K. Srinivasa Nihal · Y. Tagore Ashish · M. Sai Bhargav · K. Saketh Kumar Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_56
1 Introduction The value of global stock market capitalization in 2021 was 116.78 trillion U.S. dollars [1]. Over the last few years, stock trading has attracted users to invest capital due to technological advances worldwide. Investors regularly try to follow the markets and invest in them, hoping to increase profit and reduce risk [2]. So, people are learning to invest in stock markets in the hope of earning a better living. However, prediction of the stock market is not an easy task as it is stochastic, varies frequently, and has untrustworthy characteristics [3]. The stock market has long been a volatile arena in which prices fluctuate frequently at very high volumes. It is very difficult to anticipate the instant of these fluctuations, which has proved very dangerous, as many investors have incurred huge losses. The stock market is so volatile that prices fluctuate regularly due to internal as well as external factors. Simple statistics and mathematics can provide some insights into the data, but they are not sufficient. Stock market prediction (SMP) is an instance of time-series forecasting that predicts future data by analyzing past data. Analyzing stock market data and deriving profit through prediction is therefore always important [4]. Traditional approaches for predicting the stock price were mainly based on historical prices like the closing price, the opening price, or the adjusted closing price. The other approach is the qualitative approach that depends on the company profile, market situation, newly published financial articles, political status, and social media including blogs by economists. In recent times, time-series prediction has become a common technique that is applied widely in many real-world applications like weather forecasting, financial market prediction, the mechanical industry, and so on. With the rapid growth in technology, AI algorithms have been actively applied to stock market trends to predict the future and to build confidence among investors. In summary, the main contributions of this research paper are listed below: • We study various AI algorithms and predict the closing price of AAPL stock. • We compare the performance of every algorithm and show which algorithm's predicted values are closest to the original values. The structure of the paper is designed as follows. Section 2 presents the state of the art of the current literature. Section 3 discusses the methodology opted to forecast the stock market trends. Section 4 discusses the experimental results, and the concluding remark is given in Sect. 5.
2 Literature Review The stock market is considered active, volatile, nonlinear, and complex by nature. Prediction of stock prices is an exciting task as it is affected by various factors like political influence, global economy, company’s performance, financial backgrounds, and so forth. Thus, to derive profit and reduce losses, many advanced techniques have
been applied to predict stock values in advance by investigating the history of the stock market and proven to be highly beneficial for the movements of stock trends [5, 6]. In general, there are two approaches discussed in the literature for predicting the stock market: (1) technical analysis approach and (2) qualitative approach. The technical analysis approach considers the past price of stocks like opening and closing price, the volume operated adjacent close values, etc. The qualitative approach is performed based on the company profile, market condition, economic status, political situation, financial news articles, social media, and even blogs written by economists [7]. Nowadays, AI-based advanced intelligent techniques are applied for stock market prediction. The machine learning (ML) technique is a part of data science that applies self-learning skills to historical data to predict statistical results. Particularly, for stock market analysis, efficient ML models are required to analyze the huge volume of data, derive complex input/output relationships, and identify the hidden patterns of data [8]. It has been proved that ML techniques have increased the efficiency of the stock market from 68 to 86% as compared to the traditional methods. Most of the past studies in the stock market area use conventional procedures like linear regression [9], random walk theory (RWT) [10], moving average convergence/divergence (MACD) [11], and some linear models like autoregressive integrated moving average (ARIMA), and autoregressive moving average (ARMA) [12] to predict stock market trends. Current literature demonstrates that other ML techniques such as support vector machine (SVM) and random forest (RF) can be applied to predict SMP [13]. In [1, 14–16], various ML techniques are applied to predict the stock market in advance based on public sentiments and political situations. In this work, we have applied four ML models such as linear regression (LR), SVM, decision tree (DT), and an LSTM model to predict an organization’s closed price. A set of new variables are added to the ML model using the financial dataset. These new variables are solely responsible to enhance the correctness of the models to predict the next day’s closing price of a stock market.
3 Datasets Descriptions and Methods The dataset used in this work is collected from Apple's (AAPL) stock for the period January 3, 2022, to January 20, 2022. We obtain the data from the Yahoo Finance website and download the historical data for this period in .csv format. The financial data contain fields such as open, high, low, close, and adjusted close. New variables are created by using the closing prices of the stock and are fed as inputs to the model. The raw data is preprocessed before applying the ML techniques for forecasting the stock trend. The steps are discussed in detail in the following paragraphs. • Data Preprocessing It is a mandatory requirement to preprocess the data before applying any machine learning algorithm. The main purpose is to bring the dataset into a suitable shape so
that it will be fit to apply machine learning algorithms. It passes through different stages such as data cleaning, data transformation, data normalization, and feature selection (a minimal sketch of these steps is given after the model descriptions below). • Data Cleaning In the data cleaning phase, a few values may be missing in some attributes of the dataset due to data transmission errors. So, there is a necessity to fill the missing values with their means or remove them to get appropriate values. First, the dataset is taken in the form of a .csv file. Data is present in the form of rows and columns; columns include date, open, close, etc. We use data cleaning techniques to get rid of any impurities like duplicates, missing data, and unsorted data. One simple solution is to use the features present in MS Excel, available under the data tab through the sort and remove-duplicates options. • Data Transformation The data we use has many columns, but we require only one column. The dataset we have used is Apple's stock for the period January 3, 2022, to January 20, 2022. This data has many columns like date, open, close, etc. But, in this project, we only predict the closing stock value. Hence, we require only the close column values, so we extract the close values from the given dataset. We perform this in data transformation. • Data Normalization The randomly distributed data must be scaled between two smaller values, i.e., 0–1, so that the data is uniformly scaled. It is a mandatory process because some machine learning algorithms accept values only in the range [0, 1], which improves the performance of the model. In our case, we use a min–max scaler. • Data Prediction and Visualization After normalizing the dataset, we train the data using different AI algorithms. In our project, we have applied different types of ML algorithms to forecast the variation of the closing stock value over the period January 3, 2022, to January 20, 2022. • Linear Regression: It is the most commonly used predictive modeling technique. It is used to predict a value that depends on one or more other values, and it tries to establish a relationship between them. • Support Vector Machine (SVM): Generally, SVM belongs to the classification models meant for classifying data points; however, we use its regression variant (SVR). The main function of SVR is to find the best-fit line that covers the highest number of points, which is then used for prediction. • Long Short-Term Memory (LSTM): LSTM is a powerful recurrent model that learns long-term dependencies in sequence prediction problems. A rather important feature of LSTM is that it can selectively remember and forget information.
• Decision Trees (DT): It is a popular tool for the classification and prediction of future values based on some operations performed on the input dataset. It represents a tree-like structure.
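A minimal sketch of the preprocessing steps described above (loading the exported history, keeping the close column, and min–max scaling) is given below. The file name "AAPL.csv" is an assumed name for the data downloaded from Yahoo Finance, not a value from the paper.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Data cleaning: load the exported history, drop duplicates and rows without a close price
df = pd.read_csv("AAPL.csv", parse_dates=["Date"])
df = df.drop_duplicates().dropna(subset=["Close"]).sort_values("Date")

# Data transformation: keep only the close column used for prediction
close = df[["Close"]]

# Data normalization: scale the close prices into the range [0, 1]
scaler = MinMaxScaler(feature_range=(0, 1))
close_scaled = scaler.fit_transform(close)   # shape (n_days, 1), ready for the ML models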
4 Results and Discussions The performance of the ML models is computed through various performance metrics such as mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), CPU time, and wall time. After experimenting, the computed measures of all the ML models are listed in Table 1.
• Root Mean Square Error (RMSE): RMSE defines the correctness of the model by measuring the prediction errors of that model for a particular dataset, and it can be calculated as shown in Eq. (1).

RMSE = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(P_j - A_j\right)^2}   (1)

• Mean Square Error (MSE): MSE computes the degree of error by taking the average squared difference between the observed and predicted values, as shown in Eq. (2). The MSE value increases as the error in the model increases, and it is zero when there is no error in the model.

MSE = \frac{1}{N}\sum_{j=1}^{N}\left(P_j - A_j\right)^2   (2)

• Mean Absolute Error (MAE): MAE finds the errors between paired observations expressing the same phenomenon and is calculated using Eq. (3).

MAE = \frac{1}{N}\sum_{j=1}^{N}\left|e_j\right|   (3)
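The error metrics defined in Eqs. (1)–(3) can be computed, for example, with scikit-learn and NumPy; the arrays below are placeholder values, not results from the paper.

import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

actual = np.array([172.2, 175.1, 174.9, 171.5])      # A_j: placeholder true closing prices
predicted = np.array([170.8, 176.0, 173.2, 172.4])   # P_j: placeholder model outputs

mse = mean_squared_error(actual, predicted)
rmse = np.sqrt(mse)
mae = mean_absolute_error(actual, predicted)
print(f"MSE={mse:.2f}  RMSE={rmse:.2f}  MAE={mae:.2f}")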
Table 1 Performance measures of ML models

Performance measures   LR      SVM     DT      LSTM
RMSE                   10.70   11.21   9.96    5.22
MSE                    114     125.80  99.40   27.24
MAE                    8.38    8.47    7.42    4.10
CPU time (ms)          2.55    5.61    3.29    27.95
Wall time (ms)         1.22    5.94    3.33    21.95
Accuracy (%)           74.80   71.96   88.55   96.80
where e_j = P_j - A_j   (4)
AJ is the true value, Pj is the predicted value, N is the total number of actual values, and ei is the error between the predicted value and actual value. • CPU Time: As the name implies, CPU time is the amount of time that the CPU has taken to process and execute a specific program or process. • Wall Time: It is defined as the actual amount of time taken from the start to the end of an operation as opposed to CPU time, which only includes the time period during which instructions were processed. The actual AAPL dataset is shown in Fig. 1. The proposed model is depicted in Fig. 2. The prediction of each model is compared with the actual dataset as shown in Figs. 3, 4, 5 and 6. The prediction model of DT for the stock trend is shown in Fig. 3, linear regression is shown in Fig. 4, LSTM is shown in Fig. 5, and SVM is shown in Fig. 6. The final comparison results of all the models are shown in Fig. 7. It is observed from the experimental results that linear regression has lower computational complexity and takes a considerably lower time in comparison with other models. But it may not be suitable for every stock market prediction justifying that all the stock markets do not vary linearly. SVM is suitable for a larger dataset compared to other ML models. The advantages of DT are that it does not require any normalization of data, so less effort is required for preprocessing the data. But the disadvantage is that it is relatively expensive and time consuming to execute the model. It is observed from the experimental result that the LSTM model performs well compared to other ML models providing an accuracy of 96.80% as the prediction value for the next day is very close to the actual value (see Fig. 5). All the performance metrics are compared for the four models and shown in Fig. 8. RMSE, MSE, and MAE favor the LSTM model by compromising CPU time and wall time.
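As an aside, the distinction between CPU time and wall time described above can be measured directly in Python; this small sketch uses only the standard library and is not code from the paper.

import time

cpu_start = time.process_time()   # CPU time: time spent executing on the processor
wall_start = time.perf_counter()  # wall time: elapsed real time, including waits

total = sum(i * i for i in range(1_000_000))  # placeholder workload

cpu_ms = (time.process_time() - cpu_start) * 1000
wall_ms = (time.perf_counter() - wall_start) * 1000
print(f"CPU time: {cpu_ms:.2f} ms, wall time: {wall_ms:.2f} ms")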
Fig. 1 Stock closing prices
Fig. 2 Proposed methodology
Fig. 3 Actual versus predicted stock value using decision tree algorithm
Fig. 4 Actual versus predicted stock value using linear regression model
Fig. 5 Actual versus predicted value using LSTM model
Fig. 6 Actual versus predicted value using SVM model
Fig. 7 Comparison of actual versus predicted value using LR, LSTM, SVM, and DT model
Fig. 8 Comparison of performance metrics
5 Conclusion Our observations confirm that the stock market's characteristics are complex and nonlinear. The prediction of the stock market is not only a challenging task but also quite cumbersome. The reason is that dynamic changes in stock values depend on multiple parameters like global economies, a company's profit, political issues, etc., which make the dataset more complex. Any company's website contains a dataset that consists of limited features like the high, low, open, close, and adjusted close values of stock prices, the volume of shares traded, etc., which are not adequate to predict the returns of the stock market. To attain a higher accuracy level in the predicted values, new variables have been derived from the existing variables. Various ML models like LR, SVM, DT, and LSTM are applied to predict the next day's closing price of the stock to obtain a comparative analysis. The comparative errors based on RMSE, MSE, and MAE values indicate that the accuracy of the predictive results is better for LSTM as compared to the other ML models. Based on the performance measures obtained from the LR, SVM, DT, and LSTM models, LSTM is considered a useful ML model that provides 96.80% accuracy for predicting stock market movements, at the cost of higher CPU time and wall time than the other ML models. Furthermore, it is expected that this would enhance users' confidence to invest in the stock markets. Our future work includes developing a new, more accurate ML model by integrating the existing models to predict the stock price accurately in advance.
References 1. Khan W, Malik U, Ghazanfar MA, Azam MA, Alyoubi KH, Alfakeeh A (2019) Predicting stock market trends using machine learning algorithms via public sentiment and political situation analysis. Soft Comput 24:11019–11043 2. Upadhyay A, Bandyopadhyay G (2012) Forecasting stock performance in Indian market using multinomial logistic regression. J Bus Stud Q 3:16–39 3. Tan TZ, Quek C, Ng GS (2007) Biological brain-inspired genetic complementary learning for stock market and bank failure prediction. Comput Intell 23:236–261 4. Ali Khan J (2016) Predicting trend in stock market exchange using machine learning classifiers. Sci Int 28:1363–1367 5. Masoud NMH (2017) The impact of stock market performance upon economic growth. Int J Econ Financ Issues 3(4):788–798 6. Murkute A, Sarode T (2015) Forecasting the market price of the stock using artificial neural network. Int J Comput Appl 124(12):11–15 7. Hur J, Raj M, Riyanto YE (2006) Finance and trade: a cross-country empirical analysis on the impact of financial development and asset tangibility on international trade. World Dev 34(10):1728–1741 8. Li L, Wu Y, Ou Y, Li Q, Zhou Y, Chen D (2017) Research on machine learning algorithms and feature extraction for time series. In: IEEE 28th annual international symposium on personal, indoor, and mobile radio communications (PIMRC), pp1–5 9. Seber GAF, Lee AJ (2012) Linear regression analysis. Wiley Mathematics, p 582 10. Reichek N, Devereux RB (1982) Reliable estimation of peak left ventricular systolic pressure by M-mode echographicdetermined end-diastolic relative wall thickness: identification of severe valvular aortic stenosis in adult patients. Am Heart J 103(2):202–209 11. Chong T-L, Ng W-K (2008) Technical analysis and the London stock exchange: testing the MACD and RSI rules using the FT30. Appl Econ Lett 15(14):1111–1114 12. Zhang GP (2003) Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 50:159–175 13. Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300 14. Liaw A, Wiener M (2002) Classification and regression by Random Forest. R news 2(3):18–22 15. Pierdzioch C, Risse M (2018) A machine-learning analysis of the rationality of aggregate stock market forecasts. Int J Financ Econ 23(4):642–654 16. Nti IK, Adekoya AF, Weyori BA (2020) A comprehensive evaluation of ensemble learning for stock-market prediction. J Big Data 7:1–40
Hyper Lattice Structure for Data Cube Computation Ajay Kumar Phogat
and Suman Mann
Abstract In the current business scenario, a quick and accurate response is required for business decisions. A data warehouse consists of huge volumes of data repositories. This information is utilized to develop insights for decision-making, but queries on these data take a long time to process, resulting in longer response times. This response time must be lowered in order to make effective and efficient decisions. The computation of data cubes is a momentous task in data warehouse design. Precomputation may significantly reduce the response time and also improve online analytical processing efficiency by computing part or all of a data cube. However, such computation is difficult since it may need a substantial amount of time and storage space. Many authors have suggested algorithms for efficient data cube computation that reduce computation time and storage cost. In this paper, we have proposed a framework that uses a heuristic approach for data cube computation in the hyper lattice structure. Keywords Data cube · Hyper lattice · Data warehouse · Multidimensional model
1 Introduction In the data warehouse, data is stored in an electronic repository. The data warehouse’s core building elements are data models [1]. The simplest ways of view data in a lattice structure are data cubes or cuboids. A subset of database properties can be used to create a data cube [2–4]. Dimensions and facts are specified in the hyper lattice data cube in which dimensions entities are related to the data listed in the data cube. Every hyper lattice dimension, such as time or location, may be connected with one or more tables [4]. Online analytical processing (OLAP) operations deal with aggregate data. A consumer can perform OLAP operations by initial requirements A. K. Phogat (B) · S. Mann Maharaja Surajmal Institute of Technology, GGSIPU, New Delhi 110058, India e-mail: [email protected] S. Mann e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_57
that are specified at the time of developing a multidimensional model structure by the developers. These initial requirements may be used for extended data analysis [5]. OLAP constitutes some of the basic operations on the data model such as roll-up, which includes moving upward in the level of the hierarchy, second is drill-down operation, which decreases the level of aggregation by moving down in the hierarchy in the lattice structure, the third operation is slice and dice, in which slicing includes selecting one dimension in the cube that results in a subcube, whereas dice operation includes a selection of two or more dimension in a cube, and fourth operation is the pivot that performs a rotation on data axis to view different viewpoint. Roll-up operations may be used to produce a high degree of abstraction, while dimension reduction in the lattice of cuboids can be used to calculate aggregate cuboids, and drill-down operations can be used to acquire specific data. Mapping of multiple levels of concepts to produce different aggregate views may be done in OLAP, for example, low-level to high-level aggregate concepts can be focused in OLAP through concept hierarchy, which implies one dimension can be presented in numerous aggregate levels. The amount of abstraction is determined by the business requirements. If one of the dimensions is time, then it can be described as week, month, year, and so on [6–10]. Data cubing in a huge data warehouse application is difficult since it has multiple dimensions, so each dimension has multi-hierarchy levels. A business’s performance depends on its ability to respond quickly and accurately. The query process time is critical for effective commercial applications because it allows quick access to data in big databases, mainly networked databases. Data is rapidly growing in proportion to the dimension required by a company. It is quite expensive to handle in terms of space and time computing processes. The data warehouse employs multiple distinct perspectives to effectively handle queries. The problem of space and time calculation cost can be solved using a pre-computed data cube. As a result, the processing of queries may take more time [11–14]. Materialization of data cube views is one of the most efficient approaches to reducing calculation time in a decision support system (DSS). As a result, researchers are always looking for improved algorithms that can select the greatest perspectives to materialize the data cube. The diagram below shows a three-dimensional lattice construction. The data warehouse stores sales (S), time (T ), and product (P). Cuboid STP is the basic cuboid; when we progress up the cuboid hierarchy, we reach the apex cuboid, which has 0 dimensions [15]. The traditional lattice has some drawbacks, such as adding more dimensions will almost double the size of the structure since all new dimensions must connect the structure’s bottom and upper bound dimensions. It is difficult to comprehend the entire lattice. Materializing the entire lattice is quite tough. To solve this difficulty, the hyper lattice concept is presented [11, 16] (Fig. 1). Paper organization is as follows Section 2 discusses the literature review done by various authors with respect to the view selection in the data cube. Section 3 contains the hyper lattice framework and its structure that describe the overlapping of lattices in the structure and also proposed a framework for data cube computation in the hyper lattice. Section 4 includes the
Fig. 1 Lattice structure with three dimensions
comparative analysis of traditional and hyper lattice structure according to cuboids generated when we add new dimension. Section 5 contains the conclusion and future scope.
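Before turning to related work, the lattice-of-cuboids idea from the introduction can be made concrete with a short Python sketch that enumerates every group-by combination for a small set of dimensions; the dimension names are illustrative only.

from itertools import combinations

def lattice_of_cuboids(dimensions):
    # Every group-by combination, from the base cuboid (all dimensions)
    # down to the apex cuboid (no dimensions): 2**n cuboids for n dimensions.
    cuboids = []
    for k in range(len(dimensions), -1, -1):
        cuboids.extend(combinations(dimensions, k))
    return cuboids

# The sales (S), time (T), product (P) example yields 2**3 = 8 cuboids.
for cuboid in lattice_of_cuboids(["S", "T", "P"]):
    print(cuboid if cuboid else ("ALL",))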
2 Literature Review The present research effort in terms of view materialization and query path selection in the lattice structure is covered in this literature review. The two primary problems on which continual research has been done in the past by many authors are query optimization and view materialization. Finding the most effective collection of views within the available storage, and deciding which views to materialize, is really challenging. The challenge of view selection is complicated by constraints such as time, space, and availability constraints, as well as join and group-by constraints. Many authors have presented methods for optimizing query path selection. Harinarayan et al. [17] presented a greedy method for selecting the optimal range of views to materialize in a lattice framework, based on different constraints. Shukla et al. [3] introduced a distributed algorithm that deals with single cube computation as well as extends it to improve multi-cube performance. For this purpose, three algorithms, namely SimpleLocal, ComplexGlobal, and SimpleGlobal, that choose the aggregates for pre-computation from multi-cube schemas have been developed. When compared to Harinarayan's method, it is more efficient in terms of view selection. Gupta and Mumick [4] proposed a polynomial-time greedy AND-OR viewgraph method. The OR perspective implies that every view can be computed by its related view in this algorithm, whereas an AND view has a unique evaluation value. It is an approximation greedy algorithm that picks a group of views to materialize in the data cube to reduce query response time for OR view graphs, and they also build an A* heuristic
approach to offer an ideal answer. With the support of the various view processing plans, Amit et al. [18] proposed an algorithm based on the heuristic method that discovers a collection of views based on multiple global processing plans of the query (MVPP). To overcome the problem of view selection in the distributed data warehouse, Chen [19] suggested a greedy-based selection method for view selection under storage cost limitations. Zhang et al. [20] suggested an algorithm for materialized view selection based on genetic algorithms, which was proven to be extremely successful in decreasing query maintenance and query cost when compared to the heuristic. Chaudhari and Dhote [9] proposed a cluster-based dynamic algorithm for view selection that retrieves the prominent dimension from a set of queries, creates a cluster of queries to generate a set of candidate views, and then dynamically adjusts the final materialized view. In comparison with the genetic method, Gosain [14] introduced a stochastic approach called particle swarm optimization that efficiently decreases the query processing time. Soumya and Nabendu suggested a dynamic query path algorithm to select in the lattice with a specified idea hierarchy [13]. The algorithm’s goal is to determine the shortest path from source to destination cuboid while minimizing query access time. The algorithm first picks partly materialized cuboids that are present in cache or primary memory and then constructs cuboids from secondary memory using the least recently used approach if the target cuboid is not located in cache or main memory. Sen et al. [16] presented an algebraic hyper lattice structure that is more flexible than regular lattice since adding a new dimension is quite easy. There is an overlapping of two or more lattices that share both time and space in a hyper lattice. He also created a technique for creating a hyper lattice, as well as the selectMinCost path algorithm based, which reduced query cost time.
3 Hyper Lattice Structure in Data Warehouse A lower bound and an upper bound are parts of the hyper lattice structure. In data warehouses, the hyper lattice structure is more versatile than the prior standard lattice structure. In a hyper lattice, adding additional dimension is relatively convenient. The common elements of overlapping lattices meet at a common position termed the least upper bound of every lattice in hyper lattice; however, the GLB should be unique to each lattice [16]. In Fig. 2, there are lattice A1 and A2 overlapping and share same storage space. A1: {, < TM >, < PM >, < PS >, < T >, < P >, < M >, < ALL >} A2: {, < PM >, < PS >, < MS >, < P >, < M >, < S >, < ALL >} To minimize both time and space, we may use hyper lattice. As seen in Fig. 2, various lattices share similar space, thus cuboids only need to be stored once if they are encountered in the same place. They do not need to store data individually for each lattice since all the cuboids of lattices are kept in same storage area, making
Fig. 2 Hyper lattice of cuboids having overlapping of two sets
it easier to search the data. Because each lattice must be stored in its own memory address, searching data in a conventional lattice may take longer. Hyper lattice also reduces data redundancy and improves consistency. Higher-level aggregates can always be calculated from previously computed lower-level aggregates rather than the fact table in cube computing. Moreover, parallel aggregation from cached preliminary compute results can reduce the cost of expensive disc I/O operations. A lattice structure is formed by all conceivable combinations of cuboids starting with the base cuboid [26]. Moving up the hierarchy from a base cuboid with n-dimensions to the apex cuboid with 0 dimension is possible in conventional lattice structures. From the base cuboid to intermediate-level cuboids, alternate solutions can be identified in the lattice. A hyper lattice is a hybrid variant of the standard lattice structure with multiple base cuboids. In a hyper lattice design, two or even more lattices share data storage. Hyper lattice is more flexible and versatile than standard lattice in that it allows for the addition of different dimensions, whereas standard lattice does not allow for this. Additional lattice structures must be developed to add new dimensions, which increases the storage problem, but in a hyper lattice structure, any dimensions between levels N and N − 1 may be added [21, 22]. We may identify more than one path through the source to destination cuboid when retrieving information from a lattice structure applying hyper lattice. The optimum path leads to the most effective use of resources. The cost factor might be determined by the cuboid’s size. The size of a cuboid may be computed in hyper lattice by multiplying the tuples in a cube by the size of the related cube.
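The storage sharing of the overlapping lattices A1 and A2 discussed above can be illustrated with plain Python sets. The cuboid labels below are hypothetical stand-ins (the exact membership of A1 and A2 in Fig. 2, including their base cuboids, is not fully recoverable from the text), so the counts are illustrative only.

# Hypothetical cuboid sets for two overlapping lattices; labels are illustrative.
A1 = {"TPM", "TM", "PM", "PS", "T", "P", "M", "ALL"}
A2 = {"PMS", "PM", "PS", "MS", "P", "M", "S", "ALL"}

shared = A1 & A2                       # cuboids stored only once in the hyper lattice
hyper_lattice_storage = len(A1 | A2)   # cuboids kept in the shared structure
separate_storage = len(A1) + len(A2)   # cuboids if each lattice is stored on its own

print(sorted(shared))                             # e.g. ['ALL', 'M', 'P', 'PM', 'PS']
print(hyper_lattice_storage, separate_storage)    # shared storage vs. duplicated storage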
3.1 Proposed Framework for Data Cube Computation in Hyper Lattice In this paper, we have proposed a framework for efficient data cube computation in hyper lattice structure. We have created sorted array of that to reduce the computation time in hyper lattice structure (Fig. 3). Pseudo code for Data cube computation Create two arrays X and Y for storing the index number of source and target data cuboids in hyper lattice structure Check whether attributes of array Y have all the attributes of array X Store source index values to array Z ELSE EXIT Create an sorted array at each level having the index value and size of the cuboid Repeat the below steps at each level in hyper lattice hierarchy Search all possible paths in upper levels and calculate the cost of traversing by multiplying the tuple in cuboid by the size of associated cuboid Consider the least cost path at each level and store the selected cuboid from each level in array D If array D equals to target array A Minimal cost path found
4 Comparative Analysis—Traditional Lattice Versus Hyper Lattice Hyper lattice is more flexible and versatile than standard lattice in that it allows for the addition of different dimensions, whereas standard lattice does not allow for this. Additional lattice structures must be developed to add new dimensions, which increases the storage problem, but in a hyper lattice structure, any dimensions between levels N and N − 1 may be added (Fig. 4; Table 1). Above comparative analysis shows that if we add a new dimension in traditional lattice, then its structure becomes almost double that requires more space, but the hyper lattice structure will not expand that much as it creates another base cuboid and overlapping of two or lattice shares same memory space.
Fig. 3 Flowchart for data cube computation in hyper lattice
Fig. 4 Comparison of number of cuboids in traditional and hyper lattice
Table 1 Comparison of traditional lattice and hyper lattice structure

Number of dimensions   Number of cuboids in traditional lattice structure   Number of cuboids in hyper lattice structure
2                      4                                                    4
3                      8                                                    6
4                      16                                                   13
5                      32                                                   20
5 Conclusion and Future Scope In this paper, we have explored the hyper lattice structure and also compared it with the traditional lattice structure in the data warehouse. Hyper lattice structure in the data warehouse can resolve the issue of space and time computation. The overlapping cuboid can save the memory space by sharing the same memory in the system, and time computation is reduced by only traversing the particular base cuboid instead of all base cuboids. We also proposed a framework for data cube computation in the hyper lattice. Our future scope will be to propose an algorithm for efficient data cube computation in the hyper lattice structure.
References 1. Inmon WH (1992) Building the data warehouse. QED Information Sciences, Wellesley 2. Yang J, Karlapalem K, Li Q (1997) A framework for designing materialized views in data warehousing environment. In: Proceedings of the 17th IEEE international conference on distributed computing systems. Maryland, U.S.A. 3. Shukla A, Deshpande PM, Naughton JF (1998) Materialized view selection for multidimensional datasets. In: Proceedings of the 24th international conference on very large databases.
New York, pp 488–499 4. Gupta H, Mumick IS (2005) Selection of views to materialize in a data warehouse. IEEE Trans Knowl Data Eng 17(1):24–43. https://doi.org/10.1109/TKDE.2005.16 5. Mann S, Phogat AK (2020) Dynamic construction of lattice of cuboids in data warehouse. J Stat Manage Syst 23(6):971–982 6. Gosain A, Mann S (2014) Empirical validation of metrics for object oriented multidimensional model for data warehouse. Int J Syst Assur Eng Manage 5:262–327. https://doi.org/10.1007/ s13198-013-0155-8 7. Sen S, Chaki N, Cortesi A (2009) Optimal space and time complexity analysis on the lattice of cuboids using Galois connections for data warehousing. In: 2009 Fourth international conference on computer sciences and convergence information technology. IEEE 8. Mann S, Gosain A, Sabharwal S (2009) OO approach for developing conceptual model for a data warehouse. J Technol Eng Sci 1(1):79–82 9. Chaudhari MS, Dhote C (2010) Dynamic materialized view selection algorithm: a clustering approach. In: International conference on data engineering and management. Springer, Berlin, pp 57–66 10. Kumar TVV, Arun B (2017) Materialized view selection using HBMO. Int J Syst Assur Eng Manage 8(1):379–392 11. Gosain A, Mann S (2010) Object oriented multidimensional model for a data warehouse with operators. Int J Database Theory Appl 3(4):35–40 12. Gosain A, Mann S (2013) Space and time analysis on the lattice of cuboid for data warehouse. Int J Comput Appl 77(3) 13. Soumya S, Nabendu C (2011) Efficient traversal in data warehouse based on concept hierarchy using Galois connections. In: Proceedings of the second international conference on emerging applications of information technology, pp 335–339 14. Gosain A (2016) Materialized cube selection using particle swarm optimization algorithm. Procedia Comput Sci 79:2–7 15. Gosain A, Mann S (2012) An object-oriented multidimensional model for data warehouse. In: Fourth international conference on machine vision (ICMV 2011), computer vision and image analysis; pattern recognition and basic technologies, vol 8350. SPIE 16. Sen S, Cortesi A, Chaki N (2016) Hyper-lattice algebraic model for data warehousing. Springer International Publishing 17. Harinarayan V, Rajaraman A, Ullman JD (1996) Implementing data cubes efficiently. In: ACM SIGMOD international conference on management of data. ACM Press, New York, pp 205–216 18. Amit S, Prasad D, Jeffrey NF (2000) Materialized view selection for multi-cube data models. In: Proceedings of the 7th international conference on extending database technology: advances in database technology. Springer, pp 269–284 19. Chen Y, Dong G, Han J, Wah BW, Wang J (2002) Multidimensional regression analysis of time series data streams. In: Proceedings of the 2002 international conference on very large databases (VLDB’02). Hong Kong, pp 323–334 20. Zhang C, Yang J (1999) Genetic algorithm for materialized view selection in data warehouse environments. In: Proceedings of the international conference on data warehousing and knowledge discovery, LNCS, vol 1676, pp 116–125 21. Prashant R, Mann S, Eashwaran R (2021) Efficient data cube materialization. In: Advances in communication and computational technology. Springer, Singapore, pp 199–210 22. Phogat AK, Mann S (2022) Optimal data cube materialization in hyper lattice structure in data warehouse environment. J Algebraic Stat 13(1):149–158
Malicious Network Traffic Detection in Internet of Things Using Machine Learning Manjula Ramesh Bingeri, Sivaraman Eswaran, and Prasad Honnavalli
Abstract The Internet of Things (IoT) is rapidly getting popular throughout the globe. However, it is not safe, as it faces many attacks from intruders. Malicious infections are affecting IoT devices at a rate greater than 30%. Machine learning (ML) is one of the effective techniques for identifying malicious attacks on devices. Since IoT nodes have poor processors, it is necessary to identify an attack on a single IoT device early, or detection becomes more challenging. This work looks into possible vulnerabilities of a single IoT device by making use of machine learning algorithms. Initially, the IoT-23 dataset is passed to preprocessing. After preprocessing and feature selection, the data are used to train and evaluate multiple ML techniques such as random forest (RF), K-nearest neighbor (KNN), and support vector machine (SVM) models. The accuracy, precision, F1-score, and recall metrics are used to evaluate the output of all algorithms. The model identifies network traffic anomalies and classifies packets as 'malicious' or 'benign'. Keywords IoT · Machine learning · IoT-23 · Random forest · KNN
1 Introduction The IoT had grown massively in the last decade, and it links all types of smart devices, establishing an integration that enables the platform for a variety of applications. Such applications might range from basic smart household appliances to complex equipment for a major infrastructure site. As the demand for IoT services grows, the flow of network traffic grows as well, leading to a significant issue, when it comes to analyzing network traffic and identifying malicious intrusion activity. Malicious activity detection in the IoT network requires an efficient detection methodology. To defend the IoT network from cyberthreats, a rapid and effective harmful activity detection method is required to continually monitor IoT data traffic also to remove bad traffic. M. R. Bingeri (B) · S. Eswaran · P. Honnavalli Department of Computer Science, PES University, Bengaluru, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_58
An intrusion detection system (IDS) is used to keep track of the network and security flaws. It maintains a record of data flow to detect and secure system data. IDS activities are separated into three phases: monitoring, analysis, and detection. The monitoring stage checks the suitability of the dataset to the model. The analysis stage is determined by the feature extraction technique. The detection stage is the identification of malicious activities in the networks. Using the random forest technique, effective identification of harmful activities in network traffic is done. Autoencoder is used for detecting and analyzing malicious intrusions. The newest network traffic dataset, such as IoT-23, is used to analyze performance. The IoT-23 dataset consists of 23 features. These 23 features are divided into two types of which 20 are malware features and 3 are benign features. Furthermore, performance is assessed in terms of accuracy using techniques such as KNN and SVM algorithms. The rest of the paper is organized as follows. The literature survey is briefly summarized in Sect. 2. Section 3 contains a description proposed methodology, and the dataset is briefly described in Sect. 4. Section 5 explains the performance analysis. Finally, Sect. 6 concludes the work with a final remark. Section 7 provides the future scope followed by the reference papers.
2 Literature Survey van Dartel [1] illustrates that malware threats are detected in real time by a single IoT device. The machine learning technique was executed on a single IoT device using the decision tree model. The classifier’s performance metrics are high, and it is quite accurate. Identifying malicious activities on a specific IoT device is a useful strategy to limit the number of devices affected. However, there are several drawbacks. A few data points are still incorrectly categorized. Although the decision trees and random forests are insignificant, the purpose of this work is to develop a classifier that is as quick as feasible while preserving the highest possible metrics. Nugroho and Djatna [2] illustrate how IoT systems have a limitation on IoT devices since they aren’t designed to deal with security against attacks or disruptions. In addition, the normal IoT gadget is broadly organized in preparing expansive information. It has constraints like memory, control quality, and sensor settings which are limited to meet the device’s requirements, making it noxious for clients to use it while assaulting. IDS is required to identify the IoT organizing framework disruption due to the device’s impotence. Stoian [3] stated that the primary enormous step is information preprocessing, which comprises information choice, information visualization, information designing, and information part. These steps are repeated to handle the data to categorize. The information was distributed randomly in an 80–20 ratio, with the 80% proportion of data for training and the 20% of it for testing purposes. Garcia [4] explained that a single IoT device is used to execute the ML model. In IoT, the number of connected devices is rapidly increasing. If one device is infected, the virus spreads to all linked devices, forming botnets. As the result, early identification of a botnet assault is crucial for infrastructure security. Garcia et al. [5]
investigated the benefits and drawbacks of a commonly used network attack and its detection approach. They utilized machine learning models to detect a specific sort of assault in network traffic. They suggested a new error metric, which includes IP addresses and time into accounts, and it readily compares the algorithms from the perspective of a network administrator. Zeidanloo et al. [6] explained about analyzed seven distinct forms of botnet assaults on actual network data captured throughout the computer network using the machine learning model. The botnet operation was carried out utilizing a command-and-control system. IRC stands for Internet Relay Chat, which is a network communication channel. Through the legitimate IRC channel, the command-and-control server is utilized to deliver orders to infect susceptible machines. Gu et al. [7] looked at the communication channel’s data stream and utilized a statistical technique for anomaly detection to find harmful activities linked to botnets. This study discusses BotSniffer, which effectively detected all botnets with only a few false positives. BotSniffer is a network anomaly-based botnet detection technique that examines botnet command and spatial-temporal correlate and comparability. They are also working on a next-generation detection technology that is unaffected by the botnet C&C protocol or network structure. Liu and Shaver [8] used different machine learning algorithms to test the anomaly detection process on the IoT network intrusion dataset to improve IoT security. Intrusion training set is used to obtain excellent accuracies while keeping high efficiency. Using KNN, the greatest accuracy with 99% is achieved. To provide a safe IoT framework for smart settings is designed with efficient, reliable, and simple IDS. Livadas et al. [9] stated that machine learning techniques were employed in their work to detect C&C traffic of IRC-based botnets. (1) differentiating IRC traffic from non-IRC traffic. (2) recognizing the difference between botnet and legitimate IRC traffic. The first step in developing a multiphase based on a Bayesian classifier is to evaluate IRC traffic, and the second stage is to detect botnet activity in connection. They are also looking at using telltales from compromised servers to classify suspicious and non-suspect flows, as well as distinguish among botnet and legitimate IRC traffic. Nagisetty and Gupta [10] this study presents a method for detecting malicious activity in IoT based on the Keras. The major goal of this project is to use quick and efficient big data techniques like TensorFlow and Keras to identify suspicious network traffic on the Internet of Things. To identify network traffic, three distinct deep learning models are employed in the proposed framework: multilayer perceptron, convolutional neural networks, deep neural networks also an autoencoder. For analyzing and testing the performance of the proposed approach, two well-known datasets, UNSW-NB15 and NSLKDD99, are employed. Machine learning methods like logistic regression (LR) and SVM are employed in this case. The model gained higher ratings of 99.24% accuracy. Palla and Tayeb [11] published intelligent Mirai detection methods for IoT nodes in 2021. In this model, researchers utilized the NBaloT dataset. The model gained higher ratings of 92.8%, according to the findings. The system results in producing the least amount of FN rate with 0.3% and provides higher accuracy. 
To train the machine learning model, the authors used a neural network implemented with TensorFlow and MATLAB. The model is configured
such that 70% of the input is utilized for train, 15% for the classification model, and 15% for tests. 49548 sets are included in the model. ANN, as well as RF, is indeed the techniques to contrast in terms of accuracy. Liu et al. [12] this study compares and contrasts various supervised feature selection algorithms for detecting malicious network traffic from IoT devices. For each selection strategy, they use three distinct feature selection methods and 3 distinct logistic regression (LR) methodologies. The results show that all 3 LR algorithms SVC, XGBoost, and RF performed well. It meant that in a supervised learning scenario, these approaches may be used to detect an attack on the Internet of Things. Aljabri et al. [13] stated in their study that random forest is the most often utilized method, which may be explained by the fact that it employs ensemble learning. Researchers used an essential collection of characteristics that influenced the model accuracy to identify malicious and benign URLs. It elaborates static lexical information from the URL. Furthermore, other researchers want to turn the models they developed into a real-time system in the future so that they may be used in realworld scenarios like attack management and also mitigation. Online predictions and online learning are the two stages of real-time machine learning. Michael Austin explained about the topic of binary categorization of malicious and benign activities in IoT network traffic was addressed in the report [14]. Across all areas except recall, the RF has the best performance. The support vector machine had the highest recall, and Nave Bayes was in second. Timestamp, history, protocol, responding IP address, and origin IP byte number were discovered to be the most relevant criteria for differentiating classes at the root node by the random forest. The conclusions add to the experimental observations and provide a statistically valid analysis of the domain of IoT malicious traffic categorization. These classifier’s performance and the characteristics discovered may be utilized to identify malicious traffic on the IoT.
3 Proposed Methodology As seen in Fig. 1, when an IoT-23 dataset is given as an input, it goes to the preprocessing unit where raw data are converted into useful data. In preprocessing, null values and irrelevant fields have to be removed from the data. From there, the data are sent to the feature extraction unit where the features that are important to determine the output are extracted. These selected features are relevant, not miss-classified, and reduce the overfitting. These features are sent to the training-testing unit. The training-testing unit dataset is divided according to the 90–10 rule. 90% of the data is used to train the model, whereas 10% is used to test it. Once data have been trained, ML algorithms such as random forest, K-nearest neighbor, and support vector machine are utilized to calculate the accuracy of the ML models. The model is predicted using the predict function. If any malicious activities are detected, then it will label the types of malicious activities.
Fig. 1 Malicious detection and classification system
3.1 Pre-processing In preprocessing, null values and irrelevant fields have to be removed from the data. The Pandas, scikit-learn, and NumPy libraries are utilized at this step. The data are read from the conn.log text files using Pandas. NumPy is utilized to replace all '–' values, which indicate empty spaces, i.e., null values. At the stage of dataset insertion, the counts of NaN values in each column were evaluated. In certain circumstances, such as the tunnel parents field, complete columns are vacant; those fields were eliminated because they had no clear value. To a smaller extent, the service, protocol, and history characteristics are also impacted by NaN values, so assigning a single fixed value would be improper. The scikit-learn LabelEncoder was used to convert the various strings to numbers, and the labels were given a simple numeric encoding for correlation and processing. A correlation matrix is a useful tool for describing a huge dataset as well as recognizing and analyzing correlations in it, as seen in Fig. 2.
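A rough sketch of the preprocessing just described is given below. The file path, the presence of a numeric "label" column, and the exact set of dropped columns are assumptions (the sketch assumes the conn.log capture has already been exported to a tabular .csv form).

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("conn.log.labeled.csv")          # assumed pre-exported tabular form of conn.log

df = df.replace("-", np.nan)                      # '-' marks empty fields in the capture
df = df.drop(columns=["tunnel_parents"], errors="ignore")  # mostly empty, no clear value

for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col].astype(str))  # strings -> numeric codes

print(df.corr()["label"].sort_values(ascending=False).head())    # correlation with the label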
3.2 Feature Extraction The characteristics timestamp, unique identifier, id_orig h, local orig, local_resp, missed_bytes, and tunnel parents can be deleted using statistical correlation analysis. The reason for this is that the feature and the label have an unstable relationship. Some other issue are that several columns, particularly the final three, lacked sufficient data that allow a correlation to be determined. A few of the characteristics, such as the local-remote connection, connection state, and tunnel parent fields, are instantly eliminated since their values are generally null or empty.
Fig. 2 Correlation graph of features
Figure 3 shows that orig_ip_bytes, orig_pkts, orig_bytes, duration, resp_pkts, resp_ip_bytes, resp_bytes, and missed_bytes are the most significant features for determining the model's accuracy. Methods for assigning a score to each of the model's input characteristics are referred to as feature importance. Each feature's importance is simply described by its score: a higher score shows that the specific feature has a larger effect on the model. Feature importance helps to prevent overfitting. The data chosen must not be duplicated or inaccurate; otherwise, it leads to artificially inflated accuracy. The suggested technique is feasible because the model can be evaluated while needing less data for training. Some characteristics, such as the flow's unique ID and IP address, are redundant.
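Feature importance scores like those in Fig. 3 are typically read from a fitted random forest. The sketch below assumes X and y are the preprocessed feature matrix and labels from the previous step; they are not defined in the paper's text.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# X: DataFrame of preprocessed features, y: benign/malicious labels (assumed to exist)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)

importances = pd.Series(rf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(8))  # e.g. orig_ip_bytes, orig_pkts, ...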
3.3 Random Forest, KNN and SVM Random forests are a set of supervised tree-based classification algorithms. A node at each level of a tree's branch evaluates a dataset characteristic and separates the outputs using a threshold. The features considered by each tree are chosen at random, and the training dataset for each tree is sampled at random with replacement. Features used by high-performing trees become more relevant, whereas trees with higher class impurity contribute less. The accuracy of the random forest
Fig. 3 Shows feature importance using a random forest algorithm
Fig. 4 Shows a random forest model
classifier is defined by the individual decision tree classifiers’ independence and accuracy. For the categorization of benign and malicious network flows, random forests are often utilized. As seen in Fig. 4, first, random data samples from an IoT-23 preprocessed dataset are selected. The model will then construct the specific decision tree (DT) for each
dataset or group of sample data. Each DT then generates a prediction result, the individual predictions are combined by voting, and the most popular predicted outcome is selected. K-nearest neighbor (KNN) is a simple and effective machine learning classification approach that falls under the category of supervised learning. The KNN approach retains all available data, and new data points are classified according to their similarity to previously collected data, so the algorithm can efficiently categorize new data into an appropriate category as it arrives. KNN is a nonparametric classification approach and takes little time to train before categorizing a test sample. Support vector machine (SVM) is a supervised machine learning algorithm that can perform classification, outlier identification, and regression. It achieves a high degree of accuracy when compared to other classifiers such as logistic regression (LR) and decision tree (DT), and it can handle a variety of continuous, categorical, and discrete variables with ease. The SVM constructs a hyperplane in a multidimensional space that separates the different classes, and it builds the optimal hyperplane by minimizing the classification error.
3.4 Train Test Split Figure 1 illustrates the model selection approach that divides a data sample into 2 parts. One is used for training, and the other is used for testing. In the following categorization, it is presented in a different sequence, making it easier to break the dataset and pursue a perfect model.
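Under the 90–10 split described above, training and scoring the three classifiers could look like the following sketch; the hyperparameters are illustrative defaults, not values reported by the paper, and X and y are assumed to come from the preprocessing step.

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42, stratify=y)  # 90% train, 10% test

models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "accuracy:", model.score(X_test, y_test))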
4 Dataset This research focuses on the IoT-23 dataset, introduced by Parmisano, Garcia, and Erquiaga in 2020. The dataset consists of packet captures derived from network traffic generated by IoT devices. It contains 20 malicious and 3 benign network traffic captures. The benign captures were recorded from three separate IoT devices: a Somfy door lock, a Philips Hue light bulb, and an Amazon Echo. A Raspberry Pi was used to collect the 20 malicious network traffic captures. The IoT-23 dataset has many labels, which denote the various forms of attack; C&C, PartOfAHorizontalScan, and DDoS are some examples. There are 19 attributes in this dataset.
5 Results and Evaluation 5.1 Experimental Setup The experiment is run on an Intel(R) Core (TM) i5-1135G7 processor running at 2.40 and 2.42 GHz. The machine has a RAM capacity of 16 gigabytes and an x64-based processor. Python 3.8.5 64-bit was used for the implementation. Anaconda’s Jupyter Notebook is used to execute Python 3.8.5. Pandas library is used for data analysis and manipulation of data.
5.2 Results Analysis The dataset is divided according to the 90–10 rule. Once the data have been trained, the ML algorithms are used to calculate the accuracy of the ML models. The models are evaluated using metrics including accuracy, precision, F1-score, and recall. The predict function is used to obtain the model's predictions, and the classification model's performance is determined using the confusion matrix. The confusion matrix is an N × N matrix used to estimate a classification model's performance or efficiency, where N denotes the number of target classes; it compares the target values to the machine learning model's predictions. Tables 1, 2 and 3 display the outcome of evaluating the confusion matrices on the dataset.
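The evaluation step can be sketched as below. This is not the authors' code; clf stands for any one of the fitted classifiers (for example, the random forest), and X_test, y_test come from the 90-10 split above.

```python
# Illustrative sketch (not the authors' code): confusion matrix and per-class metrics.
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score

y_pred = clf.predict(X_test)                    # the "predict function"
print(confusion_matrix(y_test, y_pred))         # N x N matrix, N = number of classes
print(classification_report(y_test, y_pred))    # precision, recall, F1 per class
print("accuracy:", accuracy_score(y_test, y_pred))
```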
5.3 Performance Metrics In this paper, true positive (TP), true negative (TN), false positive (FP), and false negative (FN) are calculated. TP is a malicious flow in this example, and TN is a benign flow. FP refers to a benign flow that has been incorrectly classified as malicious, whereas FN refers to a malicious flow that has been misclassified as benign.

Table 1 Confusion matrix by RF
6618    2      0      647
2       642    0      848
0       0      2271   0
14      1      0      8186

Table 2 Confusion matrix by KNN
6614    2      0      651
0       644    0      848
0       0      2271   0
3       0      0      8198

Table 3 Confusion matrix by SVM
6607    1      0      659
0       636    0      856
0       0      2268   3
0       0      0      8201

Table 4 Performance metrics of the models
S. No.  Metrics and classifiers        RF      KNN     SVM
1       Precision (weighted average)   0.93    0.93    0.93
        Precision (macro average)      0.96    0.96    0.96
2       Recall (weighted average)      0.92    0.92    0.92
        Recall (macro average)         0.83    0.84    0.83
3       F1-score (weighted average)    0.91    0.92    0.91
        F1-score (macro average)       0.87    0.87    0.87
4       Accuracy (%)                   92.12   92      92.10
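For reference, the reported metrics follow the standard definitions in terms of these four counts; the formulas below are textbook definitions rather than expressions reproduced from the paper.

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```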
5.4 Accuracy The accuracy was calculated for the three algorithms as shown in Table 4. In this comparison, the RF, KNN, and SVM models scored 92.12%, 92%, and 92.10% accuracy, respectively.
5.5 Precision Table 4 illustrates the precision values comparatively. The RF, KNN, and SVM algorithms have been trained and have categorized all of the nodes with a precision of 93%.
5.6 Recall Table 4 illustrates the recall values comparatively. The RF, KNN, and SVM models secured 92% recall.
5.7 F1-Score The F1-score is calculated using precision and recall. Table 4 illustrates the F1-score values comparatively. The RF, KNN, and SVM models secured 91%, 92%, and 91% F1-scores, respectively.
6 Conclusion Various network vulnerabilities exist on the Internet. The IoT-23 dataset is filtered and classified depending on the data requirements of the machine learning algorithms. The features of the IoT-23 dataset are selected using the RF algorithm. After preprocessing and feature selection, the data are trained and evaluated with multiple ML techniques, namely the RF, KNN, and SVM models. Accuracy, precision, F1-score, and recall are used to evaluate the outcomes. The time it takes to train an algorithm decreases as the number of features decreases, which increases the performance of the ML algorithms. The results of the RF, KNN, and SVM algorithms are compared: they scored 92.12%, 92%, and 92.10% accuracy, respectively. Finally, the RF algorithm is an effective choice for malicious traffic detection and classification on the IoT-23 dataset, because it came out on top in every metric, making it the better choice.
7 Future Scope In the future, a real-time dataset might be gathered and tested with other machine learning algorithms. Moreover, new feature selection approaches may be developed to gain a broader perspective on the important characteristics and to improve malware detection. Similarly, other malicious network activities can also be examined.
References 1. van Dartel B (2021) Malware detection in IoT devices using machine learning, 2nd Jul 2021 2. Nugroho EP, Djatna T (2020) A review of intrusion detection system in IoT with machine learning approach: current and future research 3. Stoian N-A (2020) Machine learning for anomaly detection in IoT networks: Malware analysis on the IoT-23 data set 4. García-Teodoro P, Díaz-Verdejo J, Maciá-Fernández G, Vázquez E (2009) Anomaly-based network intrusion detection: techniques, systems and challenges. J Comput Secur 28(1–2):18– 28
5. García S, Grill M, Stiborek J, Zunino A (2014) An empirical comparison of botnet detection methods. J Comput Secur 45:100–123 6. Zeidanloo HR, Manaf AA (2009) Botnet command and control mechanisms. In: 2009 international conference on computer and electrical engineering. ICCEE 2009, vol 1, pp 564–568 7. Gu G, Zhang J, Lee W (2008) BotSniffer: detecting Botnet command and control channels in network traffic. In: Proceeding of 15th annual network and distributed system security symposium, vol 53(1), pp 1–13 8. Liu Z, Shaver A, Roy K, Yuan X, Khorsandroo S (2020) Anomaly detection on IoT network intrusion using machine learning 9. Livadas C, Walsh R, Lapsley D, Strayer WT (2006) Using machine learning techniques to identify botnet traffic. In: Proceeding on conference of local computer networks, LCN vol 1, pp 967–974 10. Nagisetty A, Gupta GP (2019) Framework for detection of malicious activities in IoT networks using Keras deep learning library 11. Palla TG, Tayeb S (2021) Intelligent Mirai Malware detection in IoT devices 12. Liu Q, Krishnan S, Neyaz A (2021) IoT network attack detection using supervised machine learning 13. Aljabri M, Aljameel SS, Mohammad RMA, Almotiri SH, Mirza S, Anis FM, Aboulnour M, Alomari DM, Alhamed DH, Altamimi HS (2021) Intelligent techniques for detecting network attacks: review and research directions 14. Austin M (2020) IoT malicious traffic classification using machine learning, 2021. A labeled dataset with malicious and benign IoT network traffic [Online]. Available: https://www. stratosphereips.org/datasets-iot23
Emotion Recognition from Speech Using Convolutional Neural Networks Bayan Mahfood, Ashraf Elnagar, and Firuz Kamalov
Abstract The human voice carries a great deal of useful information. This information can be utilized in various areas such as call centers, security and medicine among many others. This work aims at implementing a speech emotion recognition system that recognizes the speaker’s emotion using a deep learning neural network based on features extracted from audio clips. Different datasets including the RAVDESS, EMO-DB, TESS and an Emirati-based dataset were used to extract features. The features of each dataset were used as the input that would be fed into a convolution deep neural network for emotion classification. Several models were implemented based on extracted features from each dataset. The top three models that produced the best results were reported. Keywords Emotions classification · Speech · MFCC · CNN
B. Mahfood · A. Elnagar (B) Department of Computer Science, College of Computing and Informatics, University of Sharjah, Sharjah, UAE e-mail: [email protected] B. Mahfood e-mail: [email protected] F. Kamalov Department of Electrical Engineering, Canadian University Dubai, Dubai, UAE e-mail: [email protected]
1 Introduction Speech emotion recognition (SER) is an essential part of audio classification and Human–Computer Interaction (HCI). It involves recognizing emotions by analyzing the speech signal [1]. Understanding what a speaker is feeling during a conversation is of great importance for selecting the appropriate response. SER can have many applications, including improving virtual assistants, adaptive teaching according to the student's emotions, therapy sessions and many more. The task of accurate SER is considered challenging because emotions are not easy to categorize. Each person
perceives emotions differently and even humans can misinterpret them [2]. Additionally, audio data carries a lot of information and various aspects must be considered when developing a SER system. These include aspects such as background noise and quality of audio as well as aspects related to the speaker such as gender and age [1]. Building a SER system consists of mainly four phases: data collection, data preprocessing and feature extraction, speech emotion recognition and evaluation. Data for SER usually consists of short audio clips that are either virtually created, natural or semi-natural: speech samples created by speakers or actors saying the same or different text several times in different emotions [1]. With the growing interest in SER, databases have been created, each having different features such as gender, language, accent, duration of the audio clip and the emotion label. Some of the most used open access datasets include the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) [3] and the Toronto emotional speech set (TESS) [4]. In the second phase, the audio data is converted to a format compatible with the selected machine learning algorithms. Each audio file is converted to an image known as a spectrogram using the short-term Fourier transform. This image is a two-dimensional visual representation, in which the x-axis indicates time, the y-axis lists frequency, and the color of each point signifies the energy amplitude at a given time [2]. Additional preprocessing techniques such as framing, silence removal and pre-emphasis can be applied depending on the characteristics of the utilized dataset. The produced spectrograms are then used to obtain meaningful features from the samples. The most used feature in audio classification is the Mel-Frequency Cepstral Coefficients (MFCC). It consists of a set of 20 features that represent the vocal tract information of the speaker. Other features include the Mel-scaled spectrogram, Chromogram, Spectral contrast features and Tonnetz representations, among many others [5]. After the required features have been extracted, they are fed into a machine learning model that will classify each audio track into an emotion class. Over the years, different techniques have been adopted for SER. Initially, traditional machine learning approaches were used, and good results were achieved; however, recently the focus has shifted toward deep learning. This is due to several reasons, namely, the ability of deep models to identify complex features from raw data. Following the speech emotion classification phase, the performance of the selected machine learning model can be evaluated through several metrics such as accuracy and loss [6]. To situate this research in the literature, Sect. 2 will have an outline of some previous literature. Section 3 includes a description of the datasets and the extracted features. Next, we illustrate the deep learning model in Sect. 4. Then Sect. 5 presents the experimental results as well as a discussion. At the end, Sect. 6 presents the conclusions of this research.
2 Literature Review SER has been well demonstrated over the years. Studies on SER began in the early 2000s where traditional machine learning techniques including Gaussian Mixture
Model (GMM), Hidden Markov Models (HMM) and Support Vector Machine (SVM) were used to classify the emotion from an audio clip [7, 8]. However, these techniques required a lot of preprocessing and detailed feature engineering [1]. Although they produced some good results, there still was a need for more reliable techniques that can perform well in real-time applications. This led to the shift toward deep learning. Deep learning techniques have the ability to learn complex features and structures incrementally without the need of a domain expert [9]. This poses one of the biggest benefits over conventional machine learning techniques. Subsequent studies targeted the use of different deep learning models such as Convolutional Neural Networks (CNNs) [5], Deep Neural Networks (DNNs) [9], Recurrent Neural Network (RNN) [2] and Long Short-Term Memory (LSTM) [10] as well as different ensembles [11–13]. Results showed that these deep learning techniques were able to produce better results in comparison to traditional machine learning methods [1, 14]. Although previous studies have led to a better understanding of the SER problem, there is still room for improvement.
3 Datasets and Feature Extraction The datasets used in this work include Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) [3], Toronto Emotional Speech Set (TESS) [4], EMODB [15] and an Emirati dataset [16, 17]. In this section, each dataset will be described, and the set of features selected for feature extraction will be explained.
3.1 Datasets The first dataset used in this work was RAVDESS [3]. This dataset comprises 1440 audio samples recorded by 12 male actors and 12 female actors fluent in the English language. Each audio file is the recording of a sentence spoken in one of eight emotions. The speech emotion classes in this dataset are neutral, happy, calm, sad, angry, fearful, surprised and disgust. The emotions are distributed as follows: 96 audio files with the emotion neutral and 192 audio files for each of the remaining 7 emotions. The second dataset used in this work is the open access Berlin Database of Emotional Speech, EMO-DB [15]. This is another widely used emotional speech dataset which consists of 535 audio files recorded by 10 professional speakers, 5 of which are male and the other 5 female. This dataset has seven emotion classes including neutral, happiness, sadness, anger, boredom, fear/anxiety and disgust; their distribution is shown in Table 1.

Table 1 EMO-DB emotions distribution
Emotion  Neutral  Happy  Sad  Angry  Bored  Fear  Disgust
Count    79       71     62   127    81     69    46

The third dataset employed in this work is the Toronto emotional speech set, TESS [4]. In this dataset, audio files are recorded by two actresses fluent in English. Each recording is a sentence targeting a specific word from a set of 200 words portraying the following seven emotions: angry, disgust, pleasant surprise, happy, fear, neutral, sad. This dataset consists of a total of 2800 audio files, where all emotion classes equally contain 400 recordings. The final dataset used in this work is an Emirati dataset [16, 17] that consists of 13,392 audio files recorded in Arabic, specifically the Emirati dialect. This dataset consists of 6 emotion classes, four of which contain 2232 audio files, while the other two classes contain 2231 and 2230 audio files, respectively.
3.2 Feature Extraction As with all machine learning models, feature extraction is one of the main aspects that determines how well a model is trained. Since audio data carries a lot of information, it is extremely important to extract a set of features that can reliably capture the emotion in a person's voice. To achieve this, the Librosa Python library for music and audio analysis was used to extract the set of features that is fed into a deep neural network. The features explored in this work include:
• Mel-frequency Cepstral Coefficients (MFCCs)
• MFCC-delta, MFCC-delta2
• Mel-scaled spectrogram
• Chromogram
• Spectral contrast
• Tonnetz representation.
The above spectral features are obtained by applying Fourier transform to convert time-based audio signals into frequencies. These features are useful for identifying pitch, notes, rhythm and melody. Two of the widely used features in audio classification are MFCCs and Melscaled spectrogram [2, 18]. They have the ability to precisely represent the shape of the vocal tract. MFCC-delta and MFCC delta-delta coefficients, unlike MFCCs which are measured at a given time, are the approximate first and second derivatives of the signal used to get a better description of the transitions between phonemes [2, 19]. The Chromogram feature on the other hand is useful for analyzing pitch classes and capturing harmonic characteristics [5]. In this work, the Chromogram feature was captured using short-time Fourier transform (STFT). Similarly, Spectral contrast is another feature that is used for reflecting harmonic characteristics by considering the variation in frequency energy [20]. The last feature considered in this work is the Tonnetz representation. It is also used for extracting pitch classes and relations [21].
Fig. 1 Audio sample at 44,100 Hz
Capturing various audio features such as pitch, timbre and harmony provides a more detailed depiction of an audio sample. This information can improve the performance of speech emotion recognition models. In this work, several combinations of features were extracted from each employed dataset (Fig. 1).
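A sketch of this feature extraction with Librosa is given below. It is illustrative rather than the authors' exact code: the choice of 40 MFCC coefficients and the default 128 Mel bands is an assumption made so that the concatenated vector has the 193 entries used as model input in Sect. 5, and the file name is a placeholder.

```python
# Illustrative sketch (not the authors' code): extracting and stacking the features
# listed above into one fixed-length vector per audio clip.
import numpy as np
import librosa

def extract_features(path: str) -> np.ndarray:
    y, sr = librosa.load(path, sr=None)            # keep the native sample rate
    stft = np.abs(librosa.stft(y))                 # magnitude spectrogram
    mfccs = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40), axis=1)
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr), axis=1)
    mel = np.mean(librosa.feature.melspectrogram(y=y, sr=sr), axis=1)
    contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sr), axis=1)
    tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr), axis=1)
    # 40 MFCC + 12 chroma + 128 Mel + 7 contrast + 6 tonnetz = 193 values
    return np.hstack([mfccs, chroma, mel, contrast, tonnetz])

features = extract_features("example.wav")   # placeholder file name
print(features.shape)                        # (193,)
```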
4 Deep Learning Models A convolutional neural network was used for classifying the emotion of audio files based on a set of selected features. The baseline model in this work consists of one-dimensional convolutional layers, dropout layers to prevent overfitting, batch normalization layers to make the model more stable and activation layers which define how the weighted sum of inputs to a node or layer are transformed into the output of a node or layer. The first layer in this baseline model receives the extracted features ‘N’ as input in the shape N × 1. This layer has 256 filters with kernel size 3 × 3 and stride 1. In the next layer, batch normalization is applied. Then the input is sent to the activation layer where the Rectifier Linear Units (ReLU) activation function is used. The following 1D convolutional layer consisted of 128 filters with the kernel size 5 × 5 and stride 1. Then an activation layer is applied with the same (ReLU) function. Another convolutional layer and activation layer are applied followed by a dropout layer with the rate 0.1. Next, batch normalization is applied followed by a max pooling of size 2 for selecting the most prominent features. Another two convolution and activation layers are then implemented with the latter having a kernel size of 7 × 7. Batch normalization is applied next followed by a dropout layer of size 0.2. Another max pooling layer is added with size 3 followed by a flattening layer. The output of the flattening layer is forwarded to a fully connected dense layer where the number of units are the number of classes in each dataset. Finally, softmax activation is applied. The model made use of the optimizer called RMSprop with the learning rate of 0.00001 decay rate of 1e−6.
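A minimal Keras sketch of an architecture along these lines is shown below. It is an interpretation, not the authors' implementation: the 3 × 3, 5 × 5 and 7 × 7 kernels are read as length-3, 5 and 7 kernels for the one-dimensional convolutions, the filter counts of the later convolutional layers and the "same" padding are assumptions (the text only specifies 256 and 128 for the first two layers), and the learning-rate decay of 1e−6 would be added through a schedule depending on the Keras version.

```python
# Illustrative sketch (not the authors' code) of the baseline 1D-CNN described above.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_baseline_cnn(n_features: int, n_classes: int) -> tf.keras.Model:
    inputs = layers.Input(shape=(n_features, 1))            # N x 1 feature vector
    x = layers.Conv1D(256, kernel_size=3, strides=1, padding="same")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Conv1D(128, kernel_size=5, strides=1, padding="same")(x)
    x = layers.Activation("relu")(x)
    x = layers.Conv1D(128, kernel_size=5, strides=1, padding="same")(x)  # filters assumed
    x = layers.Activation("relu")(x)
    x = layers.Dropout(0.1)(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling1D(pool_size=2)(x)
    x = layers.Conv1D(64, kernel_size=5, strides=1, padding="same")(x)   # filters assumed
    x = layers.Activation("relu")(x)
    x = layers.Conv1D(64, kernel_size=7, strides=1, padding="same")(x)   # filters assumed
    x = layers.Activation("relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.2)(x)
    x = layers.MaxPooling1D(pool_size=3)(x)
    x = layers.Flatten()(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-5),
                  loss="sparse_categorical_crossentropy",  # assumes integer class labels
                  metrics=["accuracy"])
    return model

model = build_baseline_cnn(n_features=193, n_classes=8)    # e.g. RAVDESS feature set 1
model.summary()
```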
Fig. 2 RAVDESS on feature sets 1
5 Results and Discussion To accommodate the variations in the chosen datasets, three CNN models with slightly varying numbers of layers were designed and tested. The results of training the models on the four previously listed datasets will be discussed in this section (Fig. 2).
5.1 Model 1 (Baseline Model) The first model trained in this work is the baseline model mentioned in the previous section. To train this model, two feature sets extracted from the RAVDESS dataset were used. The first feature set consisted of MFCCs, Mel-scaled spectrogram, Chromogram, Spectral contrast and Tonnetz representation, which formed 193 × 1 arrays
as the model input, whereas the second feature set consisted of Mel-frequency Cepstral Coefficients (MFCCs), MFCC-delta, MFCC-delta2, Mel-scaled spectrogram and Chromogram, forming 191 × 1 arrays as the model input. Fivefold cross-validation was applied, randomly splitting the data into five equal groups. The data was split such that 90% of the data was used for training and 10% for testing. Additionally, the emotion classes in both the training and testing sets were equally distributed to prevent any emotion class from being trained more than the rest. After 400 epochs, the first feature set achieved an average accuracy of 68.33% and a maximum accuracy of 72.91%. The produced confusion matrix in Fig. 2 shows that more expressive emotions such as anger, surprise and fear were best classified, whereas softer emotions such as sadness and neutral had more classification errors. The second feature set achieved an average accuracy of 68.19% and a maximum accuracy of 70.14%. Although the average accuracy scores of both feature sets are very close, we can see from the confusion matrix of each that feature set 1 performed better at classifying each emotion (Fig. 3).
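A sketch of this splitting scheme is given below; it is illustrative only, assuming a feature matrix features and integer labels labels, and the random seed is arbitrary.

```python
# Illustrative sketch (not the authors' code): a stratified 90-10 split plus
# fivefold cross-validation with equally represented emotion classes.
from sklearn.model_selection import train_test_split, StratifiedKFold

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.1, stratify=labels, random_state=42)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(X_train, y_train)):
    print(f"fold {fold}: {len(train_idx)} train samples, {len(val_idx)} validation samples")
```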
Fig. 3 RAVDESS on feature sets 2
5.2 Model 2 The second model created in this work is a simpler version of the baseline model with one less 1D convolutional layer. It consists of a 1D convolutional layer with a kernel size of 3 × 3 and a stride size of 1, followed by a batch normalization layer and a ReLU activation layer. Then another 1D convolutional layer is applied with a kernel size of 5 × 5 and a stride size of 1. A dropout layer of size 0.1 is then added, followed by a max pooling layer of size 2. Two more 1D convolutional layers and activation layers are added, the latter having a batch normalization layer before the activation layer. A dropout layer of size 0.2 is applied, followed by a flattening layer. The output of the flattening layer is sent to a fully connected dense layer, and softmax activation is applied. To train this model, the EMO-DB and the Emirati datasets were used. The features extracted from EMO-DB included Mel-frequency Cepstral Coefficients (MFCCs), Mel-scaled spectrogram, Chromogram, Spectral contrast and Tonnetz representation, which formed 193 × 1 arrays as the model input. From the Emirati dataset, only MFCCs were extracted, forming 191 × 1 arrays as the model input (Fig. 4).
Fig. 4 EMO-DB dataset
The same approach of fivefold cross-validation and random stratified data splitting was applied to both datasets. The models were trained for 200 epochs on each dataset. Both datasets achieved higher accuracy scores than the previous model trained on the RAVDESS dataset. EMO-DB resulted in an average accuracy of 80.74% and a maximum accuracy of 85.19%. From the confusion matrix, we can observe that this dataset performed well at classifying most emotions; however, the testing data was very small and the distribution of samples across the emotion classes was not equal. As shown in the confusion matrix, anger was the most correctly classified emotion, but this could be because it had the most samples in comparison with the rest of the classes. The Emirati dataset achieved similar results, where it obtained an average accuracy of 81.28% and a maximum accuracy of 83.94%. Although only one feature was considered for the Emirati dataset, it performed the best out of all the datasets at classifying almost all emotions equally well. This can be due to the large number of samples in the dataset, with a total of 13,392 audio samples (Fig. 5).
Fig. 5 Emirati dataset
Fig. 6 TESS dataset
5.3 Model 3 The third model is the largest model of the three with several additional layers. Its first layer is a 1D convolutional layer with kernel size of 5 and stride size of 1. It is followed by a batch normalization layer, ReLU activation layer then another 1D convolutional layer and activation layer. The first dropout layer with size 0.1 is then applied followed by batch normalization and ReLU activation layers. A max pooling layer size 8 is added. The output of that layer is sent into three 1D convolutional layers then activation layers with a batch normalization layer just before the third activation layer. A dropout layer of size 0.2 is applied followed by another 1D convolutional layer. The output is then flattened and sent to a fully connected dense layer. The last two layers in this model are a batch normalization layer and a softmax activation layer (Fig. 6). The set of five features consisting of MFCCs, Mel-scaled spectrogram, Spectral contrast, Chromogram and Tonnetz were obtained from the TESS dataset and were used to train this model. Using the same cross validation and data splitting approaches, the data was divided into 90% for training and 10% for testing. The model resulted in an average accuracy of 99.75% and a maximum accuracy of 100%.
Table 2 Performance analysis
Dataset/model             Features                                                                  Fivefold average accuracy (%)   Maximum accuracy (%)
RAVDESS/Model 1           MFCCs, Mel-scaled spectrogram, Chromogram, Spectral contrast, Tonnetz     68.33                           72.91
RAVDESS/Model 1           MFCCs, MFCC-delta, MFCC-delta2, Mel-scaled spectrogram, Chromogram        68.19                           70.14
EMO-DB/Model 2            Feature set 1                                                             80.74                           85.19
TESS/Model 3              Feature set 1                                                             99.75                           100
Emirati dataset/Model 2   MFCCs                                                                     81.28                           83.94
From the confusion matrix, we can see that all classes were well classified. We can see from other literature such as in [22], that this dataset seems to always have better results than other datasets. This could be due to the skill of the actors recording the audio samples as well as the good size of the dataset. However, this dataset might not be the best representation of real life samples as only one word is targeted in the recording instead of a sentence portraying a real situation. As the results imply, using deep neural networks for speech emotion recognition has great potential. However, as with all deep learning applications, the process of fine tuning the hyper-parameters of a model is considered to be a difficult task. The complex nature of neural networks allows for there to be many parts that can be altered and improved such as number of layers, type of layers and parameters of each layer, and this is especially an issue with large datasets as training time becomes very long (Table 2).
6 Conclusion One of the main areas of audio classification is speech emotion recognition (SER). SER can be used in many applications and can improve many services. The ability to understand how a person is feeling allows for producing a more natural response and a better understanding of the thoughts a person is communicating. However, SER is still considered to be a complex task. It involves extracting features from audio that
can best represent emotion, and then using a classifier to identify the emotion class of an audio track. In this work, different combinations of features were obtained from the RAVDESS, EMO-DB, TESS and Emirati datasets, respectively. Several models were implemented based on the features extracted from each dataset, and the top three models which produced the best results were selected. For future work, different combinations of features can be used for emotion classification in an attempt to improve classification results. Additionally, data augmentation techniques can be utilized to unify dataset sizes and to alter the data such that it mimics more realistic samples. The work can also be extended to employ the state-of-the-art wav2vec2 for SER: although wav2vec2 was developed for automatic speech recognition, with some fine-tuning it can also be used for SER.
References 1. Abbaschian BJ, Sierra-Sosa D, Elmaghraby A (2021) Deep learning techniques for speech emotion recognition, from databases to models. Sensors (Switz) 21(4):1–27. https://doi.org/ 10.3390/s21041249 2. Akçay MB, O˘guz K (2020) Speech emotion recognition: emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers. Speech Commun 116:56–76. https://doi.org/10.1016/j.specom.2019.12.001 3. Livingstone SR, Russo FA (2018) The Ryerson audio-visual database of emotional speech and song (RAVDESS): a dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5):e0196391 4. Toronto emotional speech set (TESS) | TSpace repository. https://tspace.library.utoronto.ca/ handle/1807/24487 5. Issa D, Fatih Demirci M, Yazici A (2020) Speech emotion recognition with deep convolutional neural networks. Biomed Sig Process Control 59:101894. https://doi.org/10.1016/j.bspc.2020. 101894 6. Qayyum ABA, Arefeen A, Shahnaz C (2019) Convolutional neural network (CNN) based speech-emotion recognition. In: 2019 IEEE international conference on signal processing, information, communication and systems (SPICSCON). IEEE, pp 122–125 7. Nwe TL, Foo SW, De Silva LC (2003) Speech emotion recognition using hidden Markov models. Speech Commun 41(4):603–623. https://doi.org/10.1016/S0167-6393(03)00099-2 8. Schuller B, Rigoll G, Lang M (2003) Hidden Markov model-based speech emotion recognition. In: Proceedings—IEEE international conference on multimedia and expo, vol 1, pp I401–I404. https://doi.org/10.1109/ICME.2003.1220939 9. Khalil RA, Jones E, Babar MI, Jan T, Zafar MH, Alhussain T (2019) Speech emotion recognition using deep learning techniques: a review. IEEE Access 7:117327–117345. https://doi.org/10. 1109/ACCESS.2019.2936124 10. Zhao J, Mao X, Chen L (2019) Speech emotion recognition using deep 1D & 2D CNN LSTM networks. Biomed Sig Process Control 47:312–323. https://doi.org/10.1016/j.bspc.2018.08. 035 11. Mustaqeem, Kwon S (2020) CLSTM: deep feature-based speech emotion recognition using the hierarchical convlstm network. Mathematics 8(12):1–19. https://doi.org/10.3390/ math8122133 12. Lataifeh M, Elnagar A, Shahin I, Nassif AB (2020) Arabic audio clips: identification and discrimination of authentic cantillations from imitations. Neurocomputing 418:162–177
13. Pepino L, Riera P, Ferrer L (2021) Emotion recognition from speech using wav2vec 2.0 embeddings. In: Proceedings of the Annual conference of the international speech communication association. In: INTERSPEECH, vol 1, pp 551–555. https://doi.org/10.21437/Interspeech. 2021-703 14. Lataifeh M, Elnagar A (2020) Ar-DAD: arabic diversified audio dataset. Data Brief 33:106503 15. Burkhardt F, Paeschke A, Rolfes M, Sendlmeier WF, Weiss B et al (2005) A database of German emotional speech. In: Interspeech, vol 5, pp 1517–1520 16. Shahin I, Nassif AB, Elnagar A, Gamal S, Salloum S, Aburayya A (2021) Neurofeedback interventions for speech and language impairment: a systematic review. J Manag Inf Decis Sci 24(1S):1–30 17. Shahin I, Nassif AB, Nemmour N, Elnagar A, Alhudhaif A, Polat K (2021) Novel hybrid DNN approaches for speaker verification in emotional and stressful talking environments. Neural Comput Appl 33(23):16033–16055 18. Nassif AB, Alnazzawi N, Shahin I, Salloum SA, Hindawi N, Lataifeh M, Elnagar A (2022) A novel RBFNN-CNN model for speaker identification in stressful talking environments. Appl Sci 12(10):4841 19. Nassif AB, Shahin I, Elnagar A, Velayudhan D, Alhudhaif A, Polat K (2022) Emotional speaker identification using a novel capsule nets model. Exp Syst Appl 116469 20. Jiang DN, Lu L, Zhang HJ, Tao JH, Cai LH (2002) Music type classification by spectral contrast feature. In: Proceedings—2002 IEEE international conference on multimedia and expo. ICME 2002, vol 1, pp 113–116. https://doi.org/10.1109/ICME.2002.1035731 21. Humphrey EJ, Cho T, Bello JP (2012) Learning a robust Tonnetz-space transform for automatic chord recognition. In: 2012 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 453–456 22. Dolka H, Arul Xavier MV, Juliet S (2021) Speech emotion recognition using ANN on MFCC features. In: 2021 3rd international conference on signal processing and communication. ICPSC 2021, pp 431–435. https://doi.org/10.1109/ICSPC51351.2021.9451810
Performance Evaluation of Contextualized Arabic Embeddings: The Arabic Sentiment Analysis Task Fatima Dakalbab and Ashraf Elnagar
Abstract Sentiment analysis is a growing topic of study that straddles several disciplines, consisting of machine learning, natural language processing, and data mining. Its goal is to automate the extraction of conveyed concepts from a text. Much research has been undertaken in this field, particularly on English texts, because of their wide uses, whereas other languages, like Arabic, have gotten far less attention. The objective is to implement a multi-classification sentiment analyzer for Arabic text while using four recently published contextualized embedding models. The proposed work is an intrinsic evaluation metric of the four models. The results of the experiments revealed that MARBERT and ARABERT models are found to be the best.
1 Introduction Sentiment analysis (SA) is an issue in natural language processing (NLP) that is receiving a lot of attention. It is the challenge of detecting and recognizing the sentiment of a written text automatically, generally by classifying a piece of text as a positive, negative, or neutral feeling. People may now communicate their ideas, thoughts, and feelings in public thanks to the rise of social media in recent years. It aids huge corporate data analysts in evaluating public opinion, doing extensive market research, tracking brand and product popularity, and comprehending customer experiences. To categorize feelings inside a sentence or phrase given a weighted score, a SA system for text utilizes both NLP and machine learning approaches. Despite its importance, SA seems to be the subject of several studies concentrating on the English language and other languages. However, few works have been on F. Dakalbab · A. Elnagar (B) Department of Computer Science, College of Computing and Informatics, University of Sharjah, Sharjah, UAE e-mail: [email protected] F. Dakalbab e-mail: [email protected]
sentiment analysis in Arabic [1]. Researchers, on the other side, are increasingly interested in studies relating to sentiment analysis in Arabic, as the Arabic language has grown in popularity in the online world over the previous decade. Despite their effectiveness in several NLP tasks, a fundamental constraint of conventional models for word embedding such as Word2Vec is that they create a vector for every word, irrespective of the contextual meaning and homonymy that several terms could include. They merge the contexts of distinct meanings of a word in its embeddings, discarding contextual disparities. Word token vector representation, from the other side, would consider the meanings of the words that could have different meanings according to their context. For instance, the word “saw” could indicate the tool used in cutting wood, or it could be the past tense of the verb see. Therefore, contextualized word embeddings are word embeddings that incorporate context differences into consideration. They express each meaning for each word in an independent vector which preserves its distinct interpretation. One of the most successful methodologies in this field is transfer language learning models. Several contextualized models that are used for various NLP tasks have been shown to be useful. Driven from Google’s BERT architecture, AraBERT [2] and MARBERT [3] are models which are retrained on Arabic phrases and are categorized as contextualized language embeddings. They include 12 attention layers, a maximum sequence length of 512, 12 attention heads, and 768 hidden dimensions. AraBERT is also built on Wikipedia, with a vocabulary of 64K words, 2.5 billion tokens, and 135 million parameters. MARBERT, on the other hand, is based on Arabic tweets and has 2.5 billion words, 15.6 billion tokens, and 163 million parameters. The following is how the remaining paper is structured: Sect. 2 discusses the problem definition. Next, Sect. 3 represents the related work in sentiment analysis for Arabic texts. As for Sect. 4, we present a comparison of the Arabic-based contextualized models. Furthermore, in Sect. 5 we share the experiment in terms of the dataset used, preprocessing, training and implementation, results, and performance analysis. Lastly, we conclude our work and summarize it in Sect. 5.4.
2 Problem Definition SA is a subfield of natural language processing that studies people’s feelings, thoughts, emotions, and behaviors regarding a variety of things, including brands, activities, corporations, themes, events, and concerns. Analyzing hundreds of customer reviews, for instance, might provide important input on your price or value propositions. SA determines if a text has bad, good, or balanced feelings. Machine learning and NLP are used in this type of analytics for textual tasks. SA may be known as “opinion mining” or “emotion artificial intelligence”. Polarity categorization is an important part of sentiment analysis. The overall mood expressed by a certain sentence, statement, or term is referred to as polarity. Such polarity could be valued numerically as a “sentiment score”. This score, for illustration, might be a value
between − 100 and 100, with 0 signifying a neutral attitude. This score might be calculated for the complete text or a single sentence. For a given use case or a specific application, the scoring of the sentiment could be as precise as needed. For instance, scoring could be achieved using specific categories “strongly negative”, “negative”, “neutral”, “positive”, and “strongly positive”, which can be mapped to distinct emotions, such as anger, satisfaction, happiness, and so forth. Sentiment analysis is an effective marketing technique because it allows product organizations to know client sentiments and use them in marketing initiatives. It is important for brand and service recognition, customer retention, satisfaction, sales and marketing effectiveness, and brand engagement. Analyzing customer psychology may assist sales teams and client service managers in making more precise changes to their core product. As a result, it will improve customer service and promote the brand’s products and services.
3 Literature Review The issue of sentiment analysis has been extensively researched. Researchers have used a variety of methodologies to analyze the feelings of Arabic narratives. Chouikhi et al. [4] developed a method based on the BERT model with a batch size of 16/32, epochs of 10/20/50, and eight layers, as well as a softmax activation function. In addition, their strategy included hyperparameter optimization via randomized searching. The findings illustrate how good their technique is relative to classification performance and accuracy when evaluated to other contextualized models across many datasets. A sentiment and sarcasm analysis of Arabic tweets was given in [5]. The authors evaluated seven BERT-based models and examined the issues that led to misclassifications. They assert, for example, that emoji processing has a significant influence on classification because when the authors did not process the emojis, certain tweets were misclassified since some emojis distorted the polarity of emotion and switched their classification. Similarly, on the other hand, [6] tackled the same concept using other contextual embedding models known as AraELECTRA and AraBERT. Their test results were lower compared to the other work where they scored 72 Macro F1 scores on sarcasm detection and 65.31 Macro F1 scores on sentiment analysis. Obied et al. [7] used a capsule neural network, which is a derivative of an artificial neural network, to improve the model’s hierarchical interactions. The authors utilized this capsule on a BERT multi-language model with expectation-maximization, which permitted them to use fewer parameters and save training time by an order of magnitude. However, when compared to other BERT-based models, their model performed the worst. Bashmal and Alzeer [8] investigated a method for detecting sarcasm and emotion in Arabic text. They used two BERT models to extract typical tweet embeddings: first, researchers fine-tuned AraBERT for sarcasm, subsequently sentence-BERT for contrastive learning. The authors were also influenced by how the brain interprets the surface and latent meanings of sarcastic tweets, so they merged
sentence embedding with the refined AraBERT model to improve the effectiveness of the model even more. The accuracy of their model was 78.30, with a 59.89 F1 score for sarcasm. Al-Twairesh [9] demonstrated the emergence of Arabic contextual embedding models through experiments. The author first looked at the classic method of analyzing feelings, which is based on the term frequency-inverse document frequency, before moving on to a more complex word embedding called word2vec, and ultimately to the most up-to-date BERT models. Yafoz and Mouhoub [10] conducted a sentiment analysis of real estate and automotive appraisals. Deep learning models and BERT models were used in their research. According to the authors’ research, using deep learning models with the BERT model to assess attitudes is more practical than using alternative models for both vehicle and real estate datasets. Habbat et al. [11] similarly present Arabic Twitter sentiment analysis using the AraBERT model. The researchers integrated three datasets and analyzed the AraBERT model against static word embedding approaches, namely AraVec and FastText, as well as other classification models like GRU, LSTM, BiLSTM, and CNN. Their results demonstrate that AraBERT and the hybrid network provide the best accuracy. Several reviews have been published to overview the wide range of techniques applied in this domain for the Arabic language. Elnagar et al. [12–16] provided a systematic review of the sentiments including Arabic dialects. The result of their systematic review shows that SVM and Naïve Bayes are the widest machine learning techniques in classifying Arabic dialect sentiments. The analysis showed that Naïve Bayes and SVM are the most frequent algorithms in sentiment analysis. The authors present the challenges faced in this domain such as the lack of lexicons compared to the English language. Additionally, Arabizi is a trend in social media where users use Latin which maps to Arabic words. Furthermore, Boudad et al. [17] present a review of the literature in Arabic sentiment analysis. The authors declare that opinion holder extraction, spam detection, and aspect-based analysis of sentiments are the least studied domains in SA. On the contrary, alOwisheq et al. [18] conducted a review of the resources available for sentiment analysis; however, recent resources are available that open the door for researchers to update this work. In this work, we provide an overview of the famous Arabic language modes, and then a performance comparison of some of these contextualized models across two Arabic datasets is presented.
4 Contextualized Embedding Models Because of the emergence of contextualized language models (LM), for example, contextualized word and phrase representations, such as ELMO and BERT, many well-known NLP tasks have experienced a major advance [19–22]. Pre-trained language models gained popularity after Jacob Devlin and his team from Google
Table 1 Comparison between Arabic language models [2]
Model           Data source                                                     Tokens numbers   Tokenization     Vocabulary     Parameters
AraBERT         Three resources                                                 2.5B/2.5B        SentencePiece    60K/64K        135M
ARBERT          Six resources                                                   6.2B/6.2B        WordPiece        100K/100K      163M
mBERT           Wikipedia                                                       153M/1.5B        WordPiece        5K/110K        110M
MARBERT         Tweets in Arabic                                                15.6B/15.6B      WordPiece        100K/100K      163M
XLM-RB          CommonCrawl                                                     2.9B/295B        SentencePiece    14K/250K       270M
GigaBERT        Wikipedia, Oscar, Gigaword                                      10B              WordPiece        50K/21K/26K    125M
Arabic ALBERT   Wikipedia, Oscar                                                4.4B             SentencePiece    64K/125K       –
QARiB           Gigaword, OPUS, Abulkhair Arabic Corpus                         14B              –                64K            –
XLM-Roberta     CommonCrawl                                                     2.5TB            SentencePiece    250K           125M
AraGPT          OSCAR, Wikipedia, Arabic Corpus, OSIAN, Assafir news articles   –                –                64K            –
unveiled the BERT model in 2018. “Bidirectional Encoder Representations from Transformers”, or BERT, is a fundamental transformer-based machine learning technique for pre-training in natural language processing applications. Several models are established with this model as a main based for it. Several models have been developed using this paradigm as the foundation. Several models of Arabic language contextualized are built using BERT architecture. Table 1 provides an analysis between these models which are shown in the subsequent section.
4.1 Models Based on BERT • AraBERT [2] is the most well-known pre-trained Arabic language models. The AraBERT basic setup is the same as BERT. It is assessed using several tasks, such as question and answer, SA, and named entity identification. Furthermore, there are two variants of the model, with the contemporary version employing the Farasa segmenter. • MARBERT [3]: MARBERT is concerned with both dialectal Arabic and modern standard Arabic. There are several dialects of Arabic. It is based on a massive collection of Arabic tweets. It uses the same network design as BERT, but it does not forecast the following phrase because tweets are relatively short. The training data of it reflects a key component of dialectal Arabic variety. • GigaBERT [23]: GigaBERT is a data transport paradigm that specializes in information retrieval applications. It has been trained on large datasets including
738
F. Dakalbab and A. Elnagar
Wikipedia, Oscar, and Gigaword. It is a multilingual BERT that has been adapted for Arabic and English. • Multilingual BERT (mBERT) [24, 25]: mBERT is a model that supports 104 languages. Furthermore, it is an extension of the BERT model. • Arabic ALBERT [26]: It is a paraphrase of ALBERT. It has been trained using the Arabic category of the unshuffled corpus of OSCAR and Wikipedia for the Arabic language. In terms of the number of characteristics, it comes in three varieties including base, large, and xlarge. • QARiB [27]: A collection of 180 million lines of text and 420 million tweets were used to train the QCRI Arabic and Dialectal BERT (QARiB) model. The data for the tweets was acquired using the Twitter API as well as a language filter. It is trained on a mix of Arabic GigaWord, OPUS corpus, and Abulkhair Arabic for the text data.
4.2 Other Arabic Language Contextualized Models • AraELECTRA [28]: ELECTRA is a technique for learning self-supervised representations of language. It may be used for a small amount of computing power. Comparable to the discriminator in a generative adversarial network, ELECTRA models are trained to identify “genuine” input tokens from “fake” input tokens produced with other neural network. • XLM-Roberta (XLM-R) [29]: RoBERTa has a multilingual variant called XLMRoberta. It has been pre-trained on 2.5 TB of cleaned crawled data with 100 languages. It was pre-trained on raw texts solely, without manual labeling which explains the reason it utilizes a huge number of public data, and it used an automated mechanism to produce inputs and labels out of those phrases. It learns an internal depiction of 100 languages, which can subsequently be used to extract characteristics that are important for later tasks. • AraGPT2 [30]: The AraGPT model is a transformer decoder with modeling goals that was trained with informal language. The model was trained using 77 GB of Arabic phrases, the same data which was used to train AraELECTRA as well as AraBERT. There are four main varieties of it: mega, large, medium, and base.
5 Experimental Results The following experiments have been performed utilizing Google Colab with Tesla T4 GPU and high processing RAM. Moreover, contextualized models were loaded from the hugging face transformers library [31]. In addition, Python was used as
Performance Evaluation of Contextualized Arabic Embeddings …
739
the main programming language, RegEX, NLTK, and Pandas were utilized for data cleaning and processing, and several machine learning libraries and tools were used as well including scikit-learn, PyTorch, TensorFlow, Kears, and Matplot.
5.1 Dataset Collection Our experiments were performed on two different datasets. • Hotel Arabic Review Dataset (HARD) [32]. There are 93,700 Arabic language hotel reviews in this dataset. The source of these reviews is Booking.com website. Both modern standard Arabic and dialectal Arabic are used in the remarks. • ArSarcasm [33] analyzes tweet emotion and sarcasm. It incorporates sarcasm and dialect labels into the previously published Arabic sentiment analysis datasets SemEval 2017 [34].
5.2 Dataset Prepossessing Preprocessing of text is an incredibly crucial phase in the data processing process. The effectiveness of the sentiment analysis model might be improved by minimizing mistakes and cleansing it from noisy data which does not influence the final results. However, some strategies, such as removing stop words, might result in severe data will change the output polarity, resultloss. For example, the phrases ing in inaccurate results. Therefore, we downloaded the stop word list from NLTK and modified it accordingly. Figure 1 shows the preprocessing steps we followed. Our processing workflow is as follows: 1. Removing URLs, mentions, emails, dates, numbers, Latin characters, punctuation, English letters, emojis, and stop words. 2. Removing diacritics. 3. Letter normalization and unifying such as Yaa, Hamza, and ta-marbutah.
5.3 Transformers Implementation The pipeline approach of our experiment is illustrated in Fig. 2. After preprocessing the dataset, at the begging, we experimented with basic machine learning supervised classifiers including linear support vector classifier, multinominal Naïve Bayes, and random forest. Moving on to deep learning-trained language models, we train two
740
F. Dakalbab and A. Elnagar
Fig. 1 Preprocessing pipeline
Fig. 2 Pipeline of sentiment analysis approach preprocessing pipeline
BERT-based models and two other Arabic-based language contextualized models which are MarBERT, AraBERT, XLM-Roberta, and AraELECTRA, respectively. We sliced the dataset to 80% training, 10% validation, as well as 10% testing. We perform multi-classification sentiment analysis in both datasets. For the HARD dataset, we mapped the scores as follows: • • • •
rating 5: Strongly positive (SP) rating 4: Positive (P) Rating 2: Negative (N) Rating 1: Strongly negative (SN).
As for the ArSarcasm dataset, the classes were positive, neutral, and negative. Pretrained contextualized models may be used for a variety of tasks in deep learning; in our example, the task is classification. As a result, the pre-trained language model requires a single vector encoding the whole input sentence. The choice in BERT is that the concealed state of the first token represents the entire phrase. Several tokens must be manually added to the input phrase in order to do this. Each pre-trained model uses a different tokenization method to break down a text into tokens. Various special tokens identify the model’s distinctive meaning such as
Performance Evaluation of Contextualized Arabic Embeddings …
741
• [CLS]: a token placed at the begging of each input text • [SEP]: a token that the model uses to determine the end of an input text • [PAD]: as these models take a fixed length of sentence as input, a padding token is used to indicate the blank tokens within the input text. Moving forward, each token is given a unique ID when these models are trained, and this ID is used to map it to the set of vocabulary corpus on which the model was pre-trained. If a unique ID cannot be found, a special token called [UNK] will be used instead. BERT models employ a WordPiece technique to deal with this issue because it may contain valuable data. Therefore, we trained the models after performing tokens that are based on their tokenization technique and preparing the input format as needed. As for the training timing, MarBERT was the model that required a lot of time compared to the other models used in our experiments. The parameters we used for the models were using 12 epochs, 16 training batch sizes, and an Adam optimizer with a 1e−8 value, with learning rate of 1.78e−05.
5.4 Results After training the models, we used the validation and testing dataset to evaluate our contextualized models’ performance. Table 2 presents the accuracy achieved by the seven models utilized. As shown in the table, MarBERT model achieved the highest accuracy with a value of 0.78 which make sense as the MarBERT model is pre-trained on the Arabic Twitter dataset. AraBERT and AraELECTRA achieved the highest score in the hotel review dataset. Digging deeper into understanding the results, we can see that shallow machine learning classifiers were able to score close to some of the deep learning language models. For instance, the random forest which scored the highest among other shallow machine learning classifiers is close to the results obtained from the XLM-Roberta base model and MarBERT. However,
Table 2 Testing accuracy scores of all models Type Model Shallow machine learning classifiers
Arabic language models BERT-based
HARD
ArSarcasm
MNB
0.68
–
SVC RF XLM-Roberta
0.71 0.72 0.74
– – 0.73
AraELECTRA AraBERT v02 MarBERT
0.75 0.75 0.74
0.74 0.79 0.80
742
F. Dakalbab and A. Elnagar
Fig. 3 Distribution of language contextualized models
Fig. 4 Distribution of machine learning supervised classifiers
overall, the results show that using deep learning language models is more favorable than shallow machine learning for Arabic sentiment. Specifically, the performance of BERT-based models is superior to all other categories. The BERT-based model showed a better performance when trained and tested on ArSarcasm compared to the HARD dataset, even though the HARD dataset included balanced classes compared to ArSarcasm. Figures 3 and 4 present a graphical illustration of the accuracy scores by language contextualized models and supervised classifiers, respectively. As illustrated in Fig. 3, AraBERT achieved the same score on both of the datasets. Furthermore, among the machine learning supervised classifiers random forest scored the highest compared to the others. Figures 6 and 7 present evaluations in terms of precision, recall, f -score, and support according to each dataset.
Performance Evaluation of Contextualized Arabic Embeddings …
743
Fig. 5 Confusion matrix of XLM-Roberta base Arabic language model
Figure 5 demonstrates the confusion matrix of the XLM-Roberta base model. In addition, Table 3 presents a sample of the same model prediction. The sample includes five examples of misclassified predictions and five correctly classified predictions.
6 Future Work and Recommendations In the future, we plan on investigating more contextualized models across several tasks such as word prediction, sarcasm detection, sentiment analysis, and evaluation and comparing which of the contextualized language models perform the best specifically in the Arabic language. Moreover, we plan on utilizing more resources and power in terms of GPU and high processors to enable us to train a large number of models since in our work the main limitation was a lack of resources that slowed down the workflow as each model required a long training time. Another future work is to enhance the model performance by fine-tuning the parameters and experimenting more on the effect of each parameter and which number could result in the highest accuracy. We recommend researchers investigate other NLP tasks such as emoji prediction and test several Arabic language models’ performances on them.
Table 3 Samples of correctly and mistakenly classified sentiments of the model XLM-RobertaBase
Fig. 6 Detailed performance of Arabic base contextualized models on HARD dataset (the sentiments are strongly negative, negative, positive, and strongly positive)
Fig. 7 Detailed performance of Arabic base contextualized models on ArSarcasm dataset (the sentiments are negative, neutral, and positive)
7 Conclusion
Sentiment analysis is a rapidly expanding field of research that encompasses natural language processing, machine learning, and data mining. Its focus is the automatic extraction of the concepts conveyed in a text. Because of its widespread use, much research has been done on this topic, with English texts receiving the most attention, while other languages, such as Arabic, have received less. On a multi-classification sentiment analysis task, this research investigated and assessed newly released Arabic contextualized language models using two separate datasets. The experiments revealed the level of effectiveness of each contextualized model and which one is best to utilize. On multi-classification sentiment analysis, MarBERT and AraBERT were found to be the top classifiers, and all of the studied models are suitable when fine-tuned on specific Arabic NLP tasks.
Estimating Human Running Indoor Based on the Speed of Human Detection by Using OpenPose Mohammed Abduljabbar Ali, Abir Jaafar Hussain, and Ahmed T. Sadiq
Abstract The recognition of human activities has received much attention because of its broad range of applications. It has recently been studied in pattern recognition, for example, in intelligent elderly care, intelligent monitoring systems that detect probable aberrant occurrences, and tracking pedestrian mobility. This article proposes a new approach to classifying human activities in a video by capturing skeleton behavior from a video or camera. Three activities are monitored: running, walking, and stopping. The type of movement is recognized by determining the human's speed, which is computed from the distance moved over time using a mathematical model. This helps determine whether a human in a video or camera feed is walking or running. The main idea of estimating the human's speed is to determine the distance between the detected human and the camera, using the skeleton data produced by the OpenPose technique. Experiments were conducted on different video sequences to demonstrate the efficiency of detection and classification. The results showed an accuracy of 97% for walking, 98% for running, and 95% for stopping. Keywords Skeleton · Tracking · Human activity recognition · Running
M. A. Ali (B) · A. T. Sadiq Computer Sciences Department, University of Technology, Baghdad, Iraq e-mail: [email protected] A. T. Sadiq e-mail: [email protected] A. J. Hussain School of Computer Sciences and Mathematics, Liverpool John Moores University, Liverpool, England e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_61
1 Introduction
Real-time recognition of human actions has piqued the interest of computer vision and supervised classification experts in recent years because of its wide variety of uses [1, 2]. Two examples are intelligent video surveillance and pedestrian traffic monitoring for detecting potentially abnormal events of interest [3]. In addition, researchers in the area of intelligent elderly care are attempting to analyze the behavior of the elderly using body sensors and surveillance cameras to ensure their safety [4]. Several similar examples demonstrate that human action recognition has a diverse set of applications and future potential [5]. The design of a human activity detection system depends on the sort of sensor data it uses. Action recognition approaches may be classed as multi-modal or uni-modal. Multi-modal action recognition approaches detect human activities using numerous data sources; emotional states of mind, behavioral characteristics, and social interactions are examples of such sources [6]. Uni-modal activity recognition algorithms recognize human activities by analyzing various aspects of raw visual data; action trajectories, rule-based models, statistical models, spatiotemporal representations, and shape-based features are highlighted in this category [7]. Extracting valuable features for activity detection is a crucial yet challenging endeavor. Hand-crafted features designed for a specific task and shallow feature learning architectures cannot identify highly discriminative features that effectively categorize various activities [8]. The key points of the human skeleton may be utilized to characterize the properties of walking and to determine and anticipate human activity, because gait varies with the relative location of the skeleton's significant points and the joint angles. For example, what would happen if someone unexpectedly ran along a road in everyday life? If the video camera is checked to see how people are moving, a possible accident can be anticipated [9]. This paper aims to study and analyze a novel method of detecting moving humans. It analyzes human activities based on features extracted with skeletal techniques learned directly from prototypes, without a feature engineering process or vast amounts of data. Identifying human movement in an area is significant because such a system is used in complex problems like video-based human classification and tracking. This paper's contribution is a novel method for human activity recognition based on a mathematical model:
• Determining the distance between the camera and the human using skeletal data.
• Classifying the human activity (walking, running, stopping) by measuring the speed of the human.
2 Related Work
Human action recognition systems follow two basic approaches: conventional feature extraction approaches and deep learning methods.
• The two categories of traditional human action recognition approaches are feature extraction based on human movement information and feature extraction based on critical spatiotemporal locations. Feature extraction approaches related to human movement information were investigated in [10]. In Schuldt et al. [11], three-dimensional space–time key points were created by extending Harris' spatial interest points with temporal features. The space–time regions of interest were built using local corner point extraction and Gaussian blur in three-dimensional space and time, the pixel histogram statistics of the space–time interest points were calculated, and finally the feature vector characterizing the activities was created. Rapantzikos et al. [12], employing a discrete wavelet transform in three dimensions, suggested using low-pass and high-pass filtering responses in every dimension to identify regions of interest in time and space. Chaudhry et al. [13] created the final motion descriptor by normalizing the half-wave rectifier in two directions and producing motion vectors in the upper and lower left directions. Weinland et al. [14] used the same concept to identify people, but retrieved different traits: they depicted the movement of individuals in a sequence of shots using motion energy images and motion history images.
• The advancement of deep learning has driven substantial progress in object recognition. AlexNet [15], GoogleNet [16], VGGNet [17], ResNet [18], YOLO [19], SSD [20], and Faster R-CNN [21] are only a few of the deep learning models proposed for object recognition. In addition, numerous researchers have used various deep learning algorithms to recognize human actions. Liu et al. [10] developed an approach based on a convolutional neural network (CNN) and long short-term memory (LSTM), in which distinct characteristics were extracted first and then fed into seven CNN and three LSTM networks, respectively. The ten networks were then merged using element-by-element multiplication fusion, maximal fusion, and average fusion before the final results were produced. Szegedy et al. [17] present a new classification network that incorporates 3D CNN and LSTM to reduce the number of network parameters and make the training process easier. Islam et al. [6] present a multi-class cooperative surveillance and activity recognition system for improving the accuracy of activity categorization in videos, which supports fog or cloud computing-based blockchain infrastructures. Sun et al. [9] predict whether a person is running or walking in a picture or video by tracking the behavior; the approach employs transfer learning and the Inception V3 neural network to recognize the gait, using the HMDB large human motion database and the UCF sports activities video action data as the main datasets.
3 Proposed Methods
The core notion is that features taken from the OpenPose skeleton detection may be used for human action recognition. In the first step, common skeleton points are identified to determine the distance between the camera and the detected human using a proposed mathematical model. Next, the detected human, whose distance has been determined, is tracked with skeletal tracking technology. Finally, the running pose of a person is predicted by calculating the distance traveled during a specific time. Figure 1 gives an overview of the proposed classification system structure.
3.1 OpenPose Retrieves the Human Body's Skeleton Information
The OpenPose human gesture recognition system was created at Carnegie Mellon University (CMU) and is built upon the Convolutional Architecture for Fast Feature Embedding (Caffe) [22]. It is based on convolutional neural networks and supervised learning. In 2017, researchers at Carnegie Mellon University released software for the OpenPose human skeletal recognition system, which enables real-time target tracking while viewing a video. Human skeletal information following the Common Objects in Context (COCO) format may be captured, and joint information can be rendered on color video. Furthermore, multi-person skeletal information may be detected in real time via the OpenPose human key-node identification method. It uses feature-vector affinity parameters to build a heat map of the key human nodes after using a pose estimation approach to locate the actual positions on the human body. With OpenPose, finger movements, facial expressions, human actions, and other posture assessments may all be performed, for a single individual or for many people. OpenPose is often used to derive information about human key nodes from a surveillance camera image. The surveillance video is broken up into frames, and every frame shows a human skeleton; the horizontal and vertical coordinate values indicate the position of each joint point as well as its confidence [23, 24].
Fig. 1 Diagram of the proposal
3.2 Tracking Humans by Pose Estimation
The posture tracking approach is based on pose prediction. After a pedestrian's skeletal model is extracted from each picture, each person is linked across frames to determine their pose for the following stage. A skeleton model is employed in an existing approach for human position tracking from real footage. The pedestrian track can be managed more simply because the pedestrian walks straight at a moderate pace with few obstructions. The head and neck placements of the skeleton model drive the tracking mechanism, because most images reliably capture the skeleton model's head and neck, unlike other body parts. In addition, the skeleton model's leg motion is far more pronounced than each pedestrian's neck and head movements, which makes the legs challenging to use as a tracking foundation. The system therefore separates individuals sufficiently by monitoring their upper bodies and assigning them a unique ID. Since pedestrians move short distances between frames, measuring the position of the neck and head in each frame is essential for tracking. The method creates a unique structure for every person's head and neck and analyzes the head and neck's translational movements as a group throughout numerous frames. The data format for every individual's head and neck position is as follows: [ID_person, (head_x, head_y), (neck_x, neck_y), ID_point]. If the x- and y-coordinates of a person's head and neck are within a specific range, they are classified as the same human. After verifying the head and neck coordinates, the data for each individual is stored in separate files. Skeletal data is captured for all fourteen points, the person ID, and the frame ID in a single frame. If the pose prediction yields no data record, it is recorded as −1, because several walkers are traveling in and out of the video. The array should be refreshed when new persons arrive in the video, and skeletal data collection should stop when they depart, so that each person's posture is monitored appropriately. To enlarge the matrix, the approach compares the candidate number to the maximal recorded candidate number. If the number of candidates surpasses the maximum number, the matrix is enlarged by adding −1 to previous frames and gathering current frames. If a person disappears from the video, a penalty has to be added to guarantee that no data is collected for them later [24, 25].
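As a rough illustration of the head/neck matching described above, the sketch below assigns IDs by nearest head–neck position between consecutive frames; the matching radius and the exact data layout are assumptions rather than values taken from the paper.

```python
# Hypothetical sketch of frame-to-frame ID assignment using head and neck positions.
import math

MATCH_RADIUS = 50.0  # assumed pixel threshold for treating detections as the same person

def match_people(prev_tracks, detections, next_id):
    """prev_tracks: {person_id: (head_xy, neck_xy)}; detections: list of (head_xy, neck_xy)."""
    tracks = {}
    for head, neck in detections:
        best_id, best_dist = None, MATCH_RADIUS
        for pid, (p_head, p_neck) in prev_tracks.items():
            d = math.dist(head, p_head) + math.dist(neck, p_neck)
            if d < best_dist:
                best_id, best_dist = pid, d
        if best_id is None:              # no close match: a new person entering the scene
            best_id, next_id = next_id, next_id + 1
        tracks[best_id] = (head, neck)
    return tracks, next_id
```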
3.3 Measurement of the Distance Between the Camera and the Human
The distance between the camera and the detected person is estimated through several programming and arithmetic operations. First, the width of the human body is measured between the right and left shoulder key points based on the skeletal data extracted from OpenPose. Equation (1) shows the measurement of the human body width using the right shoulder and left shoulder data, taking the absolute value to
Fig. 2 Focal length and real length as per sensor camera
eliminate a negative result:

w = |S2 − S5|  (1)
where W is the human’s width, S 2 represents points left shoulder, and S 5 is right shoulder. A stationary camera is used. Figure 2 shows the distance between the camera and the person being identified. By measuring the human body’s width and determining the camera’s position, it is getting the camera’s focal length by Eq. 2. Focal length =
(width ∗ measured distance) real width
(2)
where focal length is the focal distance of a person inside the camera sensor, width is the width of the person in a frame (skeleton shoulders), measured distance is the distance between the person and the camera obtained from a reference image, and real width is the width of the person in reality. Once the focal length is obtained from Eq. (2), Eq. (3) gives the distance between the camera and the human detected in the video:

D = (real_width × focal_length) / width_in_frame  (3)
where D is the distance between the camera and the human detected and real_width is approximately the real width of a human.
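A minimal numeric sketch of Eqs. (1)–(3) follows, assuming the shoulder keypoints are already available as (x, y) pixel coordinates and that the one-time calibration values (reference distance and a nominal real shoulder width) are chosen by the implementer.

```python
# Hypothetical implementation of Eqs. (1)-(3); the calibration constants are assumptions.
def shoulder_width_px(left_shoulder, right_shoulder):
    """Eq. (1): body width in pixels from the two shoulder keypoints (x coordinates)."""
    return abs(left_shoulder[0] - right_shoulder[0])

def focal_length(width_px_ref, measured_distance, real_width):
    """Eq. (2): one-time calibration from a reference image taken at a known distance."""
    return (width_px_ref * measured_distance) / real_width

def distance_to_camera(real_width, focal, width_px):
    """Eq. (3): distance of the detected person from the camera."""
    return (real_width * focal) / width_px

# Example calibration: a person ~18 in wide at the shoulders, 100 in from the camera, 60 px wide.
F = focal_length(width_px_ref=60, measured_distance=100.0, real_width=18.0)
w = shoulder_width_px((320, 240), (275, 242))   # shoulder keypoints in the current frame
print(distance_to_camera(18.0, F, w))           # estimated distance in inches
```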
Fig. 3 Distance covered by the human while running
3.4 Human Speed Estimation
The speed is estimated from the distance between the camera and the detected person, measured in the previous step. When the camera first detects a human in a frame, the person's unique identifier is stored using the information obtained from the human tracking technique used in this model. The distance determined between the human and the camera is stored under this identifier as the first distance. After the body has moved for a specified number of frames, the second distance is stored for the same person. The speed of the human body is then estimated by dividing the distance moved over those frames by the time between them. To calculate the moved distance using Euclidean distance, suppose that D1 and D2 are the human's distances from the camera in frames t1 and t(1+n). The calculated distance D3 is given by Eq. (4):

D3 = |D1 − D2|  (4)
where D3 is the difference between the first and second distances. In Fig. 3, D1 represents the distance between the human and the camera calculated when the human is first detected, and D2 represents the second distance, measured after the person has moved through several frames.
3.5 Classification
In the classification stage, three human pose activities can be recognized by the proposed model. Tracking the human body records the movement and the changes that occur through the value of the distance. Stopping, walking, and
running are recognized from the data recorded for the detected and tracked human body. Equation (4) gives the difference between the first distance, recorded when the human body is detected in the captured video, and the second distance it reaches; the second distance is specified in the program code. The time taken to move between the first and second distances is also recorded. Equation (5) then gives the velocity of the human body. If the speed is 13 inches per second or more, the person is considered to be running; less than that means regular walking. This threshold value was determined in the proposed system. Stopping is recognized if the distance value has not changed over several frames, in which case the person is assumed to have stopped moving.

speed = D3 / Time  (5)
where speed is the velocity of the human body and time is the time it takes to travel the distance D3.
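The following sketch combines Eqs. (4) and (5) with the 13 in/s threshold described above into a simple per-person classifier; the frame rate, the stop tolerance, and the number of frames between measurements are assumptions, not values from the paper.

```python
# Hypothetical speed-based activity classifier built on Eqs. (4) and (5).
RUN_THRESHOLD = 13.0   # inches per second, the running threshold stated in the paper
STOP_TOLERANCE = 0.5   # assumed: distance change (inches) treated as "no movement"
FPS = 30               # assumed camera frame rate

def classify_activity(d1, d2, n_frames):
    """d1, d2: camera-to-person distances (inches) measured n_frames apart."""
    d3 = abs(d1 - d2)                 # Eq. (4)
    elapsed = n_frames / FPS          # seconds between the two measurements
    speed = d3 / elapsed              # Eq. (5)
    if d3 < STOP_TOLERANCE:
        return "stopping", speed
    return ("running" if speed >= RUN_THRESHOLD else "walking"), speed

print(classify_activity(d1=200.0, d2=80.0, n_frames=210))   # -> running (~17 in/s)
print(classify_activity(d1=200.0, d2=140.0, n_frames=210))  # -> walking (~8.6 in/s)
```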
4 Experiment Result
4.1 Dataset Collection
The suggested model was evaluated using data obtained from the interior of a building. The RGB dataset was acquired with a single Panasonic HC-MDH-2 AVCHD camera, recorded in full high definition (1920 × 1080p). A dataset simulating real conditions was produced to address the problem. In our test setup, the camera was mounted on a tripod in an office building. The three sorts of postures are stopping, running, and walking. Each posture was performed by two people of various ages and physical characteristics. To increase the diversity of the dataset and test the system's ability to handle orientation and size variations, each posture was captured at three different orientations and distances from the camera. The distance is between 1 and 5 m, with a 0–360° orientation angle [26].
4.2 Result The suggested method was evaluated using datasets captured by a building’s video camera. Using the predicted conditions from the dataset, confusion matrices were constructed to test the performance of the suggested approach. The accuracy of the experimental data produced from the confusion matrix is shown in Fig. 4. This evaluation tool compares reference circumstances to real-life events. The rows of matrices represent actual classes and predicted classes by the columns. As
Fig. 4 Accuracy of the experimental results
Table 1 Confusion matrix to recognize three human activities

| No | Action | Walking | Running | Stopping | Total |
|----|--------|---------|---------|----------|-------|
| 1 | Walking | 2161 | 29 | 15 | 2226 |
| 2 | Running | 47 | 4583 | 19 | 4678 |
| 3 | Stopping | 9 | 1048 | 7 | 1102 |
demonstrated in Table 1, almost every category receives high scores: the walking class reached an accuracy of 97%, running 98%, and stopping 95%. Figure 5 shows the result for several frames captured from the test video. The first image shows the distance between the camera and a person detected by the proposed method, whose information is stored for measuring his speed. The second picture shows the same person after he has moved a certain distance, with the new distance measured. The distance is highlighted in the upper left corner of the frame.
4.3 Comparison
Two test videos of the same person were used for comparison: in the first video, the person walks normally, and in the other, he runs. The distance tested in the videos is 120.29 inches, representing the difference between the first and second distances. Calculated as distance over time, the normal walking speed in the video ranged between 8.295 and 8.592 inches per second, the distance being covered in 14 to 14.5 s.
Fig. 5 Measurement of the distance of a person moving at two different distances from the camera
Fig. 6 Chart of speed
The running speed ranged between 16.942 and 17.953 inches per second over the same test distance, which was covered in 6.7 to 7.1 s. The result of the speed test is shown in the graph in Fig. 6. Collecting the skeleton information of the human body with OpenPose is easier and more accurate than with other approaches, so our method is, to some extent, both accurate and easy to implement. Table 2 compares our approach to other technologies for classifying the running action, listing their algorithms, classification types, features, and accuracy.
5 Conclusions and Future Work
Unlike sensor-based systems, vision-based approaches interfere less with a person's daily actions such as walking, running, and stopping. As a result, a vision-based human movement recognition approach is used in this study to recognize three categories of human activities: walking, running, and stopping. In
Table 2 Comparing our approach to others

| No | Algorithm | Classification | Features | Accuracy (%) |
|----|-----------|----------------|----------|--------------|
| 1 | Deep belief networks (DBNs) [27] | Vision-based | Binary vector | 96.44 |
| 2 | Convolutional neural network (CNN) [28] | Smartphone | Local feature extraction | 88.89 |
| 3 | 3D convolutional neural networks (3D-CNN) [29] | Vision-based | 3D motion cuboid | 95.6 |
| 4 | Photonic reservoir computer (Photonic RC) [30] | Vision-based | Histograms of oriented gradients (HOG features) | 80 |
| 5 | Our proposal | Vision-based | Coordinate, speed | 98 |
this paper, deep learning technology is used to detect the human skeleton, and the person is tracked through a skeleton-tracking technique. A mathematical model based on the coordinates of the human skeleton is then applied to find the distance between the human and the camera, and the three activities are classified according to the speed of the human, which is calculated by the proposed mathematical system. The experimental findings suggest that the model proposed in this study achieved a recognition accuracy of 98%. It is efficient and economical, and what distinguishes the proposed system is the ease of calculating and recording the speed of the human body with front-facing cameras rather than side cameras. For this reason, this paper can serve as a reference for further research on estimating the speed of a moving person using image processing and Euclidean distance. The proposal can benefit many applications, including determining athletes' speed, flagging an abnormal danger state on monitoring cameras when more than one running person is detected inside a building, and monitoring the speed of people suffering from certain diseases.
References 1. Jiang MYC, Jong MSY, Lau WWF, Chai CS, Wu N (2021) Using automatic speech recognition technology to enhance EFL learners’ oral language complexity in a flipped classroom. Australas J Educ Technol 37(2):110–131. https://doi.org/10.14742/AJET.6798 2. Salih Abedi WM, Nadher I, Sadiq AT (2020) Modification of deep learning technique for face expressions and body postures recognitions. Int J Adv Sci Technol 29(3):313–320 3. Sakulchit T, Kuzeljevic B, Goldman RD (2019) Evaluation of digital face recognition technology for pain assessment in young children. Clin J Pain 35(1):18–22. https://doi.org/10.1097/ AJP.0000000000000659 4. Li M, Yu X, Ryu KH, Lee S, Theera-Umpon N (2018) Face recognition technology development with Gabor, PCA and SVM methodology under illumination normalization condition. Cluster Comput 21(1):1117–1126. https://doi.org/10.1007/s10586-017-0806-7
5. Tan F, Xie X (2021) Recognition technology of athlete’s limb movement combined based on the integrated learning algorithm. J Sens 2021. https://doi.org/10.1155/2021/3057557 6. Islam N, Faheem Y, Din IU, Talha M, Guizani M, Khalil M (2019) A blockchain-based fog computing framework for activity recognition as an application to e-healthcare services. Futur Gener Comput Syst 100:569–578. https://doi.org/10.1016/j.future.2019.05.059 7. Zhang S, Wei Z, Nie J, Huang L, Wang S, Li Z (2017) A review on human activity recognition using vision-based method. J Healthc Eng 2017. https://doi.org/10.1155/2017/3090343 8. Mahmoud MM, Nasser AR (2021) Dual architecture deep learning based object detection system for autonomous driving. Iraqi J Comput Commun Control Syst Eng 21(2):36–43 9. Sun C, Wang C, Lai W (2019) Gait analysis and recognition prediction of the human skeleton based on migration learning. Phys A Stat Mech Appl 532:121812. https://doi.org/10.1016/j. physa.2019.121812 10. Liu C, Ying J, Yang H, Hu X, Liu J (2021) Improved human action recognition approach based on two-stream convolutional neural network model. Vis Comput 37(6):1327–1341. https://doi. org/10.1007/s00371-020-01868-8 11. Schuldt C, Barbara L, Stockholm S (2004) Recognizing human actions: a local SVM approach. In: Proceedings of the 17th international conference on pattern recognition, ICPR, vol 3, pp 32–36 12. Rapantzikos K, Avrithis Y, Kollias S (2009) Dense saliency-based spatiotemporal feature points for action recognition. In: 2009 Conference on computer vision and pattern recognition, CVPR, pp 1454–1461. https://doi.org/10.1109/CVPRW.2009.5206525 13. Chaudhry R, Ravichandran A, Hager G, Vidal R (2009) Histograms of oriented optical flow and Binet-Cauchy kernels on nonlinear dynamical systems for the recognition of human actions. In: 2009 IEEE conference on computer vision and pattern recognition, CVPR, pp 1932–1939. https://doi.org/10.1109/CVPRW.2009.5206821 14. Weinland D, Ronfard R, Boyer E (2006) Free viewpoint action recognition using motion history volumes. Comput Vis Image Underst 104(2–3):249–257. https://doi.org/10.1016/j.cviu.2006. 07.013 15. Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84–90. https://doi.org/10.1145/3065386 16. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: 3rd International conference on learning representations ICLR 2015, pp 1–14 17. Szegedy C et al (2015) Going deeper with convolutions. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR), pp 1–9. https://doi.org/10.1109/CVPR.2015.7298594 18. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), vol 2016 Dec, pp 770–778. https://doi.org/10.1109/CVPR.2016.90 19. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 779–788. https://doi.org/10.1109/CVPR.2016.91 20. Liu W et al (2016) SSD: single shot multibox detector. In: ECCV, vol 1, pp 21–37 21. Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149. https://doi. org/10.1109/TPAMI.2016.2577031 22. 
Tuyet V, Hien N, Quoc P, Son N, Binh N (2019) Adaptive content-based medical image retrieval based on local features extraction in shearlet domain. EAI Endorsed Trans Context Syst Appl 6(17):159351. https://doi.org/10.4108/eai.18-3-2019.159351 23. Melin P, Miramontes I, Prado-Arechiga G (2018) A hybrid model based on modular neural networks and fuzzy systems for classification of blood pressure and hypertension risk diagnosis. Expert Syst Appl 107:146–164. https://doi.org/10.1016/j.eswa.2018.04.023 24. Abduljabbar Ali M, Jaafar Hussain A, Sadiq AT (2022) Human fall down recognition using coordinates key points skeleton. Int J Online Biomed Eng 18(02):88–104. https://doi.org/10. 3991/ijoe.v18i02.28017
25. Lee JH et al (2020) Deep learning with ultrasonography: automated classification of liver fibrosis using a deep convolutional neural network. Eur Radiol 30(2):1264–1273. https://doi. org/10.1007/s00330-019-06407-1 26. Abduljabbar Ali M, Jaafar Hussain A, Sadiq AT (2022) Deep learning algorithms for human fighting action recognition. Int J Online Biomed Eng 18(02):71–87. https://doi.org/10.3991/ ijoe.v18i02.28019 27. Abdellaoui M, Douik A (2020) Human action recognition in video sequences using deep belief networks. Trait du Sig 37(1):37–44. https://doi.org/10.18280/ts.370105 28. Wan S, Qi L, Xu X, Tong C, Gu Z (2020) Deep learning models for real-time human activity recognition with smartphones. Mob Netw Appl 25(2):743–755. https://doi.org/10.1007/s11 036-019-01445-x 29. Arunnehru J, Chamundeeswari G, Bharathi SP (2018) Human action recognition using 3D convolutional neural networks with 3D motion cuboids in surveillance videos. Procedia Comput Sci 133:471–477. https://doi.org/10.1016/j.procs.2018.07.059 30. Antonik P, Marsal N, Brunner D, Rontani D (2019) Human action recognition with a large-scale brain-inspired photonic computer. Nat Mach Intell 1(11):530–537. https://doi.org/10.1038/s42 256-019-0110-8
Enhanced Multi-label Classification Model for Bully Text Using Supervised Learning Techniques V. Indumathi and S. Santhana Megala
Abstract With the rapid growth of social media users, cyberbullying has evolved as a form of bullying that involves sending electronic messages. Cyberbullying is defined as the use of technological breakthroughs to bully another person. Bullies may utilize social media to abuse victims because it gives them a convenient setting in which to do so. Detecting cyberbullying is extremely important; because the amount of online information is too large, it is impractical for humans to track it. Machine learning can be used to find the language patterns of bullies and hence to generate a model that automatically detects cyberbullying actions. This analysis aims to construct a classification model with optimal accuracy for identifying cyberbullying speech using naive Bayes, linear SVC, and logistic regression. Linear SVC performs best on the 'insult' label compared with the other five labels. The model can identify bullying text and assign it to the six label classes. Keywords Cyberbully · Naïve Bayes · Linear SVC · Logistic regression · Supervised learning techniques
1 Introduction In many ways, cyberbullying is similar to conventional bullying, but there are some significant differences as well. Cyberbullying victims may not comprehend who their aggressor is or why they are being targeted because of the contact’s online nature (Fig. 1). Because the content intended to harass the victim is readily disseminated around many of us and often remains accessible long after the first incident, the harassment will have a significant impact on the victim. “Aggressive, purposeful conduct or behavior that is dispersed by a gaggle or a private, via electronic sorts of touch, frequently and overtime against a victim World Health Organization cannot V. Indumathi (B) · S. S. Megala School of Computer Studies, RVS College of Arts and Science, Coimbatore, India e-mail: [email protected] S. S. Megala e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_62
Fig. 1 Cyberbullying
simply defend him or her,” according to a commonly used definition of cyberbullying. There are many definitions, including the following one from the National Crime Interference Council: “the method of victimizing the internet, cell phones, or other devices to send or post text or photos meant to harm or defame another person.” It will be accessible through social networking sites, mobile devices, electronic message services, and entertainment platforms. Its persistent actions are meant to intimidate, enrage, or degrade those who are the targets. Examples entail: • Disseminating untrue information about someone or posting embarrassing pictures of them online. • Using electronic messaging services to send cruel remarks or threats. • Causation and impersonating someone involve sending messages on their behalf to others. Bullying that takes place in person and bullying that occurs online frequently occur side by side. On the other hand, cyberbullying leaves a digital footprint that might be useful and serve as proof to put an end to the abuse. When you are bullied online, you will feel as though you are being attacked everywhere, including your own home. There will appear to be no way out. The effects will last an extended time and affect an individual in several means: • Mentally—upset, humiliated, dumb, and even enraged. • Emotionally—losing interest in the things you enjoy or feeling ashamed. • Physically—exhausted from lack of sleep or signs like headaches and stomach problems. Individuals will refrain from speaking up or attempting to influence the situation if they fear being laughed at or disturbed by others. Cyberbullying can lead to people taking their own lives in extreme circumstances. Cyberbullying will have a wide range of consequences for the United States. However, these will be overcome, and other people will regain their confidence and health. Nationwide, an exploration is conducted by Symantec. From that, a stunning note is that almost eight out of ten people area unit subject to the various forms of cyberbullying in the Asian country.
Fig. 2 Common type of cyberbullying
Out of those, around 63% faced online abuse and insults. The same study ranks India as the country facing the highest level of cyberbullying within the Asia-Pacific region, more than Australia and Japan. The most common specific forms of cyberbullying that teens experience are given in Fig. 2; the major form experienced by teenagers is offensive name-calling. Research unquestionably shows many negative effects of being the victim of cyberbullying: victims may exhibit low self-esteem, heightened risky thinking, and a variety of emotional states, including fear, frustration, rage, and depression. Because there is no way to avoid it, cyberbullying can also be much more destructive than traditional bullying, so it must be addressed to protect teens and children from harmful decisions. On this ground, we prepare a model that helps to identify whether comments and posts on social media are bullying or not, with further focus on categorizing the comments' severity level. The objective of this study is to identify cyberbullying comments/messages and categorize their severity using classification algorithms such as naive Bayes, linear SVC, and logistic regression. The bully dataset is analyzed using these classification algorithms, and their performance is evaluated with respect to performance metrics (accuracy, recall, and F1-score). The paper is structured as follows: the related works are presented in Sect. 2. In Sect. 3, the methodology used in this paper
is given. In Sect. 4, the result and analysis are presented. In Sect. 5, the paper is concluded.
2 Related Works
In [1], Twitter data was used to construct a supervised machine learning (ML) method for detecting cyberbullying and categorizing its severity into multi-class categories, using embeddings, lexicon features, and sentiment along with PMI semantic orientation. The extracted features were fed to naive Bayes, KNN, decision tree (DT), random forest (RF), and support vector machine algorithms. Experiments with the proposed framework in both multi-class and binary settings show promise in terms of relevance, classifier accuracy, and F-measure metrics. These findings suggest that the proposed methodology can be used to detect cyberbullying behavior and its intensity in online social networks. Finally, the outcomes of the proposed and baseline features were compared using a variety of machine learning techniques, and the comparison shows how important the proposed features are in detecting cyberbullying.
In [2], cyberbullying prediction models were reviewed and the main problems associated with developing such models were identified. The paper additionally provides insights into the general process of cyberbullying detection and, most importantly, overviews the methodology. Although the data collection and feature engineering processes are described in detail, most of the emphasis is on feature selection algorithms and then on using various machine learning algorithms to predict cyberbullying behaviors. A supervised learning algorithm, which gives the highest accuracy, was used for the detection of cyberbullying activity on the web.
In [3], an algorithm for detecting cyberbullying was suggested using support vector machines and convolutional neural networks (CNN). CNN produced more accurate results than SVM. The CNN was implemented with the Keras library; about 70% of the data was used for training and 30% for testing. The GloVe embedding technique is used to transform the text into numbers, with each word represented by a numeric array. CNN improves efficiency on complicated content by preserving word semantics and reducing the burden of explicitly selecting features.
In [4], bullies are identified using well-designed surveys based on information from college and high school students, and the results are presented to the relevant authority or guardian along with a list of potential solutions. The survey findings are converted into data for the study using data processing techniques. The subsequent steps are data analysis, preprocessing/cleaning, transformation, data mining, and interpretation/evaluation. Internal labeling, synthetic labeling, and data programming were employed in this study, and the data patterns are successfully recognized when appropriate machine learning techniques are used for data validation and categorization.
In [5], a methodology was proposed for detecting cyberbullying content in Bengali, a relatively low-resource regional language of India. Although linguistic differences between English and non-English content may result in variations in execution and performance, this model suggests the use of machine learning algorithms, together with user information, for detecting digital harassment in Bengali text. The model contains several steps, from preprocessing through classification. The data was manually labeled as bully or not bully and preprocessed from data cleaning to tokenization, then processed by TF-IDF to extract the features. Finally, the features were passed to classification models and the accuracy of algorithms such as logistic regression (LR), random forest, and a support vector machine classifier was compared.
3 Research Methodology
The proposed model comprises various phases: data preprocessing, feature extraction, and baseline classification (Fig. 3), after which the different models are evaluated using evaluation metrics. The research aims to build a multi-headed model that is capable of detecting different kinds of toxicity, such as threats, insults, obscenity, and identity-based hate; this paper uses Wikipedia Talk page edit comments as the dataset. Enhancements to the present model will hopefully help online forums become more productive and respectful.
3.1 Data Analysis
The dataset of Wikipedia Talk page edit comments is used for this model. Of the data, 20% was held out for testing and 80% was used for training. The training dataset (Fig. 4) is manually labeled with the default six labels, while the test dataset (Fig. 5) contains unclassified data. Notice that the training data contains 159,571 observations with 8 columns and the test data contains 153,164 observations with 2 columns.
Fig. 3 Proposed system: dataset → data preprocessing → feature extraction → classification into the six labels (toxic, severe toxic, obscene, threat, insult, identity hate), applied to the test data
Fig. 4 Sample training data
Fig. 5 Sample testing data
The dataset also shows that the majority of comments are brief; almost none of the comments exceed a thousand words (as in Fig. 6). Furthermore, the label "toxic" has the most observations and the label "threat" the fewest (as Fig. 7 shows). To get a better sense of what the comments look like, Fig. 8 gives examples of one clean (non-toxic) comment and one toxic comment (specifically, with the label "toxic").
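A small sketch of this exploratory step is given below, assuming the usual column names of the Wikipedia toxic-comment data (comment_text plus the six label columns); the file name is an assumption.

```python
# Hypothetical exploration of comment lengths and label frequencies with pandas.
import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

train = pd.read_csv("train.csv")          # assumed file name for the training split
train["n_words"] = train["comment_text"].str.split().str.len()

print(train.shape)                        # number of observations and columns
print(train["n_words"].describe())        # most comments are short
print(train[LABELS].sum().sort_values())  # "threat" is rarest, "toxic" most frequent
```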
Fig. 6 Comment frequencies
Fig. 7 Label frequency
Fig. 8 Sample toxic and non-toxic comments
3.2 Data Preprocessing
Before the data is fed to the model, long texts are shortened so that they can be represented more compactly for better classification. A sequence of steps is followed: tokenization, normalization, lemmatization, and finally filtering by a length of 5. For tokenization, a Tokenize() function is used, which removes punctuation and special characters [6]. Next, the text is normalized to lowercase, punctuation and stop words are removed, and non-ASCII characters are filtered out. Finally, the comments are lemmatized and filtered by the length-5 rule.
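A minimal sketch of this pipeline with NLTK follows; the exact Tokenize() routine is not specified in the paper, so a regular-expression tokenizer stands in for it, and the length-5 rule is interpreted here as dropping tokens shorter than five characters (an assumption).

```python
# Hypothetical preprocessing: tokenize, lowercase, drop stop words/non-ASCII, lemmatize, length filter.
# Requires: nltk.download("stopwords"); nltk.download("wordnet")
import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

STOP = set(stopwords.words("english"))
LEMMA = WordNetLemmatizer()

def preprocess(comment):
    tokens = re.findall(r"[A-Za-z]+", comment)            # strips punctuation/special characters
    tokens = [t.lower() for t in tokens]
    tokens = [t for t in tokens if t not in STOP and t.isascii()]
    tokens = [LEMMA.lemmatize(t) for t in tokens]
    return [t for t in tokens if len(t) >= 5]              # assumed reading of the length-5 filter

print(preprocess("Explanation: Why the edits made under my username were reverted??"))
```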
3.3 Feature Extraction
The TF-IDF method employed in this study scales down the influence of tokens that appear very frequently in the corpus and are hence empirically less informative than features that occur in only a small fraction of the training data. TF-IDF performs well compared to other vectorizers. TF-IDF is implemented here as a CountVectorizer
Fig. 9 Sample features
followed by a TF-IDF transformer, which converts a count matrix into a normalized term-frequency representation [7]. The aim of using TF-IDF here is to use these weighted values rather than the raw frequencies of occurrence of a token in a given document. Since the dataset is taken from Wikipedia Talk pages, words such as "wiki" and "Wikipedia" are very common; they do not provide any constructive information for the model, which is another reason to prefer TF-IDF [8]. The features transformed from the training data are shown in Fig. 9.
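The description above corresponds closely to scikit-learn's CountVectorizer followed by TfidfTransformer; the sketch below is an illustration with assumed parameter choices and toy documents, not the paper's exact configuration.

```python
# Hypothetical TF-IDF feature extraction: CountVectorizer followed by TfidfTransformer.
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline

tfidf = Pipeline([
    ("counts", CountVectorizer(max_features=50000)),  # assumed vocabulary cap
    ("tfidf", TfidfTransformer()),                    # turns counts into normalized TF-IDF weights
])

docs = ["you are an idiot", "thanks for the helpful edit on the wiki page"]
X = tfidf.fit_transform(docs)   # in the paper this would be fit on the training comments
print(X.shape, X.dtype)
```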
3.4 Classification
The chosen baseline model for the system is multinomial naive Bayes, which is compared with two other text classification models that also perform well. Multinomial naive Bayes, linear SVC, and logistic regression are therefore compared to determine the better-performing classification model.
3.4.1
Multinomial Naive Bayes
It is a learning algorithm that is frequently used in text classification problems since it is computationally very efficient and simple to implement. It is founded on Bayes' theorem together with the naive assumption that features in a dataset are independent of one another: the occurrence of one feature does not affect the likelihood of the occurrence of another. Naive Bayes [9] can beat even very powerful alternatives for small sample sizes. It is used in many different fields because it is fairly robust,
simple to use, fast, and accurate. Multinomial naive Bayes considers a feature vector in which each term represents the number of times it occurs, i.e., its frequency. Bayes' theorem is given in Eq. (1):

P(B|A) = P(A|B) · P(B) / P(A)  (1)

where P(A) is the probability of A, P(B) is the probability of B, P(B|A) is the conditional probability of B given that A occurs, and P(A|B) is the conditional probability of A given that B occurs.
3.4.2
Linear SVC
Support vector machines are powerful yet versatile supervised machine learning methods used for classification, regression, and outlier detection. SVMs are efficient in high-dimensional spaces and are generally used for classification problems. They are popular and memory efficient because they use a subset of the training points in the decision function. Scikit-learn offers three classes, namely SVC, NuSVC, and LinearSVC, which can perform multi-class classification. SVC is a support vector classifier whose implementation is based on libsvm; the module used by scikit-learn is sklearn.svm.SVC, and this class handles multi-class support according to the one-vs-one scheme (Fig. 10). A linear support vector classifier (SVC) aims to classify or separate the data you provide by returning a "best-fit" hyperplane.
3.4.3
Logistic Regression
Logistic regression is a predictive analysis algorithm used mainly for binary classification, building on the concept of probability, and it is one of the most popular machine learning algorithms that falls under the supervised learning approach. The categorical dependent variable is predicted using a collection of independent variables:

h(xi) = β0 + β1 xi1 + β2 xi2 + · · · + βp xip  (2)
In Eq. (2), β0, β1, …, βp are the regression coefficients. Let the regression coefficient vector β in Eq. (3) be:
Fig. 10 Multi-Class Linear SVM
β = [β0, β1, …, βp]^T  (3)
4 Result and Analysis
The F1-score is used as a model performance evaluation statistic; given that the dataset comprises six labels, the reported F1-score is the average over those six labels. Accuracy and recall are also considered when evaluating the models. Cross-validation was used to compare the baseline model with the other two classifiers.
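A sketch of such a comparison with scikit-learn cross-validation is shown below, computed per label; in the paper the final F1 is the average over the six labels. The fold count and scoring choices are assumptions.

```python
# Hypothetical cross-validated comparison of the three classifiers on one label's F1-score.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

models = {
    "MultinomialNB": MultinomialNB(),
    "LinearSVC": LinearSVC(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

# texts: list of preprocessed comments; y: 0/1 vector for one label (e.g. "insult").
def compare(texts, y, cv=5):
    for name, clf in models.items():
        pipe = make_pipeline(TfidfVectorizer(), clf)
        scores = cross_val_score(pipe, texts, y, cv=cv, scoring="f1")
        print(f"{name}: mean F1 = {scores.mean():.3f}")
```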
4.1 Confusion Matrix
The confusion matrix [10] is used to assess the effectiveness of classification methods and can be used for both multinomial and binary classification. The matrix represents the values of the target variable that are true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). TP means the model predicted positive and the actual outcome was positive, TN means the model predicted negative and the actual result was negative, and FP means the model predicted positive but
the actual result was negative; FP mistakes are also known as Type 1 errors. FN means the model predicted a negative result while the actual result was positive; FN is also referred to as a Type 2 error. The confusion matrix was used on the label "insult." From the output of the confusion matrices, it can be understood that linear SVC performs well compared to the other classifiers (Figs. 11, 12 and 13).
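For reference, such a per-label confusion matrix can be produced as in the short sketch below; the label arrays are placeholders.

```python
# Hypothetical confusion matrix for the "insult" label (y_true, y_pred are 0/1 arrays).
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}  TN={tn}  FP={fp}  FN={fn}")
```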
Fig. 11 Multinomial Naive Bayes
Fig. 12 Logistic Regression
Fig. 13 Linear SVC
Fig. 14 Evaluation of multinomial naive Bayes based on performance metrics
Logistic regression is evaluated on all the labels, and its evaluation is presented in Fig. 16. Based on the evaluation metrics, logistic regression performs well on the "severe_toxic" and "threat" labels. The three models' performance on the test data after training is compared and plotted with box plots. Observe that multinomial naive Bayes performs worse than the other two models; however, linear SVC surpasses the others on average based on the F1-score (Fig. 17). All three models perform about the same based on average accuracy (Fig. 18) and average recall (Fig. 19).
Fig. 15 Evaluation of linear SVC based on performance metrics
Fig. 16 Evaluation of logistic regression based on performance metrics
Fig. 17 Box plot for F1-score
Fig. 18 Box plot for accuracy
5 Conclusions
Based on the comparison above, this study demonstrates that, with default settings, linear SVC outperforms the other models on the "insult" label. Logistic regression predicts well because it has a low percentage of incorrect labels. Each model performs well on one or two labels, but multinomial naive Bayes performs well on almost all of them. Each model performs well in its
Fig. 19 Box plot for recall
own area of strength; further analysis is needed to improve our understanding. So far, only a comparison of models with their default settings has been performed in this paper. In the future, we can carry out hyperparameter tuning with grid search and measure the resulting performance, and even include training time as another metric for evaluating the models.
References 1. Talpur BA, O'Sullivan D (2021) Cyberbullying severity detection: a machine learning approach. PLOS ONE 15(10):1–19 2. Khokale SR, Gujrathi V, Thakur R, Mhalas A, Kushwaha S (2021) Review on detection of cyberbullying using machine learning. J Emerg Technol Innovative Res (JETIR) 8(4):61–65 3. Ingle P, Joshi R, Kaulgud N, Suryawanshi A, Lokhande M (2020) Detecting cyberbullying on Twitter using machine learning techniques. 7(19):1090–1094 4. Patel M, Sharma P, Cherian AK (2020) Bully identification with machine learning algorithms. 7(6):417–425 5. Ghosh R, Nowal S, Manju G (2021) Social media cyberbullying detection using machine learning in Bengali language. Int J Eng Res Technol (IJERT) 10(5):190–193 6. Indumathi V, Santhana Megala S, Padmapriya R, Suganya M, Jayanthi B (2021) Prediction and analysis of plant growth promoting bacteria using machine learning for millet crops. Ann RSCB 25(6):1826–1833. ISSN: 1583-6258 7. Godara S, Kumar S (2018) Prediction of thyroid disease using machine learning techniques. Int J Electron Eng 10(2):787–793. ISSN: 0973-7383 8. Wadian TW, Jones TL, Sonnentag TL, Barnett MA (2016) Cyberbullying: adolescents' experiences, responses, and their belief about their parents' recommended responses. J Educ Dev Psychol 6(2):47–55 9. Indumathi V, Vijayakumar P (2018) An efficient dimension reduction in text categorization using clustering technique. 4(6)
9. Rajesh P, Suriakala M (2016) An analytical study on cyber stalking awareness among women using data mining techniques. J Res Comput Sci Eng Technol (JRCSET) 2(3):145–157 10. Pratheebhda T, Indhumathi V, SanthanaMegala S (2021) An empirical study on data mining techniques and its applications. Int J Softw Hardware Res Eng 9(4):23–31
Prediction of Donor–Recipient Matching in Liver Transplantation Using Correlation Method M. Usha Devi, A. Marimuthu, and S. Santhana Megala
Abstract Matching donor and recipient attributes or parameters is considered important for liver transplantation, and the donor–recipient matching characteristics are also specified as important. The recipient demographic characteristics (age at scan, sex, age at transplant, height at transplant, weight at transplant and BMI at transplant) are considered the basic characteristics of donor-to-recipient matching. Liver transplantation is important for the survival of the patient; to support the medical system and to help patients with chronic failure, the characteristics of both donor and recipient play an important role. The final stage for a liver patient is chronic failure, after which liver transplantation becomes a necessity. This paper addresses the prediction of donor–recipient matching and uses the correlation method to identify the exact donor–recipient match. The correlation method gives a complete demonstration of the matching between donor and recipient for liver transplantation. Keywords Liver transplantation · Dataset · Data preprocessing · Dimensionality reduction · Feature extraction · Feature analysis · Feature selection · Donor–recipient matching (D–R) · Correlation method
M. U. Devi (B) PG and Research Department of Computer Science, Department of Computer Science, Government Arts College, Coimbatore, Tamilnadu, India e-mail: [email protected] A. Marimuthu Department of Computer Science, Government Arts and Science College, Mettupalayam 04, Coimbatore, India S. S. Megala School of Computer Studies, RVS College of Arts and Science, Coimbatore, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_63

1 Introduction

Donor–recipient matching is considered important in liver transplantation, and the number of donors available for liver patients (recipients) is small.
To carry out the donor–recipient matching in an appropriate manner, it is necessary to find the available donors and the chronic liver patients who need transplantation. Prediction of the D–R matching helps the liver patient's survival. A number of parameters or features are considered important, and the D–R matching predicts survival and quality of life. On both sides (donor and recipient), a number of parameters or features are available, and the important parameters for the D–R matching are predicted from them. In living donor liver transplantation (LDLT), a living donor donates part of the liver to the patient. The common factors for D–R matching are as follows (Figs. 1 and 2):
1. Blood tests
2. Health and wellness questionnaire
3. Radiologic evaluation of the potential donor's anatomy.
Fig. 1 LDLT donor
Fig. 2 LDLT recipient
These factors are important in D–R matching. The blood types of the donor and recipient are also important; blood group O can donate to any other blood group. The workflow includes data visualization for donors and recipients, checking for missing values and handling them before applying the correlation method, as well as a comparison of the age factor for liver transplantation by gender. For the donor–recipient matching parameters, the D–R match is analyzed using Total bilirubin and Direct bilirubin, Aspartate and Alamine aminotransferase, Total proteins and Albumin, and Albumin and Globulin ratio with Total proteins. These are the important features, analyzed through feature extraction and feature selection.
2 Related Works

To perform D–R matching, a number of parameters or features are required for the liver patient (a set of features on both sides of the D–R match) [1]. The important parameters are used with the MELD score for the liver patient's survival using ANN techniques [2]. For accurate prediction, the Model for End-Stage Liver Disease (MELD) is used under the sickest-first policy; Adaptive Resonance Theory and RBF MAP are used in that paper [3]. In paper [4], a new rule-mining algorithm and data structure (treap) are used to find the relations in the database. Paper [5] used an ANN method, compared with the Cox PH model, for the survival of the liver patient. In paper [6], machine learning techniques and a multi-objective evolutionary algorithm, the Memetic Pareto Evolutionary non-dominated sorting genetic algorithm 2, are used to solve the transplantation problem and to measure the accuracy of the model's performance. Paper [7] used ten-fold validation and D–R matching pairs with the respective dataset. In paper [8], an artificial neural network model with a multilayer perceptron, compared with other classifiers, is used for liver patient survival. Paper [1] analyzes comparative studies of neural networks using back-propagation learning algorithms. In paper [2], artificial neural networks are used to predict patient survival after liver transplantation and are trained as an accurate model. Paper [9] predicts patients with end-stage liver disease using ANN techniques and overcomes the manual difficulties.
3 Research Methodology

3.1 Dataset

The UCI repository consists of scientific, educational and medical datasets; the dataset used here was accessed from the UCI ML Repository. The liver patient records have been collected as part of a multi-organ dataset since 1987.
The liver patient records include both male and female patients. The dataset provides the gathered patient records, information about the data fields, and statistical information about the numerical columns.
3.2 Data Preprocessing

The preprocessing steps are: read the dataset and load the features available in it; check for null or missing values; extract the missing values; fill each null value with the mean of that particular feature; and check whether the null values have been replaced (Table 1) (Figs. 3, 4 and 5).
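A minimal pandas sketch of these preprocessing steps is shown below; the file name and the use of pandas are assumptions for illustration, not the paper's actual code.

```python
import pandas as pd

df = pd.read_csv("indian_liver_patient.csv")              # read the dataset (assumed file)
print(df.isnull().sum())                                   # check null / missing values
num_cols = df.select_dtypes("number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())    # fill nulls with the column mean
print(df.isnull().sum())                                   # confirm that no nulls remain
```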
4 Dimensionality Reduction

Dimensionality reduction is used to reduce the number of random variables to a set of principal variables. Under this step, extraction and analysis of the features are followed by the correlation method.
4.1 Feature Extraction

Feature extraction refers to the process of extracting useful information or features from the predefined (existing) data. The recipient patients with liver disease are identified, along with the number of donors ready for liver transplantation (Fig. 6).
4.2 Feature Selection

Feature analysis is used to discover useful information and to support decision making. The donor–recipient match is compared using Total bilirubin and Direct bilirubin, and Aspartate and Alamine aminotransferase; features such as Alkaline phosphatase and Alamine aminotransferase, and Total proteins and Albumin, are selected for the D–R match; the D–R match of Albumin against the Albumin and Globulin ratio is analyzed; and the Albumin and Globulin ratio against Total proteins is analyzed for the donor–recipient match. Feature selection refers to the process of reducing the inputs for processing and analysis, or of finding the most meaningful inputs, so that the original data can be reduced to a smaller set from which the important data and features are predicted (Table 2) (Figs. 7, 8, 9, 10, 11 and 12). The Dataset column distinguishes liver transplant donors and recipients: in this column, 1 stands for recipient and 2 for donor. The input variables or features are all the columns except Dataset.
Table 1 Sample dataset

Record No | Age | Gender | Total_bilirubin | Direct_bilirubin | Alkaline_phosphotase | Alamine_aminotransferase | Aspartate_aminotransferase | Total_protiens | Albumin | Albumin_and_globulin_ratio | Dataset
0 | 65 | Female | 0.7 | 0.1 | 187 | 16 | 18 | 6.8 | 3.3 | 0.9 | 1
1 | 62 | Male | 11 | 5.5 | 699 | 64 | 100 | 7.5 | 3.2 | 0.74 | 1
2 | 62 | Male | 7.3 | 4.1 | 490 | 60 | 68 | 7 | 3.3 | 0.89 | 1
3 | 58 | Male | 1 | 0.4 | 182 | 14 | 20 | 6.8 | 3.4 | 1 | 1
4 | 72 | Male | 3.9 | 2 | 195 | 27 | 59 | 7.3 | 2.4 | 0.4 | 1
Fig. 3 Data visualization for donor–recipients
Fig. 4 Extracting missing values
4.3 Correlation Method

Correlation is a statistical technique used to show whether pairs of variables are related or not. A correlation may be positive, when the two variables move in the same direction.
Fig. 5 Checking missing values
In the case of a negative correlation, one variable decreases while the other increases. The correlation may also be neutral, which means that the variables are not correlated. The correlation between two variables a and b is calculated as

r(a, b) = \frac{\sum_{i} (a(i) - \bar{a})(b(i) - \bar{b})}{\sqrt{\sum_{i} (a(i) - \bar{a})^{2}}\,\sqrt{\sum_{i} (b(i) - \bar{b})^{2}}}

which is used to find the relationship between the variables and to point out the important variables or parameters for the matching purpose. To determine the donor–recipient match using the correlation method, and with the help of scatter plots, the relevant features or parameters for transplantation are pointed out. Finally, the following features alone are kept for prediction: Total_Protiens, Total_Bilirubin, Albumin, Albumin_and_Globulin_Ratio and Alamine_Aminotransferase.
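The correlation-based selection described here can be sketched as follows; the file name and the exact column spellings (taken from Tables 1 and 2) are assumptions about the authors' data layout, not their code.

```python
import pandas as pd

df = pd.read_csv("indian_liver_patient.csv")                   # assumed file name
corr = df.select_dtypes("number").corr(method="pearson")       # pairwise Pearson r
print(corr["Dataset"].sort_values(ascending=False))            # relation to the donor/recipient column

keep = ["Total_Protiens", "Total_Bilirubin", "Albumin",
        "Albumin_and_Globulin_Ratio", "Alamine_Aminotransferase"]
reduced = df[keep + ["Dataset"]]                               # reduced feature set
print(reduced.head())
```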
Fig. 6 Comparison of age factor for liver transplantation using gender

Table 2 Donor–recipient matching parameters

S. No. | Parameters
1 | Age
2 | Gender
3 | Total_bilirubin
4 | Direct_bilirubin
5 | Alkaline_phosphotase
6 | Aspartate_aminotransferase
7 | Total_protiens
8 | Albumin
9 | Albumin_and_globulin_ratio
10 | Alamine_aminotransferase
Fig. 7 Analysis of D–R match with total bilirubin and direct bilirubin
5 Experimental Results

From the above joint plots, scatter plots and correlation map, we find that there is a direct relationship between the selected features, and each measure has a perfect linear correlation with itself (Fig. 13). Hence, where two features behave similarly they will give similar performance, and the selected features alone are kept for further prediction and matching.
6 Conclusion

The correlation method is used to model the relationship between donor and recipient for matching the parameters or features. The important features or parameters are taken into the matching process to analyze the relevant or suitable parameters.
Fig. 8 Analysis of D–R match with alkaline phosphate and alamine aminotransferase
The most useful features present in the dataset for liver transplantation are found through data preprocessing, feature extraction, feature analysis and feature selection, in order to identify the appropriate features for liver transplantation. The donor–recipient matching parameters are compared, and the correlation method is used to find the donor-to-recipient match with the reduced number of parameters. The features used for further prediction and matching are Albumin, Albumin_and_Globulin_Ratio, Alamine_Aminotransferase, Total_Protiens and Total_Bilirubin, which help protect the liver patient and extend their life through transplantation.
Fig. 9 Analysis of D–R match with aspartate and alamine aminotransferase
Fig. 10 Analysis of D–R match with total_proteins and albumin
Fig. 11 Analysis of D–R match with albumin and globulin ratio and total_proteins
Fig. 12 Analysis of D–R match with albumin and albumin and globulin ratio
Fig. 13 Perfect linear correlation
References 1. Chandra SSV, Raji CG (2016) Artificial neural networks in prediction of patient survival after liver transplantation. 7(1):1–7 2. Vivareli M, Pinna AD (2007) Artificial neural network is superior to MELD in predicting mortality of patients with end-stage liver disease. 56(2):253–258 3. Terrault NA, Roberts JP (2011) Gender difference in liver donor quality are predictive of graft loss. 11(2):296–302 4. Vinodchandra SS, Anand HS (2016) Association mining using treap. https://doi.org/10.1007/ s13042-016-05467 5. Pourahmad S, Nikeghbalian S (2015) Five years survival of patients after liver transplantation and its effective factors by neural network and cox poroportional hazard regression models. 15(9) 6. Chandra SSV, Raji CG (2016) Predicting the survival of graft following liver transplantation using a nonlinear model. 24(5):443–452 7. Chandra SSV, Raji CG (2016) Graft survival prediction in liver transplantation using artificial neural network models. 16:72–78 8. Saduf MAW (2013) Comparative study of back propagation learning algorithms for neural networks. 3(12):1151–1156 9. Hervás-Martínez C, De La Mata M (2013) Predicting patient survival after liver transplantation using evolutionary multi objective artificial neural networks. 58(1):37–49
Blockchain for 5G-Enabled IoHT—A Framework for Secure Healthcare Automation Md Imran Alam, Md Oqail Ahmad, Shams Tabrez Siddiqui, Mohammad Rafeek Khan, Haneef Khan, and Khalid Ali Qidwai
Abstract Medical care has evolved into a lifeline for many people; due to this, there is an explosion in the field of medical big data. Wearable technology based on the Internet of Things (IoT) is being used by healthcare professionals to speed up diagnosis and treatment. There are now numerous sensors and devices, which are connected to the Internet. Internet of HealthThings (IoHT) can be used in many different ways in healthcare. Sensors and remote-monitored medical devices can help patients get more accurate and efficient diagnostics. This paper focuses on healthcare automation using the Internet of Things (IoT)-enabled devices. IoT-driven healthcare models can keep patients healthy and safe, enabling doctors to provide better care. In addition, healthcare frameworks significantly reduce healthcare costs and improve performance. Blockchain systems are transparent and thus safe to store and share private information. Data collected by medical devices attached to the body is kept in a cloud server alongside relevant medical reports as part of the proposed framework. The use of blockchain and 5G technology means that patients’ data can be securely transmitted at high speeds and with rapid response times. Keywords Internet of healththings · 5G · Blockchain · IoT · Healthcare
M. I. Alam (B) · M. R. Khan · H. Khan Department of Computer and Network Engineering, Jazan University, Jazan 45142, Saudi Arabia e-mail: [email protected] M. R. Khan e-mail: [email protected] H. Khan e-mail: [email protected] M. O. Ahmad Department of Computer Applications, B.S Abdur Rahman Crescent Institute of Science and Technology, Chennai, Tamil Nadu 600048, India S. T. Siddiqui · K. A. Qidwai Department of Computer Science, Jazan University, Jazan 45142, Saudi Arabia © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_64
1 Introduction Remote maintenance and monitoring of smart grids and smart cities are now possible thanks to rapid advancements in computer network and sensing network architecture models (smart cities). The Internet of Things (IoT) is a network of devices that collect, process, and evaluate data over the Internet. Remote collaboration between automation equipment and other fields of technology is now probable thanks to the Internet of Things (IoT). Individuals and terminal devices, such as storage devices, processing units, embedded sensors, and actuators, are concerned about the security of shared data. Blockchain technology ensures the security of data exchanged over IoT architecture [1]. These new healthcare management frameworks relying on IoT architecture for patient monitoring have been revolutionized by blockchain technologies. Due to the preservation of the nodes’ transaction histories, blockchain technology ensures data privacy. Proof of work is a fundamental principle of the blockchain. If the network verifies that authorized nodes have computed enough, a transaction is valid. Patients’ medical records will be more secure and private with the implementation of blockchain-based technology. Electronic patient records, personal health records, individual diagnostic reports, data gathered from wearable devices, and post-operative assessments are all examples of healthcare and medical information systems. The application of blockchain innovations is to enhance data exchange transparency, privacy, confidentiality, and traceability. 5G wireless networks are being constructed around the world. To improve network service quality, infrastructure capacity, interpretability, and overall architecture throughput, it connects heterogeneous devices and machines together. Despite the aforementioned improvements, data transfer security over 5G networks remains a major concern. The 5G network architecture will be safer with blockchain integration. Assuring immutability and transparency in 5G security is difficult due to the network’s diverse devices [2]. With the help of IoT-driven healthcare models, doctors can provide better care to their patients. In contrast, IoT healthcare frameworks have a significant impact on healthcare costs and performance. The 5G wireless IoT technology is ten times faster than current LTE technology and has lower latency. For advanced features such as distributed architecture, network slicing will be used in the future. A secure network can be formed by connecting various networking devices. Due to 5G’s security and privacy limitations, blockchain technology must be used to ensure model privacy and confidentiality [3]. This paper focuses on the fundamentals of 5G technology, including previous research, applications, and the properties of the technology. Technologies such as blockchain ensure the immutability of data exchanged over the 5G network, which consists of numerous heterogeneous devices.
2 Literature Survey This section covers 5G-enabled IoT and three subsections. To start, let us review blockchain and how it is “disrupting” our current era. Then we will discuss the Internet of Things (IoT), its advantages and disadvantages, and how 5G will affect the IoT model [4]. Finally, the paper’s conclusions are based on the benefits of merging these two paradigms.
2.1 Healthcare “Smart healthcare” refers to the provision of medical services through the use of smart devices and networks (e.g., body area network, wireless local area network, extensive area network). Smart devices use sensors and biomedical systems to gather information about a person’s health (i.e., the application has information about medical science such as diagnosis, treatment, and prevention of disease). Medical errors can be reduced, and costs can be reduced by providing people from all walks of life with the information and solutions they need to do so [5].
2.2 Blockchain Technology and Related Concepts Network security and reliability can be ensured through the use of blockchain technology. Transaction management systems are also being replaced by blockchain technology. With the advent of the decentralized blockchain technology that underpins bitcoin, the problem of central banks has been eradicated. There is a currency known as Bitcoin (BTC). In a central architecture, all of the nodes are linked to a central coordination system. There will be a central coordination system where all information will be exchanged, passed, and approved. A failure of the central coordination platform will disconnect all of these individual nodes. There needs to be a shift from centralized systems to decentralized ones. In a decentralized system, there will be multiple coordinators. There is no single point of control in a decentralized system, and each node is a coordinator. Each node connects to other nodes; there is no central coordinator. There are blocks in the blockchain, each of which contains the most up-to-date verified transactions. A block’s value or proof of work is yet another crucial blockchain concept. In a blockchain, a block serves as a permanent record of all transactions that have occurred since the block was created. Blockchain is a combination of three technologies. Immutability is provided by the hash function, which combines the benefits of private key cryptography and hashing [6]. P2P networks are used to ensure complete blockchain consistency.
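To make the block, hash and linking vocabulary above concrete, the toy sketch below chains two blocks with SHA-256; it is an illustration of the concepts only, not a production blockchain and not the framework proposed in this paper.

```python
import hashlib, json, time

def make_block(transactions, prev_hash):
    # A block stores verified transactions plus the hash of the previous block.
    block = {"timestamp": time.time(), "transactions": transactions, "prev_hash": prev_hash}
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

genesis = make_block(["genesis"], prev_hash="0" * 64)
block1 = make_block(["patient A: record update"], prev_hash=genesis["hash"])

# Changing any earlier block changes its hash and breaks the link held by the next block.
assert block1["prev_hash"] == genesis["hash"]
print(block1["hash"])
```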
2.3 Numerous Blockchain Capabilities to Support Global Healthcare Culture

The healthcare industry can make extensive use of blockchain technology, for example to manage the drug supply chain and to enable secure transfer of patient medical records. For healthcare and related industries, a wide range of blockchain features and critical enablers are shown in Fig. 1. Some of the technologically impressive features used to develop and practice blockchain technology are digitalized tracking and the reporting of outbreaks. There are numerous reasons for the widespread adoption of blockchain technology, including its complete digitalization and its use in healthcare-related applications [7].
3 5G Technologies for Smart Healthcare The various short and long-range communication technologies are used by devices and servers in smart healthcare. WiMAX, a short-range wireless technology, is used in the smart healthcare system, such as BAN, for short communications (body area network). Smart healthcare makes use of GPRS, GSM, and other wireless technologies to transfer information from a local server to a base station. Mobile networks may be able to use LTE-M technology to connect IoT devices. If a network link can quickly send a large amount of data, then it has a high bandwidth [8]. A lack of bandwidth in 3G and 4G networks restricts the amount of data biomedical sensors can transmit. Higher frequency signals will be supported by the 5G network (including above 10 GHz frequencies). High-speed transmissions can be achieved by using these frequencies, which free up more bandwidth (on the order of Gbps). UHD images and
Fig. 1 Blockchain technology features for healthcare system
healthcare solutions can be viewed remotely by doctors via a 5G network. This flexibility in bandwidth allocation enables D2D solutions in the medical field on the 5G network [9]. Wearable sensors, medical devices, and monitoring equipment can all be incorporated into these solutions [2].
4 IoT in Healthcare Internet of Things is a dynamic network infrastructure that can self-configure using interoperable and standard communication protocols, according to the IoT European Research Cluster (IECR) project. Sensors and sensor-based systems have seen an increase in the use of D2D communications [10]. For the development of 5G, IoT devices are expected to play a significant role. In spite of this, technology continues to advance. Even though managing data from multiple sources is a major challenge for IoT in healthcare, gaining meaningful insight from collected data is critical to its future [11]. Patients could only communicate with doctors via phone or text before the Internet of Things. The health of patients could not be continuously monitored and recommended by doctors and hospitals. IoT-enabled devices now make it possible for doctors to monitor patients from afar, improving care and ensuring their safety [12]. Patients can benefit from the Internet of Things by wearing fitness bands and other wirelessly connected devices, such as blood pressure and heart rate monitoring cuffs, glucometers (IoT). Even for the elderly, the Internet has made life better for many [13]. The result is that the lives of singles and their families are impacted. Notifications are sent to concerned family members and medical professionals when someone’s daily routine is disrupted.
5 Integration of 5G, Blockchain, and IoT in Healthcare As a starting point, this system uses the patient as a source of data. Data about the patient is gathered by sensors or medical devices. Accurate patient data is essential for the proposed system. It is common practice to monitor patients with sensors or medical IoT devices, which generate a significant amount of data. Treatment can be improved by analyzing patient data [14]. The data is collected and then encrypted using blockchain technology and stored in a database of patient data. Patients’ private data is safely stored in the cloud and can be accessed in the future to help with treatment decisions. The data is sent to healthcare providers at the fastest possible rate using 5G services. By using 5G services, the healthcare system’s rapid response time can be attributed to high data transmission speeds [15]. Using centralized blockchain technology to secure a database ensures the highest level of security for sensitive information. In the last step, healthcare providers should seek to deliver safe and sound care. Patients need disease-specific care from healthcare providers.
The data generated by IoT devices generally requires huge storage capacity and connectivity for billions of devices with varying quality of services (QoS). Additionally, an open, transparent, and secure system must exist throughout the entire 5G ecosystem in order to deliver services using a combination of multiple technologies.
5.1 Proposed Framework The proposed model’s overall architecture is described in Fig. 2. Patients’ health records, sensor-based applications, database server on cloud storage (DSCS), 5G and blockchain components, and data collection modules are all part of the proposed model. In order to use the sensor-based application, the patient enters both their disease symptoms and their approximated bodily factor values. Using blockchain technology, the patient receives a copy of their medical history, which is then transmitted to the healthcare providers via 5G services. The proposed system is broken down into sections. The term “ambient sensor data” refers to data gathered from a variety of sensors. A variety of body sensors are used to gather data on various body parameters. The collected data is transferred to a cloud server where it can be stored. For security reasons, blockchain technology is employed. Improved reliability and security, as well as a better quality of output, are achieved by this as the most reliable results. The proposed system utilizes 5G technology. This is faster, more efficient, unique, and better than any previous healthcare system [14].
5.2 Working of Proposed Framework A flow diagram depicting the sensor-based application’s calculation of patient body parameter values is shown in Fig. 2. The database contains the body parameter values. The patient’s medical history is stored on a blockchain. The database has a note of this. The use of blockchain technology in the healthcare sector ensures data integrity and security. Data about the patient is sent to the cloud, where it is stored as a patient record and can be accessed for future treatments. Data stored in the cloud is transmitted to healthcare providers and experts using 5G technology. Including 5G technology in the proposed healthcare system makes it more efficient. To collect data, this system makes use of sensors like those that measure heart rate, body temperature, oxygen saturation MAX 30,100, and blood pressure [13, 16]. A layer of the intended system is comprised of hardware devices (sensors, microcontrollers, actuators, and Wi-Fi devices). Convergence of the system is impossible without application and infrastructure management. The networking and Wi-Fi modules are connected and managed by the connectivity layer. Using data channels between hardware devices and users (such as patients or healthcare workers) and the cloud, the management layer keeps track of cloud storage [17]. Patient monitoring, appointment scheduling, temperature indication, and alerts for abnormalities in body parameters are provided
Fig. 2 Proposed framework using 5G, blockchain technology, and IoT in healthcare
by the application layer on top of the system. Depending on the circumstances, a variety of treatment options are available. An expert system that is aware of the disease will first offer a treatment recommendation. Second, a specialist is notified of the findings and given 24 h to make a treatment recommendation. If a patient has a disease for which the expert system does not have a database, the best option is to consult a panel of doctors [12]. Doctors meet to discuss the disease, conduct experiments, and gather all relevant information before sending patients to healthcare
centers or expert systems via the internet. Disease information and treatment options are added to the expert system.
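The three-way treatment routing described above (expert system recommendation, specialist within 24 h, or a panel of doctors) could look roughly like the following sketch; the disease list, thresholds and function name are hypothetical illustrations and are not part of the proposed framework's implementation.

```python
KNOWN_DISEASES = {"hypertension", "arrhythmia"}          # assumed expert-system coverage

def route_case(disease: str, vitals: dict) -> str:
    # Application-layer alert for abnormal body parameters (thresholds are illustrative).
    if vitals.get("heart_rate", 0) > 120 or vitals.get("temperature_c", 0) > 39.0:
        return "alert: abnormal parameters, notify the healthcare provider immediately"
    if disease in KNOWN_DISEASES:
        return "expert system issues a treatment recommendation"
    return "forward findings to a specialist (24 h) or to a panel of doctors"

print(route_case("arrhythmia", {"heart_rate": 92, "temperature_c": 37.1}))
print(route_case("unlisted-condition", {"heart_rate": 88, "temperature_c": 37.0}))
```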
6 Conclusions and Future Work The proposed model outperformed the existing one in terms of efficiency, responsiveness, and confidentiality. An application that uses sensors to calculate body parameter values can reduce the time it takes for the procedure to complete. With the help of the Internet of Things (IoT), healthcare professionals can monitor and diagnose a wide range of health issues, measure a wide range of health parameters, and provide diagnostic facilities in remote areas. As an alternative to hospitalization, patients can be monitored and treated remotely from the comfort of their own homes. As a result, the healthcare industry has evolved from healthcare to a patient care system. The Internet of Things, artificial intelligence, and data science-based healthcare systems can help to improve healthcare. The use of technology-based healthcare services has the potency to enhance the quality of life for elderly people while also providing them with a more convenient means of accessing high-quality care. The use of new 6G technology using artificial intelligence and machine learning in healthcare applications and wearable devices in the future can solve most of the problems, such as connectivity problems, and can provide more information from data, which will be helpful in the long run.
References 1. Dwivedi AD, Srivastava G, Dhar S, Singh R (2019) A decentralized privacy-preserving healthcare blockchain for IoT. Sensors 19(2):326 2. Ahad A, Tahir M, Yau KLA (2019) 5G-based smart healthcare network: architecture, taxonomy, challenges and future research directions. IEEE Access 7:100747–100762 3. Sujihelen L, Senthilsingh C, Livingston LM, Sasikumar B (2019) Block chain based smart healthcare systems in 5G networks. Turk J Physiotherapy Rehabil 32(2):10 4. Al-Khazaali AAT, Kurnaz S (2021) Study of integration of blockchain and internet of things (IoT): an opportunity, challenges, and applications as medical sector and healthcare. Appl Nanosci 7 (Springer) 5. Ahmad MO, Siddiqui ST (2022) The internet of things for healthcare: benefits, applications, challenges, use cases and future directions. In: Advances in data and information sciences. Lecture notes in networks and systems, vol 318, pp 527–537 6. Siddiqui ST, Ahmad R, Shuaib M, Alam S (2020) Blockchain security threats, attacks and countermeasures. In: Ambient communications and computer systems. Springer, pp 51–62 7. Ratta P, Kaur A, Sharma S, Shabaz M, Dhiman G (2021) Application of blockchain and internet of things in healthcare and medical sector: applications, challenges, and future perspectives. J Food Qual 2021:20 8. Kumar V, Yadav S, Sandeep DN, Dhok SB, Barik RK, Dubey H (2019) 5G cellular: concept, research work and enabling technologies. In: Advances in data and information sciences. Springer, pp 327–338
9. Prashar D, Rashid M, Siddiqui ST, Kumar D, Nagpal A, AlGhamdi AS, Alshamrani SS (2021) SDSWSN—a secure approach for a hop-based localization algorithm using a digital signature in the wireless sensor network. Electronics 10(24):3074 10. De Mattos WD, Gondim PR (2016) M-Health solutions using 5G networks and M2M communications. IT Professional 18(3):24–29 11. Elhoseny M, Ramírez-González G, Abu-Elnasr OM, Shawkat SA, Arunkumar N, Farouk A (2018) Secure medical data transmission model for IoT-based healthcare systems. IEEE Access 6:20596–20608 12. Siddiqui ST, Singha AK, Ahmad MO, Khamruddin M, Ahmad R (2021) IoT devices for detecting and machine learning for predicting COVID-19 outbreak. In: ICRTCIS 2021: international conference on recent trends in communication and intelligent systems 13. Ismail A, Abdelrazek S, Elhenawy I (2021) IoT wearable devices for health issue monitoring using 5G networks’ opportunities and challenges. In: Blockchain for 5G-enabled IoT. Springer, Cham, pp 521–530 14. Hameed K, Bajwa IS, Sarwar N, Anwar W, Mushtaq Z, Rashid T (2021) Integration of 5G and block-chain technologies in smart telemedicine using IoT. J Healthc Eng 2021:18 15. Ejaz W, Anpalagan A, Imran MA, Jo M, Naeem M, Qaisar SB, Wang W (2016) Internet of things (IoT) in 5G wireless communications. IEEE Access 4:10310–10314 16. Hu J, Liang W, Hosam O, Hsieh MY, Su X (2021) 5GSS: a framework for 5G-secure-smart healthcare monitoring. Connection Sci 1–23 17. Ray PP, Dash D, Salah K, Kumar N (2020) Blockchain for IoT-based healthcare: background, consensus, platforms, and use cases. IEEE Syst J 85–94
PCG Heart Sounds Quality Classification Using Neural Networks and SMOTE Tomek Links for the Think Health Project Carlos M. Huisa , C. Elvis Supo , T. Edward Figueroa , Jorge Rendulich , and Erasmo Sulla-Espinoza Abstract Cardiac PCG signal recordings are an important part of cardiology teleconsultations. The main problems related to high-quality recordings are due to less experienced healthcare personnel taking lower-quality samples and ambient noise, and these scenarios can lead to errors in diagnosis by the physician and PCG heart sound classification algorithms. Given this problem, machine learning algorithms were proposed for quality classification of PCG recordings that aid in accurate diagnosis and faster care. One difficulty in the application of these algorithms is the problems related to class imbalance, which is very common in medical applications that affect model performance. In this study, an artificial neural network (ANN) classifier with a SMOTE Tomek Links class imbalance method is used. Public databases containing 7893 recordings with ten features of each PCG signal are used. We use two types of labeling which have different levels of imbalance. The use of SMOTE Tomek Links in combination with neural networks showed better performance compared to SVM classifier. For future work, we intend to perform laboratory tests in remote areas applying the proposed algorithm and we also intend to use the concept of mel spectrograms and convolutional networks for the classification of heart sounds. Keywords Quality classification PCG · Auscultation · Data imbalance · Neural artificial networks · Think health project C. M. Huisa (B) · C. Elvis Supo · T. Edward Figueroa · J. Rendulich · E. Sulla-Espinoza Electronic Engineering, Faculty of Engineering, Production and Services, Universidad Nacional de San Agustín de Arequipa, Arequipa, Peru e-mail: [email protected] C. Elvis Supo e-mail: [email protected] T. Edward Figueroa e-mail: [email protected] J. Rendulich e-mail: [email protected] E. Sulla-Espinoza e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_65
1 Introduction Cardiovascular diseases (CVD) are one of the leading causes of mortality in the world [1]. Cardiac auscultation is the primary means by which physicians listen to heart sounds, and this procedure aids in patient diagnosis [2]. The recent SARS-CoV-2 pandemic had an increased demand for cardiology teleconsultations. A usual procedure requires PCG heart recordings taken by a physician or expert personnel who already have training and experience for a proper auscultatory process [3–6], but due to the low number of cardiologists especially in remote areas, this task is carried out by inexperienced healthcare personnel, and this results in lower-quality samples that affect the final diagnosis and can lead to erroneous results [7, 8]. One of the solutions to obtain high-quality PCG recordings is software-based procedures that estimate the signal quality from computer analysis, for example, selection algorithms were proposed to determine the best substring of the PCG signal [9–11] and classifiers based on linear regression, neural networks, and support vector machines (SVM) [12–14]. Most of these algorithms in the field of classification work well in a balanced condition between the number of classes. However, the imbalance between classes is one of the most common problems encountered in medical applications, which usually decreases the performance of classifiers [15], this is basically because the lack of a sufficient number of samples from the minority class complicates the task of identifying feature combinations that help to discriminate the classes. In this context, neural networks have shown multiple advantages over traditional methods, where they are used to mitigate the problems of imbalance between classes, they are also very flexible, suitable for solving complex problems with a high number of data, have an efficient and iterative training offering greater accuracy [16]. On the other hand, one of the most used methods to solve this problem is those based on data manipulation also known as sampling, these modify the data set to overcome the imbalance [17, 18]. Undersampling modifies the training sets by sampling from a larger to a smaller training set and oversampling is the repetition of instances of smaller classes, both intending to have balanced classes and obtaining better results. Therefore, the aim of this research is the application of neural networks with the SMOTE Tomek Links method to improve the classification performance. We use a public database containing 7893 recordings, and two types of labeling with different imbalance for model validation will be performed with k-fold cross-validation. This work is part of the research development in the area of biomedical engineering of the Universidad Nacional de San Agustín de Arequipa [19–21] for the Think Health project which seeks to improve medical assistance in the auscultation process.
Table 1 Unbalancing characteristics of the data used

Labels | Total data | Majority class number "Unacceptable" | Minority class number "Acceptable"
Physician expert | 7893 | 5806 | 2087
Physician expert and senior signal processing | 7893 | 4386 | 3057
2 Literature Review One of the first works that evaluated the performance of a classification algorithm and PCG quality was performed by Springer [13], who used 151 clinical PCG recordings to perform a binary classification model using logistic regression this was applied with a digital stethoscope and a cell phone achieving an accuracy 82.2% and 86.5%, respectively. Another study uses the PhysioNet Challenge 2016 database [22] with which they performed a classifier based on feature extraction and neural networks achieving an overall score of 91.50% [14]. The work by Tang et al. [12] compiled a database, in which two types of labels were performed for each PCG signal, in addition to a binary and triple classifier with SVM. In this work, the compiled database of Hong Tang and the binary classification approach was adopted in which the signal is called “Unacceptable” when the recording is not good enough for analysis and “Acceptable” when the recording is good enough for analysis. The first type of labeling was performed by an expert physician in the area of cardiology and the second type of labeling is the combination of the expert physician with two engineers specialized in signal processing as shown in Table 1, it is observed that the majority class is made up of poor quality PCG recordings and the minority class is recordings identified as good quality. The databases that make up both labels are as follows: Cinc challenge 2016 [22], Pascal Classifying Heart Sound Challenge 2012 [23], PHHS 2015 [24], and Heart Sounds Catania 2011, which were taken by different devices and sensors in ideal and real environments, with a total of 7893 PCG signals.
3 Methodology The methodology consists of four phases: signal preprocessing, feature extraction, data balancing, and training. The first phase consists of running each signal through an anti-aliasing filter and then sampling at 1000 Hz. The baseline is removed by a high-pass Chebyshev filter of order 5 with a cutoff frequency of 2 Hz. In the feature extraction part, the ten features are extracted based on the kurtosis, energy ratio, frequency smoothed envelope, and the degree of periodicity of the sound [12].
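A SciPy sketch of this preprocessing stage is given below; the synthetic input, the assumed original sampling rate and the 1 dB ripple of the Chebyshev filter are assumptions, while the 1000 Hz target, the filter order and the 2 Hz cutoff follow the text.

```python
import numpy as np
from scipy import signal

fs_in, fs_out = 4000, 1000                         # assumed original rate; 1000 Hz target
t = np.arange(0, 5, 1 / fs_in)
pcg = np.sin(2 * np.pi * 30 * t) + 0.2 * np.random.randn(t.size)   # stand-in PCG signal

# Anti-aliasing and resampling to 1000 Hz (resample_poly applies its own low-pass filter).
pcg_1k = signal.resample_poly(pcg, up=fs_out, down=fs_in)

# Baseline removal: order-5 Chebyshev type-I high-pass, 2 Hz cutoff (1 dB ripple assumed).
sos = signal.cheby1(N=5, rp=1, Wn=2, btype="highpass", fs=fs_out, output="sos")
pcg_clean = signal.sosfiltfilt(sos, pcg_1k)
print(pcg_clean.shape)
```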
Fig. 1 Proposed methodology
For the imbalance treatment, the SMOTE Tomek Links technique was used; this method combines the ability of SMOTE to generate new synthetic data for the minority class with the removal of the majority-class samples identified as Tomek links. Both types of labeling, the one performed by the medical expert and the one combining the medical expert with two senior signal-processing engineers, go through the SMOTE Tomek Links method and are later used as targets for classifier training. The training part consists of using neural networks and SVM to find the best combination. The model is validated with cross-validation, divided into ten parts of equal size and repeated 5 times, and finally the results of the proposed ANN classifier are compared with the SVM classifier of Hong Tang (Fig. 1).
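The balancing step can be sketched with imbalanced-learn's SMOTETomek as below; the synthetic ten-feature data is a stand-in for the real feature table, and the imbalance ratio is only an approximation of Table 1.

```python
from collections import Counter
from imblearn.combine import SMOTETomek
from sklearn.datasets import make_classification

# Toy data with roughly the physician-label imbalance (5806 vs 2087 recordings).
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.74, 0.26],
                           random_state=42)
print("before:", Counter(y))

X_res, y_res = SMOTETomek(random_state=42).fit_resample(X, y)   # SMOTE + Tomek-link removal
print("after :", Counter(y_res))
```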
4 Experimental Tests and Results

4.1 Classification Model with Neural Networks

The proposed model consists of three layers: the input layer has 32 nodes, the hidden layers consist of 64 and 128 nodes, and finally, the output layer consists of a single node with a sigmoid function (for binary classification) [25]. The rectified linear unit (ReLU) is used as the activation function, and the neural network is optimized using the "RMSprop" optimizer. Table 2 shows the neural network architecture used in this study (Fig. 2).

Table 2 Summary of the neural network

Layer type | Output shape | Parameters
Dense | 32 | 352
Dense | 64 | 2112
Dense | 128 | 8320
Dense | 1 | 129
Total | | 10,913
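A Keras sketch consistent with Table 2 is shown below (ten input features, the stated layer sizes, ReLU activations, a sigmoid output and the RMSprop optimizer); the loss and metric choices are assumptions, since the paper does not state them.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(10,)),                 # the ten PCG quality features
    layers.Dense(32, activation="relu"),       # 352 parameters
    layers.Dense(64, activation="relu"),       # 2,112 parameters
    layers.Dense(128, activation="relu"),      # 8,320 parameters
    layers.Dense(1, activation="sigmoid"),     # 129 parameters, binary output
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()                                # 10,913 trainable parameters in total
```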
Fig. 2 Structure of the ANN neural network model for sound quality classification PCG “Unacceptable”–“Acceptable”
4.2 PCG Sound Quality Classification Performance

In this section, experiments are conducted to evaluate the performance of the PCG sound quality classification algorithm. Results are presented for two test scenarios.
• The first experiment tests the classification model with both labels without any inter-class imbalance treatment.
• The second experiment tests the classification model with both labels using the SMOTE Tomek Links balancing technique.
To evaluate the performance of the model, the data was previously divided into 90% for training and 10% for testing. The training data set was divided into 10 subsets of equal size and proportion by applying fivefold cross-validation. For the evaluation of the proposed method, five measures are used for each experiment: sensitivity "Se", specificity "Esp", the accuracy of both classes "Acc1" and "Acc2", and the overall rate "Over". These measures are defined as:

Se = TP / (TP + FN)   (1)
Esp = TN / (TN + FP)   (2)
Acc1 = TP / (TP + FP)   (3)
Acc2 = TN / (TN + FN)   (4)
Over = (Se + Esp + Acc1 + Acc2) / 4   (5)
where TP is the number of true positive results, TN is the number of true negative results, FP is the number of false positive results, and FN is the number of false negative results (Table 3). Compared with the SVM binary classifier of Hong Tang et al., we observe that our proposal has a better overall score (Table 4).
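The five measures can be computed directly from a confusion matrix, as in the sketch below (toy predictions, scikit-learn assumed).

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                 # toy labels, not the study's outputs
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

se   = tp / (tp + fn)                      # sensitivity, Eq. (1)
esp  = tn / (tn + fp)                      # specificity, Eq. (2)
acc1 = tp / (tp + fp)                      # class-1 accuracy, Eq. (3)
acc2 = tn / (tn + fn)                      # class-2 accuracy, Eq. (4)
over = (se + esp + acc1 + acc2) / 4        # overall rate, Eq. (5)
print(f"Se={se:.2f} Esp={esp:.2f} Acc1={acc1:.2f} Acc2={acc2:.2f} Over={over:.2f}")
```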
5 Discussion

We found that the models trained with SMOTE Tomek Links have a higher overall score than those trained on the naturally unbalanced data, and the use of neural networks helps to mitigate the effects of the imbalance between classes, giving better scores than the SVM classifier. From the tests it could be observed that the SMOTE Tomek Links balancing method favors the performance of the models: for the base that contains a greater imbalance the performance improved notably, going from an overall score of 89.40 to 92.90%, while for the base that contains a smaller imbalance the model improved from an overall score of 93.50 to 94.50%, as shown in Table 3. Resampling increased the performance of the classifiers. However, this method involves adding and removing information; i.e., it may remove important information or add redundant information, and a more detailed study is needed to see the level of negative impact that the application of SMOTE Tomek Links may have had. Other more sophisticated and recent models that could be chosen are specialized architectures, such as hybrid or transformer-based models [16], and models that represent sound as images using spectrograms together with neural networks, which have shown better results in comparison with classical machine learning models [5].
6 Conclusions The treatment of imbalanced data sets is an important issue in practice since model training from imbalanced data affects the performance of classification models. The two types of qualifications presented in this study are imbalanced, and the physician’s base has a higher imbalance. Our objective was to demonstrate that an improvement in algorithm performance can be achieved using artificial neural networks (ANN) combined with the SMOTE Tomek Links imbalance method. The results of the tests performed show that the PCG sound quality classifier improved the performance scores of the model, with the ANN classifier performing better. The use of both ANN and SMOTE Tomek Links methods gives us a more robust model with good sensitivity, specificity, and accuracy scores for both classes.
Table 3 Results of the balancing method in the two types of qualifications (Se%, Esp%, Acc1%, Acc2% and overall score for the SVM and ANN classifiers, each trained on the unbalanced data and on data resampled with SMOTE Tomek Links, under the physician labels and under the combined physician-and-engineer labels)
Table 4 Comparison with the state of the art

Method | Se% | Esp% | Acc1% | Acc2% | Over%
RNA + SMOTE Tomek links | 95.42 ± 1.40 | 93.20 ± 1.44 | 94.42 ± 1.40 | 94.60 ± 1.58 | 94.50 ± 0.07
SVM [12] | 96.10 ± 1.00 | 92.20 ± 1.20 | 94.00 ± 1.20 | 94.30 ± 0.70 | 94.30 ± 0.70
Acknowledgements This work is part of a research project funded by the Universidad Nacional de San Agustín de Arequipa, contract number IBA-IB-44-2020-UNSA, April 2021–March 2023.
References 1. Roth G, Mensah G, Fuster V (2020) The global burden of cardiovascular diseases and risks. J Am Coll Cardiol 76(25):2980–2981. https://doi.org/10.1016/j.jacc.2020.11.021 2. Pelech A (2004) Heart sounds and murmurs. In: Practical strategies in pediatric diagnosis and therapy, pp 178–210. https://doi.org/10.1016/b978-0-7216-9131-2.50015-4 3. Giordano N, Knaflitz M (2019) Multi-source signal processing in phonocardiography: comparison among signal selection and signal enhancement techniques. In: 2019 41st Annual international conference of the IEEE engineering in medicine and biology society (EMBC). https:// doi.org/10.1109/embc.2019.8856725 4. Fontecave-Jallon J, Fojtik K, Rivet B (2019) Is there an optimal localization of cardiomicrophone sensors for phonocardiogram analysis? In: 2019 41st Annual international conference of the IEEE engineering in medicine and biology society (EMBC). https://doi.org/10. 1109/embc.2019.8857681 5. Khan KN, Khan FA, Abid A et al (2021) Deep learning based classification of unsegmented phonocardiogram spectrograms leveraging transfer learning. Physiol Meas 42:095003. https:// doi.org/10.1088/1361-6579/ac1d59 6. Bao X, Deng Y, Gall N, Kamavuako E (2020) Analysis of ECG and PCG time delay around auscultation sites. In: Proceedings of the 13th international joint conference on biomedical engineering systems and technologies. https://doi.org/10.5220/0008942602060213 7. Mubarak Q, Akram M, Shaukat A, Hussain F, Khawaja S, Butt W (2018) Analysis of PCG signals using quality assessment and homomorphic filters for localization and classification of heart sounds. Comput Methods Programs Biomed 164:143–157. https://doi.org/10.1016/j. cmpb.2018.07.006 8. Shi K, Schellenberger S, Michler F, Steigleder T, Malessa A, Lurz F et al (2020) Automatic signal quality index determination of radar-recorded heart sound signals using ensemble classification. IEEE Trans Biomed Eng 67(3):773–785. https://doi.org/10.1109/tbme.2019.292 1071 9. Beritelli F, Spadaccini A (2009) Heart sounds quality analysis for automatic cardiac biometry applications. In: 2009 First IEEE international workshop on information forensics and security (WIFS). https://doi.org/10.1109/wifs.2009.5386481 10. Tang H, Li T, Park Y, Qiu T (2010) Separation of heart sound signal from noise in joint cycle frequency–time–frequency domains based on fuzzy detection. IEEE Trans Biomed Eng 57(10):2438–2447. https://doi.org/10.1109/tbme.2010.2051225 11. Li T, Qiu T, Tang H (2013) Optimum heart sound signal selection based on the cyclostationary property. Comput Biol Med 43(6):607–612. https://doi.org/10.1016/j.compbiomed. 2013.03.002
12. Tang H, Wang M, Hu Y, Guo B, Li T (2021) Automated signal quality assessment for heart sound signal by novel features and evaluation in open public datasets. Biomed Res Int 2021:1–15. https://doi.org/10.1155/2021/7565398 13. Springer D, Brennan T, Ntusi N, Abdelrahman H, Zühlke L, Mayosi B et al (2016) Automated signal quality assessment of mobile phone-recorded heart sound signals. J Med Eng Technol 40(7–8):342–355. https://doi.org/10.1080/03091902.2016.1213902 14. Zabihi M, Rad AB, Kiranyaz S, Gabbouj M, Katsaggelos KA (2016) Heart sound anomaly and quality detection using ensemble of neural networks without segmentation. In: 2016 Computing in cardiology conference (CinC). https://doi.org/10.22489/cinc.2016.180-213 15. Galar M, Fernández A, Barrenechea E, Herrera F (2013) EUSBoost: enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling. Pattern Recogn 46(12):3460– 3471. https://doi.org/10.1016/j.patcog.2013.05.006 16. Borisov V, Leemann T, Seßler K, Haug J, Pawelczyk M, Kasneci G et al (2021) Deep neural networks and tabular data: a survey. arXiv Preprint (2021). arXiv:2110.01889 17. Chawla N, Bowyer K, Hall L, Kegelmeyer W (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357. https://doi.org/10.1613/jair.953 18. Sadeghi S, Khalili D, Ramezankhani A, Mansournia M, Parsaeian M (2022) Diabetes mellitus risk prediction in the presence of class imbalance using flexible machine learning methods. BMC Med Inf Decis Making. 22(1). https://doi.org/10.1186/s12911-022-01775-z 19. Sulla TR, Talavera J, Supo E, Montoya AA (2019) Non-invasive glucose monitor based on electric bioimpedance using AFE4300. In: 2019 IEEE XXVI international conference on electronics, electrical engineering and computing (INTERCON). https://doi.org/10.1109/inter-con. 2019.885356 20. Huamani R, Talavera RJ, Mendoza E, Davila N, Supo E (2017) Implementation of a realtime 60 Hz interference cancellation algorithm for ECG signals based on ARM cortex M4 and ADS1298. In: 2017 IEEE XXIV international conference on electronics, electrical engineering and computing (INTERCON). https://doi.org/10.1109/intercon.2017.8079725 21. Edward Figueroa T, Huisa CM, Elvis Supo C, Rendulich J, Sulla-Espinoza E (in press) Algoritmo automático de detección de intercambio de electrodos para ECG de 12 derivaciones basado en puntuación 22. Database challenge (2016). https://www.physionet.org/physiobank/database/chal-lenge/2016/ May2019 23. May 16 2019. http://www.peterjbentley.com/heartchallenge/ 24. Spadaccini A, Beritelli F (2013) Performance evaluation of heart sounds biometric systems on an open dataset. In: 2013 18th international conference on digital signal processing (DSP). https://doi.org/10.1109/icdsp.2013.6622835 25. Han J, Moraga C (1995) The influence of the sigmoid function parameters on the speed of backpropagation learning. In: Lecture notes in computer science, pp 195–201. https://doi.org/ 10.1007/3-540-59497-3_175
An Influential User Prediction in Social Network Using Centrality Measures and Deep Learning Method P. Jothi and R. Padmapriya
Abstract The widespread use of online social networks (OSNs) and the often-increasing volume of information provided by their users have motivated both corporate and scientific researchers to investigate how certain systems can be manipulated. According to recent findings, monitoring and evaluating the influence of OSN users has significant applications in health, economics, education, politics, entertainment, and other fields. The propagation model has an impact on a centrality measure's capacity to reflect a node's ability to disseminate influence; under certain modeling techniques, a centrality measure that performs well on wandering and unweighted networks may produce low performance. To improve prediction performance, new centrality measures and combined centrality measures are proposed, employing linear combinations of centrality metrics. Keywords Online social networks · Centrality measures · CNN · CPPNP · TDSIP
P. Jothi (B) · R. Padmapriya School of Computer Studies, Rathnavel Subaramaniam College of Arts and Science, Coimbatore, Tamilnadu, India e-mail: [email protected] R. Padmapriya e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. Khanna et al. (eds.), Proceedings of Data Analytics and Management, Lecture Notes in Networks and Systems 572, https://doi.org/10.1007/978-981-19-7615-5_66

1 Introduction

Online social networks have turned out to be one of the most effective and efficient communication platforms for providing information, perspectives, and thoughts, along with promoting events and organizations. Users of OSNs publish content and receive feedback (reviews, responses, and so on) from other users, who either accept or reject the material. Sometimes, users are able to capture the interest of a massive number of other users when they publish their work [1]. The fact that their postings generate a lot of comments and feedback, or that they are frequently reposted, providing them access to a significant number of additional individuals, suggests that they are getting a lot of attention. Users who have
the capability to catch the attention of a huge number of individuals are known as influencers. The problem of recognizing and predicting the individuals who are influential in an OSN’s is essential since it offers an inclusive range of prospects in a variety of fields, including politics and economics. In fact, measuring impact provides for the capture of real-world user attributes, which are important for both analyzing and explaining the system’s development [2] delivering practical answers to significant real-world challenges, for instance locating audiences and creative who have an impact on consumers’ tastes and preferences [3], recognizing brand promoters [4], determining the value of users in microblogs or games [5], recognizing travel bloggers who have an impact on tourism destinations [6], influencing political viewpoints [7], and having an impact on the social and healthcare areas of life [8]. The extensive usage of online social network and the variety of user-generated content are the primary elements that draw scholars and businesses to the study of the influencer’s phenomena. Several businesses, for example, rely on OSN’s to capitalize on the reach of their goods and products. The most common method of implementing a market policy is to identify a group of micro influencers [1], or users who are appropriate to a certain group and have the power to affect the greatest number of current members in terms of marketing. The development of a new measure that captures specific elements of influence is frequently the focus of analysis of influence between OSN users. In the scientific literature, there are a range of metrics for evaluating user influence [9]. However, acquiring a full view of the users’ data (e.g., posts, images, comments, emotions, retweets, etc.) in numerous real OSNs, such as Facebook, turns out to be quite complicated because the data which can be accessed is prohibited by both the OSNs’ API limits and the users’ privacy policies. As a result, during the preceding decade, most of the research in the zone of social impact concentrated on creating methods to estimate the influence of Twitter users [10–12]. Instead, research on the influence of Facebook has concentrated on identifying influential user-created information such as posts or images, influential pages, or prominent users of Facebook’s services. The creation of user communities or social groupings is another primary impact of the online social network model. Communities are very common in today’s social network, and most of them permit users to generate groups in order to make sharing information with other members of those groups easier. The flow of information generated by certain organizations may have a noteworthy influence on acquiring the member’s power [9]. The communications between group followers are particularly well represented using a temporal network, which encapsulates the system’s efficient communication trends and user behavior. As a result, this research provides a methodology for determining the most prominent members of communities (groups) who might play a pivotal role in existing online social network.
2 Literature Survey

Zhang et al. [13] developed a new paradigm for the least-cost influence problem that operates across several networks. Lossless and lossy coupling techniques are employed to enhance the system's accuracy [14]: the lossless method preserves the complete original quality of the network, while interesting approximate solutions are developed for lossy coupling. Furthermore, the system supports heterogeneous propagation models for any network, including nonlinear ones. Nemanja Spasojevic examined the issue of determining the optimal moment for a user to upload a piece of material so as to maximize the chance of getting feedback from the audience [15]. Several classifiers were analyzed that take into account the number of reactions a post or tweet receives within a given period of time after it is published. Studies on Twitter and Facebook demonstrate that the when-to-post problem can be readily handled by taking into account (a) the number of comments made by a user's friends after the post was published and (b) a friend's predisposition to react to the user's postings. Gong et al. [16] introduced a new memetic algorithm that exploits social network communities for the influence maximization problem. To strengthen the accuracy of the method, this algorithm performs two layers of optimization: population initialization and information retrieval similarity. The model accounts for influence overlap, but it cannot handle the enormous amount of collected data in a timely manner. Wang et al. [17] suggested a novel solution to the Distance-Aware Influence Maximization (DAIM) challenge. The distance between a determined group of users and their location prompts is considered in this research, and the identification of an influential group of users differs according to the recommendations [18]. Index-based techniques such as Maximum Influence In(out)-Arborescence (MIA) and Reverse Influence Sampling (RIS) are utilized for this purpose, and minimal sampling, a type of unbiased sampling, is used to manage the DAIM problem and considerably reduce the search space. Tong et al. [19] established a model that uses an adaptive seed selection strategy together with a greedy algorithm to choose prominent users. Seed users in this approach fluctuate periodically depending on the type of social network, and the system's extensibility is handled via greedy and adaptive approaches [20]. The heuristic greedy technique is, however, not suited for handling the dynamic independent cascade in this scenario. For community discovery, Hu et al. [21] proposed combining closeness centrality with signal transmission. The final rank of proximity centrality and the similarity between nodes, estimated using signal transmission, are used to select a center point for community formation [21]. Finally, with an iteratively updated community center node, small groups of communities are dynamically joined to produce the resulting community. Wang et al. [17] proposed a technique for detecting influential nodes on Twitter based on structural centrality. With a weighted method and local search procedure,
structural centrality is used to find the network's center nodes for community creation [22], and favorable outcomes are obtained at the conclusion of the overlapping community detection. To reduce running-time complexity in community detection, Rani et al. [23] employed a Label Propagation Algorithm (LPA) with influence centrality. The LPA is a graph-based, semi-supervised learning technique that works quickly. In the absence of influence centrality, however, the LPA performs poorly in some situations, resulting in either one massively large community or no usable community at all [24]. LPA efficiency is enhanced using a hybrid method based on influential centrality. This technique incorporates criteria for node selection in the community-building process, such as node and relative centrality, along with modularity. Partition, fast unfolding, and agglomerative algorithms are commonly used in undirected graph-based community discovery [25]. However, due to the directional aspect of the network topology, the partition technique produces better results than the other two.
3 Methodology

In this research work, the best-performing centrality measures are considered. In addition to the individual centrality measures, two measures at a time are linearly combined; this yields four new combined centrality measures.
3.1 Data Collection

This step is concerned with data collection and guarantees that up-to-date information on the people within a specific social group is acquired from the OSN. The target social group for influence prediction is chosen based either on target-marketing considerations or on the domain of interest. Once the target social group is identified, collecting all essential data about the group's activities (e.g., likes, comments, responses, creation time) is a critical problem that often requires the use of crawlers or connected apps that leverage the online social network API. The first data retrieval method discussed is to create a crawler [10, 21], an HTTP application that, given a social group, periodically explores it on behalf of a specific user by gathering information about the items published by the members. The second technique makes use of the APIs provided by the online social network architectures to construct applications that collect data from groups. In this case, OSN users must authorize such applications to acquire their data by supplying an API key that is only valid for a short time [14]. Whatever approach is used to gather social group data, it is critical to have current data in order to avoid staleness issues.
In detail, if the data collection step gives outdated knowledge of the social group, the influencer prediction may produce inaccurate findings. Changes in social groups occur regularly due to the dynamic character of these systems: members constantly generate new interactions through the publication of new comments, new posts, new replies, and so on. Moreover, the administrators of a social group have complete control over who joins the group and who leaves. As a result, the data collection phase should be repeated on a regular basis, with a suitable sampling frequency defined. Facebook is quickly turning into a necessity for people who care about more than just their family and circle of friends, and communities thrive when people are interconnected and share information. The collection used here consists of four csv files containing information from five unofficial Facebook groups: Cheltenham township, Elkins park occurrences, Free speech zone, Cheltenham lateral solutions, and Cheltenham township people. The main posts are in post.csv, and a brief glance at this file is instructive; the commas and apostrophes in the message field have been replaced with (COMMA) and (APOST), respectively. It contains the following attributes: pid (main post id), gid (group id, one of the five Facebook groups), id (id of the publishing user), name (user's name), timestamp, shares, url, msg (text of the posted message), and likes (number of likes). The comments on the main posts are included in comment.csv; Facebook posts receive feedback, including feedback on feedback. It has eight attributes: pid (matching the primary post id in post.csv), cid (comment id), gid (group id), timestamp, the commenting user's name and id, the id of the user whose comment is being responded to, and the message text. Likes and reactions are contained in like.csv. Its two keys, pid and cid, link reactions to posts and comments, respectively; it also stores gid (group id) and identifies the responses to posts and comments (such as Like and Angry) together with the reacting user's name and id. Finally, the member file contains gid (group id), id (member id), name (member name), and url (the URL of the member's profile image, recorded when they change it).
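As an illustration of this preprocessing step, the following is a minimal Python sketch of how the csv files described above could be loaded and joined with pandas; the file names follow the description in the text, but the exact column names (for instance rid for the id of the comment being replied to) are assumptions that may differ in the actual dataset.

import pandas as pd

# Load the four csv files of the Facebook group dataset (assumed file/column names).
posts = pd.read_csv("post.csv")        # pid, gid, id, name, timestamp, shares, url, msg, likes
comments = pd.read_csv("comment.csv")  # pid, cid, gid, timestamp, name, id, rid, msg
likes = pd.read_csv("like.csv")        # gid, pid, cid, response, name, id
members = pd.read_csv("member.csv")    # gid, id, name, url

# Restore the escaped punctuation in the message text.
for df in (posts, comments):
    df["msg"] = (df["msg"].astype(str)
                 .str.replace("(COMMA)", ",", regex=False)
                 .str.replace("(APOST)", "'", regex=False))

# Attach each comment to the post it refers to (pid is the shared key).
comments_with_posts = comments.merge(posts, on="pid", suffixes=("_comment", "_post"))
print(comments_with_posts.head())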
3.2 Data Modeling

A Group Interaction Graph G(X, E) is a directed multigraph in which X is the collection of vertices (or nodes) denoting the participants of the group and E is the collection of edges indicating the events that occurred between the nodes of X, for example an edge d(x, y) ∈ E with x, y ∈ X. Comments on posts and replies to comments are examples of such events. Emotional reactions are not considered events in their own right, but rather as attributes of the events to which they refer. The label of an interaction is defined by a function F: E → L_E, where the list of possible labels is L_E = {Comment, Reply}. The graph G is directed, and the source and target of an interaction are determined by the order of the nodes within every edge d(x, y): each edge d(x, y) has source vertex a(d) = x and target vertex b(d) = y, in that order. The graph G can contain many parallel edges between the same two vertices, i.e., distinct edges d_1(v, w), d_2(v, w), ..., d_n(v, w) that all share the same source and target.
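The following is a minimal sketch, under the same assumed column names as above, of how the Group Interaction Graph could be built as a directed multigraph with networkx: one edge per Comment or Reply event, directed from the interacting user to the author of the post or comment, with reactions kept as edge attributes rather than as edges.

import networkx as nx
import pandas as pd

def build_group_interaction_graph(posts, comments):
    G = nx.MultiDiGraph()
    post_author = dict(zip(posts["pid"], posts["id"]))           # post id -> author id
    comment_author = dict(zip(comments["cid"], comments["id"]))  # comment id -> author id
    for _, c in comments.iterrows():
        source = c["id"]
        if pd.notna(c.get("rid")):            # reply to another comment (assumed 'rid' column)
            target, label = comment_author.get(c["rid"]), "Reply"
        else:                                 # comment on the main post
            target, label = post_author.get(c["pid"]), "Comment"
        if target is not None and source != target:
            G.add_edge(source, target, label=label, timestamp=c["timestamp"])
    return G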
3.3 Data Transformation (DT)

The goal of the DT phase is to create a variety of significant metrics that can be used to assess the influence of group members. The selected metrics are then computed to determine the position of members at different time intervals (t_1, t_2, ..., t_s) and can be used to predict a member's influence.

Given a temporal network T_{t_1}, T_{t_2}, T_{t_3}, ..., T_{t_s} of a group (where t_1 < t_2 < t_3 < ... < t_s) and a generic Group Interaction Graph G_{t_i} = (V_{t_i}, E_{t_i}) of the temporal network (i.e., t_i ∈ {t_1, ..., t_s}) representing the group in the time interval [t_i, t_i + Δ), the time-aware centrality metrics task computes for each member w ∈ V_{t_i} the vector of metrics M_w^{t_i} = [m_1, m_2, ..., m_N], where N is the number of metrics and the element m_j of the vector (for j = 1, 2, ..., N) represents the value of metric j in the time interval [t_i, t_i + Δ) being considered. A variety of centrality metrics were employed in this study, including reactions, re-shares, node degree centrality, edge degree centrality, interaction rates, activity rates, h-indices, closeness centrality, betweenness centrality, PageRank, and eigenvector centrality; these factors are already taken into account in existing work on predicting influential users. Out-degree, out-strength, approximate betweenness, approximate outbound closeness, approximate inbound closeness, and Katz centrality are considered as additional measures.

Edge Degree Centrality (EDC): It counts all the links that flow through a specific node. Two distinct variations of the edge degree centrality are considered, one for each direction of the edges, since we assume a generic Group Interaction Graph G_{t_i} = (V_{t_i}, E_{t_i}) of the dynamic network. The number of interactions initiated by node v toward the other nodes is measured by the edge out-degree centrality e_v^+. Similarly, the number of distinct edges in the graph linking a vertex w to the vertex v defines the edge in-degree centrality e_v^- of a node v [26]. Consequently, measuring the sizes of the following sets yields the out-degree and in-degree of a node v:

e_v^+ = \{\, d(v, w) \in E_{t_i} \mid w \in V_{t_i} \,\}    (1)

e_v^- = \{\, d(w, v) \in E_{t_i} \mid w \in V_{t_i} \,\}    (2)
Node Degree Centrality (NDC): It determines the number of persons who interacted with a specific user v ∈ V_{t_i}. As for the edge degree centrality, two variants of the node degree centrality are analyzed, one for each direction of the edges. The node out-degree centrality n_v^+ measures the number of distinct users that received one or more interactions from node v in the network; more formally, it considers the set of nodes connected by outgoing edges incident to v. Conversely, the node in-degree centrality n_v^- of v is the number of distinct users that initiated one or more interactions with the vertex v. Given a generic Group Interaction Graph of a dynamic network, the sizes of the following sets determine the node out-degree and node in-degree of a node v:

n_v^+ = \{\, w \in V_{t_i} \mid d(v, w) \in E_{t_i} \,\}    (3)

n_v^- = \{\, w \in V_{t_i} \mid d(w, v) \in E_{t_i} \,\}    (4)
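As a sketch of Eqs. (1)-(4), these degree-based measures can be read directly off the multidigraph built above: the edge in/out-degree counts parallel edges separately, while the node in/out-degree counts distinct neighbours.

def edge_out_degree(G, v):
    return G.out_degree(v)                 # parallel edges counted separately (Eq. 1)

def edge_in_degree(G, v):
    return G.in_degree(v)                  # Eq. 2

def node_out_degree(G, v):
    return len(set(G.successors(v)))       # distinct users reached by v (Eq. 3)

def node_in_degree(G, v):
    return len(set(G.predecessors(v)))     # distinct users who reached v (Eq. 4)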
Interaction Rate (IR): It determines the typical number of interactions that user v has with each contact. Two interaction rate variants are considered, one for each edge direction [27]. The output interaction rate i_v^+ is the ratio of the number of distinct interactions launched by node v (i.e., |e_v^+|) to the number of distinct users who received these interactions (i.e., |n_v^+|). The input interaction rate i_v^- of node v is defined as the ratio of the number of distinct interactions received by v (i.e., |e_v^-|) to the number of distinct users that initiated these interactions (i.e., |n_v^-|). Given a generic Group Interaction Graph of the dynamic network, the output/input interaction rates of a node v are computed as:

i_v^+ = \begin{cases} |e_v^+| / |n_v^+| & \text{if } n_v^+ \neq \emptyset \\ 0 & \text{otherwise} \end{cases} \qquad i_v^- = \begin{cases} |e_v^-| / |n_v^-| & \text{if } n_v^- \neq \emptyset \\ 0 & \text{otherwise} \end{cases}    (5)
It is important to note that users who receive no inbound interactions have an input interaction rate of 0, whereas users who receive exactly one inbound interaction from each inbound neighbor have an input interaction rate of 1.

Rate of Activity (RA): It assesses the user's level of involvement using both inbound and outbound interactions. The activity rate a_v is the proportion of node v's interactions that are incoming. Given a generic Group Interaction Graph of the temporal network, the activity rate of a node v is calculated as:

a_v = \begin{cases} \dfrac{|e_v^-|}{|e_v^+| + |e_v^-|} & \text{if } |e_v^+| + |e_v^-| \neq 0 \\ 0 & \text{otherwise} \end{cases}    (6)

It is worth remembering that the activity rate lies in the range [0, 1]. In particular, if a user v has not received any ingoing interactions in the graph (i.e., e_v^- = ∅), the activity rate is 0, while if v has not performed any outgoing interactions (i.e., e_v^+ = ∅), the activity rate is 1. A user v is called a consumer when its activity rate is greater than 0.5, i.e., it has more ingoing than outgoing interactions [28]. Conversely, v is defined as a producer if it has more outgoing than ingoing interactions, i.e., its activity rate is less than 0.5.
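A minimal sketch of Eqs. (5) and (6), reusing the degree counts above, is shown below.

def interaction_rates(G, v):
    e_out, e_in = G.out_degree(v), G.in_degree(v)
    n_out, n_in = len(set(G.successors(v))), len(set(G.predecessors(v)))
    i_out = e_out / n_out if n_out else 0.0      # Eq. 5, output rate
    i_in = e_in / n_in if n_in else 0.0          # Eq. 5, input rate
    return i_out, i_in

def activity_rate(G, v):
    e_out, e_in = G.out_degree(v), G.in_degree(v)
    total = e_out + e_in
    return e_in / total if total else 0.0        # Eq. 6, lies in [0, 1]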
H-Index (HI): It is commonly used to assess a scientist's or scholar's output as well as the influence of articles in terms of citations; an index of h indicates that the author has published h articles, each of which has been cited at least h times in other works. The function H, applied to a finite sequence of reals A = (a_1, a_2, ..., a_i), returns the greatest integer h such that (a_1, a_2, ..., a_i) contains at least h elements, each of which is at least h. Given the set n_v^- of nodes who have interacted with v in G_{t_i} = (V_{t_i}, E_{t_i}), the h-index h_v of a node v computes, for each neighbor s ∈ n_v^-, the number of directed edges from s to v, i.e., |d(s, v)|, and applies the function H to the resulting sequence of interaction counts:

h_v = H\big( |d(s_1, v)|, |d(s_2, v)|, \ldots, |d(s_i, v)| \big), \quad s_1, s_2, \ldots, s_i \in n_v^-    (7)
Closeness Centrality (CC): The lengths of the shortest paths between a node and every other node in the graph are used to determine the node's overall importance. For a generic Group Interaction Graph G_{t_i} = (V_{t_i}, E_{t_i}) and a node v ∈ V_{t_i}, the closeness centrality c_v is calculated as:

c_v = \sum_{w \in V_{t_i}} \frac{1}{d(w, v)}    (8)

where the function d measures the distance between the nodes w and v, computed as the cost of the least expensive path obtained with Dijkstra's shortest-path algorithm. The standard definition of closeness centrality stipulates that multiple edges between nodes are disregarded and that all edge weights in the network equal 1. This concept was improved by using a transformation technique that sets the weight between two nodes to the reciprocal of the number of interactions between them. A node v with a high closeness centrality c_v may interact frequently with various group members, or may simply be close to all members of the group.
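The weighted variant of the closeness computation described above can be sketched as follows: parallel edges are collapsed into a weighted digraph whose edge weight is the reciprocal of the interaction count, and distances are obtained with Dijkstra's algorithm. This is a sketch under those assumptions, not the authors' exact implementation.

import networkx as nx

def collapse_to_weighted(G):
    """Collapse a MultiDiGraph into a DiGraph with weight = 1 / number of interactions."""
    H = nx.DiGraph()
    for u, v in set(G.edges()):
        H.add_edge(u, v, weight=1.0 / G.number_of_edges(u, v))
    return H

def closeness(G, v):
    H = collapse_to_weighted(G)
    # Distances d(w, v) from every other node w to v, as summed in Eq. (8).
    dist = nx.single_source_dijkstra_path_length(H.reverse(), v, weight="weight")
    return sum(1.0 / d for w, d in dist.items() if w != v and d > 0)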
Betweenness Centrality (BC): The betweenness centrality of a node measures how much that node helps information spread toward otherwise disconnected clusters of members. It is calculated as the proportion of shortest paths that pass through the node:

bwc_v = \sum_{s \neq v \neq p} \frac{sp_{s,p}(v)}{sp_{s,p}}    (9)
where sp_{s,p} denotes the total number of shortest paths connecting users s and p, and sp_{s,p}(v) is the number of those shortest paths that pass through the node v. A user v with a high betweenness centrality bwc_v serves as a bridge among group members who do not interact with each other directly. As for closeness centrality, the number of shortest paths between two nodes is determined with Dijkstra's shortest-path algorithm, with distances computed on the least expensive path.

PageRank (PR): The fundamental idea is that more significant pages are more likely to receive additional links from other pages. The algorithm models the probability that a user visits a certain page, given the probability of randomly choosing a link from the current page and the probability of jumping to another page picked at random from the whole network. PageRank adapts this stochastic process to the Group Interaction Graph model by defining a probability alpha of jumping to each vertex. If alpha is 1, all vertices are scored equally; if alpha is 0, the method reduces to eigenvector centrality on V_{t_i}. Because typical values of alpha lie in the range (0.1, 0.2), the parameter alpha is set to 0.15 here, although it can take any value between 0 and 1.

Eigenvector Centrality (EC): The eigenvector centrality of a node is influenced by the quantity and significance of its neighbors. It is usually calculated by adding the eigenvector centralities of the immediate neighbors, although it may also be approximated using iterative techniques based on random walks:

x_v = \frac{1}{\lambda} \sum_{t \in M(v)} x_t = \frac{1}{\lambda} \sum_{t \in G} a_{v,t} \, x_t    (10)

A x = \lambda x    (11)
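A minimal networkx sketch of the PageRank and eigenvector centrality computations is given below; note that the paper's alpha is the jump (teleport) probability of 0.15, whereas networkx's alpha argument is the damping factor, i.e. one minus the jump probability.

import networkx as nx

def pagerank_and_eigenvector(G, jump_probability=0.15):
    H = nx.DiGraph(G)                                  # collapse parallel edges
    pr = nx.pagerank(H, alpha=1.0 - jump_probability)  # random surfer with teleportation
    ev = nx.eigenvector_centrality_numpy(H)            # solves A x = lambda x (Eqs. 10-11)
    return pr, ev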
In-Degree and Out-Degree (ID and OD): In a directed network, degree centrality may be measured in two ways. The number of connections directed toward a node is its in-degree, while the number of connections directed from the node toward others is its out-degree. With A denoting the adjacency matrix, their equations are as follows:

d_v^- = \sum_{u \in V} A_{u,v}    (12)

d_v^+ = \sum_{u \in V} A_{v,u}    (13)
Strength Centrality (SC): The strength centrality S_u is simply a weighted variant of degree centrality: instead of using the neighborhood size, the weights of all edges incident to u are added together:

S_u = \sum_{j=1}^{N} a_{u,j} \, w_{u,j}    (14)
Outbound Closeness (OC): These measures consider the average distance between a node v and all other nodes it can reach (outbound centrality) or all other nodes that can reach v (inbound centrality), together with the sizes of both sets. The size of the outbound reachability set of v is

\vec{R}[v] = |\{\, u \in V \setminus \{v\} \mid v \rightsquigarrow u \,\}|    (15)

where v \rightsquigarrow u denotes that u may be reached from v. Similarly, the size of v's inbound reachability set is

\overleftarrow{R}[v] = |\{\, u \in V \setminus \{v\} \mid u \rightsquigarrow v \,\}|    (16)

The total distance to the outbound reachability set of v is then defined as

\vec{S}[v] = \sum_{u : v \rightsquigarrow u} d_{vu}    (17)

and the total distance to v's inbound reachability set as

\overleftarrow{S}[v] = \sum_{u : u \rightsquigarrow v} d_{uv}    (18)
The outbound and inbound closeness centralities are then obtained as the inverses of these ratios:

\frac{\vec{R}[v]}{\vec{S}[v]} \quad \text{and} \quad \frac{\overleftarrow{R}[v]}{\overleftarrow{S}[v]}    (19)

Katz Centrality (KC): Katz centrality is a metric of a node's relative importance in a network that considers both immediate neighbors and non-immediate nodes that are linked through those direct neighbors [29]. The Katz centrality of a node is calculated as follows:

C_{Katz}(v_i) = \alpha \sum_{j=1}^{n} A_{j,i} \, C_{Katz}(v_j) + \beta    (20)

where α is a damping factor, generally chosen smaller than the reciprocal of the greatest eigenvalue of A (i.e., α < 1/λ_max), and β is a bias constant, also known as the exogenous vector, which is used to avoid zero centrality values.

Based on the aforementioned insights for merging different measures into new ones, a linear combination of a local and a global centrality measure is suggested to combine both perspectives of a node. The resulting combined centrality measures have a strong ability to forecast the spread of node influence; in terms of correlation, monotonicity, and running time, these measures, including the extension of the state-of-the-art measure, perform noticeably better than the current model.
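As a concrete illustration, the sketch below computes Katz centrality (Eq. 20) with networkx and linearly combines it with a local measure (out-degree); the equal 0.5/0.5 weighting and the max-normalisation are assumptions for illustration, not the paper's exact combination.

import networkx as nx
import numpy as np

def combined_centrality(G, local_weight=0.5):
    H = nx.DiGraph(G)                                        # collapse parallel edges
    lam_max = max(abs(np.linalg.eigvals(nx.to_numpy_array(H))))
    katz = nx.katz_centrality_numpy(H, alpha=0.9 / max(lam_max, 1e-9), beta=1.0)  # global view
    out_deg = dict(H.out_degree())                           # local view

    def normalise(scores):
        top = max(scores.values()) or 1.0
        return {n: s / top for n, s in scores.items()}

    katz_n, deg_n = normalise(katz), normalise(out_deg)
    return {n: local_weight * deg_n[n] + (1 - local_weight) * katz_n[n] for n in H}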
3.4 Prediction and Training

In the training and prediction phase, the most important group members are identified and the extent of their influence is calculated using the transformed metrics produced by the data transformation step. To find the prediction model, a training set is typically exploited. The training set includes a number s of normalized samples C_u^{t_1}, C_u^{t_2}, ..., C_u^{t_s}, all of which are associated with a user u and are sorted by the time at which they occurred. The input parameter s denotes the number of observations of the user u on which the prediction task is performed, and each transformed observation C_u^{t_i} corresponds to a genuine observation M_u^{t_i}. Depending on the duration of the temporal network's segment interval, each evaluation is conducted at a distinct time granularity (hourly, daily, or weekly) in this framework. For example, if the temporal network's granularity is 1 day, one option is to build a training set for the user u that consists of observations conducted in the previous 8, 15, or 22 days. Instead, if the temporal network's granularity is 1 week, a suitable training set might be the latest 4, 8, or 12 weeks of observations. In order to study the influence of distinct time resolutions on the temporal network, different granularity values, such as daily or weekly, were used in the experiments. In addition, the training set size s was varied by evaluating different subsets of the available data. Each transformed observation of the training set C_u^{t_i} = [c_1, c_2, ..., c_N] is associated with a specific member u at a time t_i, and it consists of N values reflecting the transformed metrics produced by the data selection step. To derive an individual impact score of u from the transformed observation C_u^{t_i} = [c_1, c_2, ..., c_N], the contributions of the components are integrated as follows. If the number of components N of the observation is equal to 1, the observation's impact score is simply the value of that single component. Otherwise, if the number of components is greater than 1 (i.e., N > 1), the influence score is computed by adding the
component values. Formally, an individual's influence score I_{t_i}(u) at a given time t_i equals:

I_{t_i}(u) = \sum_{j=1}^{N} C_u^{t_i}[j]
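A minimal sketch of this scoring rule is shown below; the observation is assumed to be a 1-D array of the N normalised, transformed metrics.

import numpy as np

def influence_score(observation):
    """Influence score I_ti(u): the single component if N = 1, otherwise the sum."""
    c = np.asarray(observation, dtype=float)
    return float(c[0]) if c.size == 1 else float(c.sum())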
Prediction. Although numerous prediction algorithms may be used to identify the most influential members, each with its own set of advantages and weaknesses, this work concentrates on the Convolutional Neural Network (CNN) method, since it is an effective prediction model that has been successfully employed in the area of online social networks. In the convolutional layer, which is built up of a series of convolutional kernels, each neuron acts as a kernel. The convolution operation becomes a correlation function if the kernel is symmetric. A convolution operation can be expressed as:

f_l^k(x, y) = \sum_{c} \sum_{p, q} i_c(p, q) \cdot e_l^k(u, v)    (21)

where i_c(p, q) is an element of the input feature map I_c multiplied element-wise by e_l^k(u, v), the corresponding value of the k-th convolutional kernel k_l of the l-th layer. The output of the k-th convolutional operation may then be represented as:

F_l^k = [ f_l^k(1, 1), \ldots, f_l^k(x, y), \ldots, f_l^k(X, Y) ]    (22)
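Since the paper does not report the exact CNN architecture, the following Keras sketch is only a hypothetical illustration of a 1-D CNN over a user's sequence of s observations of N metrics, trained as a binary classifier (influential vs. not influential); all layer sizes are assumptions.

import tensorflow as tf

def build_cnn(sequence_length, n_metrics):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(sequence_length, n_metrics)),
        tf.keras.layers.Conv1D(32, kernel_size=3, padding="same", activation="relu"),
        tf.keras.layers.Conv1D(64, kernel_size=3, padding="same", activation="relu"),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy",
                           tf.keras.metrics.Precision(),
                           tf.keras.metrics.Recall()])
    return model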
4 Result and Discussion

Dataset Description. Facebook has greatly expanded its functionality by allowing users to create numerous sorts of groups with diverse criteria. According to the Facebook Help Center, users may establish three distinct sorts of groups. Depending on the group's privacy settings, all Facebook groups require member approval, either from the group's administrator or from a member of the group. Everyone can view the members of public groups and read the posts they produce; however, only the current members of the group can write a post, and an open group is open to anybody who wants to join. Closed groups have postings that can only be read and published by the active members of the group, and anyone can request to join a closed group. Finally, hidden groups can only be accessed if the group's administrator (or a member) has invited the user to join, and only the group's existing members can post and view the group's postings. A Facebook crawler program is used to collect the interactions that happened in a set of specified Facebook groups, with the intention of obtaining a large collection of diverse groups with varied features.
This work primarily focuses on public groups, since they may be searched by anybody and joined by becoming a member, whereas retrieving secret groups instead requires a request from an existing member. Facebook groups are classified into many categories such as sports, education, politics, news, and entertainment. The accuracy of the framework in identifying the influential members for various counts is shown in Fig. 1. The best prediction results are achieved when the framework uses 3000 counts; in this case, the maximum number of correctly predicted influencers equals 3 for Δ = 1 day and 5 for Δ = 1 week. Instead, the selection of 3 or more components does not improve the accuracy of the framework. Furthermore, the framework was able to predict at least 2 of the influential members for about 80% of the groups.

Predictive Performance. Performance in classifying data is measured using precision, recall, and accuracy. These metrics' computation formulas are
Precision = \frac{TP}{TP + FP}    (23)

Recall = \frac{TP}{TP + FN}    (24)

Fig. 1 Proposed architecture
Table 1 Evaluation metrics for training set

Prediction algorithm | Precision | Recall | Accuracy
CNN                  | 0.76      | 0.78   | 0.76
CPPNP                | 0.71      | 0.74   | 0.72
TDSIP                | 0.68      | 0.71   | 0.69

Fig. 2 Evaluation metrics for training set

Table 2 Evaluation metrics for testing set

Prediction algorithm | Precision | Recall | Accuracy
CNN                  | 0.79      | 0.81   | 0.79
CPPNP                | 0.73      | 0.74   | 0.74
TDSIP                | 0.71      | 0.72   | 0.71
Accuracy = \frac{TP + TN}{TP + TN + FP + FN}    (25)
The predictive performance of the approaches is quantified using the precision, recall, and accuracy metrics, and the proposed model is compared against Combined-Personalized Propagation Neural-Predictions (CPPNP) [30] and TDSIP [31]. Table 1 and Fig. 2 display the training set results, whereas Table 2 and Fig. 3 display the testing set results. The best CPPNP and TDSIP models perform worse than the Convolutional Neural Network (CNN) predictive model across all datasets.
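The three evaluation metrics of Eqs. (23)-(25) can be computed directly with scikit-learn, as in the toy sketch below (the labels shown are illustrative only, not results from the paper).

from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1]   # ground-truth "influential" labels (toy example)
y_pred = [1, 0, 1, 0, 0, 1]   # model predictions (toy example)

print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("Accuracy:", accuracy_score(y_true, y_pred))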
Fig. 3 Evaluation metrics for testing set

5 Conclusion

Influential users are especially significant in an online social network because they can capture the attention of a huge number of people and influence the users who follow them and reshare their information. As a result, appropriately identifying prominent users is critical in a variety of disciplines, including commercial promotion, public opinion monitoring, and so on. In this paper, motivated by the increased attention devoted to the challenge of identifying essential users in OSNs, we proposed a basic technique that allows the influencer prediction challenge to be handled using a CNN architecture. The framework was tested in a Facebook scenario by analyzing the interactions of roughly 10,000 individuals from 30 Facebook groups spanning a variety of categories (education, politics, sports, news, and entertainment). This research uncovered some surprising data on the influence of members. The results of the investigations reveal that, on average, our methodology is able to forecast the members who would end up on the list of future prominent members for all the groups by using an additional set of centrality measures.
References 1. Backaler J (2018) Business to consumer (B2C) influencer marketing landscape. In: Digital influence. Springer, pp 55–68 2. Topirceanu A, Udrescu M, Marculescu R (2018) Weighted betweenness preferential attachment: a new mechanism explaining social network formation and evolution. Sci Rep 8(1):10871 3. Pálovics R, Benczúr AA (2015) Temporal influence over the Last.fm social network. In: Proceedings of the IEEE/ACM international conference on advances in social network analysis and mining, vol 5, no 1 4. Majumdar A, Saha D, Dasgupta P (2015) An analytical method to identify social ambassadors for a mobile service provider’s brand page on Facebook. In: Applications and innovations in mobile computing (AIMoC), 2015. IEEE, pp 117–123 5. Ponchai W, Watanapa B, Suriyathumrongkul K (2015) Finding characteristics of influencer in social network using association rule mining. In: Proceedings of the 10th international conference on e-business (iNCEB2015) 6. Magno F, Cassia F (2018) The impact of social media influencers in tourism. Anatolia 2018:1–3 7. Weeks BE, Ardèvol-Abreu A, Gil de Zúñiga H (2017) Online influence? Social media use, opinion leadership, and political persuasion. Int J Pub Opin Res 29(2):214–239 8. Zhou J, Liu F, Zhou H (2018) Understanding health food messages on Twitter for health literacy promotion. Perspect Pub Health 138(3):173–179
9. Riquelme F, González-Cantergiani P (2016) Measuring user influence on Twitter: a survey. Inf Process Manag 52(5):949–975
10. Deborah A, Michela A, Anna C (2019) How to quantify social media influencers: an empirical application at the Teatro alla Scala. Heliyon 5(5):e01677
11. Jain S, Sinha A (2021) Identification of influential users on Twitter: a novel weighted correlated influence measure for Covid-19. Chaos Solitons Fractals 139:110037
12. Khan HU, Nasir S, Nasim K, Shabbir D, Mahmood A (2021) Twitter trends: a ranking algorithm analysis on real time data. Expert Syst Appl 164:113990
13. Zhang H, Nguyen DT, Das S, Zhang H, Thai MT (2016) Least cost influence maximization across multiple social networks. IEEE/ACM Trans Netw. https://doi.org/10.1109/TNET.2015.2394793
14. Romero DM, Galuba W, Asur S, Huberman BA (2011) Influence and passivity in social media. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 18–33
15. Bakshy E, Hofman JM, Mason WA, Watts DJ (2011) Everyone's an influencer: quantifying influence on twitter. In: Proceedings of the fourth ACM international conference on Web search and data mining. ACM, pp 65–74
16. Gong M, Song C, Duan C, Ma L (2016) An efficient memetic algorithm for influence maximization in social networks. IEEE Comput Intell Mag 11(3):22–33. https://doi.org/10.1109/MCI.2016.2572538
17. Wang N, Sun Q, Zhou Y, Shen S (2016) A study on influential user identification in online social networks. Chin J Electron. https://doi.org/10.1049/cje.2016.05.012
18. Weitzel L, Quaresma P, de Oliveira JPM (2012) Measuring node importance on twitter microblogging. In: Proceedings of the 2nd international conference on web intelligence, mining and semantics. ACM, no 11
19. Tong G, Wu W, Tang S, Du DZ (2017) Adaptive influence maximization in dynamic social networks. IEEE/ACM Trans Netw. https://doi.org/10.1109/TNET.2016.2563397
20. Yamaguchi Y, Takahashi T, Amagasa T, Kitagawa H (2010) Turank: Twitter user ranking based on user-tweet graph analysis. In: International conference on web information systems engineering. Springer, pp 240–253
21. Zhao Q, Lu H, Gan Z, Ma X (2015) A K-shell decomposition based algorithm for influence maximization. In: Cimiano P, Frasincar F, Houben GJ, Schwabe D (eds) Engineering the web in the big data era. ICWE 2015. Lecture notes in computer science, vol 9114. Springer
22. Zhao N, Bao J, Chen N (2020) Ranking influential nodes in complex networks with information entropy method. Complexity 2020(5903798):15
23. Rani R, Bhatia V (2017) An efficient influence based label propagation algorithm for clustering large graphs. In: INFOCOM. https://doi.org/10.1109/ICTUS.2017.8286044
24. Rezaie B, Zahedi M, Mashayekhi H (2020) Measuring time-sensitive user influence in Twitter. Knowl Inf Syst 62:3481–3508
25. De Salve A, Mori P, Guidi B, Ricci L, Pietro RD (2021) Predicting influential users in online social network groups. ACM Trans Knowl Discov Data 15(35):1–50
26. Khan A, Sohail A, Zahoora U, Qureshi AS (2020) A survey of the recent architectures of deep convolutional neural networks. Artif Intell Rev 1–62
27. Tashiro S, Nakamura Y, Matsuda K, Matsuoka M (2016) Application of convolutional neural network to prediction of temperature distribution in data centers. In: 2016 IEEE 9th international conference on cloud computing (CLOUD), pp 656–661. https://doi.org/10.1109/CLOUD.2016.0092
28. Thaduri A, Polepally V, Vodithala S (2021) Traffic accident prediction based on CNN model. In: 2021 5th International conference on intelligent computing and control systems (ICICCS), pp 1590–1594. https://doi.org/10.1109/ICICCS51141.2021.9432224
29. Kavitha D, Hebbar R, Vinod PV, Harsheetha MP, Jyothi L, Madhu SH (2018) CNN based technique for systematic classification of field photographs. In: 2018 International conference on design innovations for 3Cs compute communicate control (ICDI3C), pp 59–63. https://doi.org/10.1109/ICDI3C.2018.00021
30. Cuzzocrea A, Leung CK, Deng D, Mai JJ, Jiang F, Fadda E (2020) A combined deep-learning and transfer-learning approach for supporting social influence prediction. Procedia Comput Sci 177:170–177 31. Abu-Salih B, Chan KY, Al-Kadi O et al (2020) Time-aware domain-based social influence prediction. J Big Data 7:10. https://doi.org/10.1186/s40537-020-0283-3
An Approach to Enhance the Character Recognition Accuracy of Nepalese License Plates Pankaj Raj Dawadi, Manish Pokharel, and Bal Krishna Bal
Abstract A methodology is proposed in this study to reduce character recognition flaws in the extracted License Plate (LP) region. To achieve this, the LP region must first be cleaned, and all noise that does not correspond to LP characters is filtered out using various image processing techniques. The structure of the LP is then exploited, based on prior knowledge about the number of characters available in each vertical segment of the LP. Each positional LP character is determined to be either a letter or a digit and is expressed in terms of a vector of letters, digits, or unknowns. On the test dataset, the suggested technique segmented LP characters correctly with 92% accuracy for all types of LP structures, whereas labeling letter-digit pairs had a minimum accuracy of 90.5% for 1-row LPs. The proposed methodology applies only to Nepalese LPs; however, the approach can be adapted to work with LP datasets from other countries. Keywords Character segmentation · Letter-digit expression · Nepalese license plate · Image processing · Projection profiles
1 Introduction

Automatic License Plate Recognition (ALPR) makes extensive use of surveillance cameras. A camera sensor is mounted in the zone of interest to detect the vehicles on the road. The first stage in any ALPR system is to locate and extract the LP area, which is followed by precisely recovering the alphanumeric characters from the background.

P. R. Dawadi (B) · M. Pokharel · B. K. Bal
Department of Computer Science and Engineering, Kathmandu University, Dhulikhel, Kavre, Nepal
e-mail: [email protected]
M. Pokharel
e-mail: [email protected]
B. K. Bal
e-mail: [email protected]
Optical Character Recognition (OCR) technology is then employed to identify the characters. Any ALPR system comprises four main phases: (1) image acquisition, (2) LP detection, localization and extraction, (3) character segmentation, and (4) recognition of characters [1, 2]. The first two steps of ALPR involve detecting the vehicle and capturing its image; the LP is then detected in the image's likely regions. In the third phase, characters are separated from the background inside the detected zone. In the last stage, the characters are identified and then recognized using a character recognizer. The character segmentation approach, along with the prediction of positional characters, is covered in this paper. The positional character prediction step ensures an improvement in the accuracy of character recognition, which is discussed in the context of private (red) Nepalese LPs. The LP characters are merged into a letter vector or a digit vector and later fed into a character recognizer for character prediction.
1.1 Nepalese License Plate

In Nepal, the letter class represents the zonal, provincial, and load-type attributes, whereas the digit class represents the lot number, province number, plate status, and vehicle number. According to the LP structures listed in Table 1, there are three types of LPs, which are further classified into ten classes based on the number of characters seen in each row of the LP. The LP characters are written in a specific order using letter-digit combinations to express the various properties of Nepalese LPs for three-row, two-row, and one-row LPs. A province-based LP begins with the letter P and then continues with Province Number (PN), Plate Status (PS), Lot Number (LT), Load Type (LD), and Vehicle ID (VID). Similarly, zonal LPs begin with the Zone (Z), followed by the LT number, LD type, and finally the VID. The VID is a 1- to 4-digit vehicle number. Table 1 lists sample LPs as well as character interpretations based on the positional layout. Letters and digits make up the LP characters, with L designating a letter and D denoting a digit; for example, P is a letter, whereas PS belongs to the digit class. We propose a method to tackle noise suppression and character segmentation for non-standardized Nepalese LPs. A non-standardized LP is one that contains LP characters of varying sizes in multiple vertical segments, different fonts with varying widths used to encode LP characters, unequal gaps or tightly written characters between two successive rows, or unequal gaps or tightly written LP characters within the same vertical segment. The background of ALPR systems and the approaches employed are presented in Sect. 2. Section 3 describes the proposed methodology. The experiments are presented in Sect. 4. Section 5 wraps up this study with a summary of the main points and conclusions.
Table 1 LP characteristics based on plate structure, color, and fonts (LP sample images omitted)

Type  | Class | LP characters arrangement                                | Letter-digit pattern
3-row | 1     | First row: P PN PS; Second row: LT LD; Third row: VID   | First row: L D DD; Second row: DDD L; Third row: D or DD or DDD or DDDD
2-row | 2     | First row: P PN PS; Second row: LT LD VID                | First row: L D DD; Second row: DDD L (D or DD or DDD or DDDD)
2-row | 3     | First row: P PN PS LT LD; Second row: VID                | First row: L D DD DDD L; Second row: D or DD or DDD or DDDD
2-row | 4     | First row: P PN; Second row: LT LD VID                   | First row: L D; Second row: DDD L (D or DD or DDD or DDDD)
2-row | 5     | First row: Z LT LD; Second row: VID                      | First row: L D L; Second row: D or DD or DDD or DDDD
2-row | 6     | First row: Z LT LD; Second row: VID                      | First row: L DD L; Second row: D or DD or DDD or DDDD
1-row | 7     | Z LT LD VID                                              | L DD L (D or DD or DDD or DDDD)
1-row | 8     | Z LT LD VID                                              | L D L (D or DD or DDD or DDDD)
1-row | 9     | Z LT LD VID                                              | L DD L DDDD
1-row | 10    | P PN PS LT LD VID                                        | L D DD DDD L (D or DD or DDD or DDDD)
2 Background and Literature Review The scientific community has recently become quite interested in ALPR systems and proposed various approaches to tackle ALPR problems. Most of the research
on ALPR focuses on the study of standardized LPs, leaving aside the cases of non-standardized LPs. In this section, we primarily examine noise filtering and character segmentation approaches for standardized and non-standardized LPs. The authors in [3] proposed an approach for character segmentation that utilizes the horizontal and vertical projection profiles; the characters acquired from an LP were converted into vectors, and their method has a recognition accuracy of 76.00%. In [4], an ALPR system is proposed that is based on long short-term memory and employs Tesseract 4.0 as the character recognition engine. Its main goal is to improve real-time performance, and a confusion matrix was used to measure the performance of the character recognizer; according to the authors, their proposed method achieves up to 83.00% accuracy. Silva et al. [5] described a supervised-classification-based technique for recognizing characters; the authors exploit sequences of pixel behavior as a model to characterize the LP's character classes, and the pixel patterns were established with the primary purpose of improving real-time performance. An integrated segmentation strategy to separate the attributes, which utilizes the Harris corner detection method along with a connected-component-based method, is proposed in [6]; it combines the analysis of connected components with other features such as pixel counts, aspect ratio, and character height. A vertical-projection-based strategy was used by the authors in [7] to segment characters, in which an LP image is first projected onto the horizontal axis. The total number of black pixels in each column is then computed; because there are gaps between the characters, the histogram's value in these gaps is zero. Peaks where the histogram value lies below a threshold are filtered out, and only peaks above the threshold are allowed into the next stage as prominent LP characters. In [8], the authors cropped the probable LP region with bicubic interpolation before segmenting it with sliding concentric windows. In [9], the author takes advantage of a vertical histogram together with a horizontal-histogram-based method to separate the characters; the suggested method uses column sum vectors to locate character boundaries, and two clusters of neighboring characters are formed as a result. A character segmentation approach is devised in [10] that encompasses height and width estimation of the available characters along with character blob extraction. The character height is determined using color reversal, vertical edge detection, and a horizontal histogram projection, while the edges of the LP characters are examined jointly with the help of a Sobel mask and an image binarization technique. To appropriately segment the characters, the authors in [11] implemented image enhancement along with horizontal and vertical patch correction. For character segmentation, the authors in [12] used a three-step procedure: the horizontal, vertical, and compound skews are rectified in the first stage; auxiliary lines are formed between the first and last characters to detect the related boundaries in the middle stage; and the noise is eliminated and the characters are segmented in the last phase.
Kocer and Cevik [13] employed techniques such as contrast expansion to sharpen the image, a median filter to suppress noisy zones, and blob coloring on closed regions of a binary mask to segment the characters. The overall character recognition rate is 95.36%, with a segmentation accuracy of 98.82%. To segment the characters, Shapiro and Dimov
[14] employed the character clipper approach, which includes a feature extractor, classifier, post-processor, and character training. On the first LP dataset with 1000 photos, their suggested system had an accuracy of 85.2%, while on the second LP dataset with 400 images it reached 93.2% accuracy. Zhu et al. [15] proposed a method for segmenting the characters: they began by using vertical projection to dynamically adjust the width of the LP characters in order to obtain a standard width, and the presented method is claimed to be 93.50% accurate overall. A character segmentation approach proposed in [16] is based on pixel counts; an image histogram is created using vertical projection, the transition from a peak to its corresponding valley is used, and a threshold is set to segment the characters. Li and Chen [17] demonstrated a character segmentation process in which a binary LP image is interpreted as a matrix; the row vector is calculated as the sum of each column using a sum function, and the matrix is scanned along the horizontal axis in a loop. The region is separated and considered disjoint when the sum of the former column is less than one while the sum of the latter column is greater than one. The recognition accuracy is reported to be around 80%, while the segmentation accuracy is found to be 95.00%. To the best of our knowledge, only two works are available in the context of Nepalese LPs. The authors in [18] proposed an end-to-end ALPR system in which the LP is color-masked in HSV space and a candidate LP region is determined using the aspect ratio and a profile test over several detected contours. De-skewing and projection profiles are used later in post-processing, and the characters are learned and predicted using an HoG-SVM combination. According to the authors, the experiments were done on a very limited dataset with 75% LP character recognition accuracy. Two deep learning algorithms were used by the researchers in [19]: one to detect the LP and the other to predict the characters. According to the authors, their character segmentation approach comprises a CLAHE filter, projection profiles, and noise filtering based on the concept of denoising. In their proposed method, the authors have only addressed standardized LPs (2-row and 1-row), leaving the problem of non-standardized LPs unaddressed. The available literature cannot be applied directly in the context of Nepalese LPs, because the majority of the acquired LPs are non-standardized and the methods to clean and segment the LP have to be amended in several phases. This stage is critical for improving the accuracy of character recognition.
3 Proposed Method

Only the noise filtering and character segmentation processes are covered in this section, as we are concentrating on enhancing the character recognition stage; the phases of LP detection, classification, and character recognition have been purposely left out here. The following are some of the characteristics of this research: (1) the study's scope includes both non-standardized and standardized private LPs for Province and Bagmati Zone vehicles, as shown in Table 1; (2) a variety of filters are used to improve character segmentation, each of which is important and complements the
other filters in the pipeline. Although these processes are adequately addressed here, the existing research does not go into great detail about them.
3.1 Noise Filtering and Plate Region Segmentation

The noise filtering stage for the LP comprises removing the background and determining the number of vertical segments. To remove the background color from the vertically segmented regions, each region is processed again with the same filter, with minor or major changes, during character blob analysis. To increase recognition accuracy, the characters are segmented, the number of characters is approximated, and the characters are appended to the appropriate class. Figure 1 depicts the proposed method for noise filtering and plate region segmentation. To correct low light in the RGB LP, we first apply an image filter that examines the expected global average intensity of the luminance component in the YCbCr color space and thresholds the image to determine whether it is bright or dark. After that, the thresholded image is sent to a gamma correction filter, which corrects the brightness of the captured LP while keeping the chrominance components intact; the image changes from black toward white as the gamma value rises. The result is an RGB color space image with low-light correction. The Saturation (S) and Value (V) channels of the HSV-transformed RGB image are then fed through a Contrast Limited Adaptive Histogram Equalization (CLAHE) filter [20] in the following stage. Other adaptive histogram equalization techniques usually fail to limit noise amplification in uniform areas, whereas CLAHE redistributes contrast in each channel by constructing many histograms, each corresponding to a different part of the image. To minimize noise, the S and V images are thresholded and then bitwise-ANDed. Thereafter, the image is fed to a Connected-Component Analysis (CCA) [21] technique, and only the regions that pass the conditions are permitted to advance to the next stage. For
Fig. 1 Noise filtering and plate region segmentation
this, the width, height, aspect ratio, and area of the positional LP characters are taken into account, and regions with an area larger or smaller than the thresholds are filtered out at this step. In the next stage, the Horizontal Projection Profile (HPP) of the binary LP is calculated. The HPP is the projection profile of the image along the horizontal axis, calculated for each row as the sum of all column pixel values within that row. The image is then thresholded depending on the expected number of pixels in each row according to the LP structure; the threshold value is calculated by dividing the sum of the minimum and maximum number of pixels per row by a constant. Pixel values in rows that do not meet the threshold are suppressed and turned into black pixels. By establishing a boundary between two successive rows, this step aids in partitioning the original RGB LP into numerous vertical segments based on the LP structure. The threshold value aids in the rough segmentation of two vertical segments; however, for some characters it destroys the edge area. For example, the top edge region of the Devanagari character for Pradesh (Province) in the first segment of a 3-row LP is lost due to thresholding rows based on pixel sums. To prevent this, we process the original RGB LP again by passing it to another CLAHE filter. For consistency, each vertically segmented region is scaled to an RGB image of 120 by 40 pixels. Each extracted vertical segment is converted again to HSV, and only the S and V channels are extracted. The S and V channels are processed by another CLAHE filter to reduce the effect of noise in limited homogeneous areas. The thresholded S and V channels are bitwise-ANDed again, and CCA is applied to generate the binary mask of each vertical segment; the edge areas of the characters are protected in this stage. A Vertical Projection Profile (VPP) is computed in the next stage along the vertical axis: for each column, the sum of all row pixel values within that column. Based on the size of the positional character, the image is then thresholded on the predicted number of pixels in each column. As a result, valleys and peaks can be seen between two successive characters: each peak represents an LP character, while each valley reflects the gap between two horizontally segmented characters. In a later stage, all the LP characters from the noise-filtered vertical segments are added to a single vector S, which is used to identify each character's class based on its position. Figure 2 shows the overall noise filtering pipeline for a 3-row LP structure.
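To make the pipeline concrete, the following OpenCV sketch reproduces its main steps (gamma correction, CLAHE on the S and V channels, thresholding, bitwise AND, connected-component filtering, and the projection profiles); the numeric thresholds and size constraints are illustrative assumptions, not the values used by the authors.

import cv2
import numpy as np

def filter_plate(bgr):
    # Low-light correction driven by the mean of the luminance (Y) channel.
    y = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)[:, :, 0]
    gamma = 0.6 if y.mean() < 100 else 1.0
    lut = np.array([255 * (i / 255.0) ** gamma for i in range(256)], dtype=np.uint8)
    corrected = cv2.LUT(bgr, lut)

    # CLAHE on the S and V channels, Otsu thresholding, then bitwise AND.
    hsv = cv2.cvtColor(corrected, cv2.COLOR_BGR2HSV)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    _, s_bin = cv2.threshold(clahe.apply(hsv[:, :, 1]), 0, 255,
                             cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    _, v_bin = cv2.threshold(clahe.apply(hsv[:, :, 2]), 0, 255,
                             cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    mask = cv2.bitwise_and(s_bin, v_bin)

    # Connected-component analysis: keep only blobs whose size/shape looks like a character.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    clean = np.zeros_like(mask)
    for i in range(1, n):
        w = stats[i, cv2.CC_STAT_WIDTH]
        h = stats[i, cv2.CC_STAT_HEIGHT]
        area = stats[i, cv2.CC_STAT_AREA]
        if 20 < area < 2000 and 0.1 < w / float(h) < 2.0:
            clean[labels == i] = 255
    return clean

def horizontal_projection(mask):
    return mask.sum(axis=1)   # row sums (HPP) used to split vertical segments

def vertical_projection(mask):
    return mask.sum(axis=0)   # column sums (VPP) used to split characters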
3.2 Positional Decision of LP Characters

The failure of the Character Recognizer Model (CRM) to predict the right label for each LP character has a substantial impact on overall performance. To recognize alphanumeric characters, most OCR-based character recognition experiments use a single CRM. The CRM's accuracy suffers because it must predict characters for many classes, and in the context of Nepalese LPs many characters are included in the letter class to express diverse characteristics. Consider, for example, the letter-digit pair (PA-1): 1 belongs to the digit class, whereas PA belongs to the letter class. Because these two characters are written similarly on the LP, if they are fed into a single CRM for prediction,
Fig. 2 Proposed pipeline for noise filtering and character segmentation
they may be misclassified. Separate models for the letter and digit classes can be used to overcome this challenge, which enhances recognition accuracy. In the proposed approach, the LP characters are segmented one by one from all rows and appended to the corresponding image vectors based on prior information about the position of the LP characters. The algorithm for character segmentation is illustrated in Table 2. According to Table 2, a 3-row LP structure has three horizontal segments H and is classified as a Class 1 problem; a 2-row LP structure has multiple variations and is handled by checking the number of characters in the top segment H1 and the total number of characters in a vector S; and a 1-row LP structure has unpredictable character positions and is handled by introducing a U image vector. Because the third and fourth characters in a one-row LP can be either a letter or a digit, the CRM encounters letter-digit ambiguity; the Class 7, 8, and 9 examples of 1-row LPs reveal this unusual character behavior. Accordingly, the proposed approach for the ALPR system's third and fourth stages has been modified as illustrated in Fig. 3.
4 Result and Discussion

Two separate experiments were carried out to assess the proposed system's adaptability. The first test was undertaken to see how well the proposed solution reduces LP noise and correctly separates the character blobs from the plate region. The second experiment was carried out to determine the accuracy of the suggested approach in producing the letter-digit expressions of the LP. The letter-digit expression cannot be created successfully if the blobs are not accurately retrieved.
Table 2 Algorithm for positional character decision

Input: A binarized LP mask
Initially L = D = U = S = H1 = empty image vectors
find horizontal_segment(H) in LP
if (H = 3):
    declare 3-row LP
    assign Class = 1            /* province 3-row */
    for each H:
        find each vertical_segment V as a character and append all V in a 2-dimensional image vector S
    for each V in S:
        resize V into 32 * 32 size
        if position is 1 or 7:
            append V in a letter vector L
        else:
            append V in a digit vector D
if (H = 2):
    declare 2-row LP
    find vertical segments V1 in first horizontal segment H1
    for each H:
        find each vertical_segment V as a character and append all V in a 2-dimensional image vector S
    for each V in S:
        resize V into 32 * 32
        if length(S) <= 8:
            declare zonal LP
            if length(H1) = 3:
                assign Class = 5    /* zonal 2-row with 3 characters in first row */
                if position is 1 or 3:
                    append V in a letter vector L
                else:
                    append V in a digit vector D
            if length(H1) = 4:
                assign Class = 6    /* zonal 2-row with 4 characters in first row */
                if position is 1 or 4:
                    append V in a letter vector L
                else:
                    append V in a digit vector D
        else if length(S) > 8:
            declare province LP
            if length(H1) = 4:
                assign Class = 2    /* province 2-row with 4 characters in first row */
                if position is 1 or 8:
                    append V in a letter vector L
                else:
                    append V in a digit vector D
            else if length(H1) = 8:
                assign Class = 3    /* province 2-row with 8 characters in first row */
                if position is 1 or 8:
                    append V in a letter vector L
                else:
(continued)
Table 2 (continued)

                    append V in a digit vector D
            else if length(H1) = 2:
                assign Class = 4    /* province 2-row with 2 characters in first row */
                if position is 1 or 5:
                    append V in a letter vector L
                else:
                    append V in a digit vector D
if (H = 1):
    declare 1-row LP
    find vertical_segment V as a character and append all V in a 2-dimensional image vector S
    for each V in S:
        resize V into 32 * 32
        if length(S)