Lecture Notes in Networks and Systems 587
Subarna Shakya Valentina Emilia Balas Wang Haoxiang Editors
Proceedings of Third International Conference on Sustainable Expert Systems ICSES 2022
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems.

The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and others.

Of particular value to both the contributors and the readership are the short publication timeframe and the worldwide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them.

Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.

For proposals from Asia please contact Aninda Bose ([email protected]).
Editors Subarna Shakya Department of Electronics and Communication Engineering Pulchowk Campus, Institute of Engineering Tribhuvan University Lalitpur, Nepal
Valentina Emilia Balas Automation and Applied Informatics Aurel Vlaicu University of Arad Arad, Romania
Wang Haoxiang Go Perception Laboratory Cornell University Ithaca, NY, USA
ISSN 2367-3370 ISSN 2367-3389 (electronic)
Lecture Notes in Networks and Systems
ISBN 978-981-19-7873-9 ISBN 978-981-19-7874-6 (eBook)
https://doi.org/10.1007/978-981-19-7874-6

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
We are honored to dedicate this book to all the technical program committee members, editors and authors of ICSES 2022.
Preface
This volume includes the papers presented at the 3rd International Conference on Sustainable Expert Systems (ICSES 2022), organized by Tribhuvan University (TU), Nepal, with a primary focus on research related to Artificial Intelligence (AI), sustainability, and expert systems as applied across industry, government, and academia worldwide. ICSES 2022 provided an opportunity for researchers to present their results and examine advanced applications in the AI and expert systems field. Moreover, this proceedings promotes novel techniques that extend the frontiers of this fascinating research field. Advancement in both the sustainability and intelligent systems disciplines requires an exchange of thoughts and ideas among audiences from different parts of the world. The 2022 International Conference on Sustainable Expert Systems was designed to encourage the advancement and application of sustainability and artificial intelligence models in existing computing systems. ICSES 2022 received a total of 327 manuscripts from various countries. Each submission was peer-reviewed by at least two reviewers drawn from the technical conference committee, external reviewers, and the editorial board, depending on the research domain of the paper. Through this rigorous peer-review process, papers were selected on the basis of research novelty, clarity, and significance; 76 papers were accepted for publication in the ICSES 2022 proceedings. They cover several significant thematic areas, namely data science, wireless communication, intelligent systems, social media, and image processing.
To conclude, this proceedings documents the research synergy that already exists between the intelligent expert systems and network-enabled communities, and represents a framework from which new research interactions will emerge in the near future. I look forward to seeing this conference theme evolve over time, to learning more about network sustainability, and to continuing to pave the way for applications of network sustainability in emerging intelligent expert systems.

Lalitpur, Nepal
Prof. Dr. Subarna Shakya
Contents

Design, Development, and Implementation of Software Engineering Virtual Laboratory: A Boon to Computer Science and Engineering (CSE) Education During Covid-19 Pandemic . . . . 1
Ashraf Alam and Atasi Mohanty

The Effect of Facebook Social Media on Recent Jambi Regional Election—An Empirical Data Analysis . . . . 21
Dimas Subekti and Titin Purwaningsih

Estimating the Intervals Between Mount Etna Eruptions . . . . 35
Kshitij Dhawan

Sentiment Enhanced Smart Movie Recommendation System . . . . 45
V. Ranjith, Rashmita Barick, C. V. Pallavi, S. Sandesh, and R. Raksha

A Machine Learning Model for Predictive Maintenance of a Stepper Motor Using Digital Simulation Data . . . . 57
B. Sivathanu Kumar, A. Aravindraj, T. A. S. Sakthi Priya, Sri Nihanth, Dhanalakshmi Bharati, and N. Mohankumar

Impact of Pollutants on Temperature Change and Forecasting Temperature of US Cities . . . . 71
Tanmaay Kankaria, Bandla Vaibhav Krishna, Duppanapudi Surya Teja, D. V. S. Dinesh Chandra Gupta Kolipakula, and R. Sujee

Hybrid Precoding Schemes for mmWave Massive MIMO Systems—A Comprehensive Survey . . . . 83
V. Baranidharan, K. P. Nithish Sriman, V. Sudhan Siddarth, P. Sudharsan, M. Krishnan, and A. B. Tharikaa Srinithi

Optimized Web Service Composition Using Hybrid Evolutionary Algorithms . . . . 93
S. Subbulakshmi, M. Seethalakshmi, and Devika Unni

Organization Security Framework—A Defensive Mechanism . . . . 105
Sayooj B. Kumar, Krishna Rajeev, Sarang Dileep, Adil Muhammed Ashraf, and T. Anjali

Trusty Medicare: An Online Virtual Care System . . . . 115
Aruna U. Gawade, Shourya Amit Kothari, Saloni Deepak Patel, Rushi Bhavesh Desai, and Varun Dinesh Talreja

Blockchain-Based Remote Construction Monitoring Using UAV in SITL Simulation . . . . 131
L. Sherin Beevi, S. Muthusundari, D. Vishnu Sakthi, and G. Subhashini

Papaya Diseases Detection Using GLCM Feature Extraction and Hyperparatuning of Machine Learning Approach . . . . 145
Snehal J. Banarase and S. D. Shirbahadurkar

Image Forgery and Image Tampering Detection Techniques: A Review . . . . 159
S. Hridya Nair, Kasthuri A. S. Nair, Niharika Padmanabhan, S. Remya, and Riya Ratnakaran

Low-Voltage Ride-Through for a Three-Phase Grid-Integrated Single-Stage Inverter-Based Photovoltaic System Using Fuzzy Logic Control . . . . 181
M. Sahana and N. Sowmyashree

Automated Detection of Malaria Parasite from Giemsa-Stained Thin Blood Smear Images . . . . 195
V. Vanitha and S. Srivatsan

Forecasting Diabetic Foot Ulcers Using Deep Learning Models . . . . 211
Shiva Shankar Reddy, Laalasa Alluri, Mahesh Gadiraju, and Ravibabu Devareddi

Artificial Intelligence-Based Chronic Kidney Disease Prediction—A Review . . . . 229
A. M. Amaresh and Meenakshi Sundaram A.

Smart Home Security System Using Facial Recognition . . . . 239
G. Puvaneswari, M. Ramya, R. Kalaivani, and S. Bavithra Ganesh

Automated Algorithm for Neurodegenerative Disorder Detection Using Gait-Based Features . . . . 253
Richa Tengshe, Akanksha Singh, Priyanshu Raj, Saavi Yadav, Syeda Kauser Fathima, and Binish Fatimah

Health Monitoring System for Comatose Patient Using Raspberry-Pi . . . . 263
C. Visvesvaran, S. Kamalakannan, M. Maria Rubiston, K. Aventhika, V. C. Binsha Vinod, and V. Deepashri

Digital Skeletonization for Bio-Medical Images . . . . 277
Srinivasa Rao Perumalla, B. Alekhya, and M. C. Raju

IoT-Based Condition Monitoring of Busbar . . . . 293
Lloied Abraham Lincoln, Chandrashekhar Badachi, and Pradipkumar Dixit

Novel CNN Approach (YOLO v5) to Detect Plant Diseases and Estimation of Nutritional Facts for Raw and Cooked Foods . . . . 305
M. Najma and G. Sekar

The Evolution of Ad Hoc Networks for Tactical Military Communications: Trends, Technologies, and Case Studies . . . . 331
Zalak Patel, Pimal Khanpara, Sharada Valiveti, and Gaurang Raval

Modified Floating Point Adder and Multiplier IP Design . . . . 347
S. Abhinav, D. Sagar, and K. B. Sowmya

Meta Embeddings for LinCE Dataset . . . . 363
T. Ravi Teja, S. Shilpa, and Neetha Joseph

Recent Trends in Automatic Autism Spectrum Disorder Detection Using Brain MRI . . . . 375
Triveni D. Dhamale and Sheetal U. Bhandari

A Pipeline for Business Intelligence and Data-Driven Root Cause Analysis on Categorical Data . . . . 389
Shubham Thakar and Dhananjay Kalbande

Non-knowledge Based Decision Support System . . . . 399
N. L. Taranath, B. P. Aniruddha Prabhu, Rakesh Dani, Devesh Tiwari, and L. M. Darshan

Interactive Image Generation Using Cycle GAN Over AWS Cloud . . . . 411
Lakshmi Hemanth Nallamothu, Tej Pratap Ramisetti, Vamsi Krishna Mekala, Kiran Aramandla, and Rajeswara Rao Duvvada

Challenges and New Opportunities in Diverse Approaches of Big Data Stream Analytics . . . . 425
Nirav Bhatt, Amit Thakkar, Nikita Bhatt, and Purvi Prajapati

Door Lock System Using Cryptographic Algorithm Based on Internet of Things . . . . 435
Sumit M. Sangannavar, Sohangi Srivastava, G. N. Sagara, and Usha Padma

Knowledge Engineering-Based Analysis of Convolutional Neural Network Architectures’ Performance on Luna16 and GAN Generated Pulmonary Nodule Clipped Patches to Diagnose Lung Cancer . . . . 449
Ramasubramanya Mysore Sheshadri, Yash Aryan Chopra, Yashas Anand, G. Sumukh, and S. Geetha

Brain Tissue Segmentation Using Transfer Learning . . . . 463
Farhan Raza Rizvi and Khushboo Agarwal

Aadhaar Block: An Authenticated System for Counterfeit Aadhaar Enrolment in Citizen Services Using Blockchain . . . . 477
N. Veena and S. Thejaswini

Enhanced Human Action Recognition with Ensembled DTW Loss Function in CNN LSTM Architecture . . . . 491
D. Dinesh Ram, U. Muthukumaran, and N. Sabiyath Fatima

Link Prediction Using Fuzzy Computing Model by Analyzing Social Relationship in Criminal Networks . . . . 509
M. R. Sumalatha, Lakshmi Harika Palivela, G. Aishwarya, M. Roshin Farheen, and Aadhithya Raj Madhan Raj

Estimation of Maximum Power Operating Point for Single-Diode Solar Photovoltaic Model Using Python Programming . . . . 523
Christina Thottan and Haripriya Kulkarni

Hardware Trojan Modelling on a FPGA Based Deep Neural Network Accelerator . . . . 533
Gundlur Bhanu Vamsi Krishna, Karthi Balasubramanian, and B. Yamuna

Sustainable Farming and Customized Livestock Management Using Internet of Things . . . . 543
S. A. Sivakumar, B. Maruthi Shankar, M. Mahaboob, N. Adhish, R. Dineshkumar, and N. Rahul

Verification and Validation of 64-Bit Processor Memory System . . . . 553
S. G. Chandana and G. Nagendra Rao

Land Use and Land Cover Change Assessment Using Remote Sensing and Geographic Information System . . . . 563
Ch. Rohitha, N. Vinay, G. Bharath Kumar, and M. Suneetha

A Comparative Analysis of Single Image Dehazing Techniques for Real Scene . . . . 573
Pushpa Koranga, Sumitra Singar, and Sandeep Gupta

A Detailed Study on a Software-Based Fog Network-Based Delay-Tolerant Data Transmission Model . . . . 587
Kotari Sridevi, J. Kavitha, G. Charles Babu, Yugandhar Garapati, and Srisailapu D. Vara Prasad

Noisy Brain MR Image Segmentation Using Modified Adaptively Regularized Kernel Fuzzy C-Means Clustering Algorithm . . . . 601
P. Yugander, K. Akshara, Syed Zaheruddin, K. Suganthi, and M. Jagannath

Deep Learning with Metadata Augmentation for Classification of Diabetic Retinopathy Level . . . . 613
Maksym Shulha, Yuri Gordienko, and Sergii Stirenko

Advanced Vehicle Detection Heads-Up Display with TensorFlow Lite . . . . 631
K. Mohamed Haris, N. Sabiyath Fatima, and Syed Abdallah Albeez

A Survey of Vehicle Trajectory Prediction Based on Deep Learning Models . . . . 649
Manish, Upasana Dohare, and Sushil Kumar

A Lightweight Intrusion Detection Model for In-vehicular CAN Networks . . . . 665
D. S. Divya Raj, G. Renjith, and S. Aji

Vein Pattern-Based Species Classification from Monocotyledonous Leaf Images with Deep Transfer Learning . . . . 679
Abdul Hasib Uddin, Sharder Shams Mahamud, Abdullah Al Noman, Prince Mahmud, and Abu Shamim Mohammad Arif

A Scalable Distributed Query Framework for Unstructured Big Clinical Data: A Case Study on Diabetic Records . . . . 691
Ahmet Sayar

DDGAN: Deep Dense Generative Adversarial Networks for Improvement in Arrhythmia Classification . . . . 701
S. T. Sanamdikar, S. T. Hamde, V. G. Asutkar, R. M. Sahu, and R. K. Moje

Securing Health Records Using Quantum Convolutional Neural Network . . . . 719
B. Arulmozhi, J. I. Sheeba, and S. Pradeep Devaneyan

A Comprehensive Review and Current Methods for Classifying Alzheimer’s Disease Using Feature Extraction and Machine Learning Techniques . . . . 735
S. Chithra and R. Vijayabhanu

A Comparative Study of Machine Learning and Deep Learning Techniques for Prediction of CO2 Emission in Cars . . . . 749
Samveg Shah, Shubham Thakar, Kashish Jain, Bhavya Shah, and Sudhir Dhage

Artificial Intelligence: An Effective Protocol for Optimized Baggage Tracking and Reclaim . . . . 759
Saâdia Chabel and El Miloud Ar-Reyouchi

Detecting Frauds in Financial Statements Using Deep-Forest . . . . 773
Quang-Vinh Dang

Strengthening the Communicative Competence by Integrating Language Functions in the EFL Classroom with Gamification Tools . . . . 781
Dorys Cumbre-Coraizaca, Verónica Chicaiza-Redín, Ana Vera-de la Torre, and Xavier Sulca-Guale

Target Tracking Area Selection and Handover Security in Cellular Networks: A Machine Learning Approach . . . . 797
Vincent Omollo Nyangaresi

Uncertainty Analysis of Fluid Flow Measurement in Pipeline . . . . 817
Shaowen Cao, Yurou Yao, Qilin Cai, Jiapeng Zhang, Li Zuo, Xiaoming Liu, and Xi Wu

ECT-ABE Algorithm-Based Secure Preserving Framework for Medical Big Data . . . . 841
N. G. Sree Devi and N. Suresh Singh

Evaluation of the Preference of Web Browsers Among Undergraduates Using AHP-TOPSIS Model . . . . 851
Kah Fai Liew, Weng Siew Lam, Weng Hoe Lam, and Kek Xue Teh

A Distributed Framework for Measuring Average Vehicle Speed Using Real-Time Traffic Camera Videos . . . . 863
Ahmet Sayar

Auditory Machine Intelligence for Incipient Fault Localization and Classification in Transmission Lines . . . . 877
Biobele A. Wokoma, Dikio C. Idoniboyeobu, Christopher O. Ahiakwo, and Sepribo L. Braide

The RRRS Methodology Using Self-Regulated Strategies with ICT to Homogenize the English Language . . . . 889
Verónica Elizabeth Chicaiza Redin, Sarah Iza-Pazmiño, Edgar Guadia Encalada Trujillo, and Cristina del Rocío Jordán Buenaño

Application of Artificial Intelligence in Human Resource Activities for Advanced Governance and Welfare . . . . 901
K. Sivasubramanian, K. P. Jaheer Mukthar, Wendy Allauca-Castillo, Maximiliano Asis-Lopez, Cilenny Cayotopa-Ylatoma, and Sandra Mory-Guarnizo

Model-Based Design for Mispa CountX Using FPGA . . . . 909
M. Manasy Suresh, Vishnu Rajan, K. Vidyamol, and Gnana King

Analysis of Native Multi-model Database Using ArangoDB . . . . 923
Rajat Belgundi, Yash Kulkarni, and Balaso Jagdale

A Review of Deep Learning-Based Object Detection Current and Future Perspectives . . . . 937
Museboyina Sirisha and S. V. Sudha

Boron Nitride Nanotube for Sustainable Nanomaterial: A Bibliometric Analysis . . . . 953
Weng Siew Lam, Pei Fun Lee, and Weng Hoe Lam

Design of a University Thesis and Project Automation System (UTPAS) . . . . 969
Md. Rawshan Habib, Abhishek Vadher, Md. Tanzimul Alam, Md. Apu Ahmed, Md. Shahnewaz Tanvir, Tahsina Tashrif Shawmee, Rifah Shanjida, Aditi Ghosh, and Rao Faraz Shakeel

An Evaluation of Machine Learning Methods for Classifying Bot Traffic in Software Defined Networks . . . . 979
Joshua van Staden and Dane Brown

A Systematic Literature Review on Information Security Leakage: Evaluating Security Threat . . . . 993
Sahar Ebadinezhad

Identifying Memory Dump Malware Using Supervised Learning . . . . 1009
Abdel Kareem Klaib, Mohammad Al-Nabhan, and Qasem Abu Al-Haija

Analysis of Quality Management and Sustainability in Sports Services During the Covid-19 Pandemic . . . . 1021
Mocha-Bonilla Julio Alfonso, Mocha Altamirano Kevin Israel, Tenorio Heredia Lucia Elizabeth, and Encalada Trujillo Edgar Guardia

Cost Analysis of Simultaneous AC-DC Power Transmission System . . . . 1035
Md. Rawshan Habib, Ahmed Yousuf Suhan, Abhishek Vadher, Abu Rifat Rakibul, Quazi Shadman Doha, Tahsina Tashrif Shawmee, Md. Shahnewaz Tanvir, Md. Mossihur Rahman, and Shuva Dasgupta Avi

Author Index . . . . 1047
About the Editors
Prof. Dr. Subarna Shakya is currently Professor of Computer Engineering, Department of Electronics and Computer Engineering, Central Campus, Institute of Engineering, Pulchowk, Tribhuvan University, and Coordinator (IOE) of the Erasmus Mundus LEADER Project (Links in Europe and Asia for engineering, eDucation, Enterprise and Research exchanges). He received his M.Sc. and Ph.D. degrees in Computer Engineering from Lviv Polytechnic National University, Ukraine, in 1996 and 2000, respectively. His research areas include E-Government Systems, Computer Systems and Simulation, Distributed and Cloud Computing, Software Engineering and Information Systems, Computer Architecture, Information Security for E-Government, and Multimedia Systems.

Dr. Valentina Emilia Balas is currently Full Professor at “Aurel Vlaicu” University of Arad, Romania. She is the author of more than 300 research papers. Her research interests are in Intelligent Systems, Fuzzy Control, and Soft Computing. She is Editor-in-Chief of the International Journal of Advanced Intelligence Paradigms (IJAIP) and of IJCSE. Dr. Balas is a member of EUSFLAT, ACM and SM IEEE, a member of TC—EC and TC—FS (IEEE CIS) and TC—SC (IEEE SMCS), and Joint Secretary of FIM.

Wang Haoxiang is currently Director and Lead Executive Faculty Member of GoPerception Laboratory, NY, USA. His research interests include multimedia information processing, pattern recognition and machine learning, remote sensing image processing, and data-driven business intelligence. He has co-authored over 60 journal and conference papers in these fields in journals such as Springer MTAP, Cluster Computing, SIVP; IEEE TII, Communications Magazine; Elsevier Computers and Electrical Engineering, Computers, Environment and Urban Systems, Optik, Sustainable Computing: Informatics and Systems, Journal of Computational Science, Pattern Recognition Letters, Information Sciences, Computers in Industry, Future Generation Computer Systems; Taylor & Francis International Journal of Computers and Applications; and at conferences such as IEEE SMC, ICPR, ICTAI, ICICI, CCIS, ICACI. He is Guest Editor for IEEE Transactions on Industrial Informatics, IEEE Consumer Electronics Magazine, Multimedia Tools and Applications,
MDPI Sustainability, International Journal of Information and Computer Security, Journal of Medical Imaging and Health Informatics, Concurrency and Computation: Practice and Experience.
Design, Development, and Implementation of Software Engineering Virtual Laboratory: A Boon to Computer Science and Engineering (CSE) Education During Covid-19 Pandemic

Ashraf Alam and Atasi Mohanty

Abstract One of the critical features of Computer Science and Engineering (CSE) education is learning by doing. The rapid upsurge in the use of the Internet has drawn attention to the importance of online laboratory-based learning in CSE education. However, bringing such experiences online is challenging. In this context, enabling online virtual lab-based learning is a modern trend in many educational institutions in India, and online laboratory-based learning has emerged as a popular area of research among educational-technology researchers. In an online laboratory learning environment, the instructor has a significantly reduced role, and students take increased responsibility for their learning; the involvement of students is therefore higher. Traditional classroom-based laboratory learning has many limitations, and the need to engage students in self-learning through online laboratories is pressing, as students get an opportunity to perform their laboratory experiments beyond the classroom as well, such as at home or while on vacation. In this paper, we present the design, architecture, database schema, related technologies, user activities, and use/reuse aspects of the software engineering virtual laboratory (SE VLab) that we have designed and developed. We also provide the details of each experiment in the SE VLab with examples, along with the assessment results obtained using the SE VLab.

Keywords Critical pedagogy · Software engineering · Curriculum · Virtual laboratory · Computer science engineering education · Evaluation · Learning outcomes
A. Alam (B) · A. Mohanty Rekhi Centre of Excellence for the Science of Happiness, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_1
1 Design of the SE VLab

This section describes the architecture and the database schema design for the software engineering virtual laboratory (SE VLab) that we have designed and developed.
1.1 Architecture

We developed the software engineering virtual laboratory (SE VLab) as a Web application based on the Model-View-Controller (MVC) architecture, as shown in Fig. 1. In the MVC architecture, a “model” defines a data store containing all the required pieces of information. The “views” define how the contents obtained from the model are rendered; for example, the same set of information can be displayed as a list or in a tabular fashion. The “controller” decides what data are to be provided for display, depending on the given constraints, if any; for example, one might wish to view the sales reports only for the year 2021. The MVC architecture allows an application to achieve decoupling among the data, presentation, and business logic layers. Thus, any component can be modified without affecting the others.

As shown in Fig. 1, when a user requests a page by typing the URL into the Web browser, an HTTP Request is sent to the Web server. This HTTP Request is intercepted by the controller module of the application running on the server, which, based on the specified URL, fetches the necessary information from the database (model). The selected data are then populated within a pre-defined HTML template (view) and returned to the user as an HTTP Response. The contents are then displayed in the user’s Web browser. Apart from plain HTTP Requests, the client can also send Asynchronous JavaScript and XML (AJAX) queries. AJAX requests differ from normal HTTP Requests: in the latter case, the entire page is refreshed, but with the former, only a portion of the displayed page is updated. When an AJAX query is received by the SE VLab application, it sends out a response in the JavaScript Object Notation (JSON) format. The JSON data are retrieved by the JavaScript code on the client side, and the contents are updated.

Fig. 1 MVC architecture of the SE VLab (Adapted from [1–6])
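The controller's dual behavior described above (a fully rendered HTML page for plain HTTP Requests, a JSON fragment for AJAX queries) can be sketched framework-independently. In the actual SE VLab this logic lives in a Django view; the names below (`EXPERIMENTS`, `render_page`, `handle_request`) are illustrative only and not from the implementation.

```python
import json

# Illustrative stand-in for the "model": experiment records keyed by id,
# mirroring the isad_theory table described later in Sect. 1.2.
EXPERIMENTS = {
    1: {"title": "ER Diagram", "contents": "Theory of ER modelling ..."},
}

def render_page(record):
    # The "view": populate a pre-defined HTML template with model data.
    return "<html><body><h1>{title}</h1><p>{contents}</p></body></html>".format(**record)

def handle_request(experiment_id, is_ajax=False):
    # The "controller": fetch from the model, then choose the response format.
    record = EXPERIMENTS[experiment_id]
    if is_ajax:
        # AJAX query: return JSON so the client updates only part of the page.
        return json.dumps(record)
    # Plain HTTP Request: return the fully rendered page.
    return render_page(record)
```

A plain request yields a complete HTML document, while `handle_request(1, is_ajax=True)` yields a JSON string that the client-side JavaScript can consume to update only a portion of the displayed page.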
1.2 Database Schema A relational database model is used to implement the data storage facilities for the SE VLab. Figure 2 shows the ER diagram involving the various tables, and their cardinality mappings. Every table in the database has a primary key (PK), represented with an id. The “core” table of the database is isad_theory. This table is referred to by multiple other tables, whose foreign keys are indicated by theory_id. The other fields of the isad_theory table are title, introduction, objectives and contents. Each record of this table represents an experiment, with the PK giving the experiment number. Thus, the addition of a new experiment in the SE VLab begins with the insertion of a record into this table. The information from the contents fields are extracted and displayed in the theory section of the corresponding experiments. The other tables directly referencing the isad_theory table contains the contents to be displayed for the other sections of the corresponding experiments. Of particular interest is the isad_exercise table. The
Fig. 2 ER model for the SE VLab (Adapted from [7–14])
A. Alam and A. Mohanty
Exercises section of any experiment contains a workspace and provides the sample solutions. This information is captured in the isad_workspace and isad_solution tables, respectively. As shown in Fig. 2, an experiment can have multiple exercises, and the same workspace can be used by multiple such exercises. For example, the experiment on the ER diagram has two exercises. Both of these use the same workspace interface.
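A minimal sketch of this schema is shown below, using SQLite in place of the MySQL store actually used by the SE VLab. Only the columns named in the text (id, theory_id, title, introduction, objectives, contents) are taken from the paper; the remaining column names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE isad_theory (
    id INTEGER PRIMARY KEY,   -- the PK doubles as the experiment number
    title TEXT, introduction TEXT, objectives TEXT, contents TEXT
);
CREATE TABLE isad_workspace (
    id INTEGER PRIMARY KEY,
    interface TEXT            -- assumed field describing the workspace UI
);
CREATE TABLE isad_exercise (
    id INTEGER PRIMARY KEY,
    theory_id INTEGER REFERENCES isad_theory(id),       -- FK to the core table
    workspace_id INTEGER REFERENCES isad_workspace(id),
    problem TEXT
);
""")

# Adding a new experiment begins with inserting a record into isad_theory
conn.execute("INSERT INTO isad_theory VALUES (1, 'ER modelling', '', '', '...')")
# One experiment can have multiple exercises sharing the same workspace
conn.execute("INSERT INTO isad_workspace VALUES (1, 'er-editor')")
conn.executemany("INSERT INTO isad_exercise VALUES (?, 1, 1, ?)",
                 [(1, "Exercise 1"), (2, "Exercise 2")])

n = conn.execute("SELECT COUNT(*) FROM isad_exercise WHERE theory_id = 1").fetchone()[0]
```

The two inserted exercises both reference workspace 1, mirroring the ER-diagram experiment described above.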
1.3 Related Technologies

A collection of Free and Open-Source Software is used to develop the SE VLab. Its core is implemented using Django, a Python-based framework for Web applications, and a MySQL database is used as the data store. Several other software tools, including Graphviz and PlantUML, are used at the backend to dynamically draw the ER and UML diagrams. The workspaces developed for all these exercises are intuitive: a user is required to specify the required parameters as text. For example, in the case of state diagrams, a user is required to specify the labels for each state, the actions causing the transitions, and so on. All the user inputs are logically summarized in multiple tables for cross-checking and further modification, if any. Once a user completes the diagram specifications, s/he clicks on a button, which transfers the inputs to the server. Based on the inputs, the corresponding diagram is generated at the server and returned to the user’s browser. The SE VLab is deployed in a virtual machine running the Linux OS. The advantages are multi-fold: high availability of the virtual lab, easy reuse, deployment to other places, and quick modification and testing.
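The server-side step of turning textual specifications into a diagram can be sketched as follows. This is an assumption-level illustration: it only builds Graphviz DOT source from user-supplied state and transition labels; the real backend would pipe such source to Graphviz (or PlantUML input to PlantUML) to render the image returned to the browser.

```python
def build_dot(states, transitions):
    # Translate user-specified labels into Graphviz DOT source.
    lines = ["digraph statechart {"]
    for state in states:
        lines.append('    "{0}";'.format(state))
    for src, label, dst in transitions:
        # each transition becomes a labelled directed edge
        lines.append('    "{0}" -> "{1}" [label="{2}"];'.format(src, dst, label))
    lines.append("}")
    return "\n".join(lines)

dot = build_dot(["OFF", "ON"],
                [("OFF", "push down", "ON"), ("ON", "push up", "OFF")])
```

The resulting DOT text could then be rendered server-side, e.g. by invoking the `dot` command, and the image returned in the HTTP response.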
1.4 User Activities

The users of the SE VLab are required to complete the related theory and simulations before beginning any hands-on work. The Exercises module of the virtual lab presents a set of problems for every experiment. Figure 3 shows the sequence diagram of activities performed while working on the Exercises in the SE VLab. When a user selects an exercise number, the corresponding problem statement and workspace are displayed. The user then reads the problem and uses the workspace to input all the parameters required for drawing the concerned diagram. Once s/he clicks on the submit button, the summarized inputs are sent to the diagram engine, which generates the corresponding diagram at the back-end. This is then displayed in the user’s browser as an image. The diagram can be asynchronously updated, i.e., the user does not need to wait for the result to be available in order to alter the specifications.
Fig. 3 A typical sequence of activities in the SE VLab (Adapted from [1, 15–22])
The SE VLab also provides a social-sharing widget to spread the word about it through different social networking sites. This form of interaction with the virtual lab would help in reaching numerous other potential users who are otherwise unaware of this virtual lab.
2 Overview of the SE VLab

This section describes the objectives of the SE VLab, the target audience, the concept of an experiment, the list of experiments one can perform, the user interface, and the deployment issues.
2.1 Objectives

The SE VLab is designed to introduce users to the basic concepts of SE across a wide range of topics. The topics of the experiments in the SE VLab were selected as per the feedback from subject experts in the SE area and from the SE laboratory curricula followed by premier institutions and major universities in India.
2.2 Use/Reuse of SE VLab

The SE VLab is available online. An easy-to-use and intuitive programming environment is also provided, so an application can be specified at a high level of abstraction by fostering the reuse of existing tools. The SE VLab is reasonably extensible in terms of reuse of learning components. The reusable platform can be used collaboratively by large groups of students in a distance-learning or open-learning mode, thereby overcoming the inherent weaknesses of the traditional laboratory-based educational system. The use and reuse of experiments in this open laboratory will improve students’ learning. Further, the SE VLab provides options for its administrators to easily add new experiments or enhance the existing ones. Moreover, reusable learning materials (e.g., presentations and illustrations on the topics provided by other faculty members) can be easily linked from this lab. Such a collaborative effort on the reuse of existing open-source materials can further boost the quality and extent of the laboratory education offered by the lab.
2.3 Target Audience

The SE VLab is suitable for undergraduate and postgraduate students having SE as a curricular subject. It is also useful for graduates in other engineering disciplines who wish to pursue a career in the software industry. The SE VLab has a set of ten experiments focusing on different areas of the subject. A list of all the experiments, along with their individual learning objectives, is presented in Fig. 4. The details of a few experiments are described in the following sections.
3 Details of a Few Experiments of the SE VLab

In this section, we describe a few of the experiments included in the SE VLab.
Fig. 4 List of experiments and the corresponding learning objectives in the SE VLab
3.1 Identification of Requirements from Problem Statements

Requirements gathered during the requirement identification phase must be unambiguous, consistent, and complete. Requirements of a system are mainly of two types: functional requirements (FRs) and non-functional requirements (NFRs). In this experiment, the simulation section contains animated functional requirement statements based on a case study of an ‘Online Voting System’. The Exercises section of this experiment contains a problem statement for an ‘Online Auction System’. Four sample exercises are available in the Exercises section of this experiment. The objective of these exercises is to learn how to identify the FRs and NFRs of a system.
3.2 Estimation of Project Metrics

Expert business analysts analyze the benefits as well as the shortcomings of identified solution strategies in terms of the cost, time, and resources required to develop the project. In this experiment, we describe how to categorize projects using the Constructive Cost Model (COCOMO), how to estimate the effort and development time required for a project, and how to estimate program complexity and effort using Halstead’s metrics. The Exercises section of this experiment contains three exercises. The first exercise identifies the type of project as per COCOMO and prepares an estimate of the required effort and cost. The second exercise identifies the unique operators and operands from a given snippet of code, and the third exercise helps compute the estimated effort using Halstead’s metrics. However, the limitation of this experiment is that the values taken in the examples are arbitrary and do not relate to real-life projects.
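The two estimation techniques used in this experiment can be sketched with the standard textbook formulas. Note that the coefficients below are the classic basic-COCOMO organic-mode constants, not values drawn from the SE VLab itself, and the Halstead inputs are illustrative.

```python
import math

def cocomo_organic(kloc):
    # Basic COCOMO, organic mode: E = 2.4 * KLOC^1.05, D = 2.5 * E^0.38
    effort = 2.4 * kloc ** 1.05       # person-months
    dev_time = 2.5 * effort ** 0.38   # months
    return effort, dev_time

def halstead(n1, n2, N1, N2):
    # n1, n2: distinct operators/operands; N1, N2: total occurrences
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    effort = difficulty * volume      # elementary mental discriminations
    return volume, difficulty, effort
```

For instance, a 32 KLOC organic project yields an effort of roughly 91 person-months over about 14 months under basic COCOMO.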
3.3 Statechart and Activity Modeling

A statechart diagram is a pictorial representation of a state-based system, with all its states and the events that trigger a transition from one state to another. This is illustrated with the following example: “A bulb with a push-down switch. The bulb initially remains off. When the switch is pushed down, the bulb turns on. Again, when the switch is pushed up, the bulb turns off. The life cycle of the bulb continues in this way until it gets damaged”. The statechart diagram for this example is shown in Fig. 5. From the problem statement, it can be identified that there are three possible states of the bulb: “ON”, “OFF”, and “Damaged”. When the bulb is in the OFF state and the switch is pushed down, it goes to the ON state. Similarly, when the bulb is in the ON state and the switch is pushed up, it goes to the OFF state. When the bulb is in either
Fig. 5 A screenshot of statechart diagram for the considered example
the ON or the OFF state, and its lifetime expires, it moves to the Damaged state. The learning objectives of the above example are as follows:
• To identify different states of a system.
• To identify activities performed in each state.
The limitation of this experiment is that a complex system often has sub-states, which are not covered as a part of this experiment. The interface provided for this experiment only lets one represent simple states. An activity diagram is used to graphically represent the flow of activity in a system. Activity diagrams can also represent various other concepts such as concurrent activities, their joining and forking, grouping of activities using swimlanes, and object flows. The objective of this experiment includes providing knowledge of the fundamental aspects of activity diagrams and of how to draw an activity diagram. As such, this experiment lacks certain features, listed below, which a dedicated UML diagram editor should have.
• The system lets students represent at most five parallel activities.
• Nested decisions are not considered here, although they might be quite essential in complex workflows.
• A decision cannot be taken immediately after a merge point.
• Nested activity diagrams are not implemented.
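The bulb statechart above can also be expressed as a simple transition table, which makes the state/event pairs explicit. This is an illustrative sketch; the event names are assumptions, not the lab’s actual implementation.

```python
# (state, event) -> next state, following the bulb example in the text
TRANSITIONS = {
    ("OFF", "switch_down"): "ON",
    ("ON", "switch_up"): "OFF",
    ("ON", "lifetime_expires"): "Damaged",
    ("OFF", "lifetime_expires"): "Damaged",
}

def step(state, event):
    # events with no matching transition leave the state unchanged
    return TRANSITIONS.get((state, event), state)
```

Walking the table reproduces the life cycle described above: the bulb toggles between OFF and ON until its lifetime expires, after which it stays Damaged.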
Fig. 6 Screenshot of the code for the program to find the sum of the first n natural numbers
3.4 Estimation of Test Coverage Metrics and Structural Complexity

Consider a ‘C’ program that computes the sum of the first n natural numbers. A screenshot of the code for the considered program is shown in Fig. 6. From the code, the basic blocks of the program are computed in this experiment. Then, from the generated basic blocks, the CFG of the program is generated. A screenshot of the CFG generated from the code in Fig. 6 is shown in Fig. 7. From the CFG for the code in Fig. 6, we can check that the number of nodes, denoted N, is 7 and the number of edges, denoted E, is 7. Now, according to the formula for McCabe’s cyclomatic complexity, V(G) = E − N + 2, that is, 7 − 7 + 2 = 2. The limitation of this experiment is that the current workspace can generate CFGs only for the main function and does not work with user-defined functions.
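The complexity computation above can be sketched directly from a CFG’s node and edge lists. The seven-node, seven-edge graph below is an illustrative stand-in for the CFG of the sum-of-first-n-naturals program (N = 7, E = 7, as in the text); the actual basic-block labels are assumptions.

```python
def cyclomatic_complexity(nodes, edges):
    # McCabe: V(G) = E - N + 2 for a single connected program
    return len(edges) - len(nodes) + 2

nodes = ["B1", "B2", "B3", "B4", "B5", "B6", "B7"]
edges = [
    ("B1", "B2"), ("B2", "B3"), ("B3", "B4"), ("B4", "B5"),
    ("B5", "B3"),                 # back edge of the summing loop
    ("B5", "B6"), ("B6", "B7"),
]
```

With E = 7 and N = 7, V(G) = 7 − 7 + 2 = 2, matching the value derived in the experiment.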
3.5 Designing Test Suites

Figure 8 shows a snapshot of an exercise to be used by the students for the test suite design experiment. This experiment briefly discusses the different types of testing and provides hands-on experience in performing unit testing in software engineering. The experiment is illustrated with an example that considers the development of a software module to compute the areas of different geometric shapes such as a square, rectangle, circle, and right-angled triangle. For example, a square with a side
Fig. 7 Screenshot of the CFG generated for the code in Fig. 6
of 10 units has an area of 100 sq. units. The result of designing a test suite for the function square(10) is shown in Fig. 9. This can be verified from the output of the function call square(10). As seen in the result screenshot for the function square(10), the expected output and the actual output are the same, and the execution result for the test case gives the status of the test suite as Passed. However, testing also attempts to diagnose the existence of possible bugs in the software. It is required to check how the above code behaves for a call such as rectangle(10, −5). The students are asked to modify the code to address this defect. The following instruction is given for the purpose: in each function, return −1 if any given dimension is negative. To train the students in detecting bugs, a bug is intentionally introduced in the code. The example of testing the function rectangle with a length of 10 units and a breadth of 5 units is considered. The results of designing a test suite for the function rectangle(10, 5) show that the expected output is different from the actual one. Although the expected output is correct, the test shows an incorrect result for the actual
Fig. 8 The Exercises section of the experiment on designing test suite
Fig. 9 Results of successful completion of a test suite
output. Due to this, the execution result for the function rectangle gives the status of the test suite as Failed. In general, a test suite fails when at least one test case in it fails.
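A unit-test sketch of this exercise is shown below. The function names follow the calls square(10) and rectangle(10, −5) mentioned in the text, and both functions include the suggested fix of returning −1 for a negative dimension; the exact signatures are assumptions.

```python
import unittest

def square(side):
    # return -1 if the given dimension is negative, else the area
    return -1 if side < 0 else side * side

def rectangle(length, breadth):
    return -1 if length < 0 or breadth < 0 else length * breadth

class TestAreas(unittest.TestCase):
    def test_square(self):
        # expected output equals actual output, so this test case passes
        self.assertEqual(square(10), 100)

    def test_rectangle_negative(self):
        # the defect scenario discussed in the text: a negative dimension
        self.assertEqual(rectangle(10, -5), -1)
```

If a bug were introduced into rectangle (as done intentionally in the experiment), the second assertion would fail, and the whole suite would be reported as Failed.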
4 Assessment of the SE VLab: A Pilot Study

This section describes the assessment of the SE VLab, including a content knowledge test (learning gain), loading time, and users’ evaluation data.
4.1 Data Analysis

The SE VLab was evaluated using the online feedback tool for each experiment. We collected responses online from 197 undergraduate engineering students. The students ranged in age from 20 to 24 years; 112 of them were males and 85 were females. All students were in their fifth semester. As SE is primarily taught at the undergraduate level, the collected data are primarily from these students. The responses are based on an ‘excellent-to-poor’ scale. Eight of the questions (shown in Fig. 10) are based on a five-point Likert scale, while the others were open-ended. Figure 11 shows the percentage response for the questions in Fig. 10, together with the average percentage of the individual ratings on the Likert scale. It can be observed from Fig. 11 that for each question, at least 88% of the responses fell in the range ‘excellent-to-good’. This underscores the fact that the SE VLab is useful to students taking a course on SE. There are a few cases where the users did not rate a given question, which are indicated by ‘No Response’ in Fig. 11. The above-mentioned assessment results demonstrate a very high level of user satisfaction. Some of the important comments of the users of the SE VLab on the weaknesses and strengths of the SE virtual lab experiments, obtained through the open-ended section of the online feedback questionnaire, are as follows:
• “More challenging questions should be included at the end of each experiment.”
• “Experiments are very well written.”
• “I found this lab to be more interesting than our regular lab. I can access it any time. Hope it is more useful for students as well.”
• “I completely agree with the design and description of this virtual lab. It is useful!”
In the future, we plan to evolve the present SE VLab by considering the above suggestions from users.
Fig. 10 Assessment Questions for SE VLab (as on 15th April 2022)
Fig. 11 Percentage responses for questions in Fig. 10, together with the average percentage of individual ratings on the Likert scale
4.2 Content Knowledge Test and Learning Gain

The current study also examined a sample of 106 undergraduate and postgraduate students of IIT Kharagpur. During the Spring semester, the participants in the study were enrolled in a course covering the traditional SE lab. Activities, contents, and assessments within the lab course were the same across the treatment and control groups; the only difference was the type of laboratory used. The main purpose of
Fig. 12 Average loading time of each experiment
this study was to analyze the content knowledge of SE laboratory experiments. The participants ranged in age from 20 to 24 years, with 74 males and 32 females. We conducted a pre-test before using the SE VLab, and a post-test thereafter. The participants completed a questionnaire consisting of 24 multiple-choice questions and 5 open-ended questions related to SE. The questionnaire was validated by SE subject experts. It was observed that the participants, after using the SE VLab, scored 32% more in their content knowledge test on SE laboratory experiments. This indicates that the students learned SE through interaction with the virtual laboratory.
4.3 Loading Time

The average loading time taken by each experiment was observed. Figure 12 shows the average time taken by each experiment to load the ‘Theory’ section, measured using the Web-based WebWait tool (http://webwait.com/). The loading times for the other sections of all the experiments were also noted. The error bars in Fig. 12 indicate the standard deviation for the corresponding measurement. As indicated, the page load times are low, allowing enhanced interaction with the users.
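A measurement of this kind (mean load time plus the standard deviation shown as error bars in Fig. 12) could be sketched as follows. This is an assumption-level illustration: fetch_page stands in for an HTTP GET of an experiment’s Theory section, and the trial count is arbitrary.

```python
import statistics
import time

def average_load_time(fetch_page, trials=5):
    # Time repeated loads and summarize them as (mean, standard deviation).
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        fetch_page()                      # e.g. an HTTP GET of the page
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)
```

Repeating the measurement and reporting the spread, rather than a single load, guards against one-off network jitter.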
4.4 Average Use of the SE VLab by Users

The SE VLab can also record student activities and show the number of students that accessed the experiments (frequency) and the access times (hours). Figure 13 shows the average use of the SE VLab over 24 h in a day. Results indicate that the maximum use of this VLab occurs at around 11:00, 18:00, and 22:00 h, and the access stops between
Fig. 13 Average use of SE VLab within 24 h
1:00 and 5:00 h. From this, we infer that students who use the SE VLab accessed it round the clock on the Web, as per their convenience.
5 Concluding Remarks

This paper outlines the design of the SE VLab that we developed and presents the preliminary results of a performance evaluation gathered through user feedback. The results from the preliminary user survey reflect that the students found the SE VLab to be suitable and useful for their course. This indicates that making the SE VLab a part of the regular SE curriculum might be a viable option. The scope of this evaluation, however, is to be expanded. The current SE VLab leaves some scope for improvement in the future. Apart from the textual content provided here, audio and video materials can be added to enrich this lab further. In fact, multiple users have requested the same. Moreover, a tighter coupling with social networking platforms (e.g., Facebook, Twitter, and Instagram) is envisioned to ensure greater user participation, as well as a global audience. Such a feature would also enable group activities and internal discussions among the participants.
5.1 Educational Implications

The present study has the following educational implications:
1. An effective teaching-learning tool for distance learning and mass education.
2. Can boost personalized instruction and self-learning through quality education.
3. As self-learners, the students will develop skills to take ownership of their learning responsibilities.
4. Knowledge construction, dissemination, and management would be enhanced across the subject topics.
5. Students’ motivation and creativity (through problem solving) can be improved by promoting learning communities through both synchronous and asynchronous communication networks.
5.2 Limitations

The limitations that we came across during our study are as follows:
1. In this work, we did not consider responses from teachers. As our sample was not very large, the drop rate among the VLab users was almost zero. However, as per the existing research literature [1–3], drop rates tend to be higher in technology-mediated learning systems.
2. In this work, the SE VLab is developed based only on text and animations; it does not include video and audio facilities.
3. As there are only one or two publicly available online virtual labs on software engineering, all the results reported in this study are specific to the developed SE VLab.
4. The data collected for this research are only from demographically limited regions, namely the Indian states of West Bengal, Madhya Pradesh, Jharkhand, Uttar Pradesh, and Odisha. Cross-cultural and trans-national evaluations of responses have not been performed in this study.
6 Conclusion and Future Work

In modern society, software is developed for use in both offices and homes. This introduces the challenge of training a large number of software engineers in undergraduate and postgraduate programs. As real-world problems evolve rapidly, software engineers of the twenty-first century face new challenges. In order to cope with this situation, the introduction of laboratory-based learning in Software Engineering education is indispensable. However, a large number of academic institutions in developing countries currently involved in engineering education do not have the faculty and infrastructure necessary to impart the quality hands-on training required for SE education. In order to address this lacuna, the SE VLab was developed and evaluated.
6.1 Designing the Software Engineering Virtual Laboratory

We developed the architecture and database schema for the SE VLab. Ten fundamental experiments are available in the designed SE VLab. The experiments included in the
SE VLab are standard for undergraduate SE courses in engineering colleges/institutes in India. We incorporated an online feedback form in this lab. Preliminary results of the performance evaluation were gathered through user feedback. The results from the preliminary user survey reflect that the students found the SE VLab to be suitable and useful for their course. Apart from the preliminary study, we also conducted a detailed user evaluation (main study) using the CP-VLLM tool. Finally, we observed overall student acceptance of the SE VLab. From a critical pedagogy (CP) perspective, students rated Instruction Design Delivery and Learning Outcomes higher than the other CP parameters. This indicates that making the SE VLab a part of the regular SE curriculum might be a viable option.
6.2 CP-VLLM: Development of a Virtual Learning Measurement Tool

In an undergraduate course, practical activities are of different types, such as experimental investigations, control assignments, and project work. The control assignments constitute a major part of the laboratory activities. It is often very difficult to assess lab activities, as no standard measurement tool/instrument is available, either online or offline. In view of this, we developed a learning measurement tool, named CP-VLLM, based on the theory of CP. We statistically tested the reliability and validity of the developed tool. This tool can be used for measuring students’ learning outcomes and performance in online/virtual learning environments.
6.3 Evaluation of Students’ Learning Performance in a Virtual Laboratory Learning Environment

As the SE VLab has been designed and implemented, it is important to evaluate this laboratory by considering students’ feedback. In this context, we used the CP-VLLM measurement tool, along with two SE subject knowledge test tools, and administered them for both pre-test and post-test data collection.
6.4 Directions for Future Research

There exist several unexplored points and open issues related to the research reported here that warrant further investigation. We briefly outline a few possible extensions to our work in the following. Consideration of additional features in lab design: The current SE VLab leaves some scope for improvement in the future. Apart from the textual content provided
here, audio and video materials can be added to enrich this lab further. Video-based tutorials and discussion groups to promote students’ mutual cooperation and collaborative work can be included. In fact, multiple subjects we surveyed requested the same. Moreover, a tighter coupling with social networking platforms (e.g., Facebook, Twitter, and Instagram) is envisioned to ensure greater user participation, as well as a global audience. Such a feature would also enable group activities and internal discussions among the participants. More case studies, new experiments (as per different syllabi), and exercises can be added in the future. Consideration of alternative learning styles and cognitive theories for developing learning measurement tools: In this research, we only considered critical pedagogy theory and developed tools following the said theory. In the future, it is possible to develop other virtual learning measurement tools by considering Bloom’s taxonomy, Kolb’s model, Honey and Mumford’s model, Anthony Gregorc’s model, Neil Fleming’s VAK/VARK model, and other cognitive approaches to learning styles. Consideration of other procedures for evaluating students’ performance: The research work we conducted bases the evaluation process on data collected from only a few engineering colleges in five states of India. It is possible to collect data from other geographical regions of India, as well as from other countries. As this lab is available online on the Web, students can use it and send their feedback online. In the future, it can be extended to change or add new experiments as per the current syllabus. As mentioned earlier, there are currently very few publicly available SE VLabs. In the future, when other SE VLabs are developed, a comparison can be made between them.
However, for extending the evaluation of this SE VLab in the future, one can attempt to find similarities in user experiences between this VLab and virtual labs developed in other areas of computer science and engineering.
References

1. Jamshidi R, Milanovic I (2022) Building virtual laboratory with simulations. Comput Appl Eng Educ 30(2):483–489
2. El Kharki K, Berrada K, Burgos D (2021) Design and implementation of a virtual laboratory for physics subjects in Moroccan universities. Sustainability 13(7):3711
3. Alam A (2022) Platform utilising blockchain technology for eLearning and online education for open sharing of academic proficiency and progress records. In: Asokan R, Ruiz DP, Baig ZA, Piramuthu S (eds) Smart data intelligence. Algorithms for intelligent systems. Springer, Singapore
4. Deepika NM, Bala MM, Kumar R (2021) Design and implementation of intelligent virtual laboratory using RASA framework. Mater Today Proc
5. Alam A (2022) Educational robotics and computer programming in early childhood education: a conceptual framework for assessing elementary school students’ computational thinking for designing powerful educational scenarios. In: 2022 International conference on smart technologies and systems for next generation computing (ICSTSN). IEEE, pp 1–7. https://doi.org/10.1109/ICSTSN53084.2022.9761354
6. Vergara D, Fernández-Arias P, Extremera J, Dávila LP, Rubio MP (2022) Educational trends post COVID-19 in engineering: virtual laboratories. Mater Today Proc 49:155–160
7. Alam A (2021) Possibilities and apprehensions in the landscape of artificial intelligence in education. In: 2021 International conference on computational intelligence and computing applications (ICCICA). IEEE, pp 1–8. https://doi.org/10.1109/ICCICA52458.2021.9697272
8. Alam A (2022) Employing adaptive learning and intelligent tutoring robots for virtual classrooms and smart campuses: reforming education in the age of artificial intelligence. In: Shaw RN, Das S, Piuri V, Bianchini M (eds) Advanced computing and intelligent technologies. Lecture notes in electrical engineering, vol 914. Springer, Singapore
9. Alam A (2020) Challenges and possibilities in teaching and learning of calculus: a case study of India. J Educ Gifted Young Sci 8(1):407–433. https://doi.org/10.17478/jegys.660201
10. Alam A (2022) A digital game based learning approach for effective curriculum transaction for teaching-learning of artificial intelligence and machine learning. In: 2022 International conference on sustainable computing and data communication systems (ICSCDS). IEEE, pp 69–74
11. Alam A (2020) Pedagogy of calculus in India: an empirical investigation. Periódico Tchê Química 17(34):164–180. https://doi.org/10.52571/PTQ.v17.n34.2020.181_P34_pgs_164_180.pdf
12. Ramírez J, Soto D, López S, Akroyd J, Nurkowski D, Botero ML, Molina A (2020) A virtual laboratory to support chemical reaction engineering courses using real-life problems and industrial software. Educ Chem Eng 33:36–44
13. Alam A (2020) Possibilities and challenges of compounding artificial intelligence in India’s educational landscape. Int J Adv Sci Technol 29(5):5077–5094. http://sersc.org/journals/index.php/IJAST/article/view/13910
14. Tobarra L, Robles-Gomez A, Pastor R, Hernandez R, Duque A, Cano J (2020) Students’ acceptance and tracking of a new container-based virtual laboratory. Appl Sci 10(3):1091
15.
Alam A (2020) Test of knowledge of elementary vectors concepts (TKEVC) among first-semester bachelor of engineering and technology students. Periódico Tchê Química 17(35):477–494. https://doi.org/10.52571/PTQ.v17.n35.2020.41_ALAM_pgs_477_494.pdf
16. Alam A (2022) Social robots in education for long-term human-robot interaction: socially supportive behaviour of robotic tutor for creating robo-tangible learning environment in a guided discovery learning interaction. ECS Trans 107(1):12389
17. Hao C, Zheng A, Wang Y, Jiang B (2021) Experiment information system based on an online virtual laboratory. Fut Internet 13(2):27
18. Alam A (2021) Should robots replace teachers? Mobilisation of AI and learning analytics in education. In: 2021 International conference on advances in computing, communication, and control (ICAC3). IEEE, pp 1–12. https://doi.org/10.1109/ICAC353642.2021.9697300
19. Alam A (2023) Cloud-based e-learning: scaffolding the environment for adaptive e-learning ecosystem based on cloud computing infrastructure. In: Satapathy SC, Lin JCW, Wee LK, Bhateja V, Rajesh TM (eds) Computer communication, networking and IoT. Lecture notes in networks and systems, vol 459. Springer, Singapore
20. Alam A (2022) Cloud-based e-Learning: development of conceptual model for adaptive e-Learning ecosystem based on cloud computing infrastructure. In: Kumar A, Fister Jr I, Gupta PK, Debayle J, Zhang ZJ, Usman M (eds) Artificial intelligence and data science. ICAIDS 2021. Communications in computer and information science, vol 1673. Springer, Cham
21. Seifan M, Robertson N, Berenjian A (2020) Use of virtual learning to increase key laboratory skills and essential non-cognitive characteristics. Educ Chem Eng 33:66–75
22. Solak S, Yakut Ö, Dogru Bolat E (2020) Design and implementation of web-based virtual mobile robot laboratory for engineering education. Symmetry 12(6):906
The Effect of Facebook Social Media on Recent Jambi Regional Election—An Empirical Data Analysis

Dimas Subekti and Titin Purwaningsih
Abstract This research study performs a descriptive data analysis of the raw Facebook data of two candidates who contested the 2020 regional election to determine whether Facebook is directly related to election results. The NVivo 12 Plus tool is used to perform the data analysis. The research findings reveal that the political communication made on Facebook is not directly related to the election outcomes. The data analysis also discovered that one candidate dominates the other in terms of candidate vision and mission, leaders, candidate taglines, communication and engagement, and activities on Facebook. Finally, the analysis concluded that even though Facebook accounts played a major role in branding and recognizing the election candidates, they did not influence the election results. Keywords Political communication · Governor candidate · Regional election · Facebook · Election result
D. Subekti (B)
Department of Government Affairs and Administration, Universitas Muhammadiyah Yogyakarta, Yogyakarta, Indonesia
e-mail: [email protected]
T. Purwaningsih
Department of Government Affairs and Administration, Jusuf Kalla School of Government, Universitas Muhammadiyah Yogyakarta, Yogyakarta, Indonesia
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_2

1 Introduction

On December 9, 2020, the regional head elections were held. The election process took place concurrently in 279 areas, comprising nine provinces, 224 regencies, and 37 cities [1]. Jambi province was one of the places that conducted elections to elect the governor and deputy governor. Three candidate pairs participated in the regional head election in Jambi province: Cek Endra-Ratu Munawaroh with serial number one, Fahrori Umar-Syafril Nursal with serial number two, and Al Haris-Sani with serial number three [2]. According to the Jambi Province general
D. Subekti and T. Purwaningsih
election commission’s official result, the candidate pair for Governor and Deputy Governor of Jambi with number 3, Al Haris-Abdullah Sani, won with 38.1% of the vote. Meanwhile, the number one pair, Cek Endra-Ratu Munawaroh, scored 37.3%, 0.8% behind Haris-Sani, while pair number 2, Fachrori-Syafril, received 24.6% of the votes [3]. Cek Endra-Ratu Munawaroh and Al Haris-Sani received considerably more votes than Fachrori-Syafril. As a result, the primary objective of the proposed research work is to establish how Al Haris and Cek Endra used Facebook for political communication during the 2020 regional election, and furthermore to observe whether political communication on Facebook is directly related to election results. Jambi governor candidates Al Haris and Cek Endra earned nearly identical vote shares. As a result, the public should be aware of the two candidate pairs’ political communication on social media, notably Facebook. Since many people have profiles on different types of social media, Facebook appears to be the most popular site for social media campaigning [4]. Political parties and officials are increasingly engaging voters through social networking sites such as Facebook [5]. As a result, social media is becoming more prevalent in political campaigns. Furthermore, as Facebook usage expands among the general public in most modern democracies, it has become an essential tool for political social media campaigning [6]. Several previous investigations have been conducted on this research topic. The research findings in [7–9] focus on providing a methodological framework for the analysis of social media in a political context. Compared with traditional mass media, social media platforms function on a totally different logic, resulting in new ways of producing content, disseminating information, and utilizing media.
Research from [10–13] focuses on whether election candidates address the themes most important to the general public and to what degree their communication is influenced by Facebook and Twitter characteristics. Social media has the potential to influence power relationships among political parties, as it allows individual candidates to campaign more independently of the central party. Another focus of the existing research studies [14–16] is that social media is used by politicians for developing political discussion. Interpersonal informational trust and openness are connected with internet-based political activity and opinions. Hyperactive social media users play a vital role in political discussions. Politicians, as opinion leaders, create agendas for developing an alternate view of public opinion. The existing research works focus on evaluating the methodological analysis of social media in a political context, the discussion of topics by election candidates, and political conversations by politicians via social media. The novelty of this study is its focus on candidates’ political communication via social media, particularly Facebook. Facebook was selected as a source of news information since it is a well-established social media platform. Since it is considered a practical and time-saving platform, online news broadcast on Facebook is highly beneficial in addressing the community’s requirement to discover information from the news [17]. The aspects of political communication analyzed in this study are: disseminating political information about
The Effect of Facebook Social Media on Recent Jambi Regional …
the candidates on the social media platform Facebook, and the public opinion of Jambi in response to this information. Therefore, this research simultaneously analyzes the 2020 regional election results and whether the political communication on Facebook is directly proportional to the election results.
2 Literature Review

2.1 Use of Social Media in Elections

During election campaigns, social media has become a ubiquitous communication channel for candidates. Candidates can directly reach out to voters, rally supporters, and influence the public agenda using platforms such as Facebook and Twitter. As a result of these fundamental shifts in political communication, election candidates now have a wide variety of strategic options [10]. The growth of internet communities on social media sites has generated public responses to political advertisements, forcing political parties and actors to run digital election campaigns [16]. Social media plays an increasingly important role in the communication strategies of political campaigns by reflecting potential information about political actors’ policy preferences, opinions, and public followers [11]. People use social media platforms to participate in political processes and express themselves [18]. As represented by the 2008 presidential election campaigns in the United States, political actors can successfully use social media platforms such as social networking sites, microblogging services, and weblogs to broadcast information publicly to voters and to interact and discuss with them [19]. There is a statistically significant association between the size of online social networks, election voting, and election results in the New Zealand election; however, due to the small effect size, social media presence appears only weakly predictive in elections [5]. In Nigeria, social media is used by citizens to contribute to the election monitoring process, to campaign ahead of the election, and to educate citizens to communicate and monitor the election process [20]. India has the world’s third-largest Internet user base, with more than 243 million users, mostly adults.
This includes over 100 million people who use social media sites like Facebook, Twitter, and LinkedIn. In addition, political officials, candidates, and citizens have extensively utilized social media sites to find election-related information [17]. While social media helped to broaden the public sphere during the Australian federal elections in 2010, there is insufficient evidence that its use enhanced it qualitatively in terms of citizen listening and the range of topics covered [12]. The social media factors that influence election wins had a big effect for Joko Widodo and Basuki Tjahaya Purnama in the 2012 DKI Jakarta regional elections. Social media has become an effective tool for organizing citizens and mobilizing voters. In the era of political personalization after the New Order, mixed political marketing that combines social media, mass media, and traditional political
marketing can become an alternative strategy for candidates and political parties to win the elections [21].
2.2 Political Communication in Social Media

Social media presence has influenced how we communicate in various disciplines, including marketing communication, political communication, and learning-system communication [22]. Social media has also had an impact on politics, particularly in terms of political communication, and political institutions must actively participate in political communication via social media [23]. Social media has grown in popularity as a means of political communication in recent years. It facilitates direct communication between political institutions and voters; as a result, political activities could become more transparent, and people could become more involved in political decision-making [7]. Political communication behavior on social media is designed more creatively, and no single actor holds a monopoly on the message [24]. Political communication is employed practically everywhere in society, whether orally or in writing. Social media is the most often utilized channel for establishing communication through interactive writing: in text, images, or videos, users may exchange information, cooperate, and build rapport [25]. Social media users are not limited by social, economic, or political position. When it comes to conveying messages to audiences, social media and traditional mass media have very distinct characters. In a democratic state, social media supports political communication networks [26]. Social media is becoming increasingly significant in political communication. Several studies examined how individuals utilize social media for political discourse, to voice their opinions on politics and policy, and to mobilize and protest against social issues [14]. Since social media may create public opinion, it can affect and determine political conduct as a political communication channel.
The substance of political messaging is faster and simpler for the audience to comprehend through social media since the public may access information at any time and from any location [27]. Furthermore, social media platforms like Facebook and Twitter focus on the individual politician rather than the political party, extending the political spectrum and allowing for more individualized campaigning as well as the use of social media to portray a politician’s image [28].
3 Research Method

This is a qualitative study. It incorporates the Q-DAS (Qualitative Data Analysis Software) tool NVIVO 12 Plus to analyze the Facebook data. The NVIVO 12 Plus features used to analyze the data are chart analysis, cluster analysis, and word cloud analysis. The data source for this research is the Facebook accounts of the election contestants Al Haris and Cek Endra, who received a
0.8% difference in votes in the 2020 Jambi regional elections. The reason this study uses data sources only from Facebook is that the two governor candidates, Al Haris and Cek Endra, only have active Facebook accounts, while similar social media, such as Twitter, are not actively used by the contestants. The findings from Facebook are supported by top local and national web media such as https://www.metrojambi.com, https://www.jambiberita.com, https://www.jambiupdate.com, https://www.viva.co.id, https://www.kompas.com, and https://www.republika.co.id. The data were collected from the Facebook profiles of Al Haris and Cek Endra throughout 2020, since the regional elections were held in 2020, beginning with candidates seeking political party support, the determination of candidates by the General Election Commission (KPU), campaigning, and voting day. Data were taken from the social media accounts of Al Haris and Cek Endra, covering followers, following, political communication content, relationships between political communication content, and political communication narratives. The Al Haris Facebook account has 35,679 followers, whereas the Cek Endra Facebook account has 50,449 followers. The follower and following counts of the Al Haris and Cek Endra accounts show that both accounts are active.
4 Finding and Discussion

4.1 Political Communication Content

Political communication content on social media conveys messages to the public from political parties and political candidates [29]. In this research context, Al Haris and Cek Endra used their Facebook accounts to create political communication content in order to introduce themselves to the public. The chart analysis in Fig. 1 shows Al Haris and Cek Endra’s political communication content on Facebook, particularly when participating in the 2020 Jambi governor election. Figure 1 shows the share of political communication content related to the candidate’s vision and mission: Al Haris at 17.15% and Cek Endra at 79.93%. The candidate for Jambi governor number 1, Cek Endra, conveyed the vision, mission, and programs he would pursue if elected as Jambi governor in 2020. Cek Endra said Jambi had a very potent energy source and that electricity had to be self-sufficient. Cek Endra noted that he would provide 200 million (rupiah) for rural areas across Jambi to overcome the coronavirus’s short-term impact. Furthermore, Cek Endra also explained that Jambi’s local resources and wisdom had to be developed [30]. Meanwhile, in his vision and mission for the 2020 Jambi governor election, Al Haris said he would develop a new economic area. In addition, Al Haris would issue cards for MSME assistance and assistance for students through to university level, and would also work to further improve human resources [31].

Fig. 1 Political communication content on Facebook

Looking at Cek Endra’s and Al Haris’s vision and mission shows that they have different ideas. Cek Endra seems to be more focused on developing the potential of electrical energy resources, aiming to achieve self-sufficiency for the Jambi community. Meanwhile, Al Haris focuses on the community’s economy, human resources, infrastructure, and governance. For the content of political communication about leadership, Al Haris was at only 13.11%, while Cek Endra was at 86.17%. Al Haris and Cek Endra both discussed leadership, which is unsurprising because both had served as regents. For political communication content on the candidate tagline, Al Haris was at 27.41% and Cek Endra at 72.59%. Al Haris carries the tagline “Menuju Jambi Mantap” and the slogan “Dumisake”, which stands for “Dua Miliar Satu Kecamatan”, while Cek Endra has the tagline “Jambi Maju 2024” [32]. For political communication content about candidate communication and interaction, Al Haris was at only 10.94% while Cek Endra was at 88.03%. Al Haris adapted his campaign model because campaign methods involving mass gatherings were not allowed during the COVID-19 pandemic at the simultaneous regional head elections in 2020. So, Al Haris changed his campaign strategy by being active on social media to socialize his vision and mission and introduce himself. This was complemented by teams operating at lower levels in all villages to familiarize the public with Al Haris and Abdullah Sani [33]. Meanwhile, Cek Endra’s interaction in the 2020 Jambi governor election involved coming directly to the community and holding open campaign events. These activities were then documented for distribution through Facebook. Cek Endra was even reported to the Jambi Provincial Election Supervisory Agency for campaigning during the quiet period [34].
What Al Haris and Cek Endra did shows that both used Facebook to interact with the Jambi community.
Both candidates carried out activities directly in the community to promote themselves and then uploaded them to social media to spread the information so that the Jambi community would know about it. Finally, for the political communication content on Facebook related to candidate activities, Al Haris was at only 14.83% and Cek Endra at 76.96%. However, the General Election Supervisory Agency (Bawaslu) of Jambi Province noted that the governor and deputy governor candidates’ campaigns did not have permission from the Police or the Jambi Province COVID-19 Task Force. Campaign activities are governed by KPU Regulation Number 11 of 2020 concerning Amendments to KPU Regulation Number 4 of 2017 concerning Campaigns for the Election of Governors and Deputy Governors, Regents and Deputy Regents, and Mayors and Deputy Mayors; under Article 38 paragraph (1), a limited-meeting campaign must submit written notification to the local Indonesian National Police [35]. The content produced by the 2020 Jambi governor candidates Al Haris and Cek Endra on Facebook thus comprises the candidate’s vision and mission, leadership, candidate tagline, candidate activity, and candidate communication and interaction. These communication contents are related to one another, and Table 1 helps to understand the interrelated connectivity between them. The cluster analysis in Table 1 shows the coefficients of the relationship between the political communication content of Al Haris and Cek Endra. Table 1 shows that the political communication content about leaders has the highest connectivity with candidate activities. This is followed by the linkage between the candidates’ vision and mission and candidate communication and interaction. In third position is the relationship between the candidates’ vision and mission and candidate activities, followed by the relationship between the candidates’ vision and mission and the candidate tagline.
Fifth is the relationship between candidate communication and interaction and candidate activities. Sixth is the association of the leader content with candidate communication and interaction, followed by the candidate tagline with candidate communication and interaction. Eighth is the connectivity between the leader content and the candidate tagline, followed by the connectivity between the candidates’ vision and mission and the leader content. Finally comes the connectivity of the candidate tagline with candidate activities. Based on Table 1, the connectivity between leader content and candidate activities is the highest, with a coefficient of 0.66706, followed by the candidates’ vision and mission with candidate communication and interaction at 0.663651. In contrast, the lowest linkage connectivity, 0.386886, is between the candidate tagline and candidate activities. This shows that the vision and mission conveyed by a governor candidate are directly related to interaction, because the vision and mission respond to the interests of the community.
Table 1 Coefficients of the relationship between the political communication content of Al Haris and Cek Endra

Content A                               | Content B                               | Pearson correlation coefficient
Leader                                  | Candidate activity                      | 0.66706
Vision and mission of the candidate     | Candidate communication and interaction | 0.663651
Vision and mission of the candidate     | Candidate activity                      | 0.604629
Vision and mission of the candidate     | Candidate tagline                       | 0.595542
Candidate communication and interaction | Candidate activity                      | 0.589198
Leader                                  | Candidate communication and interaction | 0.557389
Candidate tagline                       | Candidate communication and interaction | 0.534803
Leader                                  | Candidate tagline                       | 0.476843
Vision and mission of the candidate     | Leader                                  | 0.474078
Candidate tagline                       | Candidate activity                      | 0.386886
The vision and mission conveyed by the candidates established interaction and communication with voters. This strategy helps make the candidate better known to the public in terms of ideas and personality. Based on the discussion, Facebook was used by the Jambi governor candidates for communicating their political proposals. Across all the political communication content created and distributed, Cek Endra is more dominant than Al Haris. This is important because the two are political opponents competing for the votes of the Jambi community.
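The Pearson coefficients in Table 1 come from NVIVO 12 Plus’s cluster analysis of coding similarity. As an illustration of the underlying statistic, the following is a minimal sketch of the Pearson correlation; the per-post coding counts below are invented for illustration and are not the study’s data:

```python
import math

def pearson(x, y):
    # Pearson correlation: covariance divided by the product of standard deviations
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-post coding counts for two content categories
leader = [3, 1, 4, 0, 2, 5]
activity = [2, 1, 5, 1, 2, 4]
print(round(pearson(leader, activity), 3))
```

A coefficient near 1 indicates that posts coded heavily for one category tend to be coded heavily for the other, which is how the connectivity in Table 1 should be read.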
4.2 Political Communication Narrative

Political communication has traditionally relied on visual symbols and stories. Their significance is growing as visual television media and social media have emerged as the primary sources of political information. Politicians recognize the significance of visuals and stories in the construction of political images [36]. Figure 2 shows that the intensity of political contestation is inextricably linked to political communication. Across the nomination, campaign, confirmation of choice, voting, conflict, and reconciliation stages, numerous narratives feed the varied political dynamics.
Fig. 2 Political communication narrative on Facebook
Politics is more than simply lobbying and sharing seats, and there are different ways to create a positive narrative. The views of various political elites are exchanged, with the primary intention of gaining attention [37]. Al Haris and Cek Endra built such a political communication narrative on Facebook in their candidacies for the Jambi governor election, aiming to get the attention of the Jambi people, and used Facebook to campaign in the 2020 Jambi governor election. Figure 2 shows the narrative of Al Haris and Cek Endra’s political communication on Facebook and helps us understand how they built it. The narratives discussed on the Facebook accounts of Al Haris and Cek Endra include “Jambi”, “mantap (steady)”, “masyarakat (public)”, “endra”, “haris”, #cekendraforjambi, #jambimaju (#jambiforward), “pembangunan (development)”, #jambisejahtera (#prosperousjambi), and “bupati (regent)”, among other words. Figure 2 shows that Al Haris and Cek Endra consistently talk about Jambi Province with the word “Jambi.” They also delivered their respective taglines through the phrase “mantap,” #jambisejahtera, and #jambimaju, and introduced themselves to the public by constructing the narratives “haris,” #cekendra, and “endra.” Al Haris and Cek Endra talked about leadership with the words “pemimpin” and “gubernur”. Facing the 2020 Jambi governor election, Cek Endra created the narratives #cekendraforjambi and #cekendra2020. Al Haris and Cek Endra also talked about “merangin” and “sarolangun” as an effort to
win over their respective voter strongholds. The public, as voters, also did not escape the attention of Al Haris and Cek Endra. The two candidates also discussed COVID-19 with the words “corona” and “sehat (healthy)” as part of an effort to campaign for the implementation of health protocols during the 2020 Jambi governor election. The governor and deputy governor candidate pairs in Jambi declared a healthy regional head election during the pandemic, meaning that health protocols were to be maintained throughout the 2020 provincial election stages. Apart from declaring a healthy election, the Jambi candidate pairs also symbolically distributed masks to community representatives such as the bicycle community, community organizations, and others, as an effort to remind the public to adhere to health protocols [38].
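The word cloud analysis used above is, at its core, a word-frequency count over the collected posts. The sketch below illustrates the idea; the sample post texts and resulting counts are hypothetical, not drawn from the candidates’ actual accounts:

```python
import re
from collections import Counter

# Invented sample posts; real input would be the scraped Facebook post texts
posts = [
    "Jambi mantap bersama masyarakat #jambimaju",
    "Pembangunan untuk masyarakat Jambi #cekendraforjambi",
    "Jambi sejahtera, masyarakat sehat #jambisejahtera",
]

tokens = []
for post in posts:
    # lowercase and tokenize, keeping hashtags intact
    tokens += re.findall(r"#\w+|\w+", post.lower())

# the most frequent terms correspond to the largest words in a word cloud
print(Counter(tokens).most_common(3))
```

In NVIVO this counting (plus stemming and stop-word handling) is done by the tool; the principle is the same.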
4.3 Political Communication on Facebook and the Election Result

Based on the real count results of the Jambi Province General Election Commission (KPU), the pair number 3 candidates for Governor and Deputy Governor of Jambi, Al Haris-Abdullah Sani, won the vote in the 2020 Jambi governor election. Haris-Sani received 597,518 votes, or 38.1%, while pair number 1, Cek Endra-Ratu Munawaroh, received 585,400 votes, or 37.3%, a difference of 12,118 votes from Haris-Sani. Meanwhile, pair number 2, Fachrori-Syafril, received 385,312 votes, or 24.6%. The real count results showed that the Haris-Sani pair excelled in three areas, namely Jambi City, Muarojambi Regency, and Merangin Regency, whereas the Cek Endra-Ratu pair excelled in five regencies: Sarolangun Regency, West Tanjung Jabung Regency, East Tanjung Jabung Regency, Tebo Regency, and Batanghari Regency [39]. The political communication content of the 2020 Jambi governor candidates Al Haris and Cek Endra on Facebook comprises the candidate’s vision and mission, leadership, candidate tagline, candidate activity, and candidate communication and interaction. Cek Endra is very dominant in all political communication content on Facebook compared to Al Haris. However, this is not directly proportional to the 2020 Jambi governor election results, in which Al Haris outperformed Cek Endra. Real count data from the Jambi General Election Commission shows that Al Haris actually won fewer regions than Cek Endra. However, what gave Al Haris the upper hand is that the areas he won have large populations, so Al Haris won more votes than Cek Endra. Furthermore, through political communication on Facebook, Al Haris and Cek Endra made an effort to reach the larger Jambi community. The motivation for adopting Facebook for political communication is particularly significant as it is widely used in Jambi society; people in Jambi rely on Facebook for information.
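As a quick consistency check, the margin and vote shares reported above can be reproduced from the KPU vote totals [39]:

```python
# Vote totals from the Jambi KPU real count cited in the text [39]
votes = {
    "Haris-Sani": 597_518,
    "Endra-Munawaroh": 585_400,
    "Fachrori-Syafril": 385_312,
}
total = sum(votes.values())

# margin between the two leading pairs
margin = votes["Haris-Sani"] - votes["Endra-Munawaroh"]
print(margin)  # 12118 votes, as reported

# vote shares, matching the reported 38.1%, 37.3%, and 24.6%
shares = {pair: round(100 * v / total, 1) for pair, v in votes.items()}
print(shares)
```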
When this is connected to the digital divide, it is observed that Jambi Province has a relatively low digital divide. Based on the facts provided in the research, it may even be classified among places with
moderate levels of digital divide [40]. According to this index, the provinces of Papua, East Nusa Tenggara, and Central Sulawesi have the highest digital divide index values in Indonesia, while Jambi Province stands in 18th position out of 34 provinces. Furthermore, the Jambi Idea Institution, a local survey institute, reported in 2018 that Facebook usage in Jambi is significantly higher than that of other social media platforms; more than half of the residents of Jambi, 54.9%, use Facebook. This percentage is expected to rise in line with technological advancement and the ongoing era of globalization [41]. This demonstrates that the digital divide in Jambi province is not severe. As a consequence, Cek Endra’s dominant political communication on Facebook is not directly proportionate to the election outcome, and this cannot be attributed to Jambi province’s digital divide.
5 Conclusion

This research study concludes that the political communication conducted on Facebook is not directly proportional to the election results. One candidate, Cek Endra, was more dominant on Facebook in the political communication content of the candidates’ vision and mission, leaders, candidate taglines, candidate communication and interaction, and candidate activities; yet, according to the 2020 Jambi governor election results, Al Haris was superior to Cek Endra. Nor is this outcome explained by the digital divide, as Jambi province does not experience a high digital divide. Therefore, the Facebook accounts used for political communication by the Jambi governor candidates did not significantly impact the election results, even though the two candidates focused on their political and personal branding in the narratives on their respective Facebook accounts, and the leaders’ political communication content had the highest affinity with candidate activities. The limitation of this research study is that the data collection employed just one social media platform, Facebook, and the study did not explain in detail the variables that contributed to Al Haris’s success in the 2020 Jambi governor election. As a result, data sources from two social media platforms, Facebook and Twitter, are recommended for future investigation, to ensure that the data obtained are as complete as possible. Additional research could also be conducted to understand the factors that contributed to Al Haris’s success in the 2020 Jambi governor election.
References

1. Aida NR (2020) Berikut Daftar 270 Daerah yang Gelar Pilkada Serentak 9 Desember 2020. Kompas.com. https://www.kompas.com/tren/read/2020/12/05/193100165/berikut-daftar-270daerah-yang-gelar-pilkada-serentak-9-desember-2020?page=all. Accessed 20 Mar 2021
2. Almunanda F (2020) Diikuti 3 Paslon, Ini Nomor Urut Cagub-Cawagub di Pilgub Jambi. Detiknews.com. https://news.detik.com/berita/d-5186718/diikuti-3-paslon-ini-nomorurut-cagub-cawagub-di-pilgub-jambi. Accessed 20 Mar 2021
3. Jambi.kpu.go.id (2020) Hasil Penghitungan Aplikasi Sirekap KPU Provinsi Jambi Sudah 100 Persen. https://jambi.kpu.go.id/berita/detail/304/hasil-penghaitungan-aplikasi-sirekapkpu-provinsi-jambi-sudah-100-persen/. Accessed 03 Jun 2021
4. Davis J (2017) Presidential campaigns and social networks: how Clinton and Trump used Facebook and Twitter during the 2016 election. Dominic Sch
5. Cameron MP, Barrett P, Stewardson B (2016) Can social media predict election results? Evidence from New Zealand. J Polit Mark 15(4):416–432
6. Ross K, Fountaine S, Comrie M (2020) Facebooking a different campaign beat: party leaders, the press and public engagement. Media Cult Soc 42(7–8):1260–1276
7. Stieglitz S, Dang-Xuan L (2013) Social media and political communication: a social media analytics framework. Soc Netw Anal Min 3(4):1277–1291
8. Stieglitz S, Brockmann T, Xuan LD (2012) Usage of social media for political communication. In: Proceedings of Pacific Asia Conference on Information Systems, PACIS 2012
9. Klinger U, Svensson J (2015) The emergence of network media logic in political communication: a theoretical approach. New Media Soc 17(8):1241–1257
10. Stier S, Bleier A, Lietz H, Strohmaier M (2018) Election campaigning on social media: politicians, audiences, and the mediation of political communication on Facebook and Twitter. Polit Commun 35(1):50–74
11. Nulty P, Theocharis Y, Popa SA, Parnet O, Benoit K (2016) Social media and political communication in the 2014 elections to the European Parliament. Elect Stud 44:429–444
12. MacNamara J, Kenning G (2011) E-electioneering 2010: trends in social media use in Australian political communication. Media Int Aust 139(139):7–22
13. Karlsen R, Enjolras B (2016) Styles of social media campaigning and influence in a hybrid political communication system: linking candidate survey data with Twitter data. Int J Press 21(3):338–357
14. Yang X, Chen BC, Maity M, Ferrara E (2016) Social politics: agenda setting and political communication on social media. In: Lecture Notes in Computer Science, vol 10046 LNCS, pp 330–344
15. Himelboim I, Lariscy RW, Tinkham SF, Sweetser KD (2012) Social media and online political communication: the role of interpersonal informational trust and openness. J Broadcast Electron Media 56(1):92–115
16. Papakyriakopoulos O, Serrano JCM, Hegelich S (2020) Political communication on social media: a tale of hyperactive users and bias in recommender systems. Online Soc Netw Media 15:100058
17. Narasimhamurthy N (2014) Use and rise of social media as election campaign medium in India. Int J Interdiscip Multidiscip Stud 1(8):202–209
18. Kreiss D, Mcgregor SC (2018) Technology firms shape political communication: the work of Microsoft, Facebook, Twitter, and Google with campaigns during the 2016 U.S. presidential cycle. Polit Commun 35(2):155–177
19. Stieglitz S, Brockmann T, Xuan LD (2012) Usage of social media for political communication. In: Proceedings of Pacific Asia Conference on Information Systems, PACIS 2012
20. Bartlett J, Krasodomski-Jones A, Daniel N, Fisher A (2015) Social media for election communication and monitoring in Nigeria. DEMOS Dep Int Dev 64
21. Utomo WP (2013) Menimbang media sosial dalam marketing politik di Indonesia: Belajar dari Jokowi-Ahok di Pilkada DKI Jakarta 2012. J Ilmu Sos dan Ilmu Polit 17(1):67–84
22. Setiadi A (2015) Pemanfaatan media sosial untuk efektifitas komunikasi. J Hum 16(2):1–7
23. Anshari F (2013) Komunikasi politik di era media sosial. J Komun 8(1):91–101
24. Zamri (2017) Komunikasi politik era medsos. J Chem Inf Model 53(9):1689–1699
25. Eliya I, Zulaeha I (2017) Pola komunikasi politik Ganjar Pranowo dalam perspektif sosiolinguistik di media sosial Instagram. Seloka J Pendidik Bhs dan Sastra Indones 6(3):286–296
26. Susanto EH (2017) Media sosial sebagai pendukung jaringan komunikasi politik. J ASPIKOM 3(3):379
27. Siagian HF (2015) Pengaruh dan efektivitas penggunaan media sosial sebagai saluran komunikasi politik dalam membentuk opini publik. J Al-Khitabah 2(1):17–26
28. Enli GS, Skogerbø E (2013) Personalized campaigns in party-centred politics: Twitter and Facebook as arenas for political communication. Inf Commun Soc 16(5):757–774
29. Benoit WL (2014) Content analysis in political communication. In: Sourcebook for political communication research: methods, measures, and analytical techniques, pp 268–279
30. Sahrial (2020) Sampai Visi dan Misi, CE: Jambi Punya Sumber Energi Sangat Potensial. Metrojambi.com. https://metrojambi.com/read/2020/10/24/58123/sampai-visi-dan-misice-jambi-punya-sumber-energi-sangat-potensial. Accessed 04 Mar 2021
31. Jambiberita.com (2020) Ini Paparan Visi Misi Cek Endra, Fachrori dan Al Haris di Pilgub Jambi. https://jamberita.com/read/2020/10/24/5963113/ini-paparan-visi-misi-cekendra-fachrori-dan-al-haris-di--pilgub-jambi/. Accessed 06 Jun 2021
32. Abdullah S (2020) Calon Gubernur Jambi beradu ‘tagline’ tarik simpati warga. Antaranews.com. https://www.antaranews.com/berita/1335086/calon-gubernur-jambi-beradu-tagline-tarik-simpati-warga. Accessed 04 Mar 2021
33. Fayzal M (2020) Taktik Al Haris Kampanye di Tengah Pandemi. Gatra.com. https://www.gatra.com/detail/news/482341/politik/taktik-al-haris-kampanye-di-tengah-pandemi. Accessed 03 Jun 2021
Almunanda F (2020) Diduga Kampanye di Masa Tenang, Cek Endra-Ibu Tiri Zola Diadukan ke Bawaslu. Detiknews.com. https://news.detik.com/berita/d-5285741/diduga-kampanye-dimasa-tenang-cek-endra-ibu-tiri-zola-diadukan-ke-bawaslu. Accessed 03 Jun 2021 35. Nurdin S (2020) Prihatin, Kampanye Cagub-Cawagub Jambi Tak Miliki Izin. Viva.co.id. https://www.viva.co.id/pilkada/pilgub/1310791-prihatin-kampanye-cagub-cawagub-jambitak-miliki-izin. Accessed 04 Mar 2021 36. Schill D (2012) The visual image and the political image: a review of visual communication research in the field of political communication. Rev Commun 12(2):118–142 37. Lestari EA (2020) Narasi Komunikasi Politik Indonesia. Republika.co.id. https://republika.co. id/berita/q9afuc469/narasi-komunikasi-politik-indonesia. Accessed 03 Jun 2021 38. Almunanda F (2020) Deklarasi Pilkada Sehat, 3 Paslon Pilgub Jambi Sebar Masker Bareng TNI-Polri. Detiknews.com. https://news.detik.com/berita/d-5168130/deklarasi-pilkada-sehat3-paslon-pilgub-jambi-sebar-masker-bareng-tni-polri. Accessed 04 Mar 2021 39. Jambi.kpu.go.id (2020) Hasil Penghitungan Aplikasi Sirekap KPU Provinsi Jambi Sudah 100 Persen. https://jambi.kpu.go.id/berita/detail/304/hasil-penghitungan-aplikasi-sirekap-kpu-pro vinsi-jambi-sudah-100-persen/#:~:text=Jambi%2Cjambi.kpu.go.id.&text=Berdasarkanhasi lRealCountKPU,suaraatau38%2C1persen. Accessed 13 Mar 2021 40. Ariyanti S (2015) Studi pengukuran digital divide di Indonesia. Bul Pos dan Telekomun 11(4):281 41. Jambiupdate.co (2018) Mau Tahu Jumlah Pengguna Facebook di Jambi, Baca Di Sini. https://www.jambiupdate.co/artikel-mau-tahu-jumlah-pengguna-facebook-di-kotajambi-baca-di-sini.html. Accessed 03 Jun 2021
Estimating the Intervals Between Mount Etna Eruptions

Kshitij Dhawan
Abstract Mount Etna is infamous for being among the world's most active volcanoes. It is the highest volcano in Europe, and its frequent eruptions have garnered the attention of many researchers. The growing interest in estimating the intervals between eruptions has led to this study. In this research, we attempt to find the most effective model for predicting Mount Etna's log interevent times. We propose two models for the log interevent times and perform a comparative analysis to determine the most appropriate predictive model. Keywords Machine learning · Mount Etna · Volcanic eruptions · Log interevent times
1 Introduction

Interevent times measure the number of days between eruptions of Mount Etna. Across four hundred years, 62 events took place. Some events occurred as little as 40 days apart, while several happened over 10 (and in one case 53) years after the previous eruption. This right-skewness leads us to take the logarithm of the interevent times (Fig. 1). The histogram of the data seems to suggest normality. However, we still expect long tails, leading us to consider two models: (1) a student-t with random effects for the location, and (2) a normal distribution. We present the details of the two models in Sect. 2, along with the sampling scheme used to obtain posterior draws and the methods for validating and comparing the two models. In Sect. 3, we report an analysis of the models [1–3]. We conclude with a discussion in Sect. 4.
K. Dhawan (B) Vellore Institute of Technology, Vellore, Tamil Nadu, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_3
Fig. 1 Histogram for the n = 62 log interevent times (density versus log interevent time)
2 Methods

For both models, define Y_i = log T_i, where T_i is the interevent time for the ith observation.
2.1 Model 1: Student-t (M1)

In order to better model the long tails we expect in the data, we assume a student-t likelihood in which each of the n = 62 observations receives its own random effect:

y_i | μ_i, σ² ~ t_ν(μ_i, σ²), i = 1, …, n
μ_i | μ, τ² ~ N(μ, τ²), i = 1, …, n.

We fix ν = 5. After choosing priors for μ, σ², and τ², we could sample from the posterior using Metropolis updates. Alternatively, we can use the representation of the t distribution as a scale mixture of normals, which yields closed-form full conditionals. That is,

f(y_i | μ_i, σ²) = ∫₀^∞ N(y_i | μ_i, σ²/λ_i) Ga(λ_i | ν/2, ν/2) dλ_i.

This implies the following model.
Estimating the Intervals Between Mount Etna Eruptions
37
y_i | μ_i, σ², λ_i ~ N(μ_i, σ²/λ_i), λ_i ~ Ga(ν/2, ν/2), μ_i | μ, τ² ~ N(μ, τ²).

Choosing a normal prior for μ and inverse-gamma priors for σ² and τ² completes the model formulation:

μ ~ N(m, s²), σ² ~ IG(a_σ, b_σ), τ² ~ IG(a_τ, b_τ).

It is straightforward to obtain the full conditionals for each parameter. They are given by

λ_i | · ~ Ga( (ν + 1)/2, ν/2 + (y_i − μ_i)²/(2σ²) )
μ_i | · ~ N( (τ² y_i λ_i + σ² μ)/(τ² λ_i + σ²), (τ² σ²)/(τ² λ_i + σ²) )
μ | · ~ N( (s² Σᵢ μ_i + τ² m)/(n s² + τ²), (s² τ²)/(n s² + τ²) )
σ² | · ~ IG( a_σ + n/2, b_σ + ½ Σᵢ λ_i (y_i − μ_i)² )
τ² | · ~ IG( a_τ + n/2, b_τ + ½ Σᵢ (μ_i − μ)² )

We assume prior independence among all of the parameters. This may not be realistic, particularly for the joint behavior of the mean and variance parameters, but it leaves us with a posterior that is easy to work with, and we have not consulted expert opinion that would identify any prior dependence. Consequently, we choose somewhat non-informative values for m, s², a_σ, b_σ, a_τ, and b_τ. We choose m = 6, which corresponds to a prior average interevent time of exp(6) ≈ 403 days, roughly one year. s² = 10 is very non-informative, especially on the log scale. We let a_σ = a_τ = 3 and b_σ = b_τ = 4, which yield prior means of 2 and variances of 4 for both σ² and τ².
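A minimal NumPy sketch of a Gibbs sampler cycling through these full conditionals follows. The data here are simulated stand-ins for the 62 log interevent times (the actual observations are not reproduced in this chapter); the hyperparameter values are those chosen above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for the n = 62 log interevent times.
y = rng.normal(6.5, 1.2, size=62)
n = len(y)

# Hyperparameters chosen in the text: m = 6, s^2 = 10, a = 3, b = 4, nu = 5.
m, s2 = 6.0, 10.0
a_sig, b_sig, a_tau, b_tau = 3.0, 4.0, 3.0, 4.0
nu = 5.0

def gibbs_m1(y, iters=2000):
    """Gibbs sampler for the student-t random-effects model (M1)."""
    mu_i = np.full(n, y.mean())
    mu, sig2, tau2 = y.mean(), 1.0, 1.0
    draws = np.empty((iters, 3))
    for t in range(iters):
        # lambda_i | . ~ Ga((nu+1)/2, nu/2 + (y_i - mu_i)^2 / (2 sigma^2));
        # NumPy's gamma takes (shape, scale), so scale = 1 / rate.
        lam = rng.gamma((nu + 1) / 2,
                        1.0 / (nu / 2 + (y - mu_i) ** 2 / (2 * sig2)))
        # mu_i | . ~ N with precision-weighted mean and variance.
        den = tau2 * lam + sig2
        mu_i = rng.normal((tau2 * y * lam + sig2 * mu) / den,
                          np.sqrt(tau2 * sig2 / den))
        # mu | . ~ N((s^2 sum(mu_i) + tau^2 m)/(n s^2 + tau^2), s^2 tau^2 / (n s^2 + tau^2))
        d = n * s2 + tau2
        mu = rng.normal((s2 * mu_i.sum() + tau2 * m) / d, np.sqrt(s2 * tau2 / d))
        # Inverse-gamma draws via reciprocals of gamma draws.
        sig2 = 1.0 / rng.gamma(a_sig + n / 2,
                               1.0 / (b_sig + 0.5 * np.sum(lam * (y - mu_i) ** 2)))
        tau2 = 1.0 / rng.gamma(a_tau + n / 2,
                               1.0 / (b_tau + 0.5 * np.sum((mu_i - mu) ** 2)))
        draws[t] = (mu, sig2, tau2)
    return draws

draws = gibbs_m1(y)
```

Each update draws from the closed-form conditional given the current values of the other parameters, exactly as listed above.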
2.2 Model 2: Normal (M2)

A much simpler model is also considered:

y_i | μ, σ² ~ N(μ, σ²), i = 1, …, n
μ ~ N(m, s²), σ² ~ IG(a_σ, b_σ).

We use the same values for m, s², a_σ, and b_σ as above. The full conditionals are given by

μ | · ~ N( (s² Σᵢ y_i + σ² m)/(n s² + σ²), (s² σ²)/(n s² + σ²) )
σ² | · ~ IG( a_σ + n/2, b_σ + ½ Σᵢ (y_i − μ)² )
2.3 Model Validation and Comparison

Validation. We use two measures of model fit. The first is the χ² test from Johnson (2004), "A Bayesian χ² Test for Goodness-of-Fit." The test is comparable to classical χ² goodness-of-fit tests. With this test, we calculate a distribution of p-values, where high values favor a good model fit. See Johnson (2004) for details. The second measure is based on leave-one-out cross-validation. For each model j, we obtain B posterior draws while leaving out one observation, y_i. Posterior predictions y*_(j,b) are then obtained and we compute

p_(j,i) = (1/B) Σ_{b=1}^B 1( y*_(j,b) ≤ y_i ), j = 1, 2.

Distribution theory requires that the p_(j,·) be uniformly distributed for the model to be appropriate. The Kolmogorov–Smirnov test is used to determine whether the p_(j,·) are uniform. Graphically, a kernel density plot of the posterior predictive distribution against the data can provide a quick assessment of model fit.

Comparison. To compare models 1 and 2, we use three measures. The deviance information criterion (DIC) is defined as
DIC = D̄(θ) + var(D(θ))/2,

where D(θ) = −2 log p(y | θ), D̄(θ) is the average of D(θ) over all posterior samples θ, and var(D(θ)) is the sample variance of D(θ). DIC measures goodness-of-fit while penalizing model complexity; a lower DIC is preferred. The second measure is the posterior predictive loss (PPL), which is similar in nature to DIC. It is computed by

PPL = G + P = Σ_{i=1}^n (y_i − μ*_i)² + Σ_{i=1}^n σ*_i,

where μ*_i and σ*_i are the mean and variance of the posterior prediction for observation i. The third and final measure we use is the Bayes factor. The Bayes factor in favor of model 1 is given by

B₁₂ = m₁(y)/m₂(y) = ∫ p₁(y | θ) g₁(θ) dθ / ∫ p₂(y | θ) g₂(θ) dθ.

Large values of B₁₂ give evidence in favor of model 1. We do not evaluate either integral. Instead, we estimate m_j(y) using Monte Carlo integration. By sampling N times from the prior for model j, we can obtain the approximation

m_j(y) = E_{g_j}[ p_j(y | θ) ] ≈ (1/N) Σ_{i=1}^N p_j(y | θ_i), j = 1, 2.

An immediate drawback of this approach is the instability of the estimate. We have to choose N very large to get a somewhat believable Bayes factor, and even then we are not as confident in this measure as in the first two.
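A sketch of these three measures for the normal model (M2) follows. The posterior draws and data here are illustrative stand-ins for Gibbs-sampler output; the prior values (m = 6, s² = 10, IG(3, 4)) are those from Sect. 2.1.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(6.5, 1.2, size=62)   # stand-in for the log interevent times
n = len(y)

# Illustrative posterior draws of (mu, sigma^2) for model 2; in practice
# these come from the Gibbs sampler of Sect. 2.2.
B = 5000
mu_d = rng.normal(y.mean(), 0.15, size=B)
s2_d = np.abs(rng.normal(y.var(), 0.2, size=B)) + 0.1

# Deviance D(theta) = -2 log p(y | theta) for each posterior draw.
resid2 = (y[None, :] - mu_d[:, None]) ** 2
D = n * np.log(2 * np.pi * s2_d) + resid2.sum(axis=1) / s2_d
dic = D.mean() + D.var() / 2                # DIC = mean deviance + var/2

# PPL = G + P: squared-error fit term plus predictive-variance penalty.
yrep = rng.normal(mu_d[:, None], np.sqrt(s2_d)[:, None], size=(B, n))
G = np.sum((y - yrep.mean(axis=0)) ** 2)
P = np.sum(yrep.var(axis=0))
ppl = G + P

# Monte Carlo estimate of log m_2(y): average the likelihood over prior
# draws (log-sum-exp for numerical stability). As noted in the text, this
# estimator is unstable unless N is very large.
N = 20000
mu_p = rng.normal(6.0, np.sqrt(10.0), size=N)    # mu ~ N(6, 10)
s2_p = 1.0 / rng.gamma(3.0, 1.0 / 4.0, size=N)   # sigma^2 ~ IG(3, 4)
ll = -0.5 * (n * np.log(2 * np.pi * s2_p)
             + ((y[None, :] - mu_p[:, None]) ** 2).sum(axis=1) / s2_p)
log_m2_hat = np.logaddexp.reduce(ll) - np.log(N)
```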
3 Results

Posterior draws for models 1 and 2 are obtained by iteratively sampling from the full conditionals given in Sects. 2.1 and 2.2. We obtained B = 20,000 draws and observed no issues with the sampler that would raise concerns about convergence. Some of the posterior distributions are presented in Fig. 2. In each figure, red corresponds to model 1 while blue is reserved for model 2. Since model 1 is hierarchical, there are several ways to obtain posterior predictions. We could make predictions for observation i given the posterior draws for μ_i, σ², and λ_i. Figure 3 shows a plot of the observed versus the fitted values for each observation. Note the trends in the figure. The mean predictions (the dots) are
Fig. 2 Posterior distributions from each model. Left column: posteriors for μ, σ², and τ² in model 1. Right column: posteriors for μ and σ² in model 2. The shaded regions are the HPD regions
systematically off from the observed values. For this reason, the validity of model 1 is suspect, despite the measures used for validation as discussed in Sect. 2.3.
Fig. 3 Plot of the observed versus fitted values for individual observations in model 1 [9]. The dots are the predictive means and the vertical bars are equal-tailed 95% probability bounds. The black line is the line y = x
Fig. 4 Posterior predictive distributions for models 1 (red) and 2 (blue). The histogram is of the data. Predictions for model 1 are based on future observations
If we wish to make a prediction from model 1 for a future observable (one for which we do not know the random effect), we must first draw a new μ₀ from N(μ, τ²) using the posterior samples for μ and τ². We can then either (1) sample a new λ₀ from Ga(ν/2, ν/2) and then sample y₀ from N(μ₀, σ²/λ₀), or (2) directly sample y₀ from t_ν(μ₀, σ²). The two approaches are equivalent, and both are very different from predictions made for a particular observation, for which the random effect μ_i is known. The posterior predictive distribution of a future observation under each model is presented in Fig. 4.

As mentioned in Sect. 2.3, the measures of model validation we use are based on posterior predictions. For model 1, we make the calculations based on future observables, not on a particular observation. For the χ² test, we use all data points when obtaining posterior samples. The B p-values are computed for both models and their distributions are given in Fig. 5. In both, the probability that the p-value is greater than 0.05 is at least 0.95, providing evidence that the models are appropriate for the data [4].

Fig. 5 Bayesian goodness-of-fit distribution for p-values

The second measure is based on leave-one-out cross-validation. The resulting distributions of probabilities (i.e., the proportion of posterior predictions less than the left-out observation) should be uniformly distributed (see Fig. 6). A Kolmogorov–Smirnov test performed on these probabilities yields large p-values for both models (0.70 for model 1 and 0.80 for model 2). So again, we confirm that the models provide adequate fits to the data [5].

Fig. 6 Distributions of the posterior CDF evaluated at each observation

We are now left to question which model is preferred. Based on the evidence so far, both models seem to perform equally well, so we would favor the simpler model 2. We can be more certain in our decision by examining the three measures discussed in Sect. 2.3. The first two are presented in Table 1. For the posterior predictive loss (PPL) criterion, we decompose the value into the goodness-of-fit term (G) and the penalty (P) to see how the models differ [6]. In both instances (DIC and PPL), the simpler model 2 outperforms the hierarchical model 1 [7]. The Bayes factor is estimated using Monte Carlo integration. With one million samples from the prior distribution, we estimate B₁₂ ≈ 0.623. Since B₁₂ < 1, this is evidence in favor of model 2, though it is not substantial [2, 3, 8, 9]. All three measures taken together suggest that model 2 is the better choice: it performs just as well as the more complicated model and does so with fewer parameters [10].
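The two equivalent sampling routes for a future observable from model 1 can be sketched as follows; the posterior draws of (μ, τ², σ²) below are illustrative stand-ins for Gibbs output.

```python
import numpy as np

rng = np.random.default_rng(3)
nu = 5.0
B = 10000

# Stand-in posterior draws of (mu, tau^2, sigma^2) from model 1's sampler.
mu_d = rng.normal(6.8, 0.2, size=B)
tau2_d = np.abs(rng.normal(1.0, 0.1, size=B))
sig2_d = np.abs(rng.normal(0.5, 0.05, size=B))

# Step 1: draw a new random effect mu_0 ~ N(mu, tau^2) for each posterior draw.
mu0 = rng.normal(mu_d, np.sqrt(tau2_d))

# Route (1): lambda_0 ~ Ga(nu/2, nu/2), then y_0 ~ N(mu_0, sigma^2 / lambda_0).
lam0 = rng.gamma(nu / 2, 2.0 / nu, size=B)
y0 = rng.normal(mu0, np.sqrt(sig2_d / lam0))

# Route (2): y_0 ~ t_nu(mu_0, sigma^2) directly, via a standard t draw.
y0_alt = mu0 + np.sqrt(sig2_d) * rng.standard_t(nu, size=B)
```

Both routes target the same posterior predictive distribution, so their draws agree in distribution (e.g., their sample means coincide up to Monte Carlo error).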
Table 1 Model comparison quantities

            DIC      G       P       PPL (= G + P)
Model 1     311.8    94.7    134.0   228.7
Model 2     205.9    94.6    99.3    193.9
4 Conclusion

In this paper, we considered two models for the log interevent times of Mount Etna eruptions. A hierarchical student-t model was intended to correctly capture any long tails in the data, but we have seen that a simple, two-parameter normal model is adequate. Both graphical and numerical measures were used to assess whether the models fit the data appropriately. These are not absolutely decisive, but we found no evidence against the validity of either model. A comparison was made to see which model performed better. Though they performed about the same, the hierarchical model was penalized far more under DIC and PPL. All the evidence we have examined suggests that model 2 is the better candidate for predicting log interevent times.
References

1. Freret-Lorgeril V, Bonadonna C, Corradini S, Guerrieri L, Lemus J, Donnadieu F, Scollo S, Gurioli L, Rossi E (2022) Tephra characterization and multi-disciplinary determination of eruptive source parameters of a weak paroxysm at Mount Etna (Italy). J Volcanol Geotherm Res 421:107431
2. Platania M, Sharpley RAJ, Rizzo M, Ruggieri G (2022) The contingent equilibrium during imbalanced volcano tourism demand through fee estimation: an empirical analysis of tourism in Mount Etna. J Environ Manage 316:115235
3. Rogic N, Bilotta G, Ganci G, Thompson JO, Cappello A, Rymer H, Ramsey MS, Ferrucci F (2022) The impact of dynamic emissivity–temperature trends on spaceborne data: applications to the 2001 Mount Etna eruption. Remote Sens 14(7):1641
4. Freret-Lorgeril V, Bonadonna C, Corradini S, Donnadieu F, Guerrieri L, Lacanna G, Marzano FS, Mereu L, Merucci L, Ripepe M et al (2021) Examples of multi-sensor determination of eruptive source parameters of explosive events at Mount Etna. Remote Sens 13(11):2097
5. Cannavò F, Sciotto M, Cannata A, Di Grazia G (2019) An integrated geophysical approach to track magma intrusion: the 2018 Christmas Eve eruption at Mount Etna. Geophys Res Lett 46(14):8009–8017
6. De Novellis V, Atzori S, De Luca C, Manzo M, Valerio E, Bonano M, Cardaci C, Castaldo R, Di Bucci D, Manunta M et al (2019) DInSAR analysis and analytical modeling of Mount Etna displacements: the December 2018 volcanotectonic crisis. Geophys Res Lett 46(11):5817–5827
7. Shajahan R, Zanella E, Mana S, Harris A, de Vries BVW (2022) Anisotropy of magnetic susceptibility (AMS) study of magma transport in Mount Calanna dyke swarms of Mount Etna, Italy. Technical report, Copernicus meetings
8. Lo Presti D, Gallo G, Bonanno DL, Bonanno G, Ferlito C, La Rocca P, Reito S, Riggi F, Romeo G (2022) Three years of muography at Mount Etna, Italy: results and interpretation. Muography: exploring Earth's subsurface with elementary particles, pp 93–108
9. Mattia M, Bruno V, Montgomery-Brown E, Patanè D, Barberi G, Coltelli M (2020) Combined seismic and geodetic analysis before, during, and after the 2018 Mount Etna eruption. Geochem Geophys Geosyst 21(9):e2020GC009218
10. Sahoo S, Tiwari DK, Panda D, Kundu B (2022) Eruption cycles of Mount Etna triggered by seasonal climatic rainfall. J Geodyn 149:101896
Sentiment Enhanced Smart Movie Recommendation System

V. Ranjith, Rashmita Barick, C. V. Pallavi, S. Sandesh, and R. Raksha
Abstract The Sentiment Enhanced Movie Recommendation System is a new-age movie recommender that considers the user's sentiments at a deeper level and recommends a suitable movie for every user. Movies classified coarsely by genre do not always represent the user's true taste. The sentiment enhanced movie recommender therefore considers all the important features of a user's watch trend, makes a cumulative calculation of which trends the user is following, and recommends the best movie. It allows users to add movies to their watchlist, which is also considered for recommendation. By considering the user's interests at a deep level, more accuracy can be achieved in recommending the best movies, which in turn engages users for a longer time on the website. The recommendation is based on user-user collaborative, item-item collaborative, item-content, and user-content models, implemented using SVD, KNN, and similarity measures such as cosine-similarity. Keywords Content-based · Collaborative based · Singular value decomposition · k-nearest neighbor · Cosine-similarity
1 Introduction

Recommendation systems are a class of information filtering systems. They are used in a variety of services such as Netflix, YouTube, Amazon, Twitter, Instagram, and Facebook. There are essentially two kinds of movie recommendation systems: content based recommendation systems, which depend on the features or properties of the items, and collaborative filtering recommendation systems, which rely on clusters of similar items so that recommendations can happen within a cluster. To develop a better recommendation system, a combination of both is made. Similarity between items can be computed using diverse similarity measures. This paper focuses on
implementing content based models using cosine-similarity measures, and collaborative models using matrix factorization techniques such as SVD and nearest-neighbor approaches such as KNN. Sentiment Analysis (SA) of movie reviews is another feature included in the project; it makes use of deep learning techniques to calculate a sentiment polarity score for each review. Sentiment analysis is the process of drawing out personal opinions from a given text, with various applications in information retrieval. Specifically, sentiment analysis determines whether an individual's attitude towards an object is negative, neutral, or positive. The project made use of a Long Short Term Memory (LSTM) model for this task. The website includes various features that are applications of deep learning and machine learning: several recommendation models, sentiment analysis of reviews, and a poster search feature built on the VGG-16 convolutional neural network.
2 Approach

The project allows user interaction through the website, where the user is recommended movies based on the preferences provided during registration and on demographic features like age, sex, and location. The user is allowed to browse different movies, and according to his browsing pattern, similar movies are recommended. A user is also allowed to rate and review a movie; based on the ratings, similar users are found and the user is recommended the movies watched by users similar to him. A user can also add movies to his watchlist, and recommendations are made based on the watchlist too. The website also lets users mark the movies they have already watched so that those movies can be considered for recommendation; since they are marked as watched, they are not themselves recommended to the user. Apart from user-specific recommendations, the website also includes generic recommendations, such as movies based on popularity, genre, and duration. Figure 1 illustrates the data flow diagram.
2.1 Recommendation Algorithms

The website includes four recommendation models: a user-user collaborative model, an item-item collaborative model, a user-content model, and an item-content model. The dataset used for the recommendation algorithms is The Movies Dataset, which contains data on 45,000+ movies from the MovieLens dataset. Its metadata file consists of attributes such as posters, backdrops, budget, revenue, release dates, languages, production countries, and production companies. Movies released on or before July 2017 are included. The same dataset includes files containing 26 million ratings from 2.7 lakh users for the 45,000+ movies in total. Ratings are obtained from the GroupLens website and range from 1 to 5. The keywords file
Fig. 1 Data flow diagram
contains the movie plot keywords, available in the form of a stringified JSON object. The credits file consists of cast and crew information for all movies, also in the form of a stringified JSON object.

User-User Collaborative Model

It provides appropriate recommendations with respect to the user's interests by filtering the huge dataset. This model recommends to the user only relevant movies, depending on similar users' choices. It first finds similar users and then predicts the movies to be suggested to the current user. Similar users can be found by comparing the rating patterns given to different movies: two users are similar if they have rated the same movies with similar ratings. This particular model works by taking into account the ratings given by users in the movie-user matrix. Fig. 2 illustrates the working of the user-user collaborative model. The algorithm behind this model is based on a matrix factorization technique called Singular Value Decomposition (SVD). This approach takes a data matrix M as input and decomposes it into a product of three matrices, as described in the equation below.
Fig. 2 User-user collaborative filtering
M_{a×b} = U_{a×r} S_{r×r} (V_{b×r})^T    (1)

M is an a × b matrix consisting of 'a' users and 'b' movies. U is an a × r matrix containing the left singular vectors. S is an r × r diagonal matrix of positive singular values arranged in descending order, with the largest value first. V is a b × r matrix containing the right singular vectors. According to the SVD, U, S, and V are unique, and U and V are column-orthonormal, that is, Uᵀ U = I and Vᵀ V = I. Equation (1) can be reduced to

M_k = U_k S_k (V_k)^T    (2)
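The rank-k truncation of Eq. (2) can be sketched with NumPy on a toy ratings matrix (the values below are made up for illustration; the real model is trained on The Movies Dataset ratings):

```python
import numpy as np

# Toy user x movie ratings matrix M (a = 4 users, b = 5 movies).
M = np.array([[5., 3., 0., 1., 4.],
              [4., 0., 0., 1., 3.],
              [1., 1., 0., 5., 4.],
              [1., 0., 0., 4., 4.]])

# Full SVD: M = U S V^T, with singular values returned in descending order.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Rank-k truncation M_k = U_k S_k (V_k)^T, the nearest rank-k approximation of M.
k = 2
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

Rows of `U[:, :k]` and columns of `Vt[:k, :]` give the k-dimensional embeddings of users and movies mentioned below.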
M_k is the matrix that is the nearest rank-k linear approximation of M. After this transformation, users and movies can be imagined as points in a k-dimensional space. The model was trained on the ratings file of The Movies Dataset, consisting of 100,000 rows from 671 users. The dataset was divided into an 80% training set and a 20% testing set. SVD was employed to train the user-user collaborative model, which gave an accuracy of 82%.

User-Content based Model

It is also called cognitive filtering. It recommends only those items in which the user shows more interest. For example, if the user likes a comedy movie with a certain cast and crew and a certain descriptive overview, then he will be recommended movies with similar features. This model first checks the user's preferences and demographic features like age, sex, and location, and then suggests the movies that best match. The focus of the model is purely on the user's own interests, and it provides recommendations accordingly. Fig. 3 shows how the content based recommendation model works.

Fig. 3 User-content based filtering

The model makes use of cosine-similarity to find similar movies based on what a certain user likes. Cosine-similarity measures the similarity between vectors with non-zero values in an inner product space by computing the angle between them. For two such vectors i and j,

i · j = |i| |j| cos θ    (3)

so the cosine-similarity cos θ can be calculated from the dot product and the magnitudes of the vectors:

cosine_sim(i, j) = cos θ = (i · j) / (|i| |j|)    (4)
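Equation (4) in code, as a small NumPy sketch (the feature vectors are hypothetical):

```python
import numpy as np

def cosine_sim(i, j):
    # cos(theta) = (i . j) / (|i| |j|), per Eqs. (3)-(4)
    return float(np.dot(i, j) / (np.linalg.norm(i) * np.linalg.norm(j)))

# Hypothetical numeric feature vectors.
u = np.array([1.0, 0.0, 2.0])
v = np.array([2.0, 0.0, 4.0])   # parallel to u, so similarity is 1
w = np.array([0.0, 3.0, 0.0])   # orthogonal to u, so similarity is 0
```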
The model was trained on user data. It found similar users through cosine-similarity based on demographic features like age, sex, and location. The watch history of the top similar users was recommended to the new user.

Item-Item Collaborative Model

It finds movies that are frequently watched together among a group of users. It creates a matrix of users and movies and makes use of the K-Nearest Neighbors algorithm to find similar vectors, i.e., movies that have been given similar ratings by a group of users and were therefore frequently watched together. Hence, when a movie is watched, all the movies that are watched along with it should be recommended. K-Nearest Neighbors makes use of the Euclidean distance measure, which finds the least distance between the vectors created in the item-collaborative model. The equation is given below.
d = √( Σ_{i=1}^k (x_i − y_i)² )    (5)
x_i and y_i are the ith components of the two vectors being compared, and d is the distance between the vectors. The model was trained using the K-Nearest Neighbors algorithm on the ratings file described above, resulting in an output of similar movies that were frequently watched together.

Item-Content based Model

It finds similarity between movies based on the features they include and recommends movies similar to the movie watched. If a movie has a certain genre, cast, crew, overview, keywords, and ratings, then the model finds movies that are similar in these aspects. The similarity measure used here is cosine-similarity. The features were first converted into count vectors using a Count Vectorizer, after which an m × n cosine-similarity matrix was built; the value at position (m, n) corresponds to the similarity score between the mth and nth movies. The model was trained on the metadata, credits, and keywords files of The Movies Dataset, using the overview, genres, cast, crew, and keywords features respectively. A similar approach as above was used to find similar movies: the cosine-similarity matrix was generated with similarity scores as values, the values were sorted for a given movie, and the movies were recommended as similar movies in that order.
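The count-vector-plus-cosine-matrix pipeline can be sketched without external libraries; the movie feature "soups" below are hypothetical stand-ins for the genre/cast/keyword text built from the dataset:

```python
import numpy as np
from collections import Counter

# Hypothetical feature "soups" (genre + cast + keyword tokens) per movie.
movies = {
    "Movie A": "action thriller heist",
    "Movie B": "action thriller chase",
    "Movie C": "romance drama",
}

vocab = sorted({w for text in movies.values() for w in text.split()})

def count_vector(text):
    # Count-vectorize one movie's token soup over the shared vocabulary.
    c = Counter(text.split())
    return np.array([c[w] for w in vocab], dtype=float)

X = np.stack([count_vector(t) for t in movies.values()])

# Normalize rows, then a single matrix product yields the full
# cosine-similarity matrix between all pairs of movies.
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
sim = Xn @ Xn.T
```

Sorting a row of `sim` in descending order gives the recommendation ranking for that movie.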
2.2 Analysis of Sentiments Behind Movie Reviews

Sentiment analysis is the process of identifying the emotions behind a text. A Long Short Term Memory (LSTM) model was used to predict the sentiments behind movie reviews. The model returns a sentiment score ranging from 0 to 1, where 0 is oriented towards negative reviews and 1 towards positive reviews. LSTM is a kind of RNN that learns long-term dependencies in data; this is possible because of its underlying recurrent module, which is made up of four interacting layers. The layers used in the sentiment analysis model are Embedding, LSTM, and Dense. The model was trained on the IMDB Reviews dataset, which has 50,000 reviews, 25,000 labelled 1 (positive) and 25,000 labelled 0 (negative). The dataset was split into 20% test and 80% train, and the model showed 98.62% accuracy. The IMDB Reviews dataset was first pre-processed to remove noise such as HTML attributes and special characters; the next steps were stop word removal, lemmatization (reducing words to their root form), and tokenization (converting each sentence into tokens). The dataset was then split into test and train sets and passed to the sentiment analysis model
Fig. 4 Layers in the model
Fig. 5 Sentiment analysis model accuracy
as described in Fig. 4. The model was run for 5 epochs resulting in the accuracy and loss described in Figs. 5 and 6.
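The pre-processing steps described above (noise removal, tokenization, stop word removal) can be sketched in plain Python. The stop word list here is a tiny stand-in, and lemmatization is omitted; a real pipeline might use NLTK's stop word corpus and WordNetLemmatizer:

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "was", "it", "this", "and"}  # tiny stand-in

def preprocess(review):
    # 1. Noise removal: strip HTML tags and non-alphabetic characters.
    text = re.sub(r"<[^>]+>", " ", review)
    text = re.sub(r"[^a-zA-Z\s]", " ", text).lower()
    # 2. Tokenization and 3. stop word removal (lemmatization omitted here).
    return [tok for tok in text.split() if tok not in STOP_WORDS]

preprocess("<br />This movie was GREAT, 10/10!")  # -> ['movie', 'great']
```

The resulting token lists would then be integer-encoded and padded before being fed to the Embedding layer.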
2.3 Poster Search

The website includes a feature wherein users are able to search for a movie by its poster. The underlying architecture used to implement this feature is a pretrained neural network, VGG-16. The features of all posters in the dataset were extracted by this model and saved. Given an input image, the model extracted its
Fig. 6 Sentiment analysis model loss
features and compared them with the saved features to output similar matches. The VGG-16 model is trained on over 14 million images from the ImageNet dataset, has 16 layers, and is 92.7% accurate. The poster search model extracted features for about 6000 available posters from the metadata file and saved them. Features were then extracted for any given input image and compared against these 6000 feature vectors, and the best matches were returned as search results.
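The matching step (nearest neighbors over the saved feature vectors) can be sketched as follows; the feature bank here is random data standing in for the saved VGG-16 features, and is much smaller than the real 6000-poster bank:

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in for the saved feature bank: one row per poster, L2-normalized.
bank = rng.normal(size=(100, 64))
bank /= np.linalg.norm(bank, axis=1, keepdims=True)

def search(query_features, top_k=5):
    # Cosine similarity of the query against every saved poster feature,
    # then return the indices of the top_k best matches.
    q = query_features / np.linalg.norm(query_features)
    scores = bank @ q
    return np.argsort(scores)[::-1][:top_k]

# A query identical to poster 7's features should rank poster 7 first.
hits = search(bank[7])
```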
3 Purpose

A movie recommendation system helps users choose movies they are particularly interested in from a pool of movies available from various sources. It takes into account a huge set of movies along with users' demographic features and choices, and in turn presents a subset of movies well suited to the user's profile. The motive for developing a smart, sentiment enhanced movie recommendation system is to satisfy the need for personalization. Such a system can adapt by itself to the changing behavior of users while interacting with them. Without a recommendation system, an OTT platform leaves users' need for personalization unsatisfied: users browse through a huge set of movies with no suggestions about what they might like, lose interest in engaging with the platform, and this in turn harms the business of the service provider. Movie recommendation systems help cluster users with similar interests, and users are recommended movies based on their cluster's likes. Most recommendation systems do not consider user sentiments, such as the reviews given to a
certain movie. The current approach considers the sentiment of the reviews given by users and computes a sentiment score through the trained LSTM model to give a cumulative score of the total ratings and reviews given to a movie by the users who have watched it. This in turn can be used to better recommend similar movies to other users.
4 Major Research Findings

Reddy et al. [1] built a recommendation system considering user-preferred genres. The paper made use of a content filtering model with genre correlation, using the MovieLens dataset: if a movie of a certain genre is given a high rating by the user, movies of the same genre are recommended. A collaborative filtering model was explained by Raghuwanshi et al. [2], making use of distance measures such as Pearson correlation, cosine-similarity, and Euclidean distance to calculate similarities between users' choices and items in the database. Their approach was to create a common community with similar likes; if a user belonging to that community has not rated a movie, he gets recommendations of other movies positively rated by people of that community. The idea is to create a community that shares a common interest. Content based filtering, collaborative filtering, and a hybrid of both models were explained by Goyani et al. [3] using deep learning. The limitations of both models were explained, and the means to overcome them was to build a hybrid approach; they proposed that different similarity measures combined together produce a better recommendation system than any single similarity measure. An algorithm to extract users' profile features and group them into clusters of similar users was proposed by Zhang et al. [4]. A virtual opinion leader represents each cluster, which reduces the dimension of the user-movie matrix because only one opinion leader represents that cluster; however, this leads to information loss, reducing the accuracy to considerably lower than Singular Value Decomposition. A collaborative model including a factorization form and cosine-similarity was explained by Bhalse et al. [5]; the number of parameters considered by the model was drastically reduced and complexity was controlled. Ahmed et al.
[6] proposed a methodology that uses K-means clustering to group users into similar clusters, with a neural network model created for each cluster. Baid et al. [7] analyzed movie reviews using various techniques to identify the sentiment/polarity of the tweets: the Naive Bayes classifier achieved 81.45% accuracy, the Random Forest classifier 78.65%, and the K-Nearest Neighbor classifier 55.30%. They created hybrid methods to increase the accuracy of the results. Rehman et al. [8] proposed a hybrid of a convolutional network and an LSTM network to address the difficulties of sentiment analysis. Word2Vec takes textual content and converts it into a series of number vectors. It finds
54
V. Ranjith et al.
out the distances among different words and then groups words by similarity of meaning. The extracted features are combined in the word-embedding stage of the model, and the results were very promising. Kapoor et al. [9] built a sentiment model using the TMDB dataset to predict two-class sentiment from user reviews; the model then rates movies according to the sentiment and scoring of the reviews, achieving an accuracy of about 85%. Singh et al. [10] proposed domain-specific, feature-based heuristics for aspect-level analysis: movie reviews were analyzed and given a sentiment label per aspect, the scores for each aspect were aggregated over multiple reviews, and an average sentiment profile was created for the movie [11–15]. Their paper used a SentiWordNet scheme with two kinds of language feature selection, consisting of verbs, adverbs, and adjectives, together with n-gram feature extraction.
5 Practical Implications

The current work is a blend of four recommendation models that, considered separately, may not yield good results. OTT platforms require good recommendation algorithms to keep viewers engaged and interested in the platform. The current work eliminates the cold-start problem associated with new users by considering their demographic features and preferences and then recommending movies to them. The website also includes features such as poster search and audio and text search, which may meet further user needs. Considering the sentiment score of reviews to check the polarity of opinion on a movie is another interesting feature included in the website.
6 Research Limitations

The Movies Dataset had huge files with a large number of rows, so computing the cosine-similarity matrix required more RAM than was available; hence only 50% of the dataset was used to implement the item-content model. In addition, the metadata file did not contain paths for all the available posters, so only the 6000 available poster paths were used for the poster-search model.
Sentiment Enhanced Smart Movie Recommendation System
55
7 Originality

The current work considers the sentiment score of each review given to a movie and assigns the movie an average sentiment score. A cumulative score of the average rating and the average senti-score is assigned to the movie to give users better insight into other users' perspectives on whether the movie is worth watching. The current work also implements a hybrid approach that considers all four recommendation models, eliminating the shortcomings each has when considered separately. The hybrid model recommends movies common to the item-content, item-collaborative, and user-collaborative models, in accordance with a user's likes.
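As a rough illustration of the cumulative-scoring idea, the sketch below combines an average user rating with an average review sentiment score. The 50/50 weighting, the 5-point rating scale, and the function name are illustrative assumptions, not the exact formula used in the current work.

```python
def cumulative_score(ratings, senti_scores, rating_scale=5.0):
    """Combine the average user rating with the average sentiment score
    (assumed to lie in [0, 1]) into one cumulative score on the rating scale.
    The equal weighting is an assumption made for illustration."""
    avg_rating = sum(ratings) / len(ratings)
    avg_senti = sum(senti_scores) / len(senti_scores)
    # Map sentiment from [0, 1] onto the rating scale and average the two views
    return 0.5 * avg_rating + 0.5 * (avg_senti * rating_scale)

print(cumulative_score([4, 5, 3], [0.9, 0.8, 0.7]))  # ≈ 4.0
```

Movies can then be ranked by this combined score instead of by raw ratings alone.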
8 Conclusion and Future Research Work

The current work implements four recommendation models that together eliminate the shortcomings each model has when implemented separately. The user-user collaborative model suffers from the cold-start problem because there is no data on a new user's movie ratings. To overcome this, a user-content-based model was developed that considers a user's demographic features and initially chosen preferences to recommend movies; these recommendations are eventually replaced by ones based on the movies the user watches on the website. The item-content and item-item collaborative models contribute by recommending movies similar to a given movie and movies frequently watched together, respectively. Future work focuses on adding further features, including search-history recommendation, extracting the level of genres from movie subscripts, extracting the theme of a movie and recommending movies based on it, and considering the location of shots and current affairs that match the movie description.
References

1. Reddy SRS, Nalluri S, Kunisetti S, Ashok S, Venkatesh B (2019) Content-based movie recommendation system using genre correlation. Springer
2. Raghuwanshi SK, Pateriya RK (2019) Collaborative filtering techniques in recommendation systems. Springer
3. Goyani M, Chaurasiya N (2020) A review of movie recommendation system: limitations, survey and challenges. Electron Lett Comput Vis Image Anal 19(3)
4. Zhang J, Wang Y, Yuan Z, Jin Q (2020) Personalized real-time movie recommendation system: practical prototype and evaluation. IEEE
5. Bhalse N, Thakur R (2021) Algorithm for movie recommendation system using collaborative filtering. ScienceDirect
6. Ahmed M, Tahsin Imtiaz M, Khan R (2018) Movie recommendation system using clustering and pattern recognition network. IEEE
7. Baid P, Gupta A, Chaplot N (2017) Sentiment analysis of movie reviews using machine learning techniques. Int J Comput Appl 179(7)
8. Rehman AU, Malik AK, Raza B, Ali W (2019) A hybrid CNN-LSTM model for improving accuracy of movie reviews sentiment analysis. Springer
9. Kapoor N, Vishal S, Krishnaveni KS (2020) Movie recommendation system using NLP tools. IEEE
10. Singh VK, Piryani R, Uddin A, Waila P (2013) Sentiment analysis of movie reviews. IEEE
11. Gudla SK, Bose J, Gajam V (2017) Relevancy ranking of user recommendations of services based on browsing patterns. IEEE
12. Maddodi S, Prasad K (2019) Netflix Bigdata analytics—the emergence of data driven recommendation. Int J Case Stud Bus IT Educ
13. Huq MR, Rahman A (2017) Sentiment analysis on twitter data using KNN and SVM. Int J Adv Comput Sci Appl 8(6)
14. Parkhe V, Biswas B (2015) Sentiment analysis of movie reviews: finding most important movie aspects using driving factors. Springer
15. Pasumpon PA. Performance evaluation and comparison using deep learning techniques in sentiment analysis. J Soft Comput Paradigm 3(2):123–134
A Machine Learning Model for Predictive Maintenance of a Stepper Motor Using Digital Simulation Data B. Sivathanu Kumar, A. Aravindraj, T. A. S. Sakthi Priya, Sri Nihanth, Dhanalakshmi Bharati, and N. Mohankumar
Abstract The advantage of integrating the physical world into the digital world is the ability to implement predictive maintenance on complex machinery. Predictive maintenance is a data-intensive technique built on ML and AI models. It can be used to predict trends, behavior patterns, and correlations so that pending failures are anticipated in advance, improving decision-making for the maintenance activity and reducing the downtime of the device involved. Our aim in this paper is to show a methodology for identifying abnormal operation of a stepper motor, thereby preventing it from reaching a state of mechanical stop. This predictive maintenance is done by considering the motor's operation under various conditions and building a simple, reusable predictive-maintenance model for the device, utilizing datasets and values obtained from a matched simulation of the device. The outputs of the Simulink (MATLAB) simulation of the device are compared with real-world outputs obtained from the device and adjusted accordingly.

Keywords Predictive maintenance · Random forest classifier · Support vector machine · Decision tree · K-nearest neighbors · Simulation
1 Introduction

Predictive maintenance is mainly applied to a device or system during the service phase of its lifecycle [1]. It is used to predict operational failures that may occur during the service life of the product. To indicate when a particular part is going to fail, sensory data for that component is drawn from a large dataset containing both working and defective devices, and a fault-detection model is created from it using machine-learning techniques.
B. Sivathanu Kumar · A. Aravindraj · T. A. S. Sakthi Priya · S. Nihanth · D. Bharati · N. Mohankumar (B) Department of Electronics and Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_5
B. Sivathanu Kumar et al.
However, such fault-condition data is not easily obtained from field experimentation. Allowing faults to develop, whether supervised or not, is a costly procedure that can result in equipment loss and resource waste [2–6], and the fault data obtained may also be incomplete due to experimental limitations. Through a matched simulation [7] of the particular device, sensory data for various fault conditions can be generated by introducing faults virtually into the simulation. We can thus obtain the sensory data required for building an ML model for predictive maintenance. The system will be able to identify components, or even combinations of components, that are on the verge of failing when exposed to such faulty conditions. This paper therefore defines a technique to enable predictive maintenance on devices that have limited or incomplete fault data for training purposes. The proposed technique is used to identify the optimal classifier as well as the optimal sensor for the prediction model. Finally, the paper highlights the use of simulation to derive matched data from the device, which can in turn be used to complete pre-existing fault datasets and increase the accuracy of the prediction model.
2 Methodology

Creating a matched simulation of the device using simulation software such as Simulink in MATLAB is essential [7]. This software lets users program specific behavioral constants into the device block, which can be adjusted to mimic its real-world counterpart [8]. Matching the data obtained from the simulation to the observed field data is necessary for building an accurate prediction model [9]; such parameters can be adjusted by referring to the data sheets of the particular components. To simulate a fault in the system, a component is introduced into or removed from the simulation. For example, in the case of a medical infusion pump, dynamically changing the area of the delivery tube by introducing a variable orifice (as in the case of occlusion) is one way of obtaining a faulty reading that can be used for training [7] (Fig. 1). The accuracy of a prediction algorithm increases as more fault-case scenarios are introduced into its training dataset. The obtained sensory readings can be used to complete or enhance a fault-detection dataset: by pairing these readings from the simulation with the limited data obtained from the field, a more accurate machine learning model can be created, and enlarging the training dataset in this way increases the accuracy of the overall model.
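The dataset-completion step described above can be sketched in Python with pandas. The column names, values, and mode labels below are hypothetical stand-ins for the field data and the Simulink export; only the append-and-retrain pattern is the point.

```python
import pandas as pd

# Hypothetical field data: plenty of normal operation, few fault examples
field = pd.DataFrame({
    "current_1": [0.42, 0.40, 0.41],
    "mode":      [0, 0, 1],          # normal CW / CCW operation
})

# Hypothetical readings exported from the matched simulation with faults
# injected virtually (abnormal modes that are rare or missing in the field)
simulated = pd.DataFrame({
    "current_1": [0.61, 0.65],
    "mode":      [2, 3],             # abnormal CW / CCW operation
})

# Append the simulation-derived fault cases to complete the training set
training = pd.concat([field, simulated], ignore_index=True)
print(len(training))  # 5 rows covering all four modes
```

The combined `training` frame can then be fed to any of the classifiers discussed later.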
Fig. 1 Methodology to train predictive model
2.1 Dataset Features

Establishing a predictive maintenance model derived from physically obtained sensory data is preferred. To demonstrate the proposed prediction model, a dataset of a Unipolar Stepper Motor from [10] has been used. The dataset includes measurements collected from 32 unipolar stepper motors, run simultaneously under 3 different environmental conditions:

• 10 °C, 80% humidity
• 20 °C, 60% humidity
• 50 °C, 40% humidity

The motors used for obtaining the data are unipolar permanent-magnet stepper motors with two phases, 12 pole pairs, a 7.5° step angle, and a 200 Hz speed range. The motor operates in a 2-phase full-step excitation mode and has an operation frequency of 10,000 Hz. A total of seven different signals are measured from each motor:

• Signal 1: current 1, coil 1, via shunt resistor from common port
• Signal 2: current 2, coil 2, via shunt resistor from common port
• Signal 3: voltage 1, coil 1, phase 1
• Signal 4: voltage 3, coil 1, phase 2
• Signal 5: voltage 2, coil 2, phase 1
• Signal 6: voltage 4, coil 2, phase 2
• Signal 7: vibration
The current sensors are connected across the two coils of the stepper motor, and the readings are obtained from a shunt connected to them. The voltage readings are obtained from the voltages at each end of the coils of the motor. The vibration reading is obtained from the piezoelectric sensor wrapped around the motor.
Table 1 Mode of operation of motor during data collection. These modes are used to annotate the dataset

  Mode  Description
  0     Normal clockwise operation
  1     Normal counter-clockwise operation
  2     Abnormal clockwise operation
  3     Abnormal counter-clockwise operation
These signals are monitored and used to predict the mechanical-stop condition in the unipolar stepper motors [11]. The signals were recorded with the motors running in both the normal and the abnormal working range, monitored via the step pattern of the motors. The normal working range of these motors is 500 steps, and certain directional changes were introduced to make the motors exceed this range and obtain the abnormal readings. Identifying the mechanical stop prevents unnecessary wear and tear on the stepper motor and can be viewed as a form of predictive maintenance of the motor. However, it is to be noted that the dataset in [11, 12] is incomplete for certain operational modes; this is where the data obtained from the matched simulation is appended, to increase the accuracy of the dataset. The operating modes used are shown in Table 1.
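The labeling rule just described (a 500-step normal range, two directions of rotation, four modes as in Table 1) can be sketched as a small annotation function. The function name and the strict threshold comparison are illustrative assumptions.

```python
NORMAL_RANGE = 500  # steps; beyond this the motor is driven into the abnormal region

def annotate_mode(step_count, clockwise):
    """Assign the operating-mode label used to annotate the dataset:
    0/1 = normal CW/CCW, 2/3 = abnormal CW/CCW."""
    abnormal = step_count > NORMAL_RANGE
    if clockwise:
        return 2 if abnormal else 0
    return 3 if abnormal else 1

print(annotate_mode(480, True))    # 0: normal clockwise
print(annotate_mode(620, False))   # 3: abnormal counter-clockwise
```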
2.2 Benchmarking

The dataset included in [10] was used to initially train the prediction models that determine the mechanical stop of the stepper motor. This initial model was used to benchmark the data and determine the most accurate sensor and the most accurate classification algorithm; the process is represented in the flowchart in Fig. 2. This benchmarking determined the optimal sensor in the device, and concurrently the simulation of the stepper motor was matched by replicating the readings obtained from this sensor.
3 Purpose

The purpose of this paper is to establish a methodology for implementing predictive maintenance, using simulation-derived data, on devices that lack adequate data for a reliable prediction model. The following sections show the methodology used to do so, along with the results such a process produces. To simplify the process of dataset training and prediction, the four most commonly used classifiers were considered to initially train the dataset [13]. The accuracy scores obtained by training on the dataset included in [10] are compared with each other to obtain the optimal classifier for predictive maintenance, as seen in Fig. 2.
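The benchmarking loop of Fig. 2 can be sketched with scikit-learn. Synthetic data stands in for the stepper-motor dataset of [10], so the scores this sketch produces are not the paper's; only the compare-and-pick pattern is the point.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 7 features (one per signal), 4 classes (one per mode)
X, y = make_classification(n_samples=400, n_features=7, n_informative=5,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# The four classifiers compared in the paper
classifiers = {
    "random_forest": RandomForestClassifier(n_estimators=600, random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svc": SVC(),
    "knn": KNeighborsClassifier(n_neighbors=5),
}
scores = {name: clf.fit(X_tr, y_tr).score(X_te, y_te)
          for name, clf in classifiers.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

The same loop, run per sensor column, yields the sensor comparison of Sect. 4.6.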
Fig. 2 Obtaining the optimal classifiers and sensors
3.1 Finding the Optimal Classifier

The comparison is done between four commonly used classifiers: the Random Forest classifier, Decision Tree, Support Vector Machine, and the K-Nearest Neighbors classification algorithm. Their implementation is as follows.

Random Forest Classifier The Random Forest algorithm combines a number of decision trees built on different subsets of a dataset and averages their results to increase predictive accuracy. Rather than relying on a single decision tree, the random forest collects the forecasts from each tree and predicts the final output based on the majority vote of the predictions [14]. Using more decision trees generally increases accuracy and also helps prevent overfitting. This supervised machine-learning algorithm lets the user change the number of estimators to improve accuracy; for the dataset used, a range of estimator counts was tried, and 600 estimators proved the most effective.

Support Vector Machine The Support Vector classifier creates a decision boundary by establishing a hyperplane [15]. Extreme points, or vectors, are used to create the hyperplane; these extreme points constitute the support vectors. The hyperplane is the best decision boundary for classifying the data points out of all the candidate boundaries that segregate the classes [15]. Its dimension depends on the number of features in the dataset. The support vectors are the data points closest to the hyperplane, and they determine its position.
Table 2 Support vector classifier parameters used in the code

  Parameters                          Code                       Value
  Regularization parameter            C                          1.0
  Cache size                          cache_size                 200
  Decision function shape             decision_function_shape    ovr
  Maximum iteration                   max_iter                   -1
  Tolerance for stopping criterion    tol                        0.01
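A minimal sketch of an SVC configured with the Table 2 parameters, assuming the scikit-learn implementation that the parameter names suggest; the tiny training set is invented for illustration.

```python
from sklearn.svm import SVC

# SVC configured with the parameters listed in Table 2
clf = SVC(C=1.0, cache_size=200, decision_function_shape="ovr",
          max_iter=-1, tol=0.01)

# Toy fit: four labeled points stand in for the motor sensor readings
X = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
y = [0, 0, 1, 1]
clf.fit(X, y)
print(clf.predict([[0.9, 0.9]]))
```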
The Support Vector Classifier (SVC) is the function used in the classifier program. The parameters assigned to the classifier are shown in Table 2.

Decision Tree The structure of this classifier resembles a tree, with nodes that represent the dataset features, branches that are the decisions taken, and leaf nodes that are the outcomes of those decisions. The tree represents all the possible ways to reach a solution under the given conditions: it starts at a root node and branches out further into a tree-like structure. For the current project, the test size used was 0.25, with the random state set to 0. The classifier was run after demarcating the test and training sets, and the output observed is shown in the following section.

K-Nearest Neighbors This supervised machine-learning algorithm uses the similarity between new and available data to categorize the data. Similarity is calculated as the Euclidean distance to the K nearest neighbors [12, 16], where K is a value assigned when defining the function. Among the K nearest neighbors, the category of each point is noted, and the new data point is assigned to the category with the most neighbors. The following parameters were set to obtain the maximum possible accuracy from the dataset, using the KNeighborsClassifier command in Python, as shown in Table 3.

Table 3 KNN parameters used in the code
  Parameters          Code          Value
  Nearest neighbors   n_neighbors   5
  Leaf size           leaf_size     30
  Distance metric     metric        'minkowski'
  Power parameter     p             2
  Algorithm           algorithm     'auto'
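A minimal sketch of a KNN classifier configured with the Table 3 parameters, again assuming scikit-learn; the two toy clusters stand in for the motor readings. Note that Minkowski distance with p=2 is the ordinary Euclidean distance.

```python
from sklearn.neighbors import KNeighborsClassifier

# KNN configured with the parameters from Table 3
knn = KNeighborsClassifier(n_neighbors=5, leaf_size=30,
                           metric="minkowski", p=2, algorithm="auto")

# Toy fit: two well-separated clusters stand in for normal/abnormal readings
X = [[0.1], [0.2], [0.3], [0.9], [1.0], [1.1]]
y = [0, 0, 0, 1, 1, 1]
knn.fit(X, y)
print(knn.predict([[0.15], [1.05]]))  # → [0 1]
```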
3.2 Observed Outcomes

These four classification algorithms were run on the dataset, the classifiers were optimized, and the best among them was chosen. Using the selected classifier, each of the sensors in the dataset from [10] was run individually. Sensors with low accuracy scores were omitted when taking data from the simulation. Following this, the simulation data was matched to the sensory data in the dataset, and the simulation was then used to update the dataset. Using the Data Inspector function in MATLAB, sensory datasets can be obtained from the simulation; the new data was appended to the existing datasets, and the simulation was run to complete certain entries in the dataset table. With the modified dataset, new accuracy scores were obtained, and this process increased the accuracy of the prediction model. Classifying whether the motor is operating in the abnormal region is key to determining the imminent mechanical stop of the stepper motor [10]. Knowing that a particular motor is nearing mechanical stop because it has operated for extended periods in the abnormal region, whether from internal or external causes, constitutes one method of implementing predictive maintenance in mechanical devices. The proposed methodology deals with preventive maintenance of the stepper motor; while the current model can only predict potential motor faults, future models could be programmed to predict maintenance on other associated components within the device or system.
4 Obtained Results

As seen in the predictive-maintenance section, the classifiers are compared first, followed by a comparison of the sensors; the final transfer-learning model is then trained. The results obtained are as follows.
4.1 KNN Model

Section 3.1 established the method of obtaining the optimal sensor and prediction model to run with our dataset. The first model trained is KNN [16]. The K value was set to 5, and the accuracy scores obtained are shown in Table 4; the Mode column represents the operating mode of the motor. The accuracy score and hit ratio are the two main values observed from the Python implementation, and these values are used to compare and contrast the classifiers.
Table 4 Accuracy scores obtained for KNN

  Mode  Precision  Recall  f1-score
  0     0.93       0.95    0.94
  1     0.93       0.93    0.93
  2     0.90       0.83    0.86
  3     0.89       0.85    0.87
Accuracy Score obtained for KNN: 92.52%
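Per-mode precision, recall, and f1 values like those in Tables 4–7 can be produced with scikit-learn's classification_report; the true and predicted mode sequences below are invented for illustration only.

```python
from sklearn.metrics import classification_report

# Hypothetical true vs. predicted operating modes (labels 0-3 as in Table 1)
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 1]

# output_dict=True returns per-class precision/recall/f1 plus overall accuracy
report = classification_report(y_true, y_pred, output_dict=True)
print(report["2"]["precision"], report["accuracy"])
```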
Table 5 Accuracy scores obtained for 600 estimators

  Mode  Precision  Recall  f1-score
  0     0.92       0.93    0.92
  1     0.89       0.94    0.91
  2     0.83       0.72    0.77
  3     0.87       0.75    0.81
Accuracy Score for 600 Estimators: 89.53%
4.2 Random Forest Model

The output of the Random Forest model is shown in Table 5, representing the accuracy obtained at 600 estimators. The accuracy score is almost 90% for the 600-estimator configuration, with an F1 score of 0.91 and a recall of 0.90.
4.3 Support Vector Model

The Support Vector Classifier was run on the dataset, and the accuracy scores observed are noted for comparison in Table 6. The algorithm achieved an accuracy of about 60%.

Table 6 Accuracy scores obtained for SVC

  Mode  Precision  Recall  f1-score
  0     0.69       0.60    0.64
  1     0.55       0.81    0.66
  2     0.62       0.24    0.35
  3     0.37       0.04    0.08
Accuracy Score for SVC: 60.03%
Table 7 Accuracy scores obtained for decision tree

  Mode  Precision  Recall  f1-score
  0     0.76       0.78    0.77
  1     0.74       0.76    0.75
  2     0.46       0.41    0.43
  3     0.46       0.41    0.44
Accuracy Score for Decision Tree: 70.58%
Table 8 Comparison of accuracy scores

  Classifiers     Accuracy (%)  Recall  f1-score
  Support vector  60.3          0.62    0.58
  Decision tree   70.6          0.71    0.69
  Random forest   89.5          0.90    0.91
  KNN             92            0.93    0.94
4.4 Decision Tree

The dataset was classified using the Decision Tree algorithm, with the results shown in Table 7. An accuracy score of 71% was obtained. Thus, the classifiers were compared, and the optimal classifier was selected based on the accuracy obtained. The dataset used for all these classifiers is the common dataset under the same conditions. The results of the comparisons are as follows:
4.5 Comparison of Classifiers

Here, the accuracy, recall, and F1 score of the classifiers are compared using the values obtained above. The data in Table 8 shows that both the KNN model and the Random Forest classifier are good candidates for the optimal classifier. However, the processing time of the KNN classifier was significantly lower than that of the Random Forest classifier. The Support Vector algorithm provided the lowest accuracy of the four, followed by the Decision Tree classifier. Thus, from the data observed for this particular dataset, the KNN classifier is the optimal classifier.
4.6 Comparison of Sensor Signals

Selection of the optimal sensor is done using the optimal classifier obtained in the previous section. The dataset is composed of seven signals obtained from seven sensors attached to each of the 32 motors under observation. The dataset was divided into seven parts, and each part was classified with the KNN classifier. The accuracy scores of each of the sensors are shown in Table 9.

Table 9 Comparison of sensors accuracy scores

  Signal     Accuracy (%)
  Current 1  97.8
  Current 2  97.3
  Voltage 1  94.4
  Voltage 2  93.9
  Voltage 3  94.1
  Voltage 4  93.9
  Vibration  67.9

Table 9 shows that the Current 1 and Current 2 sensors have the highest accuracy of all the sensors. These current readings and trained models are subsequently used to match the simulation data and complete the dataset for building the transfer-learning model. The current data obtained from the matched simulations is trained on these models; Table 10 shows the result obtained.

Table 10 Predicted output for 3 trial inputs

  Trial  Input sequence        Predicted output sequence
  1      [1 0 0 0 1 3 1 1 0]   [1 0 0 0 3 1 1 1 0]
  2      [1 0 1 0 1 1 1 1 1]   [1 0 1 0 1 1 1 1 1]
  3      [2 0 1 3 3 1 1 0 0]   [2 0 1 3 3 1 0 1 0]
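The per-sensor comparison can be sketched by training the chosen KNN classifier on one signal column at a time. The synthetic signals below (informative currents, noisy vibration) are assumptions made only to illustrate the procedure, not the dataset's actual values.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n = 300
y = rng.integers(0, 4, size=n)                 # operating-mode labels 0-3

# Hypothetical stand-ins for the monitored signals: the "current" columns
# are made informative about the mode, "vibration" is mostly noise
signals = {
    "current_1": y + rng.normal(0, 0.3, n),
    "current_2": y + rng.normal(0, 0.4, n),
    "vibration": rng.normal(0, 1.0, n),
}

# Train and score the KNN classifier on each signal column separately
scores = {}
for name, col in signals.items():
    X = col.reshape(-1, 1)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=0)
    scores[name] = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr) \
                                                      .score(X_te, y_te)

best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

In this sketch, as in Table 9, the informative current signals outscore the noisy vibration signal.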
4.7 Trained Result with New Dataset

The general KNN classifier model has now been established, and the preferred sensory signal has been observed to be the current signals across the two coils of the motor. Collecting the values of these two signals from the Data Inspector, we train the labelled signals on the pre-trained KNN model. The results are observed in Table 10. A sample of the dataset obtained and used for predictions is shown in Fig. 3. The first column gives the time intervals at which the data was observed; the next column gives the working condition under which the motor was run, with 0 being clockwise normal, 1 anticlockwise normal, 2 clockwise abnormal, and 3 anticlockwise abnormal [10]. Column C is the selected motor, and column D is the selected sensor in the motor, here current sensor 1. The values that follow are the current readings obtained from the motor; 60 readings were observed when simulated for 10 min. This annotated dataset has the target set in column B, and Table 10 is the result of the trained model matching that column.
Fig. 3 Sample of dataset obtained and used for predictions. Only first 4 values of the time series are shown here
5 Research Limitations

An existing limitation of the work is the inability to create a dynamic simulation that updates the model with real-time sensory data obtained from the physical stepper-motor device. Matching between the physical and virtual devices has been done using sensory signals derived over a period of time, rather than a dynamic simulation that auto-updates the sensory values. Wireless connections between the sensors and motors could be established [17], and the predictive model could then be used to control the functioning of the motor. The dataset from [10] was recorded at different temperatures and under different environmental stresses, as explained in Sect. 2.1; however, the simulation data obtained, while similar to the values in the dataset, was not taken under varied environmental conditions. The model was trained under the optimal environmental conditions of 20 °C and 60% humidity, which corresponds to the dataset used for training. The simulation of the stepper motor has been done with the help of Simulink software in MATLAB [7], specifically the Unipolar Motor functional block. The various constants, such as winding resistance, winding inductance, step angle, total inertia, and maximum detent torque, were all matched to obtain readings in line with the normal working range of the dataset's stepper motor. Error-inducing changes were not applied to these constants; instead, they were applied to the systems attached to the motor in the simulation, and the operational changes observed in the Data Inspector were used to train the models. Finally, the methodology established has been tested on only four of the most commonly used classification algorithms; different algorithms might produce even higher accuracies than those observed here.
6 Originality

The methodology established in Figs. 1 and 2 is a novel method of finding the optimal classifier for the dataset used and the optimal sensor from which further readings can be obtained. Our contribution is the novel integration of simulation data into the established machine-learning models, which enhances the predictive model: the accuracy of our model has been improved while the resources spent to obtain this accuracy have been reduced. Table 10 shows the results obtained. Being able to select the optimal sensor, and to integrate readings obtained from the matched simulation into the dataset, has greatly reduced the computation time as well as the resources needed for the predictive-maintenance process. As mentioned in [1] and [9], the lack of failure-case data can make failure prediction difficult; however, the novel method presented in this paper can reduce resource usage while providing a better predictive-maintenance model.
7 Conclusion and Future Research Work

Predictive maintenance is one of the first applications implemented on any established cyber-physical system that utilizes AI, as it is easy to implement once the datasets are obtained and has a tangible benefit to the machine when implemented successfully. From the results obtained above, we observe that the KNN model is best suited for our tested application of the stepper motor. We also see that the current sensors of the device provide the maximum accuracy with this algorithm for observing the effect of mechanical stop. Thus, in further simulations to observe and potentially predict future mechanical stops, we mainly use the data obtained from these sensors and the KNN algorithm for the trained model. When a new dataset is obtained, be it a synthetic one as used in Table 10 or one obtained from the machine in real time via an edge-computing device, the same model can be used to run the data and obtain the prediction result. Thus, a methodology for running predictive maintenance on a device has been discussed by implementing the proposed approach on a stepper motor. The various classification algorithms, as well as the results obtained from each of them, have been observed. Finally, for one particular case, a prediction was run and observed for a matched simulation of the device, proving the effectiveness of the methodology. Detecting abnormal operation of the motor from its step count over a given period is not the only way to conduct preventive maintenance on motors; detecting broken bars in induction motors using neural networks [18] is another method of fault detection on motors. Future research in this particular method can use more machine-learning algorithms to find the optimal one that provides the highest accuracy. Being able to simulate environmental conditions within the simulation and observe the changes that occur
A Machine Learning Model for Predictive Maintenance of a Stepper …
under those conditions can also be explored. Enhancement of the final training model in Sect. 4.7 could further be achieved by using partial transfer learning to obtain more accurate results on a larger dataset. Finally, this method can be extended to various other devices: the same optimization of both the classifier and the sensor can be carried out, and predictive maintenance can be established for that device as well.
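Reusing the trained model on a newly acquired dataset, as described above, can be sketched as follows. This is a hypothetical illustration: the synthetic current-sensor features, the class separation, and the feature count are all assumptions, not the paper's actual dataset or pipeline.

```python
# Hypothetical sketch: train a KNN classifier on current-sensor features,
# then reuse it on a newly acquired dataset (edge readings or simulation output).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Simulated current-sensor readings: normal operation vs. mechanical stop.
normal = rng.normal(loc=1.0, scale=0.1, size=(200, 3))
stopped = rng.normal(loc=1.8, scale=0.1, size=(200, 3))
X = np.vstack([normal, stopped])
y = np.array([0] * 200 + [1] * 200)  # 0 = normal, 1 = mechanical stop

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(f"Hold-out accuracy: {knn.score(X_test, y_test):.2f}")

# A new dataset can be passed straight to the already-trained model.
new_readings = rng.normal(loc=1.8, scale=0.1, size=(5, 3))
print(knn.predict(new_readings))
```

The same `predict` call applies whether the new readings come from a matched simulation or from the machine in real time.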
References
1. Lastra R (2019) Electrical submersible pump digital twin, the missing link for successful condition monitoring and failure prediction. Abu Dhabi, UAE
2. Mekaladevi V, Mohankumar N (2020) Real-time heart rate abnormality detection using ECG for vehicle safety. In: Third international conference on inventive systems and control (ICISC), pp 601–604
3. Kamala Nandhini S, Vallinayagam S, Harshitha H, Chandra Shekhar Azad V, Mohankumar N (2018) Delay-based reference free hardware trojan detection using virtual intelligence. In: Bhateja V, Nguyen B, Nguyen N, Satapathy S, Le DN (eds) Information systems design and intelligent applications. Advances in intelligent systems and computing, vol 672. Springer, Singapore
4. Yuvaraju EC, Rudresh LR, Saimurugan M (2020) Vibration signals based fault severity estimation of a shaft using machine learning techniques. Mater Today Proc 24(2):241–250. ISSN 2214-7853
5. Praveenkumar T, Sabhrish B, Saimurugan M, Ramachandran KI (2018) Pattern recognition based on-line vibration monitoring system for fault diagnosis of automobile gearbox. Measurement 114:233–242. ISSN 0263-2241
6. Abidi MH, Mohammed MK, Alkhalefah H (2022) Predictive maintenance planning for Industry 4.0 using machine learning for sustainable manufacturing. Sustainability 14(6):3387
7. Kamthamraju R (2021) Modeling an infusion pump. https://in.mathworks.com/videos/series/modeling-an-infusion-pump.html
8. Miller S (2019) Mathworks: predictive maintenance using a digital twin. https://www.mathworks.com/company/newsletters/articles/predictive-maintenance-using-a-digital-twin.html
9. Aivaliotis P, Georgoulias K, Chryssolouris G (2019) The use of digital twin for predictive maintenance in manufacturing. Int J Comput Integr Manuf
10. Goubeaud M, Grunert T, Lützenkirchen J, Joußen P, Ghorban F, Kummert A (2020) Introducing a new benchmarked dataset for mechanical stop detection of stepper motors. In: 27th IEEE international conference on electronics, circuits and systems (ICECS), pp 1–4
11. Abbate R, Caterino M, Fera M, Caputo F (2022) Maintenance digital twin using vibration data. Proc Comput Sci 200:546–555. ISSN 1877-0509
12. Taunk K, De S, Verma S, Swetapadma A (2020) A brief review of nearest neighbor algorithm for learning and classification. In: International conference on intelligent computing and control systems (ICCS), pp 1255–1260
13. Rado O, Neagu D (2019) On selection of optimal classifiers. In: Bramer M, Petridis M (eds) Artificial intelligence XXXVI. SGAI 2019. Lecture notes in computer science, vol 11927. Springer, Cham
14. Chaudhary A, Kolhe S, Kamal R (2016) An improved random forest classifier for multi-class classification. Inf Process Agric 3(4):215–222. ISSN 2214-3173
15. Kononenko I, Kukar M (2014) Statistical learning. In: Kononenko I, Kukar M (eds) Machine learning and data mining, chap 10. Woodhead Publishing, pp 259–274
16. Guo G, Wang H, Bell D, Bi Y, Greer K (2003) KNN model-based approach in classification. In: Meersman R, Tari Z, Schmidt DC (eds) On the move to meaningful internet systems
B. Sivathanu Kumar et al.
2003: CoopIS, DOA, and ODBASE. OTM 2003. Lecture notes in computer science, vol 2888. Springer, Berlin, Heidelberg
17. Vinothkanna R (2020) Design and analysis of motor control system for wireless automation. J Electron 2(03):162–167
18. Amanuel T, Ghirmay A, Ghebremeskel H, Ghebrehiwet R, Bahlibi W (2021) Comparative analysis of signal processing techniques for fault detection in three phase induction motor. J Electron 3(01):61–76
Impact of Pollutants on Temperature Change and Forecasting Temperature of US Cities Tanmaay Kankaria, Bandla Vaibhav Krishna, Duppanapudi Surya Teja, D. V. S. Dinesh Chandra Gupta Kolipakula, and R. Sujee
Abstract Planet Earth has endured extensive climatic changes throughout history. According to the 2018 IPCC report, human activities have accounted for 0.8 °C to 1.2 °C of global warming above pre-industrial levels. Since 1901, the global average temperature has risen at an average rate of roughly 0.09 °C per decade. Rising levels of air pollutants have altered the chemical composition of the atmosphere. However, current reports and theories only point out the relations between this temperature change and greenhouse gases. By exploiting pollution and temperature datasets in a machine learning model, we scrutinize the correlations among the principal pollutants causing a surge in temperature. Keywords Climate · Temperature · Pollution · Machine learning
1 Introduction Our project aims to evaluate the effect of pollution on temperature levels by finding the correlation between four major pollutants. Next, we predict the average temperature for a given city in the United States over a specified period. In addition, we find the cities in the United States that will experience a significant temperature change from 2013 to 2023. The U.S. pollution dataset covers four air pollutants: Nitrogen Dioxide (NO2 ), Ozone (O3 ), Sulphur Dioxide (SO2 ), and Carbon Monoxide (CO). Nitrogen dioxide is produced during combustion processes such as those used for power generation and transportation. Ozone is created when pollutants released from vehicles and factories react photochemically. Sulphur dioxide is usually a T. Kankaria (B) · B. V. Krishna · D. S. Teja · D. V. S. Dinesh Chandra Gupta Kolipakula · R. Sujee Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India e-mail: [email protected]; [email protected] R. Sujee e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_6
byproduct of fossil fuel combustion and volcanic activity. Carbon monoxide is a tasteless, odorless, and colorless toxic gas created by the partial combustion of carbonaceous fuels. As we analyze the air quality level in each category, we use only the Air Quality Index (AQI) as defined by the U.S. Environmental Protection Agency (EPA). The EPA also sets the National Ambient Air Quality Standards (NAAQS) to keep these pollutants in compliance with the Clean Air Act; the NAAQS specify the permissible concentrations of these pollutants in the air. Using Seaborn and Matplotlib, we find the associations between the pollutant attributes using a correlation plot. Since we have seasonal time-series data, we chose the SARIMA model to forecast temperature. For example, yesterday's average temperature is highly correlated with today's, so we use the regression parameter to forecast future temperatures. The time-series dataset is split into training, validation, and test sets. After training the model, we use the last 5 years' data for validation and testing. Accordingly, we extrapolate into the future and compare the forecast to the test set.
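The chronological split described above can be sketched as follows. The monthly index range and the synthetic temperature series are illustrative assumptions; the key point is that the split must preserve time order, with the last five years held back for validation and testing.

```python
# Sketch: chronological train/validation/test split for a monthly series.
# The series itself is synthetic (seasonal sine), used only to show the split.
import numpy as np
import pandas as pd

idx = pd.date_range("1950-01-01", "2013-12-01", freq="MS")  # month starts
temps = pd.Series(10 + 8 * np.sin(2 * np.pi * (idx.month - 1) / 12), index=idx)

holdout = 5 * 12          # last five years reserved
half = holdout // 2
train = temps.iloc[:-holdout]          # everything before the holdout
validation = temps.iloc[-holdout:-half]
test = temps.iloc[-half:]
print(len(train), len(validation), len(test))  # → 708 30 30
```

Because the rows are never shuffled, the validation and test sets always lie strictly after the training period, which is what the extrapolation-and-compare step requires.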
2 Literature Review The first article discusses how nanoparticles are being used to extract dye from water and carbon from the air, and are slowly proving useful for combating threats to Earth's health. It gives an insight into how nanoparticles can tackle the problem of toxic greenhouse gas emissions and how effective they can be for various compounds of these gases. However, Muralidharan [1] only mentions whether greenhouse gases have an impact on climate change. The second article discusses how the impact of air pollution varies between demographics based on biological and economic vulnerabilities and the type of climate threat. It focuses on targeting the causes of climate change, such as the effects of greenhouse gas releases. Patz and Madeleine [2] mention fruitful resources that can be used to spread awareness about climate change and the topics that must be developed and integrated into the core curriculum of public health physicians, nurses, and other related sectors' workforces. The third article conveys that pollutants such as black carbon, methane (CH4 ), tropospheric ozone (O3 ), and aerosols affect the amount of sunlight that reaches the Earth. As a result, the Earth's temperature is increasing, which in turn is causing the melting of glaciers and icebergs. This article considers several pollutants, such as carbon monoxide (CO) and sulphur dioxide (SO2 ), and analyzes the daily impact of each pollutant on human lives. It also discusses the exposure sources of each pollutant and how climate and air pollution are connected. However, Eljarrat [3] fails to find the correlation between these pollutants and temperature change. In the fourth article, the U.S. Global Change Research Program (USGCRP) presents the scientific understanding behind the impact of climate change on human health and also highlights social inconsistencies.
It provides a comprehensive quantitative estimation of observed health impacts in the US. Further, Holdren [4] informs public health officials, urban response planners, and other concerned personnel about
the risks that climate change presents by analyzing temperature- and water-related deaths. However, the report makes no policy recommendations for climate change mitigation or economic valuation, and we cannot gauge how large the impact of air quality has been on climate change. The fifth report states that climate change is responsible for altering five components of the environment: oceans, air, water, ecosystems, and weather. This report also aims to both acknowledge and alleviate the environmental effects of climate change in the US. The report is classified into eleven broad human health categories, such as asthma, cancer, and heat-related mortality. Each category has its own ties to climate change and identifies the basic and applied research needs, as well as cross-cutting issues. Christopher and Thigpen [5] analyze the likelihood of the occurrence of diseases related to increasing pollution. However, the report's approach does not identify the exact cause of the illnesses that affect humans; the health consequences are not exhaustive, and some of the research needs may be notional. While deciding which algorithms to implement for our forecasting model, we analyzed the following papers. Ananthakrishnan et al. [6] used a data-driven algorithm called Dynamic Mode Decomposition, which does not require any model training, to predict the temperature in a particular area. Nallakaruppan and Ilango [7] utilize a web system to process and analyze climate data, which is then fed into a regression model. Ashok and Prathibhamol [8] build a custom LSTM network to predict stock prices from historical data. To predict the monthly prices of areca nut, Sabu and Manoj Kumar [9] compare different machine learning models such as Holt-Winters' seasonal method, ARIMA, and LSTM. Dhanya [10] utilizes a roll-over technique that overlaps old information with new data during model training, thus increasing bitcoin price forecast accuracy.
The IoT air monitoring system implemented in Ramana [11] uses machine learning algorithms such as K-NN and Naive Bayes to check pollution levels in the air. The convolutional neural network built in Lai and Chen [12] integrates fog computing with an IoT architecture to handle issues of big-data processing and scalability. The ARIMA model proposed in Salmana and Kanigoro [13] implements a grid search to optimize the values of p, d, and q used in weather visibility forecasting. The hourly temperatures predicted in Hippert et al. [14] are the output of a hybrid forecasting system that merges multilayer neural networks and linear models. The ANN built in Pandey et al. [15] depends on several basic parameters, such as the type of pollutant and its concentration, to define air pollution control devices.
3 Design/Methodology/Approach Figure 1 shows the step-wise architecture diagram followed in our paper, from importing the datasets to forecasting temperature using the SARIMA model.
Fig. 1 Architecture diagram
3.1 Data Pre-processing and Cleaning The U.S. pollution dataset was documented by the United States Environmental Protection Agency and we downloaded it from Kaggle. This data has four air pollution categories: Nitrogen Dioxide (NO2 ), Ozone (O3 ), Sulphur Dioxide (SO2 ), and Carbon Monoxide (CO). It has a total of 28 attributes, namely: State Code, County Code, Site Num, Address, State, County, City, Date Local, NO2 Units, NO2 Mean, NO2 1st Max Value, NO2 1st Max Hour, NO2 AQI, O3 Units, O3 Mean, O3 1st Max Value, O3 1st Max Hour, O3 AQI, SO2 Units, SO2 Mean, SO2 1st Max Value, SO2 1st Max Hour, SO2 AQI, CO Units, CO Mean, CO 1st Max Value, CO 1st Max Hour, CO AQI. The Global Land Temperatures By Country and Global Land Temperatures By City time-series datasets were put together by Berkeley Earth and we again downloaded them from Kaggle. The country dataset contains the attributes date, AverageTemperature, AverageTemperatureUncertainty, and Country; the city dataset contains date, AverageTemperature, AverageTemperatureUncertainty, City, Country, Latitude, and Longitude. Through dimensionality reduction, we reduced the initial raw dataset to more manageable groups for processing. We removed unnecessary attributes, assigned the correct data types, and treated missing values in each attribute using mean, median, and mode imputation, by assigning a unique category, or by pruning the data points. We also removed duplicates and treated the outliers using the interquartile range, KDE plots, and scatter plots.
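The cleaning steps above can be sketched with pandas as follows. The toy frame and the use of `NO2 AQI` as the example column are assumptions for illustration; the paper applies the same operations (imputation, duplicate removal, IQR-based outlier pruning) across many attributes.

```python
# Sketch: median imputation, duplicate removal, and IQR outlier pruning.
import numpy as np
import pandas as pd

df = pd.DataFrame({"NO2 AQI": [12.0, 15.0, np.nan, 14.0, 14.0, 250.0]})

# Impute missing values with the column median (mean/mode are alternatives).
df["NO2 AQI"] = df["NO2 AQI"].fillna(df["NO2 AQI"].median())

# Drop exact duplicate rows.
df = df.drop_duplicates()

# Prune outliers outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = df["NO2 AQI"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["NO2 AQI"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
print(df["NO2 AQI"].tolist())  # → [12.0, 15.0, 14.0]
```

The 250.0 reading falls outside the IQR fence and is pruned, while the imputed row survives.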
Fig. 2 Top 10 polluters of SO2 by US State
3.2 Visualizing Patterns and Trends We plotted the US states that are the biggest polluters by mean AQI for CO, SO2 (Fig. 2), NO2 , and O3 , along with the changes in the level of pollution from 2000 to 2016. To do so, we first calculated the mean value of each pollutant for every state. After adding four subplots, i.e., one plot per pollutant, we sorted the AQI values in descending order for each pollutant and then visualized the patterns and trends using line and joint plots. Similarly, we also identified and tracked the trend of the average values of CO, SO2 , O3 , and NO2 from 2000 to 2016 in Fig. 3. The highest average pollution level for the years 2000–2016 was for ozone, i.e., between 40.51 AQI and 45.36 AQI. Next was NO2 (from 28.38 AQI to 37.99 AQI) and then SO2 (from 12.89 AQI to 18.55 AQI). The lowest level was for CO, between 6.44 AQI and 17.70 AQI. For all pollutants except ozone, AQI decreased from 2000 to 2016; for ozone, the 2016 AQI level is practically the same as in 2000. From Fig. 4, we can observe lower temperatures between November and February and higher temperatures between May and September.
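The per-state aggregation behind Fig. 2 can be sketched as below. The state rows and AQI values are illustrative, not the dataset's actual numbers; the pattern is a groupby-mean followed by a descending sort, which is then passed to the plotting subplots.

```python
# Sketch: mean SO2 AQI per state, sorted descending (the data behind Fig. 2).
import pandas as pd

df = pd.DataFrame({
    "State": ["Kentucky", "Kentucky", "Arizona", "Arizona", "Tennessee"],
    "SO2 AQI": [22.0, 18.0, 6.0, 8.0, 10.0],
})
mean_aqi = df.groupby("State")["SO2 AQI"].mean().sort_values(ascending=False)
print(mean_aqi.head(10))  # top-10 polluting states by mean SO2 AQI
```

The same one-liner, repeated for each of the four AQI columns, yields the four sorted series plotted in the subplots.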
3.3 Time-Series SARIMA Model SARIMA stands for Seasonal Autoregressive Integrated Moving Average. While analyzing the monthly temperature variations through the years, we identified that the series has some seasonality, i.e., low temperatures in winter and high in summer. The SARIMA model is capable of handling seasonality in the dataset. Our model is built on four parameters (S, p, d, q), as follows:
Fig. 3 Mean AQI of pollutants in the US
Fig. 4 Monthly temperature variations between 1900 and 2013 in the US
S—Denotes the seasonal period over which the p, d, and q terms are calculated.
AR (p)—Denotes the seasonal autoregression of the time-series. We used a value of two for p.
I (d)—Denotes the seasonal difference, used when the time-series has a strong and stable pattern. Since our time-series was stationary, this value was set to zero.
MA (q)—Adds a multiple of the past errors to the forecast. We used a value of two for q as well.
3.4 Evaluating and Optimizing the SARIMA Model The SARIMA model is evaluated against a moving-average baseline using the root mean square error (RMSE). The model takes a percentage of the errors between the predicted and the real values, assuming that past errors will be similar in future events. We tuned the hyperparameters of the SARIMA model to minimize the error between the predicted and the real values, as shown in Figs. 5 and 6. To optimize the forecasts, we created a baseline forecast in the validation set, which takes the previous month as the base forecast for the next month; our simulation has a smaller error than this baseline. To further optimize the time-series forecast, the series must be stationary (constant mean, variance, and autocorrelation). Using the adfuller function, since the p-value was lower than 5%, the series was considered stationary and the model's RMSE was reduced. In Fig. 7, the QQ plot of the residuals shows a normal pattern with few outliers. The errors lie between −0.5 and +0.5 as the temperature increases.
Fig. 5 The predictions fit well on the current values
Fig. 6 The linear distribution of error versus predicted values
Fig. 7 QQ plot of residuals
4 Major Research Findings 4.1 Correlation Between Pollution and Temperature Ozone has the highest average pollution level (45.36 AQI) for the years 2000 to 2016. It is followed by NO2 (36.11 AQI), then SO2 (18.55 AQI), and lastly CO (17.70 AQI). Tennessee is the biggest polluter of ozone and Kentucky of SO2 ; for NO2 it is Arizona, and for CO the District of Columbia. We analyzed the effect of pollution on temperature change in New York from 2013 to 2023. On visualizing the trend in NYC, we could see in Fig. 9 that pollution has not had a major effect on temperature change. Even though the average temperature has been increasing, the rise cannot be attributed to pollution in the case of NYC. This is confirmed by the correlation plot in Fig. 8: there is a negative, or inverse, relationship between pollution and temperature.
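The inverse relationship reported in Fig. 8 can be checked with Pearson's coefficient via pandas. The synthetic columns below stand in for NYC's monthly temperature and AQI series and are built with an inverse relation on purpose, purely to illustrate the computation.

```python
# Sketch: Pearson correlation between temperature and pollution columns.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
temp = pd.Series(rng.normal(12, 5, 60))
pollution = 50 - 1.5 * temp + rng.normal(0, 2, 60)  # inverse relation by design
df = pd.DataFrame({"AvgTemperature": temp, "NO2 AQI": pollution})

corr = df.corr(method="pearson")
print(corr.round(2))
```

A strongly negative off-diagonal entry (close to −1) is what a correlation heatmap renders as the inverse relationship described above.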
4.2 Forecasting Temperature in the US In the US, we could also confirm a constantly increasing trend in the average temperature: it rose from 8.5° to 9.5°, i.e., about 12%, over 110 years (1900–2011). The series showed that in the US, lower temperatures occur from November to February and higher temperatures from May to September. We implemented the SARIMA model city-wise, where the user can choose the city for which to forecast the temperature. Taking NYC as an example, we first checked whether the series is stationary; using the adfuller function, we found that the p-value is lower than 5% and thus the series is stationary. As shown in Figs. 10 and 11, we also plotted the month-wise trend through the years in NYC. When we forecasted the future temperature values for New York in
Fig. 8 Inverse co-relation between Pollution and temperature in New York City
Fig. 9 Line plot showing the relationship between pollution and temperature in NYC
Fig. 12, we could infer that the summers seem to be getting cooler and the winters warmer over the next decade. Using Plotly, we created an interactive map (Fig. 13) that lets the user hover over the data points to view the top-10 US cities that will face the most temperature change by 2023 (10 years from 2013).
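The ranking behind the map in Fig. 13 can be sketched as follows. The city names and forecast values are illustrative placeholders, not the study's actual results; the real frame, with latitude/longitude columns, would then be handed to Plotly (e.g. `plotly.express.scatter_geo`) to render the hoverable map.

```python
# Sketch: rank cities by forecast temperature change from 2013 to 2023.
import pandas as pd

forecasts = pd.DataFrame({
    "City": ["New York", "Boston", "Chicago", "Miami"],
    "Temp2013": [9.8, 9.1, 10.2, 24.5],   # illustrative values
    "Temp2023": [10.6, 10.0, 10.7, 24.8],
})
forecasts["Change"] = forecasts["Temp2023"] - forecasts["Temp2013"]
top = forecasts.nlargest(10, "Change")    # top-10 cities by predicted change
print(top[["City", "Change"]].round(2))
```

`nlargest` keeps the frame sorted by predicted change, so the plotting layer only has to attach hover labels and coordinates.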
Fig. 10 Cooler summers in NYC in 2023
Fig. 11 Warmer winters in NYC in 2023
Fig. 12 The predicted temperature in NYC in 2023
Fig. 13 US cities with the most temperature change
5 Practical Implications The practical implications of our findings are that we now know which pollutants we should focus on mitigating first, and whether certain pollutants actually impact temperature, contrary to common belief. Further, our study can help the top-10 US cities facing the most temperature change to focus on green measures to lower future temperatures. We recommend that governments utilize these findings and include them in official energy utilization guidelines.
6 Research Limitations/Implications Since time-series data related to climate is often seasonal, using the SARIMA model to forecast temperature is more precise and efficient than existing methods, which could enable more researchers globally to study this problem.
7 Originality/Value The originality and value of our work are shown in two ways. First, existing research focuses on finding the impact of only greenhouse gases/pollutants on temperature change, whereas we focused on four specific pollutants that are not greenhouse gases. Secondly, the SARIMA model's temperature forecasts factor in the seasonality of time-series datasets.
8 Conclusion and Future Research Work Our project explored climate change and its impact on the environment using data science and machine learning technologies. We exploited large pollution and temperature datasets with an emphasis on visualizing the mined data. After preprocessing the data, we found that the average temperature in the US has increased by 12%, indicating the need for imminent action to curb rising temperatures. The SARIMA time-series model forecasted that summers are getting cooler and winters warmer in the US. We measured the computational accuracy of this model by the Mean Squared Error, which was 4.678, and the Mean Absolute Error, which was 1.578. The predictions fit the current temperature values well. We also noticed that the east coast is going to experience the most temperature change over 10 years. Further, using Pearson's correlation coefficient, we analyzed the association between temperature and pollution levels. The next step would be to extend our work to other countries so that action points can be created to tackle the problem of climate change.
References
1. Muralidharan P (2020) Alleviating climate change and pollution with nanomaterials
2. Patz JA, Madeleine C (2018) Climate change and health: moving from theory to practice
3. Eljarrat E (2018) Environmental and health impacts of air pollution
4. Holdren JP (2016) The impacts of climate change on human health in the United States: a scientific assessment
5. Christopher J, Thigpen K (2010) A human health perspective on climate change
6. Ananthakrishnan S, Geetha P, Soman KP (2020) Temperature forecasting using dynamic mode decomposition
7. Nallakaruppan MK, Ilango HS (2017) Location aware climate sensing and real time data analysis
8. Ashok A, Prathibhamol CP (2021) Improved analysis of stock market prediction: (ARIMA-LSTM-SMP)
9. Sabu KM, Manoj Kumar TK (2020) Predictive analytics in agriculture: forecasting prices of arecanuts in Kerala
10. Dhanya NM (2020) An empirical evaluation of bitcoin price prediction using time series analysis and roll over
11. Ramana RM (2020) IoT based air and sound pollution monitoring system using machine learning algorithms
12. Lai K-L, Chen JIZ (2021) Development of smart cities with fog computing and internet of things
13. Salmana AG, Kanigoro B (2020) Visibility forecasting using autoregressive integrated moving average (ARIMA) models
14. Hippert HS, Pedreira CE, Souza RC (2000) Combining neural networks and ARIMA models for hourly temperature forecast
15. Pandey S, Srivastava A, Sharma AK, Srivastava JK (2014) Modeling of ambient air pollutants through artificial neural network in residential area of Ujjain City
Hybrid Precoding Schemes for mmWave Massive MIMO Systems—A Comprehensive Survey V. Baranidharan, K. P. Nithish Sriman, V. Sudhan Siddarth, P. Sudharsan, M. Krishnan, and A. B. Tharikaa Srinithi
Abstract The mmWave massive Multi Input Multi Output (MIMO) systems play a vital role in beyond-5G communications. The prevalence of massive MIMO systems requires support for additional transmitting and receiving antennas through the radio frequency chains, which increases complexity, elevates hardware costs, and reduces spectral efficiency. Hence, precoding methods have been developed to address these limitations. This research study reviews the different precoding methods recently reported in the literature and discusses the merits and demerits of precoding methods based on phase modulation arrays, principal component analysis, deep learning approaches, adaptive sub-connected structures, and precoder/combiners. The study also provides a detailed overview of their implementation, performance analysis, complexity analysis, and error and signal-to-noise ratio analysis. Finally, the study concludes by stating future research directions for massive MIMO precoding techniques. Keywords mmWave · MIMO · Precoding · Modulation array · Adaptive structures · Deep learning · Combiners
1 Introduction The demand for multimedia and related services has grown at a rapid pace. This technology has been implemented in various real-time applications such as healthcare, smart cities, smart energy meters and measuring equipment, and so on. To increase the capacity, data rates, effective spectrum utilization, and energy levels, these services will need improved communication equipment. To achieve all of these requirements, a beyond-5G technology based on massive Multi V. Baranidharan (B) · K. P. Nithish Sriman · V. Sudhan Siddarth · P. Sudharsan · M. Krishnan · A. B. Tharikaa Srinithi Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathy, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_7
Input and Multi Output (MIMO) concepts are required [1]. The primary objective of MIMO systems is to increase the number of antennas in order to attain high spectral efficiency, effective spectrum sharing, and connectivity. This technology enables the use of mmWave frequencies in the range of 30–300 GHz [2]. Different types of signals are received from various antennas at the same time. Researchers have employed many detection and precoding algorithms at the receiving end to identify the necessary data symbols while removing irrelevant signals and interference [3]. Massive MIMO technology employs a large number of antennas serving single- or multi-antenna terminals operating in the same frequency range. The number of base-station antennas is expected to always be greater than the number of antennas deployed at the user terminals of each cell. The receiver side requires more processing time to recognize and interpret the necessary data symbols [4]. This demand will also drive the development of complex precoding algorithms to address receiver-side constraints. The major goal of the proposed work is to focus on massive MIMO systems and analyze several existing precoding methods. It covers the theory underlying the recently developed precoding techniques. Furthermore, the complexity and simulated metrics are thoroughly discussed for each precoding technique. Section 2 describes the mmWave massive MIMO systems and their properties. Section 3 provides a detailed discussion of some recently proposed precoding techniques. Finally, in Sect. 4, the article concludes with a discussion of future research possibilities for precoding techniques in massive MIMO systems.
2 mmWave MIMO Systems: An Overview To establish connectivity with all the user terminals in mmWave massive MIMO systems, a large number of antennas are installed at every base station. In massive MIMO designs, all the targeted beams present in a given region are utilized to connect the terminal users [5] in order to gain more benefits, some of which are discussed in this section. Spectral efficiency: 5G mmWave massive MIMO systems are able to achieve higher spectral efficiency than other communication systems by exploiting a larger number of transmitting and receiving antennas and multiplexing their directive gain [6]. This effective utilization of the antennas directly connects each piece of user equipment with an individual downlink and uplink at the end-user terminals. This property improves spectral efficiency by about 10 times over conventional schemes. Energy efficiency: By steering the antenna-array beam, more signal power is delivered to the end terminals in massive MIMO systems. Such beams are often transmitted into small regions. When compared to the power radiated, these 5G communication technologies require less power. When a large number of antennas are used in massive
MIMO systems, the beams tend to move at the same speed, allowing them to be directed towards different users. Using such small antenna configurations in massive MIMO systems reduces transmission power to nearly one-thousandth of that of 4G systems [7]. Cost efficiency: Massive MIMO systems utilize low-power amplifiers, which are not used in 4G technology. Furthermore, 4G technology requires expensive equipment that is much larger than its 5G counterparts. Reliability: Massive MIMO systems have a high diversity gain due to the large number of antennas, which automatically increases link reliability and resilience against network interference and fading [8]. Robustness: In 4G systems, unintended interference and jamming can terminate the signals obtained from other sources. In 5G mmWave massive MIMO systems, however, the orthogonality of the channels at the receivers and the relatively narrow beams counter powerful signal hacking and eavesdropping attacks. Signal processing: Large antenna arrays are extensively employed in massive MIMO systems, wherein base stations are connected to a variety of receiving terminals or end users. Under various favorable propagation assumptions, the propagation matrix is asymptotically orthogonal, which eliminates interference effects such as fast fading, slow fading, uncorrelated noise, and thermal noise.
3 mmWave Massive MIMO Precoding Schemes Multi-antenna transmission systems generally support multi-stream transmission via beamforming, which is called precoding. Precoding serves several purposes in massive MIMO systems: decreasing route loss, mitigating interference, and enhancing throughput. Base stations can extract Channel State Information (CSI) from the uplink pilot signals provided with the received signals [9]. The base stations do not have accurate channel state information for the various user terminals, and the downlink performance of the base station depends on the estimated CSI. Further, the Base Station (BS) exploits precoding techniques to mitigate channel interference and increase spectral efficiency. This makes massive MIMO a system of high computational complexity, directly proportional to the number of antennas [10]. To overcome this, low-complexity precoders are widely used in massive MIMO systems.
V. Baranidharan et al.
3.1 Hybrid Precoding with Phase Modulation Array The hybrid mmWave precoding system consists of a base station equipped with NT antennas serving a mobile station with NR antennas, as in Fig. 1. NS independent data streams are transmitted from the NRF RF chains (i.e., NS ≤ NRF < NT ). Two different array structures are used in this hybrid precoding scheme with a phase modulation array [11]: fully connected hybrid precoding and sub-connected hybrid precoding. In fully connected hybrid precoding, every RF chain signal is fed to all NT antennas in order to obtain the full beamforming gain. In sub-connected hybrid precoding, every RF chain drives a subset of antennas with M = NT /NRF elements. The phase modulation array is used in both the fully and sub-connected hybrid precoding, where a switch is connected to delay lines and a complex programmable logic device controls the delay lines. The modulation time period TP adjusts the selected delay lines. An enhanced orthogonality constraint on the data transmission governs the proposed algorithm for the fully connected phase modulation precoding array. In the sub-connected phase modulation array-based hybrid precoding technique, the enhanced orthogonality constraint is no longer suitable; hence, a new optimization algorithm decomposes the formulated problem into two sub-problems, which are iterated multiple times to attain near-optimal solutions. The fully connected hybrid precoding algorithm outperforms the sub-connected model, and both outperform analog precoding. PMA-based hybrid precoding attains good spectral efficiency compared to existing systems. Zero-forcing algorithms calculate the spectral efficiency of the phase-shifter-based multi-user hybrid precoding method in mmWave massive MIMO systems.
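The fully connected and sub-connected structures differ only in the shape of the analog precoder F_RF. A minimal sketch follows; the sizes NT = 32, NRF = 4, NS = 2 and the random phases are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes (assumptions), respecting NS <= NRF < NT.
NT, NRF, NS = 32, 4, 2
M = NT // NRF  # antennas driven per RF chain in the sub-connected array

# Fully connected analog precoder: every entry is a unit-modulus phase shift.
F_RF_full = np.exp(1j * rng.uniform(0, 2 * np.pi, (NT, NRF)))

# Sub-connected analog precoder: block diagonal, each RF chain feeds only M antennas.
F_RF_sub = np.zeros((NT, NRF), dtype=complex)
for r in range(NRF):
    F_RF_sub[r * M:(r + 1) * M, r] = np.exp(1j * rng.uniform(0, 2 * np.pi, M))

# Digital baseband precoder maps NS streams onto NRF chains.
F_BB = rng.standard_normal((NRF, NS)) + 1j * rng.standard_normal((NRF, NS))

# Overall precoder, normalized so that ||F_RF F_BB||_F^2 = NS.
F = F_RF_full @ F_BB
F *= np.sqrt(NS) / np.linalg.norm(F)
print(F.shape)  # (32, 2)
```

The block-diagonal F_RF_sub uses far fewer phase shifters than the dense F_RF_full, which is the hardware-cost trade-off discussed above.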
In the multi-user mmWave MIMO system, PMA-based hybrid precoding can attain good spectral efficiency by using phase shifters. Fig. 1 Hybrid precoding with phase modulation array. Source [11, p. 2]
Hybrid Precoding Schemes for mmWave Massive MIMO …
3.2 Principal Component Analysis Based Hybrid Precoding Principal Component Analysis (PCA) is widely used to reduce the dimensionality of mmWave massive MIMO systems, processing large datasets and converting a large set of variables into a smaller set, as in Fig. 2. The PCA-based hybrid precoder/combiner design investigates both the fully connected and partially connected subarray structures. To improve the spectral efficiency of an adaptive subarray, a simple hierarchical clustering algorithm is used for effective optimization. This study uses a downlink mmWave massive MIMO system with a standard end-to-end synchronization model, considering Uniform Planar Array (UPA) schemes deployed at both the base station and the user. UPA assists in combating the frequency-selective fading channels of the OFDM-adopted antenna [12]. NT = NTV × NTH is the number of antennas equipped at the base station (BS), and NT_RF […] (> 0) is dependent on the task's computational complexity. Drone dr0 requests that adjacent drones dri that can act as fog nodes work together to accomplish Task0 . These
Blockchain-Based Remote Construction Monitoring Using UAV …
neighboring drones, designated by a set D = {dr1 , dr2 , ..., drN }, are equipped with storage and navigation. The CPU frequency of drone dr0 is denoted by F0 . In a similar vein, the frequencies of nearby drones are given by a set freq = {F1 , F2 , ..., FN }. The coordinate of drone dr0 is (X0 , Y0 , Z0 ), and the three-dimensional coordinates of the drones available nearby are C = {(X1 , Y1 , Z1 ), (X2 , Y2 , Z2 ), ..., (XN , YN , ZN )}. Our drones' signal route is as follows: Original signal → transmitter amplifier gain (AG) → feedline loss → TX antenna gain (Gain) → path loss from travel between antennas (PL) → RX antenna gain (Rx) → preamp → Receiver. • The transmitter amplifier in a drone uses radio waves to send commands wirelessly to the radio receiver attached to the remotely controlled drone. • Feedline loss is the loss of power or voltage of a transmitted wave or current as it travels through a transmission line, path, or circuit device. • Antenna gain is the ability of an antenna to radiate more or less in a given direction compared to a theoretical antenna. • Path loss is the decrease in power density of any electromagnetic wave as it travels through space. • The RX antenna gain measures how well the antenna converts radio signals arriving from one direction into electrical power. • The volume control is always included in the preamp. • The receiver on a drone is an electronic device that receives radio signals from the drone controller via a built-in antenna. The max-range equation is AG × Gain1 × Gain2 × PL = Rx. According to the optimal work allocation scheme, drones dr0 and dri coordinate to perform cloud computing in order to finish Task0 jointly. Note that the low-cost drones' flying speeds are quite slow, and their relative positions are generally fixed rather than continually changing.
Meanwhile, the activities we analyzed are latency-sensitive and are generally processed in a very short amount of time. As a result, during the incredibly brief period between task start and task completion, the status of the entire drone swarm remains unchanged. The flowchart for drone classification and construction-monitoring task allocation is shown in Fig. 3.
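The multiplicative max-range relation above is usually evaluated in decibels in practice. A minimal link-budget sketch, in which every numeric value (power, gains, losses, frequency, distance) is an illustrative assumption rather than a figure from the paper:

```python
import math

def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB at the given distance and carrier frequency."""
    c = 3e8  # speed of light, m/s
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / c)

# Illustrative link-budget values (assumptions).
tx_power_dbm = 20.0      # transmitter amplifier output (AG)
feedline_loss_db = 1.5   # loss between amplifier and TX antenna
tx_gain_dbi = 3.0        # TX antenna gain (Gain1)
rx_gain_dbi = 3.0        # RX antenna gain (Gain2)

# Received power for a 2.4 GHz control link at 500 m.
path_loss_db = fspl_db(500.0, 2.4e9)
rx_power_dbm = (tx_power_dbm - feedline_loss_db + tx_gain_dbi
                - path_loss_db + rx_gain_dbi)
print(round(path_loss_db, 1), round(rx_power_dbm, 1))  # 94.0 -69.5
```

Working in dB turns the multiplicative chain of gains and losses into a simple sum, which makes it easy to check whether the received power stays above the receiver's sensitivity at a given range.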
L. S. Beevi et al.
Fig. 3 Flowchart for drone classification and task allocation
4 Results and Discussion The use of computer vision techniques in interior situations is fraught with difficulties. For UAV-based progress tracking, the development of dependable and low-cost vision techniques is crucial, since it ensures timely and precise measurement of construction status, allowing for situational awareness. This paper developed a set of algorithms that can automatically recognize the components of interior partitions still under construction; the overall progress of the partition is also automatically identified with an accuracy rate above 95 percent. Indoor construction progress monitoring is shown in Fig. 4. In Fig. 5, we illustrate the execution of the remotely monitored site in an ArduPilot SITL simulation; the simulated image explores the possibility of monitoring a remote construction site without the need for physical investigation. In this scenario, we have modeled the influence of a mobile UAV in ground-to-network and network-to-ground communications with combined network functionality. The scenario can be extended to multi-UAV communications, multi-network environments, and integrated IoT applications. The code snippet for the simulation is given below. sim_vehicle.py -v ArduPlane --console --map // -v chooses the kind of vehicle.
Fig. 4 Construction progress indoor monitoring
Fig. 5 Construction drone Ardupilot SITL simulation with XBLink
sim_vehicle.py -v ArduPlane -f quadplane --console --map // -f changes the frame type; available frames include plane, copter, rover.
cd ArduCopter
sim_vehicle.py -L Ballarat --console --map // setting the vehicle start location.
sim_vehicle.py -v ArduPlane --console --map --osd // simulating the on-board OSD; does not allow multiple screens.
param load ../Tools/autotest/default_params/copter.parm // loading the copter parameters; change the parameter file for plane or rover.
The Copter is capable of performing the complete spectrum of flight requirements, including fully autonomous complex missions that can be planned through a multitude of software ground stations, fast FPV racing, smooth aerial photography, and more. The complete package is designed to be secure, full of features, flexible enough for unique applications, and increasingly easy even for beginners to use. Additionally, Copter supports unusual frame types such as Single- and Coax-Copters.
param save ./myparams.parm // save the parameters.
// Setting the GPS location.
param set EK3_SRC1_YAW 2
param set GPS_AUTO_CONFIG 0
param set GPS_TYPE 17
param set GPS_TYPE2 18
param set GPS_POS1_Y -0.2
param set GPS_POS2_Y 0.2
param set SIM_GPS_POS_Y -0.2
param set SIM_GPS2_POS_Y 0.2
param set SIM_GPS2_DISABLE 0
param set SIM_GPS2_HDG 1
status GPS_RAW_INT // then reboot SITL.
The MOUNT_STATUS signals from SITL begin to be sent as a result of gimbal simulation. Instead of real measured data, these signals carry the orientation determined by the most recent orders given to the gimbal. The genuine gimbal position may well not correspond as a result, for example, if a command is disregarded or the gimbal is manually moved. File format does not show changes.
param set RNGFND_LANDING 1 // rangefinder with a 50 m peak value.
You must restart SITL after performing the aforementioned adjustments.
graph RC_CHANNELS.chan3_raw // graphing vehicle state.
Most simulator backends allow users to alter their running pace; SIM_SPEEDUP should only be adjusted when necessary: 1 indicates standard wall-clock time, 5 represents five times real time, and 0.1 represents 1/10th of real time.
servo set 5 1250
Unresolved issues with UAV flights, as well as some suggestions for how to solve them, are shown in Table 1.
Table 1 Unresolved issues with UAV flights, as well as some suggestions for how to solve them

Issue: Shadows and reflections abound during daylight hours, so the camera only observes the parts of the home that are not obscured by shadows and sunlight
Suggested solutions: Control the lighting. X-ray imaging with backscatter

Issue: Saturation and vignette susceptibility of the camera's detectors
Suggested solutions: Create cameras tailored to the needs of unmanned aerial vehicles (UAVs). Instead of retractable lenses, micro four-thirds cameras have fixed interchangeable lenses

Issue: UAVs have a limited battery life, especially in wide fields
Suggested solutions: Develop additional solar-powered methods. Adopt a group of UAVs that will work together. Develop wireless charging technologies. Develop energy-efficient UAVs

Issue: In multispectral and hyperspectral cameras, several lenses generate band-to-band offsets
Suggested solutions: Methods for radiometric, atmospheric, and geometric corrections should be improved

Issue: Radiometric, atmospheric, and geometric correction methods should all be improved
Suggested solutions: To create exact digital landscape models, combine structure from motion (SfM) with ground control points (GCPs)
4.1 A Comparison of a UAV Drone with a Satellite While drones often require the assistance of a pilot, satellites are completely self-contained. Once in orbit, satellites rotate around the Earth, using sophisticated lenses and sensors to picture the planet and transmit data to the ground station. "Drone" is the umbrella colloquial name used by the FAA for all remotely piloted aircraft. This industry term was coined by the FAA to designate any aircraft that does not have a pilot on board, regardless of size, form, or capability. Several interchangeable terms (UAS (Unmanned Aircraft System), RPA (Remotely Piloted Aircraft System), or UAV) are used under this umbrella term. The comparison criteria include the physical characteristics of the building, its location, the initial financial investment, potential benefits, interest in acquiring new abilities, the learning curve, and the data transfer rate. When comparing the UAV drone to the satellite in terms of data privacy and security, the UAV is the clear winner (Fig. 6).
5 Conclusion and Future Work Many factors influence productivity in the construction sector. When it comes to high performance and productivity, collaboration is crucial. Collaboration techniques have not always been ideal, and there are a few factors that have a significant impact on the process, resulting in bad collaboration. People in the sector are hesitant to collaborate
Fig. 6 Quality-wise comparison between UAV drone and satellite
for a variety of reasons, including a lack of trust or human mistake. In the construction business, UAV tools may be utilized to improve collaboration and coordination, with visualization being the most effective mode of communication because people comprehend images. Producing 4D models, cross-sections, and 3D visualizations, among other things, aids in improving communication within a company. From the on-site personnel to the client, UAV can increase communication for all parties inside a business. Aside from creating visual assistance, level 2 UAV also necessitates a shared data environment in which all data is stored, ensuring that all parties have access to all data, and once changed, all updates are available, minimizing translation loss by allowing all teams to see changes immediately. Experts in the field of unmanned aerial vehicles (UAVs) recommend using the Building Information Model because it improves collaboration and coordination within their organization.
References 1. Li T et al (2019) Lightweight security authentication mechanism towards UAV networks. In: Proceedings of 2019 international conference on networking and network applications, Daegu, South Korea, pp 379–384. https://doi.org/10.1109/NaNA.2019.00072 2. Atoev S et al (2019) The secure UAV communication link based on OTP encryption technique. In: Proceedings of 2019 11th international conference on ubiquitous and future networks, Zagreb, Croatia. https://doi.org/10.1109/ICUFN.2019.8806165 3. Lin C, He D, Kumar N, Choo K-KR, Vinel A, Huang X (2018) Security and privacy for the internet of drones: challenges and solutions. IEEE Commun Mag 56(1):64–69 4. Reyna A, Martín C, Chen J, Soler E, Díaz M (2018) On blockchain and its integration with IoT: challenges and opportunities. Future Gener Comput Syst 88:173–190 5. Kapitonov A, Lonshakov S, Krupenkin A, Berman I (2017) Blockchain-based protocol of autonomous business activity for multi-agent systems consisting of UAVs. In: 2017 Workshop on research, education and development of unmanned aerial systems (RED-UAS). IEEE, pp 84–89 6. Liang X, Zhao J, Shetty S, Li D (2017) Towards data assurance and resilience in IoT using blockchain. In: Military communications conference (MILCOM), MILCOM 2017. IEEE, pp 261–266 7. Gharibi M, Boutaba R, Waslander SL (2016) Internet of drones. IEEE Access 4:1148–1162
8. Hall RJ (2016) An internet of drones. IEEE Internet Comput 20(3):68–73 9. Alladi T, Chamola V, Sahu N, Guizani M (2020) Applications of blockchain in unmanned aerial vehicles: a review. Veh Commun 100249. https://doi.org/10.1016/j.vehcom.2020.100249 10. Garg S et al (2020) Secure and lightweight authentication scheme for smart metering infrastructure in smart grid. IEEE Trans Ind Inf 16(5):3548–3557 11. Miao Y et al (2020) Smart micro-GaS: a cognitive micro natural gas industrial ecosystem based on mixed blockchain and edge computing. IEEE IoT J. https://doi.org/10.1109/JIOT.2020.3029138 12. Garg S et al (2019) SDN based secure and privacy-preserving scheme for vehicular networks: a 5G perspective. IEEE Trans Veh Technol 68(9):8421–8434 13. Garg S et al (2019) A hybrid deep learning based model for anomaly detection in cloud datacentre networks. IEEE Trans Netw Serv Manage 16(3):924–935 14. Aggarwal S et al (2019) A new secure data dissemination model in internet of drones. In: Proceedings of IEEE ICC 2019, Shanghai, China. https://doi.org/10.1109/ICC.2019.8761372 15. Zuev A, Karaman D (2018) Practical application of the graphic processing unit for data encryption on the UAV on-board computer. In: Proceedings of 2018 international scientific-practical conference on problems of infocommunications, science and technology, Kharkiv, Ukraine, pp 765–770. https://doi.org/10.1109/INFOCOMMST.2018.8632091 16. Liu H et al (2018) Opportunistic relaying for low-altitude UAV swarm secure communications with multiple eavesdroppers. J Commun Netw 20(5):496–508. https://doi.org/10.1109/JCN.2018.000074 17. Lee I, Lee K (2015) The internet of things (IoT): applications, investments, and challenges for enterprises. Bus Horiz 58(4):431–440 18. Irizarry J, Gheisari M, Walker BN (2012) Usability assessment of drone technology as safety inspection tools. J Inf Technol Constr (ITcon) 17(12):194–212 19.
Gai K, Wu Y, Zhu L, Choo K-KR, Xiao B (2021) Blockchain-enabled trustworthy group communications in UAV networks. IEEE Trans Intell Transp Syst 22(7):4118–4130. https://doi.org/10.1109/TITS.2020.3015862 20. Sivaganesan D (2020) Wireless UAV rotary wing communication with ground nodes using successive convex approximation and energy saving mode. IRO J Sustain Wirel Syst 2(2):100–106. https://doi.org/10.36548/jsws.2020.2.006 21. Shakya S (2019) Efficient security and privacy mechanism for block chain application. J Inf Technol 1(02):58–67 22. Meyer T, Brunn A, Stilla U (2022) Change detection for indoor construction progress monitoring based on BIM, point clouds and uncertainties. Autom Constr 141:104442. https://doi.org/10.1016/j.autcon.2022.104442
Papaya Diseases Detection Using GLCM Feature Extraction and Hyperparameter Tuning of Machine Learning Approach Snehal J. Banarase and S. D. Shirbahadurkar
Abstract Agriculture is an important and growing sector for researchers. Detecting diseases among crops at an early stage is an essential step in protecting the crop and conserving food. Papaya is a fruit with high nutritional and medicinal value. An innovative approach is presented here to identify diseases of papaya fruit and leaf using a machine learning model that combines grey level co-occurrence matrix (GLCM) feature extraction with hyperparameter tuning of machine learning classifiers. A total of 8 classes of healthy and diseased papaya leaf and fruit are considered for disease identification and classification. A total of 16 features are extracted using GLCM, and machine learning algorithms are used for classification. Hyperparameter tuning with Support Vector Machine (SVM) and Random Forest (RF) gives more precise results than the rest of the machine learning classifiers, and the overall accuracy found is better than existing methods. For papaya leaf and fruit disease detection, SVM with hyperparameter tuning gives 91.47% accuracy and RF provides an accuracy of 90.22%. Performance is analyzed with cross validation and different statistical parameters. Keywords Papaya diseases classification · Grey level co-occurrence matrix feature extraction · Machine learning · Hyperparameter tuning
1 Introduction The agricultural field is the biggest economic source for Indians. It is a backbone for the Indian economy. Nowadays this field has a wide scope for new researchers, where day by day new things are discovered to help farmers. But it’s very difficult to reach actually to the farmers because of illiteracy among people about upcoming technologies. Unfortunately, most of the farmers are not aware of this technology which can be utilized in a real field to minimize their efforts for the number of applications. While doing survey, it is found that most of the research is done in the S. J. Banarase (B) · S. D. Shirbahadurkar ZCOER, SPPU, Pune, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_12
S. J. Banarase and S. D. Shirbahadurkar
horticulture field, food grains, cash crops, etc. As per the research review, it is found that the research in the floricultural field is less as compared to the other field. So, the proposed system deals with the detection of diseases in the floricultural field which is focused on ‘Papaya’. Here we considered papaya leaf and papaya fruit diseases for detection purpose. Papaya is a fruit with a high nutritional value and medicinal value. The proposed system detects diseases in papaya with the help of different machine learning techniques. Plant diseases may occur on any region of the plant, i.e., leaf, fruit, flower, stem, branch, roots, etc. The traditional method for plant diseases detection is done with the naked eye only. Most of the people think that there is no need to take expert’s suggestions for plant diseases recognition. Because of this farmers can’t identify various diseases like fungal, bacterial, and viral diseases at their early stage of infection. Identification of diseases at their initial stage is very much important if it is not detected then it may hamper the annual production which ultimately results in food loss as well as high economical loss. The objective of this research is to identify diseases in papaya leaf and fruit using GLCM features [5] at the initial stage only. To characterize the texture of an image, GLCM functions extract statistical measures from the matrix. For this purpose machine learning approach is proposed which deals with the different machine learning (ML) algorithms like K-Nearest Neighbor (KNN), Logistic Regression (LR), Decision tree (DT) [1–5], SVM, RF, and Naïve Bayes (NB). To validate the result K-fold cross validation [12] is used for diseases detection in papaya leaf. Hyperparantuning is performed to achieve maximum accuracy for papaya diseases detection. The r paest of the paper is described as follows. Section 2 represents the related work. Section 3 describes the proposed methodology. 
Section 4 provides results and discussion. Section 5 discusses the conclusion and future scope.
2 Related Work The main purpose of this review is to present different machine learning approaches focused on different but closely related objectives that efficiently identify commonly occurring fruit diseases. This research mainly addresses a multiclass classification problem, where eight different classes of papaya leaf and fruit are considered for detection. The review found that relatively little research has been done on papaya disease detection and classification, and very few studies address multiclass problem resolution. Some researchers have detected the maturity status of fruit, and often only a few classes of diseased leaves or diseased fruit are considered for evaluation. In 2018, Shima Ramesh et al. [1] proposed different machine learning algorithms to determine the healthy or diseased state of a leaf. Their algorithm's goal is to detect anomalies in greenhouse plants or their natural environment. To avoid occlusion, a plain background is used. 160 images of papaya leaves are used to train the model with a random forest classifier, which achieved a classification accuracy of about 70%. To improve accuracy, a large
Papaya Diseases Detection Using GLCM Feature Extraction …
number of images and other local and global features are used. Histogram of Oriented Gradients (HOG) is used for feature extraction. In 2020, [2] compared some methods for papaya illness identification and classified papayas by their disorders using an intelligent system. Diagnosing papaya disease entails two key challenges: disease detection and disease classification. The suggested system provides an online papaya disease detection model with a mobile app and compares the accuracy of several machine learning algorithms. This intelligent technology is capable of quickly detecting diseases, with a high accuracy of 98.4% in predicting papaya disorders. In 2020, Behara et al. [3] detected the maturity status of papaya fruit using transfer learning and an ML approach. Three stages of papaya fruit are considered: immature, primary mature, and mature. GLCM, HOG, and LBP feature extraction methods are used in the machine learning approach, with Naïve Bayes, KNN, and SVM as classifiers. Using HOG and KNN, the machine learning approach achieved 100% accuracy. A total of 300 images are used, 100 per class, for experimentation. GoogleNet, AlexNet, ResNet101, ResNet50, ResNet18, VGG16, and VGG19 are used in the transfer learning approach; of these, VGG19 performs best with an accuracy of 100%. In 2018, Habib et al. [4] presented machine-vision-based papaya disease recognition. Decision tree, Naïve Bayes, and SVM are used for disease recognition, and KNN is used for segmentation. To address the problems of farmers who are far away, they presented an online model. Diseases such as black spot, powdery mildew, anthracnose, phytophthora blight, and brown spot are considered for recognition. Using SVM, 90.15% accuracy is achieved. In 2020, Mall et al.
[5] worked on medical X-ray image classification based on the GLCM feature extraction method and a machine learning approach. Four main stages are considered for the model, with which the GLCM features are extracted, and several machine learning algorithms are considered for evaluating the system. In 2019, Rauf et al. [6] presented citrus disease recognition using a machine learning approach, with a dataset of healthy and diseased images. In 2020, Sari et al. [7] described fuzzy-based papaya disease detection; classification is performed with fuzzy NBC, and the system achieved an accuracy of 80.5%. Rath et al. in 2020 [8] provided a quick overview of strategies offered in research publications from 2010 to 2019, with a focus on state-of-the-art development. Different strategies for fruit identification, classification, and grading are compared in the associated papers, and existing successes, constraints, and research ideas for the future are discussed. A state-wise fruit production report and a production survey for different fruits and vegetables in India are also presented. In 2022, Krishnan et al. [9] detected banana leaf illnesses using MATLAB. The classification step follows the image acquisition, preprocessing, and segmentation phases, and is used to further classify the disorders. TGVFCMS (total generalized variation fuzzy C-means) is used for segmentation, and then a CNN approach classifies diseases based on the segmented pictures. Experiments are conducted on the banana images and validated using accuracy, sensitivity, and specificity. According to the trial
findings, the CNN obtained an accuracy of 93.45%, whereas conventional techniques only reached 75–85% accuracy. In 2020, Iqbal et al. [10] developed potato disease detection using image segmentation and machine learning. 450 potato images are fed to the framework, divided into 80% training and 20% testing. For classification, seven classifier models are utilized; of all the machine learning algorithms, random forest gives the highest accuracy of 97% on the testing dataset. Three classes of potato leaf are considered. To better understand the framework's performance and to build an automatic system, a tenfold cross-validation strategy is used. In 2021, Snehal et al. [11] reviewed different ML and DL techniques used for fruit plant leaf disease detection systems. The survey is based on the floricultural field, referring to more than 20 papers from journals indexed in Springer, Web of Science, and Scopus from 2017 to 2020. The different classifiers from the machine learning and deep learning domains are summarized, along with the various feature extraction and segmentation techniques used. In 2021, Sharifah et al. [12] developed a citrus disease identification model for real-field conditions. A CNN is used to classify the diseases based on their symptoms; the developed model gives a detection accuracy of 94.37% and a precision of 95.8%, validated with a fivefold cross-validation strategy. In 2021, [13] proposed convolutional neural networks and k-means clustering to recognize and detect license plates on vehicles. In 2019, [14] summarized the pros and cons of plant disease detection with computer vision approaches; problems of overfitting or overtraining occurred with NB, SVM, and GA. In 2018, [15] presented a deep CNN with conventional classifiers like random forest, SVM, and AlexNet.
Cucumber diseases are identified from leaf symptom images with a DCNN, though accuracy degrades for unbalanced data. In 2019, [16] used global pooling dilated convolutions along with AlexNet and found a higher recognition rate and learning rate for cucumber leaf diseases, although the results show lesion detection failures. A model proposed in 2019 for apple leaf disease detection [17] achieved an accuracy of 78.80%; low recognition accuracy and lesion problems were found for Alternaria leaf spot and grey spot.
3 Proposed Methodology This section explains the proposed methodology. Figure 1 shows the workflow of the proposed system for papaya leaf and fruit disease detection. The first and most important step is data acquisition, followed by preprocessing. After that, GLCM feature extraction is carried out, covering a total of 16 texture features. Recognition and classification are performed by various machine learning algorithms, i.e., KNN, SVM, RF, LR, and DT. For performance analysis, k-fold cross validation is used, along with different parameters such as the ROC curve, accuracy, precision, and F1 score.
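The classification stage of this pipeline can be sketched with scikit-learn. The synthetic 16-feature data, the SVM parameter grid, and the 70/30 split below are assumptions standing in for the real GLCM features and the authors' exact tuning ranges:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler
from sklearn.svm import SVC

# Stand-in for the real data: ~400 samples, 16 GLCM features, 8 classes.
X, y = make_classification(n_samples=400, n_features=16, n_informative=10,
                           n_classes=8, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=0)

# Hyperparameter tuning of an SVM via grid search with 5-fold cross validation.
pipe = make_pipeline(MaxAbsScaler(), SVC())
grid = GridSearchCV(pipe,
                    param_grid={"svc__C": [1, 10, 100],
                                "svc__gamma": ["scale", 0.01, 0.1]},
                    cv=5)
grid.fit(X_tr, y_tr)
print(grid.best_params_, round(grid.score(X_te, y_te), 3))
```

GridSearchCV performs the k-fold cross validation internally, so the held-out 30% is touched only once, for the final accuracy estimate.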
Fig. 1 Proposed methodology
3.1 Data Acquisition and Pre-processing A total of 8 classes of healthy and diseased papaya leaf and fruit are considered for disease identification and classification, as shown in Table 1. A total of 401 images are used for the healthy and diseased papaya categories, of which 287 are papaya leaf images and 114 are papaya fruit images. The dataset referred to is the Kaggle papaya dataset. The training and testing split is 70% and 30%, respectively. Images are usually classified on the basis of different features such as texture, shape, and color. For plants, leaf texture is identified as the most suitable feature, as it includes several textural properties. Preprocessing is an important step whenever we are dealing with images, since distortion can be minimized by preprocessing; the performance of the disease detection system varies with the background of the images and their capturing conditions. Label encoding, a very important machine learning step for structured datasets, converts labels into a machine-readable numeric form. Sample dataset images are shown in Fig. 2. The datasets referred to are: 1. Queensland govt. dept. of agricultural and fisheries. 2. https://www.sciencedirect.com/science/article [4]. 3. Referred dataset: https://data.mendeley.com/datasets/7dxg9n2t6w/1. Table 1 Dataset description against each of its disease class
S. N. | Disease | No. of images
1 | Fresh papaya leaf | 64
2 | Papaya black spot | 54
3 | Papaya leaf curl | 33
4 | Papaya ring spot | 72
5 | Powdery mildew | 64
6 | Fruit black spot | 24
7 | Fruit ring spot | 49
8 | Fruit powdery mildew | 41
Total images | 401
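The label-encoding step described above can be sketched with scikit-learn's LabelEncoder, applied to the eight class names of Table 1:

```python
from sklearn.preprocessing import LabelEncoder

# The eight class labels from Table 1.
classes = ["Fresh papaya leaf", "Papaya black spot", "Papaya leaf curl",
           "Papaya ring spot", "Powdery mildew", "Fruit black spot",
           "Fruit ring spot", "Fruit powdery mildew"]

le = LabelEncoder()
y = le.fit_transform(classes)  # each class name becomes an integer 0..7
print(dict(zip(classes, y.tolist())))
```

The encoding is reversible with `le.inverse_transform`, so predictions can be mapped back to the original class names for reporting.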
Fig. 2 Sample dataset images: (a) fresh papaya leaf, (b) papaya black spot, (c) papaya leaf curl, (d) papaya ring spot, (e) powdery mildew, (f) fruit black spot, (g) fruit ring spot, (h) fruit powdery mildew
3.2 GLCM Computation for Feature Extraction
For feature extraction, the RGB images are first converted into grayscale images, after which the GLCM computation is carried out. GLCM is also widely used for preprocessing remote sensing data. It is a second-order statistical measure that gives the relation of grey levels in neighboring pixels of an image; the matrix is built by counting the frequency of pairs of pixel values occurring at a predetermined offset and orientation. A total of 16 GLCM features are taken into consideration, such as contrast, contrast value, contrast patch (area), and its correlation; similarly, for other parameters like correlation, energy, entropy, dissimilarity, etc., their correlations, patches, and locations are considered. For feature scaling, different methods are available, namely the standard scaler, robust scaler, min-max scaler, and max-abs scaler. Out of these, we have performed feature scaling with the max-abs scaler from the sklearn library (MaxAbsScaler), which scales each feature individually by its maximum absolute value; this estimator does not shift or center the data, so sparsity is preserved. The number of grey levels plays an important role in GLCM computation: the maximum grey value of a pixel determines the GLCM's size, and additional levels improve the precision of the recovered textural information. The Gray-Level Co-occurrence Matrix can be defined as:
P(i, j | x, y) = Σ_{n=0}^{n−y} Σ_{m=0}^{m−x} A   (1)
where A = 1 if f(m, n) = i and f(m + x, n + y) = j, and A = 0 elsewhere; P(i, j | x, y) is the relative frequency, (i, j) is the pair of intensities, and (x, y) is the pixel distance within a given neighborhood. The basic GLCM textural features evaluated in this study are indicated below.

1. Contrast (or Inertia):

Contrast = Σ_{i,j=0}^{N−1} P(i, j)(i − j)²   (2)

2. Homogeneity:

Homogeneity = Σ_{i,j=0}^{N−1} P(i, j) / (1 + (i − j)²)   (3)

3. Dissimilarity:

Dissimilarity = Σ_{i,j=0}^{N−1} |i − j| P(i, j)   (4)

4. Energy:

Energy = Σ_{i,j=0}^{N−1} (P(i, j))²   (5)

5. Entropy:

Entropy = −Σ_{i,j=0}^{N−1} P(i, j) log(P(i, j))   (6)

6. Correlation:

Correlation = Σ_{i,j=0}^{N−1} (i − μᵢ)(j − μⱼ) P(i, j) / (σᵢ σⱼ)   (7)

where i and j are elements of the symmetrical GLCM, μᵢ and μⱼ are the GLCM means, N is the number of gray levels in the image, and σᵢ and σⱼ are the standard deviations of the intensities of all reference and neighbor pixels.
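Eq. (1) and the features of Eqs. (2)–(7) can be computed directly in NumPy. This is an illustrative sketch, not the authors' code; scikit-image's graycomatrix/graycoprops provide equivalent quantities:

```python
import numpy as np

def glcm(img, dx, dy, levels):
    """Gray-Level Co-occurrence Matrix per Eq. (1): count pairs where
    f(m, n) = i and f(m + dx, n + dy) = j, then normalize to frequencies."""
    P = np.zeros((levels, levels), dtype=float)
    h, w = img.shape
    for n in range(h - dy):
        for m in range(w - dx):
            P[img[n, m], img[n + dy, m + dx]] += 1
    return P / P.sum()

def glcm_features(P):
    """Texture features of Eqs. (2)-(7) from a normalized GLCM."""
    N = P.shape[0]
    i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    feats = {
        "contrast": np.sum(P * (i - j) ** 2),                 # Eq. (2)
        "homogeneity": np.sum(P / (1 + (i - j) ** 2)),        # Eq. (3)
        "dissimilarity": np.sum(np.abs(i - j) * P),           # Eq. (4)
        "energy": np.sum(P ** 2),                             # Eq. (5)
        "entropy": -np.sum(P[P > 0] * np.log(P[P > 0])),      # Eq. (6)
    }
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    sd_i = np.sqrt(np.sum(P * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(P * (j - mu_j) ** 2))
    feats["correlation"] = np.sum((i - mu_i) * (j - mu_j) * P) / (sd_i * sd_j)  # Eq. (7)
    return feats
```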
Fig. 3 Error rate versus K value
3.3 Model Training with Different ML Classifiers
The performance of any system depends greatly on the classifiers employed and the applicability of the various classification models to detecting infected leaves. Disease detection is treated as a multiclass classification problem using the one-versus-rest strategy, which fits one classifier per class; each class is fitted against all the others. This technique is computationally efficient and has the advantage of being interpretable, since each classifier inspects a single class. It is the most common multiclass classification approach and performs better than other existing methods. Six different ML models are trained for classification. In KNN, the best value found from the error-rate graph shown in Fig. 3 is k = 4, where k is the number of nearest neighbors. The 'entropy' criterion is used to calculate information gain. In SVM, we have used a numeric value of C = 2; SVM is evaluated with 3 kernels, namely radial basis function (RBF), linear, and polynomial, of which the polynomial kernel provides good accuracy. Using cross validation (cv = 10) and hyperparameter tuning with SVC gives the best classification accuracy. Random forest is a flexible and easy-to-use method: the model selects random samples from the training dataset and aggregates their predictions. For multiclass problems node impurity may occur, so the 'Gini index' criterion is used to reduce node impurity, and N = 200 estimators are used for classification. With the NB classifier we can balance the curve; Naive Bayes denotes a collection of classification algorithms. Logistic regression uses continuous and discrete datasets to generate probabilities and, based on these probabilities, classifies new data; LR finds the most efficient variables for classification and may be used to categorize observations based on different forms of data.
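The six classifiers and the hyperparameters quoted above can be sketched with scikit-learn. The data here is a synthetic two-class stand-in (the papaya GLCM features are not reproduced), so the scores are illustrative only; the paper's full pipeline additionally uses one-vs-rest wrapping and cv = 10 tuning:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MaxAbsScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Synthetic two-class stand-in for the 16 GLCM feature vectors.
X = np.vstack([rng.normal(0, 1, (100, 16)), rng.normal(4, 1, (100, 16))])
y = np.array([0] * 100 + [1] * 100)
X = MaxAbsScaler().fit_transform(X)          # scaling step from Sect. 3.2
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

models = {  # hyperparameters as reported in Sect. 3.3
    "KNN": KNeighborsClassifier(n_neighbors=4),
    "SVM": SVC(C=2, kernel="poly"),
    "RF": RandomForestClassifier(n_estimators=200, criterion="gini", random_state=1),
    "NB": GaussianNB(),
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=1),
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
```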
DT solves both regression and classification problems. The algorithm uses a tree-like structure to display the predictions, splitting branches on the basis of different features; max_depth = 3 is used for splitting. The root node is the starting point, and a leaf node is the end. Information gain can be calculated with
entropy, which is the decisive factor for which attribute should be chosen as a decision node or root node, since it reduces the uncertainty in a feature. The Receiver Operating Characteristic (ROC) curve is one of the most popular metrics for evaluating the performance of a learning algorithm; it checks the classifier's output quality. ROC is a graph of the false positive rate (FPR) against the true positive rate (TPR). TPR represents the positive data correctly predicted as positive w.r.t. all positive data, while FPR represents the negative data wrongly predicted as positive w.r.t. all negative data. Both TPR and FPR lie between 0 and 1, and they can be calculated using Eqs. (8) and (9). In FPR, FP is a false positive value and TN is a true negative value; in TPR, TP is a true positive value and FN is a false negative value.

FPR = FP/(FP + TN)   (8)

TPR = TP/(TP + FN)   (9)
Along with the ROC curve, some statistical parameters also predict an algorithm's performance; these can be tested using a confusion matrix with several machine learning techniques. The parameters are shown in Eqs. (10)–(13). Accuracy is the ratio of correct predictions to the total number of observations, i.e., the portion of correctly classified values. Precision indicates how often the model is right when it predicts a positive value. Recall indicates how often the model predicts the actual positive values. The F1 score is the harmonic mean of recall and precision. To analyze classifier performance, we have trained the individual classifiers on the train/test split of the dataset.

Accuracy: A = (TP + TN)/Total Number   (10)

Precision: P = TP/(FP + TP)   (11)

Recall: R = TP/(FN + TP)   (12)

F1 Score: F1 = 2 · R · P/(R + P)   (13)
To validate the classifiers' performance, K-fold cross validation is used. It is a statistical method for estimating the performance of ML models and is widely used in applied ML to compare and select models. It is easy to understand and implement, building on the simple train/test split. In the k-fold cross validation process, we have divided the dataset for 5 different values of k, i.e., k = 5, 10, 15, 20, and 25.
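The k-fold evaluation over k = 5, 10, 15, 20, and 25 can be sketched as follows, again on synthetic stand-in data, with SVC as the example classifier:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy two-class data standing in for the 16 GLCM features (not the paper's dataset).
X = np.vstack([rng.normal(0, 1, (100, 16)), rng.normal(4, 1, (100, 16))])
y = np.array([0] * 100 + [1] * 100)

results = {}
for k in (5, 10, 15, 20, 25):            # the five fold counts used in the paper
    cv = KFold(n_splits=k, shuffle=True, random_state=1)
    scores = cross_val_score(SVC(C=2, kernel="poly"), X, y, cv=cv)
    results[k] = scores.mean()           # mean accuracy across the k folds
```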
4 Result and Discussion
In this study, we analyzed the performance of different ML algorithms using GLCM features and hyperparameter tuning to detect papaya diseases. Experimentation is carried out on eight classes of papaya leaf and papaya fruit, with a total of 401 images for the healthy and diseased categories. The accuracy of the different machine learning classifier models is illustrated in Table 2. SVM with hyperparameter tuning and the RF classifier provide the best accuracies of 91.47% and 90.22%, respectively, and KNN also gives a good accuracy of 88.38%; these three models perform well compared with the rest of the machine learning classifiers. The ROC curves of the proposed model for the 8 classes of papaya leaf and fruit are shown in Fig. 4: powdery mildew, fruit black spot, papaya black spot, fruit ringspot, fresh papaya leaf, papaya leaf curl, papaya ringspot, and fruit powdery mildew. We have calculated these ROC curves for the individual classes to compare their performance. Powdery mildew in fruit (h) achieves 100% coverage, while in leaf (a) it is only 83%; for the other diseases we obtain an average coverage score of 91.50%. After training and validation, the weighted average is calculated for all the classifiers. Figure 5 shows the precision, recall, and F1 score comparison for the individual ML classifiers. From this, it is seen that the SVM, KNN, and RF models provide good precision of around 89%, and recall is also good for SVM and RF. For the F1 score, the NB and SVM classifiers provide 87% while RF gives the best score of 89%. We have performed cross validation on the different ML classifiers, as shown in Fig. 6. Five fold counts are considered, i.e., k = 5, 10, 15, 20, and 25. The k value must be carefully chosen for the data sample; a wrong choice of k results in a poor representation of the model's skill.
The first step is to choose the value of k, i.e., the number of folds; the data is then partitioned into k groups. If k = 5, the dataset is shuffled and split into five groups. It is seen that the accuracy of the classifiers varies with different values of k: increasing k may improve an individual classifier's performance or may hamper it. For k = 10, all classifiers' accuracy improves over k = 5. For k = 15, LR accuracy is reduced while the accuracy of the rest of the classifiers improves. For k = 20, only SVM provides improved accuracy.

Table 2 Performance accuracy
S. N.   Name of classifier   Accuracy (%)
1       KNN                  88.38
2       SVM                  91.47
3       RF                   90.22
4       NB                   86.67
5       LR                   82.87
6       DT                   81.49
Fig. 4 Plots of ROC curves for the different classes of papaya: (a) Powdery Mildew, (b) Fruit Black Spot, (c) Papaya Black Spot, (d) Fruit Ring Spot, (e) Fresh Papaya Leaf, (f) Papaya Leaf Curl, (g) Papaya Ringspot, (h) Fruit Powdery Mildew
For k = 25, only the SVM classifier's accuracy is reduced, while the other classifiers' accuracy improves. The proposed system with other state-of-the-art machine learning models performs better (83.33%) than the model proposed in [1], which gives an accuracy of 70% on a dataset of similar size. In [3], the maturity status of papaya fruit is considered and that system achieves an accuracy of 100%; however, it can only detect the maturity status of papaya fruit, and disease identification and classification are not considered. When compared to the machine-vision model [4] using SVM, our model gives more promising accuracy for all eight classes of papaya [5]. GLCM feature extraction has also been performed on medical images, where the developed system gives an average accuracy of 62% for the SVM classifier [7]. An agro-medical expert
Fig. 5 Precision, recall, and F1 score comparison
Fig. 6 Cross validation performance
system is presented to recognize papaya diseases; that system attained an accuracy of 80.5%. A potato disease detection system [10] was developed using image segmentation and machine learning with a tenfold cross-validation strategy; an elevated precision of 97% is achieved for the RF classifier, but only 3 classes of potato are considered. Compared to the developed CNN model [12], which reports a detection accuracy of 94.37% and a precision of 95.8%, our model slightly underperforms because we have used a smaller dataset.
5 Conclusion and Future Scope
The proposed system can effectively classify eight distinct classes of papaya leaf and fruit images. We have presented GLCM feature extraction, which effectively extracts 16 different features for exact disease recognition and classification. All ML classifiers give good accuracy compared with the existing methods. With the hyperparameter tuning approach, SVM achieves an accuracy of 91.47% and RF provides 90.22%. The reliability and consistency of the model are analyzed through ROC, precision, recall, and F1 score, and k-fold cross validation is used to validate the classifiers' performance for different values of k. The overall performance of the model is good, and the computational time can be improved further. Real-time images can be included in the future for accurate disease detection; optimization techniques could enhance the results, and a larger training dataset could achieve better accuracy.
Acknowledgements We would like to acknowledge MAHAJYOTI, Nagpur for supporting the research work. We have referred to datasets from Mendeley and King Saud University.
References
1. Ramesh S, Hebbar R, Niveditha M (2018) Plant disease detection using machine learning. ICDI3C 41–45
2. Islam MA, Islam MS (2020) Machine learning based image classification of papaya disease recognition. ICECA. ISBN: 978-1-7281-6386-4
3. Behera SK, Rath AK, Sethy PK (2020) Maturity status classification of papaya fruits based on machine learning and transfer learning approach. Inf Process Agric. Elsevier
4. Habib MT, Majumder A, Jakaria AZM (2018) Machine vision based papaya disease recognition. J King Saud Univ Comput Inf Sci. Elsevier
5. Mall PK, Singh PK, Yadav D (2020) GLCM based feature extraction and medical X-ray image classification using machine learning techniques
6. Rauf HT, Saleem BA, Ikram Ullah Lali M (2018) A citrus fruits and leaves dataset for detection and classification of citrus diseases through machine learning
7. Sari WE, Kurniawati YE, Santosa PI (2020) Papaya disease detection using fuzzy Naïve Bayes classifier. ISIRTI 42–47
8. Behera K, Rath A, Mahapatra A, Sethy PK (2020) Identification, classification & grading of fruits using machine learning & computer intelligence: a review. J Amb Intell Human Comput. Springer
9. Krishnan G, Deepa J, Rao PV (2022) An automated segmentation and classification model for banana leaf disease detection. J Appl Biol Biotechnol 213–220
10. Iqbal MA, Talukder KH (2020) Detection of potato disease using image segmentation and machine learning. 43–47
11. Banarase SJ, Shirbahadurkar SD (2021) A review on plant leaf diseases detection and classification. IJARESM. ISSN: 2455-6211, 2785-92
12. Farhana S, Rahman SA, Hesamian MH (2021) Citrus disease detection and classification using end-to-end anchor-based deep learning model. Appl Intell. Springer
13. Chen JIZ, Zong JI (2021) Automatic vehicle license plate detection using K-means clustering algorithm and CNN. J Electr Eng Autom 3(1):15–23
14. Kaur S, Pandey S, Goel S (2019) Plants disease identification and classification through leaf images: a survey. Arch Comput Meth Eng 26:507–530
15. Ma J, Du K, Zheng F, Zhang L, Gong Z, Sun Z (2018) A recognition method for cucumber diseases using leaf symptom images based on deep convolutional neural network. Comput Electron Agric 154:18–24
16. Zhang S, Zhang S, Zhang C, Wang X, Shi Y (2019) Cucumber leaf disease identification with global pooling dilated convolutional neural network. Comput Electron Agric 162:422–430
17. Jiang P, Chen Y, Liu B, He D, Liang C (2019) Real-time detection of apple leaf diseases using deep learning approach based on improved convolutional neural networks. IEEE Access 7
Image Forgery and Image Tampering Detection Techniques: A Review S. Hridya Nair, Kasthuri A. S. Nair, Niharika Padmanabhan, S. Remya, and Riya Ratnakaran
Abstract Image forgery is the process of manipulating digital images to obscure critical information or details for personal or business gain. Nowadays, tampering and forging digital images have become frequent and easy due to the emergence of effective photo editing software and high-definition capturing equipment. Thus, several image forgery detection techniques have been developed to guarantee the authenticity and legitimacy of the images. There are several types of image forgery techniques; among them, copy-move forgery is the most common one. The paper discusses the types of image forgery and the methods proposed to detect or localize them, including Principal Component Analysis (PCA), DCT, CNN models like Encoder-Decoder, SVGGNet, MobileNet, VGG16, Resnet50, and clustering models like BIRCH. A comparison of different detection techniques is performed, and their results are also observed. Image Forgery Detection in the Medical Field is a significant area for smart health care. Keywords Copy-move image forgery · SVGGNet · MobileNet-V2 · Balanced iterative reducing and clustering using hierarchies (BIRCH) · Support vector machines (SVM) · Encoder-decoder · Convolutional neural network (CNN) · Convolutional neural network with error level analysis (CNN_ELA) · Convolutional neural network with error level analysis and sharpening pre-processing techniques (CNN_SHARPEN_ELA) · Principal component analysis (PCA) · Discrete cosine transform (DCT)
1 Introduction It is a huge challenge to tell if an image is original or doctored. The tampered images can be identified by detecting discrepancies in image features like lighting levels, brightness levels, edge changes, etc. But these manipulations of images will not leave any visual traces of changes. Detection techniques, therefore, have a great deal S. H. Nair (B) · K. A. S. Nair · N. Padmanabhan · S. Remya · R. Ratnakaran Computer Science and Engineering, Amrita School of Engineering, Amritapuri, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_13
S. H. Nair et al.
of significance. There are Camera-based, Pixel-based, Physical environment-based, Format-based, and Geometry-based image forgery detection techniques. Different deep learning and machine learning approaches are also enforced to identify forged images on different real-world datasets. Mainstream media and the Internet are experiencing an increase in image forgeries. This has adversely affected the security and credibility of digital images. Therefore, advocating different image forgery detection techniques for verifying digital images’ authenticity is critical.
2 Image Forgery Detection Kumar and Cristin [1] define the theoretical concepts of image forgery.
2.1 Problems in Detecting Forgery
1. To protect rights, it is essential to know the source of the original image, but with the large number of images in circulation, finding the origin becomes difficult.
2. Images are mostly compressed, with many resolutions applied to them, so an open dataset is essential.
3. In copy-move forgery, the duplicated region appears within the same image with the same shape and size [2–5].
2.2 Types of Image Forgery With the rapid advancement of technology in the modern world, it is impossible to define the various types of forgery accurately. Image retouching, morphing, splicing, cloning, forgery, etc. are a few of the types that exist.
2.3 Detection of Image Forgery To maintain image integrity and confidentiality, image forgery detection is a must. Many algorithms exist that detect it. The general structure for detection is that first, the forged image is taken, then pre-processing is done, features are extracted, and a classifier is selected. After that, classification is done. As a result, the forged and original images are separated. There are two approaches, active and passive.
In the active approach, some hidden data is added to the image to assure its authenticity; this data is compared with the stored data and verified at the receiver's end. The hidden data used here are watermarks and digital signatures.
Digital watermarking—A digest is attached to the image during creation; the image is considered manipulated when this digest differs from the original one.
Digital signature—Unique image properties are captured and verified at the other end; if they vary, the image has been tampered with.
In the passive approach, no prior information about the image is needed; only the statistics and content of the image are utilized. Pixel-based, camera-based, format-based, forgery shadow, and forgery reflection detection techniques are some of the approaches.
Pixel-based—Analyzes the pixel constituents to detect changes.
Format-based—Detects changes in the image format.
Camera-based—Many anomalies can arise at capture time because of the camera lens, imaging sensor, sensor noise, etc.; inconsistencies in these artifacts are used as evidence of manipulation.
Forgery Shadow Detection—When an image is manipulated, mainly the object is tampered with; by analyzing the shadow properties of the image, we can tell whether it has been manipulated.
Forgery Reflection Detection—Inconsistencies in the image reflections are used to detect the forgery.
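As a toy illustration of the active approach's digital-signature idea, a cryptographic digest over the image bytes can be stored at creation time and re-checked at the receiver. This is a stand-in for a real signature scheme, and the byte strings are hypothetical:

```python
import hashlib

def fingerprint(image_bytes: bytes) -> str:
    """Digest computed at creation time and stored/transmitted with the image."""
    return hashlib.sha256(image_bytes).hexdigest()

def verify(image_bytes: bytes, stored_digest: str) -> bool:
    """Receiver-side check: any change to the image bytes alters the digest."""
    return fingerprint(image_bytes) == stored_digest

original = bytes([10, 20, 30, 40])   # stand-in for raw image data
digest = fingerprint(original)
tampered = bytes([10, 20, 31, 40])   # one "pixel" changed
```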
2.4 Medical Image Forgery Detection for Smart Health Care
Image forgery detection in this literature uses several pixel-level algorithms [6]. Among those, LBP (Local Binary Pattern) calculations are very fast, but they are prone to noise. Next is WLD (Weber Local Descriptor) [7, 8], a less noise-sensitive but more computationally expensive variant of LBP. Other texture descriptors are also used in the forgery detection process, such as the Histogram of Oriented Gradients (HOG), Markov chains, and circular LBP. A counterfeit detection system was then proposed for use in smart health care. A noise pattern extraction method is used, where regression filters are created using multiple resolutions and two classifiers. The workflow procedure is as follows.
1. Break the color image into its red, green, and blue components. A digital image is a numeric matrix stored in memory, and each colored image is a combination of the three components mentioned above (this step is not required for monochrome images).
Fig. 1 Workflow diagram of the proposed system
2. A Wiener filter is an image-processing filter that executes an optimal trade-off between inverse filtering and noise smoothing: it removes additive noise and inverts blurring simultaneously. It is applied to each component, depending on whether the image is monochrome or color, and the result of this step is a noise-free image.
3. To estimate the noise pattern, the observed image is denoised and the denoised image is subtracted from the observed one; the residual noise pattern is referred to as the image's digital fingerprint.
4. The noise pattern is subjected to a multi-resolution regression filter: the nearest eight pixels are weighted 1, the next eight pixels are weighted 2, etc., and the center pixel's relative intensity is recorded. The weight is standardized between 0 and 255 to keep the grayscale image's intensity level constant.
5. The SVM (Support Vector Machine) classifier and the ELM (Extreme Learning Machine) classifier both receive the filter output as input.
6. BSR is used to fuse the SVM score and the ELM score, and the BSR score is used to determine whether the image is fake.
The workflow diagram of the proposed system is shown in Fig. 1.
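Steps 2–3 can be sketched as follows; a simple 3 × 3 mean filter stands in for the Wiener filter, so this is illustrative only:

```python
import numpy as np

def denoise_mean3(img):
    """3x3 mean filter as a stand-in for the Wiener filter in step 2."""
    padded = np.pad(img.astype(float), 1, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    h, w = img.shape
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
    return out / 9.0

def noise_pattern(img):
    """Step 3: the residual (observed minus denoised) is the 'fingerprint'."""
    return img.astype(float) - denoise_mean3(img)
```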
2.5 A Comparison Between Image Forgery Detection Algorithms Nazli and Maghari [9] analyze four forgery detection algorithms, SVM, Expanding Block-based Algorithm, DCT, and Generic Algorithm, to determine the most accurate one. Image forging refers to the addition of a new item to an original image, as well as the concealment or deletion of any item in the original image. Because of their accessibility and low cost, anyone can use image tampering tools, which make it easy to create and edit digital images. Because human eyes cannot detect changes in original photos, powerful computer algorithms can be utilized to determine whether or not the image is forged. SVM:
A training dataset is used to train the SVM algorithm, which is then used to identify whether a tested image is forged or original.
DCT: It uses a divide-and-conquer idea: the image is divided into overlapping blocks that may be used to identify duplicated parts of the image. The DCT algorithm has advantages such as speed and efficiency.
Block-based algorithm: The image is divided into overlapping chunks of a certain size, with a feature vector generated as a hash value for each block to match the differences and determine whether the photos include a duplicated section.
Genetic algorithm (GA): By selecting the most suitable and fewest features from the image, the genetic algorithm creates high-quality optimization solutions that employ the Euclidean distance to locate damaged regions.
We may deduce that the genetic algorithm is the most effective at recognizing forged images, owing to its great speed and precision: GA detects forged photos with 98 percent true positive values.
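The block-based idea can be sketched by hashing every fixed-size block and flagging positions whose hashes collide. Real detectors match robust feature vectors rather than raw bytes, so this exact-match version is a simplification:

```python
import numpy as np
from collections import defaultdict

def duplicated_blocks(img, b=2):
    """Map each b x b block (as raw bytes) to the positions where it occurs;
    any bucket holding more than one position is a candidate duplication."""
    buckets = defaultdict(list)
    h, w = img.shape
    for y in range(h - b + 1):
        for x in range(w - b + 1):
            key = img[y : y + b, x : x + b].tobytes()
            buckets[key].append((y, x))
    return {k: v for k, v in buckets.items() if len(v) > 1}

# Plant the same patch in two places to simulate a copy-move forgery.
img = np.zeros((6, 6), dtype=np.uint8)
patch = np.array([[9, 8], [7, 6]], dtype=np.uint8)
img[0:2, 0:2] = patch          # original region
img[4:6, 4:6] = patch          # copy-moved region
dups = duplicated_blocks(img, b=2)
```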
2.6 CNN Pre-trained Models VGG16 and Resnet50 architectures are fine-tuned and applied to the CASIA V2.0 dataset to compare the efficiency of these two models in image forgery detection in [10]. Fine-tuned Resnet50 model provided the highest accuracy. VGG16 consists of 13 convolution layers and three fully connected layers, as shown in Fig. 2. Resnet50 consists of 48 convolution layers and two pooling layers as shown in Fig. 3.
Fig. 2 VGG16 architecture
Fig. 3 Resnet50 architecture
2.7 CNN with a Combination of ELA and Sharpening Pre-processing Techniques Error Level Analysis is a JPEG lossy, irreversible compression algorithm combined with the CNN model in [10]. The images are divided into overlapping blocks. The inconsistency in the quality rate of each grid indicates an image modification. Additionally, sharpening, which enhances pixel contrast in dark or bright regions, can also be combined with the CNN model separately to identify image forgery, which is proposed by [10]. The CNN model combined with the sharpening filter and preprocessing technique, as shown in Fig. 4 is the one that gave the highest accuracy. All three models were trained using the CASIA V2.0 dataset. Fig. 4 Sharpening filter
3 Image Tampering Localization 3.1 Encoder-Decoder Model Zhuang et al. [11] propose an encoder-decoder model with dense connections and dilated convolutions for localizing tampered regions. Photoshop scripting is used to generate two tampered-image datasets with post-processing operations such as scaling, rotating, and denoising, which are used to train the model. The generated datasets model real-world tampered images, as the tools used are among the most frequently used ones in Photoshop. The architecture suggested by [11] uses seven dense blocks, as shown in Fig. 5. The encoder part transforms the input image into distinct feature maps, and the decoder part processes the feature maps to produce pixel-level predictions. The encoder contains five dense blocks, two of which involve a dilated convolution layer, introduced to achieve a larger receptive field. The dense blocks, as shown in Fig. 6, consist of internal convolutional layers and one transition layer; they are used to encourage feature reuse, which results in effective learning of tampering traces. The kernel size in the decoder part is 3 × 3, except for the last layer, whose kernel size is 5 × 5. 2 × 2 average pooling layers are added after each of the first three dense blocks.
Fig. 5 The encoder-decoder architecture proposed by [11]
Fig. 6 A dense block with four internal convolutional layers and one transition layer
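The dilated convolutions used in two of the encoder's dense blocks can be illustrated in one dimension: spacing the kernel taps `dilation` samples apart enlarges the receptive field without adding parameters (a toy sketch, not the model's layers):

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """1-D 'valid' convolution whose taps are spaced `dilation` apart:
    a kernel of length k then covers (k - 1) * dilation + 1 input samples."""
    k = len(w)
    span = (k - 1) * dilation + 1
    return np.array([
        sum(w[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ])
```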
4 Recoloring Forgery Detection Koshy and Warrier [12] propose a method to detect recolored image forgery using a 3-layer convolutional neural network which outputs the probability of recoloring as shown in Fig. 7.
4.1 Method The inputs are the original image and two inputs derived from it: one based on inter-channel correlation and the other on illumination consistency. The output is the authenticity of the image. Feature extraction is done using a three-layer convolutional neural network, and the extracted features are merged using a concatenation module. The original image, the Difference Image (DI), and the Illuminant Map (IM) are the inputs. When there is a change in the statistics and characteristics of an image, we can know that recoloring has been done. Figure 8 shows the overall idea of the method.
Fig. 7 Flow diagram of recoloring forgery detection
Fig. 8 The layers for each image
5 Copy-Move Image Forgery Detection 5.1 Using the Structural Similarity Index (SSIM) Measure Copy-move forgery is a process where a small portion of an image is cut and pasted elsewhere in the same image to alter the information. In [12], SSIM is used to detect the forgery: a threshold value is determined and applied over the image, and a contour plot is made. SSIM quantifies the perceptual distinction between two images; a value of 1 indicates a perfect match. A Gaussian sliding window is used to calculate this value (Figs. 9, 10, 11, 12, 13, and 14). Three convolutional layers are selected for feature extraction, and a matrix of [width, height, channels] is the input; the concatenation layer merges these features. A batch size of 4 and a patch size of 128 × 128 are used, along with 40 epochs, a filter size of 3 × 3, 64 channels, and a learning rate of 0.0001. Figure 15 visualizes the result acquired by using the above method. A difference image is computed from the original image; it is first expressed as points in the range [0, 1] and then converted to 8-bit integers.
Fig. 9 The 5 stages on which IC-MFDs rely
Fig. 10 Flow diagram of the proposed system
Fig. 11 SVGGNet
Fig. 12 MobileNetV2
Fig. 13 Schematic flow diagram of an implementation of SVGGNet
Fig. 14 Image forgery detection—block diagram
This value lies in the range [0, 255]. THRESH_OTSU and THRESH_BINARY_INV are combined with the OR operator to obtain the difference image. The pixel and threshold values are compared: if the pixel value exceeds the threshold, it is set to 0, otherwise to the maximum value. Finally, contours are found, which mark a rectangle around the manipulated part; the boundingRect function is applied to find the bounding box. The binary threshold equation is given in Eq. (1).
Fig. 15 Visualization of the accuracy and loss of the method
D(x, y) = 0 if src(x, y) > thresh; maxval otherwise   (1)
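Eq. (1), i.e., OpenCV's THRESH_BINARY_INV behavior, in NumPy form (a sketch, not the authors' code):

```python
import numpy as np

def binary_threshold_inv(src, thresh, maxval=255):
    """Eq. (1): pixels above the threshold become 0, the rest become maxval."""
    return np.where(src > thresh, 0, maxval).astype(np.uint8)
```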
5.2 Using the IC-MFD Model Ahmed et al. [13] propose an IC-MFD model using the spatial domain. The MICC-F220 dataset is used, which contains 220 images: 110 original and 110 forged. Rotated and scaled images are present in the dataset, and JPEG-format images are used. The classifier is a Support Vector Machine (SVM) with a radial basis function (RBF) kernel. For splitting the dataset, ten-fold cross-validation is used: training is performed on 90% of the data, and 10% is used for testing. It is observed that as the block size increases, the accuracy also increases, so a block size of 64 × 64 is used. There are five stages on which IC-MFD relies, shown in Fig. 9. 1. Image pre-processing: The colored image is converted to a grayscale image to decrease complexity, using

Igray = 0.299R + 0.587G + 0.114B   (2)
where Igray refers to gray-level values, and (R, G, B) for red, green, and blue, respectively. 2. Block dividing stage: The grayscale is divided into different-sized nonoverlapping blocks. The mean and standard deviation features are extracted from the blocks.
3. Feature Extraction Stage: This stage is crucial, as finding the best features provides better results. In duplicated regions, the statistics of natural images differ. Therefore, statistical features such as the mean and standard deviation can be used as a feature vector for every block. The mean and standard deviation of an image I_{gray}(M, N) can be formulated as

\mu = \frac{1}{M}\sum_{i=1}^{M} l_i, \qquad \sigma = \sqrt{\frac{1}{M}\sum_{i=1}^{M} (l_i - \mu)^2} \qquad (3)
respectively. The computed results are stored in the feature vector matrix F.
4. Sorting Stage: All the rows in the matrix are sorted lexicographically, so that similar rows come closer together and can be detected easily; the computational time is also reduced.
5. Classification Stage: SVM, which has shown great success in classification, is used as the classifier. It outputs whether the image is authentic or forged.
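The pre-processing, block-dividing, feature-extraction, and sorting stages above can be sketched as follows. The block size and the exact feature layout are illustrative assumptions, not the authors' code:

```python
import numpy as np

def to_gray(rgb):
    # Eq. (2): Igray = 0.299 R + 0.587 G + 0.114 B
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def block_features(gray, block=64):
    """Split a grayscale image into non-overlapping blocks and return the
    per-block (mean, std) feature matrix F, sorted lexicographically."""
    h, w = gray.shape
    feats = []
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            blk = gray[r:r + block, c:c + block].astype(np.float64)
            feats.append((blk.mean(), blk.std()))
    F = np.array(feats)
    # Lexicographic sort (primary key: mean, secondary: std) brings
    # potentially duplicated blocks next to each other
    return F[np.lexsort((F[:, 1], F[:, 0]))]
```

After sorting, near-equal adjacent rows of F are the candidate duplicated blocks that would be passed on to the classification stage.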
5.3 Using Machine Learning and Deep Learning Techniques

Detection of CMIF using both deep learning and machine learning techniques is discussed below. Although the machine learning algorithm, BIRCH clustering, provides a good separation of images, the deep learning models are more efficient. Copy-move forgery is a common method of image forgery [14], and a passive method to identify such images is discussed in [15]. Detection of CMIF using the lightweight deep models SVGGNet and MobileNet-V2 is implemented there. Both are variants of deep learning models that are otherwise resource- and time-consuming, designed to overcome this disadvantage so that the detection techniques become more efficient.

1. CMIF detection using BIRCH clustering. The Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm is an unsupervised machine learning technique that can handle a large dataset by first considering a small part and then adding information to it. The clustering process groups data points according to their similarity, so a cluster represents a group of data that falls into similar categories. Here, the feature vectors extracted from the dataset are represented as a tree. The study uses two datasets: CoMoFoD, which contains 260 sets of images, each with an original image, a forged image, and two masks; and MICC-F220, which consists of 110 original images and 110 tampered ones. First, the images in the dataset are converted to grayscale, and the mean and standard deviation of the moment are derived as the feature vectors [14]. The equations for the mean and standard deviation (SD) are given below.
\bar{x} = \frac{\sum x}{n}, \qquad SD(x) = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \qquad (4)
Then the correlation between the standard deviation and mean is taken. The input parameters include N and LS, where N is the number of data points and LS is the linear sum of the data points. The output is a cluster-feature tree in which similar images are grouped and the forged and original images have a parent-child relationship. Figure 10 illustrates the flow diagram of the whole BIRCH process.

2. CMIF detection using the SVGGNet model. Smaller Visual Geometry Group Net (SVGGNet) is a variant of VGGNet, modified to overcome the issue of resources and time. As shown in Fig. 11, this model has three convolution layers with kernel size 3 × 3 and 32, 64, and 128 filters, respectively [15]. A max-pooling layer is applied after each convolutional layer: kernel size 3 × 3 for the first and 2 × 2 for the rest. The image fed to the input layer is of size 96 × 96 × 3. ReLU is used as the activation function, and batch normalization is applied to stabilize the model. To reduce overfitting, a dropout of 25% is applied between layers. The last layer is a fully connected layer whose output is flattened to a 1-dimensional array and fed to the softmax function, which gives the probability that decides whether the input image is forged or authentic.

3. CMIF detection using the MobileNet-V2 model. MobileNet-V2 is a variant of the MobileNet architecture which is lightweight and resource- and time-friendly. The input image is of size 224 × 224 × 3 [15]. As shown in Fig. 12, it has two convolution layers with seven bottleneck layers between them. Average pooling is applied at the output of the second convolutional layer, and global average pooling is used thereafter instead of a fully connected layer. The output is then fed to the softmax function, and the detection is done. The learning rate is 0.0001 and the batch size is 32. The Adam optimizer is used, and the loss function is binary cross-entropy.
A schematic flow diagram of an implementation of SVGGNet and MobileNet-V2 is shown in Fig. 13.
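A minimal sketch of the BIRCH step with scikit-learn; the threshold value and the use of only (mean, std) as features are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import Birch

def cluster_images(images, threshold=0.5):
    """Group images by their (mean, std) feature vectors using BIRCH.
    With n_clusters=None, BIRCH returns its subcluster labels directly,
    so near-identical (e.g., copy-move forged) images share a label."""
    feats = np.array([[img.mean(), img.std()] for img in images])
    model = Birch(n_clusters=None, threshold=threshold)
    return model.fit_predict(feats)
```

Images whose features fall within the subcluster threshold of each other end up in the same leaf of the cluster-feature tree.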
5.4 PCA and DCT

Principal Component Analysis (PCA) and Discrete Cosine Transform (DCT) algorithms are suggested for copy-move image forgery detection by [16]. Images are divided into small blocks of size 32 × 32. For each block, the corresponding eigenvalues and eigenvectors are calculated and sorted lexicographically, resulting in the
automatic detection of duplicate regions. Another algorithm proposed by [16] is DCT-based: after lexicographical sorting of the DCT coefficients computed for each image block, adjacent identical pairs are assessed to be the tampered or duplicated region. The implementation of the system, as shown in Fig. 14, involves a series of steps. An image is taken as input to the model and is divided into overlapping blocks, which are passed to the feature extraction algorithm (PCA and DCT), which in turn produces the result after lexicographical sorting. Lexicographical sorting arranges similar blocks next to each other based on distance. This step involves the creation of a histogram based on the number of similar segments: a copy-moved region will have a higher number of similar segments located at the same distance.
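The DCT variant can be sketched as below; the block size, the number of retained coefficients, and the integer quantization are illustrative assumptions rather than the settings of [16]:

```python
import numpy as np
from scipy.fft import dctn

def find_duplicate_blocks(gray, block=8, keep=9):
    """Slide an overlapping block over the image, describe each block by a
    few coarsely quantized DCT coefficients, sort the descriptors
    lexicographically, and report adjacent identical pairs."""
    h, w = gray.shape
    rows, pos = [], []
    for r in range(h - block + 1):
        for c in range(w - block + 1):
            coeffs = dctn(gray[r:r + block, c:c + block].astype(float),
                          norm="ortho")
            # keep only the first few coefficients as a crude signature
            rows.append(np.round(coeffs.flatten()[:keep]).astype(int))
            pos.append((r, c))
    order = sorted(range(len(rows)), key=lambda i: tuple(rows[i]))
    pairs = []
    for a, b in zip(order, order[1:]):
        if np.array_equal(rows[a], rows[b]):  # adjacent identical descriptors
            pairs.append((pos[a], pos[b]))
    return pairs
```

In a full detector the reported pairs would additionally be filtered by a minimum spatial offset and by the distance histogram described above.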
6 Results

6.1 Image Forgery Detection

• Medical image forgery detection for smart health care. Three databases were used to evaluate the system: two with natural photos and one with mammograms. The system works on an image's noise map, derived using a Wiener-filter-based noise reduction algorithm. The noise map is then subjected to a multi-resolution regression filter, whose output is sent to SVM-based and extreme-learning-based classifiers. A Bayesian sum algorithm then aggregates the scores of these two classifiers. The system had an accuracy of over 98% for natural images and 84.3% for medical images, and it performed best when the scores of the two classifiers were combined. The noise map is created at an edge computing resource, while the filtering and classification are carried out in a core cloud-computing resource [9].
• CNN pre-trained models. Of the two models proposed by [10], Resnet50 provides the highest accuracy compared to VGG16, as shown in Table 1.
• CNN with a combination of ELA and sharpening pre-processing techniques. Among the models proposed by [10], CNN combined with ELA and CNN combined with ELA and sharpening, the CNN_SHARPEN_ELA model provided the highest accuracy, as shown in Table 2.
Table 1 Accuracies and losses of different transfer learning models

Model                Train accuracy (%)   Validation accuracy (%)   Train loss   Validation loss
VGG16                80.25                81.92                     0.5128       0.4966
VGG16_Bottleneck     83.18                84.26                     0.323        0.3833
VGG16_Finetuned      99.16                94.77                     0.018        0.3
Resnet50_Finetuned   98.65                95.72                     0.048        0.18
Table 2 Accuracies and losses of different CNN models

Model             Train accuracy (%)   Validation accuracy (%)   Train loss   Validation loss
CNN               77.13                75.68                     0.47         0.5
CNN-ELA           94                   90.02                     0.21         0.3
CNN_sharpen-ELA   97                   94.52                     0.10         0.3
6.2 Image Tampering Localization

The encoder-decoder architecture with dense connections and dilated convolutions for tampering localization put forward by [11] is compared with various handcrafted-feature and deep-learning-based approaches: ADQI, DCT, NADQ, Chen's method, Bayar's method, MFCN, LSTM, and Mantra-net. The F1 score, IOU, MCC, and AUC evaluation metrics are used to evaluate the proposed model. The proposed encoder-decoder model outperforms all the other models, as shown in Table 3. Performance is evaluated on three different datasets generated using Photoshop scripting.
6.3 Recoloring Forgery Detection

The method used to detect recoloring forgery shows the results given in Figs. 15 and 16.
Table 3 Performance of different models, evaluated with the F1 score, IOU, MCC, and AUC metrics on the NIST-2016, PS-arbitrary, and PS-boundary datasets, for ADQI, DCT, NADQ, Chen's method (32 × 32 and 64 × 64), Bayar's method (32 × 32 and 64 × 64), Forensic similarity, MFCN, LSTM-EnDec, Mantra-net, and the proposed method
Fig. 16 Image and its moments
6.4 Copy-Move Image Forgery Detection

1. CMIF detection using BIRCH. The algorithm provides the accuracy based on the percentage of similarity given in Eq. (5):

\%similarity = \frac{\sum_{i=1}^{x} (G_{ri} - \bar{G}_r)(G_{fi} - \bar{G}_f)}{\sqrt{\sum_{i=1}^{x} (G_{ri} - \bar{G}_r)^2 \, \sum_{i=1}^{x} (G_{fi} - \bar{G}_f)^2}} \times 100 \qquad (5)
where G_{ri} is the mean of the feature vector of image I_r and G_{fi} the mean of the feature vector of image I_f. The method clusters the forged images together, so they can be distinguished from the original ones. Thus, this technique helps in the detection of copy-move forged images and can be used for forensic purposes. Future research could address the efficiency of searching in the tree.

2. CMIF detection using SVGGNet. The datasets used were CoMoFoD and MICC-F2000. The performance of the SVGGNet model, measured using the evaluation metrics, is given in Table 4.

Table 4 Performance summary of SVGGNet

Evaluation metric   SVGGNet (%)
Accuracy            87
TPR                 87
FPR                 13
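Eq. (5) is a normalized cross-correlation between the two feature vectors, and can be sketched directly (the variable names here are ours, not the authors'):

```python
import numpy as np

def percent_similarity(g_r, g_f):
    """Eq. (5): normalized cross-correlation of two feature vectors,
    scaled to a percentage."""
    g_r = np.asarray(g_r, dtype=float)
    g_f = np.asarray(g_f, dtype=float)
    dr, df = g_r - g_r.mean(), g_f - g_f.mean()
    denom = np.sqrt((dr ** 2).sum() * (df ** 2).sum())
    return 100.0 * (dr * df).sum() / denom if denom else 0.0
```

Identical vectors score 100, perfectly anti-correlated vectors score -100, and uncorrelated ones score near 0.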
The model was also trained with images modified after the forgery. The comparative results are given in Table 5.

3. CMIF detection using MobileNet-V2. Here also, the datasets used were CoMoFoD and MICC-F2000. The performance summary is given in Table 6, and Table 7 gives the total summary of the model's performance.

Table 5 Summary of performance of SVGGNet on different forged image sets

Dataset and image details                                Input images   TPR (%)   FPR (%)
CoMoFoD: Original                                        15             93        28.6
CoMoFoD: Forged + brightness change, blurred, noisy      15
MICC-F2000: Original                                     15             40        4
MICC-F2000: Forged + geometric (scaling and rotation)    15
Overall: Original                                        30             67        16.3
Overall: Forged                                          30
Table 6 Performance summary of MobileNetV2

Evaluation metric   MobileNetV2 (%)
Accuracy            85
TPR                 85
FPR                 19
Table 7 Summary of performance of MobileNetV2 on different forged image sets

Dataset and image details                                Input images   TPR (%)   FPR (%)
CoMoFoD: Original                                        15             91        26.7
CoMoFoD: Forged + brightness change, blurred, noisy      15
MICC-F2000: Original                                     15             77        2
MICC-F2000: Forged + geometric (scaling and rotation)    15
Overall: Original                                        30             84        14.35
Overall: Forged                                          30
Fig. 17 Accuracy versus Existing model graph
From the performance summary, it is evident that MobileNet-V2 performs efficiently on post-processed images. This is attributed to the depth of the model, which captures the features without losing them.
4. When the performance of the proposed IC-MFD model was compared with the existing models, as in Fig. 17, it had the highest accuracy, 98.44%, showing that this model outperforms the state-of-the-art models.
7 Conclusion

In this paper, we have discussed various image forgery and image tampering detection techniques. Several such techniques help to verify the quality and originality of images. These techniques are based on various branches of methodologies, such as deep learning models, clustering, PCA, and DCT. The performance of CNN models like VGG16, Resnet50, SVGGNet, and MobileNet-V2 was efficient for images that were forged in different ways. An encoder-decoder model with dense connections was found to be better at localizing the tampered regions of an image.
References

1. Kumar S, Cristin R (2018) A systematic study of image forgery detection. J Comput Theoret Nanosci
2. Myvizhi D, Miraclin Joyce Pamila JC (2022) Extensive analysis of deep learning-based deep fake video detection. J Ubiquit Comput Commun Technol 4(1):1–8
3. Sungheetha A, Rajesh Sharma R (2021) Classification of remote sensing image scenes using double feature extraction hybrid deep learning approach. J Inf Technol 3(02):133–149
4. Gardella M, Musé P, Morel J-M, Colom M (2021) Forgery detection in digital images by multi-scale noise estimation. J Imaging
5. Li S-P, Han Z, Chen Y-Z, Fu B, Lu C, Yao X (2010) Resampling forgery detection in JPEG-compressed images. In: 2010 3rd International congress on image and signal processing
6. Ghoneim A, Muhammad G, Amin SU, Gupta B (2018) Medical image forgery detection for smart healthcare. IEEE Commun Mag 56(4):33–37. https://doi.org/10.1109/MCOM.2018.1700817
7. Elaskily MA, Aslan HK, Elshakankiry OA, Faragallah OS, Abd El-Samie FE (2017) Comparative study of copy-move forgery detection techniques. In: 2017 International conference on advanced control circuits systems (ACCS) and 2017 International conference on new paradigms in electronics & information technology (PEIT)
8. Hrudya P, Nair LS, Adithya SM, Unni R, Vishnu Priya H, Poornachandran P (2013) Digital image forgery detection on artificially blurred images. In: International conference on emerging trends in communication, control, signal processing & computing applications (C2SPCA)
9. Nazli MN, Maghari AYA (2017) Comparison between image forgery detection algorithms. In: 2017 8th International conference on information technology (ICIT), pp 442–445. https://doi.org/10.1109/ICITECH.2017.8080040
10. Singh A, Singh J (2021) Image forgery detection using deep neural network. In: 2021 8th International conference on signal processing and integrated networks (SPIN), pp 504–509. https://doi.org/10.1109/SPIN52536.2021.9565953
11. Zhuang P, Li H, Tan S, Li B, Huang J (2021) Image tampering localization using a dense fully convolutional network. IEEE Trans Inf Forensics Secur 16:2986–2999. https://doi.org/10.1109/TIFS.2021.3070444
12. Koshy JMTL, Warrier GS (2020) Detection of recoloring and copy-move forgery in digital images. In: 2020 Fifth international conference on research in computational intelligence and communication networks (ICRCICN), pp 49–53. https://doi.org/10.1109/ICRCICN50933.2020.9296173
13. Ahmed IT, Hammad BT, Jamil N (2021) Image copy-move forgery detection algorithms based on spatial feature domain. In: 2021 IEEE 17th International colloquium on signal processing & its applications (CSPA), pp 92–96. https://doi.org/10.1109/CSPA52141.2021.9377272
14. Nirmala G, Thyagharajan KK (2019) A modern approach for image forgery detection using BIRCH clustering based on normalized mean and standard deviation. In: 2019 International conference on communication and signal processing (ICCSP), pp 0441–0444. https://doi.org/10.1109/ICCSP.2019.8697951
15. Abbas MN, Ansari MS, Asghar MN, Kanwal N, O'Neill T, Lee B (2021) Lightweight deep learning model for detection of copy-move image forgery with post-processed attacks. In: 2021 IEEE 19th World symposium on applied machine intelligence and informatics (SAMI), pp 000125–000130. https://doi.org/10.1109/SAMI50585.2021.9378690
16. Sharma S, Verma S, Srivastava S (2020) Detection of image forgery. ijert.org
Low-Voltage Ride-Through for a Three-Phase Grid-Integrated Single-Stage Inverter-Based Photovoltaic System Using Fuzzy Logic Control

M. Sahana and N. Sowmyashree
Abstract Low-voltage ride-through (LVRT) capability is one of the many underexplored challenges in integrating photovoltaic (PV) systems into the power grid. This paper proposes a control strategy with LVRT capability for a PV system that uses a three-phase single-stage inverter connected to the grid. An intelligent fuzzy logic control method is implemented to adjust the inverter's performance during LVRT operation. The fuzzy logic control tunes the reference active and reactive power commands of the inverter to the grid based on the power available from the PV system and the voltage sag. To perform this, the inverter operates in two modes: (i) MPPT during the grid's normal operation, and (ii) LVRT control when a fault occurs. While operating in LVRT mode, the inverter injects the required reactive power alongside the active power available at its output; this provides the required voltage support to the grid and improves the inverter's utilization factor. The proposed system is simulated in a MATLAB Simulink environment, and the results are presented.

Keywords Low-voltage ride-through · Photovoltaic system · Fuzzy logic control · Grid-connected inverter
M. Sahana (B)
Department of Electrical and Electronics Engineering, JSS Science and Technology University, Mysuru, Karnataka, India
e-mail: [email protected]

N. Sowmyashree
Department of Electrical and Electronics Engineering, SJCE, JSS Science and Technology University, Mysuru, Karnataka, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_14

1 Introduction

The electrical power system serves the generation, transmission, and distribution of electrical energy to load centres in an efficient, economical, and reliable manner. In the operation of electric power systems and the expansion of existing networks due
to increasing demand, several technical complications exist at both the transmission and distribution levels in terms of maintaining the quality of the power supply. Prime importance has to be given to supplying good-quality power to consumers. In recent years, photovoltaic power generation has emerged as one of the most trusted renewable energy sources on the globe and is expected to grow substantially in the upcoming years. Due to the large penetration of PV systems into power grids, it is no longer acceptable to shut these power plants down in the event of a failure, because doing so can cause problems in terms of reliability, stability, and power system performance [2]. Whenever a low-voltage fault occurs in a PV system, the voltage source inverter tends to disconnect, which in turn disconnects the whole system and causes instability during the fault period. If these faults are not detected early enough, system vulnerabilities can lead to a complete grid collapse. Therefore, it is essential to create electricity systems that are secure, dependable, and safe [16].
Two major issues need to be addressed for PV systems to achieve LVRT during faults: over-voltage on the DC output side and over-current that could occur on the AC side. Another is reactive current injection, which is believed to be a working solution to restore the voltage and support the grid against the causes of voltage sag [1, 5, 6]. Researchers have discussed the fault ride-through (FRT) capability of grid-connected PV systems in recent studies. The control of the PV system by injecting reactive power can be done by various methods, e.g., the droop concept, direct power control, and single-phase PQ theory. Though these methods are less complex, they require additional units like an orthogonal signal generator (OSG). Dynamic Voltage Restorers (DVR) require other ancillaries like voltage source converters, injection transformers, and energy storage devices.
This leads to a less economical and more expensive solution when considering PV systems [13]. The main reason for choosing fuzzy logic is that it increases the accuracy of the LVRT control, which improves the performance when operating in a fault condition. With the proposed controller, the maximum voltage drop under a fault situation is decreased. A smaller voltage drop after the fault occurrence improves the LVRT capability and makes the PV system able to meet grid code requirements [14]. The main objective of this work is a control scheme that enables the PV system to overcome faults in the grid, keeps the inverter connected, and generates electricity continuously [6–9]. To keep the inverter connected to the system even during the fault period, an intelligent fault ride-through method based on fuzzy logic is proposed in this paper. A Fuzzy Logic Control is used to calculate how much reactive power needs to be injected into the grid; the reactive power injection is inversely proportional to the voltage level at the inverter side. The voltage and power available at the PCC in the DQ reference frame are used by the FLC to calculate the power factor at which power is fed into the grid. The difference from other LVRT methods is that FLC is used here to accomplish the LVRT operation [4–6, 9]. This method could be used in smart grids and wireless smart grids, which is one of the technologies that is advancing the quickest
and has found widespread application in the current power supply sector and power governance systems due to its excellent performance [17]. The work presented in this paper is arranged as follows: Sect. 2 presents the theoretical background, covering the main blocks of the proposed system; Sect. 3 presents the theoretical background of the FLC; Sect. 4 presents the results obtained on the MATLAB Simulink platform; and Sect. 5 concludes the developed system.
2 Modelling of Single-Stage PV System

A single-stage PV system's block diagram is shown in Fig. 1. It consists of a grid-connected power circuit with an LCL filter, voltage and current transformation blocks, a current controller block, a PLL block, inverse transformation blocks to obtain the reference voltage for PWM generation, and a PWM generation block whose signals are given to the gates of the IGBTs. It also contains a PI and fuzzy logic control block that serves as feedback to the PV system: it detects the error, which is then fed to the controller to compensate for the loss.
Fig. 1 Block diagram of single-stage inverter-based PV system
2.1 Structure of Equivalent Circuit of PV Module

The structure of the PV module's equivalent circuit is shown in Fig. 2. It consists of a number of PV cells. The relationship between the current and output voltage of the PV module is given by the equations below [1].
I_L = I_{Ph} - I_{sat}\left[\exp\!\left(\frac{q V_D}{m N_s K T}\right) - 1\right] - \frac{I_L R_S + V_L}{R_P} \qquad (1)

I_{Ph} = \left[I_{sc} + \alpha_i \left(T - T_{ref}\right)\right] \frac{G}{G_{ref}} \qquad (2)

I_{SC} = I_{SC,ref}\,\frac{R_p + R_s}{R_p} \qquad (3)

I_{sat} = \frac{I_{sc,ref} + \alpha_i \left(T - T_{ref}\right)}{\exp\!\left(q \left(V_{oc,ref} + \alpha_i \left(T - T_{ref}\right)\right) / \left(N_s m K T\right)\right) - 1} \qquad (4)
are the Load current and Load voltage, respectively is the current intensity generated by the light are the series and parallel resistances, respectively is the voltage across the diode is the saturation current of the PV panel is the temperature of the atmosphere.
According to standard test conditions (STC), the irradiance G_{ref} and temperature T_{ref} are 1000 W/m² and 25 °C, respectively [1]. The specifications of the PV module are given in Table 1.

Fig. 2 PV module equivalent circuit
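Equation (1) is implicit in I_L, so it has to be solved numerically. The sketch below uses a damped fixed-point iteration; the saturation current, ideality factor, and other parameter values are illustrative assumptions, not the module of Table 1:

```python
import math

def pv_current(v_load, i_ph, i_sat, rs=0.18, rp=360.0, ns=96, m=1.3, t=298.15):
    """Solve Eq. (1), I = Iph - Isat*(exp(q*Vd/(Ns*m*k*T)) - 1) - Vd/Rp
    with Vd = V + I*Rs, by damped fixed-point iteration."""
    q, k = 1.602e-19, 1.381e-23
    vt = ns * m * k * t / q          # thermal voltage of the cell string
    i = i_ph                         # start from the light-generated current
    for _ in range(200):
        vd = v_load + i * rs
        i_new = i_ph - i_sat * (math.exp(vd / vt) - 1.0) - vd / rp
        i = 0.5 * i + 0.5 * i_new    # damping keeps the iteration stable
    return i
```

Sweeping v_load from zero up to the open-circuit voltage traces the familiar I-V curve of the module.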
Table 1 Specification of the PV module

Parameter                       Rating
No. of series-connected cells   96
Open-circuit voltage (V)        50
Short-circuit current (A)       5.96
Series resistance (Ω)           0.18
Parallel resistance (Ω)         360
No. of modules in series        3
No. of modules in parallel      10
2.2 LCL Filters

To strengthen the LVRT capability and to eliminate harmonics in the system, an LCL filter is used in the proposed model. The LCL filter, which removes the harmonics, is generally utilized to provide a link between the grid and the inverter. Benefits of LCL filters such as cost-effectiveness, improved operation, high attenuation, low weight, and small size have paved the way for choosing it [10]. Figure 3 shows the arrangement of the LCL filter between the inverter and the grid. The LCL filter is designed taking into account the inverter output voltage, switching frequency, grid frequency, active power, and resonance frequency, and its component values are computed using mathematical calculations [11]. To design the specific filter, the transfer functions of the currents and voltages are required; to obtain them, the LCL filter circuit must be converted into the 's' domain. After conversion, the circuit takes the form shown in Fig. 4 [11].
Fig. 3 LCL filter
Fig. 4 LCL filter circuit in s domain
\frac{V_i - V_x}{s L_1} = I_g + \frac{V_x}{1/(sC)} \qquad (5)

V_x = I_g \, s L_2 \qquad (6)

Solving Eqs. (5) and (6) gives

\frac{I_g}{V_i} = \frac{1}{s^3 L_1 L_2 C + s \left(L_1 + L_2\right)} \qquad (7)

Let L = L_1 + L_2 and L_p = \frac{L_1 L_2}{L_1 + L_2} in Eq. (7). The final transfer function of the desired filter is then given by Eq. (8):

\frac{I_g}{V_i} = \frac{1}{s L \left(1 + s^2 C L_p\right)} \qquad (8)

where V_i is the input voltage to the filter from the inverter, V_x is the voltage at node 'x', I_g is the current fed to the grid from the filter, sL_1 and sL_2 are the inductances in the 's' domain, and sC is the capacitance in the 's' domain.

Based on the switching frequency, grid frequency, active power, reactive power, and resonance frequency, values for the filter components can be obtained from the following equations [11]:

\omega_{res} = \frac{1}{\sqrt{C L_p}} \qquad (9)

The reactive power Q absorbed by the filter capacitance is limited to approximately 5% of the rated power S, which gives

C = \frac{0.05 \, S}{2 \pi f V^2} \qquad (10)

L = \frac{V_i(\omega_{sw})}{\omega_{sw} \, I_g(\omega_{sw}) \left(1 - \omega_{sw}^2 / \omega_{res}^2\right)} \qquad (11)

Inductor and capacitor values are obtained by substituting the system specification values into Eqs. (10) and (11).
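Equations (9) and (10) translate directly into code; the ratings used in the sketch's usage are illustrative, not necessarily this paper's system values:

```python
import math

def filter_capacitance(s_rated, v_line, f_grid):
    """Eq. (10): size C so it absorbs at most 5% of the rated power
    as reactive power at the grid frequency."""
    return 0.05 * s_rated / (2 * math.pi * f_grid * v_line ** 2)

def resonance_frequency(l1, l2, c):
    """Eq. (9) with Lp = L1*L2/(L1 + L2); result in rad/s."""
    lp = l1 * l2 / (l1 + l2)
    return 1.0 / math.sqrt(c * lp)
```

A common design check is to keep the resonance frequency well between the grid frequency and the switching frequency so that neither excites the filter.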
2.3 Methodology

When a fault occurs, the inverter switches from the regular operating condition to the low-voltage ride-through condition. During the fault, the available PV-side power and the power supplied to the grid become unbalanced. To prevent the electronic sources from damage, a control mechanism is adopted on the basis of Malaysia's requirements for LVRT [1], ensuring that the PV system remains connected and rides through the voltage drop. The proposed LVRT control method also differs from other methods: it can help restore the voltage and a network on the verge of failure by injecting real and reactive power according to the depth of the voltage drop. The proposed system contains an active and reactive current controller, an AC current limiter for inverter protection, and reactive power injection according to the respective requirements. Immediately after voltage sag detection, the required reactive current is fed to the inverter as per the requirements, to contribute to grid support and voltage recovery.
3 Fuzzy Logic Control

A fuzzy logic control system works on a mathematical unit. This unit treats the analog input values as logical variables taking numbers between 0 and 1 while operating on discrete values. The controllers are simple in structure, containing an input stage, a processing stage, and an output stage [15]. For the required amount of reactive power injection, voltage support is needed. The voltage support requirements for grid faults, shown in Fig. 5, indicate the quantity of reactive power given to the grid to back up the voltage recovery, based on the depth of the voltage sag [9, 14]. From Fig. 5, it can be observed that no reactive power injection is needed as long as the grid voltage remains in the dead-band region (0.9 V_N ≤ V_g ≤ 1.1 V_N). When the voltage sags below 90% of the rated value, the reactive current is obtained [9] using the equation below:
I_q = 2\left(1 - V_g\right) I_N \quad \text{for } 0.5 \text{ p.u.} \le V_g \le 0.9 \text{ p.u.}; \qquad I_q = I_N \quad \text{for } V_g < 0.5 \text{ p.u.} \qquad (12)
where I_q is the reactive current, I_N the rated current, and V_g the grid voltage (in per unit). The reactive power injection requirement differs from one country to another. Basically, a fuzzy logic control (FLC) is arranged to find the error for which the reactive
Fig. 5 Voltage support requirements
power and the real power are given to the grid. The two inputs of the FLC are the magnitude of the voltage and the power from the PV system. The aim of the FLC is to improve the reactive power injection by decreasing the highest current limit of the inverter. Solar radiation decides the quantity of real power from the inverter; the utilization factor of the inverter becomes lower when its capacity is not fully used. The FLC in the proposed control scheme estimates the amount of the voltage drop and the active power from the photovoltaic system, which is used to choose the range of reactive power injection. The FLC inputs and output are shown in Fig. 6. As an example, the proposed scheme has five membership functions for every input, 25 rules, and a defuzzification method. The structure of the input and output membership functions is based on the voltage-support requirement and the range of the power available from the PV. Fuzzy Logic Control is designed for the system based on the fuzzy logic rules shown in Table 2. The basic Fuzzy Logic Controller required for the proposed LVRT has two inputs and one output: the grid voltage values are fed to input 1, the power values at different levels are fed to input 2, and the output is the output power from the inverter, as in Fig. 7.
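The voltage-support requirement of Eq. (12), including the dead band, can be written directly as a small helper (per-unit grid voltage assumed):

```python
def reactive_current(vg_pu, i_n):
    """Reactive current to inject for a given per-unit grid voltage, Eq. (12)."""
    if vg_pu >= 0.9:           # dead band: no injection required
        return 0.0
    if vg_pu < 0.5:            # deep sag: inject the full rated current
        return i_n
    return 2.0 * (1.0 - vg_pu) * i_n
```

The injected reactive current thus grows linearly with sag depth until the full rated current is reached at 50% voltage.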
Fig. 6 FLC inputs and output
Table 2 Fuzzy rules

       NB   NS   ZE   PS   PB
NB     NB   NB   NB   NS   ZE
NS     NB   NB   NS   ZE   PS
ZE     NB   NS   ZE   PS   PB
PS     NS   ZE   PB   PB   PB
PB     ZE   PS   PB   PB   PB
Fig. 7 a Input 1—voltage. b Input 2—Change in power. c Output-Output Power
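A minimal Mamdani-style sketch of the five-label controller described above; the triangular membership breakpoints and output centers are illustrative assumptions, and the rule base is passed in as a mapping in the shape of Table 2:

```python
LABELS = ["NB", "NS", "ZE", "PS", "PB"]
CENTERS = {"NB": -1.0, "NS": -0.5, "ZE": 0.0, "PS": 0.5, "PB": 1.0}

def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def memberships(x):
    return {lab: tri(x, CENTERS[lab] - 0.5, CENTERS[lab], CENTERS[lab] + 0.5)
            for lab in LABELS}

def flc_output(v_err, p_err, rules):
    """Evaluate rules {(voltage_label, power_label): output_label} with
    min as the AND operator and centroid defuzzification over the centers."""
    mu_v, mu_p = memberships(v_err), memberships(p_err)
    num = den = 0.0
    for (rv, rp), out in rules.items():
        w = min(mu_v[rv], mu_p[rp])
        num += w * CENTERS[out]
        den += w
    return num / den if den else 0.0
```

In the controller of this paper, the two inputs would be the voltage magnitude and the available PV power, and the defuzzified output would set the inverter power command.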
4 Results and Discussions

The complete PV system blocks shown in Fig. 1 are used to examine the LVRT control scheme developed in MATLAB Simulink. The results obtained are shown in the figures below. The fault is generated between 0.1 and 0.15 s; during this period, the inverter tends to disconnect from the system.
Fig. 8 Grid voltage and current
Variations in grid voltage and grid current during the low-voltage ride-through are shown in Fig. 8. Grid voltage and grid current are plotted along the y-axis in volts and amperes, respectively, against time t in seconds along the x-axis. The three-phase voltage to be fed to the grid is 415 V. Before the fault occurs, the voltage is around 400 V. Once the fault occurs, between 0.1 and 0.15 s, the voltage drops by 50% of the rated grid voltage due to the LLLG fault. During the fault period, the active power delivered to the grid is affected. Since fuzzy logic is implemented, the drop is limited to 50% of the rated voltage: reactive current is injected, which helps to keep the inverter connected. The output voltage of the single-stage inverter is shown in Fig. 9. Since it is followed by the booster circuit, the voltage level before and after the fault is slightly greater than the nominal voltage. In Fig. 9, a sag is also observed in the inverter voltage during the fault period (0.1–0.15 s). Once the fault is ridden through, operation returns to normal, restoring the nominal voltage level. The full capacity of the inverter is used to feed reactive power to the grid during the voltage sag: between t = 0.1 s and t = 0.15 s, approximately 50% reactive power is supplied by the inverter. In the simulation, it can be seen that the entire capacity of the inverter is used to supply reactive power to the grid throughout the voltage drop. Before the fault, the power at normal operating conditions is 90 kW and the reactive power is 0 kVAR, as shown in Fig. 10. When the fault occurs, the active power falls toward zero with the sag in voltage, at which instant the inverter could disconnect.
To avoid this, reactive power is injected at t = 0.1 s. The injected reactive power is approximately 20 kVAR at 50% voltage sag.
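As context for the 50% sag case, the split between reactive and active current under a fixed droop can be sketched as follows. The droop gain of 2 p.u. reactive current per p.u. voltage deviation and the 0.9 p.u. dead band are typical grid-code values, not taken from the paper (whose FLC chooses the injection level adaptively); `lvrt_currents` is a hypothetical helper.

```python
import math

def lvrt_currents(v_pu, i_max=1.0, k=2.0):
    """Reactive/active current references (p.u.) during a voltage sag.

    k = 2 p.u. reactive current per p.u. voltage deviation below 0.9 p.u.
    is a common grid-code droop (an assumption for illustration).
    """
    iq = min(k * max(0.0, 0.9 - v_pu), i_max)    # reactive support, capped at inverter limit
    i_d = math.sqrt(max(0.0, i_max**2 - iq**2))  # remaining headroom for active current
    return i_d, iq
```

At a 50% sag (v = 0.5 p.u.) this droop already commits 80% of the inverter current to reactive support, illustrating why active power collapses during the fault.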
Fig. 9 Inverter voltage during LVRT
Fig. 10 Active power output and Reactive power output
As a voltage source inverter is being used, a constant voltage is maintained throughout the operation, and the current varies according to input parameters such as the input irradiance. Figure 11 shows the active current, which falls to zero at the beginning of the fault, and the reactive current, which is initially zero and increases during the fault period due to the injection of reactive power.
Fig. 11 Active current and reactive current
The Simulink model of the proposed three-phase grid-integrated single-stage inverter-based photovoltaic system is shown in Fig. 12. It follows the block diagram of Fig. 1, which comprises a power circuit with an LCL filter connected to the grid, voltage and current transformation blocks, a current controller block, a phase-locked loop block, an inverse transformation block to obtain the reference voltage for PWM generation, a PWM generation block whose signals drive the gates of the inverter's IGBTs, an MPPT control block, and the FLC control block.
Fig. 12 Model of proposed three-phase grid-integrated single-stage inverter-based PV system using fuzzy logic control
5 Conclusion

In this paper, a single-stage inverter-based LVRT control strategy for PV systems operating under fault conditions is addressed. The proposed controller includes modifications that enable the PV system to ride through grid faults: a DC cut-off circuit, a current limiter, and injected reactive power control. Although performance depends on the control strategy and the detection method, the fuzzy logic controller proposed in this study shows excellent performance in sag detection, maintaining an uninterrupted connection, protecting the system from both low voltage and high current, and introducing reactive power to assist voltage recovery until compensation is complete.
References

1. Al-Shetwi AQ et al (2017) Low voltage control capability for single-stage inverter-based grid-connected photovoltaic power plant. Elsevier Ltd.
2. Hasanien HM (2016) An adaptive control strategy for low voltage ride through capability enhancement of grid-connected photovoltaic power plants. IEEE Trans Power Syst 31:3230–3237
3. Livinti P (2021) Comparative study of a photovoltaic system connected to a three-phase grid by using PI or fuzzy logic controllers. Sustainability 13:2562
4. Hannan MA et al (2019) A fuzzy-rule-based PV inverter controller to enhance the quality of solar power supply: experimental test and validation. Electronics 8:1335
5. Mahalakshmi R et al (2014) Design of fuzzy logic based maximum power point tracking controller for solar array for cloudy weather conditions. In: 2014 power and energy systems: towards sustainable energy (PESTSE 2014)
6. Kamal Hossain M et al (2017) Fuzzy logic controlled power balancing for low voltage ride-through capability enhancement of large-scale grid-connected PV plants. IEEE
7. Argyrou MC et al, Modeling of a photovoltaic system with different MPPT techniques using MATLAB/Simulink
8. Bhuyan A et al (2016) Maximum power point tracking for three phase grid connected photovoltaic system using fuzzy logic control. Int J Mod Trends Eng Res (IJMTER) 03(04)
9. Radwan E et al (2019) Fuzzy logic control for low-voltage ride-through single-phase grid-connected PV inverter. Energies 12:4796. https://doi.org/10.3390/en12244796
10. Yagnik UP et al (2017) Comparison of L, LC & LCL filter for grid connected converter. IEEE
11. Dursun M et al (2018) LCL filter design for grid connected three-phase inverter. IEEE
12. Joshi J et al (2021) A comprehensive review of control strategies to overcome challenges during LVRT in PV systems. IEEE Access
13. Yang Y, Blaabjerg F (2013) Low-voltage ride-through capability of a single-stage single-phase photovoltaic system connected to the low-voltage grid. Int J Photoenergy
14. Rezaie H et al (2019) Enhancing voltage stability and LVRT capability of a wind-integrated power system using a fuzzy-based SVC. Eng Sci Technol Int J 22(3):827–839
15. Soufi Y, Bechouat M, Kahla S, Bouallegue K (2014) Maximum power point tracking using fuzzy logic control for photovoltaic system. In: 3rd international conference on renewable energy research and application (ICRERA)
16. Vivekanadam B (2021) IoT based smart grid attack resistance in AC and DC state estimation. J Electr Eng Automat 2(3):118–122
17. Smys S, Wang H (2020) Optimal wireless smart grid networks using duo attack strategy. J Electr Eng Automat 2(2):60–67
Automated Detection of Malaria Parasite from Giemsa-Stained Thin Blood Smear Images V. Vanitha and S. Srivatsan
Abstract Malaria is a global threat, especially in Asian and African countries. In malaria-affected nations, malaria is diagnosed through visual examination of blood under a microscope, and the reliability of the result depends upon the skill of the lab technician; individuals and communities are harmed when malaria is misdiagnosed. To overcome this issue, an automated malaria diagnosis system using nearly 27,000 thin blood smear images has been proposed. It involves two steps: (1) pre-processing the digitized blood smear images to remove noise and segmenting them to extract the region of interest; (2) a lightweight customized deep learning model to classify digital smears as either infected or normal. We have also implemented automated malaria detection using various pre-trained models. The results reveal that our model performs with an accuracy of over 99% and a sensitivity of 100%. Keywords Convolutional neural network · Deep learning · Malaria
V. Vanitha (B) · S. Srivatsan Sri Ramachandra Faculty of Engineering and Technology, Sri Ramachandra Institute of Higher Education and Research, Porur, Chennai, Tamil Nadu 600116, India e-mail: [email protected] S. Srivatsan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_15

1 Introduction

Malaria is a communicable disease spread through the bites of female Anopheles mosquitoes carrying Plasmodium parasites such as Plasmodium falciparum. Though the global mortality rate has come down steadily over the last ten years, it is still considered a dangerous disease. As per the World Health Organization (WHO) report released in 2020, over 240 million cases have been detected globally, and approximately 627,000 malaria deaths have been recorded worldwide [1]. India contributes around 2% of the global mortality rate and 2% of the infection rate [2]. Despite the fact that several new techniques have emerged for the detection of malaria infection, the examination of stained
smear images under a microscope by skilled experts is still seen as the gold standard [3]. Microscopic examination is simple and cost-effective, and hence affordable in countries with a high malaria burden. Though it is widely available, the reliability of the test depends on the expertise of the lab technician. Due to the heavy workload in the malarial season and resource-limited working environments, there is no way to monitor the effectiveness and reliability of diagnosis. This leads to misdiagnosis, which may have serious consequences and result in wrong treatment. Three important tasks [4] in the diagnosis of malaria are (i) determination of the presence or absence of the parasite in a blood smear, (ii) identification of the type of parasite and (iii) identification of its stage. Samples of different malaria species from thin blood smears of humans and their stages of growth are shown in Fig. 1. The accurate detection of malaria parasites is crucial for the treatment of patients and the management of the disease. For instance, if a person is wrongly diagnosed with malaria due to the aforementioned constraints, it will lead to unnecessary drug usage, side effects and
Fig. 1 Malaria species and their stages (Source Poostchi et al.)
mental trauma. On the other hand, if a diagnosis of malaria is missed, the disease may progress severely; the resulting delay in diagnosis and treatment may lead to loss of life. Moreover, it is essential to quantify parasites to measure the severity of the disease. This requires examining nearly 6000 non-overlapping microscopic fields of view (FOV), which is prone to error even when carried out by an expert. This emphasizes the need to automate the examination of blood smears to reduce the misdiagnosis rate and ensure proper treatment. In recent developments toward automated malaria detection, deep learning (DL) techniques have been implemented. Automated detection systems based on deep learning models require large datasets for good performance and accuracy. However, publicly available annotated malaria datasets are limited and small. In resource-limited countries where malaria is prevalent, building datasets is given low priority; image augmentation techniques can be utilized to overcome this paucity of data. Although several studies in the literature reveal that machine learning approaches like SVM have been widely used along with computer vision techniques, few researchers have utilized image processing together with deep learning techniques. With the goal of improving the treatment and management of malaria, automatic detection of malaria parasites from digitized thin Giemsa-stained blood smears is proposed. It serves two purposes: firstly, it provides an effective and accurate diagnosis with restricted resources; secondly, it is simple and cost-effective. To realize the goal of automated malaria detection with good accuracy, the proposed deep learning model is implemented using a benchmark dataset. Its performance is evaluated against pre-trained models and deep learning models available in the literature.
The literature survey of related works, a preview of the prominent pre-trained models used in this study, the architecture of the proposed model and its results are discussed in the following sections.
2 Related Work

Research studies carried out on blood smear images using various techniques are discussed in this section. A concentrated, higher volume of blood obtained from a thick blood smear is analyzed to identify the malarial parasite; this is useful at the initial stage of infection, when there are fewer parasites. Though it has high sensitivity, it is not sufficient to differentiate between malarial species, so thin blood smears are used to identify and quantify them. A few sample studies from each category are given in Table 1. Several studies using traditional machine learning approaches and a few using deep learning approaches are presented below. Diaz et al. [12] implemented a Support Vector Machine (SVM) on a dataset of 450 images to detect the presence of malaria and its stage, with a good sensitivity of 94%. Shen et al. [13] proposed an architecture to unearth features of infected/normal cells
Table 1 Studies using thin and thick blood smears

References | Blood sample | Purpose
[5]  | Thick | Differentiation of malarial/normal images
[6]  | Thick | Detection of malaria at early stage
[7]  | Thick | Smartphone-based malaria detection
[8]  | Thin  | Faster model for malaria classification
[9]  | Thin  | Distinguishing between malaria and leukemia
[10] | Thin  | Detection and quantification for human and mouse
[11] | Thin  | Detection, quantification and severity
using stacked autoencoders. In [14], the authors proposed a Convolutional Neural Network (CNN) with 16 layers to detect parasites from thin blood smears; on 27,578 images, the model attained an accuracy of about 97%. A model by Bibin et al. [15] uses a Deep Belief Network (DBN) on 4100 images and obtained a sensitivity of 97.60% and a specificity of 95.92%. In [16], a CNN architecture implemented on Leishman-stained images has shown 97.0% sensitivity and 98.0% specificity. In another study [17], an automatic detection system produced an overall precision of 89.7%. In [18], the authors assessed the performance of various pre-trained models built on the ImageNet dataset using their own malarial dataset; they noted that the ResNet50 model produced better results than the other pre-trained models, with an accuracy of 98.6%. The summary of automated malaria detection is shown in Table 2.

Table 2 Summary of related work for malaria detection

References | No. of images                                  | Method                | Evaluation metrics
[19]       | Leishman-stained peripheral blood smear images | SVM                   | Sensitivity 96.6%, Specificity 68.9%
[20]       | 9 blood films                                  | KNN                   | Sensitivity 74%, Specificity 98%
[21]       | 30 images                                      | Fuzzy C-Means         | Accuracy 85.31%
[14]       | 27,578 erythrocyte images                      | CNN                   | Sensitivity 96.99%, Specificity 97.75%
[17]       | Malaria blood film library                     | CNN                   | Sensitivity 91.6%, Specificity 94.1%
[22]       | 27,558 cell images                             | CNN, VGG16, CNNEx-SVM | Accuracy: CNN 95.97%, VGG16 97.64%
3 Compared Pre-trained Models

3.1 VGG-19

VGG models are among the pioneering and most prominent models of the ImageNet competition, achieving an accuracy of 92.7% on the test data. VGG-19 is 19 layers deep and is used as a classification architecture for many datasets. Pre-trained weights are readily available in frameworks such as Keras and can be imported and reused directly.
3.2 DenseNet-121

The Dense Convolutional Network (DenseNet) implements dense connections by connecting each layer to every subsequent layer. DenseNet reuses the features learnt in previous layers, avoiding re-learning of already-learnt features.
3.3 Inception

An Inception module computes multiple transformations on the same input and then combines the outputs to decide which features to keep. The model uses several tricks to improve performance and computation speed. The convolutional stage of an Inception module applies three filters of different sizes, followed by a max-pooling operation; the output features of these parallel paths are concatenated and passed to the subsequent modules. By regulating the total number of input channels before performing convolutions with high-dimensional filters, the computational cost is reduced.
3.4 Xception

Xception, also known as the "extreme" version of the Inception module, is built on depthwise separable convolutions. It uses a modified depthwise separable convolution in which the convolution operation is not performed across all channels at once. This makes the modules lightweight, and they are used throughout the architecture.
3.5 MobileNet-V2 MobileNet-V2 is a CNN model designed specifically for mobile devices. For better feature extraction, an inverted residual structure has been implemented.
4 Proposed Deep Learning Model

A novel approach for malaria detection is proposed and discussed. It has two major steps: (1) pre-processing techniques to enrich the microscopic image quality and (2) detection of malaria in cell images. The workflow is depicted in Fig. 2.
4.1 Dataset and Data Augmentation

The dataset was acquired from the official website of the NIH and has 27,558 images, with an equal number of parasitized and uninfected images (13,779 for each class). Sample images from the dataset for the healthy and parasitized classes are shown in Fig. 3.
Fig. 2 Workflow of proposed methodology
Fig. 3 Parasitized and healthy images
Data augmentation helps to increase the number of images by artificially creating additional instances through transformations such as rotation, translation and scaling. The augmentation techniques applied to the data are tabulated in Table 3. A sample parasitized image and its corresponding augmented images are shown in Fig. 4.

Table 3 Augmentation techniques and parameters

Parameters | Value
Rescale | 1./255
Rotation | 40
Zoom | 0.2–0.25
Horizontal flip/Vertical flip | True
Brightness | 0.5–0.9
Shear | 0.2–0.3
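A subset of the Table 3 transforms can be sketched dependency-free with NumPy; rotation and shear are omitted here to avoid an interpolation library, the zoom is implemented as a central crop with nearest-neighbour resize, and `augment` is a hypothetical helper (in practice Keras' ImageDataGenerator covers all of these parameters directly).

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, zoom=(0.2, 0.25), brightness=(0.5, 0.9)):
    """Randomly rescale, flip, zoom and brightness-jitter an HxWx3 float image
    with intensities in [0, 255] (a partial stand-in for Table 3)."""
    out = img / 255.0                      # rescale 1./255
    if rng.random() < 0.5:                 # horizontal flip
        out = out[:, ::-1]
    if rng.random() < 0.5:                 # vertical flip
        out = out[::-1]
    z = rng.uniform(*zoom)                 # central zoom via crop + nearest-neighbour resize
    h, w = out.shape[:2]
    ch, cw = int(h * (1 - z)), int(w * (1 - z))
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    crop = out[y0:y0 + ch, x0:x0 + cw]
    yi = np.arange(h) * ch // h
    xi = np.arange(w) * cw // w
    out = crop[yi][:, xi]
    out = out * rng.uniform(*brightness)   # brightness jitter
    return np.clip(out, 0.0, 1.0)
```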
Fig. 4 Sample augmented images
4.2 Pre-processing

As the original images in the dataset were captured with a smartphone, they vary in size. The images are resized to 150×150 pixels with 3 color channels (RGB) and pre-processed for illumination correction and noise removal.

Illumination correction. Microscopic images tend to have illumination variation. As depicted in Fig. 5, illumination is uneven, and hence the contrast has to be enhanced before any further processing. Among different methods such as histogram equalization, we adopted the Gamma Correction technique. It is suitable for microscopic images as it improves image quality while preserving useful information. For a given image I(x, y), the formula given in Eq. (1) is used for contrast enhancement:

Fig. 5 Malaria cell image with uneven illumination
Fig. 6 Images before and after illumination correction
I_out(x, y) = I_max(x, y) × [(I(x, y) − I_min(x, y)) / (I_max(x, y) − I_min(x, y))]^γ   (1)

where γ is the correction factor, and I_min(x, y) and I_max(x, y) are the lowest and highest intensity values of I(x, y), respectively. The correction factor is computed as

γ = log(mid × 255) / log(mean)   (2)

where mid = 0.5 and mean is the mean pixel intensity of the gray image. The original and gamma-corrected image for γ = 1.023642 are shown in Fig. 6.

Noise Removal. Noise is a common side effect observed in digitized cell images, and denoising is considered an essential pre-processing step for medical images. Filters are utilized to remove unwanted noise from the cell images; Mean, Gaussian and Median are the most commonly used. To choose the filter and kernel size best suited to the dataset, a mean filter with different kernel sizes, a Gaussian filter with different kernel sizes and sigma variances, and a median filter with different kernel sizes are applied to sample images. Various noises are infused into the original image, and the filters are applied to the noise-added images. The performance of each filter is measured using Mean Squared Error (MSE), Structural Similarity Index (SSIM) and Peak Signal-to-Noise Ratio (PSNR). Let I be the original image and Î be the denoised image. PSNR is given by Eq. (3):

PSNR(I, Î) = 10 · log₁₀(ML² / MSE)   (3)

where ML is the maximum intensity level. Only the filters with good performance measures (highlighted in Table 4) are shown in Fig. 7.
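A direct implementation of the gamma correction of Eqs. (1)–(2) might look as follows; `auto_gamma` is a hypothetical name, and the small epsilon guarding the division is an implementation detail not stated in the paper.

```python
import numpy as np

def auto_gamma(img):
    """Gamma correction per Eqs. (1)-(2); img holds grayscale intensities in [0, 255]."""
    imin, imax = float(img.min()), float(img.max())
    mean = float(img.mean())
    gamma = np.log(0.5 * 255.0) / np.log(mean)   # Eq. (2), mid = 0.5
    norm = (img - imin) / (imax - imin + 1e-12)  # normalise to [0, 1]; epsilon avoids /0
    return imax * norm ** gamma                  # Eq. (1)
```

Note that when the mean intensity equals mid-gray (127.5), γ = 1 and the image is left unchanged, which matches the near-unity γ = 1.023642 reported for the sample image.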
Table 4 Quality measures of filters

Filters  | Parameters             | MSE    | SSIM | PSNR
Mean     | Kernel = 3             | 307.15 | 0.92 | 29.62
Mean     | Kernel = 5             | 472.21 | 0.87 | 29.33
Mean     | Kernel = 11            | 893.40 | 0.72 | 28.94
Gaussian | Kernel = 5, sigma = 1  | 549.26 | 0.83 | 28.50
Gaussian | Kernel = 5, sigma = 2  | 413.04 | 0.88 | 29.37
Gaussian | Kernel = 11, sigma = 1 | 263.03 | 0.92 | 30.52
Gaussian | Kernel = 21, sigma = 1 | 284.21 | 0.92 | 29.46
Median   | Kernel = 3             | 425.85 | 0.92 | 29.75
Median   | Kernel = 5             | 266.06 | 0.95 | 29.76
Median   | Kernel = 9             | 425.85 | 0.92 | 29.75
Median   | Kernel = 20            | 611.92 | 0.89 | 29.70
Fig. 7 Selection of filters with best performance evaluation metrics
Gaussian noise is added to the original image, and the aforementioned filters are applied to it (see Fig. 8). Salt-and-pepper noise is then added, and the filters highlighted in the table are applied; filters with good quality measures are shown in Fig. 9. The MSE, PSNR and SSIM values of the filters are tabulated in Table 4 and shown in Fig. 10, with the best-performing filters in bold. From the experiments, it is observed that the median filter with kernel size 5 performs better than the other filters on this dataset and is hence chosen for denoising the illumination-corrected images.
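The median filtering and the PSNR measure of Eq. (3) can be reproduced with a small NumPy sketch (`sliding_window_view` needs NumPy ≥ 1.20). `median_filter`, `mse` and `psnr` are hypothetical helpers; SSIM is omitted here, as a faithful implementation is considerably longer.

```python
import numpy as np

def median_filter(img, k=5):
    """Sliding-window median filter with edge padding over a 2-D image."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    win = np.lib.stride_tricks.sliding_window_view(p, (k, k))
    return np.median(win, axis=(-2, -1))

def mse(a, b):
    """Mean squared error between two images."""
    return float(np.mean((a - b) ** 2))

def psnr(a, b, ml=255.0):
    """Eq. (3): PSNR = 10 * log10(ML^2 / MSE), ML = maximum intensity level."""
    return 10.0 * np.log10(ml ** 2 / mse(a, b))
```

On a Gaussian-noise-corrupted image, the k = 5 median filter should lower the MSE (and hence raise the PSNR) relative to the noisy input, mirroring the selection made in Table 4.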
Fig. 8 Filters with best performance evaluation metrics on Gaussian noise-added image
Fig. 9 Filters with best performance evaluation metrics on salt and pepper noise-added image
Fig. 10 MSE, SSIM and PSNR values for selected filters
4.3 Segmentation

The Canny edge detection technique is applied to the pre-processed images. As a deep learning approach will be employed to classify nearly 27,000 images over several hundred epochs, computation time needs to be reduced; the edge detection step serves this purpose by reducing the amount of data. Due to the lack of ground-truth images for the dataset, performance metrics cannot be calculated on the segmented images. Figure 11 depicts the original, pre-processed and segmented images.
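As a rough stand-in for the Canny step (which in practice would come from OpenCV or scikit-image), a gradient-magnitude edge map already shows how an edge map compresses the data. The Sobel kernels, the fixed threshold and the name `sobel_edges` are illustrative assumptions — Canny additionally applies Gaussian smoothing, non-maximum suppression and hysteresis thresholding.

```python
import numpy as np

def sobel_edges(gray, thresh=0.2):
    """Boolean edge map from the normalised Sobel gradient magnitude of a 2-D image."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    pad = np.pad(gray, 1, mode="edge")
    win = np.lib.stride_tricks.sliding_window_view(pad, (3, 3))
    gx = (win * kx).sum(axis=(-2, -1))   # horizontal gradient
    gy = (win * ky).sum(axis=(-2, -1))   # vertical gradient
    mag = np.hypot(gx, gy)
    return (mag / (mag.max() + 1e-12)) > thresh
```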
4.4 Classification A lightweight CNN model is designed for automated malaria detection. The proposed model has four convolution layers, one max-pooling layer after each convolutional layer and one dense layer as shown in Fig. 12. This constitutes the feature extraction
Fig. 11 Original, pre-processed and segmented images
Fig. 12 Architecture of the proposed CNN model
part of the proposed model. The output is flattened into a single-column vector and forwarded to the final layer to classify images as either parasitized or healthy. Edge-segmented images of 150 × 150 × 3-pixel resolution are used as input. All four convolutional layers perform convolution using a 3 × 3 filter with the Rectified Linear Unit (ReLU) activation function; the depths of the layers are 16, 32, 64 and 128, respectively. The convolution operation in each layer is followed by a max-pooling operation with a 2 × 2 window and a 1 × 1 stride. The purpose of the max-pooling layer after each convolution layer is to downsample the feature maps. The features of the last
max-pooling layer are flattened and then passed to a dense layer. The fully connected layer is designed with 512 hidden units. To prevent overfitting, a drop-out layer with a ratio of 0.5 is applied. The model is trained using the Adadelta optimizer. The chosen hyperparameters are summarized in Table 5.

Table 5 Hyperparameters of the proposed model

Parameters | Value
Input dimension | (150, 150, 3)
Convolution | 3 × 3
Max-pooling | 2 × 2
Batch size | 32
Epochs | 100
Optimizer | Adadelta
Loss function | Categorical Cross-Entropy
No. of fully connected layers | 1 (512 neurons)
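The lightweight character of the architecture can be checked by counting trainable parameters layer by layer. "Valid" 3 × 3 convolutions and stride-2 pooling are assumptions made here for the count (the paper reports a 2 × 2 pool window but does not fully specify padding and pooling strides), and `model_param_count` is a hypothetical helper.

```python
def conv_params(k, cin, cout):
    """Weights + biases of a k x k convolution from cin to cout channels."""
    return (k * k * cin + 1) * cout

def model_param_count(input_hw=150, depths=(16, 32, 64, 128), dense=512, classes=2):
    """Rough trainable-parameter count for the proposed CNN, assuming 'valid'
    3x3 convolutions and 2x2 pooling with stride 2."""
    total, cin, s = 0, 3, input_hw
    for cout in depths:
        total += conv_params(3, cin, cout)
        s = (s - 2) // 2                 # conv shrinks by 2, pooling halves
        cin = cout
    total += (s * s * cin + 1) * dense   # flatten -> dense(512)
    total += (dense + 1) * classes       # final softmax classifier
    return total
```

Under these assumptions the model has roughly 3.3 million parameters, far fewer than typical ImageNet-scale pre-trained networks, which is consistent with the paper's low-computation-time claim.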
5 Results and Discussion

The dataset is split in the ratio 9:1: 25,000 images are used for the model, and 2558 images are used for validation. The performance is measured using accuracy, sensitivity, specificity, Matthews correlation coefficient and F1 score. Table 6 presents the comparison of performance between the pre-trained models and the proposed model. From the performance metrics, it is evident that the proposed model, despite its shallow architecture, has outperformed the pre-trained architectures, achieving an accuracy of 99.9% and a sensitivity of 100%. The performance is compared with similar models found in the literature in Table 7. The proposed CNN model has superior accuracy and sensitivity compared to the other models. The proposed model is designed with fewer layers; the total

Table 6 Comparison between the customized model and pre-trained deep learning models

Method       | Accuracy (%) | Sensitivity (%) | Specificity (%) | F1 Score (%) | Matthews correlation coefficient (%)
VGG-19       | 98.7 | 99.3  | 99.6  | 99.5  | 99.0
DenseNet-121 | 91.9 | 98.9  | 96.7  | 97.7  | 95.6
ResNet-50    | 98.7 | 99.5  | 99.4  | 99.5  | 99.0
Xception     | 89.0 | 93.53 | 83.15 | 88.94 | 80.02
MobileNet-V2 | 97.8 | 99.9  | 98.2  | 99.0  | 98.1
Proposed     | 99.9 | 100   | 92.4  | 96.3  | 92.66
Table 7 Comparison of performance metrics of the proposed model with other models

Study    | Dataset     | Size   | Accuracy | Sensitivity | Specificity | F1 Score
[23]     | Own         | –      | 0.840    | 0.981       | 0.689       | –
[24]     | PIER-VM     | 24,648 | 0.7921   | 0.7402      | 0.8273      | 0.7439
[18]     | NIH Dataset | 27,588 | 0.9590   | 0.9470      | 0.9720      | 0.9590
[25]     | NIH Dataset | 27,558 | 0.9500   | 0.8860      | 0.9500      | 0.9153
[26]     | NIH Dataset | 27,558 | 0.9682   | 0.9633      | 0.9778      | 0.9682
Proposed | NIH Dataset | 27,558 | 0.999    | 1.00        | 0.924       | 0.963
number of parameters to be trained is also reduced. Thus, this lightweight model offers the advantage of low computation time.
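The metrics reported in Tables 6 and 7 follow directly from confusion-matrix counts; `classification_metrics` is a hypothetical helper shown for reference.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, F1 and Matthews correlation
    coefficient from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    sens = tp / (tp + fn)                      # recall on the positive class
    spec = tn / (tn + fp)                      # recall on the negative class
    prec = tp / (tp + fp)
    f1 = 2 * prec * sens / (prec + sens)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": acc, "sensitivity": sens, "specificity": spec,
            "f1": f1, "mcc": mcc}
```

Note how a model can show 100% sensitivity while specificity lags (as in the proposed model's 92.4%): false negatives are zero, but some healthy cells are still flagged as parasitized.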
6 Conclusion and Future Work

Automated detection of malaria from digital blood smear images using deep learning techniques has been implemented. The blood smear images are pre-processed and then segmented; the segmented images are augmented and fed to a customized deep learning model. Several pre-trained models were also implemented for malaria detection. The evaluation metrics are compared against the pre-trained models and other existing automated malaria detection systems in the literature. The results reveal that our model achieves over 99% accuracy and 100% sensitivity. Future work entails improving the specificity of the model and deploying it in a mobile application.
References

1. World Health Organization (2020) World malaria report 2020. https://www.who.int/publications/i/item/9789240015791. Last accessed 20 Apr 2022
2. World Health Organization (2016) Malaria microscopy quality assurance manual, version 2. https://www.who.int/docs/default-source/documents/publications/gmp/malaria-microscopy-quality-assurance-manual.pdf?sfvrsn=dfe54d47_2. Last accessed 20 Apr 2022
3. World Health Organization (2015) Guidelines for the treatment of malaria, 3rd edn. https://apps.who.int/iris/handle/10665/162441. Last accessed 22 Apr 2022
4. Tek FB, Dempster AG, Kale I (2010) Parasite detection and identification for automated thin blood film malaria diagnosis. Comput Vis Image Underst 114(1):21–32
5. Nakasi R, Mwebaze E, Zawedde A, Tusubira J, Akera B, Maiga G (2020) A new approach for microscopic diagnosis of malaria parasites in thick blood smears using pre-trained deep learning models. SN Appl Sci 2(7):1–7
6. Widiawati CRA, Nugroho HA, Ardiyanto I, Amin MS (2021) Increasing performance of plasmodium detection using bottom-hat and adaptive thresholding. In: 2021 IEEE 5th international conference on information technology, information systems and electrical engineering (ICITISEE). IEEE, pp 207–212
7. Yang F, Poostchi M, Yu H, Zhou Z, Silamut K, Yu J, Maude RJ, Jaeger S, Antani S (2019) Deep learning for smartphone-based malaria parasite detection in thick blood smears. IEEE J Biomed Health Inform 24(5):1427–1438
8. Madhu G, Govardhan A, Ravi V, Kautish S, Sunil Srinivas B, Chaudhary T, Kumar M (2022) DSCN-net: a deep Siamese capsule neural network model for automatic diagnosis of malaria parasites detection. Multimed Tools Appl 1–23
9. Sharif MM, Abdelrhman Mohammed H, Mohmmed Hussein E (2022) A proposed model to eliminate the confusion of hematological diseases in thin blood smear by using a deep learning pre-trained model. Omdurman Islam Univ J 18(1):81–92
10. Poostchi M, Ersoy I, McMenamin K, Gordon E, Palaniappan N, Pierce S, Maude RJ, Bansal A, Srinivasan P, Miller L, Palaniappan K (2018) Malaria parasite detection and cell counting for human and mouse using thin blood smear microscopy. J Med Imaging 5(4):044506
11. Abbas N, Saba T, Mohamad D, Rehman A, Almazyad AS, Al-Ghamdi JS (2018) Machine aided malaria parasitemia detection in Giemsa-stained thin blood smears. Neural Comput Appl 29(3):803–818
12. Díaz G, González FA, Romero E (2009) A semi-automatic method for quantification and classification of erythrocytes infected with malaria parasites in microscopic images. J Biomed Inform 42(2):296–307
13. Shen H, David Pan W, Dong Y, Alim M (2016) Lossless compression of curated erythrocyte images using deep autoencoders for malaria infection diagnosis. In: 2016 picture coding symposium (PCS). IEEE, pp 1–5
14. Liang Z, Powell A, Ersoy I, Poostchi M, Silamut K, Palaniappan K, Guo P, Hossain MA, Sameer A, Maude RJ, Huang JX (2016) CNN-based image analysis for malaria diagnosis. In: 2016 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE, pp 493–496
15. Bibin D, Nair MS, Punitha P (2017) Malaria parasite detection from peripheral blood smear images using deep belief networks. IEEE Access 5:9099–9108
16. Gopakumar GP, Swetha M, Sai Siva G, Sai Subrahmanyam GR (2018) Convolutional neural network-based malaria diagnosis from focus stack of blood smear images acquired using custom-built slide scanner. J Biophotonics 11(3)
17. Mehanian C, Jaiswal M, Delahunt C, Thompson C, Horning M, Hu L, Ostbye T, McGuire S, Mehanian M, Champlin C, Wilson B (2017) Computer-automated malaria diagnosis and quantitation using convolutional neural networks. In: Proceedings of the IEEE international conference on computer vision workshops, pp 116–125
18. Rajaraman S, Antani SK, Poostchi M, Silamut K, Hossain MA, Maude RJ, Jaeger S, Thoma GR (2018) Pre-trained convolutional neural networks as feature extractors toward improved malaria parasite detection in thin blood smear images. PeerJ 6:e4568
19. Das DK, Ghosh M, Pal M, Maiti AK, Chakraborty C (2013) Machine learning approach for automated screening of malaria parasite using light microscopic images. Micron 45:97–106
20. Tek FB, Dempster AG, Kale I (2010) Parasite detection and identification for automated thin blood film malaria diagnosis. Comput Vis Image Underst 114(1):21–32
21. Mustafa WA, Santiagoo R, Jamaluddin I, Othman NS, Khairunizam W, Rohani MNKH (2018) Comparison of detection method on malaria cell images. In: 2018 international conference on computational approach in smart systems design and applications (ICASSDA). IEEE, pp 1–6
22. Rahman A, Zunair H, Sohel Rahman M, Yuki JQ, Biswas S, Ashraful Alam M, Binte Alam N, Mahdy MRC (2019) Improving malaria parasite detection from red blood cell using deep convolutional neural networks. arXiv:1907.10418
23. Das DK, Ghosh M, Pal M, Maiti AK, Chakraborty C (2013) Machine learning approach for automated screening of malaria parasite using light microscopic images. Micron 45:97–106
210
V. Vanitha and S. Srivatsan
24. Pan WD, Dong Y, Wu, D (2018) Classification of malaria-infected cells using deep convolutional neural networks. In: Machine learning:advanced techniques and emerging applications. Intech Open 25. Fatima T, Farid MS (2020) Automatic detection of plasmodium parasites from microscopic blood images. J Parasit Dis 44(1):69–78 26. Maqsood A, Farid MS, Hassan Khan M, Grzegorzek M (2021) Deep malaria parasite detection in thin blood smear microscopic images. Appl Sci 11(5):2284
Forecasting Diabetic Foot Ulcers Using Deep Learning Models Shiva Shankar Reddy, Laalasa Alluri, Mahesh Gadiraju, and Ravibabu Devareddi
Abstract Patients with diabetes often suffer from foot ulcers, one of the most prevalent complications of the disease. A foot ulcer can be treated if it is discovered in its early stages; otherwise, its more severe manifestations may require amputation. The primary purpose of this paper is to investigate procedures for determining, at an early stage, whether a patient has a foot ulcer. A foot ulcer data set containing 1029 images is utilized. Different deep learning algorithms, namely ResNet, VGG16, DenseNet and MobileNet, were implemented for ulcer prediction. The algorithms were assessed based on accuracy, precision, recall, F1-score, Jaccard index, error rate and AUC values, and performance graphs were constructed for each. After the assessment, VGG16 proved more effective than its counterparts for predicting diabetic foot ulcers and identifying their stages. Keywords Diabetic Foot Ulcer (DFU) · Deep Learning (DL) · Residual Network (ResNet) · Visual Geometry Group 16 (VGG16) · Densely Connected Convolution Network (DenseNet) · Mobile Network (MobileNet)
S. S. Reddy (B) · L. Alluri · M. Gadiraju · R. Devareddi, Department of CSE, Sagi RamaKrishnam Raju Engineering College, Bhimavaram, AP, India, e-mail: [email protected]; M. Gadiraju e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_16

1 Introduction

In this twenty-first century, diabetes has become a global health problem. The diabetic foot, a common complication of diabetes, is difficult to treat [1]. Foot ulcers develop in patients with nerve damage, and surgery is required in severe cases. If precautions are not taken early, foot ulcers lead to infections, amputations and deaths. Even patients in a stable state are easily affected by foot ulcers. Foot ulcers usually form under the balls of the feet, where the most pressure is applied, and they are mostly detected late, when much of the damage is already done. Technology
can be advantageous for the early detection of these foot ulcers [2]. Foot ulcers occur under the feet, where the skin is directly exposed to the outer surface, so healing takes more time. Wound healing is delayed if the required treatment is not given based on a correct diagnosis. Due to foot ulcers, the blood supply in the veins going towards the legs is impaired; because of that, there is a possibility of necrosis and of death related to the necrosis. Sensor improvements and image processing have progressed in the early detection of neuropathic and vascular complications. Diabetes mellitus is the primary cause of high blood sugar, which results in several disorders of the human body, including organ failures such as liver damage and a weakened immune system. A diabetic foot ulcer is a disease that occurs because of diabetes: 20% of diabetic patients are hospitalized every year due to foot ulcers, and the risk of limb amputation is higher for diabetic patients than for ordinary people [3]. Ulcers in the foot are a main reason patients are admitted to hospitals, and patients with foot ulcers have a higher risk of dying than patients without them. For patients with diabetes, there is a greater than 50% chance of infection and a nearly 38% chance of amputation. The foot ulcer problem can be cured when it is identified early; otherwise, it becomes severe and requires amputation in many situations. A foot ulcer cannot be cured by natural methods, exercise, diet control or insulin treatment. Many people with diabetes can develop foot ulcers, but good foot care can help control them. A foot ulcer is problematic when black tissue surrounds it. From Fig. 1, the different stages of foot ulcers can be observed: the foot is damaged progressively from Stage 0 to Stage 5, and the final stage requires amputation of the foot in many cases.
Figure 2 shows what an infected foot ulcer looks like. Based on the redness of the wound, swelling, discolouration or discharge, it can be determined whether the wound is infected. If the wound is not cared for properly and precautions are not taken, it becomes severe and cannot be treated easily. A foot ulcer can be identified from features in the image, such as redness, swelling, irritation and black tissue around the wound.
Fig. 1 Stages of foot ulcers
Fig. 2 Infected foot ulcer
Fig. 3 Ischaemic foot ulcer
Figure 3 shows an ischaemic foot ulcer, where the infection has become more severe. An ulcer is considered serious if black tissue surrounds the wound; this happens because of poor blood circulation to the feet.
2 Related Work

Patel et al. [4] focused on the analysis of classification techniques used in medical imaging. They used techniques such as the k-nearest neighbour classifier, fuzzy logic, Bayesian networks, neural networks and support vector machines for wound region classification, with results evaluated using parameters such as accuracy; classification and segmentation are performed based on colour, texture and other features. Mao et al. [5] presented a wearable system for foot ulcer wound detection.
A temperature sensor with 0.25 °C accuracy is used, and a pair of Bluetooth devices transmit the data. Based on LabVIEW, a warning LED and an alarm are sent to the user interface. Sensors are fixed inside the wearable device, and components are selected on a circuit board. With this, wounds are detected before they get severe. Cui et al. [6] proposed a deep learning method for accurate wound region segmentation, in which processed input images are fed into a convolutional neural network that provides a probability map; the probability maps are post-processed and the wound region is identified. They introduced an automatic segmentation model consisting of preprocessing, CNN segmentation and post-processing stages. Illumination differences are reduced with colour constancy and white spot removal. A trained patch-based CNN and U-Net models create a probability map. In post-processing, two additional methods and a morphology method are used to refine the segmentation results, which are then used for measurement and feature extraction of the wound area, helping doctors in wound diagnosis and prognosis. Quinn et al. [7] presented an approach for temperature hotspot detection on the foot based on thermography. Captured data is processed by web-based services to detect higher-temperature areas on the foot. When verified on 10 participants, 60% of the results were correctly detected. The R-CNN identified not only the foot but also portions of the background, showing that its accuracy needs to improve. In the end, they decided to use a thermal camera, which provides both image resolution and the required temperature measurement. Shivwanshi et al. [8] discussed an overview of the development of near-infrared spectroscopy for estimating vascular parameters in the lower limbs. Tissue oxygenation parameters are monitored by the NIRS method, giving tissue oxygenation data at a particular area.
Peripheral vascular disease and tissue oxygenation changes are examined at the micro- and macro-circulatory levels. Cruz-Vega et al. [9] showed thermal image classification employing computational intelligence (CI)-based methods. This research automatically extracts the region of interest (ROI) with an evolutionary algorithm; in thermal images, extracting ROIs is difficult, and the main drawback is temperature distribution analysis. The ankle region is included, and the relevant temperature analysis is performed at the ulceration area. Huang et al. [10] developed a non-invasive optical system for monitoring blood circulation changes before and after Buerger's exercise. The experiment shows that different groups differ in haemoglobin concentration and oxygen saturation, and that with long-term rehabilitation through this exercise, the blood circulation of diabetic patients comes closer to that of healthy subjects. Rani et al. [11] investigated three clinical conditions related to wound healing based on the mean temperature of ulcers associated with vascular disease, kidney disease and heart disease. The thermal and RGB images of diabetic foot ulcers were observed in the first 2 weeks of ulceration, and the mean temperature change was found to be high in patients with chronic kidney disease. This application shows that the temperature observed from thermal images can be used as an assessment parameter. Pande et al. [12] electrospun a combination of PVA and agarose for wound-healing patches. The performance of a pure PVA electrospun mat is poor compared to the electrospun blend. It gives
a better vapour transmission rate and holds its physical structure when immersed in a phosphate solution. Torres et al. [13] presented a virtual platform, used on a website specialized in the diabetic foot, that supports data logging, medical diagnostic observations and therapeutic prescriptions related to patient medical history. Wang et al. [14] presented a systematic user-centred design approach. They developed a smart mobile system (SMS) for chronic wound care management that captures the requirements for caring for different wounds. The system applies a design approach with user inputs and ensures that it functions in clinical practice. Rajala et al. [15] produced an in-shoe plantar pressure sensor made of polyvinylidene fluoride (PVDF) with evaporated copper electrodes. Their goal was a lightweight matrix sensor in which the location and size of the measuring sites are considered; eight measuring locations were chosen. The sensor can measure piezoelectric sensitivity and plantar pressure, with obtained values varying from 58 to 486 kPa, comparable with other reported values. The PVDF used here is suitable for measuring plantar pressure, including in daily-life conditions, and the insole sensor developed here is used to prevent pressure ulcers. Vali et al. [16] proposed a Chan-Vese algorithm for estimating diabetic foot ulcers. Body temperature is an important criterion, as the diabetic foot is assessed according to body temperature; normally, the temperature difference in the foot does not vary by more than 2.2 °C. The Chan-Vese algorithm is used for image segmentation, where the ulcers are identified and segmented. Sivayamini et al. [17] presented an analysis of foot ulcers in which cuckoo search optimization is used for optimizing the features and detecting the ulcer in thermography images. These image-processing techniques are used to detect wounds.
They also used the cuckoo search algorithm to determine how efficient it is in wound detection and proposed a novel optimization methodology for detecting diabetic foot ulcers using infrared images. Rani et al. [18] introduced a new approach to quantifying wound contour irregularity and studied some associated medical conditions. It shows that a change in wound contour from the first to the fourth week is closely related to heart disease affecting the healing of DFUs; through this, healing and non-healing wounds can be distinguished. Bennett et al. [2] built a workflow to observe adult patients in hospitals at risk of getting foot ulcers by monitoring the thermal images of patients who reported pain in the right foot. The thermal images of the patient's left and right feet are subjected to image processing to remove noise and increase contrast, and a mean ROI is calculated over time between the left and right heels. Amin et al. [19] used a convolutional neural network and YOLOv2-DFU models for the classification and localization of infection. Features extracted through classification are supplied to several classifiers, such as KNN, Softmax, Ensemble, DT and NB, to get the best results; gradient-weighted class activation mapping is then used to better understand the high-level infected region. Wang et al. [1] presented a shoe system for monitoring plantar pressure in people with diabetes. It contains a pressure sensor array that monitors the real-time plantar pressure and displays it on mobile phones. The sensor uses copper and carbon black as the electrode and conductive filler, respectively. These devices are fixed in shoes, and information is transmitted over a wireless Bluetooth network. Goyal et al. [20] proposed a convolutional neural network, DFUNet,
to distinguish between normal skin and DFU. DFUNet achieves high accuracy in detecting ulcers in foot images; this technique is innovative for evaluating DFU and for medical treatment. Wang et al. [21] reported on the design and performance of an inductive force sensor for measuring plantar pressure and shear load. A 3D FE model was developed and used for the sensor design parameters, and the reported tri-axis force sensor is used in a shoe load-sensing node to assess people at DFU risk. Reddy et al. [22, 23] worked on predicting diabetes on readmission of patients to the hospital. Using data mining schemes, they detected whether a patient has diabetes with a voting strategy [24], weighted conglomeration [25] and correlated ailments [26]. Using various features, they predicted multiple ailments [27] such as retinopathy [28], gestational diabetes [29, 30], renal fault [31], myocardial infarction [32], foot ulcers [33], neuropathy [34] and liver metastasis [35]. Using NELM-PSO, they predicted whether a patient has type 2 diabetes [36]. Kularathne et al. [37] developed a smart shoe to monitor and help prevent diabetic foot ulcers. It is a mobile-based plug-and-play device fixed to any shoe that manages foot ulcers by recording the patient's temperature, humidity, weight and step count through a mobile application, with which the increasing risk of foot ulcers can be identified. You et al. [38] presented a complete wearable system that includes a data collection circuit, a wireless transmission circuit and data analysis. The pressure sensor collects the plantar pressure data, and analog signals are sent to the microcontroller. From the observed data, a heat map of plantar pressure is displayed, with which the user or medical staff can know whether a diabetic patient is likely to develop an ulcer. Robles et al.
[39] proposed a system for regular monitoring of plantar pressure during walking. It is a two-part insole instrument with eight piezo-resistive force sensors placed in areas of high pressure while walking. The information is transmitted through a wireless device to a graphical interface for accurate visualization, analysis and storage. Manoharan and Satish [40] recommended a system for patient diet using K-Clique and deep learning classifiers. This system helps suggest food according to patients' health conditions, based on their sugar level, fat, age, protein, cholesterol, blood sugar, etc. The developed system, using a gated recurrent network and K-Clique, was compared with machine learning models such as naive Bayes, logistic regression, MLP and RNN for accuracy and exactness; the proposed model achieved higher accuracy and precision than its counterparts.
3 Methodology

3.1 Objectives

1. To find the stage of the ulcer by implementing deep learning methods. Here, VGG16 is used as the proposed model; ResNet, DenseNet and MobileNet are considered for comparison with the VGG16 model.
2. The four algorithms are assessed using evaluation metrics such as accuracy, precision, recall, F1-score, Jaccard index, error rate and AUC-ROC curves; from those results, VGG16 produced the best results.
3.2 Data Set

The data set consists of 1029 foot ulcer images of different types, taken from www.kaggle.com; it contains original images collected from a medical centre. 80% of the data set is used for training and 20% for testing. The data set has three class labels: infection, ischaemia and normal. Data preprocessing is performed to avoid and reduce noise in the data set; in this process, any missing values are identified, and feature scaling is then performed to bring the values onto the same scale. Figure 4 shows the entire structure of the foot ulcer analysis. The data set of foot ulcer images is loaded and preprocessed, then split into training and testing data, where 80% of the data set is used for training and 20% for testing. Each chosen algorithm is trained on the training data and tested with the help of the test data; the results obtained for the different models are compared to determine the best model.
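The 80/20 split described above can be sketched as a small deterministic helper. This is a minimal illustration, not the authors' actual pipeline; the file names are hypothetical placeholders for the 1029 Kaggle images.

```python
import random

def split_dataset(image_paths, train_frac=0.8, seed=42):
    """Shuffle deterministically and split image paths into train/test sets."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)   # fixed seed so the split is reproducible
    cut = int(len(paths) * train_frac)
    return paths[:cut], paths[cut:]

# Hypothetical file names standing in for the 1029 data set images.
images = [f"img_{i:04d}.png" for i in range(1029)]
train, test = split_dataset(images)
print(len(train), len(test))  # 823 206
```

Shuffling before splitting matters here because the images are grouped by class label (infection, ischaemia, normal); an unshuffled split could leave a class entirely out of the test set.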
3.3 Algorithms Used

A. ResNet

Solving a complex problem requires extra layers to be added to a deep neural network to improve performance and accuracy; the intuition behind adding more layers is that they help learn more complex features. For example, in image recognition, each layer performs different operations: the first layer detects edges, the second identifies textures, the third detects objects, and so on. Figure 5 shows the entire structure of ResNet. ResNet contains skip connections that solve the vanishing gradient problem in deep neural networks by providing shortcut paths through which the gradient can flow.
Fig. 4 Foot ulcer analysis
These connections also help by allowing the model to learn identity functions, so that higher layers perform at least as well as lower ones. In ResNet, the input foot ulcer image should be of size 224 × 224; if the image size is 512 × 228, this is solved by resizing the image. A bounding box is needed if it is required to detect whether the object is infected, ischaemic or normal. The ResNet architecture has four stages, each performing different operations. Stage one has three residual blocks with three layers each, and the kernel size is optimized. When proceeding from one stage to the next, the channel width is doubled and the input size is halved. Here, the 3D image is expanded to a 4D tensor using the reshape function; class labels are then predicted and the final tag is identified.

B. DenseNet

DenseNet is a neural network used for visual object recognition, and it is best for recognizing dense and highly disordered images. DenseNet is used to overcome the problem of information vanishing before reaching its destination, which is caused by the long path between the input layer and the output layer.
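The two shortcut styles just described, ResNet's additive identity connection and DenseNet's reuse of all earlier outputs, can be sketched in a few lines of NumPy. This is a toy illustration: the `layer` function is a hypothetical stand-in for a real convolutional block, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8))        # a feature vector for one sample
W = rng.standard_normal((8, 8)) * 0.1  # stand-in for a learned layer's weights

def layer(inp):
    """A toy layer F(x); a real block would be convolutions plus batch norm."""
    return np.maximum(inp @ W, 0.0)    # ReLU activation

# ResNet: the input is ADDED back, so the gradient can flow through "+ x".
residual_out = layer(x) + x

# DenseNet: earlier outputs are CONCATENATED, so later layers see all of them.
dense_out = np.concatenate([x, layer(x)], axis=1)

print(residual_out.shape)  # (1, 8)  -- same width as the input
print(dense_out.shape)     # (1, 16) -- input features carried forward
```

The shapes show the practical difference: addition keeps the feature width fixed, while concatenation grows it, which is why DenseNet controls the per-layer growth rate.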
Fig. 5 Architecture for ResNet
Suppose there are N layers; in a normal network there are N connections, whereas in DenseNet there are N(N + 1)/2 connections. With this, DenseNet has fewer parameters than comparable networks, so models with more than 100 layers can be trained easily. If the third layer is considered, it takes not only the output of the second layer but also the outputs of all previous layers as input, and every layer adds only a limited number of parameters. Here, the foot ulcer data set of 1029 images is used. Because there are many images in the data set, a vanishing-information problem can occur, where some information from the foot ulcer images is lost; the DenseNet model is used to overcome this problem. As the name suggests, DenseNet is a densely connected network in which every layer is connected to its successive layers. Between layers, it contains a convolution layer and a pooling layer, after which the pixel size of the foot ulcer image and the number of parameters are reduced. The final layer's prediction shows whether the image is infected, ischaemic or normal.

C. MobileNet

MobileNet is a model designed for use in mobile applications. In this model, depth-wise separable convolutions are used to factor the kernels into smaller kernels, which reduces the number of parameters in the network compared with other networks. MobileNet is an open-source Google model class that gives a good starting point for classifier training. First, the requirements in the notebook must be imported, and then the foot ulcer image is loaded. To load the ulcer image, the foot ulcer image data set is added to a folder, and then its path should be passed to
the corresponding variable. Similar features of ulcer images are identified and kept in a single folder. When using the MobileNet model, a pre-trained model is used to test whether the ulcer image is infected, ischaemic or normal. To display the predictions, they first need to be decoded; net utils can be used to decode the foot ulcer predictions.

D. VGG16

VGG16 is an algorithm used for detecting and classifying objects. It can classify images into 1000 different categories with high accuracy. It is a 16-layer architecture which classifies images into different categories. A pre-trained version of the network can be loaded that was trained on more than a million images. While preparing the data, the input to the ConvNet is a 224 × 224 RGB image; subtracting the mean RGB value, calculated on the training set, from each pixel is the only preprocessing done here. The foot ulcer images are passed through the convolution layers, where filters with a very small receptive field of 3 × 3 (the size of the input region that produces a feature) are used. First, the VGG model for image classification is imported. In the second step, a sample foot ulcer image from the Kaggle data set is loaded in Python Imaging Library (PIL) form; the colour model is RGB, and the target size of the image is 224 × 224. The third step is to make the foot ulcer image size suitable for the VGG16 input size. The fourth step is to predict whether the foot is infected, ischaemic or normal: the only preprocessing is mean centring, and finally a probability distribution over the class labels is produced, from which the prediction is made. The entire structure of VGG16 is shown in Fig. 6.

Algorithm Steps:

Step 1: Start.
Step 2: Select the data set of foot ulcer images.
Step 3: Import libraries.
Step 4: Split the foot ulcer image data into training and testing sets.
Step 5: Apply the VGG16 model and add layers (Conv2D, max pooling, flatten, activation, dense).
Step 6: Generate the prediction values for the foot ulcer image testing data.
Step 7: Compile the model and generate its performance based on the evaluation metrics.
Step 8: Stop.
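The VGG16-style preprocessing described in this section (resize to 224 × 224, subtract the mean RGB, add a batch dimension) can be sketched with NumPy alone. This is a hedged illustration: `resize_nearest` is a simplified stand-in for PIL's resize, and the mean values below are the standard ImageNet ones, an assumption since the paper computes the mean on its own training set.

```python
import numpy as np

# ImageNet mean RGB values used by the original VGG16 preprocessing (assumed).
VGG_MEAN_RGB = np.array([123.68, 116.779, 103.939])

def resize_nearest(img, size=224):
    """Nearest-neighbour resize to size x size (stand-in for PIL's resize)."""
    h, w, _ = img.shape
    rows = np.arange(size) * h // size   # source row index for each output row
    cols = np.arange(size) * w // size   # source column index for each output column
    return img[rows][:, cols]

def preprocess(img):
    """Resize to 224 x 224, subtract the mean RGB, add a batch dimension."""
    x = resize_nearest(img.astype(np.float64))
    x -= VGG_MEAN_RGB                    # mean centring: the only preprocessing step
    return x[np.newaxis]                 # shape becomes (1, 224, 224, 3)

# A random array standing in for one 512 x 384 foot ulcer photograph.
fake_ulcer_image = np.random.randint(0, 256, (512, 384, 3))
batch = preprocess(fake_ulcer_image)
print(batch.shape)  # (1, 224, 224, 3)
```

In practice the Keras VGG16 application ships an equivalent `preprocess_input` helper; the point here is only to make the resize-and-centre steps concrete.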
Fig. 6 VGG16 architecture
4 Result Analysis

Results are obtained for evaluation metrics such as accuracy, precision, recall, F1-score, Jaccard index, error rate and AUC-ROC curves. Table 1 defines these evaluation metrics.
4.1 Evaluation Metrics Table 1 illustrates seven metrics used to evaluate ResNet, VGG16, DenseNet and MobileNet methods. Table 1 Metrics to be evaluated
S. No. | Metrics | Equation/Formulae
1 | Accuracy | Ac = (TP + TN) / (TP + TN + FP + FN)
2 | Precision | Pre = TP / (TP + FP)
3 | Recall | Rec = TP / (TP + FN)
4 | F1-Score | FSco = TP / (TP + (FP + FN)/2)
5 | Jaccard Index | J(A, B) = |A ∩ B| / |A ∪ B|
6 | Error Rate | ER = (FP + FN) / (P + N)
7 | AUC-ROC Curve | ARcrv = (1 + TPR − FPR) / 2
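The formulas in Table 1 can be implemented directly from confusion-matrix counts. The counts in the example below are hypothetical, and the AUC line uses the single-point approximation (1 + TPR − FPR)/2 from the table rather than a full curve integration.

```python
def metrics(tp, tn, fp, fn):
    """Compute the seven Table 1 metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)                 # recall is the TPR
    f1 = tp / (tp + 0.5 * (fp + fn))
    jac = tp / (tp + fp + fn)            # |A ∩ B| / |A ∪ B| on the positive class
    err = (fp + fn) / total
    fpr = fp / (fp + tn)
    auc = (1 + rec - fpr) / 2            # single-point AUC approximation
    return acc, pre, rec, f1, jac, err, auc

# Hypothetical example: 90 TP, 85 TN, 5 FP, 20 FN.
acc, pre, rec, f1, jac, err, auc = metrics(90, 85, 5, 20)
print(round(acc, 4), round(err, 4))  # 0.875 0.125
```

Note that accuracy and error rate are complements (they sum to 1), which is visible in both the formulas and the output.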
Table 2 Metrics obtained for various models

Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | Jaccard Index (%) | Error Rate (%) | AUC-ROC Curve (%)
ResNet | 33.98 | 11.32 | 33.33 | 16.90 | 11.54 | 66.02 | 50.00
VGG16 | 99.51 | 99.46 | 99.52 | 99.48 | 99.03 | 0.49 | 50.08
DenseNet | 85.92 | 89.37 | 84.25 | 83.89 | 75.00 | 14.08 | 27.60
MobileNet | 33.98 | 11.32 | 33.33 | 16.90 | 11.54 | 66.02 | 50.00
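As a quick sanity check, the accuracy column of Table 2 can be loaded into a dictionary and the best-performing model selected programmatically; this is just a restatement of the table, not part of the authors' pipeline.

```python
# Accuracy values (in %) copied from Table 2; higher is better.
results = {
    "ResNet": 33.98,
    "VGG16": 99.51,
    "DenseNet": 85.92,
    "MobileNet": 33.98,
}

# Pick the model with the highest accuracy.
best_model = max(results, key=results.get)
print(best_model, results[best_model])  # VGG16 99.51
```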
4.2 Results Obtained

Table 2 shows the seven evaluation metrics for the ResNet, VGG16, DenseNet and MobileNet methods. With the help of the confusion matrix, the evaluation metrics are computed: accuracy, precision, recall, F1-score, Jaccard index, error rate and AUC-ROC curve. Figures 7, 8, 9 and 10 are the AUC-ROC curves for ResNet, VGG16, DenseNet and MobileNet. A model with a higher AUC is better at correctly predicting the 0 and 1 classes; increasing the AUC means patients having the condition can be better distinguished from those who do not. ROC curves are drawn with TPR on the y-axis and FPR on the x-axis. Figure 11 presents the accuracy obtained for the various models, and Fig. 12 illustrates the comparison of the evaluation metrics precision, recall, F1-score, Jaccard index and error rate for the four algorithms. From Fig. 12, it is concluded that VGG16 has generated the highest values compared to the remaining three algorithms.
Fig. 7 ResNet AUC-ROC curve
Fig. 8 VGG16 AUC-ROC curve
Fig. 9 DenseNet AUC-ROC curve
5 Conclusion

This work demonstrated how foot ulcers and their stages can be predicted using an effective algorithm on the foot ulcer image data set. VGG16 is proposed in this work as it can classify images into 1000 different categories with high accuracy. Four algorithms, namely ResNet, DenseNet, VGG16 and MobileNet,
Fig. 10 MobileNet AUC-ROC curve
Fig. 11 Graph for accuracy
are considered, including the proposed one, for implementation and are evaluated using the evaluation metrics. VGG16 achieved better results than its counterparts, with accuracy, precision, recall, F1-score, Jaccard index, error rate and AUC-ROC values of 99.51%, 99.46%, 99.52%, 99.49%, 99.04%, 0.49% and 50.08%, respectively. Based on the obtained results, VGG16 is an effective method for predicting foot ulcers. The final output of this work is the prediction of a foot ulcer image as infection, ischaemia or normal with high accuracy.
Fig. 12 Performance graph of algorithms
References 1. Wang D, Ouyang J, Zhou P, Yan J, Shu L, Xu X (2020) A novel low-cost wireless footwear system for monitoring diabetic foot patients. IEEE Trans Biomed Circuits Syst 15(1):43–54 2. Bennett SL, Goubran R, Knoefel F (2017) Long term monitoring of a pressure ulcer risk patient using thermal images. In: 2017 39th annual international conference of the IEEE engineering in medicine and biology society (EMBC). IEEE, pp 1461–1464 3. Gupta P, Gaur N, Tripathi R, Goyal M, Mundra A (2020) IoT and cloud based healthcare solution for diabetic foot ulcer. In: 2020 sixth international conference on parallel, distributed and grid computing (PDGC). IEEE, pp 197–201 4. Patel S, Patel R, Desai D (2017) Diabetic foot ulcer wound tissue detection and classification. In: 2017 international conference on innovations in information, embedded and communication systems (ICIIECS). IEEE, pp 1–5 5. Mao A, Zahid A, Ur-Rehman M, Imran MA, Abbasi QH (2018) Detection of pressure and heat in a compressive orthotic for diabetes prevention using nanotechnology. In: 2018 IEEE international RF and microwave conference (RFM). IEEE, pp 211–214 6. Cui C, Thurnhofer-Hemsi K, Soroushmehr R, Mishra A, Gryak J, Domínguez E, Najarian K, López-Rubio E (2019) Diabetic wound segmentation using convolutional neural networks. In: 2019 41st annual international conference of the IEEE engineering in medicine and biology society (EMBC). IEEE, pp 1002–1005 7. Quinn S, Saunders C, Cleland I, Nugent C, Garcia-Constantino M, Cundell J, Madill G, Morrison G (2019) A thermal imaging solution for early detection of pre-ulcerative diabetic hotspots. In: 2019 41st annual international conference of the IEEE engineering in medicine and biology society (EMBC). IEEE, pp 1737–1740 8. Shivwanshi RR, Seshadri NG, Periyasamy R (2018) A review of present and futuristic development of near infrared spectroscopy system in the assessment of diabetic foot risk. 
In: 2018 fourth international conference on biosignals, images and instrumentation (ICBSII). IEEE, pp 206–212 9. Cruz-Vega I, Peregrina-Barreto H, de Jesus Rangel-Magdaleno J, Ramirez-Cortes JM (2019) A comparison of intelligent classifiers of thermal patterns in diabetic foot. In: 2019 IEEE international instrumentation and measurement technology conference (I2MTC). IEEE, pp 1–6 10. Huang YK, Chang CC, Lin PX, Lin BS (2017) Quantitative evaluation of rehabilitation effect on peripheral circulation of diabetic foot. IEEE J Biomed Health Inform 22(4):1019–1025
Forecasting Diabetic Foot Ulcers Using Deep Learning Models
Artificial Intelligence-Based Chronic Kidney Disease Prediction—A Review

A. M. Amaresh and Meenakshi Sundaram A.
Abstract Chronic kidney disease (CKD), one of the leading causes of death worldwide, has a significant economic impact on healthcare systems and is considered a risk factor for cardiovascular disease. Early identification of CKD can save a person's life from heart attack. CKD is usually asymptomatic until the final stage if healthcare data is not continuously collected from patients. The severity of CKD varies with location, stage, age and gender. This study presents a detailed survey, compiled from high-quality journal repositories, of the processes of dataset collection, filtering, feature extraction and classification. The study concludes that CKD prediction can be performed using ultrasound images of the kidney as input, from which textual data are collected based on the thirteen to fourteen features specified by medical experts.

Keywords Chronic disease · Ultrasound image · Artificial intelligence
1 Introduction

End-stage renal disease (ESRD) has a high mortality rate, and cardiovascular disease is considered a potential health risk for 13–15% of the world's population [1, 2]. Cardiovascular disease (CVD) has been the primary cause of mortality in recent years; however, CKD contributes by accelerating various risks. Chronic kidney disease (CKD) is presently the world's 18th most common cause of mortality, up from the 19th position reported by the Global Burden of Disease Study (GBDS) in 1990 [3]. The effects of CKD on the regular functioning mechanism of the kidney are shown in Fig. 1.
A. M. Amaresh (B) · A. Meenakshi Sundaram
Research Scholar, School of CSE, REVA University, Bangalore, Karnataka, India
e-mail: [email protected]
A. Meenakshi Sundaram
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_17
Fig. 1 Mechanism of kidney during CKD [4]
The glomerular filtration rate (GFR) is the test used to analyse the risk of CKD irrespective of gender and age. The risk of CKD with respect to GFR is shown in Fig. 2 [5]. The general block diagram used for the classification of CKD is shown in Fig. 3, and the CKD-affected kidney regions are shown in Fig. 4.
Fig. 2 Classification of the CKD [5]
Fig. 3 General flow diagram for detection of the CKD
Fig. 4 a Autosomal Dominant Polycystic Kidney Disease (ADPKD) kidney and liver cyst; b ADPKD kidney, liver and spleen; c Renal cyst in non-ADPKD [28]
Whenever a patient suffers from CKD, kidney function gradually declines, resulting in symptoms such as vomiting, nausea and appetite loss. The impaired filtration process also affects the urinary tract. The GFR, i.e. the filtration rate of water, is directly proportional to the proper functioning of the kidney; as per clinical research, a filtration rate of less than 60 ml per minute indicates CKD [30]. Recent research works were retrieved using keywords such as "Chronic dialysis", "Acute renal failure", "Chronic kidney disease", "Machine learning prediction on CKD", "Neural network" and "Deep neural network" from the Web of Science, Scopus and ScienceDirect repositories, and the works matching the scope of the proposed review study were selected.
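The GFR thresholds above can be made concrete with a small sketch. The staging bands below follow the widely used KDIGO classification (general clinical knowledge, not taken from this paper), and the function names are illustrative:

```python
def ckd_stage(gfr):
    """Map an estimated GFR (mL/min/1.73 m^2) to a CKD stage label.

    Bands follow the widely used KDIGO classification; a GFR
    persistently below 60 mL/min indicates CKD.
    """
    if gfr >= 90:
        return "G1 (normal or high)"
    elif gfr >= 60:
        return "G2 (mildly decreased)"
    elif gfr >= 45:
        return "G3a (mildly to moderately decreased)"
    elif gfr >= 30:
        return "G3b (moderately to severely decreased)"
    elif gfr >= 15:
        return "G4 (severely decreased)"
    else:
        return "G5 (kidney failure)"


def indicates_ckd(gfr):
    """Clinical threshold cited in the text: GFR below 60 mL/min."""
    return gfr < 60
```
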
2 Literature Review

This section provides a brief analysis of the research works carried out on predicting CKD. The section is segregated into three main phases:

1. Data Collection Phase
2. Pre-Processing Phase
3. Classification Phase

Data collection: This is the initial phase of every machine learning application, since a dataset is required for training the model; the available datasets are described in Table 1.
Table 1 Details of available datasets for CKD

[6] Details: UCI dataset; 11 attributes are numeric and the rest are alphabetical (categorical). Attributes: 24. Patients: 400.
[7] Details: the authors developed their own dataset by consulting patients older than 40 from India, Singapore and China. Attributes: CT images. Patients: 1440.
[8] Details: ultrasound images captured from the patients of Keelung Chang Gung Memorial Hospital. Attributes: 607 ultrasound images. Patients: 280.
[9] Details: the Xenopus kidney embryos dataset [10] is employed for training the model. Attributes: 208 images. Patients: 116.
[11] Details: ultrasound images of the patients are collected and HOG features are extracted. Attributes: 70 images. Patients: 35.
Pre-Processing: If the input is an image, the pre-processing phase performs noise reduction. If the input includes text data, pre-processing normalizes it; the different pre-processing methods are presented in Table 2.

Classification: Once feature extraction for each image in the dataset is completed, the features and labels are forwarded to the classifier model in order to determine the presence or absence of CKD. Table 3 shows the classification techniques and feature extraction methods used by researchers to identify CKD. Recently, researchers have developed neural network-based prediction using the CNN architecture [31]. The learning process is based on the thirteen features of the respective patients (normal and CKD); the ANN and CNN models learn as the features flow through the network.

Table 2 Pre-processing methods

[12] Method: replace missing values with the mean of the neighbouring attributes. Data type: text. Classifiers: SVM, KNN, decision tree and random forest.
[13] Method: impute missing values from the neighbouring attributes. Data type: text. Classifier: random forest.
[14] Method: remove the entire column if one attribute is missing. Data type: text. Classifier: Naïve Bayes.
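The missing-value strategies listed in Table 2, mean imputation versus dropping incomplete columns, can be sketched as follows; the tabular record layout is hypothetical:

```python
from statistics import mean


def impute_mean(rows):
    """Replace None entries in each column with that column's mean
    over the observed values (the mean-imputation strategy of [12, 13])."""
    cols = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in cols]
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


def drop_incomplete_columns(rows):
    """Drop every column that contains a missing value (the strategy of [14])."""
    keep = [j for j, col in enumerate(zip(*rows)) if None not in col]
    return [[row[j] for j in keep] for row in rows]
```

Mean imputation keeps all attributes at the cost of biasing the filled cells toward the column average, while column dropping discards information but introduces no synthetic values.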
Table 3 Classification algorithms and feature extraction for prediction of CKD

[15] Dataset: 798 ultrasound kidney images of Taiwan patients; five stages of kidney disease are predicted. Pre-processing: median filter; contours are marked manually and used for segmentation. Methodology: features include texture, standard deviation, area and brightness; using an SVM, an accuracy of about 79% has been obtained. Limitations: for the testing phase, more CKD methods should be considered to enhance the accuracy.

[16] Dataset: UCI dataset composed of 400 patients and 25 features. Pre-processing: out of 25 features, only 15 are retained by deleting the columns with missing values. Methodology: the classifiers Naïve Bayes (95%), decision tree (99%), SVM (97.75%) and KNN (95%) are applied on the processed dataset. Limitations: none reported.

[17] Dataset: UCI dataset with 24 attributes. Pre-processing: missing values are replaced with means to obtain the processed dataset. Methodology: six classifiers are employed: SVM, MLP, logistic regression, Naïve Bayes, CHIRP and AdaBoost. Limitations: performance could be further checked by applying deep learning to the same dataset.

[18] Dataset: UCI dataset with 400 patients. Pre-processing: missing values are replaced with the mean of the neighbouring attributes. Methodology: an SVM classifier is employed and achieves an accuracy of 97%. Limitations: the performance has not been tested for an ANN.

[19] Dataset: UCI dataset. Pre-processing: the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation is employed to calculate the different CKD stages. Methodology: random forest, J48 and decision tree are employed for classifying the different CKD stages; average accuracy about 85%. Limitations: missing values need to be handled more efficiently.

[20] Dataset: own dataset of 218 women aged 18–60 years; patients older than 60 years were considered when bone density was low. Pre-processing: manually entered data, hence no missing data were found. Methodology: hip and spine values are considered to determine bone density in CKD-affected patients; statistical methods are employed to detect CKD, mainly the p-value computed on the hip and spine values and whether the glomerular filtration rate is below 60 mL/min. Limitations: none reported.

[21] Dataset: three datasets are employed (UCI, CKD and Kaggle). Pre-processing: the attributes present in the dataset are converted into binary values, i.e. −1 and +1. Methodology: feature selection uses Quest criteria and a search-strategy method; the first analyses the results of the functional subsets and the second searches the feature sets. Classifiers: Naïve Bayes, decision tree, KNN, ANN and SVM; accuracy about 85%. Limitations: the model is not tested on validation data.

[22] Dataset: urine samples and MRI images of 241 patients. Pre-processing: the edges of the kidney are marked. Methodology: samples are collected from patients at a regular interval of 9.3 years (11.3 years for middle-aged persons; 23 persons were sampled every three years). The kidney length is measured using statistical methods such as MR and htTKV, and based on the value obtained the model decides on the presence of CKD. Limitations: if the detected kidney length is erroneous, the entire model fails to provide the desired result.

[23] Dataset: 400 patients aged between 40 and 79 years, prepared by considering the effect of cardiovascular disease in the last three years. Pre-processing: manually entered data with no missing values. Methodology: the result shows that a patient with CKD has a 60% chance of heart attack. Limitations: few cardiac attacks are recorded, hence the number of predictions will vary in real time.

[24] Dataset: urinary samples of 329 patients aged between 20 and 80 with a glomerular filtration rate of less than 60 ml/minute. Pre-processing: none reported. Methodology: the urine samples are measured with an open-ended coaxial slim probe at 37 °C over a 50 GHz cable; before measurement, the urine is stirred, refrigerated and checked so that no air bubble forms on the sample. Dielectric values over the 1–50 GHz frequency range are recorded and fed to an SVM. Limitations: a single spot urine sample cannot differentiate diabetic kidney disease from non-diabetic kidney disease.

[25] Dataset: own dataset prepared by placing pregnant women under a sonographic study; patients are selected every two months based on the ultrasound data. Pre-processing: none reported. Methodology: the sonographic placental study includes (1) placental length, width and thickness, (2) placental cord insertion site, (3) placental morphology, (4) number of vessels in the cord and (5) uterine artery Doppler; differentiation between preeclampsia and the normal condition affects child growth inside the womb. Limitations: sample collection was limited, and a potential difficulty was observed in differentiating preeclampsia from underlying CKD.

[26] Dataset: 75 infants and young children, of whom 16 suffered from the disease, collected to predict the risk of CKD. Pre-processing: none reported. Methodology: statistical methods are employed to find injury or dysfunction of the kidney at early ages, including calculation of the total renal parenchymal area. Limitations: renal echogenicity (RE) and corticomedullary differentiation could be included in the test.

[27] Dataset: ultrasound images of one hundred patients. Pre-processing: image data type is used. Methodology: an ANN is employed to predict kidney disease, resulting in an accuracy of 99%. Limitations: different segmentation and classification mechanisms carried out with other algorithms are essential.

[28] Dataset: raw CT images of 110 subjects obtained from Chang Gung Memorial Hospital. Pre-processing: (1) slice selection, (2) JPEG conversion, (3) image enhancement and (4) automatic cropping. Methodology: manual contour segmentation is performed to extract the area covered by the kidneys. Limitations: pervasiveness resulting from the contrast medium, which was used for the purpose of improving CT acquisition.

[29] Dataset: 526 MRI images. Pre-processing: vertical and horizontal flips with a scaling factor of 0.5. Methodology: the model is designed with a 224 × 224 input layer, ReLU, convolutional and pooling layers; the learning rate is 0.004 and the batch size is 24. A bounding box is placed on the kidneys. Limitations: even when the classification is made correctly, the bounding box is not accurately defined.

[25] Dataset: a text dataset of 4 attributes. Methodology: predicted the risk and the affected percentage; the disease is evaluated based on multiple criteria for every predicted risk.
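One surveyed work derives CKD stages with the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation. As a sketch, the 2009 CKD-EPI creatinine equation is shown below, with coefficients stated from general knowledge rather than from the paper (the race factor is omitted for simplicity):

```python
def ckd_epi_egfr(scr_mg_dl, age_years, female):
    """Estimate GFR (mL/min/1.73 m^2) with the 2009 CKD-EPI
    creatinine equation.

    scr_mg_dl: serum creatinine in mg/dL.
    kappa and alpha are the published sex-specific constants.
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    return egfr
```

The estimated GFR can then be mapped to a stage band, as the surveyed works do before classification.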
3 Conclusion and Discussion

This review study consolidates the information available for predicting chronic kidney disease. The evaluation focuses on dataset collection, which includes tabulated data with 24 attributes (often reduced for faster implementation) as well as CT, MRI and ultrasound images of the kidney. Different pre-processing approaches are used: histogram equalization and contrast enhancement for images, and imputation of the missing columns in the text datasets. Contour-based and fuzzy c-means methods are concluded to be efficient segmentation processes for extracting the region of interest. Further, feature extraction is considered the most significant phase, where researchers performed statistical analysis (which involves calculating spleen measurements, p-values or the length of the kidney), texture-based (pattern) and HOG-based selection procedures. Finally, both machine and deep learning-based classifiers are considered efficient for performing disease prediction. The future scope is to extend the survey by considering additional parameters and Artificial Intelligence (AI) techniques for the detection of CKD in pregnant women, infants and children.
References

1. Astor BC, Matsushita K, Gansevoort RT et al (2011) Chronic kidney disease prognosis consortium. Lower estimated glomerular filtration rate and higher albuminuria are associated with mortality and end-stage renal disease: a collaborative meta-analysis of kidney disease population cohorts. Kidney Int 79(12):1331–1340. https://doi.org/10.1038/ki.2010.550
2. Bramlage P, Lanzinger S, van Mark G et al (2019) Patient and disease characteristics of type-2 diabetes patients with or without chronic kidney disease: an analysis of the German DPV and DIVE databases. Cardiovasc Diabetol 18(1):33
3. Jha V, Garcia-Garcia G, Iseki K et al (2013) Chronic kidney disease: global dimension and perspectives. Lancet 382(9888):260–272
4. Zoccali C et al (2017) The systemic nature of CKD. Nat Rev Nephrol 13(6):344–358
5. Romagnani P et al (2017) Chronic kidney disease. Nat Rev Dis Prim 3(1):1–24
6. https://archive.ics.uci.edu/ml/datasets/chronic_kidney_disease
7. Sabanayagam C, Xu D, Ting DSW et al (2020) A deep learning algorithm to detect chronic kidney disease from retinal photographs in community-based populations. Lancet Digital Health. https://doi.org/10.1016/S2589-7500(20)30063-7
8. Liao Y-T et al (2021) Data augmentation based on generative adversarial networks to improve stage classification of chronic kidney disease. Appl Sci 12(1):352
9. Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O (2016) U-Net: learning dense volumetric segmentation from sparse annotation
10. Nieuwkoop P, Faber J (1994) Normal table of Xenopus laevis (Daudin). Garland, New York
11. Hippisley-Cox J, Coupland C, Vinogradova Y, Robson J, Minhas R, Sheikh A, Brindle P (2008) Predicting cardiovascular risk in England and Wales: prospective derivation and validation of QRISK2. BMJ 336(7659):1475–1482. https://doi.org/10.1136/bmj.39609.449676.25
12. Senan EM et al (2021) Diagnosis of chronic kidney disease using effective classification algorithms and recursive feature elimination techniques. J Healthcare Eng
13. Samet S, Ridda Laouar M, Bendib I (2021) Predicting and staging chronic kidney disease using optimized random forest algorithm. In: 2021 international conference on information systems and advanced technologies (ICISAT). IEEE
14. Wang Z et al (2018) Machine learning-based prediction system for chronic kidney disease using associative classification technique. Int J Eng Technol 7(4.36):1161–1167
15. Chen C-J et al (2020) Prediction of chronic kidney disease stages by renal ultrasound imaging. Enterp Inf Syst 14(2):178–195
16. Tazin N, Anzarus Sabab S, Chowdhury MT (2016) Diagnosis of chronic kidney disease using effective classification and feature selection technique. In: 2016 international conference on medical engineering, health informatics and technology (MediTec). IEEE
17. Khan B et al (2020) An empirical evaluation of machine learning techniques for chronic kidney disease prophecy. IEEE Access 8:55012–55022
18. Almansour NA et al (2019) Neural network and support vector machine for the prediction of chronic kidney disease: a comparative study. Comput Biol Med 109:101–111
19. Ilyas H et al (2021) Chronic kidney disease diagnosis using decision tree algorithms. BMC Nephrol 22(1):1–11
20. Gómez-Islas VE et al (2020) Evaluation of bone densitometry by dual-energy x-ray absorptiometry as a fracture prediction tool in women with chronic kidney disease. Bone Rep 13:100298
21. Shanthakumari AS, Jayakarthik R (2021) Utilizing support vector machines for predictive analytics in chronic kidney diseases. Mater Today: Proc
22. Bhutani H et al (2015) A comparison of ultrasound and magnetic resonance imaging shows that kidney length predicts chronic kidney disease in autosomal dominant polycystic kidney disease. Kidney Int 88(1):146–151
23. Mora SC et al (2017) Cardiovascular risk prediction in chronic kidney disease patients. Nefrología (English Edition) 37(3):293–300
24. Mun PS et al (2016) Prediction of chronic kidney disease using urinary dielectric properties and support vector machine. J Microw Power Electromagn Energy 50(3):201–213
25. Moloney A et al (2020) The predictive value of sonographic placental markers for adverse pregnancy outcome in women with chronic kidney disease. Pregnancy Hypertens 20:27–35
26. Odeh R et al (2016) Predicting risk of chronic kidney disease in infants and young children with posterior urethral valves at time of diagnosis: objective analysis of initial ultrasound kidney characteristics and validation of parenchyma area as forecasters of renal reserve. J Urol 196:862–868
27. Nithya A, Appathurai A, Venkatadri N, Ramji D, Palagan CA (2020) Kidney disease detection and segmentation using artificial neural network and multi-kernel k-means clustering for ultrasound images. Measurement 149:106952
28. Sankhe A, Joshi AR (2014) Multidetector CT in renal tuberculosis. Curr Radiol Rep 2(11):1–11
29. Onthoni DD et al (2020) Deep learning assisted localization of polycystic kidney on contrast-enhanced CT images. Diagnostics 10(12):1113
30. Wetzels JFM et al (2007) Age- and gender-specific reference values of estimated GFR in Caucasians: the Nijmegen Biomedical Study. Kidney Int 72(5):632–637
31. Vijayakumar T (2019) Neural network analysis for tumor investigation and cancer prediction. J Electron 1(02):89–98
Smart Home Security System Using Facial Recognition

G. Puvaneswari, M. Ramya, R. Kalaivani, and S. Bavithra Ganesh
Abstract Security is a pre-eminent and pressing concern nowadays. People want to protect their homes from unauthorized intrusion and trespassers. The conventional lock-and-key system provides little security: keys can easily be duplicated or lost, and locks can be broken. The proposed automated smart security system is based on facial recognition. The system uses a low-cost, high-performance camera module to record a live stream, capture frames on command and send them to a cloud service. The smart home security system is placed in the doorway, and if the face of the person matches one of the already registered faces, the door unlocks. In case of a security threat or detection of the entry of an unauthorized person, the security system alerts the authorized persons by Gmail notification, by sending the image of the unauthorized person via a mobile application and by ringing the calling bell.

Keywords Home security · Surveillance · Facial recognition · Intrusion detection
1 Introduction

The Internet of Things can be viewed as a connection of physical devices that are accessed through the internet. A Thing is defined as a physical device identified with a unique identifier to transfer data over a network. The Internet of Things (IoT) is defined as the interconnection of sensors, software and other devices for exchanging data over the internet. Data from the sensors in IoT devices are collected and sent to a data management and analytics system via an IoT gateway. A smart home security system contains electronic components connected together that work together to protect a home.

G. Puvaneswari (B) · M. Ramya · R. Kalaivani · S. B. Ganesh
Coimbatore Institute of Technology, Coimbatore 641014, India
e-mail: [email protected]
M. Ramya
e-mail: [email protected]
R. Kalaivani
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_18

Home security
system implemented in a conventional way uses a lock and key. In the lock-and-key system, keys can be duplicated or lost, and the locks can be broken. Closed-circuit television systems provide a view of the areas surrounding a home, but the video footage obtained is useful in reviewing an event after it has occurred rather than during its occurrence. This approach is passive, because someone has to sit in front of the system and keep monitoring at all times to identify an intruder. This necessitates an active approach that takes action immediately whenever a security threat is detected.

A hierarchical network framework-based face recognition is proposed in [1]. This framework utilizes architectures pre-trained for face recognition and uses FaceNet for validation. To evaluate the performance of the proposed architecture, it is tested on a security door lock system. In [2], a convolutional neural network-based system is used to monitor people not wearing masks, together with cloud-based surveillance. A review of face recognition techniques is presented in [3]: principal component analysis (PCA), linear discriminant analysis (LDA), support vector machines (SVM) and the AdaBoost algorithm. PCA is used for dimensionality reduction, achieved by removing redundant information from the face data. LDA uses labelled data, so its dimensionality reduction is supervised; LDA requires high variance between different data groups and small variance within a group. SVM uses extracted face features to find the hyperplane that distinguishes different faces. AdaBoost algorithms integrate different classifiers to improve classification accuracy. Neural network and deep learning approaches extract the needed features automatically during training for classification. In [4], a generalized local median preserving projection approach is explained. This approach first transforms the samples into a lower-dimensional space and then solves the projection matrix. A feature extraction algorithm that uses neighbouring information within and between classes to recognize faces is proposed in [5]; the discriminating power of the algorithm is improved by combining Fisher criteria with graph embedding. A GSM and face recognition-based door lock system is presented in [6], in which a PIR sensor senses the presence of a person standing in front of a door and the person is permitted access only when a matching face is detected. De Light (DL) network and normalization network (N-net)-based face detection is explained in [7]: the DL network eliminates the effect of lighting, and the N-net extracts identity features to improve face recognition accuracy. A multitask convolutional neural network-based face recognition approach is proposed in [8], in which an automatic weighting scheme estimates pose, illumination and expression, and a convolutional neural network performs face recognition. The role of IoT in building smart systems [9] and smart cities, along with the technologies, IoT architectures and challenges faced, is addressed in [10–15]. IoT-based systems require sensors, actuators, connectors, data processing units and a user interface when developing devices for an application. Key challenges in building IoT-based systems, such as data security, energy requirements and management, data privacy, the amount of data collected, transmitted and stored, and data processing, explain the need for higher-performance mobile networks.
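The PCA-based dimensionality reduction reviewed above can be illustrated with a NumPy-only sketch; the eigenfaces-style projection and the nearest-neighbour matcher below are our illustration, not the method of any cited work:

```python
import numpy as np


def pca_fit(X, k):
    """Learn a k-dimensional PCA projection from row vectors
    (e.g. flattened face images): centre the data and keep the
    top-k right singular vectors as principal directions."""
    mu = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centred data
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]


def pca_project(X, mu, components):
    """Project data onto the learned low-dimensional subspace,
    discarding the redundant directions."""
    return (X - mu) @ components.T


def nearest_identity(probe, gallery, labels):
    """Match a projected probe vector to the closest projected
    gallery face and return its label."""
    dists = np.linalg.norm(gallery - probe, axis=1)
    return labels[int(np.argmin(dists))]
```

In an eigenfaces-style pipeline, the gallery of registered faces is projected once at enrolment time, and each captured frame is projected and matched against it.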
The objective of the proposed work is to develop an automated smart security system that opens the door automatically for authorized persons, using the face as the key for unlocking. In case of a security threat or detection of the entry of an unauthorized person, the security system alerts the authorized persons by Gmail notification, by sending the image of the unauthorized person via a mobile application and by ringing the calling bell. Section 2 explains the system components, connections and working, Sect. 3 presents the results obtained and Sect. 4 concludes the presented work.
2 Smart Home Security System

The main goal of this work is to develop an automatic door unlock system that allows users to unlock a door using face recognition, through a camera mounted on the door, and alerts the user with a message containing the intruder's image. This section explains the proposed system's block diagram, workflow and components.
2.1 Block Diagram of Smart Home Security System

The proposed smart home security system consists of three major components, namely a camera, a solenoid lock and a mobile application. Figure 1 shows the block diagram of the proposed smart home security system.
Fig. 1 Block diagram of smart home security system (blocks: ultrasonic sensor, camera, Arduino board with FIRMATA protocol, cloud, door unlock system, smart device with chatbot and mobile application, manual control for image capture and flash, calling bell)
An ultrasonic sensor is used for object detection and to calculate the distance between the door and the person standing in front of it. It consists of a transmitter and a receiver: the transmitter sends a signal that travels through the air and, on detection of an obstacle or object, bounces back to the receiving sensor. The distance is calculated from the travel time and the speed of the pulse. The camera module captures images, and a Histogram of Oriented Gradients (HOG) with SVM is used for face recognition. The extracted facial features are compared with the stored data for decision-making. For a matched face, the solenoid lock is enabled to unlock the door. If an unmatched face is found, the system alerts the persons inside the home through the calling bell and at the same time sends the image to the user's mobile device. The mobile application used here is Telegram. Section 2.3 gives a detailed description of the components used.
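The distance computation described above (travel time times pulse speed, halved for the round trip) can be sketched as follows; the microsecond echo-pulse interface is an assumption of this illustration, the speed of sound (~343 m/s at room temperature) is a standard value, and the 25 cm threshold matches Sect. 2.2:

```python
SPEED_OF_SOUND_CM_PER_US = 0.0343  # ~343 m/s at room temperature


def echo_distance_cm(echo_time_us):
    """One-way distance from an echo pulse width in microseconds.

    The pulse travels to the object and back, so the measured
    time covers the distance twice; halve it."""
    return echo_time_us * SPEED_OF_SOUND_CM_PER_US / 2.0


def person_at_door(echo_time_us, threshold_cm=25.0):
    """True when the measured distance falls below the 25 cm
    trigger used by the flow in Sect. 2.2."""
    return echo_distance_cm(echo_time_us) < threshold_cm
```
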
2.2 Flow Diagram

Figure 2 shows the flow diagram of the proposed system. Data is first acquired from the ultrasonic sensor to measure the distance between the door and the object or person standing in front of it. If the measured distance is less than 25 cm, the camera module is activated to acquire image data. An ESP32 camera module is used for the prototype of the proposed smart home security system. Image data obtained from the camera module is loaded into the cloud using the Telegram web server library TDLib. The image is processed using Histogram of Oriented Gradients (HOG) and a Support Vector Machine (SVM) for facial feature extraction and classification to identify the intended users. For this purpose, LDPlayer, an Android emulator, is used. The emulator enables a host system (personal computer) to behave like a guest system (mobile device); it treats the system hard disk as part of its gallery or file storage for image files. If the image contains the face of a registered person, the door is unlocked. If the image does not match any of the registered faces, the calling bell starts to sound. A message is sent to the user via Telegram in both cases, and the captured image is stored in the system memory for future reference as well as for further action.
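A toy illustration of the HOG step, not the paper's actual implementation: for one cell of grayscale values, gradient magnitudes are accumulated into unsigned orientation bins. In a real system a library such as dlib or scikit-image computes these descriptors over many cells, and an SVM classifies the concatenated feature vector:

```python
import math

def hog_cell_histogram(cell, bins=9):
    """Accumulate gradient magnitudes into orientation bins (0-180 degrees)
    for one cell of grayscale values, mimicking the HOG descriptor step.
    Illustrative only; border pixels are skipped for simplicity."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # horizontal gradient
            gy = cell[y + 1][x] - cell[y - 1][x]   # vertical gradient
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180  # unsigned orientation
            hist[int(ang // (180 / bins)) % bins] += mag
    return hist
```

A cell containing a vertical intensity edge produces purely horizontal gradients, so all the magnitude lands in the first (0 degree) bin.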
2.3 System Components

The prototype uses components such as the ESP32-CAM module, ultrasonic sensor, mobile device, personal computer, door unlock system, cloud server, Arduino board, and calling bell. Figures 3 and 4 show the pictorial representation of the ESP32 camera module. It contains a camera and GPIOs (shown in Fig. 4) to connect peripherals. It also contains a microSD card slot to store images or to serve images to clients. The ESP32-CAM can be powered from a 5 V or 3.3 V power source.
Smart Home Security System Using Facial Recognition
(Fig. 2: flow diagram — start; get data from the ultrasonic sensor; check the measured distance against the threshold before activating the camera.)

Thonny Python IDE
• Enter the program in the top pane, click File > Save As to save it, and click Run > Run Current Script to run the program.
• To run a program in Thonny, the code must first be saved to a file.
• Press F5 or click the Run icon.
• Alternatively, go to the menu bar and select Run > Run Current Script.
• Press CTRL + T or select Run > Run Current Script in Terminal to run the script in a terminal.
4 Hardware Description
• Raspberry Pi Pico
• Heart Rate Sensor
• Temperature Sensor (DS18B20)
• Blood Pressure Sensor
• ESP8266 (Wi-Fi Module)
• Buzzer
• LCD
• LabVIEW Software

4.1 Software Description
• Python IDLE
C. Visvesvaran et al.
Fig. 3 Raspberry Pi Pico
4.2 Raspberry Pi Pico

The Raspberry Pi Pico is an inexpensive microcontroller device. Microcontrollers are small computers, but they usually lack large amounts of memory and the peripherals (keyboard, monitor, etc.) of a full computer. The Raspberry Pi Pico has GPIO pins just like a Raspberry Pi computer, so it can be used to control and receive input from various electronic devices, as shown in Fig. 3.
4.3 Heart Rate Sensor

The heart rate sensor produces a digital output for each heartbeat, as shown in Fig. 4. In the KY-039 heartbeat sensor, a human finger is placed between the infrared diode and the phototransistor to detect a pulse, and the signal output pin represents the pulse. The phototransistor detects the amount of light passing through the finger; as the blood moves, the amount of light changes, and this variation can be sensed as a pulse.
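The pulse-counting idea can be sketched as follows; the threshold value and sampling rate below are illustrative assumptions, not values from the paper:

```python
def bpm_from_samples(samples, threshold, sample_rate_hz):
    """Estimate beats per minute from phototransistor readings: count rising
    crossings of a threshold (each one pulse), then scale by the length of
    the sampling window. Threshold and sample rate are assumptions."""
    beats = 0
    for prev, cur in zip(samples, samples[1:]):
        if prev < threshold <= cur:  # rising edge = one detected pulse
            beats += 1
    window_s = len(samples) / sample_rate_hz
    return beats * 60 / window_s
```

Two pulses in a two-second window, for instance, yield an estimate of 60 beats per minute.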
4.4 Temperature Sensor (DS18B20)

The Maxim Integrated DS18B20 is a single-wire programmable temperature sensor, shown in Fig. 5. It is widely used for temperature measurement in harsh environments
Health Monitoring System for Comatose Patient Using Raspberry-Pi
Fig. 4 Heart Rate Sensor (LCUP)
Fig. 5 Temperature Sensor (DS18B20)
such as chemicals, mines, and soil. The sensor has a rugged structure and can be ordered in a waterproof version for easy installation. It can accurately measure temperatures from −55 to 125 °C.
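On a Linux-based Raspberry Pi, the 1-wire driver exposes each DS18B20 reading as a small two-line text report; the Pico itself would instead use a MicroPython onewire/ds18x20 driver. A sketch of parsing that report format, assuming the standard Linux w1 driver output:

```python
def parse_w1_slave(text: str):
    """Parse the two-line w1_slave report produced by the Linux 1-wire
    driver for a DS18B20. Returns degrees Celsius, or None when the CRC
    line does not end in YES (reading should then be retried)."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].endswith("YES"):
        return None  # CRC failure
    _, _, raw = lines[1].partition("t=")
    return int(raw) / 1000.0  # driver reports millidegrees Celsius
```

For a report whose second line ends in `t=23125`, the parsed temperature is 23.125 °C.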
4.5 Blood Pressure Sensor

Blood pressure sensors are used to measure blood pressure non-invasively, as shown in Fig. 6. Similar to a sphygmomanometer, the device records blood pressure, but it uses a pressure sensor instead of a column of mercury.
Fig. 6 Blood pressure sensor
4.6 ESP8266 Wi-Fi Module

The ESP8266 is an affordable and easy-to-use device for connecting a project to the Internet, as shown in Fig. 7. The module can act as an access point and as a station (by connecting to Wi-Fi), facilitating data acquisition and upload to the Internet, making IoT as easy as possible. Another attractive feature of this module is that it can be programmed using the Arduino IDE, which makes it much easier to use. An FTDI board that supports 3.3 V programming is the best approach for programming the ESP-01. If you don't have one, you can buy one or use an Arduino
Fig. 7 ESP8266 WI-FI module
Fig. 8 Buzzer
board. This is a typical problem that everyone has with the ESP-01: the module draws considerable current during programming and cannot reliably be powered from the Arduino's 3.3 V pin or a voltage divider. Therefore, a small 3.3 V voltage regulator capable of supplying at least 500 mA is essential.
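The paper does not specify the cloud service or request format, so the following only sketches how the three sensor readings might be serialized into a ThingSpeak-style HTTP GET update URL; the endpoint, API key, and field mapping are all hypothetical:

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real deployment target is not named here.
BASE_URL = "https://api.thingspeak.com/update"

def update_url(api_key, temperature_c, heart_rate_bpm, pressure_mmhg):
    """Build the GET URL the ESP8266 would request to push one reading set."""
    params = {"api_key": api_key,
              "field1": temperature_c,
              "field2": heart_rate_bpm,
              "field3": pressure_mmhg}
    return BASE_URL + "?" + urlencode(params)
```

The ESP8266 firmware would then issue this URL as a plain HTTP GET at each sampling interval.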
4.7 Buzzer

A buzzer is a small but effective sound component for a project or system, as shown in Fig. 8. Two types of buzzers are widely used. One is a simple buzzer that emits a continuous sound when switched on. The other is a commercially available buzzer that is bulkier and beeps; its tone is generated by a built-in resonant circuit. This buzzer can be used simply by connecting it to a 4–9 V DC power supply, even a simple 9 V battery, although a stabilized +5 V or +6 V DC power supply is recommended. The buzzer is usually connected through a switch, which is used to turn it off at set times and intervals.
4.8 LCD (Liquid Crystal Display)

LCD stands for liquid crystal display; a liquid crystal is a substance between the solid and liquid phases. The LCD is used to create visible images on the screen, as shown in Fig. 9. Unlike cathode ray tube (CRT) technology, LCD technology allows displays to be made significantly thinner. LCD pixels are switched on and off electronically by passing polarized light through the liquid crystal. Examples of LCD applications include LCD TVs, computer monitors, instrument panels, aircraft cockpit displays, and indoor and outdoor signage.
Fig. 9 LCD (Liquid Crystal Display)
Liquid crystal displays (LCDs) are used in embedded system applications to display system functions and states. A 16 × 2 LCD is a 16-pin device that contains two lines of 16 characters each.
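A small helper illustrating the character-level constraint of a 16 × 2 module: each line must be exactly 16 characters, and only two lines fit. Display driver calls are omitted and the function is purely illustrative:

```python
def lcd_lines(*messages, width=16, rows=2):
    """Fit status messages onto a width x rows character LCD: truncate or
    space-pad each line to exactly `width` characters, keep at most `rows`
    lines, and blank-fill any unused rows."""
    lines = [m[:width].ljust(width) for m in messages[:rows]]
    while len(lines) < rows:
        lines.append(" " * width)
    return lines
```

For example, `lcd_lines("Temp: 36.8 C", "HR: 72 bpm")` yields two 16-character strings ready to write to the display rows.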
4.8.1 Python IDLE

It is included in many Linux distributions as an optional part of the Python package. It is written entirely in Python using the Tkinter GUI toolkit (a wrapper for the Tcl/Tk functions).
5 Results and Output

The proposed work carried out a detailed study of health surveillance systems and proposed a novel health surveillance aid for comatose patients. The work traces the development from non-functional administrative requirements down to basic executable programs, and examines how derivations can be made in a formal computation using an interactive verification system. The hardware implementation of the project's functional logic is performed using Thonny Python, as shown in Fig. 11. The work shows that extending the distributed framework from formal requirements to executable programs is feasible from the beginning. The hardware implementation is shown in Fig. 10, and the resulting output is shown in Figs. 12 and 13.
Fig. 10 Output for health monitoring system for comatose patients using Raspberry Pi
Fig. 11 Output by using Thonny Python
Fig. 12 Output by using Raspberry Pi Pico
Fig. 13 Output by using Mobile App
6 Conclusion

The purpose of the developed system is to send important patient information to doctors for quick and easy access. The designed model provides excellent and effective medical services for comatose patients: the collected information is linked via communication with networks around the world, enabling rapid response. In the planned system, vital parameters such as body temperature are monitored.
7 Future Scope

The future system includes several approaches for the warning display. It processes the alarm and displays a message on the LCD, and it also triggers an SMS to the remotely located person through CDMA. The prompts are turned off manually, not mechanically, so there is no disable switch for the prompt. Possible extensions are:
• In an emergency, the system automatically sends an alert or decision to notify the nearest hospital or ambulance when abnormal readings are observed.
• Further development of the designed model requires more parameters to monitor patient health.
• Another extension of the current system is a device equipped with an Internet camera, so that any patient anywhere in the world can be monitored at any time.
Digital Skeletonization for Bio-Medical Images Srinivasa Rao Perumalla, B. Alekhya, and M. C. Raju
Abstract Skeletonization and thinning are two techniques applied to hospitalized patients' brain images for object identification, matching, and tracking, among other purposes. When information is extracted from an image by binarization, some required information is lost; hence the thinning method is applied directly to the grayscale brain image intensity values in order to recover the lost information. This observation motivated the proposed work to operate on grayscale intensity values directly, which avoids distortions in the object's topology and geometry. In the proposed method, the skeletonization technique is applied directly to the grayscale image intensity values to extract object information; different algorithms are applied and their results are analyzed in the result section.

Keywords Skeletonization · Medical images · Grayscale images
1 Introduction

Digital image analysis and digital image processing are very important nowadays for analyzing medical images to detect defects in the human body. Skeletonization evolved from the thinning process, which gives the skeleton its exact shape without disturbing the image. It is also used as a preprocessing technique for extracting the skeleton of an image and has uses in script identification [1], optical character recognition [2], and writer identification [3]. To form a Skeleton, thinning is

S. R. Perumalla (B) Department of ECE, CVR College of Engineering, Hyderabad, India, e-mail: [email protected]
B. Alekhya · M. C. Raju Department of ECE, VNRVJIET, Hyderabad, India, e-mail: [email protected]; M. C. Raju e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_21
Fig. 1 Process of thinning
used and classified as iterative and noniterative [4]. In the iterative process, unwanted pixels are removed in parallel, called the contour-peeling process. In the noniterative process, the skeleton is derived straightaway from the required pixels; this process is complicated and slow. Deriving the skeleton from the thinning process is complicated, it is difficult to obtain the shape exactly, and it sometimes fails [5–9]. Figure 1 shows the thinning process for a given general object: the thinning process is applied to the object, and the image size is reduced to a form that the machine can identify. After this step, analysis of the object can be performed. This method is used in many applications as a preprocessing technique for image processing [10–12]. Thinning applied to a binary image yields another image; thinning can also be applied to monochrome or color images. Thinning is particularly attractive for applications because (a) it reduces the amount of data required to store and process, (b) it also reduces time, and (c) the skeleton shape can be obtained easily if the thinned image is readily available. Skeleton algorithms are classified under various categories: in Fig. 2, the thinning process is divided into iterative and noniterative. Iterative thinning is further divided into sequential and parallel, and noniterative thinning into medial axis transforms, line-following, and other pixel-based methods. In the non-featured pixel method, each pixel is replaced with a value that is the distance between the current pixel and the nearest featured pixel. The distance transform (DT) technique has the disadvantage of poor skeleton connectivity and incomplete skeleton shape; the method is complex because of the many calculations needed to derive the Skeleton, and it is a time-consuming process. In Fig.
3, one character S is taken; if the thinning algorithm is applied, it becomes thin as shown. Iterative thinning is further divided into parallel and sequential, and noniterative thinning into three methods, as shown in Fig. 2. Implementing the noniterative method in hardware is difficult because of the large number of calculations, so it is not suitable for the proposed work. In the proposed work, a thinning algorithm operating at the pixel level has been proposed: required pixels are preserved and unwanted pixels
Fig. 2 Classification of thinning
Fig. 3 Thinning of a character
are removed; in this way, the Skeleton of the image is identified with complete shape and good connectivity.
2 Literature Survey

Rosenfeld [13] proposed a skeletonization algorithm that works directly on grayscale images while preserving object connectivity. Pervouchine and Leedham [14] proposed an algorithm that produces a skeleton as cubic B-splines. Dokladal [15] proposed an algorithm that extracts center lines from ridges in the original image. Svensson et al. [16] and Couprie [17] describe two-step processing to obtain a perfect skeleton from any 2-D grayscale image. All these algorithms work directly on grayscale images but are sequential in nature [18]. Couprie proposed extracting the Skeleton from a 3-D grayscale image in a parallel manner. The Skeleton image can be recovered completely with the help of the different algorithms available in the literature. A few application areas are shape recognition systems [19–25], traffic monitoring systems [26], computer vision applications [27–29], pattern recognition [30–32], and handwritten character recognition [33, 34]. The thinning method is applied to binary images in the form of digital lines or curves that lie roughly on the center line or medial axis [35–39]. Such an algorithm works iteratively on images and removes border points that have more than one neighbor, but it does not remove the endpoints of curves [40–43].
In the proposed work, the implementation builds on the older methods. In this paper, iterative and noniterative methods are used, pixel processing is done in parallel and sequentially, and the data is stored in an efficient manner. A revised algorithm is applied to the stored data; the Skeleton is extracted from different images and gives better results than previous methods in terms of Skeleton thickness.
3 Proposed Algorithm

A new algorithm is proposed to extract the Skeleton from a 2-D grayscale image consisting of a combination of straight and curved lines. The algorithm proposed in [44] is illustrated in the sections below and is extended to 3-D images in [45]. These algorithms have also been implemented on FPGAs [46, 47]. The proposed algorithm is applied to Magnetic Resonance Imaging (MRI) and Computerized Tomography (CT) images, and simulation results are discussed in Sect. 4. Thinning consists of two kinds of processes, iterative and noniterative; in the iterative process, pixels are processed sequentially or in parallel to extract the Skeleton from the image. The thinning algorithm is defined in the following way:

Step 1. Delete pixels according to the following sub-steps:
(a) Delete all pixels that match template a.
(b) Delete all pixels that match template b.
(c) Delete all pixels that match template c.
(d) Delete all pixels that match template d.
Among the different types of thinning algorithms, the iterative thinning algorithm is used here. In this algorithm, individual pixels are considered and boundary pixels are removed so that the Skeleton image is formed. It is further classified into two types: sequential thinning algorithms and parallel thinning algorithms. In the first, pixels are selected for deletion in a predetermined order, scanning the image line by line and pixel by pixel; this process is complex to analyze. In the second, the Skeleton image is produced with care taken to avoid noise and to improve the connectivity of the Skeleton. The block diagram of the overall process is shown in Fig. 4.
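The iterative parallel thinning just described can be sketched with the classical Zhang-Suen conditions on a binary image. The paper applies related A(R1)/B(R1) conditions to grayscale intensities (Sect. 3.2), so this binary version only illustrates the mechanics: each pass marks deletable boundary pixels in parallel, then removes them:

```python
def neighbours(img, y, x):
    """P2..P9, clockwise from the pixel above (Zhang-Suen ordering)."""
    return [img[y-1][x], img[y-1][x+1], img[y][x+1], img[y+1][x+1],
            img[y+1][x], img[y+1][x-1], img[y][x-1], img[y-1][x-1]]

def zhang_suen_thin(img):
    """Thin a binary image (lists of 0/1, with a zero border) in place:
    two sub-iterations per pass, parallel deletion within each."""
    changed = True
    while changed:
        changed = False
        for phase in (0, 1):
            to_delete = []
            for y in range(1, len(img) - 1):
                for x in range(1, len(img[0]) - 1):
                    if img[y][x] != 1:
                        continue
                    p = neighbours(img, y, x)
                    b = sum(p)  # B(P1): number of non-zero neighbours
                    # A(P1): number of 0->1 transitions around the pixel
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    if phase == 0:
                        c = p[0] * p[2] * p[4] == 0 and p[2] * p[4] * p[6] == 0
                    else:
                        c = p[0] * p[2] * p[6] == 0 and p[0] * p[4] * p[6] == 0
                    if 2 <= b <= 6 and a == 1 and c:
                        to_delete.append((y, x))
            for y, x in to_delete:  # apply deletions in parallel
                img[y][x] = 0
                changed = True
    return img
```

Applied to a solid 3 × 5 block of foreground pixels, the passes erode it down to a short one-pixel-wide line.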
Fig. 4 Block diagram of the overall process
Table 1 Proposed algorithm
1. Scan the input grayscale image of size M × N
2. Scan the input image with a 3 × 3 window
3. Perform feature extraction on the scanned image; corner and border pixels are identified
4. If any corner pixels are missing, retain those corner pixels
5. Identify and process the border pixels
6. Observe the output image; if the result is not satisfactory, repeat steps 2–5 until the final result
7. Display the skeleton
8. Highlight a few of the main points of the Skeleton object
3.1 Algorithm

Given an input image, the proposed algorithm processes it and produces the skeleton image with the steps of execution listed in Table 1.
3.2 Identification of Skeletal Pixels—Corner Pixels

Table 2 shows the 3 × 3 window with 9 pixels, C1 to C9, where C5 is the center pixel. All 9 values are grayscale intensities. If a pixel's intensity lies between 1 and 255, that pixel is considered for forming an object or for processing; if a pixel's intensity is less than 1, that is, zero, the pixel is neglected.
Table 2 3 × 3 neighborhood window

C1 C2 C3
C4 C5 C6
C7 C8 C9

Table 3 Skeleton 4 × 3 for iteration 2

R9 R2 R3
R8 R1 R4
R7 R6 R5
R12 R11 R10
When the Skeleton is scanned horizontally, resulting in a 3 × 4 window of pixels, any two points detected horizontally are connected to the Skeleton; for example, in the table above, R1 and R4 are detected as horizontal neighbor pixels. In the second iteration, a 4 × 3 pixel window is scanned perpendicular to its axis, as shown in Table 3. The corner pixel value is kept if the following condition holds:

(R1 = 0 & R9 ≥ T) or (R1 ≥ T & R9 = 0), and
(R3 = 0 & R7 ≥ T) or (R3 ≥ T & R7 = 0), and
(R2 = 0 & R8 ≥ T) or (R2 ≥ T & R8 = 0), and
(R4 = 0 & R6 ≥ T) or (R4 ≥ T & R6 = 0)

The first condition checks whether the pixel is a southeast or northeast corner; the second checks whether the pixel belongs to the northeast or southwest corner. In the first iteration, the contour point R1 is deleted if the following conditions are satisfied:

(a) 2 ≤ B(R1) ≤ 6
(b) A(R1) = 1
(c) R2 × R4 × R6 = 0
(d) R4 × R6 × R8 = 0

In the next iteration, only conditions (c) and (d) change, becoming:

(c) R2 × R4 × R8 = 0
(d) R2 × R6 × R8 = 0

Here A(R1) is the number of 0→1 patterns in the ordered sequence R2, R3, R4, …, R8, R9, the eight neighbors of R1, and B(R1)
is the number of non-zero neighbors of R1, that is, B(R1) = R2 + R3 + ⋯ + R9.

Mean square error: The difference between the pixels of the Skeleton image and the original image is called the mean square error (MSE). If all the pixel values of the original and Skeleton images are the same, the MSE is zero. In the proposed method, the MSE is kept very small so that the output image quality is high. It is given by

MSE = \frac{1}{mn} \sum_{x=0}^{m-1} \sum_{y=0}^{n-1} \left[ I(x, y) - K(x, y) \right]^2
Peak signal-to-noise ratio (PSNR): This measures the quality of the reconstructed (skeletonized) image relative to the original image. PSNR and MSE are inversely related: a larger PSNR indicates a better-quality image, and the MSE is always smaller for better quality:

PSNR = 20 \log_{10}(MAX_I) - 10 \log_{10}(MSE)

Execution time: the time taken by the processor to execute the complete code.
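The two quality measures follow directly from their definitions; images are represented as nested lists purely for illustration:

```python
import math

def mse(original, skeleton):
    """Mean square error between two equally sized grayscale images:
    average of squared per-pixel differences, as in the MSE formula."""
    m, n = len(original), len(original[0])
    total = sum((original[x][y] - skeleton[x][y]) ** 2
                for x in range(m) for y in range(n))
    return total / (m * n)

def psnr(original, skeleton, max_i=255):
    """PSNR = 20*log10(MAX_I) - 10*log10(MSE); infinite for identical images."""
    e = mse(original, skeleton)
    if e == 0:
        return float("inf")
    return 20 * math.log10(max_i) - 10 * math.log10(e)
```

Identical images give zero MSE and infinite PSNR, matching the inverse relationship described above.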
3.3 Identification of Skeletal Pixels—Border Pixels

In Fig. 5a, b, c, a rectangular matrix is used to reduce the topological structures, which also yields connected skeletons. Many algorithms in the literature cannot give thin shapes because the array of pixels cannot be eroded more thoroughly.
Fig. 5 a, b, c A suitable rectangular, hexagonal, and triangular matrix for pixel analysis.
Fig. 6 A 5-neighborhood window
Different types of hierarchical algorithms exist, but all of them have difficulty predicting the skeleton of an object when it is in motion, and they also introduce noise and variation at the object boundary. In the 3 × 3 window of Fig. 6, the highest and lowest pixel values are found and their difference is denoted D. If that difference is greater than the threshold (D > T), the pixel is called a border pixel.
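A minimal sketch of the border-pixel test just described:

```python
def is_border_pixel(window, threshold):
    """Border-pixel test: D = (highest - lowest grayscale value in the
    window); the pixel is a border pixel when D exceeds the threshold T."""
    values = [v for row in window for v in row]
    d = max(values) - min(values)
    return d > threshold
```

A window with a large intensity jump is flagged as a border pixel, while a nearly uniform window is not.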
3.4 Boundary Erosion

To perform boundary erosion, the following points are highlighted. Initially, the image is decomposed into straight line segments M, each made up of two consecutive pixels representing a line segment. The slope of each line is treated as M(l) and is assigned one of four directions: horizontal, vertical, and diagonal line segments slanting to the right or to the left. Adjacent points (M1, N1) and (M2, N2) are collinear if
• Horizontal (0°): M2 = (M1 + 1) and N2 = N1
• Vertical (90°): M2 = M1 and N2 = (N1 + 1)
• Diagonal slanting to the right (45°): M2 = (M1 + 1) and N2 = (N1 + 1)
• Diagonal slanting to the left (135°): M2 = (M1 − 1) and N2 = (N1 + 1)
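The four adjacency rules above can be written as a small classifier; the point format and return values are illustrative:

```python
def segment_direction(p1, p2):
    """Classify the direction of the segment joining adjacent points
    (M1, N1) and (M2, N2) using the four rules above; returns the angle in
    degrees, or None if the points are not adjacent in one of the four ways."""
    (m1, n1), (m2, n2) = p1, p2
    if m2 == m1 + 1 and n2 == n1:
        return 0      # horizontal
    if m2 == m1 and n2 == n1 + 1:
        return 90     # vertical
    if m2 == m1 + 1 and n2 == n1 + 1:
        return 45     # slanting to the right
    if m2 == m1 - 1 and n2 == n1 + 1:
        return 135    # slanting to the left
    return None
```

Walking along a chain of skeleton pixels and classifying each consecutive pair in this way recovers the per-segment slopes M(l).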
The discontinuous and continuous sequences of 0s are shown in Fig. 7. In the Skeleton images extracted in the previous sections, a width of two pixels appears in some areas but not in others, so one more stage is required to correct this. A 3 × 3 window is created [32] and weights are assigned clockwise, starting from the upper-center position, from 1 to 2^n; the sum of all these weights is 255. The computed values are shown in Table 3. Neighbor pixels with weight sums 145 and 1451 are shown in Fig. 8.
Fig. 7 a, b The discontinuous and continuous sequences of 0s
Fig. 8 a, b, c The 8-neighbor pixels with weight sums 145 and 1451 as results
4 Results and Discussion

In the results section, various types of skeletons are generated, providing the complete and required set of skeletons with transformations adapted to each case. Thanks to the proposed algorithm, complex skeletonization becomes applicable to common shapes. The proposed algorithm produces exact results and shows the skeleton with connected lines. Although computation time is highly critical, the flexibility of the algorithm means it can be easily adapted to any digital grid and extended to arbitrary dimensions. Depending on the points generated in the simplification process, skeletons are represented easily and with good accuracy. By examining the gray values of the tumor images, the level of the tumor can be determined, which helps doctors to treat the patient according to the situation. Figure 9e, f shows CT scan images of the liver and the corresponding extracted skeleton, respectively. All three proposed algorithms give good data-reduction results for drawing the skeletons, and the proposed method gives an output skeleton of good quality. Figure 10a–c shows magnetic resonance images of the brain without and with tumor and the corresponding extracted skeletons, and Fig. 10e, f shows a magnetic resonance image of the liver and the corresponding extracted skeleton, respectively. Different types of animals and birds and their corresponding skeletons are shown in Fig. 11. Figure 12a–f represents different shapes of images; the results of the existing algorithm, the skeletonization of the proposed algorithm, and the rate of reduction compared to
Fig. 9 Computed tomography images and extracted skeletons: (a) CT scan of the brain, (b) extracted skeleton, (c) CT scan of the brain with tumor, (d) extracted skeleton, (e) CT scan of the liver, (f) corresponding extracted skeleton
the existing algorithm and the Hilditch algorithm are compared and shown in the table. From the final result, the reduction rate can be calculated as the ratio of the number of lines required to represent the original image to that of the skeleton image.
5 Future Scope

In this paper, different skeletonization algorithms are compared, and the proposed scheme is executed and compared to the existing algorithms in terms of mean square error, peak signal-to-noise ratio, execution time, and connectivity. The proposed method gives good results compared to the existing algorithms. In the future, this work can be extended to neural networks, deep learning, and artificial intelligence to derive new algorithms for extracting Skeleton images in terms of different parameters; noise density can also be checked, and the results compared with
Fig. 10 Magnetic resonance images of various organs of a human being with and without tumor and their corresponding extracted skeletons: (a) MRI of the brain, (b) extracted skeleton, (c) MRI of the brain with tumor, (d) extracted skeleton, (e) MRI of the liver, (f) extracted skeleton
Fig. 11 Different types of animals and birds and their corresponding skeletons
existing data sets available on the internet.
Fig. 12 Different shapes of images: temperature distribution with noise of brains with realistic tumors of different volumes. (a) Tumor with 11.6 cm³ of volume. (b) Tumor with 27.4 cm³ of volume. (c) Tumor with 51.1 cm³ of volume. (d) Tumor with 81.7 cm³ of volume.
References

1. Bataineh B, Abdullah SNHS, Omar K (2011) An adaptive local binarization method for document images based on a novel thresholding method and dynamic windows. Pattern Recogn Lett 32:1805–1813
2. Gopakumar R, Subbareddy NV, Makkithaya K, Acharya UD (2010) Script identification from multilingual Indian documents using structural features. J Comput 2:106–111
3. Abu-Ain TAH, Abu-Ain WAH et al (2011) Off-line Arabic character-based writer identification – a survey. In: International journal on advanced science, engineering and information technology, proceeding of the international conference on advanced science, engineering and information technology. Bangi, Malaysia
4. Ali MA (2012) An efficient thinning algorithm for Arabic OCR systems. Signal Image Process Int J (SIPIJ) 3:31–38
5. Nemeth G, Palagyi K (2011) Topology preserving parallel thinning algorithm. Int J Imaging Syst Technol 21:37–44
6. Guo Z, Hall RW (1992) Fast fully parallel thinning algorithms
7. Ahmed M, Ward R (2002) A rotation invariant rule-based thinning algorithm for character recognition. IEEE Trans Pattern Anal Mach Intell 24:1672–1678
8. Zhang YY, Wang PSP (1996) A parallel thinning algorithm with two-subiteration that generates one-pixel-wide skeletons. Int Conf Pattern Recognit 4:457–461
9. Vijayakumar T, Vinothkanna R (2020) Retrieval of complex images using visual saliency guided cognitive classification. J Innov Image Process (JIIP) 2(02):102–109
10. Gonzalez RC, Woods RE (2008) Digital image processing, 2nd edn. Pearson Education India. ISBN 9780131687288
11. Shapiro LG, Stockman GC (2001) Computer vision. Prentice Hall, New Jersey, pp 279–325. ISBN 0-13-030796-3
12. Niblack CW, Gibbons PB, Capson DW (1992) Generating skeletons and centerlines from the distance transform. CVGIP: Graph Model Image Process 54(5):420–437
13. Dyer CR, Rosenfeld A (1979) Thinning algorithms for grayscale pictures. IEEE Trans Pattern Anal Mach Intell 1(1):88–90
14. Pervouchine V, Leedham G (2005) Document examiner feature extraction: thinned versus skeletonized handwriting images. In: Proceedings of the IEEE region 10 technical conference (TENCON05), pp 1–6
15. Dokladal P, Lohou C, Perroton L, Bertrand G (1999) A new thinning algorithm and its applications to extraction of blood vessels. In: Proceedings of biomedsim, pp 32–37
16. Svensson S, Nystrom I, Arcelli C, Sanniti di Baja G (2002) Using grey level and distance information for medial surface representation of volume images. In: Proceedings of 16th international conference on pattern recognition, pp 324–327
17. Couprie M, Bezerra F, Bertrand G (2001) Topological operators for grayscale image processing. J Electron Imaging 10(2):1003–1015
18. Couprie M, Bezerra FN, Bertrand G (2013) A parallel thinning algorithm for grayscale images. In: International conference on discrete geometry for computer imagery, pp 71–82
19. Macrini D, Dickinson S, Fleet D, Siddiqi K (2011) Object categorization using bone graphs. Comput Vis Image Underst 115:1187–1206
20. Zaboli H, Rahmati M (2007) An improved shock graph approach for shape recognition and retrieval. In: First Asia international conference on modelling simulation, pp 438–443
21. Bai X, Latecki LJ (2008) Path similarity skeleton graph matching. IEEE Trans Pattern Anal Mach Intell 30(7):1282–1292
22. Goh WB (2008) Strategies for shape matching using skeletons. Comput Vis Image Underst 110:326–345
23. Ngo TG, Nguyen TT, Ngo QT, Nguyen DD, Chu SC (2016) Similarity shape based on skeleton graph matching. J Inf Hiding Multimed Signal Process 7(6):1254–1265
24. Bai X, Yang X, Yu D, Latecki LJ (2008) Skeleton-based shape classification using path similarity. Int J Pattern Recognit Artif Intell 22(04):733–746
290
S. R. Perumalla et al.
IoT-Based Condition Monitoring of Busbar Lloied Abraham Lincoln, Chandrashekhar Badachi, and Pradipkumar Dixit
Abstract Asset management is important in any industry: managing an asset efficiently improves its life, simplifies the maintenance of equipment, and keeps the equipment's health under observation. Asset management mainly involves condition monitoring of the asset, and it is especially critical in an electrical utility, where the failure of a single piece of electrical equipment can affect the entire power system and cause loss of load and generation. The assets of an electrical utility are mainly transformers, circuit breakers, surge arresters, insulators, current transformers, busbars, etc. Busbars are a critical part of the system, interconnecting the other components in the field. The temperature rise in a busbar is a decisive factor for its life; it may be caused by electrical faults or by hot spots at sharp edges, and the ambient conditions also contribute to it. Internet of Things technology is widely used for monitoring the health of power equipment. The work proposed in this paper deals with the development of IoT-based condition monitoring of a busbar. In this work, a current sensor measures the current through the busbar and a temperature sensor measures the temperature around it; the sensed data is deployed to a developed website and mobile app so that the status of the busbar can be monitored continuously. Keywords Busbar · Internet of Things · Current sensor · Temperature sensor · Current
1 Introduction
Electric utilities, which take care of generation, transmission, and distribution of power to consumers, always work toward enhancing reliability in the face of challenges such as aging assets, fewer operators, increased cycling, natural disasters, and a changing climate. The failure of any critical asset can result L. A. Lincoln · C. Badachi (B) · P. Dixit Department of Electrical and Electronics Engineering, M S Ramaiah Institute of Technology, Bangalore, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_22
in the forced outage of the entire power system, leading to loss of generation and load, lawsuits due to injuries or fatalities, and replacement or repair of the damaged equipment. All of this can add up to millions of dollars in related costs. To address these concerns, asset maintenance requires a shift from conventional time-bound maintenance to a smart strategy of continuous condition-based monitoring [1]. Modern sensing technology offers effective and continuous monitoring of the health of critical power equipment, which can improve the life of the equipment and the reliability of the power system, lower maintenance cost, minimize unplanned downtime, enable smart maintenance, etc. [2]. The busbar is one of the key components in the power system network. Busbars are used in power distribution systems in power plants, substations, factories, and data centers. Being uninsulated strips of highly conductive metal such as copper or aluminum, they have lower electrical resistance than insulated power cables and are used to carry large electrical currents. Busbars are also used in power distribution boards; because they are made of highly malleable metal, they can be easily shaped to suit any number of facility layouts, offering great physical flexibility. The maximum current that can be safely conducted is determined by the busbar's material composition and cross-sectional dimensions. Busbars come in a range of styles and sizes; due to their large ratio of surface area to cross-sectional area, several of these geometries allow heat to dissipate more efficiently. Busbars are bolted, clamped, or welded together as well as to other devices. If a bolt or clamp comes loose or a welded junction fails, the resulting increase in electrical resistance can cause anomalous heating. The overheating further increases the electrical resistance and can lead to a burnout or even a fire.
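The failure mode just described is plain Joule heating: the power dissipated in a joint is P = I²R, so even a small rise in contact resistance multiplies the heat generated at that spot. A minimal sketch with illustrative values (the resistances below are assumptions, not figures from the paper):

```python
def joint_power_w(current_a, resistance_ohm):
    """Joule heating dissipated in a busbar joint: P = I^2 * R."""
    return current_a ** 2 * resistance_ohm

# A sound bolted joint of 50 micro-ohms carrying 1000 A dissipates 50 W;
# if loosening raises the contact resistance tenfold, dissipation becomes
# 500 W, which further heats the joint and raises its resistance again.
sound = joint_power_w(1000, 50e-6)      # about 50 W
loosened = joint_power_w(1000, 500e-6)  # about 500 W
```

This positive feedback between heat and resistance is what makes early detection of an overheating joint so valuable.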
To minimize overheating at any of the bus connections, they should be checked on a regular basis for abnormalities in the joint assembly, including those caused by environmental factors. Visual inspection is difficult, however, because busbars are generally hidden under plastic or metal bus ducts and coverings and are often in hard-to-reach areas. The failure of a power supply busbar poses a risk to plant safety and can result in an unanticipated plant shutdown. To avoid such hazards and the high costs of lost output, any hint of overheating in a power busbar must be swiftly detected and responded to [3]. In the monitoring process, the busbar current is the important parameter to be tracked continuously. To sense the current, current sensors are embedded in the busbars [4]; giant-magnetoresistive (GMR) current sensors have been used for this purpose [5]. The busbar is normally bolted into the system, and failure of the bolted connection can also cause an outage [6]. Busbars come in single-layer and multilayer forms. Multilayer busbars are generally used in electric vehicles, where they are a critical part. The temperature rise in such busbars is mainly caused by short-circuit faults, which discharge a large current in a short duration; this may further cause overheating and deformation of the busbar [7]. The performance of a multilayer busbar depends mainly on the insulation used in it, and failure of that insulation may have serious consequences [8].
There have been attempts to diagnose defects in the insulation used in multilayer busbars; a radio-frequency model has been used for this purpose [9]. The Internet of Things (IoT) is the most widely used technology in condition monitoring of the power system [10] and of power equipment, such as transformers [11–15], high-voltage cables [16], transmission lines [17], and insulators [18–20]. Thus, in the proposed work, condition monitoring of the busbar is developed using the Internet of Things with the help of sensors.
2 Block Diagram of Proposed Scheme of Condition Monitoring
The objective of the proposed work is to continuously monitor the condition of the busbar and, if any abnormality is found, to alert the concerned people by sending a message through the mobile application and the website, so that further damage can be minimized. The proposed scheme for IoT-based condition monitoring of the busbar is shown in Fig. 1. The scheme consists of a hardware setup, in which the busbar to be monitored is energized by the power supply and connected to a load; a sensing network interfaced with a microcontroller; and a software interface that deploys the sensed data to a website and a mobile application. The hardware and software requirements are discussed below. The hardware components required for the present work are a test busbar, a current sensor, an Arduino UNO, a NodeMCU-32S-ESP 32, a temperature sensor, and two 10 kΩ resistors. Each hardware component and its specifications are discussed below.
Fig. 1 Block diagram of proposed condition monitoring of busbar
2.1 NodeMCU-32S-ESP 32
The NodeMCU platform is a free and open-source IoT platform. The board is built around an ESP-32 module, which gives access to the numerous functionalities of the ESP-32 Wi-Fi microchip. The open-source Arduino IDE can be used to program and control the NodeMCU. It runs on 3.3 V and can be powered through USB. The Wi-Fi frequency range is 2.412–2.484 GHz. The NodeMCU was chosen as the controller for this work because it has an inbuilt Wi-Fi module and can readily connect with numerous sensors [21]. It is preferred over other microcontrollers for condition monitoring because it can control the device over Wi-Fi or Bluetooth and offers high-speed data while being a low-power device.
2.2 Current Sensor (ACS712)
The ACS712 works on the Hall-effect principle. It consists of a Hall IC that senses the magnetic field produced by the current flowing through the input terminals of the sensor and converts it into a proportional output voltage. It has an operating voltage range of 4.5–5.5 V, a sensitivity of 66 mV/A for a supply voltage of 5 V, and a low operating current of 0.3 mA. The current rating of the sensor is 5 A. This sensor was chosen for the present work due to its high sensitivity [22].
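The proportional-voltage behaviour can be expressed as a small conversion helper. The 2.5 V zero-current level (half the 5 V supply) is standard ACS712 datasheet behaviour rather than a figure stated in the paper:

```python
def acs712_amps(v_out, vcc=5.0, sensitivity=0.066):
    """Convert the ACS712 output voltage to current in amperes.
    Zero current sits at vcc/2; the output moves 66 mV per ampere."""
    return (v_out - vcc / 2.0) / sensitivity

# acs712_amps(2.500) -> 0.0 A; acs712_amps(2.566) -> about 1.0 A
```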
2.3 Temperature Sensor (DS18B20)
This temperature sensor is very precise and does not require any external components to work. It can measure temperatures from −55 °C to +125 °C with an accuracy of ±0.5 °C over the −10 °C to +85 °C range. It works with an operating voltage of 3–5.5 V and can be powered parasitically from the data line. The resolution can be configured to 9, 10, 11, or 12 bits; if not configured, it defaults to 12 bits [23].
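In its default 12-bit mode the DS18B20 reports temperature as a signed 16-bit value with an LSB of 1/16 °C. A hedged decoding sketch (the driver details are not part of the paper; the example codes are taken from the device datasheet):

```python
def ds18b20_celsius(raw):
    """Decode a DS18B20 temperature register value (12-bit mode, LSB = 1/16 degC)."""
    if raw & 0x8000:        # negative temperatures are two's-complement encoded
        raw -= 1 << 16
    return raw / 16.0

# Datasheet examples: 0x0191 -> +25.0625 degC, 0xFF5E -> -10.125 degC
```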
2.4 Software Tools
The present work is intended to sense the monitoring parameters and deploy them through a mobile application and a website; hence, it is essential to create a website and a mobile app. The required software tools are discussed below. NGROK is a tool for port forwarding: whenever a web server, website, or application is created on a local network and needs to be accessible to anyone over the Internet, NGROK makes this possible seamlessly through port forwarding. It is a multiplatform tunneling tool that provides secure tunnels from a public domain such as the Internet to a locally running network service [24]. Its architecture is given in Fig. 2. FLUTTER is the platform used for the development of mobile applications; the developed applications can be tested and deployed to mobile, web, desktop, and embedded targets. It is Google's mobile UI framework, which provides a fast and expressive way for developers to build native apps on both iOS and Android [25]. Its architecture is shown in Fig. 3.
Fig. 2 NGROK architecture interfacing the local server to the public domain through the Internet
Fig. 3 Flutter for building the mobile application
3 The Experimental Setup for the Condition Monitoring of Busbar
In the proposed work, an attempt is made to develop a system to monitor the status of the busbar, and it is demonstrated experimentally. The actual experimental setup used for the demonstration is shown in Fig. 4: the current sensor is connected in series with the busbar to sense the current through it, and the temperature sensor is placed in the close vicinity of the busbar to monitor changes in the temperature around it. The busbar is energized with a 220 V AC supply and, to vary the current through the busbar and the temperature around it, connected to multiple lamp loads: two incandescent lamps (150 W each) and two LED lamps (60 W each). The sensed and measured data is deployed to the website and a mobile application using the NodeMCU-32S-ESP 32, NGROK, and FLUTTER. The proposed work is divided into the following stages:
(i) Energizing the busbar
(ii) Acquiring the current and temperature data from the busbar
Fig. 4 Experimental setup for the condition monitoring of busbar
Fig. 5 Data flow in the IoT module
(iii) Development of the website and mobile app
(iv) Deploying the data from the server to the website and mobile application
The busbar is energized with a 220 V AC supply, and a set of lamps connected as load is switched on in sequence so that the current through the busbar and the temperature around it are varied. The current sensor connected in series with the busbar senses the current, the temperature sensor around the busbar senses its temperature, and both are acquired by the microcontroller, which deploys the acquired data to the website and mobile app developed. Hence, the condition of the busbar is monitored continuously. The data flow in the IoT module after acquisition by the microcontroller is shown in Fig. 5.
3.1 Data Flow in IoT Module
The current sensor (ACS712) comprises a Hall sensor; the current flowing through the sensor is detected via the magnetic field it establishes. The Hall sensor converts the magnetic field into a proportional voltage, which is used to measure the current. The analog voltage output from the sensor is fed as
an input to the NodeMCU controller through the ADC (analog-to-digital converter) at a rate of 100 samples per second; the samples are averaged and converted into a root-mean-square value of the digital data. The actual current is then computed using the resolution of the sensor, which is 66 mV/A. The current computation takes place on the server, and the value is displayed on an online platform over the Internet in real time. Whenever the current exceeds the threshold value of 5 A (the current rating of the sensor), the system alerts the authorities for further action.
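The sampling-and-RMS pipeline described above can be sketched as follows. The 2.5 V mid-rail offset and the synthetic sample buffer are illustrative assumptions; only the 100-sample buffer, the 66 mV/A resolution, and the 5 A threshold come from the paper:

```python
import math

SENSITIVITY_V_PER_A = 0.066   # sensor resolution used in the paper: 66 mV/A
THRESHOLD_A = 5.0             # alert threshold: current rating of the sensor

def rms_current(samples_v):
    """Remove the DC offset (mean of the buffer), compute the RMS voltage,
    and convert it to amperes using the sensor sensitivity."""
    mean_v = sum(samples_v) / len(samples_v)
    rms_v = math.sqrt(sum((v - mean_v) ** 2 for v in samples_v) / len(samples_v))
    return rms_v / SENSITIVITY_V_PER_A

def exceeds_threshold(samples_v):
    """True when the computed RMS current is above the 5 A alert limit."""
    return rms_current(samples_v) > THRESHOLD_A

# Synthetic buffer: a 2 A RMS sine wave riding on a 2.5 V mid-rail,
# 100 samples covering five full cycles (20 samples per cycle).
samples = [2.5 + 2.0 * math.sqrt(2) * SENSITIVITY_V_PER_A *
           math.sin(2 * math.pi * k / 20) for k in range(100)]
# rms_current(samples) -> approximately 2.0 A; no alert is raised
```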
4 Results and Discussions
To demonstrate the condition monitoring of the busbar, the current sensor is first calibrated by testing the busbar in the laboratory with a variable resistive load. Calibration of the current sensor: the busbar is energized with a 220 V AC supply and a variable resistive load. The current sensor and a calibrated ammeter are connected in series with the busbar, and the results are tabulated in Table 1. From the laboratory testing, it is observed that the average error between the ammeter reading and the current sensor reading is 5.71%. This factor is taken into account to correct the current sensor reading before it is deployed to the website and the mobile application.
4.1 Determination of Calibration Factor
Considering the 3rd reading from Table 1: Calibration factor = (current sensor reading − ammeter reading)/ammeter reading × 100% = (2.55 − 2.35)/2.35 × 100% = 8.5%.

Table 1 Readings of ammeter and current sensor during calibration of the current sensor

Sl. No | Ammeter reading (A) | Sensor reading (A) | Error (%) | Calibration factor (%) | Calibrated current sensor reading (A)
1 | 0.80 | 0.86 | 7.5 | 5.71 | 0.81
2 | 1.58 | 1.70 | 7.6 | 5.71 | 1.60
3 | 2.35 | 2.55 | 8.5 | 5.71 | 2.40
4 | 3.15 | 3.33 | 5.7 | 5.71 | 3.13
5 | 4.00 | 4.12 | 3.0 | 5.71 | 3.88
6 | 4.90 | 5.01 | 2.04 | 5.71 | 4.72
Table 2 Readings from sensor, ammeter, website, and mobile application

Sl. No | Load (lamps switched on in sequence) | Current sensor reading (A) | Website reading (A) | Mobile application reading (A)
1 | 150 W is on | 0.67 | 0.668 | 0.668
2 | 150 W + 150 W is on | 1.35 | 1.347 | 1.347
3 | 150 W + 150 W + 60 W is on | 1.62 | 1.612 | 1.612
4 | 150 W + 150 W + 60 W + 60 W is on | 1.89 | 1.84 | 1.84
Error factor for this reading = 8.5%. Average error factor = (7.5 + 7.6 + 8.5 + 5.7 + 3.0 + 2.04)/6 ≈ 5.71%. Calibrated reading = (100 − 5.71)/100 × 2.55 = 2.40 A.
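The calibration arithmetic can be reproduced in a few lines; the (ammeter, sensor) pairs are taken from Table 1, and small deviations from the tabulated percentages come from rounding in the table:

```python
pairs = [(0.80, 0.86), (1.58, 1.70), (2.35, 2.55),
         (3.15, 3.33), (4.00, 4.12), (4.90, 5.01)]   # (ammeter A, sensor A)

errors = [(sensor - ammeter) / ammeter * 100 for ammeter, sensor in pairs]
avg_error = sum(errors) / len(errors)                # about 5.7 %

def calibrate(sensor_reading, error_pct=5.71):
    """Scale a raw sensor reading down by the average error factor."""
    return (100 - error_pct) / 100 * sensor_reading

# calibrate(2.55) -> about 2.40 A, matching the 3rd row of Table 1
```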
4.2 Measurement of Current on the IoT Platform
The current sensor, after calibration, is used in the demonstration of condition monitoring of the busbar as shown in Fig. 4. The results of the demonstration are tabulated in Table 2. The measured parameters are deployed to the main server, and the results obtained are displayed on the website and the mobile application. Taking reading number 3 of Table 2 as an example, the current and temperature readings deployed on the website and mobile application are shown in Figs. 6 and 7, respectively.
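As a quick check, the deviation between the current-sensor readings and the values deployed to the website can be computed directly from the tabulated values of Table 2:

```python
rows = [(0.67, 0.668), (1.35, 1.347),
        (1.62, 1.612), (1.89, 1.84)]   # (sensor A, website A) from Table 2

# Percentage deviation of the deployed reading from the sensor reading
deviations = [abs(web - sens) / sens * 100 for sens, web in rows]
```

The first three rows agree to within about 0.5%, while the fourth differs by roughly 2.6%; the conclusions attribute such differences to data propagation delay and the sensing sample rate.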
5 Conclusions
Condition monitoring is one of the important methods for effective asset management. The busbar being one of the key components in the power system network, it is essential to monitor its health under varying load conditions. In the proposed work, a system is developed for condition monitoring of a busbar, with Internet of Things technology implemented for online monitoring of the busbar status. A website and a mobile application are developed to monitor the busbar condition and to raise an alert whenever the busbar's condition is abnormal with respect to current and temperature. The developed system is verified and tested in the laboratory environment; data sensing and deployment are verified, and the data available on the website and mobile application is found to be very close to the sensed data from the current sensor, with an error of less than 1%. This error may be due
Fig. 6 Results displayed on the website
Fig. 7 Results displayed on the mobile application
to the data propagation delay or to a low sampling rate in data sensing. The system developed for condition monitoring of the busbar will thus enhance the life of the busbar and improve the reliability of the power system network. Since the security of the developed system is not addressed in the present work, securing the data communication can be taken up as future work.
References
1. https://www.emerson.com/documents/automation/electrical-asset-monitoring-en-4838942.pdf
2. https://www.sensor-works.com/condition-monitoring-benefits/
3. https://www.yokogawa.com/in/library/resources/application-notes/bus-bar-monitoring-for-overheating/
4. Kuwabara Y et al (2018) Implementation and performance of a current sensor for a laminated bus bar. IEEE Trans Ind Appl 54:2579–2587
5. Kim W et al (2013) Integrated current sensor using giant magneto resistive (GMR) field detector for planar power module. In: 2013 twenty-eighth annual IEEE applied power electronics conference and exposition, pp 2498–2505
6. Slade PG (2021) Bus bar bolted connections: reliability and testing. In: 2021 IEEE 66th Holm conference on electrical contacts (HLM), pp 209–216
7. Cai Y et al (2021) An improved multiphysics analysis model for the short-circuit fault ride-through capability evaluation of the MMC submodule busbar. IEEE Access 9:119090–119099
8. Zhang Y, Liu R, Ruan L (2020) Comparison of insulated tubular busbars with different insulated structure. In: 2020 IEEE 4th conference on energy internet and energy system integration (EI2), pp 3404–3408
9. Zhou L, Li S, Zhou Y (2020) Method for diagnosing defects of insulated tubular busbars based on improved RF model. In: 2020 5th Asia conference on power and electrical engineering (ACPEE), pp 1751–1755
10. Guo C et al (2019) Review of on-line condition monitoring in power system. In: 2019 IEEE 8th international conference on advanced power system automation and protection (APAP), pp 634–637
11. Mahanta DK, Rahman I (2022) IoT based transformer oil moisture monitoring system. In: 2022 IEEE Delhi section conference (DELCON), pp 1–4
12. Srivastava D, Tripathi MM (2018) Transformer health monitoring system using Internet of Things. In: 2nd IEEE international conference on power electronics, intelligent control and energy systems (ICPEICES), pp 903–908
13. Hazarika K, Katiyar G, Islam N (2021) IoT based transformer health monitoring system: a survey. In: 2021 international conference on advance computing and innovative technologies in engineering (ICACITE), pp 1065–1067
14. Mussin N (2018) Transformer active part fault assessment using Internet of Things. In: 2018 international conference on computing and network communications (CoCoNet), pp 1–6
15. Kumar TA, Ajitha A (2017) Development of IoT based solution for monitoring and controlling of distribution transformers. In: 2017 international conference on intelligent computing, instrumentation and control technologies (ICICICT), pp 1457–1461
16. Xu-Ze G et al (2019) IoT-based on-line monitoring system for partial discharge diagnosis of cable. In: 2019 IEEE electrical insulation conference, pp 54–57
17. Shen X, Cao M (2014) Research and application of internet of things for high-voltage transmission line. In: 2014 China international conference on electricity distribution (CICED), pp 889–893
18. Faria JRC et al (2021) Power feeding equipment for the condition monitoring of insulators for overhead power lines. In: CIRED 2021—the 26th international conference and exhibition on electricity distribution, pp 351–354
19. Baby Sindhu AV, Thomas MJ (2021) A technique for the condition monitoring of outdoor polymeric insulators made of micro/nano dielectric weathershed material. In: 2021 IEEE 5th international conference on condition assessment techniques in electrical systems (CATCON), pp 232–236
20. Ramani R et al (2019) IoT based condition monitoring of outdoor insulators under heavily polluted conditions. In: 2019 IEEE 4th international conference on condition assessment techniques in electrical systems (CATCON), pp 1–6
21. https://www.waveshare.com/nodemcu-32s.htm
22. https://robu.in/product/acs712-30a-range-current-sensor-module-hall-sensor/
23. https://lastminuteengineers.com/ds18b20-arduino-tutorial/
24. Praghash K et al (2021) Tunnel based intra network controller using NGROK framework for smart cities. In: 2021 5th international conference on electronics, communication and aerospace technology (ICECA), pp 39–43
25. Aggarwal D et al (2022) An insight into android applications for safety of women: techniques and applications. In: 2022 IEEE Delhi section conference (DELCON), pp 1–6
Novel CNN Approach (YOLO v5) to Detect Plant Diseases and Estimation of Nutritional Facts for Raw and Cooked Foods M. Najma and G. Sekar
Abstract Containing the spread of diseases in fruits and vegetables is important for global food security, but rapid identification of such diseases remains difficult. Moreover, people today are keen to know about their food and its associated calorie and nutrition details for a healthy life. In this project, a system is designed to identify different fruits and vegetables and to detect the presence and type of disease infecting a fruit or vegetable; it is further enhanced to classify raw foods, cooked foods, and fast foods, estimating the calorie content and nutrition details of the food. For more accurate information, Convolutional Neural Networks (CNN) are used, and the results are displayed using the Streamlit framework. For customized object identification, mainly for real-time objects and for detecting other irrelevant objects in the image (such as a human hand), a technique called YOLO v5 (You Only Look Once) is used. Keywords Calorie estimation · Nutritional value · Disease identification · Convolution neural network · Object recognition · YOLO
1 Introduction
This paper explores the deep learning Convolutional Neural Network (CNN) concept for application in classification and feature extraction of the given input (fruits/vegetables/cooked food/fast foods). The system is designed in the Python programming language using Anaconda Navigator (Jupyter Notebook). It uses deep learning CNN and YOLO v5 instead of the widespread methods based on Support Vector Machine (SVM) techniques. Deep-learning CNN in the proposed work helps to overcome the shortcomings of the existing methods, which follow M. Najma (B) · G. Sekar Department of Electronics & Communication Engineering, Sri Ramakrishna Institute of Technology, Coimbatore, India e-mail: [email protected] G. Sekar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_23
Fig. 1 CNN architecture—deep learning neural networks
graph-cut segmentation and edge detection processes, because classification and feature extraction are done in a single step in the proposed CNN system. This artificial intelligence technology avoids a large number of processing steps, helping to obtain accuracy with the fewest procedures. The results are stored as .pb and .h5 models so that repetition of the same processes can be avoided. Deep learning is a fast-evolving technique in machine learning, and its advantages can be leveraged for the accurate identification of objects. The planned model takes inputs of different varieties, both in real time and from datasets of renowned platforms, which are convolved with deep learning neural networks. The convolution process begins with the activation layer (ReLU), followed by pooling and striding. The flattened images undergo rigorous epochs of training to obtain more accurate results. The CNN architecture is shown in Fig. 1. YOLO stands for "You Only Look Once"; it is a neural network algorithm used for object detection in real time with a high accuracy rate. The main advantages of this algorithm are its speed and accuracy in real-time object detection on both images and videos and in custom object detection. YOLO has various versions, from YOLO v1 to v5. Its biggest advantage is its superb speed: it performs high-speed object detection with higher accuracy when compared with Mask R-CNN.
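The activation-pooling-striding sequence mentioned above can be illustrated on a toy feature map. This is a generic pure-Python sketch of ReLU followed by 2×2 max pooling with stride 2, not the authors' trained network:

```python
def relu(fmap):
    """Element-wise ReLU on a 2-D feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 (assumes even dimensions)."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]), 2)]
            for i in range(0, len(fmap), 2)]

feature_map = [[-1.0,  2.0,  0.5, -3.0],
               [ 4.0, -2.0,  1.0,  0.0],
               [ 0.0,  1.0, -1.0,  2.0],
               [-5.0,  3.0,  2.0,  1.0]]

pooled = max_pool_2x2(relu(feature_map))   # [[4.0, 1.0], [3.0, 2.0]]
```

Each pooling step halves the spatial resolution, which is what progressively flattens the feature maps before the fully connected layers.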
Novel CNN Approach (YOLO v5) to Detect Plant Diseases …
2 Related Works

This section summarizes a few earlier works in this domain, the methods they followed, and their key findings. Dhanalakshmi and Harshitha [18] explored and compared the various methods and algorithms proposed by different researchers at each step. The paper points out that, although a number of researchers have proposed methods for the quality inspection of fruits and vegetables, a robust computer vision-based system with improved performance is still required. It concludes by stating the need for a generalized system that can also grade, sort, and detect the defects of multiple fruits and vegetables. Raikwar et al. [2] investigated a wide range of computer vision and artificial intelligence strategies for automated food recognition and calorie assessment of fast food using deep learning and SVM techniques, but estimated results only for a few items, and the results were published only for single food images. Tan et al. [3] compared three methods, YOLO v3, Faster R-CNN, and SSD, for real-time pill identification; their detailed analysis shows that YOLO ranks first in object detection accuracy, with Faster R-CNN in second place for this pill identification task. Shen et al. [4], in "Machine Learning Based Approach on Food Recognition and Nutrition Estimation," provide an effective approach to food recognition with calorie and nutritional content, but they estimated results with limited datasets and plan to analyze mixed physical images and cooked foods in future work. Mhapne et al.
[6] focus on quality assessment of only the apple fruit, in particular identifying surface defects using computer vision techniques. Gurubelli et al. [7] present an approach to grade classification of pomegranate and mango fruits using texture and color gradient features. Their fruit grading system is based on structural LBP features, statistical GLCM and PRLM features, and color gradient features classified with kernel-SVM classifiers. Patel et al. [8] focus on quality assessment of only the orange fruit using an SVM classifier. Their algorithm begins with an orange image input to the system and preprocessing with a median filter. After segmentation, the color, shape, and texture features are extracted and given to the classifier, which classifies the image and assesses the type of defect. Hamza et al. [9] used a computer vision-based classification approach to estimate the ripeness of apples based on color. Their proposed system consists of four major steps. First, for extraction, they used the thresholding segmentation method along with some morphological operations. Color-based features were drawn from segmented apple images and segregated into testing and training data in the second
step. The classifier training parameters are fed in the third step, and classification is achieved in the last step using trained neural networks. Mehra et al. [10] analyzed tomatoes and their leaves and determined their maturity based on fungal infection and color. The system used a thresholding algorithm to determine the ripeness of tomatoes, and a k-means clustering algorithm for a generalized result. A comparative study of both methods under various conditions was done to find the better alternative. They used a combination of the k-means and thresholding algorithms for segmentation and identification of the fungus; the segmented fungus region was then studied to derive the percentage of its presence. In summary, the earlier research on the CNN technique mostly focused on processing only one or very few kinds of fruits or vegetables (such as tomato, mango, or apple) and considered only high-contrast colored images. Some past works use traditional techniques such as the Support Vector Machine (SVM) classifier [1] and Gabor feature-based kernel principal component analysis [11], which have their own limitations in image processing and computational speed. Automated fruit and vegetable classification based on RGB images [12] with real-time techniques is also limited to only 15 sets of images and is constrained by color features. The system proposed in this project therefore uses the latest CNN technique and is capable of handling multiple varieties of fruits, vegetables, cooked foods, raw foods, and fast foods, irrespective of their color, background, or setting, and it calculates their calories together with nutrient values. The proposed system also uses the new-generation YOLO v5 technique, with bounding boxes, to recognize the other objects present in the image. The system is further made user-friendly through the Streamlit framework.
3 Proposed System

The proposed project is designed for the detection and identification of various types of plant diseases with an advanced computer vision technique, CNN. The system is extended to detect various kinds of fruits and vegetables, classify their category (fruit, vegetable, fast food, or cooked food), and subsequently assess their calorie values per 100 g along with macronutrients such as carbohydrates, proteins, fiber, and fats. The project follows a non-destructive approach based on artificial intelligence; it is only minimally influenced by environmental lighting conditions and by the shape and size of the fruit or vegetable, and the effectiveness of the system is tested with similar images that contain look-alikes in shape and color. The proposed system is trained with various types of datasets from different sources, using the CNN and YOLO algorithms in one process.
3.1 System Architecture

Jupyter Notebook is an open-source web application that allows users to create and share documents containing live code, equations, visualizations, and narrative text. Its areas of application include data cleaning and transformation, numerical simulation, statistical modeling, machine learning, and data visualization. Its special features include an easy-to-use, interactive data science environment that works not only as an Integrated Development Environment (IDE) but also as a presentation and educational tool. Jupyter Notebook is a cost-competitive solution compared to other available options such as MATLAB. It also has minimal requirements to run on a platform, and installation and processing times are very efficient. Jupyter Notebook supports many distributions of Python and is a server-client application; it allows editing and running notebook documents directly in the browser. It can be installed and executed on a local desktop requiring no internet access, or installed on a remote server and accessed through the internet or an intranet via VPN. Unlike Jupyter Notebook, the Integrated Development and Learning Environment (IDLE), a specific IDE for Python, is not well suited to writing complex programs and then computing the results; instead, it lets the user check each line of code as it is entered into the system. Owing to this, IDLE is preferred for checking error functions. IDLE is not a mandatory requirement for using Python, but it comes along with Python and is not complex to use. TensorFlow is an open-source end-to-end platform and a library for multiple machine learning tasks, while Keras is a high-level neural network library that runs on top of TensorFlow.
Both TensorFlow and Keras provide a high-level Application Programming Interface (API) for building and training models without much complexity; of the two, Keras is the more user-friendly. TensorFlow provides the designer with a collection of workflows to develop and train models using Python or JavaScript, and the same models can easily be deployed in the cloud, in the browser, or on-device without any limitation on the language being used. Keras is tightly integrated into the TensorFlow ecosystem and includes support for tf.data, enabling the construction of high-performance input pipelines. If preferred, models can be trained using data in NumPy format, which is the approach used in this proposed system. The library packages included are shown in Fig. 2.
3.2 Datasets

Datasets were fetched from multiple sources; some are listed in Table 1 below.
Fig. 2 Library packages included
Table 1 Details on the datasets used

Dataset name    | Location | Number of images                            | Accuracy
Plant village   | KAGGLE   | 54,306                                      | >99%
CIFAR100        | KAGGLE   | 500 training images + 100 testing images    | >75%
CIFAR10         | KAGGLE   | 50,000 training images + 10,000 test images | 96%
COCO (YOLO v5)  | KAGGLE   | 128                                         | For YOLO only: >99%
FOOD 101        | KAGGLE   | 101                                         | >97%
ECUSTFD         | GITHUB   | 2978                                        | >98%
3.3 System Design

After a detailed analysis of the various software options and coding languages suited to this project, with cost-effectiveness as the base criterion, the following were selected; the chosen language is both high-level and easy to access and use.

Code language
• Python

Computer vision
• CNN (Convolution Neural Network)
• YOLO v5 (You Only Look Once)

Classification of datasets
• Test
• Train
• Validation
• Results are stored in PB and h5 models
3.4 System Model

The complete system is built at the user end with the Streamlit framework, to provide a user-friendly experience. The input for the Streamlit framework comes from the pb5 and h5 stored data of the CNN. YOLO v5 is used for custom object detection, as shown in Fig. 3.
3.5 Flow Diagram

The flow diagram in Fig. 4 illustrates the complete system, in which CNN is used for efficient classification and identification of the given input image. YOLO provides a bounding box with the object name at high accuracy. Results are produced with the calorie and nutrient values, along with the name and category of the given image. Images are taken in both individual and clustered form, and the results are presented through the Streamlit framework.
3.6 Image Segmentation

From the "Train" and "Test" dataset images, the images are segmented into categorical form, and the data in each category are visualized before the CNN is applied. An illustrative picture is shown in Fig. 5.
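The per-image preprocessing steps shown in Fig. 5 (grayscale conversion, blurring, edge detection, dilation) can be sketched in plain NumPy. In practice a library such as OpenCV would be used for this; the kernel sizes, threshold, and toy image below are illustrative assumptions:

```python
import numpy as np

def to_gray(rgb):
    """RGB -> grayscale using the standard luminance weights."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def box_blur(img, k=3):
    """Simple k x k mean blur (valid region only)."""
    out = np.zeros((img.shape[0] - k + 1, img.shape[1] - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = img[i:i+k, j:j+k].mean()
    return out

def edges(img, thresh=20.0):
    """Gradient-magnitude edge map (a simple stand-in for Canny)."""
    gy, gx = np.gradient(img)
    return (np.hypot(gx, gy) > thresh).astype(np.uint8)

def dilate(mask, k=3):
    """Binary dilation with a k x k structuring element."""
    p = np.pad(mask, k // 2)
    out = np.zeros_like(mask)
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            out[i, j] = p[i:i+k, j:j+k].max()
    return out

rgb = np.zeros((12, 12, 3))
rgb[4:8, 4:8] = 255.0                  # bright square on a dark background
gray = to_gray(rgb)                    # (a) grayscale image
blurred = box_blur(gray)               # (b) blurred image
edge_map = edges(blurred)              # (c) edge detection
dilated = dilate(edge_map)             # (d) dilated image
```

Dilation thickens the detected edges, which makes the subsequent contour extraction (Fig. 5, contour panel) more robust.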
3.7 Feature Extraction

From the trained dataset directory, the given images are arranged in the form of an array to split the data into infected and non-infected categories. The detailed structure is illustrated in Fig. 6.
Fig. 3 System model
Fig. 4 Flow diagram
Fig. 5 Image segmentation: (a) grayscale image, (b) blurred image, (c) edge detection, (d) dilated image, (e) RGB image; (a) Canny image and (b) contour image (Palak Paneer)
3.8 Data Augmentation—360° Rotation

Based on the categorical entropy, the images are classified with their names, the name of the disease, and the categories they belong to; the CNN is trained on data in this format. To enhance feature extraction, data augmentation is performed with 360° rotation, and the results are shown in Fig. 7.
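Arbitrary-angle rotation augmentation of this kind can be sketched as a nearest-neighbor inverse mapping in NumPy (frameworks such as Keras expose the same idea as a rotation range on their image generators; the tiny image and angles below are illustrative assumptions):

```python
import numpy as np

def rotate_nn(img, degrees):
    """Rotate a 2-D image about its center with nearest-neighbor sampling.

    Pixels that map outside the source image are filled with 0.
    """
    theta = np.deg2rad(degrees)
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    out = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            # inverse-rotate the destination coordinate into the source
            y, x = i - cy, j - cx
            src_y = cos_t * y + sin_t * x + cy
            src_x = -sin_t * y + cos_t * x + cx
            si, sj = int(round(src_y)), int(round(src_x))
            if 0 <= si < h and 0 <= sj < w:
                out[i, j] = img[si, sj]
    return out

img = np.zeros((5, 5))
img[0, 2] = 1.0                                   # one bright pixel at the top
augmented = [rotate_nn(img, a) for a in (90, 180, 270)]
```

Each rotated copy is added to the training set, so the network sees the same object in every orientation.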
Fig. 6 Feature extraction: (a) leaf disease (apple scab), (b) fast food (athirasam), (c) vegetable (eggplant)
Fig. 7 Data augmentation
3.9 Sequential Analysis

Sequential analysis is performed for the given category of images. Keras Sequential modules are configured with the input shape, and the data channels are set using "channels first," as in Fig. 8. Sequential analysis is performed to analyze the
Fig. 8 Sequential analysis
resulting numbers of trainable and non-trainable parameters, to evaluate the scope of the process undergone. The results of image feature extraction and classification are stored in the pb5 and h5 models.
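The trainable-parameter counts reported by a model summary such as Fig. 8 follow directly from the layer shapes; the helpers below reproduce that arithmetic for convolutional and dense layers (the layer sizes in the example stack are illustrative assumptions, not the paper's exact architecture):

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter, as in a Keras Conv2D layer."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_units, out_units):
    """Weights plus one bias per output unit, as in a Keras Dense layer."""
    return (in_units + 1) * out_units

# An illustrative stack: Conv2D(32, 3x3) on an RGB input,
# then Dense(64) on a hypothetical 1152-unit flatten, then Dense(2)
total = (conv_params(3, 3, 3, 32)      # 896
         + dense_params(1152, 64)      # 73,792
         + dense_params(64, 2))        # 130
```

A layer with frozen weights would contribute the same count to the non-trainable column instead.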
3.10 Number of Epochs Performed

A number of epochs were performed, and the results obtained reach 100% accuracy. This accuracy is obtained with the help of the Adam optimizer and the binary cross-entropy loss, with batch normalization performed at a batch size of 2. The computational speed and training accuracy are shown in Fig. 9.
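The Adam-plus-binary-cross-entropy setup used for training can be illustrated on a tiny logistic model in plain NumPy (in the actual system this is handled by Keras, e.g. `model.compile(optimizer='adam', loss='binary_crossentropy')`; the toy data below is an assumption, and the bias term is omitted for brevity):

```python
import numpy as np

# Toy linearly separable data (illustrative, not the paper's dataset)
X = np.array([[2., 1.], [1., 2.], [1.5, 1.],
              [-2., -1.], [-1., -2.], [-1.5, -1.]])
y = np.array([1., 1., 1., 0., 0., 0.])

def bce(p, targets):
    """Binary cross-entropy loss."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -np.mean(targets * np.log(p) + (1 - targets) * np.log(1 - p))

w = np.zeros(2)                      # model weights (bias omitted)
m = np.zeros(2); v = np.zeros(2)     # Adam first/second moment estimates
lr, beta1, beta2, eps = 0.05, 0.9, 0.999, 1e-8

losses = []
for t in range(1, 201):                          # "epochs" over the tiny batch
    p = 1.0 / (1.0 + np.exp(-(X @ w)))           # sigmoid output
    losses.append(bce(p, y))
    grad = X.T @ (p - y) / len(y)                # BCE gradient w.r.t. w
    m = beta1 * m + (1 - beta1) * grad           # Adam moment updates
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)     # Adam parameter step
```

The loss falls with each epoch, mirroring the per-epoch accuracy/loss trend reported in Fig. 9.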
Fig. 9 CNN architecture—epochs conducted
3.11 YOLO v5 Algorithm Working Process

YOLO (You Only Look Once) uses convolutional neural networks to detect objects in real time, in both images and videos, with high accuracy and speed. The proposed model uses PyTorch as the framework for object detection and identification. Detection happens in a single forward propagation through the neural network, which detects objects using bounding boxes and labels. The image is divided into N grids of equal-dimensional s × s regions, and each grid cell is responsible for the detection and localization of objects. The network finds the contours of the objects and surrounds each with an imaginary rectangular bounding box (referred to as a 'bbox'), defined by two longitudes and two latitudes. The bbox is represented by four coordinates 'x, y, w, h', where x and y represent the top-left corner and w and h represent the width and height. The bbox is used to find the target, works as a reference point for detecting the object, and creates a collision box for the object, as in Fig. 10.
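The 'x, y, w, h' box representation converts to corner coordinates, and to the grid cell that owns the box center, with simple arithmetic. This sketch assumes the top-left-corner convention for x and y stated above; the image size and grid count are illustrative:

```python
def bbox_to_corners(x, y, w, h):
    """(top-left x, top-left y, width, height) -> (x1, y1, x2, y2)."""
    return (x, y, x + w, y + h)

def owning_cell(x, y, w, h, image_size, n_grid):
    """Index of the grid cell containing the bbox center.

    In YOLO, that cell is the one responsible for detecting the object.
    """
    cx, cy = x + w / 2.0, y + h / 2.0
    cell = image_size / n_grid          # side length s of each grid cell
    return int(cx // cell), int(cy // cell)

corners = bbox_to_corners(40, 60, 100, 80)
cell = owning_cell(40, 60, 100, 80, image_size=320, n_grid=4)
```

For example, a box at (40, 60) with size 100 × 80 in a 320-pixel image split into a 4 × 4 grid has its center at (90, 100), which falls in cell (1, 1).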
Fig. 10 YOLO v5—object recognition
3.12 Calorie Estimation

In many countries, such as the U.S.A. and the U.K., food manufacturers use the 4-4-9 rule to estimate calories from carbohydrates, fats, and proteins:
• For carbohydrates: take the grams of carbohydrate and multiply by 4.
• For fat: take the grams of fat and multiply by 9.
• For protein: take the grams of protein and multiply by 4.
These are added together to get the total number of calories.
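The 4-4-9 rule is a one-line computation; a sketch (inputs in grams, result in kilocalories; the sample values are illustrative):

```python
def calories_449(carbs_g, fat_g, protein_g):
    """Total calories via the 4-4-9 rule:
    4 kcal/g carbohydrate, 9 kcal/g fat, 4 kcal/g protein."""
    return 4 * carbs_g + 9 * fat_g + 4 * protein_g

# e.g. a food with 25 g carbs, 10 g fat, 5 g protein per 100 g
total = calories_449(25, 10, 5)   # 4*25 + 9*10 + 4*5 = 210 kcal
```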
3.13 User Experience—Streamlit

Streamlit, an open-source framework, is used in this model to provide a user-friendly platform. In this project it is driven from Python and helps to present the calculated calorie and nutritional values clearly; it also helps to record and store data. Streamlit was chosen because it utilizes less storage space than alternatives such as the Django and Flask web app frameworks, and because it is easy to customize as a web app.
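The core of such a front end is a lookup of per-100 g nutrient values that the Streamlit layer then renders (e.g. via `st.write`). The table entries below are illustrative assumptions, not measured values, and the Streamlit calls appear only as comments so the sketch stays self-contained:

```python
# Hypothetical per-100 g nutrient table (grams), for illustration only
NUTRITION_PER_100G = {
    "apple":  {"carbs": 14.0, "fat": 0.2, "protein": 0.3, "fiber": 2.4},
    "ginger": {"carbs": 18.0, "fat": 0.8, "protein": 1.8, "fiber": 2.0},
}

def nutrition_report(food, grams=100.0):
    """Scale the per-100 g entry to a serving and apply the 4-4-9 rule."""
    per100 = NUTRITION_PER_100G[food]
    scale = grams / 100.0
    report = {k: v * scale for k, v in per100.items()}
    report["calories"] = (4 * report["carbs"]
                          + 9 * report["fat"]
                          + 4 * report["protein"])
    return report

report = nutrition_report("apple", grams=100.0)
# A Streamlit app would then display this, e.g.:
#   st.header("apple"); st.write(report)
```

In the actual system, the food name comes from the CNN classifier rather than being typed in by the user.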
3.14 Advantages of the Proposed Model

• The system is simple and reliable.
• It allows real-time images with any kind of data, since the CNN gives results from a pre-trained model.
• It is cost-efficient, and the software is supported in most versions, making it more reliable.
• It achieves a high accuracy rate with low time utilization.
• It analyzes both individual and clustered fruits and vegetables, such as baby corn and grapes, which are rarely taken into consideration.
• It analyzes fruits and vegetables and identifies infected ones specifically, with their names and type.
• It analyzes raw, cooked, and fast foods and estimates their calorie and nutrition details.
• Experiments are conducted using both real-time and customized objects, with object detection done by the emerging YOLO technology.
• The system gives more accurate results, as it is tested with both customized and real-time objects in clustered and unshaped form.
4 Results

Since the proposed model is built in Jupyter Notebook, it begins with the identification of the object, whether fruit or vegetable, by training the convolutional neural network continuously with different datasets; it is then validated to check whether appropriate results, such as the image and the name of the object with its accuracy value, are obtained. The system is designed to work regardless of whether the image contains a single object or a cluster of fruits and vegetables, and regardless of the size and shape of the object. The results are also not influenced by environmental lighting changes or background effects.
4.1 Plant Name and Disease Identification

In this proposed non-destructive method for the identification and quality analysis of fruits and vegetables, computer vision is used to produce the detected object name and to identify diseases. The system uses deep learning CNN to convolve the given model and obtain the required results. The images in Figs. 11 and 12 demonstrate the effectiveness of the system: the particular fruit or vegetable is correctly identified, and the accuracy percentage is displayed.
Fig. 11 Name identification of fruits—example
Fig. 12 Name identification of vegetable—example
Black rot (Fig. 13) is a disease that infects the apple fruit, leaves, and stem. It is caused by the fungus Botryosphaeria obtusa. Early symptoms are often limited to leaf symptoms such as purple spots on upper leaf surfaces. Timely identification of the symptoms assists in proper treatment and sanitation to prevent further spread of the fungus.
Fig. 13 Prediction of disease—example: apple black rot
4.2 Measurement of Performance

The datasets in this proposed model are not restricted to a single source but are collected from multiple sources such as Plantvillage, KAGGLE, and CIFAR. The acquired data are divided into three sections: test, train, and validation. For all the CNN models, the training datasets were split 80% for training and 20% for validation. The results are stored as .pb5 and .h5 files to avoid repeated fetching of the datasets. The module is also tested with similar images that are unavailable in the datasets. Upon review of various other models proposed by researchers over time, the accuracy level obtained by this proposed CNN-based model is higher than that of earlier models based on other techniques such as SVM and ANN. A comparison of different models is given in Table 2. As illustrated in Fig. 14, higher accuracy results were achieved in this model through continuous and repeated training of images over a greater number of epochs; at each stage of epochs, the accuracy value increases and the loss is reduced. The training accuracy and computational speed are illustrated in Fig. 14. The following graphs compare the accuracy and loss for training and validation: Fig. 15 illustrates the relation between training accuracy and validation accuracy with respect to the number of epochs, and Fig. 16 similarly illustrates training loss and validation loss. Figure 17 illustrates the test accuracy achieved during the process.

Table 2 Comparison chart from existing and proposed work—identification
Author(s)               | Input image                    | Classifier                | Accuracy
Arivazhagan et al. [13] | –                              | Min. distance classifier  | 86.00%
Dubey and Jalal [14]    | Apples                         | Multiclass SVM            | 95.94% for CLBP
Zhang et al. [15]       | Any                            | BBO + FNN                 | 89.11%
Sendin et al. [16]      | Maize                          | PLS-DA                    | 83–100%
Nandi et al. [17]       | Mango                          | Support vector regression | 87.00%
Current proposed model  | Multiple fruits and vegetables | Deep learning CNN/CV      | 85–100%
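The 80/20 train/validation split described in Sect. 4.2 can be sketched as a shuffled index split (the seed, fraction, and placeholder file names are illustrative assumptions):

```python
import random

def train_val_split(items, val_fraction=0.2, seed=42):
    """Shuffle a dataset and split it: 80% training, 20% validation by default."""
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)          # deterministic shuffle
    n_val = int(len(items) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return [items[i] for i in train_idx], [items[i] for i in val_idx]

images = [f"img_{i:04d}.jpg" for i in range(100)]   # placeholder file names
train_set, val_set = train_val_split(images)         # 80 / 20 split
```

Fixing the seed makes the split reproducible across runs, so the stored .h5 results always correspond to the same validation set.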
Fig. 14 Epochs conducted and improvement in the accuracy levels (Disease Identification)
Fig. 15 Graph plot of training and validation accuracy
4.3 Calorie and Nutritional Value Estimation (Raw Foods/Cooked Foods/Fast Foods)

The system is enhanced to estimate the number of calories and to calculate the nutrition details of objects of various categories and varieties. An earlier system was proposed with nutrients such as boron and calcium for one particular fruit, the apple [5]. This system is designed to handle raw foods, cooked foods, and fast foods, which enables consumers to estimate their calorie intake and plan their diet accordingly; it also gives insights into the nutrition content to help consumers choose their food. The output is shown in Figs. 18, 19,
Fig. 16 Graph plot of training and validation loss
Fig. 17 Accuracy range (Disease Identification)
20 and 21, produced using Streamlit, an open-source framework used to create a user-friendly environment.
4.4 Measurement of Performance

Similar to the disease identification algorithm, higher levels of accuracy were achieved by continuous and repeated training of images over a greater number of epochs. The training accuracy and computational speed are illustrated in Fig. 22. The graphs in Figs. 23 and 24 illustrate the accuracy levels for training and validation, along with the training and validation losses. Similar to the exercise done in Sect. 4.2, various other models proposed by other researchers in the past were analyzed and compared with the current model. Upon
Fig. 18 Calorie and nutrition estimation red chili pepper
Fig. 19 Calorie and Nutrition Estimation_Ginger
review of these models, the accuracy level obtained by this proposed CNN-based model is higher than that of the earlier designed models. A comparison of different models is given in Table 3 below.
Fig. 20 Calorie and Nutrition Estimation_Apple Pie
Fig. 21 Calorie and Nutrition Estimation_Ice Cream
Fig. 22 Epochs conducted and improvement in the accuracy levels (Calorie Estimation)
Fig. 23 Graph Plot_Raw food, cooked food and fast food (Calorie and Nutrition Estimation)
Fig. 24 Accuracy Range_Raw food, Cooked food and Fast food (Calorie and Nutrition Estimation)
Table 3 Comparison chart from existing and proposed work—calorie and nutritional value estimation

Author(s)                | Input image                                             | Classifier      | Accuracy
Dhanalakshmi et al. [18] | Fruits                                                  | CNN             | >90%
Shen et al. [4]          | Food 101 dataset                                        | CNN             | 85%
Tan et al. [3]           | Pill image dataset                                      | CNN and YOLO v3 | 80–88%
Raikwar [2]              | Fast foods—custom dataset                               | SVM             | 90.66%
Current proposed model   | Fruits, vegetables, raw foods, cooked foods, fast foods | CNN, YOLO v5    | >97%
5 Conclusion

The proposed model detects and identifies the fruits and vegetables in given image data and also identifies the type of disease affecting them, using CNN. The model is intended to provide convenient and accurate insight with less manual dependency. As the base idea focuses on a cost-effective solution compared to the earlier existing models, the choice was narrowed down to the Jupyter Notebook software with the Python programming language, which is cost-effective and runs on most common computers with no special requirements. The model is also designed to recognize image data irrespective of its image quality, clustered or single objects, and varying sizes and shapes. After multiple iterations on the number of epochs, an accuracy between 85% and 100% was obtained. The idea is enhanced to the next level, where the system's scope is to recognize mixed foods (raw, cooked, and fast foods) and to analyze and quantify their calorie content and nutritional values (carbohydrates, fats, proteins, and fiber); experiments with a larger number of datasets from different sources were conducted, and results are published for both customized and non-customized input objects. The project is more specific in identifying the other objects present in the image, using the emerging YOLO (You Only Look Once) version 5 technology to obtain more accurate results. The more challenging part of the project lies in estimating the calories without any reference object, which makes it unique among existing works; here
the calorie and nutrient results are estimated per 100 g using the 4-4-9 rule. The future scope lies in identifying the ingredients in the identified food objects, extending identification to foods such as liquids and soups, and elaborating the results.
References

1. Subhi MA, Ali SH, Mohammed MA (2019) Vision-based approaches for automatic food recognition and dietary assessment: a survey. IEEE Access 7:35370–35381. https://doi.org/10.1109/ACCESS.2019.2904519
2. Raikwar H, Jain H (2018) Calorie estimation from fast food images using support vector machine. IJFRCSCE, April 2018, p 98
3. Tan L et al (2021) Comparison of YOLO v3, Faster R-CNN, and SSD for real-time pill identification. https://doi.org/10.21203/rs.3.rs-668895/v1
4. Shen Z et al (2019) Machine learning based approach on food recognition and nutrition estimation. In: 2019 International conference on identification, information and knowledge in the Internet of Things (IIKI2019)
5. Makkar T, Yogesh (2018) A computer vision-based comparative analysis of dual nutrients (boron, calcium) deficiency detection system for apple fruit. In: 4th International conference on computing communication and automation, pp 1–6. https://doi.org/10.1109/CCAA.2018.8777678
6. Mhapne N, Harish V, Kini A, Narendra VG (2019) A comparative study to find an effective image segmentation technique using clustering to obtain the defective portion of an apple. In: International conference on automation, computational and technology management, pp 304–309. https://doi.org/10.1109/ICACTM.2019.8776751
7. Gurubelli Y, Malmathanraj R, Palanisamy P (2020) Texture and colour gradient features for grade analysis of pomegranate and mango fruits using kernel-SVM classifiers. In: 6th International conference on advanced computing and communication systems (ICACCS), Coimbatore, India, pp 122–126
8. Patel H, Prajapati R, Patel M (2019) Detection of quality in orange fruit image using SVM classifier. In: 3rd International conference on trends in electronics and informatics, pp 74–78. https://doi.org/10.1109/ICOEI.2019.8862758
9. Hamza R, Chtourou M (2018) Apple ripeness estimation using artificial neural network. In: International conference on high-performance computing and simulation, pp 229–234. https://doi.org/10.1109/HPCS.2018.00049
10. Mehra T, Kumar V, Gupta P (2016) Maturity and disease detection in tomatoes using computer vision. In: Fourth International conference on parallel, distributed and grid computing, pp 399–403. https://doi.org/10.1109/PDGC.2016.7913228
11. Zhu B, Jiang L et al (2007) Gabor feature-based apple quality inspection using kernel principal component analysis. J Food Eng 741–749. https://doi.org/10.1016/j.jfoodeng.2007.01.008
12. Rocha A et al (2010) Automatic fruit and vegetable classification from images. Comput Electron Agric 96–104. https://doi.org/10.1016/j.compag.2009.09.002
13. Arivazhagan S et al (2010) Fruit recognition using color and texture feature. J Emerg Trends Comput Inf Sci 90–94
14. Dubey SR et al (2014) Fruit disease recognition using improved sum and difference histogram from images. Int J Appl Pattern Recognit 1(2):199–220
15. Zhang et al (2016) Fruit classification by biogeography-based optimization and feedforward neural network. J Knowl Eng (Wiley) 33(3):239–253
16. Sendin K et al (2018) Classification of white maize defects with multispectral imaging. Food Chem 243:311–318
17. Nandi et al (2016) A machine vision technique for grading of harvested mangoes based on maturity and quality. IEEE Sens J 16:6387–6396
18. Dhanalakshmi et al (2020) Food classification and calorie estimation using computer vision techniques. J Emerg Technol Innov Res 7(6)
The Evolution of Ad Hoc Networks for Tactical Military Communications: Trends, Technologies, and Case Studies Zalak Patel, Pimal Khanpara, Sharada Valiveti, and Gaurang Raval
Abstract Modern tactical military networks rely heavily on Mobile Ad Hoc Networks (MANETs). Combat operations in regions lacking connectivity to a conventional network infrastructure need tactical networks. Because of the high mobility of network nodes and inconsistent connectivity, such networks undergo frequent changes in network structure. The ability of MANETs to self-organize and self-heal makes them ideal for tactical military networks. However, a common assumption is that the same methods and protocols used for conventional networks will also work for MANETs; unfortunately, traditional network protocols may not be well adapted to dealing with the dynamism of MANET networking. Many researchers have studied design issues for MANETs, especially for military communications. However, many open issues still need to be addressed to obtain the maximum benefit of MANET characteristics when designing and deploying efficient, improved army tactical networking scenarios. This paper presents the fundamental features, performance and security requirements, and security threats in the context of military mobile ad hoc networks. Keywords Ad hoc networks · Army tactical networks · Routing · Security · Attacks
Z. Patel · P. Khanpara (B) · S. Valiveti · G. Raval Department of Computer Science and Engineering, Nirma University, Ahmedabad, India e-mail: [email protected] Z. Patel e-mail: [email protected] S. Valiveti e-mail: [email protected] G. Raval e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_24
1 Introduction

Ad hoc wireless networks are self-configuring, adaptable, and independent networks. As radio coverage is typically restricted in such networks, multi-hop routing is frequently required [1]. The implementation of multi-hop routing relies on identifying neighbors and then establishing routes automatically based on the neighbor information. Because the knowledge regarding routes and topology is continuously updated, high mobility of the network nodes is both an important characteristic of and a requirement for ad hoc networks. These features make ad hoc networks ideal candidates for deploying military tactical networks, where their capacity to operate quickly and without a centralized organization is critical. A military ad hoc network, also known as an Army Tactical Mobile Ad hoc Network (Army Tactical MANET), is a form of ad hoc network that differs from a typical wireless ad hoc network in several ways. Such networks usually consist of a larger set of participating nodes than conventional ad hoc networks. Apart from this, Army Tactical MANETs require broadcasting capability and high mobility support to facilitate wireless communication [19]. Hence, network scalability is a more important requirement in Army Tactical MANETs than in regular MANETs. The basic communication scenario for a military ad hoc network is shown in Fig. 1. AODV, DSR, and TORA are the most commonly used routing protocols in ad hoc networks. However, most MANET routing protocols use the shortest-path criterion to choose optimal routes, which is insufficient for a battlefield ad hoc network [4]. The major reasons are as follows: (i) considering bandwidth as one of the performance parameters is necessary to achieve better network capacity and reconnaissance accuracy and to decrease transmission latency; (ii) most existing
Fig. 1 Communication scenario in military MANETs [17]
The Evolution of Ad Hoc Networks for Tactical Military …
protocols ignore the frequency compatibilities of the communication devices, making them ineffective in situations where nodes are equipped with several communication devices operating at different frequencies, as is prevalent in army wireless networks [6]; (iii) vehicles in military forces are frequently equipped with several radios, each of which operates on a distinct frequency. Radios that are tuned to the same frequency can communicate with one another. As a result, frequency matching should be considered while looking for a communication channel. Military tactical ad hoc networks have several distinct characteristics that set them apart from commercial ad hoc networks in terms of requirements, expectations, demands, and constraints [3]. These traits are linked to fundamental features such as dynamic topology, bandwidth shortage, and excessive latency. Tactical networks use layers of subnets and are designed around the Joint Tactical Radio System. The Soldier Radio Waveform (SRW) tier is used to create these subnets; it may be divided further into two sub-tiers, of which one can be used for soldier-to-soldier communication and the other for sensor networking. On top of that, there is the Wideband Networking Waveform (WNW) tier. WNW is divided into two sub-tiers, of which one creates local subnets to facilitate vehicle-to-vehicle communications, while the other creates a single subnet for the whole scenario. To support the system of tactical airborne weapons, a stub network called JANTE (Joint Airborne Network-Tactical Edge) is maintained [9, 12]. Due to the intrinsic nature of radio transmission, nothing can guarantee that all communicating nodes are one hop apart in wartime settings. In such circumstances, a multi-hop ad hoc architecture is one of the most effective methods for resolving connectivity issues [10].
The control architecture should be distributed, confining centralized functions to the level at which they can best be optimized. Autonomous packet radio networks are critical because of their potential to become operational quickly and without any infrastructure. In battlefield scenarios, MANETs are most relevant to the lower echelons of the Brigade due to scenario-specific communication requirements and other limitations.
2 Challenges in Setting Up Army Tactical Networks In military networks, multicast and broadcast communication patterns are observed often. However, several concerns must be addressed; a lossy communication medium and the lack of acknowledgments are the major ones among them [2]. To develop highly scalable and fully distributed army tactical MANETs, researchers have proposed many TDMA techniques for enhanced resource reservation. These techniques have some restrictions in terms of transmission dependability and radio resource allocation, and none of them provides a means to verify the outcome of broadcast transmissions through explicit acknowledgment signals from receivers. Given the radio medium's lack of dependability, this is a significant issue. To deal with it, a network coding approach
Fig. 2 Major challenges in the deployment of army tactical MANETs [17]
that is packet-loss resistant by default can be used [14]. As a packet may appear in several coded packets, it has numerous chances to reach its destination. In addition, network coding serves as an implicit method for acknowledging broadcast signals. Another important issue to consider is efficiency. When multicast or broadcast messages are relayed in a tactical military MANET, the multiple transmissions make it probable that a node receives several copies of a message. To maintain effective bandwidth consumption, it is critical to prevent a node from broadcasting the same message more than once; duplicate messages must therefore be identified and removed [4, 5]. This strategy is commonly known as Duplicate Packet Detection (DPD), through which multicast/broadcast messages are uniquely recognized. In this case, intermediary network nodes rely on network coding to broadcast linear combinations of incoming packets rather than forwarding them directly [15]. This procedure drastically lowers packet duplication in the network. Figure 2 presents the major challenges in the deployment of army tactical MANETs.
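The DPD idea described above can be illustrated with a small sketch that caches (source, sequence-number) pairs and suppresses later copies; the class and method names are hypothetical, not taken from any cited protocol.

```python
class DuplicatePacketDetector:
    """Minimal DPD cache: each multicast/broadcast packet is identified by
    its (source, sequence number) pair; later copies are recognized and dropped."""

    def __init__(self):
        self.seen = set()

    def should_forward(self, source, seq):
        key = (source, seq)
        if key in self.seen:
            return False   # duplicate copy: suppress the rebroadcast
        self.seen.add(key)
        return True        # first copy: relay it
```

A relay node would call `should_forward()` for every received broadcast and rebroadcast only when it returns True, which is what keeps bandwidth consumption in check.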
3 Army Tactical Networks: Design Requirements The fundamental requirements of military tactical networks are shown in Fig. 3 and explained in this section [6].
Fig. 3 Design requirements for military MANETs
3.1 Mobility For tactical application scenarios, mobility features such as the speed of nodes, trajectory (from essentially static units to nomadic forces and swift mobile platforms such as helicopters and aircraft), and functional needs (no function to perform while on the move, minimally functional, or fully functional while on the move) can vary widely. This kind of behavior can cause a number of issues with routing and, possibly, addressing systems. A few major issues are listed below: • Routing and, if necessary, addressing must account for user movement from one network to another or from one gateway to another • The movement of assets such as navy ships or aircraft may require routing to be reinitiated • An individual mobile node or a set of mobile nodes may be momentarily disconnected from the network for a variety of reasons • Minimizing the number of message replays and path reconfigurations is critical for maintaining maximum connectivity with optimized data flow and service quality The nodes may move randomly and frequently, necessitating major reconfigurations of network routing. Tactical wireless networks must fulfill this requirement, as traditional Internet protocols may not be able to cope with massive, fast topological updates.
3.2 Multicasting Multicasting is an absolute requirement in tactical military communications. The ability to effectively disseminate information to dispersed network participants can be critical to a battle's success. The most important multicast requirements are: • Transmissions ordered by Command and Control (C2) • Dissemination of situational information • Efficient communication of multimedia information The transmission of command and control adopts a standard framework with well-defined hierarchy-level relationships. The sender's list of recipients is well known, following a sender-oriented mechanism. This is especially true for message handling and its adoption at the routing and transport layers as well as at the lower layers, for example, to make the most use of radio bandwidth [18]. The tactical situation influences how situational or tactical scene awareness is disseminated. Users may have the right to obtain certain types of information; despite these rights, they are not always concerned with every piece of information. This is more akin to a receiver-oriented architecture, in which every user chooses which types of data to receive, with the service enforcing those choices. When entering a contact zone, for example, a military unit would be interested in knowing the positions of certain local enemies and entry violation warnings. Virtual conferences with audio, video, and data exchange are facilitated via multimedia multicast. These meetings may or may not involve pre-defined user groups, and users can join the group during the conference. Depending on the type of conference, security constraints and organizational requirements can be determined [7]. In general, a conference master oversees the group composition (accept, deny, or withdraw), as well as the privileges for actions such as sending, receiving, speaking, or listening.
All of these applications might be relevant to combined joint forces, including both domestic and allied participants. To deliver essential information to a large number of people, multicast routing can use tactical radio communication channels [8]. In a tactical network, a multicast architecture is used to maintain the flow of information.
3.3 Reliability Reliability is crucial for primary command and control, logistics, and national intelligence communications, including certain multicasting demands. Depending on the nature of the application, some errors or missing small chunks of data in picture and video transmissions are acceptable. Additional processing stages (encryption, data compression) also affect dependability: after decryption or decompression, the impact of transmission faults may be amplified [23].
3.4 Delay Constraints In terms of latency, information communicated for situational awareness (for example, sensed information consisting of geographic location, identity, and navigation) is modest in size but must be communicated in a matter of seconds. This necessitates that the required bandwidth be continuously available, and such transmissions would get priority over other communications in the same network. Large multimedia data such as intelligence photographs or battlefield operation instructions, on the other hand, have looser delay limits or no true delay guarantee. Several normal-sized messages (usually containing hundreds of bytes) with typical delay requirements of a few minutes are also supported in command and control or other logistic operations. Certain services, such as audio and video communication, require quasi-real-time performance. Multimedia services are particularly important to the tactical military, ranging from low-latency, high-priority tactical information transmissions to digitally protected voice combined with applications such as whiteboards and position reporting for planning and scheduling mission-related tasks such as detecting targets, identifying enemies, and obtaining location information. As tactical networks, especially radio networks, have limited capacity, resource reservation is critical for controlling access and authorization for such precious resources. Resource allocation and reservation schemes can help to avoid congestion for these restricted resources, in addition to ensuring the Quality of Service (QoS) parameters for multimedia communications.
3.5 Availability and Survivability Due to the tactical context, networks may encounter significant survivability issues [13]. At any time, communication links and network nodes can go down, be destroyed, become congested, or be jammed. Reconfiguring them automatically while managing them in a mesh topology is essential for improving tactical network availability: if intermediary nodes become unavailable for any reason, it is still possible to convey data through other channels. Adopting commercial as well as international standards up to the routing layer allows tactical functions to be supported over civil communications that survive in the event that military communications networks are severely damaged [24].
3.6 Network Management and Control In a networked system, mismanagement of certain functions could lead to system failures. Hence, such functions must be implemented in a redundant fashion to enable
the network to continue functioning even if certain management modules are not operational.
4 Security in Military Ad Hoc Networks Confidentiality, Integrity, and Availability (CIA) are the three pillars of information security. Several network attacks have been introduced over time, each seeking to breach one or more of the CIA principles [22], and different sorts of threats can be classified as a result. For tactical networks, there are two categories of risk to consider: passive attacks and active attacks. In a passive threat or attack, the attacker does not directly generate any adverse effects but monitors the network or node behavior and indirectly attempts to disrupt standard network operations [16]. In an active threat or attack, adversaries actively perform actions that degrade network performance.
4.1 Passive Threats Various forms of passive attack are possible in tactical MANETs. A few major types are described below: Traffic Analysis: To gain insight into the network structure and traffic patterns, an adversary gathers details about transmitted energy, traffic flows (protocol headers), packet sizes, and/or transmission durations. Because of their compact size, wireless bandwidth, and extended range, tactical networks face a major danger from such passive analysis. Though the substance of messages cannot be read, the relative significance of nodes and the speed with which they operate may be assessed. Tactical networks are particularly vulnerable to this attack since it is easy to carry out with only a basic understanding of the network under observation [20]. Eavesdropping: This involves an adversary that studies the substance of communications in order to obtain the information transmitted over the network, putting tactical networks in jeopardy. The main goal of the attackers in this kind of attack is to violate confidentiality and gain unauthorized information. However, for an adversary to be successful, various levels of protection must be breached, which makes tactical networks comparatively less vulnerable to eavesdropping, though any breach of sensitive information can have a substantial impact on future network operations [20].
4.2 Active Threats The adversary communicates at the tactical network's frequency to launch active attacks in the network. As the attacker is actively involved and sometimes becomes a part of the network, it is very difficult to identify and isolate. Some commonly encountered active attacks in army tactical MANETs are described below: Denial of Service (DoS): An adversary denies or delays network services to authorized participants, aiming to degrade the overall network performance. DoS attacks can take several forms and can occur at any network layer. Sometimes, such attacks are executed in sequence to do maximum damage: the impact generated by one DoS attack is used as the basis for initiating the next [21]. This kind of attack can target almost any entity of the network, from individual nodes to a particular region or group of nodes, along with the communication links via which they are connected. Due to the complex behavior and execution patterns of DoS attacks, their presence in tactical networks is difficult to identify [20]. Masquerade: This involves an adversary impersonating or capturing one or more legitimate nodes inside a network in order to launch an attack (e.g., wormhole and Sybil). In tactical networks, where the chances of producing or capturing (and then effectively employing) a suitable platform are low, this attack is not usually observed. However, if such an attack is successful, precious information transferred over the network might be captured and perhaps manipulated, posing a severe threat to confidentiality and integrity [20]. Modification: An adversary modifies the content of an intercepted communication (e.g., node exposure and route modification) before passing it on. To do this, the attacker must be an authenticated network participant. Modifications by a masquerading node are possible all the way down to the application layer.
Tactical networks are unlikely to be penetrated at a high enough level to endanger the network’s confidentiality and integrity due to the various degrees of security at each network layer. This sort of attack is most likely to result in decreased availability [20].
4.3 Risk Analysis Based on which layer has been compromised, some preliminary assessment of the impact of the threats listed in the preceding section can be made [17, 20]. A compromise at the physical layer necessitates knowledge about the physical medium in use (for example, the operating frequency, for jamming or traffic analysis). Jamming has a significant influence on the availability of information in the jammed area, and potentially the entire network if the network is partitioned. Traffic analysis results in a loss of secrecy (e.g., of the network topology).
A compromise at the MAC layer necessitates an understanding of the protocols that negotiate access to the physical medium (e.g., the hello signal for node exposure). The hazards differ from those in civilian MANETs since tactical networks generally employ TDMA rather than 802.11-based protocols. Although node exposure (loss of confidentiality) and replay attacks (loss of availability) are similar in tactical networks, the design of MAC protocols makes them less of a risk. A compromise at the network layer requires knowledge of, and participation in, the routing technique in use (e.g., the black-hole route creation sequence). It is less of a concern since the physical and MAC layers must be compromised first. By interfering with traffic forwarding, non-cooperative nodes, such as grey-hole nodes, reduce the availability of information (denial of service). Traffic analysis at the network layer impacts the secrecy of any information that comes within the range of the adversary. Finally, route manipulation (modification) has a major impact since it might prevent communication throughout the whole network [25]. Attacks on the upper network layers require both information about the communicating applications (e.g., application-layer formatting for traffic eavesdropping) and compromise of all lower layers. Such attacks can have a major impact on the network. The availability of the targeted node, as well as the intervening network, is affected by both flooding and sleep deprivation. Snooping at the application layer can cause significant damage since it is the only attack that compromises the message's integrity [11]. Information security is entirely destroyed if an attacker can read and change packets in transit.
5 Security Mechanisms for Army Tactical Ad Hoc Networks Many researchers have proposed different techniques for securing military communications in Army Tactical Networks. A review and analysis of such techniques are presented in Fig. 4 and discussed in detail in this section.
5.1 A* Algorithm The A* method is a traditional heuristic search algorithm that is dominant in resolving route-discovery-related issues. A* is extended to accomplish three main objectives [6]. First, when looking for the best route connecting the source and the destination nodes, the A* method always selects the next node with the lowest routing metric. Second, A* is similar to a reactive routing protocol in that the route is discovered and established only when communication is required; nodes therefore do not need to keep track of the network's up-to-date topology. As a
Fig. 4 Security mechanisms for army tactical MANETs
result, the overhead caused by the routing process is greatly reduced. Lastly, A* uses frequency selection to neatly handle the aforementioned communication problem within the network. If noise in the surroundings is ignored, distance is the sole factor that determines whether or not a communication link can be established: when the distance between communicating vehicles is shorter than the coverage radius of the radios, the vehicles can successfully communicate with one another. Before initiating the route-finding procedure, the network's topology must first be determined by retrieving the neighbors of each node in the deployment area and adding them to the nodes' neighbor lists. Neighbors of a node are the network participants at one-hop distance that operate on the same frequency as the given node. Before starting the route discovery phase, it is also important to determine route and node metrics; usually, the paths with the shortest distance connecting two nodes are chosen. Moreover, the crucial aspects to consider while running the algorithm are: • Go through every radio available in the neighboring vicinity to assess its frequency
• Ensure that the chosen neighboring node operates on the same frequency as the given node. If this criterion is satisfied, the neighboring node's information is stored; otherwise, no entry in the neighbor table is made
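The neighbor-discovery and route-search steps above can be sketched as follows; the node model (planar coordinates plus a single operating frequency) and the Euclidean routing metric are simplifying assumptions for illustration, not details from [6].

```python
import heapq
import math

def neighbors(nodes, radio_range):
    """nodes: {id: (x, y, freq)}. Two nodes are neighbors when they are
    within radio range AND operate on the same frequency."""
    nbrs = {n: [] for n in nodes}
    for a, (ax, ay, af) in nodes.items():
        for b, (bx, by, bf) in nodes.items():
            if a != b and af == bf and math.dist((ax, ay), (bx, by)) <= radio_range:
                nbrs[a].append(b)
    return nbrs

def a_star(nodes, nbrs, src, dst):
    """A* over the neighbor graph: Euclidean link length as the routing
    metric, straight-line distance to the destination as the heuristic."""
    def h(n):
        return math.dist(nodes[n][:2], nodes[dst][:2])
    frontier = [(h(src), 0.0, src, [src])]
    visited = set()
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nb in nbrs[node]:
            cost = g + math.dist(nodes[node][:2], nodes[nb][:2])
            heapq.heappush(frontier, (cost + h(nb), cost, nb, path + [nb]))
    return None   # no frequency-compatible route exists
```

Note how a node on a different frequency simply never enters any neighbor list, so the search cannot route through it.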
5.2 Military Authentication Header The military authentication header protocol, abbreviated AH*, is similar to the standard IPsec Authentication Header (AH). As a datagram travels the network, AH* prevents undetected changes to the qualifiers that determine the Quality of Service in the IP header [4]. If the group key is symmetric, all participating members of a group are able to assess the validity of messages. Authorized nodes (group members) have the ability to change the Quality-of-Service qualifiers and recompute the Integrity Check Value (ICV) appropriately; changes made by outsiders are detected. Before making any forwarding choices, AH*-enabled nodes verify the ICV when they receive a datagram; the ICV does not need to be evaluated by the destination. If the ICV is valid, the datagram receives the appropriate QoS treatment. If not, other datagrams with valid ICVs get precedence, after which the datagram is either handed to best-effort (BE) delivery or simply discarded; the choice relies on network policy as well as the available capacity. Even if the ICV check fails, the service level agreement and policy may allow a portion of the datagram flow to be handled as per the stated Quality of Service. This can be useful in coalition networks with participants belonging to several security domains. AH*'s security is equivalent to that of the regular AH as used for multicast communication. IPsec relies on the secrecy of a group key for multicast transmission, and the cryptographic techniques are presumed to be robust. Untrusted routers can still forward the datagram, while group members can use the secret key for multicasting and message authentication. If AH is found to be insecure for multicast traffic, it follows that AH* is not secure enough either.
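The per-hop ICV check can be illustrated with a keyed MAC over the QoS qualifier and payload; the group key value, the field layout, and the HMAC-SHA256 choice below are assumptions for illustration, not the actual AH* specification.

```python
import hashlib
import hmac

GROUP_KEY = b"shared-group-key"   # symmetric group key (illustrative value)

def icv(qos_byte, payload, key=GROUP_KEY):
    """Integrity Check Value covering the QoS qualifier and the payload."""
    return hmac.new(key, bytes([qos_byte]) + payload, hashlib.sha256).digest()

def forward_decision(datagram):
    """AH*-style check at a forwarding node: a valid ICV earns the requested
    QoS treatment; an invalid one demotes the datagram to best effort."""
    expected = icv(datagram["qos"], datagram["payload"])
    ok = hmac.compare_digest(expected, datagram["icv"])
    return "requested-qos" if ok else "best-effort"
```

Group members can legitimately rewrite the QoS byte because they hold the key and can recompute the ICV, whereas an outsider's modification fails the comparison at the next AH*-enabled hop.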
AH* was analyzed using the Automated Validation of Internet Security Protocols and Applications (AVISPA), a web-based formal verification tool.
5.3 Cooperative Phase-Steering Technique A spectrum sharing-based technique is proposed for military MANETs in [3]. In this technique, authors have considered a single secondary source node, a single secondary destination node, multiple secondary relay nodes, and multiple primary destination nodes with the cooperative phase steering (CPS) approach. The transmission power of the secondary source node and secondary relay nodes is carefully regulated in the proposed CPS approach such that interference to the primary destination nodes is kept below a specified threshold termed interference temperature. The authors claimed that this CPS approach outperforms the traditional opportunistic
relay selection technique in terms of outage probability via computer simulations, especially when the number of secondary relay nodes is considerable.
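The power-regulation idea can be reduced to a one-line rule: transmit at full power unless the interference produced at some primary destination would exceed the interference temperature. The linear channel-gain model below is an assumption for illustration, not the system model of [3].

```python
def secondary_tx_power(p_max, interference_temp, gains_to_primaries):
    """Cap the secondary node's transmit power so that power * channel gain
    stays below the interference-temperature threshold at EVERY primary
    destination; the tightest (largest-gain) link sets the cap."""
    cap = min(interference_temp / g for g in gains_to_primaries)
    return min(p_max, cap)
```

With several primary destinations, the closest one (largest channel gain) dominates the constraint, which is why the minimum over all gains is taken.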
5.4 Ad Hoc On-Demand Distance Vector Routing Protocol The AODV protocol [1] combines the features of distance-vector and on-demand route establishment strategies. In the distance-vector routing strategy, every node knows its neighbors and the distance, in hop count, to reach them. This information is stored as routing table entries maintained at each node in the network; the routing table also maintains information about routes to possible destinations and the intermediate nodes present on those routes. As AODV is a demand-based routing mechanism, a source node finds a route to a destination node only when it is required to connect with that node. This property guarantees that only the needed (active) routes are stored in each node's routing table, hence reducing memory overhead on the network nodes. Moreover, every node is responsible for periodic route maintenance, in which active paths are modified based on the most recent network conditions, while paths that have become non-existent or have experienced broken links are marked as inactive and therefore removed from the routing tables.
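A toy version of this on-demand table logic reads as follows; the discovery callback stands in for AODV's RREQ/RREP exchange, which is omitted, and all names here are hypothetical.

```python
class AODVNode:
    """Simplified routing table in the spirit of AODV: routes are created
    only on demand, and broken routes are marked inactive and rediscovered."""

    def __init__(self, name, discover):
        self.name = name
        self.discover = discover   # callback: dest -> (next_hop, hop_count) or None
        self.table = {}            # dest -> {"next_hop": ..., "hops": ..., "active": ...}

    def route_to(self, dest):
        entry = self.table.get(dest)
        if entry and entry["active"]:
            return entry["next_hop"]          # cached active route
        found = self.discover(dest)           # on-demand route discovery
        if found is None:
            return None
        next_hop, hops = found
        self.table[dest] = {"next_hop": next_hop, "hops": hops, "active": True}
        return next_hop

    def link_broken(self, dest):
        if dest in self.table:
            self.table[dest]["active"] = False   # purged / rediscovered later
```

Only destinations that have actually been requested ever occupy table space, which is the memory-overhead argument made above.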
5.5 Backbone Formation in Military Multi-Layer Ad Hoc Networks Multilayer wireless ad hoc networks can be found in a variety of current environments. Such networks face hurdles in terms of their entities' ability to communicate effectively and efficiently. The concept of connected dominating sets (CDS) can be used to solve the challenge of designing energy-aware backbones for such network types. However, due to the inadequacy of standard methods for solving this challenge, researchers presented EclPCI, a novel centrality measure that can locate energy-rich and central nodes in the topology. An enhancement of this work, referred to as E2CLB, uses this centrality metric in a distributed method for constructing an energy-aware connected dominating set [2].
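The CDS notion underlying this line of work can be sketched with a plain greedy construction; this is not EclPCI or E2CLB (which add energy and centrality awareness), only the baseline idea of growing a connected backbone that dominates every node.

```python
def greedy_cds(adj):
    """Greedy connected-dominating-set sketch: start from the highest-degree
    node and repeatedly add the backbone-adjacent node that covers the most
    still-uncovered nodes. Assumes a connected graph {node: [neighbors]}."""
    start = max(adj, key=lambda n: len(adj[n]))
    backbone = {start}
    covered = {start} | set(adj[start])
    while covered != set(adj):
        # candidates adjacent to the backbone keep the set connected
        frontier = {n for b in backbone for n in adj[b] if n not in backbone}
        best = max(frontier, key=lambda n: len(set(adj[n]) - covered))
        backbone.add(best)
        covered |= {best} | set(adj[best])
    return backbone
```

Every node is then either in the backbone or one hop away from it, so only backbone nodes need to stay awake to relay traffic.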
5.6 Optimized Link-State Routing Optimized Link State Routing (OLSR) [1] is a protocol similar to OSPF and can be considered a pure link-state routing protocol that has been optimized.
It uses the concept of multipoint relays (MPRs). Employing multipoint relays decreases the size of control messages: instead of reporting all of its links to all nodes in the network, a node announces only its multipoint relays among its neighbors. The usage of MPRs also reduces the flooding of control traffic, since only multipoint relays forward control messages. This protocol thus significantly reduces the retransmissions of broadcast control messages.
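MPR selection is often presented as the greedy set-cover sketch below: keep choosing the 1-hop neighbor that covers the most still-uncovered 2-hop neighbors. The data layout is a hypothetical simplification of the OLSR heuristic, not the RFC procedure.

```python
def select_mprs(one_hop, two_hop_via):
    """Greedy MPR selection sketch.
    one_hop: iterable of 1-hop neighbors.
    two_hop_via: {neighbor: set of 2-hop nodes reachable through it}."""
    uncovered = set().union(*two_hop_via.values()) if two_hop_via else set()
    mprs = set()
    while uncovered:
        # pick the neighbor covering the most uncovered 2-hop nodes
        best = max(one_hop, key=lambda n: len(two_hop_via.get(n, set()) & uncovered))
        mprs.add(best)
        uncovered -= two_hop_via.get(best, set())
    return mprs
```

Flooding through only the selected MPRs still reaches every 2-hop neighbor, which is why control traffic shrinks without losing coverage.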
5.7 Agile Computing Middleware Agile Computing Middleware (ACM) [19] is a set of integrated components developed to address tactical edge network communication difficulties. In a peer-to-peer network, ACM offers components for session mobility, dependable point-to-point connections, ad-hoc discovery services, and information dissemination.
5.8 Beacon-Based Routing Beacon-Based Routing (BBR) is a generalization of the Pulse Protocol, which was initially designed for infrastructure-based or sensor-based wireless networks. BBR is a lightweight, opportunistic ad hoc network protocol that simultaneously creates a single spanning-tree structure for the majority of unicast routing and an approximation of the minimal connected dominating set (MCDS) for multicast routing. These structures are updated on a regular basis with a network overhead cost of O(M), where M is the count of mobile nodes in the network; all other MANET routing techniques incur overhead that rises at the pace of O(M²). These routing structures are very efficient for tactical networks since the majority of traffic travels up the command hierarchy. For peer-to-peer communication, BBR employs the single spanning tree to bootstrap shorter paths among peers, achieving higher efficiency in the routing process.
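The spanning-tree side of BBR can be sketched as a beacon flood in which each node adopts the sender of the first beacon copy it hears as its parent; the BFS below models that flood on a static graph, which is of course a simplification of a mobile network.

```python
from collections import deque

def beacon_tree(adj, beacon):
    """Sketch of the beacon flood: the first copy of the beacon a node hears
    defines its parent, yielding a single spanning tree rooted at the beacon."""
    parent = {beacon: None}
    queue = deque([beacon])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in parent:        # later copies are duplicates, ignored
                parent[nb] = node
                queue.append(nb)
    return parent
```

Traffic toward the command hierarchy simply follows parent pointers toward the root, which is why the structure suits tactical traffic patterns.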
6 Conclusions Mobile nodes equipped with wireless communication serve as network entities and form a temporary network. Such networks can be deployed in various application scenarios, particularly when the establishment of conventional wired or wireless networks is not feasible. As military communications usually take place in remote and not easily accessible regions, maintaining network connectivity becomes a major issue most of the time. Due to their basic characteristics, ad hoc networks are suitable for networking in army tactical communications. Military tactical networks designed as ad hoc networks do not rely on any fixed
infrastructure, can be organized quickly, and do not need any central authority for maintaining network operations. Because of all these benefits, ad hoc networks are highly appropriate for army tactical networks. However, some characteristics of MANETs make them vulnerable to a variety of attacks, and depending on the deployment scenario, the network parameters and requirements also change. As security is one of the major concerns in military communications, the security goals, requirements, and threats must be studied and analyzed very carefully. This not only helps to make network communications secure but also offers benefits for improved network performance. This paper presents a study of the fundamentals of MANETs concerning military communications and discusses the characteristics, requirements, objectives, threats, and existing solutions available for army tactical networking.
References
1. Plesse T, Adjih C, Minet P, Laouiti A, Plakoo A, Badel M, Muhlethaler P, Jacquet P, Lecomte J (2005) OLSR performance measurement in a military mobile ad hoc network. Ad Hoc Netw 3(5):575–588
2. Papakostas D, Eshghi S, Katsaros D, Tassiulas L (2018) Energy-aware backbone formation in military multilayer ad hoc networks. Ad Hoc Netw 81:17–44
3. Lee S, Youn J, Jung BC (2020) A cooperative phase-steering technique in spectrum sharing-based military mobile ad hoc networks. ICT Express 6(2):83–86
4. Khanpara P (2014) A review on fuzzy logic-based routing in ad hoc networks. Int J Adv Res Eng Technol 5(5):75–81
5. Hegland AM, Winjum E (2008) Securing QoS signaling in IP-based military ad hoc networks. IEEE Commun Mag 46(11):42–48
6. Jiang H, Ma Y, Hong D, Li Z (2014) A new metric for routing in military wireless network. Int J Model Simul Sci Comput 5(02):1450001
7. Trivedi R, Khanpara P (2021) Robust and secure routing protocols for MANET-based internet of things systems: a survey. In: Emergence of cyber-physical system and IoT in smart automation and robotics. Springer, Cham, pp 175–188
8. Khanpara P, Valiveti S, Kotecha K (2010) Routing in ad hoc network using ant colony optimization. In: International conference on future generation communication and networking. Springer, Berlin, pp 393–404
9. Szajnfarber Z, McCabe L, Rohrbach A (2015) Architecting technology transition pathways: insights from the military tactical network upgrade. Syst Eng 18(4):377–395
10. Papakostas D, Basaras P, Katsaros D, Tassiulas L (2016) Backbone formation in military multi-layer ad hoc networks using complex network concepts. In: MILCOM 2016 IEEE military communications conference. IEEE, pp 842–848
11. Shah M, Khanpara P (2019) Survey of techniques used for tolerance of flooding attacks in DTN. In: Information and communication technology for intelligent systems 2019. Springer, Singapore, pp 599–607
12. Amdouni I, Adjih C, Plesse T (2015) Network coding in military wireless ad hoc and sensor networks: experimentation with GardiNet. In: 2015 international conference on military communications and information systems (ICMCIS). IEEE, pp 1–9
13. Khanpara P, Trivedi B (2018) Survivability in MANETs. Int J Adv Res Comput Eng Technol (IJARCET) 7(1)
14. Cole RG, Benmohamed L, Cansever D, Doshi B, Awerbuch B (2008) Gateways for mobile routing in tactical network deployments. In: MILCOM 2008 IEEE military communications conference. IEEE, pp 1–8
15. Morelli A, Provosty M, Fronteddu R, Suri N (2019) Performance evaluation of transport protocols in tactical network environments. In: MILCOM 2019 IEEE military communications conference (MILCOM). IEEE, pp 30–36
16. Khanpara P, Trivedi B (2017) Security in mobile ad hoc networks. In: Proceedings of international conference on communication and networks 2017. Springer, Singapore, pp 501–511
17. Burbank JL, Chimento PF, Haberman BK, Kasch WT (2006) Key challenges of military tactical networking and the elusive promise of MANET technology. IEEE Commun Mag 44(11):39–45
18. Carter M (2005) A review of transport protocols as candidates for use in a tactical environment
19. Poltronieri F, Fronteddu R, Stefanelli C, Suri N, Tortonesi M, Paulini M, Milligan J (2018) A secure group communication approach for tactical network environments. In: 2018 international conference on military communications and information systems (ICMCIS). IEEE, pp 1–8
20. Kärkkäinen A (2015) Developing cyber security architecture for military networks using cognitive networking
21. Kim J, Biswas PK, Bohacek S, Mackey SJ, Samoohi S, Patel MP (2021) Advanced protocols for the mitigation of friendly jamming in mobile ad-hoc networks. J Netw Comput Appl 181:103037
22. Reynolds N (2021) Getting tactical communications for land forces right. RUSI J 14:1–2
23. Rahimunnisa K (2019) Hybridized genetic-simulated annealing algorithm for performance optimization in wireless adhoc network. J Soft Comput Paradigm (JSCP) 1(01):1–13
24. Smys S, Raj JS (2019) Performance optimization of wireless adhoc networks with authentication. J Ubiquitous Comput Commun Technol (UCCT) 1(02):64–75
25. Khanpara P, Bhojak S (2022) Routing protocols and security issues in vehicular ad hoc networks: a review. J Phys: Conf Ser 2325(1):012042. IOP Publishing
Modified Floating Point Adder and Multiplier IP Design S. Abhinav, D. Sagar, and K. B. Sowmya
Abstract The advent of digital circuits made it easy to realize many mathematical operations using binary Boolean functions. The limitation was that mathematical operations could run at significantly high speed and with great accuracy only for unsigned or signed integer numbers. Various architectures can accelerate mathematical operations on integers, but most real-world problems require operations on real numbers, and hence either fixed-point or, more importantly, floating-point arithmetic is necessary. It is possible to run algorithms that execute operations on floating-point numbers on an architecture designed for integer numbers, but the number of CPU cycles required increases substantially with the complexity of the problems involved and the precision and accuracy required. It is possible to develop architectures for fixed-point real numbers, but reusing such an architecture at higher precision is difficult: for high accuracy or precision, hardware utilization increases exponentially, and the conversion from fixed point to floating point increases hardware as well. Hence we have the IEEE 754 format for single/double-precision floating-point architectures. The algorithm provides a fixed register architecture, enabling ease of architecture design along with acceleration where required. It predominantly consists of registers to hold the sign, exponent, and mantissa, together with modules to perform the required arithmetic operations, normalization modules, rounding-off modules, and bypass circuitry to accelerate addition/subtraction and multiplication with 0, 1, etc. The IP designed here for the adder and multiplier follows the IEEE 754 single-precision format. Bypass modules have been designed for the adder and multiplier for addition and multiplication with zero. The final output is available after one clock cycle; this clock is necessary to load the registers with the final answers. The normalization modules use a chain of multiplexers to select the required inputs based on the comparator and for easy swapping. By default, the complete modules compute zero (reset) if no inputs are applied. The control logic is designed with basic gates to determine the direction of normalization, giving faster computation and less hardware overhead. Universal shift registers are used to enable bidirectional shifts in the normalization of multiplication, and shift registers are used for the addition operation.
Keywords Floating point adder · IP · Multiplier
S. Abhinav (B) · D. Sagar · K. B. Sowmya RV College of Engineering, Bengaluru, Karnataka 560059, India e-mail: [email protected] D. Sagar e-mail: [email protected] K. B. Sowmya e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_25
1 Introduction The real-world signals we encounter are analog in nature, and our primary communication with the external world uses analog signals. Digital signals are used because of advantages such as noise immunity, fast computation, and easy development of architectures. Architectures for digital computers, however, are built for integer computation. The conventional procedure for floating-point arithmetic used the integer-based architecture with software support (algorithms), but these algorithms consumed a large number of CPU cycles [6, 7, 9, 11], reducing CPU throughput. To increase throughput, more efficient algorithms have been developed [10, 12], but the reduction in CPU cycle consumption is quite small. The solution has always been either to develop a complete floating-point unit operating on data separately or to develop floating-point-architecture CPUs. The complexity of the latter is tremendous, so the former is preferred: the CPU off-loads floating-point arithmetic to this complete module. Hence, it is quite challenging for a designer to build a dedicated floating-point module that is highly efficient, low power, low area, and as fast as the ALU. The IEEE 754 format for single/double-precision floating-point arithmetic facilitates an easy architecture based on its algorithm. The algorithm is multi-cycle: the design takes multiple cycles to evaluate the output but consumes significantly fewer cycles than an integer-based architecture with software support. Today many processors and SoCs use a separate floating-point compute unit, whose architecture may support both fixed-point and IEEE-standard floating-point formats. It has become necessary to design this unit as a separate module so that it can be used by different modules or designs.
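The IEEE 754 single-precision layout the chapter builds on (1 sign bit, 8 biased-exponent bits, 23 mantissa bits) can be illustrated with a short Python sketch. This is a software illustration of the standard format only, not part of the chapter's Verilog IP.

```python
import struct

def decompose_float32(x):
    """Split a float into its IEEE 754 single-precision fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # stored with a bias of 127
    mantissa = bits & 0x7FFFFF       # 23-bit fraction, implicit leading 1
    return sign, exponent, mantissa

# -6.5 = -1.625 * 2^2 -> sign = 1, exponent = 127 + 2 = 129,
# mantissa = 0.625 * 2^23 = 5242880
print(decompose_float32(-6.5))
```

The adder and multiplier IPs operate on exactly these three fields held in separate registers.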
In a modern SoC there are different compute designs apart from the processor, such as DSPs, video encoders/decoders, and security blocks [8, 13, 14]. Hence the floating-point unit should accommodate the architectures needed by these other compute designs. It is possible to modify the IEEE standard to suit user requirements and architecture, but this reduces the reusability of the design. The flexibility to modify the IEEE standard helps in the easy design of architectures for DSP and other compute
designs, whereas the trade-off is reusability as well as technical aspects such as precision, accuracy, and the overall features of the final product being developed [15].
2 Literature Survey Table 1 presents the literature survey of related work.
3 Methodology The IP design involves a high-speed Vedic multiplier and a hybrid adder developed from a Carry Lookahead Adder and a Ripple Carry Adder, as depicted in Figs. 1 and 2. The multiplier is designed using the hybrid adder, so a full-custom design for the adder can be substituted if necessary. The design uses certain common elements such as multiplexers, XOR arrays for equivalence checks, and universal shift registers. The control circuitry for the multiplier is developed from the following logic.
• The normalizing circuitry has a 49-bit universal shift register (USR) to accommodate the product of the mantissas.
• The control signals must be able to normalize the mantissa by shifting right or left while incrementing or decrementing the exponent.
• This is achieved by first comparing the exponents; unequal exponents undergo normalization.
• Based on the comparator output signals, the exponent is incremented or decremented while the mantissa product is shifted right or left, respectively.
• The control signals are generated by considering only the leftmost 4 bits of the 49-bit USR.
• If the leftmost 'one' is in bit 49 or 48, the exponent is incremented and the USR is shifted right.
• If the leftmost 'one' is in bit 47, no shifting or increment/decrement is required.
• If the leftmost 'one' is below bit 47, a left shift and decrement are performed.
• Hence the overall operation requires 2 + N cycles, where N is the difference between the exponents.
• For any input of 'zero', the output is forced to 'zero'.
The control circuitry for adder normalization is as follows.
• The addition operation is valid only if both exponents are equal, so a comparator is used to check for equality.
• Unequal exponents are processed through a series of multiplexers to store the smaller exponent for normalization.
Table 1 Literature survey table

[1] Khalil K, Dey B, Kumar A, Bayoumi M (2021) "A Reversible-Logic based Architecture for Convolutional Neural Network (CNN)", 2021 IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), pp 1070–1073, https://doi.org/10.1109/MWSCAS47672.2021.9531842
Techniques: The proposed architecture uses a reversible multiplier followed by a reversible adder. A reversible sigmoid is designed based on its conventional digital counterpart. The reversible full adder is implemented with 5 Fredkin gates, and the reversible multiplier combines 3 Peres gates and 8 reversible adders. The modules are implemented in VHDL and the results are obtained on an Altera 10GX FPGA. The input of the reversible exponential function block is represented as the addition of an integer part and a fractional part, INPUT = input + (input/2), and the fixed-point sigmoid is Sigmoid = 1/(1 + 2^(1.5·Input)).
Remarks: Convolutional Neural Networks (CNNs) are used for solving complex image classification and computer vision problems. CNNs and more complex architectures are used in application domains such as object detection, self-driving cars, instance segmentation, Optical Character Recognition (OCR), and surveillance and security systems.

[2] Yu Y, Wu C, Zhao T, Wang K, He L (2020) "OPU: An FPGA-Based Overlay Processor for Convolutional Neural Networks", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 28(1):35–47, https://doi.org/10.1109/TVLSI.2019.2939726
Techniques: Conventional designs exploit parallelism within a single 2-D kernel, expanding a Kx × Ky kernel-sized window to multiple data in the row and column directions in a single clock cycle. Since, as in the PE architecture, a kernel of size Kx × Ky will not always fit, many FPGA designs optimize the architecture for a 3 × 3 kernel; for larger kernel weights the bandwidth requirement can be as high as 8192 bit/cycle.
Remarks: A field-programmable gate array (FPGA) provides rich parallel computing resources with high energy efficiency, making it ideal for deep convolutional neural network (CNN) acceleration.

[3] Kang M, Lim S, Gonugondla S, Shanbhag NR (2018) "An In-Memory VLSI Architecture for Convolutional Neural Networks", IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 8(3):494–505, https://doi.org/10.1109/JETCAS.2018.2829522
Techniques: The functional-read stage generates a bit-line voltage drop delVbl(W) that is directly proportional to the weighted sum of the stored data; the drop is generated by simultaneous application. The DIMA needs Bw·L/2 cycles to read the same number of bits as a standard SRAM. The RDL accumulates the partial sums computed by the N banks if needed, and the sigmoid function is implemented using a piecewise-linear approximation composed of 3 additions and 2 shifts: Y1D,n(p,q) = W1D,nm(i)·X1D,m(p,q,i).
Remarks: Presents an energy-efficient, high-throughput architecture for CNNs. System-level simulations using these models achieve accuracy up to 97% on the MNIST data set, along with 4.9× and 2.4× improvements in energy efficiency and throughput, respectively, leading to an 11.9× reduction in energy-delay product compared with a conventional (SRAM + digital processor) architecture.

[4] Shahan KA, Sheeba Rani J (2020) "FPGA based convolution and memory architecture for Convolutional Neural Network", 2020 33rd International Conference on VLSI Design and 2020 19th International Conference on Embedded Systems (VLSID), pp 183–188, https://doi.org/10.1109/VLSID49098.2020.00049
Techniques: A novel convolution architecture based on an optimal Winograd 2-D convolution algorithm utilizing sharing of partial input-transformation data for CNNs. The architecture comprises an Input Data Feeder, a Filter Transformation (FT) block, and a Convolutional Unit (CU). The data in BRAMs are stored in a pattern different from the data pattern fed to the CU, and are converted to the pattern required at the input of the convolutional unit.
Remarks: CNNs are widely used in vision-based applications to increase performance, but at the cost of higher storage and computation. An on-chip memory-bank reuse architecture is also utilized to reduce the number of memory read and write operations to off-chip memory.

[5] Chou Y, Hsu J-W, Chang Y-W, Chen T-C (2021) "VLSI Structure-aware Placement for Convolutional Neural Network Accelerator Units", 2021 58th ACM/IEEE Design Automation Conference (DAC), pp 1117–1122, https://doi.org/10.1109/DAC18074.2021.9586294
Techniques: The multiplier has BX pull-down paths to each bit. Multiplication begins by disabling the precharge path and enabling the switch mult_on; the precharged capacitance Cm discharges for Tpulse when the switch pulse_on is enabled and the corresponding binary data xi = 1. The discharge current of the LSB position (i = 0) is I0(Vin) = bVin + c within the dynamic range Vin = 0.6 V to 1 V. Provided that Tpulse ≪ CmRm(Vin;X), where Rm(Vin;X) is the discharge-path resistance, the output voltage drop is ΔVm.
Remarks: AI-dedicated hardware designs are growing dramatically for various AI applications. These designs often contain highly connected circuit structures, reflecting the complicated structure of neural networks, such as convolutional and fully-connected layers. The paper proposes a novel placement framework for CNN accelerator units that extracts kernels from the circuit and inserts kernel-based regions to guide placement and minimize routing congestion.
Fig. 1 Flowchart of floating-point adder
• In the case of the addition algorithm, only a right shift is required.
• For every increment of the smaller exponent, the corresponding mantissa is shifted right.
• A ready signal is generated after the normalization process, and the exponent is stored in a register.
• This ready signal is used to perform a valid mantissa addition.
• After the addition of the mantissas, the sum is checked for normalization.
• The carry out of the adder is stored in a flip-flop.
• If the carry is 'zero', the final value of the sum is the normalized value.
• If the carry is 'one', the exponent has to be incremented once and the mantissa shifted right once.
• Hence, the overall procedure takes N + 4 cycles.
Figures 3 and 4 below represent the elaborated designs of both the adder and the multiplier.
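The alignment and renormalization steps above can be modelled in software. The sketch below is a hypothetical Python model (not the chapter's Verilog) of the adder control: 24-bit mantissas with the implicit leading 1 set, right-shift of the smaller operand, carry-driven renormalization, and the 2-cycle / 4 + N-cycle count stated in the text.

```python
def align_and_add(exp_a, man_a, exp_b, man_b):
    """Software model of the adder normalization control.
    Mantissas are 24-bit integers with the implicit leading 1 included."""
    n = abs(exp_a - exp_b)          # N = exponent difference
    if exp_a < exp_b:               # shift the operand with the smaller exponent
        man_a >>= n
        exp_a = exp_b
    else:
        man_b >>= n
    total = man_a + man_b
    if total >> 24:                 # carry out: shift right once, bump exponent
        total >>= 1
        exp_a += 1
    cycles = 2 if n == 0 else 4 + n # cycle count as stated in the text
    return exp_a, total, cycles

# 1.0 * 2^0 + 1.0 * 2^0 = 1.0 * 2^1 (exponents carry the +127 bias)
print(align_and_add(127, 1 << 23, 127, 1 << 23))   # (128, 8388608, 2)
```

With unequal exponents, e.g. 2.0 + 1.0, the model returns the 1.5 × 2^1 mantissa pattern after one alignment shift and reports 4 + 1 = 5 cycles.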
4 Results The floating-point adder and multiplier IPs were designed in Verilog HDL and synthesized without any critical warnings or errors. The figures below present the various results obtained for performance analysis.
Fig. 2 Flowchart of floating-point multiplier
Fig. 3 Floating-point multiplier elaborated design
Fig. 4 Floating-point adder elaborated design
The synthesized design in Fig. 5 is observed on the Artix-7 Basys 3 FPGA board. The synthesized design occupies a large area due to the fixed-size register architecture of the IEEE 754 algorithm; it is evident that for an ASIC implementation the area of this IP would be correspondingly large, and more metal layers may be required to provide larger drive strength to the output buffers. Because of the large amount of combinational computation, the power budget of the floating-point arithmetic IP is also comparatively high. The resource utilization report generated after synthesis is shown in Fig. 6. The floating-point multiplier uses less than 1% of the total slice logic cells. This allows FPGA-based system designs to use a simple processor core or multiple cores and to realize many different IPs on the same FPGA, providing easy prototyping of a system. Fig. 7 represents the functional verification of the floating-point multiplier. The outputs and all registers in the system are reset to 'zero' when the reset signal is active high; this is necessary to ensure proper working of the IP and to avoid premature computation. The inputs are sampled at the negative edge of the clock. The clock has a 50% duty cycle, providing half a cycle to perform the necessary combinational computations. At the next clock edge, the results are registered; normalization is performed in the following clock cycle, so synchronicity is maintained. The multiplier takes 2 clock cycles to produce the final output for exponents of the same value. The output of the IP is zero when the product is zero, as observed in the last section of the waveform towards the right. The synthesized design in Fig. 8 is observed on the Artix-7 Basys 3 FPGA board. It too occupies a large area due to the fixed-size register architecture of the IEEE 754 algorithm.
As before, it is evident that for an ASIC implementation the area of this IP would be large, and more metal layers may be required for larger drive strength to the output buffers; due to the large combinational computation, the power budget is also comparatively high. The resource utilization report generated after synthesis is shown in Fig. 9. The floating-point adder likewise uses less than 1% of the total slice logic cells. This
Fig. 5 Floating-point multiplier synthesized design in Artix 7 Basys 3 FPGA board
enables FPGA-based system designs to use a simple processor core or multiple cores and to realize many different IPs on the FPGA, providing easy prototyping of a system. Fig. 10 represents the functional verification of the floating-point adder. The outputs and all registers in the system are reset to 'zero' when the reset signal is active high, ensuring proper working of the IP and avoiding premature computation. The inputs are sampled at the negative edge of the clock, and the clock has a 50% duty cycle. New data in the combinational logic is processed at the positive edge of the clock; results are stored at the negative edge to allow enough time to compute the control signals that perform the necessary increment/decrement and shifting at the next positive edge. The adder takes a minimum of 2 clock cycles to produce the output when the exponents are equal, and a minimum of 4 + N clock cycles when the exponents are unequal. The comparison results are shown in Table 2 below.
Fig. 6 Resource utilization of Slice LUTs and Slice Registers for floating-point multiplier
Fig. 7 Output waveform of functional verification of floating-point multiplier
Fig. 8 Synthesized design of floating-point adder in Artix 7 Basys 3 FPGA board
Fig. 9 Resource utilization of Slice LUTs and Slice Registers for floating-point adder
Fig. 10 Output waveform of functional verification of floating-point adder
Table 2 Result comparison table

Comparison with Paper [1] (MWSCAS 2021, pp 1070–1073, https://doi.org/10.1109/MWSCAS47672.2021.9531842):
- Number of slices used: 515 in [1] (their floating-point multiplier uses more than 3% of the total logic cells) versus 193 in this work (the floating-point multiplier uses less than 1% of the total slice logic cells).
- Number of registers used: 103 in [1] versus 67 in this work.
- Area: the synthesized design in [1] occupies a large area; the design in this work also occupies a large area due to the fixed-size register architecture of the IEEE 754 algorithm.
- Clock cycles: the multiplier in [1] takes 6 clock cycles to produce the final output for exponents of the same value, versus 2 clock cycles in this work.
- With unequal exponents, [1] takes a minimum of 2 + N clock cycles, versus 4 + N in this work.
- Normalizing circuitry: a 32-bit universal shift register in [1] versus a 49-bit universal shift register in this work.

Comparison with Paper [2] (IEEE TVLSI 28(1):35–47, Jan. 2020, https://doi.org/10.1109/TVLSI.2019.2939726):
- Number of slices used: 487 in [2] (their floating-point multiplier uses more than 1% of the total logic cells) versus 193 in this work (less than 1% of the total slice logic cells).
- Number of registers used: 173 in [2] versus 67 in this work.
- Area: the synthesized design in [2] occupies a large area; the design in this work also occupies a large area due to the fixed-size register architecture of the IEEE 754 algorithm.
- Clock cycles: the multiplier in [2] takes 8 clock cycles to produce the final output for exponents of the same value, versus 2 clock cycles in this work.
- With unequal exponents, [2] takes a minimum of 2 + N clock cycles, versus 4 + N in this work.
- Normalizing circuitry: a 64-bit universal shift register in [2] versus a 49-bit universal shift register in this work.
References
1. Khalil K, Dey B, Kumar A, Bayoumi M (2021) A reversible-logic based architecture for convolutional neural network (CNN). In: 2021 IEEE international midwest symposium on circuits and systems (MWSCAS), pp 1070–1073. https://doi.org/10.1109/MWSCAS47672.2021.9531842
2. Yu Y, Wu C, Zhao T, Wang K, He L (2020) OPU: an FPGA-based overlay processor for convolutional neural networks. IEEE Trans Very Large Scale Integr (VLSI) Syst 28(1):35–47. https://doi.org/10.1109/TVLSI.2019.2939726
3. Kang M, Lim S, Gonugondla S, Shanbhag NR (2018) An in-memory VLSI architecture for convolutional neural networks. IEEE J Emerg Sel Topics Circuits Syst 8(3):494–505. https://doi.org/10.1109/JETCAS.2018.2829522
4. Shahan KA, Sheeba Rani J (2020) FPGA based convolution and memory architecture for convolutional neural network. In: 2020 33rd international conference on VLSI design and 2020 19th international conference on embedded systems (VLSID), pp 183–188. https://doi.org/10.1109/VLSID49098.2020.00049
5. Chou Y, Hsu J-W, Chang Y-W, Chen T-C (2021) VLSI structure-aware placement for convolutional neural network accelerator units. In: 2021 58th ACM/IEEE design automation conference (DAC), pp 1117–1122. https://doi.org/10.1109/DAC18074.2021.9586294
6. Shu G, Liu W, Zheng X, Li J (2018) IF-CNN: image-aware inference framework for CNN with the collaboration of mobile devices and cloud. IEEE Access 6:68621–68633. https://doi.org/10.1109/ACCESS.2018.2880196
7. Farrukh FUD, Xie T, Zhang C, Wang Z (2019) A solution to optimize multi-operand adders in CNN architecture on FPGA. In: 2019 IEEE international symposium on circuits and systems (ISCAS), pp 1–4. https://doi.org/10.1109/ISCAS.2019.8702777
8. Palaria M, Sanjeet S, Sahoo BD, Fujita M (2019) Adder-only convolutional neural network with binary input image. In: 2019 IEEE 62nd international midwest symposium on circuits and systems (MWSCAS), pp 319–322. https://doi.org/10.1109/MWSCAS.2019.8885354
9. Manatunga D, Kim H, Mukhopadhyay S (2015) SP-CNN: a scalable and programmable CNN-based accelerator. IEEE Micro 35(5):42–50. https://doi.org/10.1109/MM.2015.121
10. Kuo Y-X, Lai Y-K (2020) An efficient accelerator for deep convolutional neural networks. In: 2020 IEEE international conference on consumer electronics—Taiwan (ICCE-Taiwan), pp 1–2. https://doi.org/10.1109/ICCE-Taiwan49838.2020.9258103
11. Zebin T, Scully PJ, Peek N, Casson AJ, Ozanyan KB (2019) Design and implementation of a convolutional neural network on an edge computing smartphone for human activity recognition. IEEE Access 7:133509–133520. https://doi.org/10.1109/ACCESS.2019.2941836
12. Liu Y, Yuan X, Liu W, Chen G (2012) Implementing dynamic reconfigurable CNN-based full-adder. In: 2012 13th international workshop on cellular nanoscale networks and their applications, pp 1–5. https://doi.org/10.1109/CNNA.2012.6331403
13. Misganaw B, Vidyasagar M (2015) Exploiting ordinal class structure in multiclass classification: application to ovarian cancer. IEEE Life Sci Lett 1(1):15–18. https://doi.org/10.1109/LLS.2015.2451291
14. Nagy Z, Szolgay P (2002) Configurable multi-layer CNN-UM emulator on FPGA using distributed arithmetic. In: 9th international conference on electronics, circuits and systems, vol 3, pp 1251–1254. https://doi.org/10.1109/ICECS.2002.1046481
15. Albawi S, Mohammed TA, Al-Zawi S (2017) Understanding of a convolutional neural network. In: 2017 international conference on engineering and technology (ICET), pp 1–6. https://doi.org/10.1109/ICEngTechnol.2017.8308186
Meta Embeddings for LinCE Dataset T. Ravi Teja, S. Shilpa, and Neetha Joseph
Abstract The Language Identification in Code-Mixed Social Media Text contest aimed at Multilingual Meta Embeddings (MME), a productive method for learning multilingual representations for language identification. In code-mixing, language mixing occurs at a sentence boundary, within a sentence, or within a word. This paper proffers an MME-driven language identification mechanism for code-mixed text. The study focuses on comparing different classifiers on Hindi-English code-mixed text data obtained from the LinCE benchmark corpus. LinCE is a centralized benchmark for linguistic code-switching evaluation that integrates ten corpora from four different code-switched language pairings across four tasks. Each instance in the dataset is a code-mixed sentence, and each token in the sentence is associated with a language label. We experimented with different classifiers, namely a Convolutional Neural Network, Gated Recurrent Unit, Long Short-Term Memory, Bidirectional Long Short-Term Memory, and Bidirectional Gated Recurrent Unit, and observed that BiLSTM performed best. A multilingual meta embedding technique was thus empirically evaluated for language identification. Keywords Code-mixing · Natural language processing · Language identification · Meta embedding
1 Introduction All-embracing social media texts, such as Facebook, Instagram, and Twitter messages, have stirred up diverse inventive opportunities for access to multilingual information [1]. Concurrently, these have engendered diverse challenges in the Natural Language Processing (NLP) domain. Non-English-speaking populations do not always use T. R. Teja (B) · S. Shilpa · N. Joseph Department of Computer Science and Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, India e-mail: [email protected] S. Shilpa e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_26
Unicode to communicate in their native languages on social media; instead, they often mix English elements with multiple languages. This mixing of languages occurs at the word or sentence level. The phenomenon is seen across diverse languages: multilingual speakers embed foreign-language vocabulary within native-language sentences. Detection of code-mixed data and development of supporting tools are vital in user-generated text analytics. Language identification deals with pinpointing the language to which each word belongs in a code-mixed social media text. This research uses the meta embedding technique to recognize the language at the word level in code-mixed English-Hindi text [2].
2 Literature Survey Wide-ranging studies have been undertaken on code-mixing as well as language identification. Gustavo Aguilar, Sudipta Kar, and Thamar Solorio, a research team from the University of Houston [3], detailed their findings on Linguistic Code-Switching Evaluation (LinCE). LinCE provides datasets for various tasks such as named entity recognition (NER), language identification, part-of-speech (POS) tagging, and sentiment analysis; a recent addition of machine translation has further enhanced the LinCE dataset. Datasets are available for code-mixed Hindi-English, Modern Standard Arabic-Egyptian Arabic, Spanish-English, and Nepali-English. Submitted results are compared to a hidden test dataset, and ranks are assigned according to the weighted F1-score metric. The research team experimented with Bidirectional Long Short-Term Memory (BiLSTM), Embeddings from Language Models (ELMo), and Bidirectional Encoder Representations from Transformers (BERT) models; their empirical studies showed that BERT outperformed the other models. The paper 'Word Level Language Identification in Code-mixed Data using Word Embedding Methods for Indian Languages' focuses on identifying different languages while processing code-mixed data [4]. Feature vectors were generated using Continuous Bag of Words (CBOW) and Skip-Gram models and fed into machine learning (ML) algorithms such as Random Forest, Gaussian Naive Bayes, AdaBoost, k-Nearest Neighbors (KNN), Logistic Regression (LR), and Support Vector Machine (SVM) to select the best classifier. However, the result was limited by the small corpus available for CBOW. Veena et al. [5] demonstrate character-level features to obtain word-level embeddings. In their approach, trigram and 5-gram context features were fed into an SVM model for classification; they observed identical performance for the trigram and 5-gram options. Winata et al.
[6] introduced multilingual meta embeddings (MMEs) as a solution to named entity recognition (NER) on the English-Spanish code-mixed dataset. Meta embeddings are combinations of two or more embeddings from different sources. The authors proposed using pre-trained English and Spanish embeddings for code-mixed text as a linear combination. The multilingual embeddings were then fed
into a transformer with a CRF layer. They observed that multilingual meta embeddings worked much better than single-source embeddings. Autoencoding is an unsupervised approach that can be used for representation learning. In [7], the authors experimented with decoupling, averaging of meta embeddings, and concatenation followed by linear projection, using CBOW and GloVe embeddings as source embeddings. They probed the effect of autoencoded meta embeddings on diverse tasks, including word analogy, semantic similarity, and relation classification, and observed that autoencoded meta embeddings with a non-linear transformation yielded better results than standard meta embeddings.
3 Proposed Method Figure 1 illustrates the proposed method. The dataset, from LinCE, is a mix of Hindi and English sentences. In the pre-processing step, we transliterated words from Roman script to Devanagari script and divided the dataset into two Hindi-English datasets. Then, using the meta embedding technique, we experimented with different models.
Fig. 1 Proposed language identification mechanism
3.1 Dataset Description The dataset was obtained from the LinCE benchmark corpus and comprises code-mixed English-Hindi sentences whose language needs to be identified. The data were collected from Twitter using the Twitter API. Each example in the dataset is a code-mixed sentence, and each word in the sentence is associated with a language label. The dataset is divided into three folds: training, development, and test sets. The training set comprises 4832 instances, the validation set 744 instances, and the test set 1854 instances. Our model was evaluated on the test set by the LinCE team. There are eight labels: lang1, lang2, ne, other, mixed, ambiguous, fw, and unk. Consider the following example: Input example: bhaijaan plz 1 reply. Output example: lang2 lang1 other lang1. Here, the sentence consists of code-mixed words in Hindi and English, and each word is associated with a language label, e.g., bhaijaan—lang2 and plz—lang1.
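The example above can be read as token-label pairs. The tiny helper below is purely illustrative (not part of the LinCE tooling); it simply pairs each token of a sentence with its language label, as the dataset annotation does.

```python
def read_lince_example(tokens, labels):
    """Pair each token with its language label, as in the LinCE LID task.
    Label names follow the eight classes listed above."""
    assert len(tokens) == len(labels), "one label per token"
    return list(zip(tokens, labels))

pairs = read_lince_example(["bhaijaan", "plz", "1", "reply"],
                           ["lang2", "lang1", "other", "lang1"])
print(pairs)
# [('bhaijaan', 'lang2'), ('plz', 'lang1'), ('1', 'other'), ('reply', 'lang1')]
```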
3.2 Data Pre-processing

As language identification is a word-level task, the only pre-processing step we used is transliteration. Words in Roman script are transliterated into Devanagari script. Algorithm 1 describes the pseudocode of the pre-processing step.
3.3 Meta Embedding

The NLP community is rife with approaches to learning embeddings. Word embeddings inferred using different techniques have different levels of performance. Instead of selecting a single best embedding, we can combine different embedding sets. The process of creating a single word embedding from a set of pre-trained input word embeddings is known as meta embedding. As in Fig. 2, we use FastText and Word2vec embeddings. FastText, using its n-gram approach, can provide embeddings for words it has not seen during training. We have used Facebook's pre-trained FastText model. Meta embeddings are built from the English and Hindi word embeddings obtained from the pre-trained model: the word embeddings of the two languages are first projected into a common d-dimensional vector space and then added to obtain the meta embeddings [9].
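The projection-and-addition step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 300-dimensional source vectors, the common dimension d = 100, and the random projection matrices are placeholder assumptions (in practice the projections are learned and the vectors come from the pre-trained FastText/Word2vec models).

```python
import numpy as np

rng = np.random.default_rng(0)

def project_and_add(emb_a, emb_b, W_a, W_b):
    """Map two source embeddings into a common d-dimensional space and add them."""
    return emb_a @ W_a + emb_b @ W_b

# Toy dimensions: 300-d source vectors (FastText/Word2vec style) and a
# common space of d = 100. The projection matrices would normally be
# learned; random placeholders are used here.
d = 100
W_en = rng.standard_normal((300, d)) / np.sqrt(300)
W_hi = rng.standard_normal((300, d)) / np.sqrt(300)

en_vec = rng.standard_normal(300)  # embedding of the Roman-script token
hi_vec = rng.standard_normal(300)  # embedding of its Devanagari transliteration

meta = project_and_add(en_vec, hi_vec, W_en, W_hi)
print(meta.shape)  # (100,)
```

The addition (rather than concatenation) keeps the meta embedding at the common dimension d regardless of how many source embeddings are combined.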
Meta Embeddings for LinCE Dataset
367
Fig. 2 Proposed language identification mechanism
Algorithm 1: Pseudo-code for the pre-processing method
Input: inputDataset containing code-mixed comments
Output: original dataset and transliterated dataset
// transliteration(): the transliteration API that transliterates words from Roman script to Devanagari script; it returns the word unchanged if no transliteration exists for it
// split(): splits the given sentence on the given parameter
// add(): appends an element to an existing list
// join(): joins all elements of a list into a string
// fileWriter(): appends the given string at the end of an opened file

file ← Open("transliterated.txt")
for comment in inputDataset do
    wordList ← EmptyList
    for word in comment.split() do
        wordList.add(transliteration(word))
    end
    wordList.join()
    fileWriter(file, wordList)
end
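Algorithm 1 can be made runnable as the short Python sketch below. The dictionary-based `transliteration` stub and the output file name are illustrative assumptions; the paper's actual system calls a transliteration API that returns a word unchanged when no transliteration exists, which the stub mimics.

```python
# Runnable sketch of Algorithm 1. A tiny dictionary stands in for the
# real transliteration API; unknown words are returned unchanged, as the
# algorithm specifies.
ROMAN_TO_DEVANAGARI = {"bhaijaan": "भाईजान", "nahi": "नहीं"}  # illustrative stub

def transliteration(word: str) -> str:
    return ROMAN_TO_DEVANAGARI.get(word.lower(), word)

def preprocess(dataset, out_path="transliterated.txt"):
    # For each comment, transliterate word by word and append the joined
    # sentence to the output file (the fileWriter step of Algorithm 1).
    with open(out_path, "w", encoding="utf-8") as f:
        for comment in dataset:
            word_list = [transliteration(w) for w in comment.split()]
            f.write(" ".join(word_list) + "\n")

preprocess(["bhaijaan plz 1 reply"])
print(open("transliterated.txt", encoding="utf-8").read().strip())
# भाईजान plz 1 reply
```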
Algorithm 2: Pseudo-code for the proposed method
Input: inputDataset containing code-mixed comments
Output: classified label of each comment
// transliteration(): the transliteration API to transliterate words from Roman script to Devanagari script
// isemoji(): identifies whether a token is an emoji or not
// wordEmbeddingExtractor(): uses the FastText model to extract word embeddings from texts in Roman and Devanagari scripts
// metaEmbeddings(): constructs meta embeddings from monolingual embeddings
// trainClassifiers(): deep learning models (LSTM, BiLSTM, GRU, and BiGRU) are trained using the input dataset
// Predictions: trained classifiers are used to derive predictions for sentences from the test dataset

1: romanDataset, devanagariDataset ← transliteration(inputDataset)
2: englishWordVectors ← wordEmbeddingExtractor(romanDataset)
3: hinWordVectors ← wordEmbeddingExtractor(devanagariDataset)
4: metaEmb ← metaEmbeddings(hinWordVectors, englishWordVectors)
5: classifier ← trainClassifiers(metaEmb)
6: predictions ← classifier(testData)

3.4 Deep Learning Models

Long Short-Term Memory (LSTM): LSTM, a recurrent type of artificial neural network (ANN), is endowed with feedback connections, unlike typical feedforward neural networks, which allows it to capture long-term dependencies and mitigates the vanishing gradient problem. LSTM is comprised of three gates: forget, input, and output. In our experiment, meta embeddings are fed into the LSTM and passed through a fully connected feedforward layer with a SoftMax to estimate the probability for classification.

Gated Recurrent Unit (GRU): GRU is a gated variant of the LSTM that trains faster. The GRU is similar to an LSTM with a forget gate; as it does not incorporate an output gate, it has fewer parameters. After passing through a spatial dropout layer, we feed the word embeddings into the GRU.

Bidirectional LSTM: Bidirectional LSTM is similar to LSTM but has data flowing in both forward and backward directions to capture complex dependencies. In our experiment, the hidden state from the BiLSTM is fed into a one-dimensional Convolutional Neural Network (CNN).

Bidirectional GRU: Bidirectional GRU is likewise similar to GRU; it uses two GRUs for the forward and backward passes. As with BiLSTM, the hidden state from the BiGRU is fed into a one-dimensional Convolutional Neural Network (CNN) [10].
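The gating arithmetic that gives the GRU fewer parameters than the LSTM (two gates, no output gate) can be written out directly. The sketch below is a toy NumPy forward pass with assumed dimensions and random weights, not the trained Keras/TensorFlow models used in the experiments.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step: two gates instead of the LSTM's three, no output gate."""
    z = sigmoid(x @ Wz + h_prev @ Uz)               # update gate
    r = sigmoid(x @ Wr + h_prev @ Ur)               # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h_prev) @ Uh)   # candidate state
    return (1 - z) * h_prev + z * h_tilde           # interpolated new state

rng = np.random.default_rng(0)
d_in, d_hid = 8, 4   # toy embedding and hidden sizes (assumptions)
params = [rng.standard_normal(s) * 0.1
          for s in [(d_in, d_hid), (d_hid, d_hid)] * 3]  # Wz,Uz,Wr,Ur,Wh,Uh

h = np.zeros(d_hid)
for t in range(5):   # run a short input sequence through the cell
    h = gru_cell(rng.standard_normal(d_in), h, *params)
print(h.shape)  # (4,)
```

In a bidirectional model, a second cell would process the sequence in reverse and the two hidden states would be concatenated at each time step.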
3.5 Machine Learning Models

We have experimented with the following machine learning models using the same dataset to analyze the output [15].
1. Logistic Regression
2. Stochastic Gradient Descent
3. Support Vector Machine
4. Decision Tree Classifier
Logistic regression is a machine learning technique for classification problems. It uses the 'sigmoid' function in its hypothesis and builds a regression model to predict the probability that a given data input belongs to the '1' or '0' category; here, this model classifies languages into two classes. As more appropriate data is added, the algorithm improves its ability to predict classifications within datasets. The second classifier is the Stochastic Gradient Descent classifier, which uses SGD to train a linear classifier. SGD finds the values of the parameters/coefficients of a function that minimize a cost function by following the partial derivatives (the gradient) of the cost with respect to the parameters: the steeper the slope, the greater the gradient. Then we used a decision tree classifier, which builds a decision tree: each leaf node represents an outcome, and each internal node represents an attribute [11]. A decision tree recursively splits the provided data to learn decision-making rules; here it has two subtrees for classifying the languages. The last model is the support vector machine, which is effective in high-dimensional spaces; the method remains effective even when the number of dimensions exceeds the number of samples. Before using the SVM approach, all input data must be labeled appropriately. The SVM approach has the advantage of high classification accuracy and strong analysis performance.
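The first two classifiers above can be illustrated together: a logistic regression model whose sigmoid outputs the probability of the '1' class, trained with stochastic gradient descent on the log-loss gradient. The toy one-feature dataset and learning rate below are assumptions for illustration only, not the paper's scikit-learn setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy two-class data: one scalar feature, classes centered at -2 and +2.
X = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])[:, None]
y = np.concatenate([np.zeros(200), np.ones(200)])

w, b, lr = 0.0, 0.0, 0.1
for epoch in range(20):                  # SGD over shuffled examples
    for i in rng.permutation(len(y)):
        p = sigmoid(X[i, 0] * w + b)     # predicted probability of class 1
        grad = p - y[i]                  # gradient of the log loss
        w -= lr * grad * X[i, 0]
        b -= lr * grad

acc = np.mean((sigmoid(X[:, 0] * w + b) > 0.5) == y)
print(round(acc, 2))
```

On this nearly separable data, accuracy lands well above 0.9; with more features the same update rule applies to a weight vector instead of a scalar.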
4 Experiments

First, we collected the data from the LinCE benchmark. Then we cleaned the data by removing unwanted content such as emojis; we used regular expressions in Python for this task. Using the Google Transliteration API, we converted the dataset into two datasets, one in English and another in Hindi. We used Bayesian optimization for hyper-parameter tuning [12]. Dropout ranges from 0.3 to 0.6; hidden units for the recurrent neural networks range from 80 to 150 in intervals of 10; and the learning rate ranges from 0.00001 to 1. We experimented with the Adam, Adadelta, Adagrad, RMSProp, and SGD optimizers. Batch sizes of 32, 64, 128, and 256 were tried, and epoch values of 5, 10, 20, 30, 40, 50, 60, and 70. We used the Python programming language with the scikit-learn and TensorFlow libraries to build and train the classifiers.
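The hyper-parameter ranges listed above can be encoded as a search space. The paper used Bayesian optimization; the sketch below substitutes plain random sampling over the same ranges purely to make the space concrete, and the dictionary keys are illustrative names.

```python
import random

random.seed(42)

# Search space matching the ranges described in the experiments section.
SPACE = {
    "dropout":      lambda: random.uniform(0.3, 0.6),
    "hidden_units": lambda: random.choice(range(80, 151, 10)),
    "lr":           lambda: 10 ** random.uniform(-5, 0),  # log-uniform on [1e-5, 1]
    "optimizer":    lambda: random.choice(
        ["adam", "adadelta", "adagrad", "rmsprop", "sgd"]),
    "batch_size":   lambda: random.choice([32, 64, 128, 256]),
    "epochs":       lambda: random.choice([5, 10, 20, 30, 40, 50, 60, 70]),
}

def sample_config():
    """Draw one candidate configuration from the search space."""
    return {name: draw() for name, draw in SPACE.items()}

for trial in range(3):   # in practice, score each config and keep the best
    print(sample_config())
```

A Bayesian optimizer would replace the independent draws with a surrogate model that proposes promising configurations based on previously scored trials.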
Table 1 Performance of monolingual embeddings on LinCE benchmark

Model | Word2Vec (F1-score) | FastText (F1-score)
1D CNN | 0.6615 | 0.6631
LSTM | 0.7526 | 0.7568
BiLSTM | 0.6857 | 0.7566
GRU | 0.7372 | 0.7487
BiGRU | 0.7307 | 0.6878

Table 2 Comparison of performance of monolingual embeddings and multilingual meta embeddings on LinCE dataset based on accuracy

Model | Word2Vec | FastText | Meta embeddings
LSTM | 0.7526 | 0.7568 | 0.9515
CNN BiLSTM | 0.6857 | 0.7566 | 0.9557
GRU | 0.7372 | 0.7487 | 0.9548
CNN BiGRU | 0.7307 | 0.6878 | 0.9550
5 Results

We used monolingual embeddings from Word2vec and FastText to train the models and observed the results. In our model, we did not back-transliterate the data. Table 1 summarizes the performance of monolingual embeddings on the LinCE benchmark; it is clear that monolingual embeddings alone are not sufficient for language identification. Here, LSTM with FastText performed relatively better. We also took the same English-Hindi dataset for a binary experiment using only two labels: SVM and the decision tree gave an accuracy score of 0.93, and Fig. 6 shows the confusion matrix. Then we experimented with meta embeddings built from Facebook's FastText models; Table 2 summarizes the findings. Compared to monolingual embeddings, the results show that meta embeddings produce better results. Figures 3 and 4 show the model accuracy and model loss of the CNN BiLSTM model. For this experiment, we used a dropout value of 0.36 and 40 hidden units in the BiLSTM, and the Adam optimizer was chosen for training. Once the hyperparameters were chosen, we fed the extracted meta embeddings to the BiLSTM-CNN model. The hidden state from the BiLSTM at each time step was used by the 1D CNN for classification (Figs. 5 and 6). From Table 2, BiLSTM performed relatively better on the LinCE test data, with an accuracy of 0.9557. We also observed that Adam is the most optimal optimizer for all models [13]. Then we used ML models with the same dataset; the results are shown in Table 3. The decision tree and SVM classifiers performed relatively better on the data.
Fig. 3 Model accuracy
Fig. 4 Model loss

Table 3 Performance of machine learning models on LinCE benchmark

Model | Accuracy (training set) | Accuracy (test set)
Logistic regression | 0.93300 | 0.91751
Stochastic gradient descent | 0.87298 | 0.86756
Decision tree | 0.97443 | 0.93401
Support vector machine | 0.97537 | 0.93698
Fig. 5 Confusion matrix
Fig. 6 Confusion matrix
6 Conclusion

The purpose of our study was to examine the meta embedding technique for language identification. The machine learning and deep learning algorithms designed for our experiments show that CNN BiLSTM performed best, followed by CNN BiGRU. The results also show that meta embeddings outperformed monolingual embeddings. In future research, we want to use language identification as a pre-processing step to transliterate words back to their native script and evaluate the performance of pre-trained models such as mBERT. When we used the same dataset with ML models, the SVM classifier performed well on our data. Compared to such large pre-trained models, the models we studied require far fewer computational resources, yet the results we obtained are similar; this emphasizes the value of meta embeddings.
References

1. Thara S, Poornachandran P (2018) Code-mixing: a brief survey. In: 2018 international conference on advances in computing, communications and informatics (ICACCI). IEEE, pp 2382–2388
2. Sravani L, Reddy AS, Thara S (2018) A comparison study of word embedding for detecting named entities of code-mixed data in Indian language. In: 2018 international conference on advances in computing, communications and informatics (ICACCI). IEEE, pp 2375–2381
3. Aguilar G, Kar S, Solorio T (2020) LinCE: a centralized benchmark for linguistic code-switching evaluation. arXiv:2005.04322
4. Chaitanya I et al (2018) Word level language identification in code-mixed data using word embedding methods for Indian languages. In: 2018 international conference on advances in computing, communications and informatics (ICACCI). IEEE
5. Veena PV, Kumar A, Soman KP (2017) An effective way of word-level language identification for code-mixed Facebook comments using word-embedding via character-embedding. In: 2017 international conference on advances in computing, communications and informatics (ICACCI). IEEE
6. Winata GI, Lin Z, Fung P (2019) Learning multilingual meta-embeddings for code-switching named entity recognition. In: Proceedings of the 4th workshop on representation learning for NLP (RepL4NLP-2019)
7. Bollegala D, Bao C (2018) Learning word meta-embeddings by autoencoding. In: Proceedings of the 27th international conference on computational linguistics
8. Thara S, Poornachandran P (2021) Transformer based language identification for Malayalam-English code-mixed text. IEEE Access 9:118837–118850
9. Grave E, Bojanowski P, Gupta P, Joulin A, Mikolov T (2018) Learning word vectors for 157 languages. In: Proceedings of the international conference on language resources and evaluation (LREC 2018)
10. Sreelakshmi K, Premjith B, Soman KP (2021) Amrita_CEN_NLP@DravidianLangTech-EACL2021: deep learning-based offensive language identification in Malayalam, Tamil and Kannada. In: Proceedings of the first workshop on speech and language technologies for Dravidian languages, pp 249–254
11. Ruder S (2017) An overview of multi-task learning in deep neural networks. arXiv:1706.05098
12. Singh K, Sen I, Kumaraguru P (2018) Language identification and named entity recognition in Hinglish code mixed tweets. In: Proceedings of ACL 2018, student research workshop, pp 52–58. https://doi.org/10.18653/v1/P18-3008
13. Sharma A, Gupta S, Motlani R, Bansal P, Shrivastava M, Mamidi R, Sharma DM (2016) Shallow parsing pipeline for Hindi-English code-mixed social media text. In: Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies. https://doi.org/10.18653/v1/N16-1159
14. Khapra MM, Ramanathan A, Kunchukuttan A, Visweswariah K, Bhattacharyya P (2014) When transliteration met crowdsourcing: an empirical study of transliteration via crowdsourcing using efficient, non-redundant and fair quality control. In: Proceedings of the ninth international conference on language resources and evaluation (LREC'14)
15. Gupta DK, Kumar S, Ekbal A (2014) Machine learning approach for language identification and transliteration. In: Proceedings of the forum for information retrieval evaluation
Recent Trends in Automatic Autism Spectrum Disorder Detection Using Brain MRI Triveni D. Dhamale and Sheetal U. Bhandari
Abstract Autism spectrum disorder (ASD) is a multifaceted developmental and psychological disability that involves persistent challenges regarding non-verbal and speech communication, repetitive or restricted behavior, and social interaction. Early detection of ASD can help to take proper curative and preventive measures to improve the health and lifestyle of the patients. Various machine learning-based and deep learning-based approaches have been presented in the past for the automatic detection of ASD. This paper presents a survey of recent machine and deep learning approaches for ASD detection using brain Magnetic Resonance Images (MRI). It focuses on the methodology, feature extraction techniques, classifiers, databases, and evaluation metrics of the various ASD detection approaches. The performance of several machine learning systems such as K-Nearest Neighbour (KNN), Support Vector Machine (SVM), and Classification Tree (CT) is validated for ASD detection on the ABIDE-I dataset. Finally, it provides the challenges and constraints and gives future directions to enhance the performance of the various machine and deep learning-based ASD detection approaches. Keywords Autism spectrum disorder · Deep learning · Machine learning · Magnetic resonance images
T. D. Dhamale (B) Research Scholar, Pimpri Chinchwad College of Engineering, SPPU, Pune, India e-mail: [email protected] Assistant Professor, Department of E&TC, Pimpri Chinchwad College of Engineering and Research, Ravet, SPPU, Pune, India S. U. Bhandari Department of E&TC, Pimpri Chinchwad College of Engineering, Nigdi, SPPU, Pune, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_27
375
376
T. D. Dhamale and S. U. Bhandari
1 Introduction

Autism spectrum disorder (ASD) is a critical psychological and behavioral developmental disorder that shows disability in social communication, non-verbal and verbal communication, and recurring behavior. As per the CDC, ASD can be diagnosed at the age of 2–3 years, and one in 59 children is anticipated to have ASD [1]. It is more common in males than in females. Autism is an enduring medical condition, and it has different symptoms for different patients; however, subjects with ASD can live fulfilling, productive, and independent lives [2]. Autism symptoms and severity vary from person to person. ASD symptoms are grouped into two types: communication and social interaction problems, and repetitive and restricted behavior [3]. Some of the common symptoms found in children having ASD are:
• Unable to respond to his/her name by one year of age
• Unable to point at an object of interest or focus attention towards a caller by 14 months
• Unable to play "pretend games" at the age of 18 months
• Preferring to be isolated, avoiding cuddles, showing less emotion/interest, and evading eye contact during interaction/communication
• Getting distressed by inconsequential changes and showing intense/unusual reactions to taste, smell, look, or feel
• Rocking the body, flapping hands, repeating actions, and spinning the body in circles
• Difficulty in mixing with people, understanding other people's feelings, or sharing personal feelings with other people
• Verbal and non-verbal communication disorder
• Trouble adapting to routine variations
• Mild or noteworthy intellectual delay
Most of the time, ASD is manually diagnosed by professional experts by asking the particular questionnaire to the patients and comparing the responses with normal subject’s responses [4]. However, performance of manual ASD detection is limited due to poor knowledge or expert, fatigue, boredom, or reluctance to the test. Sometime visual inspection and validation are also employed for ASD detection which is time consuming, unreliable, and subjected to many issues regarding culture and environment [5]. Over the past decade, artificial intelligence has attracted wide attraction of researchers for the automatic ASD detection using various image modalities. ASD can be detected using eye gaze [6], brain MRI, facial expression [7, 8], motor movement/control pattern [9], stereotyped behavior [10], speech intonation/speech
Fig. 1 Various biometric modalities for automatic ASD detection: MRI (fMRI/sMRI), facial expression/emotion, eye gaze data, motor control/movement pattern, stereotyped behaviours, and multi-modal data
emotion [11–13], electroencephalogram (EEG) [14], or multimodal data [15], as shown in Fig. 1. Computer vision-based systems, which are non-invasive and low-cost, have shown better reliability, trust, efficiency, and a higher ASD detection rate [16]. Structural MRI (sMRI) and functional MRI (fMRI) give images of different parts of the brain which can be utilized to distinguish ASD from typical development (TD). MRI shows the variation in blood volume and blood flow in the brain. The fMRI images are generally utilized to explore the changes in multifaceted non-social voice processing between TD and ASD subjects. An increase in temporal complexity shows larger variations in Heschl's gyrus of the ASD patient and larger variations in the anterolateral superior temporal gyrus of the TD subject. The fMRI also helps to analyze the blood oxygenation level in the brain: when any activity takes place in any part of the brain, the blood oxygenation level varies, which changes the magnetic state of the blood [17–19]. Structural MRI (sMRI) scans are utilized to develop a 3D model of the brain and to compute the white matter and gray matter volume to distinguish between TD and ASD patients [20]. White matter includes myelinated nerve fibers, and gray matter includes tissues of the cerebral cortex. ASD detection based on sMRI uses various modalities such as voxel-based morphometry (VBM), region of interest-based morphometry (ROI-BM), tensor-based morphometry (TBM), surface-based morphometry (SBM), and longitudinal MRI studies [21, 22]. ROI-BM shows that children with ASD have enlarged overall brain volume and amygdala volume, whereas adults with ASD have a reduced corpus callosum volume [23, 24]. VBM analysis is performed using a volume- and density-based study of the brain region. In ASD subjects, gray matter volume increases but its density decreases in the temporal and frontal lobes of the brain [25, 26].
Longitudinal MRI approaches to ASD detection show abnormal growth trajectories in the temporal and frontal lobes of the brain in terms of variation in the volume of the brain region, changes in the corpus callosum area, changes in the grey and white matter, etc. [27]. SBM shows enlargement of cortical width in the parietal region of the brain [28]. Figure 2 shows the difference between the cortical and sub-cortical volume of the brain MRI of ASD and TD subjects.

Fig. 2 Difference between brain volume of (a) ASD and (b) TD subjects [19]

The generalized supervised process of ASD detection using brain MRI images consists of a training phase and a testing phase, as shown in Fig. 3. During the training phase, the classification algorithm is trained using TD and ASD MRI data. The testing phase encompasses the preprocessing, feature extraction, classification, and ASD detection stages [29, 30]. Preprocessing deals with enhancement and segmentation of the brain MRI images. Feature extraction finds the unique characteristics that distinguish the MRIs of TD and ASD patients. Various machine or deep learning classifiers can be employed to detect ASD. This paper offers a brief survey of recent machine and deep learning methodologies adopted for ASD detection using brain MRI images. It gives details about the methodology, feature extraction techniques, classification methods, evaluation metrics, and datasets used for ASD detection. The remainder of this survey is arranged as follows: Sect. 2 gives the survey of various machine and deep learning-based ASD detection systems based on brain MRI. Section 3 describes the gaps obtained from the comprehensive survey of ASD detection systems. Finally, Sect. 4 concludes the survey and gives future scope for the improvement of ASD detection systems using brain MRI.

Fig. 3 Generalized flow diagram of the ASD system (training phase: brain MRI training data → preprocessing → feature extraction → training classifier; testing phase: brain MRI testing data → preprocessing → feature extraction → classifier → ASD detection)
2 Literature Review of ASD

Various machine learning and deep learning-based techniques have been implemented for ASD detection in the past. This section provides a survey of the recent work carried out on ASD detection using functional and structural MRI of the brain.
2.1 Machine Learning Techniques

Traditional machine learning-based approaches encompass feature extraction and classification stages; the features obtained from various techniques are used to train the classifier. Thabtah and Peebles [31] developed a mobile application for ASD screening using the rule-based machine learning (RML) technique. They collected data through a mobile application based on the Autism Spectrum Quotient (AQ) test and used rule-based machine learning for classification. It is a simple and easy method of data collection and requires little recognition time, but it showed a data imbalance problem that occurred due to unequal samples of ASD and normal subjects, and the results depend upon the social and communication behavior of the subject. Katuwal et al. [32] presented ASD detection using morphometric and intensity features and a random forest classifier. They used the area, volume, thickness, curve index, folding index, mean intensity, and standard deviation of intensity features of the brain MRI. They found that the total intracranial volume (TIV) of ASD subjects is 5.5% larger than that of non-ASD subjects. Sen et al. [33] presented a multimodal technique that combines features from fMRI and sMRI. The structural MRI features are computed using an autoencoder-based texture filter followed by a convolutional neural network; non-stationary independent component (PCA/ICA/k-PCA) features are extracted from functional MRI. Accuracy depends upon site variations, and the method showed little variation in the spatial components of normal and ASD patients. Thomas et al. [34] investigated how various region features and spectral features can be used to train a 3D Convolutional Neural Network (CNN) to detect autism. Dekhil et al. [35] extracted correlation features from fMRI and sMRI. A KNN classifier was then used to select the significant features, and the concatenated features were given to a Global Random Forest (GRF) classifier.
It gave a higher prediction rate because of the combination of anatomical and functional connectivity abnormalities, but it faces several challenges due to the data imbalance problem, the limited availability of data for subjects under 8 years of age, and the scarcity of coordinated scans across MRI scanners, sequences, and head motion. In [36], various structural measures of ASD MRI have been presented, such as reduction and enlargement in cortical thickness of the left and right frontal lobes, respectively, and a significant increase in functional connectivity. Manciu et al. [37] provide a spectroscopic analysis to show how Raman spectroscopy mapping may be utilized to directly identify neurotransmitters. This study addresses the requirement for novel, less invasive techniques in biomedical research. Badgaiyan [38] suggests a cutting-edge technique that broadens the scope
of human neuroimaging research by enabling the tracking of neurochemical variation related to brain functioning. Hugdahl et al. [39] give a quick summary of the fMRI principles, which are based on the blood oxygenation level, coupled with MRS measurements of synaptic transmitters; the dataset consists of brain MRI images. The rapid development of computer science, according to Liu et al. [40], has increased confidence in the capability to conduct more precise evaluations of ASD using eye gaze tracking. It is feasible to focus on a single occurrence, which is difficult enough for professionals to do, while simultaneously ensuring repeatability over several instances by having the ability to examine the stimuli generated throughout each step of the therapy; this is the major advantage that makes treatment possible. Peng et al. [41] assessed the efficacy of the ELM and SVM algorithms. The drawback of using structural MRI data in this situation is that extensive brain sub-network damage is not taken into consideration; ADHD is categorized with the aid of a sizable brain subnetwork. They utilized FreeSurfer to measure a variety of brain parameters, including cortical thickness, and were able to predict ADHD with an accuracy of 84.73% using the SVM algorithm and 90.18% using the ELM approach. They discovered that ELM is more accurate than SVM and may be used to detect brain disorders. Cheng et al. [42] used a rigorous approach to distinguish affected people from controls. Here, they concentrate on the difficulty of correctly classifying each person's brain state for a sizable data collection. They employed SVM for classification, with feature selection based on the Pearson correlation and partial correlation of the functional connectivity, in order to perfect the taxonomy. With an accuracy rate of 76.15%, they were able to identify ADHD using a sophisticated pattern recognition method.
The neuro-anatomical profile of female infants with ASD highlights an untapped topic, according to Calderon et al. [43]. Each whole-brain volume was held out and identified using voxel-based morphometry (VBM). The SVM-RFE technique was used to find the most separable voxels in the GM components. They used data from 38 subjects to compare the non-verbal IQ with controls. The development of grey matter in the left small frontal gyrus is hence hidden in ASD by VBM. The majority of ASD-related SVM investigations show increased cerebral cortical opacity, and corpus callosum irregularity was often detected in ASD-related DTI investigations. According to Ecker et al. [44], a multi-parameter classification strategy is used to discern the intricate complexity of the grey matter's architecture. Many different criteria were obtained in order to identify the ASD patients, with SVM used as the classification method. The findings demonstrate that autism is truly complicated and has a variety of side effects. Future research into the genetic and neuropathological origins of ASD may benefit from the patterns found using SVM.
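Several of the surveyed approaches, like the experiments reported later in this paper, train classical classifiers (SVM, KNN, decision trees) on hand-crafted texture features such as the gray-level co-occurrence matrix (GLCM). A minimal NumPy sketch of GLCM-based contrast and homogeneity features is given below; the random array standing in for an MRI slice, the quantization to 8 gray levels, and the single (0, 1) pixel offset are all simplifying assumptions.

```python
import numpy as np

def glcm(img, levels=8, dx=1, dy=0):
    """Normalized co-occurrence counts of gray-level pairs at offset (dy, dx)."""
    q = (img.astype(float) / img.max() * (levels - 1)).astype(int)  # quantize
    h, w = q.shape
    m = np.zeros((levels, levels))
    for i in range(h - dy):
        for j in range(w - dx):
            m[q[i, j], q[i + dy, j + dx]] += 1
    return m / m.sum()

def glcm_features(p):
    """Standard Haralick-style contrast and homogeneity of a GLCM."""
    idx = np.arange(p.shape[0])
    diff = idx[:, None] - idx[None, :]
    contrast = float((p * diff ** 2).sum())
    homogeneity = float((p / (1.0 + np.abs(diff))).sum())
    return contrast, homogeneity

rng = np.random.default_rng(0)
mri_slice = rng.integers(0, 256, size=(32, 32))   # stand-in for an MRI slice
contrast, homogeneity = glcm_features(glcm(mri_slice))
print(contrast, homogeneity)
```

Feature vectors built this way (often over several offsets and angles) are what the SVM/KNN/CT comparison in Sect. 3 would consume.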
2.2 Deep Learning Techniques

Recently, deep learning has attracted computer vision experts to autism detection because of its higher discriminative power in representing the functional and structural characteristics of brain MRIs. Heinsfeld et al. [45] presented autism spectrum disorder (ASD) detection using a deep learning method based on the anti-correlation of brain function between the anterior and posterior areas of the brain. They used two stacked de-noising autoencoders for pre-training the network and a DNN for classification, to minimize the problem of subjectivity and generalizability in a large database. The highest anti-correlation was shown for the paracingulate gyrus and supramarginal gyrus of ASD subjects. Kong et al. [46] proposed an autoencoder-based DNN for autism spectrum disorder detection, using the connectivity features of each pair of ROIs as the raw input to a deep autoencoder. They used the Destrieux atlas to separate brain images into 148 cortical regions, constructed an individual network for each subject, and extracted the gray matter volume as features. The top-ranked features obtained from the F-score algorithm were given to the DNN classifier. Sherkatghanad et al. [47] explored a convolutional neural network that uses correlation-based features for ASD detection. They used a parallel CNN architecture to learn features from rs-fMRI images. Their method needs very few parameters and is less computationally expensive, but suffers from the data imbalance problem. Wang et al. [48] presented a multi-layer perceptron with ensemble learning (MLP-EL) based on a multi-atlas deep feature representation: a stacked de-noising autoencoder (SDA) is used for multi-atlas feature extraction from brain functional MRI, and MLP-EL for the classification of ASD. The effects of age and sex were not considered. Dvornek et al. [49] proposed recurrent neural networks with long short-term memory (LSTM) using fMRI time series for ASD detection, with the RNN-LSTM providing the feature representation of the fMRI time-series data. It gave better results for heterogeneous data, although anatomical regions with high influence may affect the network performance. Soussia and Rekik [50] explored autism detection using T1-weighted MRI based on high-order morphological network construction (HON) and a supervised ensemble classifier. In [51], fMRI has been used for the early diagnosis of ASD, which showed noteworthy improvement over sMRI-based ASD detection; structural information of distinct cortical regions was used to develop a morphological brain network (MBN). The comparative study of various machine learning and deep learning-based techniques for ASD is shown in Table 1.
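Several of the deep models above ([45–47]) consume functional connectivity features: the Pearson correlation between every pair of ROI time series, vectorized for the classifier. A small NumPy sketch, with random time series standing in for rs-fMRI signals and only 10 ROIs instead of the 148 Destrieux regions, is given below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy rs-fMRI data: time series of length 120 for 10 ROIs (real atlases
# such as Destrieux have ~148 regions).
n_rois, t_len = 10, 120
ts = rng.standard_normal((n_rois, t_len))

# Functional connectivity: Pearson correlation between every pair of ROI
# time series. The upper triangle (excluding the diagonal of 1s) is
# vectorized into the feature vector a DNN/autoencoder/CNN would consume.
conn = np.corrcoef(ts)
iu = np.triu_indices(n_rois, k=1)
features = conn[iu]
print(features.shape)  # (45,)
```

For n ROIs this yields n(n-1)/2 features, which is why dimensionality reduction (autoencoders, F-score ranking) precedes the classifier in the surveyed pipelines.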
3 Experimental Results and Discussions

This section presents the validation of several machine learning classifiers, namely Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Classification Tree (CT), for ASD detection using GLCM, Local Binary Pattern (LBP), and geometrical features. The extensive experiments
Table 1 Comparative analysis of the ASD techniques

| Author | Modality | Feature extraction | Classifier | Database | Evaluation metrics | Performance |
|---|---|---|---|---|---|---|
| Thabtah and Peebles [31] | Mobile application | Data collected from ASD test | Rule based machine learning (RML) | 40 person database | Accuracy | Adult: 95%, Child: 90%, Adolescent: 87% |
| Katuwal et al. [32] | sMRI | Morphometric and intensity features | Random forest classifier | 112 Non-ASD and 115 ASD patients data | Area under curve (%) | Intensity features (95%), cortical folding index (69%), cortical and subcortical volume (69%), and surface area (68%) |
| Sen et al. [33] | sMRI and fMRI | Structural and functional connectivity features | Autoencoder, support vector machine, PCA | ADHD-200 holdout data and ABIDE holdout data | Accuracy | 67.30% (ADHD-200 holdout data) and 64.30% (ABIDE holdout data) |
| Thomas et al. [34] | Resting-state functional magnetic resonance imaging (rs-fMRI) | Various regional and local connectivity features | 3D CNN and SVM | ABIDE-I and ABIDE-II | Accuracy (%) | SVM: 63%, DCNN: 62% |
| Kong et al. [46] | T1w MRI images | Connectivity features in each pair of ROI | Autoencoder and DNN | ABIDE-I | Accuracy, area under curve (AUC) | Accuracy: 90.39%, AUC: 97.38% |
| Sherkatghanad et al. [47] | rs-fMRI | Correlation features | CNN | ABIDE-I | Accuracy | 70.22% |
382 T. D. Dhamale and S. U. Bhandari
Table 1 (continued)

| Author | Modality | Feature extraction | Classifier | Database | Evaluation metrics | Performance |
|---|---|---|---|---|---|---|
| Dekhil et al. [35] | sMRI and rs-fMRI | Correlation matrix | K-nearest neighbor classifier | National Database of Autism Research (NDAR) | Accuracy | fMRI: 75%, sMRI: 79%, fMRI + sMRI: 81% |
| Wang et al. [48] | fMRI | Stacked denoising autoencoder (SDA) | Multi-layer perceptron and ensemble learning | ABIDE-I | Accuracy, sensitivity, specificity | 74.52%, 80.69%, 66.71% |
| Dvornek et al. [49] | fMRI | fMRI time series | RNN-LSTM | ABIDE | Accuracy | 68.5% |
| Soussia and Rekik [50] | T1-w MRI | High-order morphological network construction (HON) | Supervised ensemble classifier | ABIDE-I | Accuracy | 61.69% |
| Heinsfeld et al. [45] | Functional MRI (fMRI) | Posterior and anterior region functional anti-correlation | DNN | ABIDE-I | Accuracy, sensitivity, specificity | 70%, 74%, 63% |
Recent Trends in Automatic Autism Spectrum Disorder Detection … 383
are carried out on the sMRI images obtained from the ABIDE-I dataset, which consists of 544 ASD and 531 TD images. From each MRI volume, a single frame that shows all the representative brain regions is selected for the experimentation. The images are resized to 256 × 256 pixels. Out of the total number of samples, 70% are selected for training and 30% for testing. For the performance evaluation of ASD detection, this work selects generalized texture descriptors, namely GLCM and Local Binary Pattern (LBP) features, to represent the structural changes over the brain region due to ASD. Along with the texture features, several geometrical features are selected to characterize the shape changes on the brain volume caused by ASD. GLCM and LBP represent the textural information of the sMRI images, and the geometrical features are extracted for the corpus callosum region of the brain sMRI. A total of 12 GLCM features are considered for the experimentation: energy, entropy, homogeneity, contrast, autocorrelation, correlation, inverse difference, cluster shade, cluster prominence, mean, standard deviation, and dissimilarity. LBP provides a feature vector with 255 values representing the local texture variation over the LBP texture descriptor. Six geometrical features are considered to characterize the corpus callosum region of the brain: area, perimeter, major axis length, minor axis length, solidity, and extent. The performance of the different classifiers is validated using 5-fold, 10-fold, and 15-fold cross-validation accuracy. Table 2 provides the results of ASD detection on the ABIDE-I sMRI dataset. The experimental investigations show that, for GLCM features, the CT classifier provides significantly better results than KNN and SVM. Among all the investigated techniques, LBP-CT provides the highest 15-fold cross-validation accuracy.
The geometrical features provide 89.72%, 64.45%, and 56.76% accuracy for CT, KNN, and SVM, respectively, for 5-fold cross validation. It is noted that the performance of KNN, SVM, and CT depends on the type and length of the features.

Table 2 Experimental results for ASD on ABIDE-I (% Accuracy)

| Feature extraction technique | Number of features | K-fold | CT | KNN (K = 3) | SVM (Linear) |
|---|---|---|---|---|---|
| GLCM | 12 | 5 | 87.87 | 55.99 | 56.1 |
| | | 10 | 87.94 | 50.38 | 56.58 |
| | | 15 | 87.77 | 55.52 | 56.5 |
| LBP | 255 | 5 | 98.59 | 52.3 | 63.54 |
| | | 10 | 98.61 | 59.3 | 63.61 |
| | | 15 | 98.57 | 59.3 | 63.41 |
| Geometrical features | 6 | 5 | 89.72 | 64.45 | 52.76 |
| | | 10 | 89.59 | 64.20 | 56.1 |
| | | 15 | 89.68 | 62.30 | 56.12 |
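As an illustration of the GLCM texture descriptors discussed in this section, the following sketch computes a normalized, symmetric gray-level co-occurrence matrix and a few of the 12 listed features with NumPy. This is a simplified stand-in for a library routine, not the authors' implementation; the 4 × 4 toy image and the horizontal offset are invented for illustration.

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    # Count co-occurring gray-level pairs for offset (dy, dx),
    # then symmetrize and normalize to a probability matrix.
    M = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            M[img[y, x], img[y + dy, x + dx]] += 1
    M = M + M.T          # symmetric co-occurrences
    return M / M.sum()   # normalize to probabilities

def glcm_features(P):
    # A few of the standard GLCM descriptors used in the experiments.
    i, j = np.indices(P.shape)
    return {
        "energy":        float((P ** 2).sum()),
        "contrast":      float((P * (i - j) ** 2).sum()),
        "homogeneity":   float((P / (1.0 + (i - j) ** 2)).sum()),
        "dissimilarity": float((P * np.abs(i - j)).sum()),
    }

# Toy 4-level "image" standing in for a gray-scale sMRI frame.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
P = glcm(img, levels=4)
feats = glcm_features(P)
```

In practice the same descriptors would be computed over the resized 256 × 256 sMRI frames, typically averaging several offsets and directions.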
4 Conclusions

This paper has presented a survey of recent ML- and DL-based techniques for ASD detection and classification using brain MRI images. In manual diagnosis, the performance of ASD detection relies heavily on the human expert's knowledge and requires considerable time. There is therefore a need for automatic ASD detection using computational algorithms, which can reduce the screening time and overcome the fatigue occurring during expert observation. For traditional machine learning based techniques, the performance of the system depends upon the type of input images, the size of the database, and the feature extraction and pre-processing techniques; hence, there is a need for a generalized architecture for ASD detection that works irrespective of the database, features, and input size and type. Various DL-based techniques have shown improvements over ML-based techniques for ASD detection and classification, offering better correlation, connectivity, and improved representation of the raw data. However, many DL-based ASD detection systems suffer from the class imbalance problem because of unequal numbers of normal and ASD samples. Long training and testing times and the complexity of hyper-parameter tuning of DL algorithms remain major challenges for computer vision-based ASD detection using MRI images.
References

1. Autism Spectrum Disorder. https://www.cdc.gov/ncbddd/autism/index.html. Accessed 10 Feb 2021
2. Lord C, Cook EH, Leventhal BL, Amaral DG (2000) Autism spectrum disorders. Neuron 28(2):355–363
3. Faras H, Al Ateeqi N, Tidmarsh L (2010) Autism spectrum disorders. Annals Saudi Med 30(4):295–300
4. Lauritsen MB (2013) Autism spectrum disorders. Eur Child Adolesc Psychiatr 22(1):37–42
5. Shaw KA, Maenner MJ, Baio J (2020) Early identification of autism spectrum disorder among children aged 4 years—Early Autism and Developmental Disabilities Monitoring Network, six sites, United States, 2016. MMWR Surveill Summ 69(3)
6. Wieckowski AT, White SW (2017) Eye-gaze analysis of facial emotion recognition and expression in adolescents with ASD. J Clin Child Adolesc Psychol 46(1):110–124
7. Leo M, Carcagnì P, Distante C, Spagnolo P, Mazzeo PL, Rosato AC, Petrocchi S, Pellegrino C, Levante A, De Lumè F, Lecciso F (2018) Computational assessment of facial expression production in ASD children. Sensors 18(11):3993
8. Loth E, Garrido L, Ahmad J, Watson E, Duff A, Duchaine B (2018) Facial expression recognition as a candidate marker for autism spectrum disorder: how frequent and severe are deficits? Mol Autism 9(1):1–11
9. Rad NM, Kia SM, Zarbo C, van Laarhoven T, Jurman G, Venuti P, Marchiori E, Furlanello C (2018) Deep learning for automatic stereotypical motor movement detection using wearable sensors in autism spectrum disorders. Signal Process 144:180–191
10. Rodrigues JL, Gonçalves N, Costa S, Soares F (2013) Stereotyped movement recognition in children with ASD. Sens Actuat A 202:162–169
11. Sonawane A, Inamdar MU, Bhangale KB (2017) Sound based human emotion recognition using MFCC & multiple SVM. In: 2017 international conference on information, communication, instrumentation and control (ICICIC). IEEE, pp 1–4
12. Bonneh YS, Levanon Y, Dean-Pardo O, Lossos L, Adini Y (2011) Abnormal speech spectrum and increased pitch variability in young autistic children. Front Hum Neurosci 4:237
13. Bhangale KB, Mohanaprasad K (2021) A review on speech processing using machine learning paradigm. Int J Speech Technol 1–22
14. Bosl WJ, Tager-Flusberg H, Nelson CA (2018) EEG analytics for early detection of autism spectrum disorder: a data-driven approach. Sci Rep 8(1):1–20
15. Jones CRG, Pickles A, Falcaro M, Marsden AJS, Happé F, Scott SK, Sauter D et al (2011) A multimodal approach to emotion recognition ability in autism spectrum disorders. J Child Psychol Psychiatr 52(3):275–285
16. Peacock G, Amendah D, Ouyang L, Grosse SD (2012) Autism spectrum disorders and health care expenditures: the effects of co-occurring conditions. J Dev Behav Pediatr 33(1):2–8
17. Cody H, Pelphrey K, Piven J (2002) Structural and functional magnetic resonance imaging of autism. Int J Dev Neurosci 20(3–5):421–438
18. Rakić M, Cabezas M, Kushibar K, Oliver A, Lladó X (2020) Improving the detection of autism spectrum disorder by combining structural and functional MRI information. NeuroImage: Clin 25:102181
19. Conti E, Retico A, Palumbo L, Spera G, Bosco P, Biagi L, Fiori S et al (2020) Autism spectrum disorder and childhood apraxia of speech: early language-related hallmarks across structural MRI study. J Personal Med 10(4):275
20. de Belen RA, Bednarz T, Sowmya A, Del Favero D (2020) Computer vision in autism spectrum disorder research: a systematic review of published studies from 2009 to 2019. Transl Psychiatr 10(1):1–20
21. Chen R, Jiao Y, Herskovits EH (2011) Structural MRI in autism spectrum disorder. Pediatr Res 69(8):63–68
22. Eliez S, Reiss AL (2000) MRI neuroimaging of childhood psychiatric disorders: a selective review. J Child Psychol Psychiatry 41:679–694
23. Sparks BF, Friedman SD, Shaw DW, Aylward EH, Echelard D, Artru AA, Maravilla KR, Giedd JN, Munson J, Dawson G, Dager SR (2002) Brain structural abnormalities in young children with autism spectrum disorder. Neurology 59:184–192
24. Courchesne E, Karns CM, Davis HR, Ziccardi R, Carper RA, Tigue ZD, Chisum HJ, Moses P, Pierce K, Lord C, Lincoln AJ, Pizzo S, Schreibman L, Haas RH, Akshoomoff NA, Courchesne RY (2001) Unusual brain growth patterns in early life in patients with autistic disorder: an MRI study. Neurology 57:245–254
25. Boddaert N, Chabane N, Gervais H, Good CD, Bourgeois M, Plumet M-H, Barthélémy C, Mouren M-C, Artiges E, Samson Y, Brunelle F, Frackowiak RS, Zilbovicius M (2004) Superior temporal sulcus anatomical abnormalities in childhood autism: a voxel-based morphometry MRI study. Neuroimage 23:364–369
26. McAlonan GM, Daly E, Kumari V, Critchley HD, van Amelsvoort T, Suckling J, Simmons A, Sigmundsson T, Greenwood K, Russell A, Schmitz N, Happe F, Howlin P, Murphy DG (2002) Brain anatomy and sensorimotor gating in Asperger's syndrome. Brain 125:1594–1606
27. Schumann CM, Bloss CS, Barnes CC, Wideman GM, Carper RA, Akshoomoff N, Pierce K, Hagler D, Schork N, Lord C, Courchesne E (2010) Longitudinal magnetic resonance imaging study of cortical development through early childhood in autism. J Neurosci 30:4419–4427
28. Jiao Y, Chen R, Ke X, Chu K, Lu Z, Herskovits EH (2010) Predictive models of autism spectrum disorder based on brain regional cortical thickness. Neuroimage 50:589–599
29. Müller R-A, Shih P, Keehn B, Deyoe JR, Leyden KM, Shukla DK (2011) Underconnected, but how? A survey of functional connectivity MRI studies in autism spectrum disorders. Cereb Cortex 21(10):2233–2243
30. Ismail MM, Keynton RS, Mostapha MM, ElTanboly AH, Casanova MF, Gimel'Farb GL, El-Baz A (2016) Studying autism spectrum disorder with structural and diffusion magnetic resonance imaging: a survey. Front Hum Neurosci 10:211
31. Thabtah F, Peebles D (2020) A new machine learning model based on induction of rules for autism detection. Health Inform J 26(1):264–286
32. Katuwal GJ, Baum SA, Michael AM Early brain imaging can predict autism: application of machine learning to a clinical imaging archive. 471169
33. Sen B, Borle NC, Greiner R, Brown MR (2018) A general prediction model for the detection of ADHD and Autism using structural and functional MRI. PloS one 13(4):e0194856
34. Thomas RM, Gallo S, Cerliani L, Zhutovsky P, El-Gazzar A, Van Wingen G (2020) Classifying autism spectrum disorder using the temporal statistics of resting-state functional MRI data with 3D convolutional neural networks. Front Psychiatr 11:440
35. Dekhil O, Ali M, El-Nakieb Y, Shalaby A, Soliman A, Switala A, Mahmoud A et al (2019) A personalized autism diagnosis CAD system using a fusion of structural MRI and resting-state functional MRI data. Front Psychiatr 10:392
36. Pereira AM, Campos BM, Coan AC, Pegoraro LF, De Rezende TJ, Obeso I, Dalgalarrondo P, Da Costa JC, Dreher JC, Cendes F (2018) Differences in cortical structure and functional MRI connectivity in high functioning autism. Front Neurol 9:539
37. Manciu FS, Lee KH, Durrer WG, Bennet KE (2013) Detection and monitoring of neurotransmitters—A spectroscopic analysis. Neuromodulation: Technol Neural Interface 16(3):192–199
38. Badgaiyan RD (2014) Imaging dopamine neurotransmission in live human brain. Prog Brain Res 211:165–182
39. Hugdahl K, Beyer MK, Brix M, Ersland L (2012) Autism spectrum disorder, functional MRI and MR spectroscopy: possibilities and challenges. Microb Ecol Health Dis 23(1):18960
40. Liu X, Wu Q, Zhao W, Luo X (2017) Technology-facilitated diagnosis and treatment of individuals with autism spectrum disorder: an engineering perspective. Appl Sci 7(10):1051
41. Peng X, Lin P, Zhang T, Wang J (2013) Extreme learning machine-based classification of ADHD using brain structural MRI data. PLoS ONE 8(11):e79476
42. Cheng W, Ji X, Zhang J, Feng J (2012) Individual classification of ADHD patients by integrating multiscale neuroimaging markers and advanced pattern recognition techniques. Front Syst Neurosci 6:58
43. Calderoni S, Retico A, Biagi L, Tancredi R, Muratori F, Tosetti M (2012) Female children with autism spectrum disorder: an insight from mass-univariate and pattern classification analyses. Neuroimage 59(2):1013–1022
44. Ecker C, Rocha-Rego V, Johnston P, Mourao-Miranda J, Marquand A, Daly EM, Brammer MJ, Murphy C, Murphy DG (2010) MRC AIMS Consortium. Investigating the predictive value of whole-brain structural MR scans in autism: a pattern classification approach. Neuroimage 49(1):44–56
45. Heinsfeld AS, Franco AR, Craddock RC, Buchweitz A, Meneguzzi F (2018) Identification of autism spectrum disorder using deep learning and the ABIDE dataset. NeuroImage: Clin 17:16–23
46. Kong Y, Gao J, Yunpei X, Pan Y, Wang J, Liu J (2019) Classification of autism spectrum disorder by combining brain connectivity and deep neural network classifier. Neurocomputing 324:63–68
47. Sherkatghanad Z, Akhondzadeh M, Salari S, Zomorodi-Moghadam M, Abdar M, Acharya UR, Khosrowabadi R, Salari V (2020) Automated detection of autism spectrum disorder using a convolutional neural network. Front Neurosci 13:1325
48. Wang Y, Wang J, Wu FX, Hayrat R, Liu J (2020) AIMAFE: autism spectrum disorder identification with multi-atlas deep feature representation and ensemble learning. J Neurosci Methods 108840
49. Dvornek NC, Ventola P, Pelphrey KA, Duncan JS (2017) Identifying autism from resting-state fMRI using long short-term memory networks. In: International workshop on machine learning in medical imaging. Springer, Cham, pp 362–370
50. Soussia M, Rekik I (2018) Unsupervised manifold learning using high-order morphological brain networks derived from T1-w MRI for autism diagnosis. Front Neuroinform 12:70
51. Karunakaran P, Hamdan YB (2020) Early prediction of autism spectrum disorder by computational approaches to fMRI analysis with early learning technique. J Artif Intell 2(04):207–216
A Pipeline for Business Intelligence and Data-Driven Root Cause Analysis on Categorical Data

Shubham Thakar and Dhananjay Kalbande
Abstract Business intelligence (BI) is any knowledge derived from existing data that may be strategically applied within a business. Data mining is a technique for extracting BI from data using statistical data modeling: finding relationships or correlations between the various data items that have been collected can be used to boost business performance, or at the very least to better comprehend what is going on. Root cause analysis (RCA) is the process of discovering the root causes of problems or events in order to identify appropriate solutions. RCA can show why an event occurred, which can help in avoiding occurrences of the issue in the future. This paper proposes a new clustering + association rule mining pipeline for extracting business insights from data. The results of this pipeline are association rules with consequents, antecedents, and various metrics to evaluate these rules. These results can help in anchoring important business decisions and can also be used by data scientists when updating existing models or developing new ones. The occurrence of any event is explained by its antecedents in the generated rules, so this output can also support data-driven root cause analysis.

Keywords Data mining · Association rules · Root cause analysis · Business intelligence · Categorical data
S. Thakar (B) · D. Kalbande
Department of Computer Engineering, Sardar Patel Institute of Technology, Andheri (West), Mumbai, India
e-mail: [email protected]
D. Kalbande
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_28

1 Introduction

BI is a broad term that comprises data mining, process analysis, and descriptive analytics. The main aim of BI is to utilize all the generated data and present informative reports which can anchor business decisions. In this paper, we propose an approach
that can be used to extract business insights from data. The results of this pipeline take the form of association rules, which can also be used for root cause analysis, i.e., understanding why a particular event occurred. We construct a graph G by considering each row to be a node and connecting two nodes if the cosine similarity between them is larger than a threshold. We run the hierarchical Louvain algorithm [1] on graph G to detect mutually exclusive communities within G, and choose the best communities based on their strength and the number of nodes present in each. The strength of a community is estimated as the ratio of the sum of weights of its intra-community edges to the sum of weights of all edges in the community. We then apply the apriori algorithm to the best communities to mine frequent itemsets, association rules, and various metrics, such as support, confidence, and lift, which are used to evaluate the rules. Typically, for a large dataset, the number of association rules generated is in the order of millions; it becomes difficult to get business insights from these raw rules because they are highly redundant. Hence, we process them using the consequent-based association rule summarization technique, which summarizes the rules based on common consequents and can reduce millions of rules to a few rule summaries. The paper is organized as follows. A literature survey of existing techniques for clustering, association rule mining, and summarizing association rules is given in Sect. 2. Section 3 briefly explains the Louvain algorithm and association rule mining. Section 4 discusses the proposed method for extracting business insights from data. Section 5 describes the dataset on which the proposed method is applied, and Sect. 6 discusses the results obtained. Section 7 concludes the paper.
2 Literature Survey

This section refers to existing research in the fields of root cause analysis and data mining. Various techniques used by researchers for root cause analysis are discussed here, along with approaches for extracting insights via data mining.
2.1 Community Detection

There are multiple ways to cluster categorical data; one of them is the Kmodes algorithm. In [2], researchers discuss the issues with the Kmodes algorithm and propose a community detection-based clustering approach. They compared the clustering results of Kmodes and community detection and concluded that community detection gives better results. However, the approach suggested in [2] requires the user to input the number of clusters. In this paper, we use a modification of that approach which does not require any assumptions about the data/graph.
2.2 Data to Graph

In [3], the authors suggest various neighborhood-based methods as well as minimum spanning tree-based methods for converting a dataset to graph form. These methods include the epsilon ball method, k-nearest neighbors, and continuous k-nearest neighbors. We have used the widely used epsilon ball method for converting the dataset to graphical form.
2.3 Business Insights Via Data Mining

The issue with association rule mining is that it produces a huge number of redundant rules, which makes it nearly impossible to gain insights from the raw rules. To tackle this issue, the number of produced rules can be reduced by summarizing them. For example, in [4] the authors argue that the issue is not the high number of rules that data mining techniques generate, but rather the way these rules are summarized and presented; they propose organizing the rules in a multilevel format and then summarizing them to eliminate redundant rules. Other approaches include clustering similar association rules and using the concept of rule cover to prune association rules [5]. To identify unique and essential association rules, some researchers have devised a representative basis, along with an efficient way of computing it [6]. In [7], the researchers discuss the drawbacks of association rule mining and propose a simple but effective consequent-based association rule summarization technique, which is used in this paper to filter out redundant rules. Another method for extracting insights from categorical data and understanding relations between columns is to convert the categorical variables to continuous ones using ordinal encoding [16] and then use a heatmap [11] to find relations between the columns. However, this method cannot be applied to columns that are not ordinal. Our proposed method solves this issue, since it can be applied to ordinal as well as nominal columns.
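As a small illustration of the ordinal-encoding approach mentioned above (and of why it needs an ordering to be meaningful), the following sketch maps two hypothetical ordinal columns to integers and computes their Pearson correlation. The column values and orderings are invented for illustration.

```python
from statistics import fmean

def pearson(xs, ys):
    # Plain Pearson correlation coefficient; assumes non-constant inputs.
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical ordinal columns from a categorical dataset.
satisfaction = ["low", "medium", "high", "high", "medium", "low"]
price_tier   = ["cheap", "mid", "premium", "premium", "mid", "cheap"]

# Ordinal encoding: the level order is meaningful, so map levels to integers.
sat_order   = {"low": 0, "medium": 1, "high": 2}
price_order = {"cheap": 0, "mid": 1, "premium": 2}

r = pearson([sat_order[v] for v in satisfaction],
            [price_order[v] for v in price_tier])
```

For a nominal column (e.g., country of origin) no such order exists, so any integer mapping would produce an arbitrary correlation, which is exactly the limitation the proposed pipeline avoids.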
2.4 Data Driven Root Cause Analysis

In [12], the authors point out that state-of-the-art work on RCA suffers from issues of interpretability, accuracy, and scalability. They propose an approach where an LSTM autoencoder is used to find anomalies and SHAP, a game theory based method, is used to find the root cause of the issue in a fog computing environment. Along similar lines, the authors in [13] use a process mining based approach for securing IoT systems. In [14], the authors present two data-driven methods for RCA: sequential state switching and artificial anomaly association. These approaches have
been developed for distributed system anomalies and have been tested using synthetic data and real-world data based on the Tennessee Eastman process (TEP). All these data-driven RCA and mining approaches are specific to a particular domain and lack generality. The proposed solution aims to provide a general technique that can be applied to any categorical dataset.
3 Preliminaries

This section gives a brief introduction to Louvain's community detection algorithm and explains why it was preferred over Kmodes. It also briefly explains association rule mining.
3.1 Louvain's Community Detection

The Louvain method [1], a hierarchical clustering method first published in 2008, has become a popular choice for community detection. It is a bottom-up folding technique that optimizes modularity [15]. The method is divided into passes, each of which consists of the repetition of two phases. Every node is first assigned its own community, so in the initial phase there are as many communities as nodes. The algorithm then assesses, for every node, the gain in modularity obtained by moving it from its community to that of a neighbor. If there is a positive gain, the node is assigned to that community; otherwise, it stays in the community to which it was originally assigned. This procedure is applied to all nodes consecutively and repeatedly until no further increase in modularity is possible, at which point the first pass is finished. Over the years, many other clustering techniques have been discovered and suggested (for a recent survey, see [8]). The k-modes technique was created by Huang [9] to cluster categorical data. It is a partitional clustering method and needs the number of clusters as an input. There are methods, such as the elbow method, for finding the optimal number of clusters, but they typically return a smooth curve, making it challenging to detect the elbow. Furthermore, the Kmodes algorithm is quite sensitive to the initial cluster centers selected, and a poor selection may lead to very undesirable cluster configurations. The Louvain algorithm does not require any assumptions about the data/graph and is deterministic, in contrast to conventional Kmodes with random initialization. Moreover, it is faster than its peers and runs in O(n log n). After considering these factors, we have used the Louvain community detection algorithm in the proposed pipeline.
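The local-moving phase described above can be sketched in pure Python. This is an illustrative toy version, not the full Louvain algorithm: it omits the second (graph aggregation) phase and recomputes modularity naively on every candidate move, which is only sensible for tiny graphs. The example graph, two triangles joined by a single bridge edge, is invented for illustration.

```python
# Toy weighted graph: two triangles joined by the bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {}
for u, v in edges:
    adj.setdefault(u, {})[v] = 1.0
    adj.setdefault(v, {})[u] = 1.0
m = sum(w for nbrs in adj.values() for w in nbrs.values()) / 2  # total edge weight

def modularity(part):
    # Q = (1/2m) * sum_ij [A_ij - k_i k_j / 2m] * delta(c_i, c_j)
    deg = {u: sum(adj[u].values()) for u in adj}
    q = 0.0
    for u in adj:
        for v in adj:
            if part[u] == part[v]:
                q += adj[u].get(v, 0.0) - deg[u] * deg[v] / (2 * m)
    return q / (2 * m)

# Phase 1: start with singleton communities, then repeatedly move each node
# to the neighbouring community that yields the largest modularity gain.
part = {u: u for u in adj}
improved = True
while improved:
    improved = False
    for u in adj:
        best_c, best_q = part[u], modularity(part)
        for v in adj[u]:
            old = part[u]
            part[u] = part[v]              # tentatively move u
            q = modularity(part)
            if q > best_q + 1e-12:
                best_c, best_q = part[v], q
            part[u] = old                  # undo the tentative move
        if best_c != part[u]:
            part[u] = best_c
            improved = True

communities = {}
for u, c in part.items():
    communities.setdefault(c, set()).add(u)
```

On this graph the pass separates the two triangles into two communities, which is the behavior the full algorithm would then exploit by collapsing each community into a super-node for the next pass.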
3.2 Association Rule Mining

Association rule mining, a data mining technique, helps unearth intriguing relationships in data [17]. Market basket analysis [18], which identifies products that are typically bought together in a supermarket, is a problem where association rule mining is often applied. For example, the rule {eggs, butter} → {bread} found in the sales data of a supermarket would indicate that if a customer buys eggs and butter together, they are likely to also buy bread. Association rule generation is split into two steps:

1. A minimum support threshold is provided, based on which frequent itemsets are filtered.
2. A minimum confidence threshold is provided, based on which rules are generated from the frequent itemsets.

For a given rule X → Y, support is the frequency with which X and Y occur together in a transaction. Confidence is the conditional probability of Y given that X is already present in the transaction. The issue with confidence is that if the support of Y in the dataset is high, then all rules having Y as consequent will have high confidence. Hence, we use another metric called lift, which shows whether X and Y are positively correlated (given X, the probability of Y being present increases) or negatively correlated. X and Y are said to be positively correlated if the value of lift is greater than 1.
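The three metrics can be computed directly from a toy transaction set. The following minimal sketch reuses the {eggs, butter} → {bread} example; the transactions themselves are invented for illustration.

```python
# Toy market-basket transactions (invented for illustration).
transactions = [
    {"eggs", "butter", "bread"},
    {"eggs", "butter", "bread"},
    {"eggs", "butter"},
    {"milk", "sugar"},
    {"eggs", "milk"},
]

def support(itemset):
    # Fraction of transactions containing every item of the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # P(consequent | antecedent)
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    # > 1 means positively correlated, < 1 negatively correlated.
    return confidence(antecedent, consequent) / support(consequent)

X, Y = {"eggs", "butter"}, {"bread"}
```

Here support(X ∪ Y) = 2/5, confidence(X → Y) = 2/3, and lift = (2/3)/(2/5) > 1, so buying eggs and butter raises the probability of buying bread in this toy data.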
4 Methodology

This section explains the proposed method, which consists of three steps: converting the data to a graph, modularity-based community detection, and association rule mining with summarization. Figure 1 shows the steps involved in the method.
4.1 Graph Conversion

The proposed method uses the epsilon ball method for converting the dataset to a graph G. We consider each row to be a node and draw a weighted edge between two nodes if the cosine similarity between the corresponding rows is greater than epsilon.
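A minimal sketch of this conversion in Python, assuming the categorical rows have already been one-hot encoded; the rows and the epsilon value below are invented for illustration.

```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two equal-length numeric vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def epsilon_ball_graph(rows, eps):
    # Each row is a node; connect two nodes with a weighted edge
    # when their cosine similarity exceeds eps.
    edge_list = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            sim = cosine(rows[i], rows[j])
            if sim > eps:
                edge_list.append((i, j, sim))
    return edge_list

# Hypothetical one-hot encoded categorical rows.
rows = [
    [1, 0, 1, 0],   # node 0
    [1, 0, 1, 1],   # node 1: similar to node 0
    [0, 1, 0, 1],   # node 2: dissimilar to both
]
g_edges = epsilon_ball_graph(rows, eps=0.7)
```

The choice of epsilon controls graph density: a higher epsilon keeps only the strongest similarities, which in turn yields tighter communities in the next step.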
4.2 Modularity-Based Community Detection

On graph G, we apply the Louvain technique [1] to find strongly related groups of nodes. The best communities are then filtered based on strength and size. The strength of a community is calculated using the formula stated in Fig. 2: communities with more intra-community edges of higher weight will generally have higher strength. Figure 2 depicts an example of the strength calculation. Note that modularity [15] is a measure for the complete graph and not for individual clusters; hence, we define a new measure to compare the strength of clusters. Figure 3 explains the steps involved in detecting the top clusters for applying association rule mining.

Fig. 1 Methodology

Fig. 2 Strength calculation
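The strength measure described above, the ratio of intra-community edge weight to the total weight of edges touching the community, can be sketched as follows; the edge list and community are hypothetical.

```python
def community_strength(edge_list, members):
    # edge_list: (u, v, weight) triples; members: set of nodes in the community.
    intra = sum(w for u, v, w in edge_list if u in members and v in members)
    total = sum(w for u, v, w in edge_list if u in members or v in members)
    return intra / total if total else 0.0

# Hypothetical weighted graph: a triangle community {0, 1, 2}
# with one weaker edge leaving the community.
edge_list = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0), (2, 3, 0.5)]
s = community_strength(edge_list, {0, 1, 2})
```

A fully self-contained community would score 1.0; the outgoing edge here pulls the strength down to 3.0/3.5.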
Fig. 3 Graph conversion and community detection
4.3 Association Rule Mining and Summarization

One issue with association rule mining is that it generates too many rules, since it considers all permutations and combinations. Many of the generated rules carry the same information and are redundant. Generally, the number of essential rules is very small compared to the number of redundant rules, which makes it difficult to gain insights from the raw rules. We therefore process these rules using the consequent-based association rule summarization (CARS) [7] technique to filter the essential rules. The CARS [7] technique follows three steps:

1. Filter rules having only one consequent.
2. Perform a group-by operation on the consequents so that only unique consequents remain, and rank the antecedents based on their frequency of occurrence.
Table 1 Association rule summarization

| Antecedent | Consequent | Support (%) | Confidence (%) | Lift |
|---|---|---|---|---|
| (A, B) | C | 20 | 50 | 1.5 |
| (A) | C | 30 | 70 | 2.4 |
| ((A, 2), (B, 1)) | C | [20, 30] | [50, 70] | [1.5, 2.4] |
3. Display the minimum and maximum of each interestingness measure with the rule summary.

Table 1 shows an example of applying the CARS [7] technique to raw rules. Using this method, a huge number of association rules is reduced to a small number of rule summaries, each of which carries only one unique consequent. Firstly, since most business users assess a rule's usefulness by assessing the relevance of its consequent, it makes sense to have only one consequent; this makes it easier to focus on rules with relevant consequents. Secondly, the antecedents in each rule summary are ranked based on their correlation with the consequent; e.g., in the rule summary in Table 1, A is more closely related to C than B, since the frequency count associated with A is higher. This helps in determining which antecedents are most closely related to the consequent. Lastly, each rule summary also carries a range for each interestingness measure, which again helps in evaluating the rule summary.
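The three CARS steps can be sketched in pure Python. This is an illustrative re-implementation based on the description above, not the authors' code; the raw rules mirror the example in Table 1 (a third, two-consequent rule is added to show the filtering in step 1).

```python
from collections import Counter, defaultdict

# Raw rules as (antecedent, consequent, support %, confidence %, lift).
rules = [
    (frozenset({"A", "B"}), frozenset({"C"}),      20, 50, 1.5),
    (frozenset({"A"}),      frozenset({"C"}),      30, 70, 2.4),
    (frozenset({"A"}),      frozenset({"C", "D"}), 25, 60, 2.0),  # dropped in step 1
]

def cars_summarize(rules):
    groups = defaultdict(lambda: {"ante": Counter(), "sup": [], "conf": [], "lift": []})
    for ante, cons, sup, conf, lift in rules:
        if len(cons) != 1:                 # step 1: keep single-consequent rules only
            continue
        g = groups[next(iter(cons))]       # step 2: group by the unique consequent
        g["ante"].update(ante)             # ... and count antecedent frequencies
        g["sup"].append(sup)
        g["conf"].append(conf)
        g["lift"].append(lift)
    summaries = {}
    for cons, g in groups.items():         # step 3: min/max of each measure
        summaries[cons] = {
            "antecedents": g["ante"].most_common(),   # ranked by frequency
            "support":    (min(g["sup"]),  max(g["sup"])),
            "confidence": (min(g["conf"]), max(g["conf"])),
            "lift":       (min(g["lift"]), max(g["lift"])),
        }
    return summaries

summary = cars_summarize(rules)
```

On this input the two single-consequent rules collapse into one summary for C with ranked antecedents ((A, 2), (B, 1)) and the metric ranges shown in the last row of Table 1.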
5 Dataset

We have tested our method on a private skin ointment dataset that contains both medical and beauty products. Some important columns include category, manufacturer name, country of origin, area of application, rating (numeric), price (numeric), is prescription needed (boolean), side effects (boolean), country of sale, male/female, marketer name, skin type, target disease, and texture. The processed data has 14 columns and around 10 thousand rows. We converted the two numeric attributes to categorical ones using suitable intervals. The dataset was then converted to graphical form using the epsilon ball method.
6 Results

Applying Louvain's algorithm to the data resulted in 3 clusters. Out of the 3 clusters, rule mining was applied on cluster 2 since it had higher strength and a good number of nodes. With a support threshold of 20% and a confidence threshold of 50%, more than a million rules were generated. These rules were summarized using the CARS [7]
A Pipeline for Business Intelligence and Data-Driven Root …

Table 2 Extracting important features

Antecedents                                             Consequent
Manufac name   Country   Category   Area of app         Rating
Lifeline       USA       Beauty     Scalp               4–5
Garnier        France    Beauty     Face                3–4
J&J            USA       Medical    Body                2–3

Table 3 Root cause analysis

Antecedents                                             Consequent
Category   Manufac name   Area of app   Country         Price (usd)
Beauty     Lifeline       Face          USA             200 max
technique into 25 rule summaries. The most important rule summaries are presented in Tables 2 and 3. Table 2 represents all rule summaries having ratings as consequents, and the top antecedents for those consequents. These results showcase that if a data scientist wants to train a model to predict the rating, then manufacturer name, country of origin, category, and area of application are the top features that need to be considered. Table 3 represents a rule summary that has the highest price category as its consequent. This summary can help in understanding the root cause behind a skin product being in the highest price category. It suggests that if a product is of the beauty category, was manufactured by Lifeline, is applied on the face, and the country of origin is the USA, then it will be costly.
7 Conclusion

This paper has presented a method for effective root cause analysis and business insight generation using community detection-based clustering and association rule mining. This paper uses the epsilon ball method for converting data to a graph format; however, other methods such as kNN, CkNN, etc., mentioned earlier, can be explored. Louvain community detection is preferred over K-modes as it is a hierarchical clustering method and allows finding clusters without any assumptions on the data/graph. Since the apriori algorithm has a high time complexity, clustering helps in filtering out data points and thus reducing the time taken by apriori. Moreover, clustering also brings similar data points together, which helps in getting essential rules even at higher support thresholds. Association rule mining is then applied on the top clusters and the results are summarized using the CARS [7] summarization technique. These summarized rules can help in root cause analysis, business insight generation, and understanding the correlations between the various features of a categorical dataset.
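The clustering step of this pipeline can be sketched with NetworkX's Louvain implementation (available in networkx >= 2.8; the toy graph and the size-based choice of "top" community are illustrative):

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Toy graph standing in for the epsilon-ball graph of the dataset:
# two dense groups joined by a single weak bridge.
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (0, 2),    # dense group A
                  (3, 4), (4, 5), (3, 5),    # dense group B
                  (2, 3)])                   # weak bridge between them

communities = louvain_communities(G, seed=42)
print([sorted(c) for c in communities])

# Rule mining (e.g. apriori, as in mlxtend.frequent_patterns) is then run
# only on the rows of the top community, shrinking apriori's input and
# hence its runtime.
top = max(communities, key=len)
```

On this toy graph the two triangles are recovered as separate communities, with the bridge edge left between them.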
References

1. Blondel VD, Guillaume J, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech: Theory Exp 2008(10), Article ID P10008
2. Nguyen HH (2017) Clustering categorical data using community detection techniques. Hindawi Comput Intell Neurosci 2017:11, Article ID 8986360. https://doi.org/10.1155/2017/8986360
3. Liu Z, Barahona M (2020) Graph-based data clustering via multiscale community detection. Appl Netw Sci 5:3. https://doi.org/10.1007/s41109-019-0248-7
4. Liu B, Hu M, Hsu W (2000) Multi-level organization and summarization of the discovered rules. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining, pp 208–217
5. Toivonen H, Klemetinen M, Ronkainen P, Hatonen K, Mannila H (1995) Pruning and grouping discovered association rules. In: Proceedings of the MLnet workshop on statistics, machine learning, and discovery in databases, pp 47–52
6. Luong VP (2001) The representative basis for association rules. In: Proceedings of the IEEE international conference on data mining, pp 639–640
7. Tan SC, Sim BH (2014) A pragmatic approach to summarize association rules in business analytics projects. In: Cheng SM, Day MY (eds) Technologies and applications of artificial intelligence. TAAI 2014. Lecture notes in computer science, vol 8916. Springer, Cham
8. Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recognit Lett 31(8):651–666
9. Huang Z (1998) Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining Knowl Discovery 2(3):283–304
10. Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174
11. Seaborn heatmap. https://seaborn.pydata.org/generated/seaborn.heatmap.html. Accessed 31 Aug 2022
12. Bulla C, Birje MN (2021) Improved data-driven root cause analysis in fog computing environment. J Reliable Intell Environ. https://doi.org/10.1007/s40860-021-00158-x
13. Shakya S (2020) Process mining error detection for securing the IoT system. J ISMAC 2(03):147–153
14. Liu C, Lore KG, Sarkar S (2017) Data-driven root-cause analysis for distributed system anomalies. In: 2017 IEEE 56th annual conference on decision and control (CDC), pp 5745–5750. https://doi.org/10.1109/CDC.2017.8264527
15. Newman MEJ (2006) Modularity and community structure in networks. Proc Natl Acad Sci 103(23):8577–8582. https://doi.org/10.1073/pnas.0601602103
16. Scikit Learn Ordinal Encoding. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html. Accessed 31 Aug 2022
17. Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the ACM conference on management of data, pp 207–216
18. Tan SC, Lau PS (2013) Time series clustering: a superior alternative for market basket analysis. In: Proceedings of the first international conference on advanced data and information engineering
Non-knowledge Based Decision Support System N. L. Taranath, B. P. Aniruddha Prabhu, Rakesh Dani, Devesh Tiwari, and L. M. Darshan
Abstract The Medical Decision Support System (MDSS) encodes knowledge and supports optimal therapeutic decisions, assisting physicians, clinicians, and other healthcare professionals with targeted medical knowledge and person-specific information when making clinical decisions, thereby reducing prescribing errors. MDSSs are frequently classified as knowledge-based or learning-based. Knowledge-based systems rely on human-engineered mappings from literature-based, patient-oriented, and practice-based data to recommendations; learning-based systems derive such mappings using machine learning (ML), artificial intelligence (AI), or statistical pattern recognition. This study offers a merged decision-support framework that combines a knowledge-based system with a learning-based approach, providing medical aid for decision-making even for events with missing information, and thus a powerful solution to the information challenge. The work integrates data mining of knowledge bases (KB) with AI to support a combined module for integrated medical decisions. In healthcare choices around drug prescribing and evolving medical conditions, inexperienced medical providers can use such an infrastructure to address highly complex issues. The rule-based system has its drawbacks: the knowledge base uses an explicit system to administer patent medicines and requires knowledge of each field. However, where the knowledge-base data is unavailable, machine learning strategies are used to respond to the query. The framework is query-

N. L. Taranath (B)
Department of Computer Science and Engineering, Alliance College of Engineering and Design, Alliance University, Bengaluru, India
e-mail: [email protected]

B. P. A. Prabhu · D. Tiwari
Department of CS & E, Graphic Era Hill University, Dehradun, Uttarakhand, India

R. Dani
Department of Hospitality Management, Graphic Era Deemed to be University, Dehradun, Uttarakhand, India

L. M. Darshan
School of CSE, REVA University, Bengaluru, Karnataka, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_29
N. L. Taranath et al.
oriented and can be adapted for various user interfaces such as desktops, web browsers, and mobile apps.

Keywords Medical decision support system · Knowledge-based systems · Non-knowledge-based systems · Aggregations · Structured query language · Data mining
1 Introduction

Machine learning-based MDSS learns from raw data to cope with the information challenge. It requires a training process to create inference models, where training is specific to a line of inquiry and is expensive. The key tasks addressed here are supervised learning (classification: predicting a sample's class; regression: predicting a numeric value) and unsupervised learning (clustering: grouping related items together) [1]. Classification in machine learning refers to a predictive modeling problem in which the class label of a given sample is predicted. Let X be the input space and Y the output space. A training set of instances may then be given as D = {(x1, y1), (x2, y2), …, (xn, yn)}. The main focus of machine learning is to infer a function p: X → Y that best describes the training data while minimizing loss, using a loss function L = f(p(xi), yi), where p(xi) is the predicted output, yi is the actual output, and xi is represented as a feature vector. Here, machine learning classification algorithms are applied to predict the missing values.
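A minimal sketch of such a learned mapping p: X → Y, using a scikit-learn decision tree on toy data (the feature vectors and labels are illustrative, not the diabetic dataset used later):

```python
from sklearn.tree import DecisionTreeClassifier

# Tiny stand-in training set D = {(x1, y1), ..., (xn, yn)}:
# each x is a feature vector, each y a class label.
X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_train = [0, 1, 1, 0]   # XOR-like labels

# Fit p: X -> Y; the tree greedily minimizes impurity, its surrogate for L.
p = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(p.predict([[0, 1]]))   # predicted label for a feature vector
```

With no depth limit, the tree fits these four training points exactly, so p reproduces every training label.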
2 Review of Literature

Rule-based systems and evidence-based systems [2] provide the computational structure of most expert systems. Construction of such a system can be based on both data and expert knowledge. The experience of domain experts is captured in terms that can be evaluated as rules by the system. When several rules are collected into a rule base, incoming information is evaluated against that rule base before a conclusion is reached. The system is therefore able to store and analyze vast amounts of information and data, and an evidence-based strategy seems an ideal technique [1, 3–6] for closing the gap between physicians and CDSSs: an extremely effective instrument for optimizing health quality as well as patient outcomes, with the ability to increase both quality and safety while lowering costs [7, 8]. The rules are stored in the knowledge base, the inference engine integrates the rules with patient information, and the input process is used to display results and provide feedback to the system. In some cases, such as chest pain treatment, adaptive suggestions from a knowledge base server are significantly more acceptable than those.
The neural network [9] establishes a connection between symptoms and diagnoses. It employs nodes and weighted associations between them. The system, however, fails to explain why the data is being used in this way; as a consequence, its dependability and accountability can be a concern. The neural network's self-organizing training method, in which it is given no prior knowledge of the groups it will define, has been demonstrated to be capable of extracting relevant information from incoming data and forming class-related clusters. As a result, the network can be trained using only a limited portion of the available data [10, 11]. For example, in order to recognize pain in a child, neural networks extract the two features MFCC and LPCC from an infant's cry and feed them into a recognition module. Neural networks have been extensively used to model vast and complicated medical datasets and to solve non-linear mathematical simulation problems. The objective of training is to make the network more efficient at predicting the output for a particular input space. In the back-propagation training algorithm, a popular strategy for medical databases, the weights of the ANN are modified to reduce a cost function. The ANN preserves accurate classification rates while allowing for a significant decrease in device complexity [12]. The weight-elimination cost function is sufficient for overcoming network memorization issues [13]. In complex, multi-variable structures, neural networks are often critical for avoiding expensive medical care and for pain diagnosis. They have the advantage of not requiring professional feedback; by removing the need for a specialist, the device does away with the need for massive libraries to store input and output.

Genetic algorithms [14] are built on the evolutionary method. A selection algorithm evaluates candidate solutions to a problem; the solutions that rise to the top are recombined, and the procedure is repeated until the desired result is achieved. The genetic system follows an iterative process to arrive at the best possible solution to a problem. Frunza et al. [15] proposed machine learning (ML) techniques against an existing decision-making method. Medical decision support, medical imaging, protein–protein interaction, medical information processing, and general patient management diagnosis are all included in the scientific domain of machine learning. In complex, multi-variable systems, it is also critical for avoiding expensive medical treatment and diagnosing pain. ML is envisioned as a technology that can be used to incorporate computer-based applications into the healthcare industry in order to provide safer, more effective patient services [16]. Their findings showed that using machine learning algorithms has a distinct benefit. They did note, though, that using machine learning methods alone resulted in more false positives; the approach combines data over time, which is hard to interpret when it depends on statistically agreed patterns, and it became more difficult to explain the results of machine learning.
Zhu et al. [17] examined the ability of machine learning algorithms in a geriatric rehabilitation setting. Comparing two machine learning policies, they looked at the actual decision-making procedure (merely utilizing a CAP). Their findings showed significant advantages in using machine learning algorithms. It is very common to use statistical approaches in developing support systems for clinical decision-making [18]. A literature database study, for example, is a promising method for reporting on the economics of post-operative pain, with an emphasis on local regional anesthesia. A survey containing information about the patient's health, looks, speech patterns, sentiments, and other factors can be used to collect data. It could be a more accurate way of assessing the linear and nonlinear aspects of post-operative pain [14].
3 System Design

3.1 Concerns of Non-knowledge-Based MDSS Development

• Machine learning classifier algorithms such as J48, JRip, and Bagging were explored for developing the learning-based MDSS. Attributes of the various classes are shown in Table 1.
• A diabetic dataset containing details of patients is considered. It contains 2500 samples, out of which 2100 were used as the training set and 400 as the test set.
• The dataset has 37 attributes, out of which the 9 most useful for prediction were considered; these are listed in Table 2.

Table 1 Attributes for the various classes defined in the learning base

Classes             No. of instances   Attributes
Patient             5000               Name, Pid, Sex, Age, Address, Mobile, Mailid, Bloodgroup, TreatedBy, Current_consuming_drug, Past_condition, Current_condition, Drug_for_current_condition
Doctor              10                 Name, Did, Sex, Age, Address, Mobile no, Mailid, Designation, Qualification, Working Hospital
Medical condition   44                 Med_Cond_Name, Cause, Test, Symptoms
Drugs               200                Drug_Name, Code, Company Name, Exp-date, Mfg-date
Table 2 Various medical condition instances

1. Pregnancy  2. Breast abscess  3. Insomnia  4. Blood pressure  5. MI  6. Asthma  7. Migraine  8. Throat pain  9. Diabetes  10. Haematoma  11. Stomachache  12. H1N1  13. Psoriasis  14. Peptic ulcer  15. Viral fever  16. Malaria  17. Diarrhea  18. Smallpox  19. Obstructive sleep apnea  20. Pneumonia  21. Bronchitis  22. Dengue  23. Constipation  24. Alcohol abuse  25. Elderly  26. Tuberculosis  27. Muscle spasm  28. Kidney malfunction  29. Gastroenteritis  30. Cirrhosis of liver  31. Chikungunya  32. Haematemesis  33. Hemoptysis  34. Convulsion  35. Paralysis  36. Poison  37. Snakebite  38. Abortion  39. Hemorrhoids  40. Anemia  41. Burns & scalds  42. UTI  43. Gastritis  44. Chickenpox
3.2 Non-knowledge-Based Medical Decision Support System Architecture

MDSS built on machine learning learns from raw data to cope with the information challenge. It requires a training process to create inference models, where training is specific to a line of inquiry and is expensive. This system is generally tolerant to noise in the data. The important tasks, such as supervised learning, are discussed here (classification: determining the class of a data item; regression: predicting a numerical value) along with unsupervised learning (clustering: grouping similar elements together). The framework is made up of the following components. Machine learning classification: X is the input space, while Y is the output space. A training set of instances can then be written as D = {(x1, y1), (x2, y2), …, (xn, yn)}. The machine learning aim is to infer a function p: X → Y that best explains the training data while minimizing loss, using a loss function L = f(p(xi), yi), where p(xi) is the predicted output, yi is the desired value, and xi is presented as a feature vector. Here, machine learning classification algorithms are applied to predict the missing values, as shown in Fig. 1.
3.3 Data Preprocessing The diabetic dataset was analyzed using patient reports. There were 2500 exemplars in all, with 37 different characteristics. The raw dataset is depicted in Fig. 2. Noise and missing values were present in the raw data. The raw data was preprocessed
Fig. 1 Architecture for non-knowledge based MDSS
Fig. 2 Raw diabetic dataset
Fig. 3 Preprocessed diabetic dataset
using the WEKA tool, by translating it to the ARFF form that WEKA recognizes. The missing data were accounted for by assigning an arbitrary constant value. Figure 3 depicts the preprocessed data; it also shows attribute information such as minimum, maximum, mean, and standard deviation, depicted through the red and blue graphs in Fig. 3. The training set consisted of 2100 samples, while the evaluation set consisted of 400. Nine attributes were selected from a total of 37 for consideration in the prediction: No_of_times_Pregnant, age, result, Plasma_glucose_conc, Diastolic_blood_pressure, Triceps_skin_thick, two_hour_serum_insulin, body_mass_index, and Diabetic_pedigree_func.
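The preprocessing steps (constant-fill imputation and the 2100/400, i.e. 84%/16%, split) can be sketched in pandas; the miniature table and its column subset are illustrative stand-ins for the private diabetic data, and WEKA's ARFF workflow is replaced here by pandas purely for the sketch:

```python
import pandas as pd

# Hypothetical miniature of the diabetic table; column names follow the paper.
df = pd.DataFrame({
    "No_of_times_Pregnant": [1, None, 3, 2],
    "Plasma_glucose_conc":  [148, 85, None, 120],
    "age":                  [50, 31, 29, None],
    "result":               ["t_p", "t_n", "t_n", "t_p"],
})

# Missing values replaced with an arbitrary constant before training.
clean = df.fillna(-1)

# Split in roughly the paper's 2100/400 proportion (84% train).
n_train = int(len(clean) * 0.84)
train, test = clean.iloc[:n_train], clean.iloc[n_train:]
print(len(train), len(test))
```

On the 4-row toy table this yields a 3/1 split with no missing cells remaining.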
3.4 Predicting Result Based on J48 Algorithm

The diabetic dataset, with its 9 selected attributes, was supplied to the J48 algorithm. It constructs the decision tree using the attributes with the highest information gain; Fig. 4 illustrates this. The figure demonstrates the result analysis w.r.t. the
Fig. 4 J48 tree generated w.r.t training set
training set. Out of 2099 samples, 2094 were truly classified and the remaining 5 were misclassified. Hence the specificity, sensitivity, and balanced accuracy obtained here are 0.9985, 0.9957, and 0.9971 respectively. In order to predict the values, the test set was supplied, and the predicted results are demonstrated in Fig. 4.

Correctly classified instances        2094      99.7618%
Incorrectly classified instances         5       0.2382%
Kappa statistic                       0.9947
Mean absolute error                   0.0044
Root mean squared error               0.0468
Relative absolute error               0.9824%
Root relative squared error           9.9123%
Total number of instances             2099

=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.997    0.002    0.996      0.997   0.996      1         t_p
               0.998    0.003    0.999      0.998   0.998      1         t_n
Weighted Avg.  0.998    0.003    0.998      0.998   0.998      1

=== Confusion Matrix ===
   a     b    <-- classified as
 702     2  |  a = t_p
   3  1392  |  b = t_n
Figure 4 shows the decision tree generated using J48 algorithm for the given training set.
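The headline figures can be recomputed from the confusion matrix. This sketch uses the standard definitions with t_p taken as the positive class (an assumption); the overall accuracy matches the 99.7618% reported, while the per-class figures land close to, but not exactly on, the reported specificity and sensitivity, which may follow slightly different conventions:

```python
# Confusion matrix as reported by WEKA for the training set:
#              predicted t_p   predicted t_n
# actual t_p         702              2
# actual t_n           3           1392
tp, fn = 702, 2
fp, tn = 3, 1392

sensitivity = tp / (tp + fn)                    # recall on the t_p class
specificity = tn / (tn + fp)
balanced_accuracy = (sensitivity + specificity) / 2
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"sens={sensitivity:.4f} spec={specificity:.4f} "
      f"bal={balanced_accuracy:.4f} acc={accuracy:.4%}")
```

The accuracy term works out to 2094 / 2099, i.e. the 99.7618% in the WEKA summary above.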
3.5 Predicting the Result Based on Bagging Algorithm

The diabetic dataset, with its 9 selected attributes, was supplied to the Bagging algorithm. Figure 5 demonstrates the result analysis w.r.t. the training set. Out of 2099 samples, 2053 were truly classified and the remaining 46 were misclassified. Hence the specificity, sensitivity, and balanced accuracy obtained here are 0.9766, 0.9809, and 0.9787 respectively. Then, in order to predict the values, the test set was supplied, and the predicted results are demonstrated in Fig. 5.
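Bagging in the same spirit can be sketched with scikit-learn; the synthetic data here stands in for the private diabetic table (9 features, binary result), the 420/80 split mirrors the paper's 2100/400 proportion, and the default base estimator is a decision tree:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

# Synthetic stand-in dataset: 500 samples, 9 features, 2 classes.
X, y = make_classification(n_samples=500, n_features=9, random_state=0)
X_train, y_train = X[:420], y[:420]
X_test, y_test = X[420:], y[420:]

# Bagging trains many trees on bootstrap resamples of the training set
# and majority-votes their predictions at test time.
clf = BaggingClassifier(n_estimators=25, random_state=0).fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```

The absolute accuracy depends on the synthetic data, so it should not be read as reproducing the paper's 93.43% figure.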
4 Conclusion

An advanced framework for a Medical Decision Support System is built to help medical practitioners working in remote locations in administering medications. We used a combination of a knowledge base and machine learning classification algorithms to predict missing values based on the patients' symptoms. To assess estimation accuracy on the diabetic dataset, an analysis and comparison of machine learning classifier algorithms such as J48, JRip, and Bagging was carried out. J48, JRip, and Bagging have an estimation accuracy of 95.24%, 90.57%, and 93.43%, respectively. Specificity, sensitivity, and balanced accuracy are the assessment criteria used. J48 has better specificity (0.998), sensitivity (0.995), and balanced accuracy (0.997) than the JRip and Bagging classification methods. Future work aims at designing an end-user
Fig. 5 Result analysis of bagging w.r.t training set
interface to make user interaction easier. Also, many other classification algorithms may be considered to get more accurate results in predicting the missing values.
References

1. Andi HK (2021) Construction of business intelligence model for information technology sector with decision support system. J Inf Technol 3(4):259–268
2. Abbasi MM, Kashiyarndi S (2010) Clinical decision support systems: a discussion on different methodologies used in health care. Int J Comput Sci Inf Secur 8(4)
3. Sathesh A. Assessment of environmental and energy performance criteria for street lighting tenders using decision support system. J Electron Inform 2(2):72–79
4. Frunza O, Inkpen D, Tran T (2011) A machine learning approach for identifying disease-treatment relations in short texts. IEEE 23(6)
5. Ordonez C, Chen Z (2011) Horizontal aggregations in SQL to prepare data sets for data mining analysis. In: 2011 IEEE, USA
6. Abidi S, Hussain S (2007) Medical knowledge morphing via a semantic web framework. In: Twentieth IEEE international symposium on CBMS'07. IEEE, pp 554–562
7. Schafer JL (1997) Analysis of incomplete multivariate data. CRC Press
8. Little RJA, Rubin DB (2002) Statistical analysis with missing data, 2nd edn. Wiley-Interscience
9. Sarawagi S, Thomas S, Agrawal R (1998) Integrating association rule mining with relational database systems: alternatives and implications. In: Proceedings of the ACM SIGMOD conference, pp 343–354
10. Frize M, Walker R (2006) Clinical decision-support systems for intensive care units using case-based reasoning. Med Eng Phys 22(9):671–677
11. Rossille D, Laurent J, Burgun A (2005) Modeling a decision-support system for oncology using rule-based and case-based reasoning methodologies. Int J Med Inf 299–306
12. Stapley BJ, Benoit G (2000) Bibliometrics: information retrieval visualization from co-occurrences of gene names in MEDLINE abstracts. In: Proceedings of the pacific symposium on biocomputing, vol 5, pp 526–537
13. Cunningham C, Graefe G, Galindo-Legaria CA (2004) PIVOT and UNPIVOT: optimization and execution strategies in an RDBMS. In: Proceedings of the VLDB conference, pp 998–1009
14. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor 11(1)
15. Ordonez C (2004) Vertical and horizontal percentage aggregations. In: Proceedings of the ACM SIGMOD conference, pp 866–871
16. Ye Y, Tong SJ (2009) A knowledge-based variance management system for supporting the implementation of clinical pathways. Manag Serv Sci 1–4 (IEEE)
17. Drugs.com. Prescription drug information, interactions & side effects. http://www.drugs.com/
18. Roo JD. Euler proof mechanism. http://eulersharp.sourceforge.net/
Interactive Image Generation Using Cycle GAN Over AWS Cloud Lakshmi Hemanth Nallamothu, Tej Pratap Ramisetti, Vamsi Krishna Mekala, Kiran Aramandla, and Rajeswara Rao Duvvada
Abstract In today's world, many problems can be solved with the help of modern technology. One such situation that requires the help of technology is designing and building infrastructure. For example, an architect or infrastructure designer always needs to know the future outlook of his or her blueprint and makes changes accordingly before construction. This can be done by hand sketching the infrastructure design and inputting it into a deep learning technique that generates images showing the building's future outlook. This paper shows the use of a Cycle Generative Adversarial Network (Cycle GAN) as the deep learning model in this case, where the application is deployed on the Cloud as a ready-to-use service for the architect or designer. Keywords Artificial intelligence · Web app · Architect · Deep learning · Cycle generative adversarial network (cycle GAN) · Cloud
1 Introduction

Today, many problems are made easier with the use of all kinds of technologies, even problems that involve design and architecture. Such problems are usually solved using Artificial Intelligence (AI). The required visuals are usually hand sketched at the start, after which the image is input to an AI model. The first step, i.e., hand sketching, is done with the help of a Graphical User Interface (GUI), a program through which users interact with the machine and, in this case, an AI model. The next difficulty in the problem is getting an accurate real-life image corresponding to the hand sketch, which is the heart of the problem itself.
L. H. Nallamothu · T. P. Ramisetti (B) · V. K. Mekala · K. Aramandla · R. R. Duvvada Department of Computer Science and Engineering, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_30
L. H. Nallamothu et al.
Fig. 1 Input and output for the project
Getting an accurate image, as shown in Fig. 1, can be done with the help of many AI algorithms, but the most commonly used algorithm in such cases is the Generative Adversarial Network (GAN).
1.1 Basic Concepts

1.1.1 Deep Learning
Deep learning techniques are advanced machine learning techniques that involve a neural network with more than two layers. Deep learning focuses on how to teach a computer to do what humans do, and what to teach it. One application of deep learning is driverless cars. In deep learning, the model is trained with bulk data passed through neural networks containing two or more layers, where each layer divides the data and computes it with weights and biases [1–6].
1.1.2 Generative Adversarial Network
A Generative Adversarial Network (GAN) is a popular machine learning (ML) model that comes under unsupervised learning. Here, two neural networks compete with each other, each striving to become more accurate in its predictions [7–10]. Since GAN training is done in an unsupervised way, the dataset used is an unlabeled one, such as images. In a GAN, a generator and a discriminator are crucial in training and predicting the output. The generator's role is to generate images, where a random noise image is given as the input. After the
model is trained, the discriminator classifies the images taken from the generator as real or fake with respect to the dataset.
1.1.3 Cycle GAN
A Cycle Generative Adversarial Network (Cycle GAN) is one of the rarely used deep learning techniques for training a deep convolutional neural network by image-to-image translation without direct pairing [11–13]. It was first introduced by Jun-Yan Zhu and his team in [14]. According to [14], Cycle GAN consists of two mappings paired with each other in such a way that the image generated in mapping-1 is sent to mapping-2, while maintaining a cycle consistency that prevents the generated images from contradicting mapping-1 and mapping-2. In this project, we have chosen Cycle GAN so that we can calculate identity loss and discriminator loss functions, and reduce the losses incurred, in order to obtain good quality images.
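The loss terms named above can be made concrete with toy, invertible stand-ins for the two generators. Real Cycle GAN generators are convolutional networks; everything here is illustrative only:

```python
import numpy as np

def l1(a, b):
    """Mean absolute (L1) difference between two images."""
    return float(np.abs(a - b).mean())

# Toy stand-ins for the two mappings: G maps domain X (sketches) to Y
# (photos), F maps Y back to X. These invertible pixel maps exist only
# to make the loss terms concrete.
G = lambda x: 1.0 - x
F = lambda y: 1.0 - y

rng = np.random.default_rng(0)
x = rng.random((8, 8))   # a "sketch" image
y = rng.random((8, 8))   # a "photo" image

# Cycle-consistency loss: F(G(x)) should reconstruct x, G(F(y)) reconstruct y.
cycle_loss = l1(F(G(x)), x) + l1(G(F(y)), y)

# Identity loss: a generator fed an image already in its output domain
# should leave it (nearly) unchanged.
identity_loss = l1(G(y), y) + l1(F(x), x)
print(cycle_loss, identity_loss)
```

Because these toy mappings are exact inverses of each other, the cycle loss is (numerically) zero, while the identity loss is not, since G actually alters y.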
1.2 Motivation

In today's age, buildings and infrastructure are designed efficiently and with minimal time to spare. This makes people believe that architects complete infrastructure blueprints in no time, which is a myth; the process itself is important because all elements must be considered, including the future outlook of the blueprint. The core idea of the project is to let the architect or designer know the future outlook of a building blueprint and to reduce the time taken to design, at little to no cost. The motivation of the project is to use a GAN model deployed on the Cloud as an application for helping architects and designers.
1.3 Problem Statement

In the real world, designing a building and knowing its future outlook are interconnected; knowing the future look of a building during the design phase therefore solves the problem. The proposed project focuses on developing a cloud application that generates images from building sketches and accurately shows the future look of the building with the help of a variant of the GAN model. The generated image helps architects and designers know how their building will appear. Compared with previous GAN models, the deep convolutional GAN (DC-GAN) used in [3] is a traditional approach, but Cycle GAN can give better results than DC-GAN as this model has more loss functions to provide good quality outputs.
1.4 Scope
• The project model does not identify shapes other than rectangles or squares present in the input image.
• The project is limited to generating only building images that are in JPEG or PNG format.
1.5 Objectives
• To train a GAN model on the facades dataset with 2 generators and 2 discriminators, using the Adam optimization solver.
• To build a web app that sends input to the GAN model hosted on the AWS Cloud.
• To deploy the application as a web service.
1.6 Advantages
• Images can be generated from simple hand-drawn sketches on a user interface.
• The user interface accepts both drawn and uploaded images as input.
• Use of Cloud compute instances reduces the time taken for the model to generate new images.
• Input sketches and generated images are stored on the cloud, with access from anywhere in the world.
1.7 Applications
• Generated images can be used by architects and building designers.
• Anyone can know the future outlook of a building.
• Can provide accurate and clear images of all kinds of buildings from just a sketch.
2 Literature Survey The literature survey section describes the papers taken as reference in the project. The methodology in [14] is image-to-image translation, which learns a mapping between an input image and an output image using a training set of aligned image pairs; it serves as the base for Cycle GAN. According to [14], a matched training image dataset will not be accessible for many problems. In the absence of paired instances,
Interactive Image Generation Using Cycle GAN Over AWS Cloud
the process of learning to translate a picture from a source domain X to a destination domain Y is used. Using an adversarial loss, a mapping from one domain to the other is learnt and coupled with its inverse mapping, and a cycle consistency loss links the two because the mappings alone are extremely under-constrained. Qualitative results are reported on numerous tasks where matched training data are not available, such as season transfer, collection style transfer, object transfiguration, and photo improvement. In [15], the methodology defines a training system for pixel-wise translation between a theoretical sketch and ultrasound cardiac images. The project in [15] helps users gain more knowledge of clinical ultrasound anatomy. The integration of physical model simulation and ultrasound cardiac segmentation through image processing cannot easily be done interactively, so Cycle GAN is used to translate between the two modalities, and a perceptual loss helps improve the synthetic ultrasound quality. The methodology defined in [16] surveys various cloud services in the industry. The paper shows that cloud computing is the notion of a user sitting at a terminal while the services, storage space, and resources are delivered someplace else, on another computer, through the Internet. The paper [16] examines the various cloud computing methodologies, solutions, services, advantages, and disadvantages. Cloud computing has many money-making possibilities, making resources previously exclusive to huge organizations available to small firms. In a nutshell, cloud computing entails obtaining the highest-performing system at the lowest cost. The paper [17] trains models using different variants of the GAN for image synthesis on the MNIST dataset of handwritten digits, and the results are compared.
The main goal according to [17] is to present different GANs and their variants, such as Conditional GAN, Deep Convolutional GAN, Auxiliary Classifier GAN, Adversarial Autoencoders, Dual GAN, Wasserstein GAN, Bidirectional GAN, Least Squares GAN, Coupled GAN, and Information Maximizing GAN, and to evaluate their accuracy and loss functions [17].
3 Implementation a. Project requirements The requirements used in the project are Python, HTML, CSS, Node JS, PyTorch, an AWS SageMaker Jupyter notebook instance, AWS S3, wandb (Weights and Biases), storage greater than 50 MB, and more than 4 GB of RAM. The SageMaker compute used is a notebook instance of type ml.t2.medium (Volume: 5 GB EBS). b. Methodology The methodology consists of four modules: training the GAN model, deploying it on AWS, designing a web app, and integrating AWS with the web app. Each module is explained in detail below.
Fig. 2 Shows the fake and real image generated by Cycle GAN
i. Training the GAN model The dataset consists of 1012 images of different buildings. Cycle GAN is used to train on the data, as in Figs. 2 and 3. Cycle GAN uses two generators and two discriminators. Let the generators be G1 and G2, where G1 converts sketches to images and G2 converts images to sketches. Taken together, the two generators form a cycle, hence the name Cycle GAN. Several losses are calculated during training, using the discriminators D1 and D2. First, the dataset is examined, and a model is created that loads and visualizes the data. During training, a function iteratively updates the network weights, calculates the loss functions, and temporarily saves the generated images in one HTML file. The accuracy of this model improves with the number of epochs; we trained for up to 25 epochs. Generally, the discriminator is a classifier, trained via backpropagation, that specifies whether a generated image is real or fake. The discriminator learns to distinguish real and fake images, as the dataset is divided into real and fake images, and its behavior depends on the loss functions for real and fake images. The overall loss function for a discriminator can be represented as: Loss(D_i) = log(D_i(x)) + log(1 − D_i(G_i(random input))). Here log(D_i(x)) gives the probability that the discriminator D_i correctly classifies a real image, and log(1 − D_i(G_i(random input))) helps to
Fig. 3 Images trained with cycle GAN on Jupyter notebook
correctly label the fake images generated by the generator. D_i and G_i denote the discriminators and generators, respectively, for i = 1, 2. When the generators are trained, the weights and biases of the discriminators are fixed, and vice versa. The overall generator loss function can be represented as: Loss(G_i) = log(1 − D_i(G_i(random input))). Since this is a Cycle GAN model, another loss, the identity loss, is also calculated. When a real image is given to the generator G1, no conversion is necessary, since G1 already converts sketches to images; if the output image from G1 and the original image are not the same, the difference between those images is calculated and referred to as the identity loss. The Weights and Biases (WandB) platform is used to track the experiment and to evaluate and optimize the model. After the model is trained, a sketch drawn in the web app is given as input. The generator generates images and the discriminator distinguishes them; this cycle repeats a number of times until the generator produces a clear image.
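The adversarial and identity losses above can be sketched numerically. This is a minimal pure-Python illustration of the formulas as written, not the project's PyTorch code, and all example scores are made up:

```python
import math

def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Loss(D_i) = log(D_i(x)) + log(1 - D_i(G_i(random input))).
    d_real is the discriminator's score on a real image, d_fake its score
    on a generated one; a confident discriminator drives this toward 0."""
    return math.log(d_real) + math.log(1.0 - d_fake)

def generator_loss(d_fake: float) -> float:
    """Loss(G_i) = log(1 - D_i(G_i(random input))): lower when the
    generator succeeds in fooling the discriminator."""
    return math.log(1.0 - d_fake)

def identity_loss(original, reconstructed) -> float:
    """Mean absolute difference between an image (as a flat list of pixel
    values) and its pass-through output from the generator."""
    return sum(abs(a - b) for a, b in zip(original, reconstructed)) / len(original)

# A well-trained discriminator (d_real near 1, d_fake near 0) scores near 0:
confident = discriminator_loss(0.99, 0.01)
# A generator that fools the discriminator (high d_fake) gets a lower loss:
fooled, honest = generator_loss(0.9), generator_loss(0.1)
```

Here `d_real`, `d_fake`, and the flat-list image representation are simplifying assumptions; in the actual model these would be tensor-valued discriminator outputs.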
If the discriminator classifies the image as real, it is sent as the output and displayed on the web app. ii. Designing the Web App The web app is a user interface where the user draws an input image. It consists of two frames: one frame for the user input and another for displaying the output image. The menu offers various drawing features: rectangle, circle, ellipse, straight line, pencil, fill color, undo, redo, and erase. After drawing the rough sketch, the user saves the file to the local system and clicks the "Upload" button, which uploads the sketch directly to the S3 bucket on AWS. A "Run" button triggers the AWS Lambda function that runs the model in AWS SageMaker. The "Generate" button displays the output image on the second frame of the web app; the steps involved in integrating AWS and the web app are described in Sect. 3.b.iv. The "Download" button downloads the output to the local device. Finally, the web app is hosted on AWS. iii. Deploying the model in AWS A model pretrained on the facades dataset is used, and the GAN model is deployed in AWS SageMaker. An input sketch is drawn in the web app and saved to an S3 bucket linked to AWS SageMaker. The input image is then sent to AWS SageMaker by triggering an AWS Lambda function, which runs the already-trained model with its endpoint set to the web app. The image is sent through a generative network, where a new image is created, and then to a discriminative network, where the image is checked against the required criteria. During these two steps, the images are smoothed and improved into a better output. The output image from AWS SageMaker is saved to an S3 bucket.
After the model runs, an AWS Lambda function is triggered that sends the generated image back to the web app to be viewed by the user. As shown in Fig. 4, the input image from the web app is sent to the AWS S3 bucket and then to AWS SageMaker. The process that happens inside AWS SageMaker is shown in Fig. 5. If the generated image is fake, the process iterates; otherwise, the real generated image is saved to the AWS S3 bucket, as shown in Fig. 4, and the output image is finally displayed on the screen. The simplified activity diagram for the project is shown in Fig. 6. iv. Integrating Web App and AWS The most important part of the project is the integration of the web app and the GAN model: the web app is hosted on an AWS S3 bucket as a static website, while the model is deployed and run in the AWS SageMaker service. The integration uses various AWS services, with Lambda, API Gateway, SageMaker, and S3 as the main services, while AWS CloudFormation and CloudWatch take care of the cloud architecture and
Fig. 4 Shows part-1 of structure of model in AWS cloud
Fig. 5 Shows GAN architecture that continues Fig. 4
Fig. 6 Activity diagram (Swimlane) for the project
log files, respectively. All AWS services must be granted proper permissions and policies for the AWS services they connect to, such as the AWS S3 bucket having full access to the AWS SageMaker service. First, the web app is hosted in one of the folders of an AWS S3 bucket and the SageMaker model is run, after which an endpoint is created. Next, a few Lambda functions are created that are triggered by the various buttons in the web app: one that runs the deep learning model, one that sends the input image from the web app to an S3 bucket, one that runs the model in SageMaker with input from the S3 bucket, and one that returns the image from AWS SageMaker to the web app. These functions are exposed as REST APIs by connecting them to the AWS API Gateway. An environment variable is then created in the API Gateway to connect the AWS SageMaker endpoint to these Lambda functions, with the code configured to redirect data from the REST API to AWS SageMaker. Once the AWS Lambda functions are created and connected to the AWS SageMaker model, REST endpoints are created for each AWS Lambda function and attached to the appropriate buttons in the web app. Each REST endpoint is configured to connect to the web app (hosted in the AWS S3 bucket), the corresponding Lambda function is filled in with the appropriate Lambda ARN, and the API Gateway is deployed. With this, the web app is integrated with the GAN model in AWS SageMaker.
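A "Run"-style Lambda of the kind described above might look roughly like the following sketch. The endpoint name, the request fields, and the S3 key layout are illustrative assumptions, not the project's actual code; the runtime client is injected as a parameter so the handler logic can be exercised without AWS (inside Lambda it would be `boto3.client("sagemaker-runtime")`):

```python
import json

# Illustrative name; the real endpoint is created when the SageMaker model is deployed.
ENDPOINT_NAME = "cyclegan-facades-endpoint"

def handler(event, runtime_client):
    """Read the uploaded sketch's S3 key from the API Gateway event body,
    invoke the SageMaker endpoint, and return the generated image's key.
    In AWS Lambda, runtime_client would be boto3.client("sagemaker-runtime")."""
    key = json.loads(event["body"])["s3_key"]
    response = runtime_client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps({"input_key": key}),
    )
    result = json.loads(response["Body"].read())
    return {"statusCode": 200, "body": json.dumps({"output_key": result["output_key"]})}

# A stub client lets the handler be tested locally without any AWS account:
class _StubRuntime:
    def invoke_endpoint(self, EndpointName, ContentType, Body):
        import io
        out = json.dumps({"output_key": "generated/" + json.loads(Body)["input_key"]})
        return {"Body": io.BytesIO(out.encode())}

resp = handler({"body": json.dumps({"s3_key": "sketches/house.png"})}, _StubRuntime())
```

The dependency-injected client mirrors the real `invoke_endpoint(EndpointName=…, ContentType=…, Body=…)` call shape of the SageMaker runtime API, while keeping the sketch runnable offline.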
Fig. 7 Shows different building images in dataset
c. Dataset Collection The dataset considered is the facades dataset, which consists of 1012 images of different buildings, as in Fig. 7. The size of the dataset is 34 MB, and it purely contains images of buildings. The facades dataset is available from Berkeley at http://efrosgans.eecs.berkeley.edu/pix2pix/datasets/facades.tar.gz.
4 Result and Analysis During model training, the learning rate reached a maximum value of 0.9 for Generator 1. The learning rates can be observed in Figs. 8 and 9; the lines shown in the graphs indicate the learning rates for the generator, trained 5 times with a certain number of iterations or epochs. For Generator 2, the learning rate reaches a maximum value exceeding 1. These graphs are provided by WandB, which, after the model is trained, gives a clear picture of the model performance, learning rates, and CPU performance. The loss function values for model 1 are 0.17749 and 0.32508 for the discriminator and generator, respectively. For model 2, the loss function values are likewise 0.17749 and 0.32508, respectively, for the discriminator and generator. The identity loss values found in model 1 and model 2 are 0.27357 and 0.44331, calculated using the functions given in Sect. 3.b.i. As a cycle, the models have total loss values of 1.60813 and 0.73743, respectively. The final application is a web app with an interactive GUI and a GAN model connected to the AWS service. We drew a few sketches in the web app and obtained good-quality images of buildings; the output images displayed on the web app are shown in Figs. 10 and 11.
Fig. 8 Learning curve at each epoch during model training for generator-1
Fig. 9 Learning curve at each epoch during model training for generator-2
Fig. 10 Webapp interface generating set-2 image which is deployed on cloud
Fig. 11 Webapp interface generating set-2 image which is deployed on cloud
5 Conclusion and Future Work The project presents a step forward in generating real-life images. The application is built using a generative adversarial model (Cycle GAN). Since the web application is hosted on AWS, everyone can make use of the project, especially designers and animators who spend a lot of time drawing such images. The project supports interactive editing of the images before they are generated. The generated images are close to real-life images, accurate and of good resolution, with the added advantage of the cloud: by deploying the model on the AWS Cloud, the image is predicted faster with better resources. The project can be extended to generate 3D images showing a three-dimensional view of the building; however, training and generating 3D images requires high computational power, so this feature is left for future work.
References 1. Isola P, Zhu J, Zhou T, Efros AA (2017) Image-to-image translation with conditional adversarial networks. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 5967–5976. https://doi.org/10.1109/CVPR.2017.632 2. Khaizi ASA, Rosidi RAM, Gan H, Sayuti KA (2019) A mini review on the design of interactive tool for medical image segmentation. In: 2019 international conference on engineering technology and technopreneurship (ICE2T) 3. Shangguan Z, Zhao Y, Fan W, Cao Z (2020) Dog image generation using deep convolutional generative adversarial networks. In: 2020 5th international conference on universal village (UV) 4. Aggarwal A, Mittal M, Battineni G (2021) Generative adversarial network: an overview of theory and applications. Int J Inf Manag
5. Jin L, Tan F, Jiang S (2020) Generative adversarial network technologies and applications in computer vision. Comput Intell Neurosci 6. Hitawala S (2019) Comparative study on generative adversarial networks 7. Prasad A, Mehta N, Horak M, Bae WD (2021) A two-step machine learning approach for crop disease detection: an application of GAN and UAV technology 8. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: NIPS 9. Chu C, Zhmoginov A, Sandler M (2017) CycleGAN: a master of steganography. In: NIPS 2017, workshop on machine deception 10. Goodfellow I (2016) NIPS 2016 tutorial: generative adversarial networks 11. Zhao Y, Li C, Yu P, Gao J, Chen C (2020) Feature quantization improves GAN training 12. Sathish A, Adaptive shape based interactive approach to segmentation for nodule in Lung CT scans. J Soft Comput Paradig 2(4):216–225 13. Sungheetha A, Sharma R (2020) A novel capsnet based image reconstruction and regression analysis. J Innov Image Process (JIIP) 2(03):156–164 14. Zhu J, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: 2017 IEEE international conference on computer vision (ICCV), pp 2242–2251. https://doi.org/10.1109/ICCV.2017.244 15. Teng L, Fu Z, Yao Y (2020) Interactive translation in echocardiography training system with enhanced cycle-GAN. IEEE Access 16. Bokhari MU, Shallal QM, Tamandani YK (2016) Cloud computing service models: a comparative study. In: 2016 3rd international conference on computing for sustainable global development (INDIACom), pp 890–895 17. Cheng K, Tahir R, Eric LK, Li M (2020) An analysis of generative adversarial networks and variants for image synthesis on MNIST dataset. Multimed Tools Appl
Challenges and New Opportunities in Diverse Approaches of Big Data Stream Analytics Nirav Bhatt, Amit Thakkar, Nikita Bhatt, and Purvi Prajapati
Abstract The continuous generation of data creates a new data analysis challenge called stream data analysis. Most real-time applications require effective processing of diverse, fast, continually generated, huge-volume data. Data pre-processing plays an efficient role in producing high-quality input data. In applications where the notion of the incoming data changes over time, a single classifier cannot perform well on constantly changing input, leading to the ensemble learning strategy, in which stream data is handled with multiple weak learners rather than a single strong learner. Since the properties of streams are generally unknown, selecting an appropriate method or model is a crucial task. The meta learning approach helps in performing stream analytics by separating the base-level characteristics of the input data and converting them into meta-information, so that a dynamic model can be selected according to the available meta-information. Deep learning has proven to be the most promising approach, specifically for high-dimensional big data streams. This research study briefly describes the different approaches involved in big data stream analytics. Keywords Machine learning · Vectorization methods · Natural language processing
1 Introduction Information and Communication Technologies (ICT) bring new challenges to real-time applications such as weather forecasting, wireless sensor networks and satellite systems, which continuously generate unstable data. Other areas such
N. Bhatt et al.
as banking and credit transactions are processed constantly on a non-stop basis. The advent of big data stream analytics is linked to the development of applications that create extremely volatile data with continuous updates, as well as the demand to perform extensive analysis of these modifications in real time [1]. Big data streams are often assumed to arrive from a variety of heterogeneous data sources at high velocities and in large volumes. Obtaining important information from fast data streams together with enormous historical data is a new challenge for data mining operations, which must accomplish immediate analysis on incoming data [2–6]. Many typical data mining methods can be used for performing big data stream analysis, each with its own key features and challenges [7]. Analytics based on stream data leads to diverse challenges in different domains, such as memory management, data pre-processing, resources and visualization of results, as summarized in Table 1. One of the major concerns when performing analytics on such stream data is handling concept drift [8]. Meta learning methods can be applied to spot drifts in data streams, while data from previous streams is retained for limited use only [1]. Different methods exist to perform better analytics on big data streams, and the following sections of the paper briefly discuss these methods and challenges. Big data stream processing urges businesses to plan an optimized allocation of resources to large stream data in order to perform data-driven stream analytics [9–12]. In addition, an optimized arrangement for processing the big data stream would ensure the QoS requirements of the given input [9].

Table 1 Taxonomy of challenges and issues in stream data mining [8]

Concerns | Issues | Methodologies
Memory organization | Varied, uneven & different data arrival speed over the period | Summarizing methods
Pre-processing of data | Worth of outcomes & automation of pre-processing methods | Shallow pre-processing methods
Compressed data structure | Scarcity of memory & massive size of stream data | Progressive preserving of data structure, loading and innovative indexing methods
Resource attentive | Restricted resources such as space & computing power | Algorithm output granularity [13]
Visualization of outcomes | Difficulties in analyzing data & providing instant decisions | No standard method as it is a current research issue
2 Big Data Stream Pre-processing Big data stream analytics is a challenging task, as the data represented may change over time. Data pre-processing techniques such as handling missing values, feature selection and instance reduction have been developed, drawing more attention to data stream analytics. Incoming data streams are often raw and unstructured, with various inconsistencies [14]. Cleaning, integration, reduction, and transformation are some of the techniques used in pre-processing to prepare quality data. Data has now become "big data", since the quantity of data is unknown or frequently limitless and must be processed very quickly; in these circumstances, data pre-processing is required to transform the raw input into high-quality data. Mining in such an environment is difficult when the data continually arrives in batches and there is a fixed block size for processing. These are known as data stream problems, where traditional mining algorithms must be modified so that they can process the data in a limited amount of time. Hence, the data has to be pre-processed by cleaning the noise, reducing the data from a complex continuous space, and transforming it [14, 15]. When the characteristics of the data change over time, this is known as concept drift in data streams. For big data streams, high computational complexity must be eliminated in order to respond quickly to incoming data [14, 16]; pre-processing techniques that can process real-time data with low computational complexity are required. Dimensionality reduction is one of the most used methods for reducing the data size [17, 18]. For data streams, this must be done in batch mode, without making any assumptions about the incoming data.
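Batch-wise cleaning of the kind described above can be sketched as follows: each fixed-size block is cleaned independently as it arrives, using running-mean imputation for missing values and range clipping for noise. The clipping bounds and the running-mean policy are illustrative assumptions, not techniques prescribed by the paper:

```python
def preprocess_stream(batches, clip_low=0.0, clip_high=100.0):
    """Clean each incoming block independently: replace missing values (None)
    with the running mean of all values seen so far, and clip out-of-range
    noise. Yields cleaned blocks immediately, so downstream mining never
    waits for the full (unbounded) stream."""
    total, count = 0.0, 0
    for batch in batches:
        cleaned = []
        for x in batch:
            if x is None:                          # missing value -> running-mean imputation
                x = total / count if count else 0.0
            x = min(max(x, clip_low), clip_high)   # noise clipping to a plausible range
            total += x
            count += 1
            cleaned.append(x)
        yield cleaned

stream = [[10.0, None, 30.0], [250.0, None]]       # 250.0 is clipped, None is imputed
cleaned = list(preprocess_stream(stream))
```

Because the generator yields one block at a time, memory stays bounded by the block size, which is the constraint the data-stream setting imposes.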
Figure 1 depicts different data reduction techniques that can be used for stream data pre-processing. Dimensionality reduction can be performed by various feature set algorithms. One type is filter-based algorithms, which act before the learning process starts, so that learning can then be carried out on a reduced set of data. Another type is wrapper-based, where specific, pre-identified learning algorithms are used to assess subsets of the features. One can also select a fusion approach, which combines the advantages of both filter-based and wrapper-based algorithms. Under feature drift, the set of selected features evolves over time and the feature space of test instances may differ from the current selection, so the different types of conversions to be considered here are:
• Lossy Fixed (Lossy-F),
• Lossy Local (Lossy-L),
• Lossless Homogenizing (Lossless).
Apache Spark, influenced by the MapReduce model, provides a critical framework permitting the processing of diverse data characteristics such as graphical, textual and real-time streaming data [14, 19]. Spark can achieve an instant rise in performance with in-built in-memory processing and by allowing Structured Query Language (SQL) through the command line [14, 19].
Fig. 1 Hierarchical classification of big data stream pre-processing techniques
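A filter-based reduction of the kind described above can be sketched as a per-block variance filter: feature ranking happens before any learner runs, and the learner then sees only the reduced feature set. The threshold and the toy block are illustrative assumptions, not a method taken from [14]:

```python
def variance_filter(block, threshold=0.5):
    """Filter-style feature selection: score each feature by its variance
    computed on the current block (no learner involved) and keep only the
    feature indices whose variance exceeds the threshold.
    block is a list of rows; returns (kept_indices, reduced_block)."""
    n = len(block)
    n_features = len(block[0])
    kept = []
    for j in range(n_features):
        col = [row[j] for row in block]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        if var > threshold:
            kept.append(j)
    reduced = [[row[j] for j in kept] for row in block]
    return kept, reduced

# Feature 0 is constant (variance 0) and is dropped; feature 1 varies and is kept.
block = [[1.0, 10.0], [1.0, 20.0], [1.0, 30.0]]
kept, reduced = variance_filter(block)
```

Recomputing the scores on every block is what lets the kept feature set evolve when feature drift changes which attributes carry information.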
2.1 Challenges and New Opportunities in Big Data Stream Pre-processing Instance reduction is an important pre-processing approach because it allows users to organize a subset of the information with which a comparable learning task can be accomplished at nearly the same performance as with the original information. It is highly beneficial to have a comprehensive collection of instance reduction procedures for extracting subsets of data from very large databases for specific purposes and standards. The main challenge is that these techniques must be modified to manage a large amount of data, require advanced computing capability, and typically follow an iterative strategy. Pre-processing large-scale data and providing real-time responses are two of the most well-known and sought-after ideals in the market. Several techniques are specified for big data streams, even for performing programming enhancement. Stream pre-processing solutions such as noise management can handle big data conditions in future applications. A self-contained approach for the pre-processing part of stream data analytics allows various divisions, including both the number and length of sub-windows in an entire sliding window on a specific stream [20, 21].
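One standard way to realize instance reduction over an unbounded stream is reservoir sampling, shown here purely as an illustration (it is not a technique the paper names): a bounded, uniformly random subset is maintained in O(k) memory no matter how long the stream runs.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random subset of size k from an unbounded stream.
    The i-th item (1-based) replaces a random reservoir slot with
    probability k/i, which keeps every item equally likely to survive."""
    rng = rng or random.Random(0)   # fixed seed only to make the sketch reproducible
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if len(reservoir) < k:
            reservoir.append(item)
        else:
            j = rng.randrange(i)    # uniform in [0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

subset = reservoir_sample(range(10_000), k=100)
```

The reduced subset can then stand in for the full stream in a comparable learning task, which is exactly the goal stated for instance reduction above.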
3 Ensemble Learning for Big Data Stream Analysis The ensemble approach to learning a big data stream is highly suitable for real-world applications based on data stream classification. An ensemble of weak learners is convenient to construct compared to a single strong learner. Ensemble learning is an elastic approach, allowing learners to be added or removed at any time, and remains highly suitable for stream analytics, since incoming data is infinite in nature and requires analysis as soon as it arrives, with the additional restriction of limited memory to store the data [22, 23]. Although only a well-chosen combination of learners provides better prediction, it is difficult to identify a proper set of learners. In the case of multiple ensemble learners, it is mandatory to differentiate between the arrangement of the ensemble members and ensemble voting [24]. According to research, no single learner or classifier can handle all the different characteristics of incoming data. Most ensemble learners are intended to work with any base learner, but the base learner must match the strategy of diversity induction [24].
3.1 Drift Detection with Ensemble Learning Ensemble learning methods are commonly used for stream classification because they deal well with drift detection. Better data representation can lead to high performance even with a simple learner. In traditional batch processing there is a finite dataset, and after training there is no need to update the model, while in data streams, as in the real world, concept drift may occur. On detecting different types of drift, such as sudden or gradual drift, the power of different weak learners can be combined to generate a prediction as strong as that of a strong learner. In Fig. 2, due to concept drift, more specifically sudden drift, there is a variation in the supply of data, which may raise or reduce the error rate of each classifier. As a result, when concept drift appears, the classifier with the lowest error rate changes, and the corresponding best learner is provided to the user [24–26]. In a non-stationary condition, the ensemble segments of the classifier are developed from chunks that correspond to various parts of the stream; as a solution to handle concept drift, a new classifier is generally developed for every group of data [14]. A further issue with concept drift is imbalanced class categorization. Generally, this issue is not specific to non-stationary conditions, but data isolation is considered a possible solution.
[Figure 2 content: error-rate curves of classifiers L1, L2 and L3 over time intervals T0, Tn, T2n and T3n; at each sudden concept drift the best (lowest-error) classifier switches, from L1 to L3 to L2]
Fig. 2 Error rates of ensemble classifiers under sudden concept drift, with the best classifier changing at each drift
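The switching behavior illustrated in Fig. 2 (answering with whichever weak learner has the lowest recent error rate, so a sudden drift triggers a switch) can be sketched as follows; the two toy learners, the window size, and the drift point are illustrative assumptions:

```python
from collections import deque

class BestOfEnsemble:
    """Track each weak learner's 0/1 error over a sliding window and always
    answer with the learner whose recent error rate is lowest. A sudden drift
    that degrades the current best learner then causes a switch to another
    ensemble member, as in Fig. 2."""
    def __init__(self, learners, window=50):
        self.learners = learners                         # list of predict(x) callables
        self.errors = [deque(maxlen=window) for _ in learners]

    def predict(self, x):
        rates = [sum(e) / len(e) if e else 0.0 for e in self.errors]
        best = min(range(len(self.learners)), key=lambda i: rates[i])
        return self.learners[best](x)

    def update(self, x, y_true):
        for i, learner in enumerate(self.learners):
            self.errors[i].append(0 if learner(x) == y_true else 1)

# Two fixed "learners"; the label flips at t = 100 (a sudden drift), so the
# best member switches from the always-0 learner to the always-1 learner.
ens = BestOfEnsemble([lambda x: 0, lambda x: 1], window=20)
for t in range(200):
    y = 0 if t < 100 else 1
    ens.update(t, y)
```

The sliding window is what makes the selection forget the old concept: only the most recent errors decide which member answers.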
4 Big Data Stream Analytics with Meta Learning As the data arriving in streams is continuous and unknown to the model, meta-learning plays an important role in selecting an algorithm for the learners according to the characteristics of the incoming data [27, 28]. The most difficult task in a data stream is keeping a model up to date with the type of newly arriving data; to do so, the data must be monitored continually, and the model must be quick enough to catch up by using the most applicable technique. This scenario can be achieved with the assistance of meta-learning. The primary difference between a base-learning algorithm and a meta-learning algorithm is that in base-learning, the learner bias is fixed from the start, whereas in meta-learning it changes based on the properties of the data. As soon as the data arrives, the base-level characteristics are separated and meta-information is generated, made up of the metadata of the streams. Following that, the learner processes the meta-information to perform regression and assist the meta classifier in predicting the class. One of the most common methods for dealing with data streams is the meta-stream approach, which employs one or many learners on a regular basis. Using a meta-stream instead of an ensemble learner can enhance prediction on incoming data and remains more robust than the ensemble learner [29–31].
[Figure 3 content: a data stream feeds base learners L-1 and L-2 and a drift detector; a repository and a meta learner then produce the prediction, using a fuzzy-based similarity function f]
Fig. 3 A meta learning model for prediction over recurrent drift
Figure 3 represents the meta learning-based model for performing classification along with the detection of recurrent drift through a fuzzy-based similarity function. The primary objective is to train a meta learner on data identified from the recurring drifts that have evolved throughout the framework, so that the meta learner will be able to predict such drifts in the future. The fuzzy inference-based similarity function f is used to measure the amount of similarity across distinct concepts and hence detect concept drift as early as possible [32, 33].
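A similarity function f of the kind Fig. 3 describes can be sketched as follows. The per-feature (mean, std) concept summaries and the Gaussian-membership form are illustrative assumptions for the sketch, not the exact function of [32, 33]; a value near 1 suggests a recurring concept whose stored model could be reused, while a low value signals drift:

```python
import math

def summarize(window):
    """Concept summary: per-feature (mean, std) over a window of rows."""
    stats = []
    for col in zip(*window):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        stats.append((mean, std))
    return stats

def fuzzy_similarity(summary_a, summary_b):
    """f in [0, 1]: product of Gaussian memberships of b's feature means
    under a's (mean, std) pairs; 1.0 means the concepts look identical."""
    sim = 1.0
    for (ma, sa), (mb, _sb) in zip(summary_a, summary_b):
        sim *= math.exp(-((mb - ma) ** 2) / (2 * sa ** 2))
    return sim

old = summarize([[1.0, 10.0], [2.0, 12.0], [3.0, 11.0]])
same = fuzzy_similarity(old, summarize([[1.5, 10.5], [2.5, 11.5], [2.0, 11.0]]))
drifted = fuzzy_similarity(old, summarize([[50.0, 90.0], [52.0, 91.0], [51.0, 92.0]]))
```

Comparing each new window's summary against the repository of stored concept summaries is what lets the framework recognize a *recurrent* drift rather than always training a model from scratch.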
5 Big Data Stream Analytics with Deep Learning The generation of data representations and features from large, raw data remains a crucial component of machine learning [25]. Since incoming data in streams is unsupervised, deep learning algorithms influenced by artificial intelligence extract characteristics from such huge data into a simpler form. Deep learning can assist in solving big data challenges including semantic indexing, data tagging, information retrieval (IR), and discriminative modeling [29, 34]. Nevertheless, a few deep learning methods become highly expensive when dealing with high-dimensional data. The output generated by the last layer of the deep learning process provides useful data for building classifiers.
N. Bhatt et al.
Most deep learning algorithms work with complex datasets and a high degree of abstraction, which is why they adopt a hierarchical structure for learning and interpreting data. Big data stream learning systems based on deep learning operate on high-dimensional, unexploited data [25]. The main challenge in applying deep learning (DL) algorithms to massive data streams is that many of the methods are quite time-consuming when working with streams.
6 Conclusion

This article presented the possibilities of performing big data stream analytics using different approaches such as ensemble learning, meta learning and deep learning. This study concludes that the data stream challenges, specifically concept drift, can be handled in different ways by different approaches. While ensemble learning performs better than a single learner in handling sudden drift, a meta learning approach to stream analytics can overcome the limits of ensemble learning and performs better on recurrent drifts with the help of a fuzzy function; at the base level, the meta-stream remains more robust than the ensemble approach. Pre-processing of high-dimensional stream data is very helpful for obtaining quality data as input for processing, and for dealing with high-dimensional data, deep learning algorithms deliver promising results.
References

1. Shah Z et al (2017) A technique for efficient query estimation over distributed data streams. IEEE Trans Parallel Distrib Syst 28(10):2770–2783
2. García S et al (2016) Big data preprocessing: methods and prospects. Big Data Anal 1(1):9
3. Wu Y (2014) Network big data: a literature survey on stream data mining. J Softw 9(9):2427–2434
4. Bhatt N, Thakkar A (2021) An efficient approach for low latency processing in stream data. PeerJ Comput Sci 7:e426
5. Bhatt N, Thakkar A (2019) Experimental analysis on processing of unbounded data. Int J Innov Technol Exp Eng 8(9):2226–2230
6. Bhatt N, Thakkar DA (2019) Big data stream processing: latency and throughput. Int J Adv Sci Technol 28:1429–1435
7. Young P et al (2017) Library support for text and data mining: a report for the University Libraries at Virginia Tech
8. Kholghi M, Keyvanpour M (2011) An analytical framework for data stream mining techniques based on challenges and requirements. arXiv:1105.1950
9. Vakilinia S, Zhang X, Qiu D (2016) Analysis and optimization of big-data stream processing. In: Global communications conference (GLOBECOM), pp 1–6
10. Shakya S, Smys S (2021) Big data analytics for improved risk management and customer segregation in banking applications. J ISMAC 3(03):235–249
11. Karthigaikumar P (2021) Industrial quality prediction system through data mining algorithm. J Electron Inform 3(2):126–137
12. Chormunge S, Mehta R (2021) Comparison analysis of extracting frequent itemsets algorithms using MapReduce. In: Intelligent data communication technologies and internet of things. Springer, pp 199–210
13. Teng W et al (2004) Resource-aware mining with variable granularities in data streams. In: Proceedings of the 4th SIAM international conference on data mining, pp 527–531
14. Hashem H, Ranc D (2016) Pre-processing and modeling tools for bigdata. Found Comput Decis Sci 41(3):151–162
15. Prajapati P, Thakkar A (2019) Extreme multi-label learning: a large scale classification approach in machine learning. J Inf Optim Sci 40(4):983–1001
16. Ramírez-Gallego S et al (2017) A survey on data preprocessing for data stream mining: current status and future directions. Neurocomputing 239:39–57
17. Sammut C, Webb GI (2017) Encyclopedia of machine learning and data mining. Springer Publishing Company
18. Vlachos M (2011) Dimensionality reduction. In: Encyclopedia of machine learning. Springer, pp 274–279
19. Gomes HM et al (2017) A survey on ensemble learning for data stream classification. ACM Comput Surv (CSUR) 50(2):23
20. Lan K et al (2017) Self-adaptive pre-processing methodology for big data stream mining in internet of things environmental sensor monitoring. Symmetry 9(10):244
21. Tidke B, Mehta R (2018) A comprehensive review and open challenges of stream big data. In: Soft computing: theories and applications. Springer, pp 89–99
22. Zhou Z-H (2009) Ensemble learning. In: Encyclopedia of biometrics, vol 10, pp 978-0
23. Gomes HM, Barddal JP, Enembreck F (2015) Pairwise combination of classifiers for ensemble learning on data streams. In: Proceedings of the 30th annual ACM symposium on applied computing, pp 941–946
24. Pesaranghader A, Viktor H, Paquet E (2018) Reservoir of diverse adaptive learners and stacking fast Hoeffding drift detection methods for evolving data streams. Mach Learn 107:1711–1743
25. Krempl G et al (2014) Open challenges for data stream mining research. ACM SIGKDD Expl Newsl 16(1):1–10
26. Krawczyk B et al (2017) Ensemble learning for data stream analysis: a survey. Inf Fusion 37:132–156
27. Rossi ALD et al (2017) A guidance of data stream characterization for meta-learning. Intell Data Anal 21(4):1015–1035
28. Rossi ALD et al (2014) MetaStream: a meta-learning based method for periodic algorithm selection in time-changing data. Neurocomputing 127:52–64
29. Najafabadi MM et al (2015) Deep learning applications and challenges in big data analytics. J Big Data 2(1):1
30. Babu K, Narsimha RPV (2018) Survey on dynamic concept drift. J Comput Sci Syst Biol 283
31. Bhatt N et al (2020) Algorithm selection via meta-learning and active meta-learning. In: Smart systems and IoT: innovations in computing. Springer, pp 169–178
32. Ángel AM, Bartolo GJ, Ernestina M (2016) Predicting recurring concepts on data-streams by means of a meta-model and a fuzzy similarity function. Expert Syst Appl 46:87–105
33. Gomes JB, Menasalvas E, Sousa PA (2010) Tracking recurrent concepts using context. In: International conference on rough sets and current trends in computing. Springer, pp 168–177
34. Bhatt N, Ganatra A (2021) Improvement of deep cross-modal retrieval by generating real-valued representation. PeerJ Comput Sci 7:e491
Door Lock System Using Cryptographic Algorithm Based on Internet of Things Sumit M. Sangannavar, Sohangi Srivastava, G. N. Sagara, and Usha Padma
Abstract Security is very important in today's world, and this work is concerned with security. A secured door lock system is implemented that uses the Secure Hash Algorithm (SHA) to authenticate the authorised person and the Internet of Things (IoT) to interconnect a fingerprint sensor, a Global System for Mobile Communication (GSM) module, and an Arduino microcontroller. The approved person's fingerprints are saved in advance in the microcontroller, and the secure hash algorithm is used to determine whether the individual is authorised. If the user's fingerprint is authenticated, a One Time Password (OTP) is sent to that person's mobile number through GSM. If the fingerprint does not match any previously saved fingerprints, the person is considered unauthorized and no OTP is sent; instead, a buzzer is activated to indicate that someone is attempting to open the door. This system can be used in places where security is important, such as banks and other workplaces.

Keywords Door lock system · Secure hash algorithm (SHA) · Global system for mobile communication (GSM) · Internet of things (IoT) · Arduino microcontroller
1 Introduction

The rapid growth of the Internet of Things (IoT), along with its wide range of applications, has made IoT popular. The door lock system finds application in office buildings, banks, shopping centres, server rooms, laboratories and households, among many others. IoT has undergone many revolutionary changes with the latest technology developments in the fields of industry, smart home applications, agriculture, medical devices, smartphones, etc. In GSM networks, data-related applications require confidentiality, control of unauthorized access to confidential data, and remote access to information. The main

S. M. Sangannavar (B) · S. Srivastava · G. N. Sagara · U. Padma Department of Electronics and Telecommunication, R.V. College of Engineering, Bangalore 560059, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_32
S. M. Sangannavar et al.
goal of the work is to design a door lock system using GSM and the SHA algorithm. The system, designed based on GSM, achieves confidentiality, security and remote data access. The door lock system is built to guard the door and secure belongings, and security for the data transmitted in the system is ensured through the SHA algorithm. In this digital age, every device is interconnected within the network and involved in sharing information, and securing information over the network is the main concern. Researchers are particularly interested in how to protect IoT device data from hackers while it is being transmitted over the network. IoT devices exchange critical user information across the network. One of the major challenges is that IoT is utilized to control and manage the smart door lock via a smartphone connected to the network, and confidential data is transmitted to the server, where it may contain the user's password and other essential information. Because data transmitted over the internet is vulnerable, cryptographic techniques are employed to hash user data in order to safeguard internet communication.
2 Literature Survey

The work in [1] puts forward a door lock system that enables remote door operation by the user using face recognition technology via a camera mounted on the hardware system. Remote access was done via a smartphone, with a cloud backend, a hardware unit and a mobile application. In case of any damage to the face, the system is not able to recognize the authorised user; this is its drawback, and there is also no security provided for the cloud [1]. In [2], the generation of a PIN on an Android smartphone was used for door locking. An OTP was generated whenever a user attempted to unlock the door, and the PIN was sent as a notification to the authorised user's smartphone; voice command functionality was also available in the system [2]. Automated door locking using a Bluetooth device was proposed in [3], where the password was entered into the Bluetooth device. This password-based system built with Bluetooth is a low-cost unlocking system, but technology has since developed and many devices have replaced Bluetooth, making it an outdated technique [3]. An RFID and Arduino based automatic door lock system used the radio frequency identification (RFID) technique: a card was tagged to the object, and an electromagnetic field at radio frequency was used for data transmission. It could be developed at low cost, but if the card is lost or stolen, the whole system fails to function and is put at risk [4]. For banking and business organisations, a door lock system based on a cryptographic algorithm and an IoT system was developed; security for the door was provided and monitored through Arduino, but only a single user was allowed to operate the door lock [5].
A system designed on the basis of IoT achieves confidentiality, security and remote access to data. Sometimes, the data shared over a network might not be
Door Lock System Using Cryptographic Algorithm Based on Internet …
secured. Therefore, many researchers are interested in developing systems that are secured from hackers. The cryptographic algorithms used in this door lock system protect the data being communicated from hackers [6]. In an IoT-based system, the user exchanges confidential information over the network that may contain passwords and other data just as important as the items hidden behind the door. A unique password-based and cryptographically shielded, extremely secure door lock system was proposed to address concerns about the protection of valuables and information [7]. Data over the network is at risk and can be sniffed by any unauthorized person in case of cyber attacks; therefore, it is essential to maintain data confidentiality and integrity, and in this digital era there is a need for a highly secured door lock system [8]. The goal of the smart security system in [9] was to implement the Internet of Things concept to provide security. Although incredibly safe, this system was also smart enough to process user inputs in order to authenticate the user, and its sensors were able to detect a break-in and notify the homeowner by various means [9]. It was suggested to employ a digital door lock with a password or PIN code that prevents users from entering the door unless they enter the right one [10]. Building a networked laboratory door opening and shutting system using an ESP8266 NodeMCU (NLDOC system) is proposed in [11]. The ESP8266 NodeMCU, along with the Arduino application, served as the foundation for an application that links computers and smartphones, enabling remote control of the practical labs' door latch over Wi-Fi in addition to security monitoring via a camera system.
The smart home environment provides a suitable home configuration where household appliances and equipment are linked together over the internet and may be operated from anywhere by any mobile or remote device. Every smart home's amenities can be set up with the aid of service devices in accordance with the wishes of the home's owner [12]. A smart switch automation system provides an intelligent and secured system to enhance the quality of human lives. A variety of sensors have been developed to enhance safety and security in addition to fully-automated appliances; the system does not only monitor the sensor data but also performs actions according to the requirement [13].
2.1 Block Diagram

The proposed smart door lock system, as shown in Fig. 1, consists of an Arduino microcontroller, a NodeMCU microcontroller, an OPEN/CLOSE servo motor, a GSM module for communication, a fingerprint module for identification, a keypad for password entry, a buzzer for security and an LCD for display. The NodeMCU is utilized to perform blockchain technology, whereas the Arduino microcontroller handles the integration of the sensor and other related components. The fingerprint of each user is assigned a unique identification number. Once the
Fig. 1 Block diagram of door lock system
fingerprint is received, the system determines which ID it belongs to. Following retrieval of the data, a hash code for the data of that specific user ID is generated using SHA-256. If the generated hash value matches the saved hash value, a password is selected from a pool of saved passwords and, utilizing GSM, the password is transmitted through SMS to the registered mobile phone. The user must enter the password received on their mobile phone into the keypad, after which the servo motor rotates and the door opens. If the fingerprint is not matched, a buzzer sounds and the fingerprint must be inserted again to unlock the door.
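The verification chain just described can be sketched end-to-end in a few lines; the user table, OTP pool, phone-number placeholder and `send_sms` stub below are hypothetical stand-ins for the paper's firmware, not its actual code:

```python
import hashlib
import random

# Hypothetical enrolment data: user ID -> SHA-256 hash of the stored template data
USERS = {
    1: hashlib.sha256(b"template-data-user-1").hexdigest(),
    2: hashlib.sha256(b"template-data-user-2").hexdigest(),
}
OTP_POOL = ["4821", "9034", "1177", "6590"]  # pool of saved passwords

def send_sms(number, text):
    """Stand-in for the SIM800C AT-command path that delivers the OTP."""
    return f"SMS to {number}: {text}"

def authenticate(user_id, template_data, phone_number):
    """Hash the template data for the matched user ID; on a hash match, pick
    an OTP from the pool and 'send' it. A None return signals the buzzer."""
    stored = USERS.get(user_id)
    if stored is None or hashlib.sha256(template_data).hexdigest() != stored:
        return None  # unauthorized: caller activates the buzzer
    otp = random.choice(OTP_POOL)
    send_sms(phone_number, otp)
    return otp

assert authenticate(1, b"template-data-user-1", "+91XXXXXXXXXX") in OTP_POOL
assert authenticate(1, b"tampered-data", "+91XXXXXXXXXX") is None
```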
2.2 Flowchart

Figure 2 shows the flow of the implemented door lock system. After the hardware setup is done, power supply must be given to the ATmega328P and the GSM module SIM800C. The microcontroller integrates the sensors and other connected peripherals. Initially, the fingerprints of valid users are stored in the EEPROM of the fingerprint sensor. Once the system is active, the servo motor is set to CLOSE status. Each user has a distinct ID and password. Next, the LCD display asks the user to insert a fingerprint
Fig. 2 Flowchart of the door lock system
on the R307 sensor. If it matches a saved fingerprint, the unique ID of the specific user is retrieved and displayed. If the fingerprint is invalid, the buzzer is activated, alerting of unauthorized access to the system. The hashed value of the matched ID is generated, and the SIM800C module then transmits the password to the registered phone number through GSM. The keypad is used to enter the received password; if it matches, the servo motor rotates and the door opens, or else the system asks for the fingerprint again.
3 System Description and Methodology

3.1 Secure Hash Algorithm (SHA)

Secure Hash Algorithms are a class of cryptographic functions used to keep data secure. They transform data using a hash function: a method composed of bitwise operations, modular additions, and compression functions. The SHA-256 algorithm was employed in this work.

Preprocessing stage. Initially, the string is converted into binary. Padding bits are appended to the message as a few additional bits, leaving the message's length exactly 64 bits short of a multiple of 512. A final 64 bits encoding the length of the original, unpadded message are then appended to the last block, making the final padded text an exact multiple of 512 bits. The entire message block, which is 'm × 512' bits long, is broken into 'm' chunks, each 512 bits long. These chunks are then subjected to rounds of operations, with the results fed as input to the subsequent rounds. Eight buffers to be used in the rounds are initialised with their default values: eight 32-bit hexadecimal words derived from the first 32 bits of the fractional parts of the square roots of the first eight prime numbers. Likewise, 64 constants ranging from K[0] to K[63] are kept in an array, derived from the first 32 bits of the fractional parts of the cube roots of the first 64 prime numbers.

Hash computation. The input data is copied into a new array, W[0…63], where each item is a 32-bit word, with 48 additional words initialised to zero. This leaves 64 words in the message schedule W, which is a temporary structure feeding the compression function.
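The preprocessing stage described above can be checked with a short sketch (the helper name `sha256_pad` is ours, not part of the standard):

```python
import struct

def sha256_pad(message: bytes) -> bytes:
    """Append the 0x80 marker byte, zero padding up to 64 bits short of a
    multiple of 512 bits, then the original bit length as a 64-bit value."""
    bit_length = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)  # 56 bytes = 448 bits
    return padded + struct.pack(">Q", bit_length)  # big-endian 64-bit length

padded = sha256_pad(b"abc")
assert len(padded) % 64 == 0                  # a whole number of 512-bit blocks
assert padded[-8:] == struct.pack(">Q", 24)   # original length: 3 bytes = 24 bits
```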
The following algorithm is used to modify the zeroed indexes at the array's end:

• S0 = (W[i − 15] rightrotate 7) ⊕ (W[i − 15] rightrotate 18) ⊕ (W[i − 15] rightshift 3)
• S1 = (W[i − 2] rightrotate 17) ⊕ (W[i − 2] rightrotate 19) ⊕ (W[i − 2] rightshift 10)
• W[i] = W[i − 16] + S0 + W[i − 7] + S1

Set the initial values of variables A, B, C, D, E, F, G, and H to the corresponding current hash values h0, h1, h2, h3, h4, h5, h6, and h7. As shown below, the round function loop changes the values from A to H:

A = H + S1 + ch + k[i] + w[i] + S0 + maj
B = A
C = B
D = C
Fig. 3 Round function
E = D + H + S1 + ch + k[i] + w[i]
F = E
G = F
H = G

where

S0 = (A rightrotate 2) ⊕ (A rightrotate 13) ⊕ (A rightrotate 22)
S1 = (E rightrotate 6) ⊕ (E rightrotate 11) ⊕ (E rightrotate 25)
ch = (E ∧ F) ⊕ ((¬E) ∧ G)
maj = (A ∧ B) ⊕ (A ∧ C) ⊕ (B ∧ C)

After the round function loop, as shown in Fig. 3, the modified hash values are added to their respective variables, a–h. 64 iterations of the round function are performed, wherein extra data is added as the hash values produced above are rotated in a certain pattern, and new hash values are generated on the basis of the results of past operations. One final 256-bit hash value, which serves as the result of SHA-256, is produced in the final round [14].
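As a sanity check on the description above, the whole pipeline (constants from prime roots, padding, message schedule, and 64 compression rounds) fits in a short sketch that can be compared against a library implementation; this is an illustrative reimplementation, not the paper's firmware code:

```python
import hashlib
import math
import struct

def primes(n):
    """First n primes by trial division."""
    ps, c = [], 2
    while len(ps) < n:
        if all(c % p for p in ps):
            ps.append(c)
        c += 1
    return ps

def icbrt(n):
    """Integer cube root (Newton's method), for exact root constants."""
    x = 1 << -(-n.bit_length() // 3)
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y

# h0..h7: fractional parts of the square roots of the first 8 primes
H0 = [math.isqrt(p << 64) & 0xFFFFFFFF for p in primes(8)]
# K[0..63]: fractional parts of the cube roots of the first 64 primes
K = [icbrt(p << 96) & 0xFFFFFFFF for p in primes(64)]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF

def sha256(msg: bytes) -> str:
    padded = msg + b"\x80"                       # padding: marker byte,
    padded += b"\x00" * ((56 - len(padded)) % 64)  # zeros to 448 mod 512 bits,
    padded += struct.pack(">Q", len(msg) * 8)    # then 64-bit original length
    h = list(H0)
    for off in range(0, len(padded), 64):        # one 512-bit chunk at a time
        w = list(struct.unpack(">16I", padded[off:off + 64])) + [0] * 48
        for i in range(16, 64):                  # extend the message schedule
            s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
            s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
            w[i] = (w[i - 16] + s0 + w[i - 7] + s1) & 0xFFFFFFFF
        a, b, c, d, e, f, g, hh = h
        for i in range(64):                      # 64 rounds of compression
            S1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
            ch = (e & f) ^ (~e & g)
            t1 = (hh + S1 + ch + K[i] + w[i]) & 0xFFFFFFFF
            S0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
            maj = (a & b) ^ (a & c) ^ (b & c)
            t2 = (S0 + maj) & 0xFFFFFFFF
            hh, g, f, e = g, f, e, (d + t1) & 0xFFFFFFFF
            d, c, b, a = c, b, a, (t1 + t2) & 0xFFFFFFFF
        h = [(x + y) & 0xFFFFFFFF for x, y in zip(h, (a, b, c, d, e, f, g, hh))]
    return "".join(f"{x:08x}" for x in h)

assert sha256(b"abc") == hashlib.sha256(b"abc").hexdigest()
```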
3.2 Global System for Mobile Communications (GSM)

A GSM module is a piece of hardware that connects wirelessly to a network using GSM mobile phone technology. GSM modems are used for communication by mobile phones and other hardware that connects to mobile phone networks, and they need SIM cards to identify the device to the network. The GSM module used here is a serial-interface modem that operates on a 6 V power supply and consumes low power for voice and SMS transmission and reception. One SIM card interface is supported, as well as Universal Asynchronous Receiver/Transmitter (RX and TX) pins. The SIM800C module
uses the GSM library to allow the microcontroller to connect, exchange SMS and make voice calls. AT commands are used to access the functions of the GSM module, such as sending and receiving voice calls and text messages. Sending the AT command AT+CMGF=1 activates text mode. Then, via the Serial1 port, the AT command AT+CMGS, which signifies sending an SMS and contains the mobile number, is transmitted to the GSM module. The module can transmit data at rates ranging from 64 kbps to 120 Mbps. Modulation is performed using Gaussian Minimum Shift Keying (GMSK). Bandwidth among users is shared using a combination of TDMA and FDMA in GSM: the total bandwidth of 25 MHz is divided into 124 carrier frequencies of 200 kHz each using frequency division multiple access, and each of these frequencies is then sliced into eight time slots using time division multiple access. Each of these slots is used to carry out data exchange [15].
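The AT-command exchange for one SMS can be sketched as plain byte strings (the phone number and message are placeholders; a real implementation writes these to the module's serial port and waits for the '>' prompt after AT+CMGS before sending the body):

```python
CTRL_Z = b"\x1a"  # Ctrl-Z terminates the SMS body in text mode

def sms_command_sequence(number: str, text: str):
    """Byte strings sent to the GSM module, in order, to transmit one SMS."""
    return [
        b"AT+CMGF=1\r",                    # select SMS text mode
        f'AT+CMGS="{number}"\r'.encode(),  # recipient; module replies with '>'
        text.encode() + CTRL_Z,            # message body, Ctrl-Z to send
    ]

seq = sms_command_sequence("+91XXXXXXXXXX", "OTP: 4821")
assert seq[0] == b"AT+CMGF=1\r"
assert seq[-1].endswith(CTRL_Z)
```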
3.3 Fingerprint Sensor

The R307 fingerprint module includes a DSP processor, an optical fingerprint sensor, a fingerprint alignment algorithm, FLASH chips, and other hardware and software elements. It provides features such as fingerprint entry, image processing, fingerprint matching, search and template storage. An array of inputs is used to map the pattern of the user's fingerprint; some such sensors also support swipe or force sensing. It is a fingerprint sensor module with a TTL (transistor-transistor logic) serial communication interface. The user can store fingerprint data in the module and configure it in one-to-one or one-to-many mode for identifying the user. The baud rate is 9600. It has 6 wires, of which 4 are used: 2 for data and 2 for power to the Arduino microcontroller. It has a supply voltage of 3.3 V [16]. Table 1 describes the pin configuration of the fingerprint sensor.

Table 1 Pin description of R307 fingerprint sensor

Pin no | Pin name | Details
1      | 5V       | 5 V regulated DC
2      | Ground   | Common ground
3      | TXD      | Data output; connect to microcontroller RX
4      | RXD      | Data input; connect to microcontroller TX
5      | Touch    | Active-low output, asserted when a finger touches the sensor
6      | 3.3 V    | Use this wire to supply 3.3 V to the sensor instead of 5 V
3.4 ATmega328P

On the Arduino hardware platform, the power and reset circuits, as well as the circuitry to programme and communicate with the microcontroller over USB, are already set up. On the software side, Arduino offers a variety of libraries to make microcontroller programming simpler, the most basic of which provide I/O pin control and reading capabilities. The Arduino platform includes a lot of pre-wiring and free code libraries for prototyping, allowing more time to be spent testing the idea and less time creating supporting circuitry. The ATmega328P, shown in Fig. 4, is a high-performance, low-power microcontroller. It has built-in self-programmable flash program memory and a programming lock for software security. It is an 8-bit single-chip microcontroller with a reduced instruction set architecture of 131 instructions, most of which execute in a single clock cycle. It is a frequently used processor in Arduino boards and has a watchdog timer, an on-chip oscillator and a programmable serial USART [17]. Table 2 describes the configuration of the ATmega328P.
Fig. 4 ATmega328p pin diagram
Table 2 Pin description of ATmega328P

Parameter | Value
Program memory type | Flash
Program memory size | 32 KB
CPU speed | 20 MIPS
SRAM | 2,048 bytes
EEPROM | 1,024 bytes
Digital communication peripherals | 1 UART, 2 SPI, 1 I2C
Capture/compare/PWM peripherals | 1 input capture, 1 CCP, 6 PWM
Timers/counters | 2 × 8-bit, 1 × 16-bit
Comparators | 1
Temperature range | −40 to 85 °C
Operating voltage | 1.8–5.5 V
Number of pins | 32
Low power | Yes
4 Results and Analysis

Figure 8 depicts the hardware setup of the door lock system, consisting of the ATmega328P microcontroller, an LCD I2C display unit, the R307 fingerprint sensor module, a NodeMCU, the GSM module SIM800C, a servo motor, a buzzer and a keypad. After making the connections, the microcontroller and GSM module are given power supply. Once the system is active, the GSM module's LED blinks, indicating that it is ready for communicating messages. The LCD displays a message, as shown in Fig. 6a, indicating that the system is waiting for a valid fingerprint. Upon inserting a valid fingerprint, the system matches it with the already saved fingerprint data. If it matches, the LCD displays a message as shown in Fig. 6b. After user validation, the user ID and password of the user are retrieved, and the user ID is displayed on the LCD as shown in Fig. 6c. If it does not match, the buzzer is activated, alerting of unauthorized access to the door. The SIM800C module transmits the user's password to their registered phone number using GSM; the user receives the password as shown in Fig. 5. The LCD then displays a message to enter the password, as shown in Fig. 7a, and the keypad is used to enter it. If it matches, the LCD displays a message as shown in Fig. 7b, indicating that the password is matched, and the servo motor rotates to OPEN status. If the password does not match, the LCD displays a message as shown in Fig. 7c and the user is asked to insert the fingerprint again.
Fig. 5 Text message received on registered phone number
Fig. 6 a Waiting for valid finger. b Fingerprint is matched. c User ID is retrieved
Fig. 7 a Enter the received password. b Password is matched
Fig. 8 Hardware setup
5 Conclusion and Future Scope

A secure and protected door lock mechanism is presented in this research. The work seeks to safeguard not just the door lock system but also the information transferred across the network. The hashing algorithm SHA-256 is used to hash the user input. An automatic door lock system is built with an Arduino, a fingerprint sensor, and a GSM module for OTP sharing. Since this work is simpler and less expensive, the typical individual can take advantage of strong security. The microcontroller allows for easier system installation compared to the other existing systems. Applications of this system include use in offices, banks, and other institutions. The work can be upgraded in the future by placing cameras and monitoring invalid entry using image processing techniques.
References

1. Delaney JR (2019) The best smart locks for 2019. PC. https://www.pcmag.com/article/344336/the-best-smartlocks. Accessed 18 Apr 2019
2. Patil BS, Mahajan VA, Suryawanshi SA, Pawar MB (2018) Automatic door lock system using pin on android phone. Int Res J Eng Technol (IRJET) 05(11)
3. Kamelia L, Noorhassan A, Sanjaya M, Mulyana WE (2014) Door-automation system using bluetooth-based android for mobile phone. ARPN J Eng Appl Sci 9(10)
4. Divya R, Mathew M (2017) Survey on various door lock access control mechanisms. In: 2017 international conference on circuit, power and computing technologies (ICCPCT), pp 1–3
5. Nehete PR, Chaudhari J, Pachpande S, Rane K (2016) Literature survey on lock security systems. Int J Comput Appl 153:13–18
6. Shafin MK, Kabir KL, Hasan N, Mouri IJ, Islam ST, Ansari L et al (2015) Development of an RFID based access control system in the context of Bangladesh. In: 2015 international conference on innovations in information, embedded and communication systems (ICIIECS), pp 1–5
7. Verma GK, Tripathi P (2010) A digital security system with door lock system using RFID technology. Int J Comput Appl (IJCA) 5(09758887):6–8
8. Madhusudhan M (2017) Implementation of automated door unlocking and security system 2(3):5–8
9. Burglary, Fall 2019. https://ucr.fbi.gov/crime-in-the-u.s/2018/crime-in-theu.s.-2018/topicpages/burglary. Accessed 3 Mar 2020
10. Jagdale R, Koli S, Kadam S, Gurav S (2016) Review on intelligent locker system based on cryptography wireless & embedded technology. Int J Tech Res Appl 75–77
11. Tuyen NT, Ngoc NQ, Hung NX (2021) On an application of NodeMCU ESP8266 in opening and closing the laboratory door: online practice. Glob J Eng Technol Adv 09(03):086–091. eISSN 2582-5003
12. Hamdan YB (2021) Smart home environment future challenges and issues: a survey. J Electron Inform 3:239–246
13. Bagde S, Ambade P, Batho M, Duragkar P, Dahikar P, Ikhar A (2021) Internet of things (IoT) based smart switch. J IoT Soc Mob Anal Cloud 2:149–162
14. Aradhana SMG (2017) Review paper on secure hash algorithm with its variants. Int J Tech Innov Modern Eng Sci (IJTIMES) 3(05). e-ISSN 2455-2584
15. Abd Rahman NA, Ibrahim NH, Lombigit L, Azman A, Jaafar Z, Abdullah NA, Mohamad GH (2018) GSM module for wireless radiation monitoring system via SMS. In: IOP conference series: materials science and engineering
16. Namboodiri S, Arun P (2018) Fingerprint based security system for vehicles. Int J Adv Res Ideas Innov Technol 4(4). ISSN 2454-132X
17. Louis L (2016) Working principle of arduino and using it as a tool for study and research. Int J Control Autom Commun Syst (IJCACS) 1(2)
Knowledge Engineering-Based Analysis of Convolutional Neural Network Architectures' Performance on Luna16 and GAN Generated Pulmonary Nodule Clipped Patches to Diagnose Lung Cancer

Ramasubramanya Mysore Sheshadri, Yash Aryan Chopra, Yashas Anand, G. Sumukh, and S. Geetha

Abstract Lung cancer patients constitute a relatively small number compared to the overall population of non-cancerous individuals. This causes an imbalance between benign and malignant nodule image class data, which impacts the convolutional neural network model that classifies whether a nodule image is benign or malignant. The Deep Convolutional Generative Adversarial Network (DCGAN) is used to produce nodule samples in order to balance the classes; a larger dataset of DCGAN-generated nodule images can also solve the problem of overfitting. The DCGAN-generated nodule samples and the original nodule clipped image patches are fed into various CNN architectures, and their performance is examined to find the best CNN architecture for classifying benign and malignant nodules effectively.

Keywords Medical image processing · Pulmonary nodule patches · Convolutional neural networks · Deep convolutional generative adversarial network
1 Introduction

Lung cancer is one of the primary causes of mortality. The global chance of developing cancer before the age of 75 is 20.4% [1]. Lung cancer accounts for 11.4 percent of new cases and 18 percent of fatalities of all ages in 2020, according to GLOBOCAN 2020 [2]. Doctors employ low-radiation-dose CT to identify cancer early; this form of CT uses low doses of radiation to obtain comprehensive three-dimensional images of the lung, which radiologists and physicians utilize to diagnose lung cancer. The treatment of cancer is prescribed based on the radiologist's diagnosis, but this diagnosis can be affected by

R. M. Sheshadri (B) · Y. A. Chopra · Y. Anand · G. Sumukh · S. Geetha Department of Computer Science and Engineering, BNM Institute of Technology, Bangalore, Karnataka, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_33
449
450
R. M. Sheshadri et al.
insufficient work experience or exhaustion [3]. Computer Aided Diagnosis has been created to ease physician burden while improving diagnosis accuracy and efficiency [4]. Computer Aided Diagnosis (CAD) has the potential to greatly aid in the diagnosis of malignant pulmonary nodules. The mean Az value rose considerably (P = 0.018) from 0.896 without CAD output to 0.923 with CAD output. The major reason for the performance increase is the shift from false negatives without CAD to true positives with CAD (19/31, 61%) [5]. The authors give a comprehensive examination of knowledge Engineering-based experiments with different convolutional neural network architectures on clipped lung nodule lesions and analysis of the performance of the architectures, challenges, and future work in this publication.
2 Relevant Knowledge

2.1 Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are a class of neural networks widely used for image processing. They help identify and extract features of an image that can be used to solve problems. A CNN is made up of convolutional layers which perform a convolution operation on the pixel values, as shown in Fig. 1 [6]. Filters generate feature maps that highlight the areas of the image in which the filters activate the most [15]. Employing multiple filters allows generating more feature maps that highlight more areas.
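As an illustrative sketch (not code from the original work), the convolution operation on pixel values described above can be expressed in NumPy; the image and kernel values below are hypothetical examples:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid sliding-window convolution (no padding, stride 1), as used in CNN layers."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise multiply the filter with the image patch and sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=float)
kernel = np.array([[1, 0],
                   [0, -1]], dtype=float)  # a simple difference-style filter
feature_map = conv2d(image, kernel)
print(feature_map)  # 2 x 2 feature map, every entry -4 for this input
```

Each output pixel of the feature map measures how strongly the filter activates at that location, which is exactly the highlighting behaviour described above.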
2.2 AlexNet

AlexNet is a dense deep convolutional architecture that was trained on over one million images derived from the ImageNet dataset, classifying them into a thousand different classes. The network consists of sixty million parameters and around six hundred and fifty thousand neurons. The model consists of five convolution layers followed by subsequent max-pooling layers. The output of the convolution and pooling operations is then fed into three fully connected layers with a final thousand-way softmax classifier. Initially, 227 × 227 × 3 images are passed to a convolution layer having ninety-six filters of size 11 × 11 and a stride of four. The resulting 55 × 55 × 96 feature map is fed into a max-pooling layer where only the most relevant features are retained. This is then followed by another convolution layer with two hundred and fifty-six filters of size 5 × 5 and padding of two. The image passes through four more convolution and max-pooling layers. After the final max-pooling operation, the output is mapped onto a fully connected layer of size four thousand and ninety-six, and then passes through two more fully connected layers [16]. The final output layer consists of one thousand neurons to classify the image into one of a thousand classes.
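The spatial sizes quoted above (227 → 55, then 27 after pooling) follow from the standard convolution output-size formula; a small illustrative check (not from the original work):

```python
def conv_out(size, kernel, stride, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

# First AlexNet convolution: 227 x 227 input, 11 x 11 kernel, stride 4 -> 55 x 55
s = conv_out(227, 11, 4)
# A 3 x 3 max-pooling with stride 2 then gives 27 x 27
s = conv_out(s, 3, 2)
print(s)  # 27
```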
Fig. 1 Convolutional operation on pixels in a CNN
2.3 InceptionV3

InceptionV3 is a well-known image classification model originally trained on the ImageNet dataset with a thousand classes. InceptionV3 is a forty-eight-layer network that reduces the number of parameters computed during the convolution process by factorizing convolutions. It first factorizes larger convolutions of size 5 × 5 into two 3 × 3 convolutions, thereby reducing the computational parameters [7]; this block is referred to as Inception Block-A (Inception Blocks B and C are defined similarly). The next part factorizes smaller convolutions of size 3 × 3 into (1 × 3 and 3 × 1) convolutions using asymmetric convolutions. Inception Block-B is used for handling 7 × 7 convolutions, and Inception Block-C is used for handling 3 × 3 convolutions. A 299 × 299 × 3 image is fed into five convolution layers and two max-pooling layers. The generated
output is passed on to Inception Block-A, which is iterated three times. The output is then factorized using reduction blocks and fed to Inception Block-B, iterated four times. Using auxiliary classification (employed to mitigate the vanishing gradient problem), the model loss is calculated prior to the actual classification. The data is then passed to Inception Block-C, whose output is average-pooled. The loss of the auxiliary classifier is added to the main classifier's loss to improve the classification result.
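The parameter saving from the factorizations described above can be verified with simple weight-count arithmetic; the channel count of 64 below is an illustrative assumption, not a value from the paper:

```python
def conv_params(kh, kw, c_in, c_out):
    """Weight count of a single convolution (biases ignored)."""
    return kh * kw * c_in * c_out

c = 64
p_5x5 = conv_params(5, 5, c, c)                              # one 5 x 5 convolution
p_two_3x3 = 2 * conv_params(3, 3, c, c)                      # factorized into two 3 x 3
p_3x3 = conv_params(3, 3, c, c)
p_asym = conv_params(1, 3, c, c) + conv_params(3, 1, c, c)   # asymmetric 1 x 3 + 3 x 1
print(p_5x5, p_two_3x3)   # 102400 73728 -> ~28% fewer weights
print(p_3x3, p_asym)      # 36864 24576 -> ~33% fewer weights
```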
2.4 Xception

Xception is a deep convolutional neural network architecture inspired by Inception, with the Inception modules replaced by depth-wise separable convolutions. Xception outperforms VGG16, ResNet, and InceptionV3 on the ImageNet dataset. Because the Xception architecture has the same number of parameters as InceptionV3, the performance gains are due to more efficient use of model parameters rather than increased capacity. The Xception architecture consists of thirty-six convolutional layers, which form the feature extraction basis of the network. The thirty-six convolutional layers are structured into fourteen modules, all of which have linear residual connections around them, except for the first and last modules [8]. The data first passes through the entry flow, then through the middle flow, which repeats eight times, and lastly through the exit flow. In Xception, all Convolution and Separable Convolution layers are followed by batch normalization, and all Separable Convolution layers use a depth multiplier of one [9].
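The efficiency of depth-wise separable convolutions can be illustrated by comparing weight counts with a standard convolution; the channel sizes below are illustrative assumptions and the helper functions are not from the original work:

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a regular k x k convolution."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out, depth_multiplier=1):
    """Depth-wise (k x k per input channel) followed by point-wise (1 x 1) convolution."""
    depthwise = k * k * c_in * depth_multiplier
    pointwise = c_in * depth_multiplier * c_out
    return depthwise + pointwise

c_in, c_out, k = 128, 256, 3
print(standard_conv_params(k, c_in, c_out))   # 294912
print(separable_conv_params(k, c_in, c_out))  # 33920 (1152 depth-wise + 32768 point-wise)
```

With a depth multiplier of one, as Xception uses, the separable form needs roughly an order of magnitude fewer weights per layer, which is why the same parameter budget can be spent more effectively.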
2.5 GoogLeNet

GoogLeNet is a deep convolutional neural network that is a variant of the Inception network. The GoogLeNet architecture consists of 22 layers (27 including pooling layers). The input layer accepts an image of dimension 224 × 224. GoogLeNet increases computational efficiency by reducing the input image whilst simultaneously retaining spatial details. To achieve this, the first convolution layer in GoogLeNet uses a filter of size 7 × 7, which immediately reduces the input image. The following convolution layer has a depth of two and uses a 1 × 1 conv block, which effects dimensionality reduction, i.e., reduces the computational load. The size of the input image is reduced by four times at the second conv layer and by eight times before reaching the first Inception module. GoogLeNet has nine Inception modules and two notable max-pooling layers between the modules to downsize the input image. A forty-percent dropout layer is used before the linear layer to prevent overfitting of the network [10].
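The computational saving from the 1 × 1 dimensionality-reduction block described above can be sketched by counting multiplications; the feature-map and channel sizes below are illustrative assumptions, not figures from the GoogLeNet paper:

```python
def conv_mults(h, w, k, c_in, c_out):
    """Multiplications for a k x k convolution applied over an h x w feature map."""
    return h * w * k * k * c_in * c_out

h = w = 28
# Direct 5 x 5 convolution: 192 -> 32 channels
direct = conv_mults(h, w, 5, 192, 32)
# With a 1 x 1 bottleneck: 192 -> 16 channels, then 5 x 5: 16 -> 32 channels
bottleneck = conv_mults(h, w, 1, 192, 16) + conv_mults(h, w, 5, 16, 32)
print(direct, bottleneck)  # ~120M vs ~12M multiplications, roughly a 10x reduction
```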
2.6 DenseNet

DenseNet is a CNN model which uses shorter connections between layers to make networks deeper while also making them easier to train. Each layer in DenseNet is directly connected to every other layer in the network. It works in a feed-forward manner, as the input for each layer is obtained from all previous layers. DenseNet makes use of two important building blocks: Dense Blocks and Transition Layers. DenseNet starts with a simple convolution block with a stride of two and 64 filters of size 7 × 7, followed by a 3 × 3 max-pooling layer with a stride of two. Each conv block is made of BatchNormalization, ReLU, and a conv2D layer. Each Dense block is made of two convolutions, with kernel sizes of 1 × 1 and 3 × 3. Dense block one uses this pair 6 times, dense block two 12 times, dense block three 24 times, and dense block four 16 times. The 1 × 1 convolution is placed before the 3 × 3 because it acts as a bottleneck layer that improves computational efficiency. In the transition layers, the number of channels is reduced to half the number of existing channels; each Transition layer consists of a 1 × 1 convolution layer and a 2 × 2 average pooling layer with a stride of two. Finally, there is a 7 × 7 global average pooling layer followed by the final output layer with a SoftMax activation function [11].
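The channel bookkeeping of the dense and transition blocks described above can be sketched as follows; the growth rate of 32 is an assumption (the common DenseNet-121 setting) and is not stated in the text:

```python
def dense_block_channels(c_in, growth_rate, num_layers):
    """Each layer in a dense block appends growth_rate channels via concatenation."""
    return c_in + growth_rate * num_layers

def transition_channels(c):
    """A transition layer halves the channel count."""
    return c // 2

c = 64                      # channels after the initial 7 x 7 convolution
for n in (6, 12, 24, 16):   # dense-block sizes from the text
    c = dense_block_channels(c, 32, n)
    if n != 16:             # no transition layer after the final dense block
        c = transition_channels(c)
print(c)  # 1024 channels entering global average pooling
```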
2.7 ShuffleNet

ShuffleNet is a fifty-layer architecture (forty-four layers in its second variant). To decrease computing costs, pointwise group convolution and channel shuffle are utilized. Pointwise convolution is a convolution using a 1 × 1 kernel. The architecture shuffles the outputs of the group convolutions after each layer, which allows subsequent layers to receive inputs from different groups [12].
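A minimal NumPy sketch (not code from the original work) of the channel shuffle operation, which interleaves channels across groups via a reshape–transpose–reshape:

```python
import numpy as np

def channel_shuffle(x, groups):
    """Shuffle the channels of an (N, C, H, W) tensor across groups."""
    n, c, h, w = x.shape
    assert c % groups == 0
    # Split channels into groups, swap the group and per-group axes, flatten back
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(n, c, h, w)

x = np.arange(6).reshape(1, 6, 1, 1)   # channels labelled 0..5
y = channel_shuffle(x, groups=2)
print(y.ravel())  # [0 3 1 4 2 5] -- each position now mixes both groups
```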
2.8 GAN

Generative in GAN refers to its ability to generate new images from the training dataset, in contrast to earlier neural network models that could only recognize existing images. Generating images is helpful in areas where there is a lack of sufficient images to train a neural network [13]; since the accuracy of neural networks depends heavily on the amount and quality of data, GAN plays an important role. GAN can generate photographs, CT (Computed Tomography) images, paintings in the style of famous painters, and facial images. Adversarial in GAN refers to the two neural networks that are pitted against each other in a cat-and-mouse struggle. The generator tries to slip through the discriminator by
generating the best artificial images, just like a mouse would try to escape from a cat. One network, called the generator, is given a set of real images so that it can generate a set of artificial images. Another network, called the discriminator, tries to distinguish between the real images given to the model and the images produced by the generator, and then assigns a score. Based on the score from the discriminator, the generator repeatedly generates images until it achieves the desired score.
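A minimal sketch of the adversarial objectives described above, using binary cross-entropy in NumPy; the discriminator scores below are hypothetical values, not outputs of any trained network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, target):
    """Binary cross-entropy between predicted probabilities and 0/1 targets."""
    eps = 1e-7
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

# Hypothetical discriminator outputs for a batch of real and generated images
real_scores = sigmoid(np.array([2.0, 1.5, 3.0]))    # ideally close to 1
fake_scores = sigmoid(np.array([-1.0, -2.0, 0.5]))  # ideally close to 0

# Discriminator objective: real images labelled 1, generated images labelled 0
d_loss = bce(real_scores, np.ones(3)) + bce(fake_scores, np.zeros(3))
# Generator objective: make the discriminator label its fakes as 1
g_loss = bce(fake_scores, np.ones(3))
print(d_loss, g_loss)
```

The two losses pull in opposite directions on the fake scores, which is the cat-and-mouse dynamic described above.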
3 Experimentation and Results

3.1 Data Pre-processing

For the task of pre-processing, CT slices in MHD format were considered from the LUNA16 dataset. Each of the 888 patient folders contains a varying number of files (slices), which are cross-sectional slices of the whole lung; the number of slices varies with each .MHD file. Each slice is of size 512 × 512. To generate the nodule patches, the coordinates from the CSV file named candidates_V2 are used; these coordinates point to the locations of nodules in the slice files inside the MHD file. The coordinates in the candidates_V2 CSV file, which are given as world coordinates, were converted to voxel coordinates. The voxel coordinates are then used to snip the 512 × 512 slices and generate 96 × 96 images (nodule patches). The candidates_V2 CSV file has a class column that specifies whether the nodule at a particular location in the slice is benign or malignant; based on the value in this column, the generated nodule patches are stored in two .npy files. 900 malignant nodule patches and 1400 benign nodule patches were generated and stored in separate .npy files. To improve the overall accuracy of the classifier, the Generative Adversarial Network (GAN) is used as a data augmentation technique. A seed size (hyperparameter) of 100 is used to generate noise vectors, which are inputs to the generator of the GAN. The noise vectors are passed to a fully connected layer which internally implements project-and-reshape functions that upsample the noise vector to 256 × 6 × 6. The resulting tensors are given as inputs to the generator network of the GAN, creating a random noise sample for training. The output of the generator network (96 × 96 × 3 synthetic images), along with the real images (96 × 96 × 3) from the LUNA16 pre-processed dataset in the form of .npy files, is fed as input to the discriminator network.
The discriminator classifies the images as real or fake using the sigmoid activation function. The Adam optimizer is used along with binary cross-entropy loss functions for both the generator and discriminator networks. After training the network for a thousand epochs, a generator loss of 6.81 and a discriminator loss of around 0.5 were observed at the final epoch. Two thousand malignant and two thousand benign images, summing to a total of four thousand synthetic images, were generated from the GAN.
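A minimal sketch of the world-to-voxel conversion and patch clipping described in this section; the origin and spacing values are hypothetical, not taken from any LUNA16 scan:

```python
import numpy as np

def world_to_voxel(world_coord, origin, spacing):
    """Convert world coordinates (mm) to voxel indices using the scan's origin/spacing."""
    return np.rint((np.asarray(world_coord) - origin) / spacing).astype(int)

def clip_patch(slice_2d, center_xy, size=96):
    """Clip a size x size nodule patch centred on (x, y) from a 512 x 512 slice."""
    half = size // 2
    x, y = center_xy
    x = min(max(x, half), slice_2d.shape[1] - half)  # keep the patch inside the slice
    y = min(max(y, half), slice_2d.shape[0] - half)
    return slice_2d[y - half:y + half, x - half:x + half]

origin = np.array([-200.0, -200.0, -300.0])   # illustrative values only
spacing = np.array([0.7, 0.7, 2.5])
voxel = world_to_voxel([-60.0, -10.0, -100.0], origin, spacing)
print(voxel)  # [200 271  80]

slice_2d = np.zeros((512, 512))
patch = clip_patch(slice_2d, (voxel[0], voxel[1]))
print(patch.shape)  # (96, 96)
```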
3.2 Experiment and Results

The data fed into the CNN models was reshaped to fit their input [14], which was 96 × 96 × 3. A block diagram of the experiment conducted is shown in Fig. 2. The data, which included both benign and malignant nodule clipped patches, was fed into multiple CNN models and the results were recorded. The models were trained in Google Colab with a learning rate of 0.0001. All models were given 20 epochs to see which model would reach the greatest accuracy and validation accuracy at the earliest. Various optimizers were experimented with while training the CNN classifier models, and for each model the single best-performing optimizer, i.e., the one giving the best classification results, was chosen. The ROC curves presented in Figs. 4, 5, 6 and 7 are those obtained with the best optimizer for each model; at the top of each ROC curve, the CNN model, the type of data fed into it, and the optimizer used (SGD, Adam, or Adagrad) are mentioned. The authors' initial idea was that model performance during training would improve steadily as the number of parameters increased. Contrary to this assumption, performance improved with parameter count only up to a point and then started decreasing: the best f1-score occurs at a specific point on the plot rather than through steady growth. That point corresponds to the Xception model, which performed best in classifying the patches generated from the Luna16 lung CT scans. The validated results are shown in Figs. 3, 4, 5, 6, 7, 8, 9, 10 and 11, and the result values are tabulated in Tables 1, 2 and 3.
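Since the results are reported as f1-scores, a small self-contained sketch of how the metric is computed from precision and recall (the labels below are hypothetical, not from the experiment):

```python
def f1_score(y_true, y_pred):
    """F1-score for binary labels (1 = malignant, 0 = benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(f1_score(y_true, y_pred))  # 0.75 (precision 0.75, recall 0.75)
```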
Fig. 2 Block diagram of the experiment conducted
Fig. 3 ROC curve plot of train, test, and validation data of Alexnet CNN architecture
Fig. 4 ROC curve plot of train, test, and validation data of Inception CNN architecture
Fig. 5 ROC curve plot of train, test, and validation data of Xception CNN architecture
4 Originality

The current work involves an in-depth analysis of the classification of pulmonary nodules by various CNN architectures. To the best of the authors' knowledge, earlier papers featured only one or two CNN architectures and their ROC curves. But
Fig. 6 ROC curve plot of train, test, and validation data of GoogLeNet CNN architecture
Fig. 7 ROC curve plot of train, test, and validation data of DenseNet CNN architecture
Fig. 8 ROC curve plot of train, test, and validation data of ShuffleNet CNN architecture
in this paper, multiple ROC curves of different prominent CNN architectures are provided. In addition, an in-depth analysis with a graphical plot of f1-score versus total parameters is provided to identify the best CNN architecture for the classification of pulmonary nodules.
Fig. 9 Plot of f1-score versus total parameters for train data of different CNN architectures (the peak, annotated on the plot, corresponds to the Xception model)
Fig. 10 Plot of f1-score versus total parameters for test data of different CNN architectures (the peak, annotated on the plot, corresponds to the Xception model)
5 Conclusion

This paper focuses on researching the performance of different convolutional neural networks in the classification of pulmonary nodules. The analysis covered both Luna16 and DCGAN-generated pulmonary nodules. The models performed well after training on DCGAN-generated pulmonary nodules, learning them with no false positives or false negatives, whereas even the best-performing model showed a few false positives and false negatives with Luna16 data. Taking this as a challenge, we researched further, and our study led us to the RCNN and Faster RCNN architectures. We implemented the RCNN architecture on the entire 512 × 512 lung CT scan, and the RCNN model predicted nodules accurately.
Fig. 11 Plot of f1-score versus total parameters for validation data of different CNN architectures (the peak, annotated on the plot, corresponds to the Xception model)
Table 1 F1-scores of different CNN architectures for train data for Luna16 and DCGAN data

Model         Total parameters in CNN architecture   Luna16 data   DCGAN data
Alexnet       7,00,56,898                            0.81          0.89
InceptionV3   2,18,15,074                            0.91          1
Xception      2,08,61,480                            1             1
GoogLeNet     1,96,26,152                            0.84          1
DenseNet      70,45,442                              0.86          1
ShuffleNet    24,91,504                              0.84          0.97
Table 2 F1-scores of different CNN architectures for test data for Luna16 and DCGAN data

Model         Total parameters in CNN architecture   Luna16 data   DCGAN data
Alexnet       7,00,56,898                            0.76          0.87
InceptionV3   2,18,15,074                            0.86          1
Xception      2,08,61,480                            0.97          1
GoogLeNet     1,96,26,152                            0.82          1
DenseNet      70,45,442                              0.87          1
ShuffleNet    24,91,504                              0.78          1
Table 3 F1-scores of different CNN architectures for validation data for Luna16 and DCGAN data

Model         Total parameters in CNN architecture   Luna16 data   DCGAN data
Alexnet       7,00,56,898                            0.82          0.8
InceptionV3   2,18,15,074                            0.82          1
Xception      2,08,61,480                            0.98          1
GoogLeNet     1,96,26,152                            0.89          1
DenseNet      70,45,442                              0.9           1
ShuffleNet    24,91,504                              0.89          1
References 1. Torre LA, Bray F, Siegel RL, Ferlay J, Lortet-Tieulent J, Jemal A (2015) Global cancer statistics, 2012. CA: Cancer J Clin 65(2):87–108 2. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F (2021) Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: Cancer J Clin 71(3):209–249 3. Zhang G, Jiang S, Yang Z, Gong L, Ma X, Zhou Z et al (2018) Automatic nodule detection for lung cancer in CT images: a review. Comput Biol Med 103:287–300 4. Gu Y, Chi J, Liu J, Yang L, Zhang B, Yu D et al (2021) A survey of computer-aided diagnosis of lung nodules from CT scans using deep learning. Comput Biol Med 137:104806 5. Sakai S, Soeda H, Takahashi N, Okafuji T, Yoshitake T, Yabuuchi H et al (2006) Computer-aided nodule detection on digital chest radiography: validation test on consecutive T1 cases of resectable lung cancer. J Digit Imaging 19(4):376–382 6. Jogin M, Madhulika MS, Divya GD, Meghana RK, Apoorva S (2018) Feature extraction using convolution neural networks (CNN) and deep learning. In: 2018 3rd IEEE international conference on recent trends in electronics, information & communication technology (RTEICT). IEEE, pp 2319–2323 7. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826 8. Humayun M, Sujatha R, Almuayqil SN, Jhanjhi NZ (2022) A transfer learning approach with a convolutional neural network for the classification of Lung Carcinoma. In: Healthcare, vol 10, no 6, p 1058. MDPI 9. Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1251–1258 10. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D et al (2015) Going deeper with convolutions.
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9 11. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708 12. Zhang X, Zhou X, Lin M, Sun J (2018) Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6848–6856 13. Arai K (2021) Proceedings of the future technologies conference (FTC), vol 1. Springer Nature 14. Luo T, Zhao J, Gu Y, Zhang S, Qiao X, Tian W, Han Y (2021) Classification of weed seeds based on visual images and deep learning. Inf Process Agric
15. Sathish A, Adaptive shape based interactive approach to segmentation for nodule in Lung CT scans. J Soft Comput Paradig 2(4):216–225 16. Sungheetha A, Rajesh SR (2020) Comparative study: statistical approach and deep learning method for automatic segmentation methods for Lung CT image segmentation. J Innov Image Process 2:187–193
Brain Tissue Segmentation Using Transfer Learning Farhan Raza Rizvi and Khushboo Agarwal
Abstract A brain tumour (BT) is a life-threatening condition produced by aberrant brain cell proliferation that affects human blood cells and nerves. Detecting BTs in a timely and exact manner is critical to preventing complex and unpleasant treatment methods, as it can aid surgeons in preoperative planning. Manual BT detection takes a long time and is extremely reliant on the presence of local experts. As a result, accurate automated techniques for identifying and categorizing diverse forms of brain tumours are urgently needed. However, due to wide differences in size, position, and structure, precise localization and classification of brain tumours is a difficult task. We have proposed an innovative approach to deal with these problems.

Keywords Brain tumour · Deep learning · Transfer learning · Magnetic resonance imaging · U-net convolutional neural network
1 Introduction

Image processing (IP) is a process that allows us to obtain useful or important data from an image by performing multiple operations on it. Digital IP is a sort of image processing in which the operations are carried out on digital systems. To acquire relevant information from the images, a variety of methods are applied [1]. Medical imaging refers to a variety of methods and approaches used to create images of the inside of the human body that aid in the visualization of bodily parts. Doctors frequently utilize these visualizations to diagnose, monitor, and treat ailments. Several hospitals employ digital technologies to aid their work in this digital world. The diagnosis is based on medical images, which are used by physicians to check the patient's health in a variety of settings.

F. R. Rizvi (B) · K. Agarwal
Computer Science & Engineering, Madhav Institute of Technology and Science, Gwalior, India
e-mail: [email protected]
K. Agarwal e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_34

The physician
then creates treatment strategies depending on the knowledge obtained from these images. If the assessment is incorrect, individuals may die as a result; inaccurate or inappropriate information gained through medical imaging is the primary cause of such incorrect diagnoses [2]. Aside from the current COVID-19 scenario, one of the leading causes of death in today's world is a brain tumour, namely brain cancer. The brain is among the most vital components in our bodies since it controls all of our movements and responses; it is the defining characteristic that sets us apart from other creatures. As a result, brain imaging is critical since it allows healthcare practitioners to inspect and comprehend the active interior of the brain [3]. Any irregular or uncontrollable growth of cells in our brain is referred to as a brain tumour. There are various types of brain tumours (BT), some of which are cancerous and others non-cancerous. MRI is among the most advanced brain imaging modalities available today; it falls under neuroimaging, which encompasses a wide range of methods for creating a visual picture of our nervous system, either directly or indirectly. Brain image segmentation is employed in the study of brain MRI; this technique is used to assess and predict brain structures to detect any abnormalities in the brain. Therefore, for both clinical and experimental purposes, the computerized and precise classification of brain MRI images is critical. Due to the huge variances in and challenges of categorizing tumour images, timely and correct diagnosis is a difficult endeavour: tumours frequently vary in structure, size, location, and intensity. This evaluation aids clinicians in devising successful and efficient treatment plans. MRI [4] and CT (Computed Tomography) [5] scans are commonly used to examine brain anatomy.
The brain is a critical organ in a person's body that helps with decision-making and supervises the activity of all other organs. It is the primary command centre of the nervous system, and it is in charge of conducting the human body's daily intentional and reflexive functions. Uncontrolled growth of abnormal tissue inside the brain is what we call a tumour. Around 3,540 children under the age of 15 will be diagnosed with a brain tumour each year, according to the latest figures. It is vital to have a thorough grasp of BT and its stages to prevent and cure the disease, and radiologists routinely use MRI to study brain tumours to do so. Using deep learning techniques, the results of the investigation presented in this paper demonstrate whether the brain is healthy or unhealthy [6].
2 Related Work Due to the immense significance of ML and DL, a great variety of approaches have been produced, the majority of which employ a particular method. Previously applied DL algorithms have limitations despite making a breakthrough in handling tumour detection tasks. DL approaches based on CNN necessitate large volumes of data, which makes the work hard and costly.
In biomedical image processing, algorithms based on machine learning are increasingly important. However, by introducing optimization at the feature selection stage itself, the achievement of machine learning could be further enhanced. The investigation in [7] provided an optimization method for the selection of features from the MRI images of patients with brain tumours. Brain tumours can be detected at an early stage using a convolutional neural network. The experimental model is compared to existing artificial neural network and particle swarm optimization algorithms, and it has better detection accuracy than the other algorithms. It is difficult to develop a universally applicable standard procedure for segmenting esophageal tumours because of the constant variation of tumour placement, coloration, and intensity, as well as the tumour's shape. The authors of [8] used a semi-automated method for segmenting esophageal tumours from computed axial tomography X-ray images. The paper uses an active-contour-based quasi segmentation along with level set procedures to identify the affected areas in the computerized axial tomography images of the oesophagus. Work is broken down into four distinct phases in the proposed strategy: the seed points are used to extract the images in the first step; in the second stage, extraneous elements are removed from the images; the third stage involves setting the threshold values; and the final stage is post-processing. To test this idea, tumours from nearby cancer treatment centres were used. Dice similarity, surface-distance measures, and Jaccard similarity were used to evaluate the proposed tactic in comparison to existing methods. When it comes to esophageal cancer detection, this concept is both time-saving and more effective. To successfully train tumour detection and segmentation, a large number of features is required.
A simple and effective Cascade CNN (C-ConvNet/C-CNN) is suggested by Ramin Ranjbarzadeh. The C-CNN technique extracts both local and global characteristics in two distinct ways. In addition, a novel DWA strategy is suggested to improve the effectiveness of brain tumour segmentation in comparison to previous models. The influence of the tumour's central location and the brain's location inside the model are taken into consideration by the DWA procedure. Extensive experiments on the BRATS 2018 dataset demonstrate that the proposed model performs well, with dice scores of 0.9203, 0.9113, and 0.8726, respectively [9]. Autonomously segmenting and analysing brain MRI images assists health professionals in analyzing and diagnosing the existence of brain tumours in a time-efficient and cost-effective manner. The research in [10] shows how fuzzy logic can be used in diagnostic imaging, namely for image segmentation. For successful segmentation of hazy border areas of the brain, the FCM clustering technique is used; as a result, any irregularities in the MRI pictures are segmented. It can be inferred that the suggested system's algorithms and settings are all designed to improve the process performance by obtaining better segmentation results. Hua et al. [11] proposed IMV-FCM (an improved multi-view FCM clustering method) to improve the algorithm's accuracy. In the IMV-FCM system, each view's weight is determined by its contribution to the cluster. In the end, the view ensemble method is used to determine the division result. It appears that
IMV-FCM is better at segmenting brain tissue, based on the results of numerous brain MRI scans; in terms of both flexibility and performance, IMV-FCM is a superior clustering technique to other techniques. Brindha et al. [12] used a self-defined ANN and CNN in the suggested study to indicate the existence of a BT, and their results are evaluated. CNN is one of the most effective methods for evaluating image data: in order to produce the forecast, CNN reduces the image in size without losing any of the information needed to make the prediction. More visual data can improve the ANN model's accuracy, which is currently 65.21%; using image augmentation techniques and evaluating the ANN and CNN's performance can yield similar results. When creating this model, the author used a trial-and-error approach; the number of layers and filters to be employed in a system will be determined in future work using optimization approaches. For the present, it looks like CNN is the most effective way of detecting the presence of a brain tumour in the dataset provided. Another framework, the DenseNet-41-based CornerNet framework, is a revolutionary technique suggested by [13]. There are three steps to the suggested solution. To begin, they create annotations to pinpoint the particular regions. In the second stage, the deep properties of the suspicious samples are extracted using a modified CornerNet with DenseNet-41 as the backbone network. The one-stage detector CornerNet is used to discover and categorize numerous brain cancers in the final step. They used two databases to test the suggested technique, namely the Figshare and Brain MRI datasets, and achieved average Jaccard indices of 98.8% and 98.5%, respectively. According to both descriptive and analytical analyses, the method is more competent and accurate than other current innovations in identifying and quantifying distinct forms of brain tumours.
3 Proposed Methodology

3.1 Proposed Algorithm

• Step 1-Collect the freely available MRI dataset from BrainWeb: Simulated Brain Database.
• Step 2-Pre-process the data by performing data augmentation tasks, flipping, and normalizing. Rescale and resize the data to (128 × 128) using the inbuilt image-resizing function.
• Step 3-Split the data into training (80%) and testing (20%) sets.
• Step 4-Pass the data into the training model.
• Step 5-Train the U-Net model with EfficientNet and fit the data.
• Step 6-Evaluate the model on the test set and measure the performance in terms of precision, recall, and F1-Score.
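Step 3 above can be sketched with a simple deterministic split helper; the function and seed are illustrative, not the authors' implementation:

```python
import random

def train_test_split(samples, train_frac=0.8, seed=42):
    """Shuffle a list of samples and split it into train/test partitions."""
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

samples = list(range(100))         # stand-ins for 128 x 128 MRI slices
train, test = train_test_split(samples)
print(len(train), len(test))       # 80 20
```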
Brain Tissue Segmentation Using Transfer Learning
3.2 U-Net The U-Net architecture is a neural network (NN) [14] architecture used mostly for image segmentation. A U-Net design is composed of two pathways. The encoder, also known as the analysis or contracting path, is a convolutional network that provides classification information. The secondary channel, known as the decoder or synthesis path, consists of up-convolutions and concatenations with features from the contracting path. This augmentation lets the network learn localized classification information [15]. As previously stated, the U-Net is divided into two sections. The first is the contracting path, which employs a standard CNN architecture: each block consists of two successive 3 × 3 convolutions, each followed by a ReLU activation unit, and a max-pooling layer. This pattern is repeated several times along the path. In the expansive path of the U-Net, each stage upsamples the feature map using a 2 × 2 up-convolution. The feature map from the upsampled layer is then combined with the feature map from the corresponding layer in the contracting path [16]. This is followed by two further 3 × 3 convolutions, each followed by a ReLU activation. ReLU takes a convolution matrix as input and computes max(x, 0) for each entry x: it resets all of the matrix's negative values to zero while leaving all other values unchanged. ReLU is a nonlinear activation function, like tanh or sigmoid, computed just after the convolution. A final 1 × 1 convolution reduces the feature map to the required number of channels and generates the segmented image. Cropping is necessary because border pixels lose contextual information in every convolution.
Contextual information is propagated through the resulting U-shaped network, making it possible to distinguish objects within a specific area from those in a larger overlapping area. Figure 1 depicts the entire U-Net architecture [17].
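The ReLU behaviour described above, resetting negative entries of a feature map to zero while passing everything else through, is a one-liner in NumPy:

```python
import numpy as np

def relu(x):
    """ReLU: element-wise max(x, 0) -- negatives become zero, the rest pass through."""
    return np.maximum(x, 0)

feature_map = np.array([[-1.5, 0.0, 2.0],
                        [ 3.0, -0.2, 1.0]])
activated = relu(feature_map)
```

After activation, every negative value in `feature_map` is zero and all non-negative values are unchanged.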
3.3 Transfer Learning In DL, a model learns its weights and biases while training on a large amount of data, and those values can then be reused in other network architectures. Weights that were trained earlier can thus be used in a new architecture [18]. A pre-trained model has already been trained, typically on a similar task. Several pre-trained structures are accessible; the advantages of utilizing pre-trained models are as follows. First, complicated models must be trained on large datasets, which necessitates substantial computing capacity. Second, training such a system requires an inordinate amount of time, often several weeks. The learning process can be sped up by initializing the new system with pre-trained weights [19].
F. R. Rizvi and K. Agarwal
Fig. 1 U-net architecture
3.4 EfficientNet The EfficientNet family of CNNs not only improves the Jaccard index somewhat but also increases model efficiency, reducing the number of parameters compared with other existing models. The EfficientNet-B0 framework is a simple mobile-size baseline structure trained on the ImageNet dataset. The most typical technique to enhance model performance when constructing a NN is to expand the number of units or layers. EfficientNet is founded on the idea that a compound scaling strategy, enlarging the model in all directions of depth, width, and resolution together, can help the network reach optimal Jaccard index gains [20].
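Compound scaling can be sketched as follows. The base coefficients are the ones reported in the EfficientNet paper (α = 1.2, β = 1.1, γ = 1.15); they are used here purely as illustrative constants:

```python
# Compound scaling: depth, width, and input resolution grow together,
# governed by a single compound coefficient phi.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # base coefficients from the EfficientNet paper

def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

d, w, r = compound_scale(1)  # one scaling step up from the B0 baseline
```

With phi = 0 the multipliers are all 1 (the B0 baseline); larger phi scales all three dimensions jointly rather than growing only depth or only width.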
3.5 Proposed Flowchart The proposed flowchart is shown in Fig. 2.
4 Experimental Analysis 4.1 Input Data The dataset used for this experiment is the Transfer Learning in Magnetic Resonance Brain Imaging (MRI) dataset, available online from BrainWeb:
Fig. 2 Proposed flowchart (input data → pre-processing with data augmentation, resizing, rescaling, and flipping → hybrid learning model → training (80%) / validation (20%) → performance evaluation)
Simulated Brain Database. The Simulated Brain Database1 (SBD) contains realistic MRI data volumes generated by an MRI simulator. In the context of neuroimaging, these data are used to evaluate the efficacy of different image-analysis methods. The SBD currently contains two anatomical models of the brain: normal and multiple sclerosis (MS). For both of these, simulated full 3-D volumes of data have been created using three sequences (T1- and T2-weighted, and proton density (PD) based). Three orthogonal views of the data are available (transversal, sagittal, and coronal). The repository holds data from 125 people aged 21–45 years with a range of clinical and sub-clinical psychiatric symptoms. For each participant, the repository contains:
• A structural T1-weighted anonymized (de-faced) image: the raw single-channel T1-weighted MRI image.
• The ground truth, which can also be referred to as the brain's image mask. Non-brain tissue is removed using the BEaST (brain extraction based on non-local data processing) method together with manual corrections by domain experts.
• A "skull-stripped" image, derived from the T1-weighted image; this is comparable to overlaying the mask on the real image.
Brain tissue includes the cerebrum and the brain stem, all of which have distinct functions in the human body as seen through the lens of medical science. White matter (WM), grey matter (GM), and cerebrospinal fluid (CSF) can all be classified
https://brainweb.bic.mni.mcgill.ca/.
Table 1 MRI images reveal the properties of the brain's major tissues that are linked

Tissue     T1 (s)      T2 (ms)     ρ (1.5)
CSF        0.8–20      110–2,000   70–230
WM         0.76–1.08   61–100      70–90
GM         1.09–2.15   61–109      85–125
Meninges   0.5–2.2     50–165      5–44
Muscle     0.95–1.82   20–67       45–90
Adipose    0.2–0.75    53–94       50–100
as the main brain tissue classes [21]. CSF fills the grooves created by the GM folds, which enclose the WM. The grey levels of the various brain tissues change over time and are not constant. Furthermore, the differing densities of the tissues provide the biological basis for the solvability of normal brain segmentation. The MRI properties of the brain's major tissues are shown in Table 1.
4.2 Data Pre-processing Pre-processing techniques can efficiently reduce undesired noise, highlight parts of the image that aid recognition, and even assist the DL training stage. The height and width of an image with a preset aspect ratio are scaled, and the size of all input images is filtered using the image-filtering pre-processing approach. The following steps were performed to pre-process the data.
• Data augmentation: a method for getting more information out of an existing dataset. Here, it creates perturbed copies of the original images; the primary goal is to build a neural network that can differentiate between important and insignificant dataset properties by exposing it to a wide range of variation.
• Data resizing: regardless of the method used, resizing means altering the image's dimensions. This work employs a scaling method that resamples the entire image (skipping or duplicating pixels) to increase or decrease its overall size; although the new dimensions differ, the content remains largely unchanged. Using the built-in resizing function, the data are downscaled to 128 × 128.
• Normalization: rescaling the intensity values of the pixels. Its primary goal is to turn an input image into a range of values that corresponds to the sensory experience of the object being scanned. The image data are normalized by dividing by 255, so that all colour values fall from whole numbers into float values in [0, 1].
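The augmentation-by-flipping and divide-by-255 normalization steps above can be sketched as follows (the helper names are illustrative):

```python
import numpy as np

def augment_flip(img):
    """Data augmentation: the original image plus horizontal and vertical flips."""
    return [img, np.fliplr(img), np.flipud(img)]

def normalize(img):
    """Scale 8-bit intensities [0, 255] to floats in [0, 1]."""
    return img.astype(np.float32) / 255.0

img = np.arange(0, 256, dtype=np.uint8).reshape(16, 16)  # dummy 8-bit image
batch = [normalize(x) for x in augment_flip(img)]
```

Each flipped copy keeps the original content but presents it in a different orientation, which is exactly the perturbation the augmentation step relies on.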
4.3 Feature Extraction Recognized shapes and edges are initially saved in an unwieldy format, and the sheer volume of data makes object analysis extremely difficult. Feature extraction yields a more condensed description of images and database objects [22]. The features related to brain tissue segmentation are CSF, WM, GM, meninges, muscle, and adipose.
4.4 Model Summary of Hybrid Learning Model This work implements a hybrid learning model combining two deep transfer-learning models, U-Net and EfficientNet, with some hyperparameter settings. The learning rate of the proposed hybrid model is 1e-05 for training and fitting the data. The hybrid learning model contains convolution, max-pooling, and softmax layers, and the U-Net-based EfficientNet hybrid model is built with one output layer. To pass the data into the training model, the first step is to define a dataset; the training examples are fed into the learning algorithm, which figures out the best way to map the inputs to the output class labels. Calling fit() and passing the training dataset into the function then fits the model to the data. The parameters of the proposed model are shown in Fig. 3, and the corresponding results are shown in Table 2. Table 2 shows the performance of the evaluated results after training. In comparison to the base model, the Jaccard value reaches 0.96 in the
Fig. 3 Parameters of the proposed model
Table 2 Evaluated results

Performance         Base   Proposed
F1-score            –      0.96
Jaccard index/IoU   0.92   0.96
Dice loss           –      0.094
Fig. 4 F1-score of the proposed model
proposed model and 0.92 in the base model. The F1-score, Jaccard index, and Dice loss of the proposed model are 0.96, 0.96, and 0.094, respectively. Figures 4 and 5 show the F1-score and loss graphs for the proposed model, and Fig. 6 shows the resultant image after brain tumour segmentation.
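The Jaccard index (IoU) and Dice loss reported in Table 2 can be computed from binary segmentation masks as follows; this is a minimal NumPy sketch of the metrics, not the differentiable training-time versions:

```python
import numpy as np

def jaccard_index(pred, target):
    """IoU = |A ∩ B| / |A ∪ B| for binary segmentation masks."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union

def dice_loss(pred, target):
    """Dice loss = 1 - 2|A ∩ B| / (|A| + |B|)."""
    inter = np.logical_and(pred, target).sum()
    return 1.0 - 2.0 * inter / (pred.sum() + target.sum())

pred   = np.array([1, 1, 1, 0, 0, 0, 1, 1], dtype=bool)  # toy predicted mask
target = np.array([1, 1, 0, 0, 0, 1, 1, 1], dtype=bool)  # toy ground truth
iou  = jaccard_index(pred, target)   # 4 intersecting / 6 united pixels
loss = dice_loss(pred, target)
```

A perfect segmentation gives IoU = 1 and Dice loss = 0, which is why the proposed model's 0.96 / 0.094 pair indicates close agreement with the ground-truth masks.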
5 Conclusion In this study, brain MR images were segmented into healthy (unaffected) and malignant (infected) tissue. Pre-processing reduces noise and improves the signal-to-noise ratio by smoothing the image. Textural features were decomposed and recovered, followed by mathematical morphology applied to the decomposed images. A stochastic NN classifier is used to classify brain tumours from MRI images. The observational results clearly demonstrate that the detection of brain tumour cells is both rapid and accurate when compared with the traditional detection performed by healthcare workers, and studies of the performance variables show that it yields superior outcomes in terms of the measured parameters. Even so, the model's ability to identify brain tumours was limited, and the suggested
Fig. 5 Loss score of the proposed model
Fig. 6 The sample result images of segmentation with the proposed model
technique does not provide additional insight into the diagnosis of brain tumours that has received focus in recent studies. Future studies could extend this technique in many ways, such as enhancing the proposed model's ability to recognize the boundary of a brain tumour and incorporating temporal image data.
References
1. Tutorialspoint (2020) Digital image processing. https://www.tutorialspoint.com/dip/index.htm
2. Aswathy SU, Deva Dhas GG, Kumar SS (2014) A survey on detection of brain tumor from MRI brain images. In: 2014 international conference on control, instrumentation, communication and computational technologies (ICCICCT 2014), pp 871–877. https://doi.org/10.1109/ICCICCT.2014.6993081
3. Biratu ES, Schwenker F, Ayano YM, Debelee TG (2021) A survey of brain tumor segmentation and classification algorithms. J Imaging 7(9). https://doi.org/10.3390/jimaging7090179
4. Novitchi D (2012) Brain tumor detection and segmentation in multisequence MRI. Dr. Thesis, pp 1–30
5. Shah FM et al (2021) A comprehensive survey of COVID-19 detection using medical images. SN Comput Sci 2(6). https://doi.org/10.1007/s42979-021-00823-1
6. Shantta K, Basir O (2020) Brain tumor detection and segmentation: a survey. IRA-Int J Technol Eng 10(4):55. https://doi.org/10.21013/jte.v10.n4.p1. ISSN 2455-4480
7. Karrupusamy D (2020) Hybrid manta ray foraging optimization for novel brain tumor detection. J Trends Comput Sci Smart Technol 2:175–185. https://doi.org/10.36548/jscp.2020.3.005
8. Bindhu V, Saravanampatti PV (2020) Semi-automated segmentation scheme for computerized axial tomography images of esophageal tumors. J Innov Image Process 2(2):110–120. https://doi.org/10.36548/jiip.2020.2.006
9. Ranjbarzadeh R, Bagherian Kasgari A, Jafarzadeh Ghoushchi S, Anari S, Naseri M, Bendechache M (2021) Brain tumor segmentation based on deep learning and an attention mechanism using MRI multi-modalities brain images. Sci Rep 11(1):1–17. https://doi.org/10.1038/s41598-021-90428-8
10. MB L, Suresh S, Joseph RB, Joy E (2021) Segmentation and analysis of brain MRI images. SSRN Electron J. https://doi.org/10.2139/ssrn.3852502
11. Hua L, Gu Y, Gu X, Xue J, Ni T (2021) A novel brain MRI image segmentation method using an improved multi-view fuzzy c-means clustering algorithm. Front Neurosci 15(March):1–12. https://doi.org/10.3389/fnins.2021.662674
12. Brindha PG, Kavinraj M, Manivasakam P, Prasanth P (2021) Brain tumor detection from MRI images using deep learning techniques. IOP Conf Ser Mater Sci Eng 1055(1):012115. https://doi.org/10.1088/1757-899x/1055/1/012115
13. Nawaz M et al (2021) Analysis of brain MRI images using improved CornerNet approach. Diagnostics 11(10):1–18. https://doi.org/10.3390/diagnostics11101856
14. Reva M (2021) Batch normalization in convolutional neural networks. Baeldung
15. Pravitasari AA et al (2020) UNet-VGG16 with transfer learning for MRI-based brain tumor segmentation. Telkomnika (Telecommun Comput Electron Control) 18(3):1310–1318. https://doi.org/10.12928/TELKOMNIKA.v18i3.14753
16. Siddique N, Paheding S, Elkin CP, Devabhaktuni V (2021) U-Net and its variants for medical image segmentation: a review of theory and applications. IEEE Access 1–28. https://doi.org/10.1109/ACCESS.2021.3086020
17. Weng W, Zhu X (2021) INet: convolutional networks for biomedical image segmentation. IEEE Access 9:16591–16603. https://doi.org/10.1109/ACCESS.2021.3053408
18. Shukla U, Tiwari U, Chawla V, Tiwari S (2020) Instrument classification using image based transfer learning. In: 2020 5th international conference on computing, communication and security (ICCCS), pp 1–5. https://doi.org/10.1109/ICCCS49678.2020.9277366
19. Krishna ST, Kalluri HK (2019) Deep learning and transfer learning approaches for image classification. Int J Recent Technol Eng 7(5):427–432
20. Aruleba I, Viriri S (2021) Deep learning for age estimation using EfficientNet. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), vol 12861, LNCS, pp 407–419
21. Xing-Yuan W, Na W, Dou-Dou Z (2015) Fractal image coding algorithm using particle swarm optimisation and hybrid quadtree partition scheme. IET Image Process 9:153–161. https://doi.org/10.1049/iet-ipr.2014.0001
22. Kunaver M, Tasič JF (2005) Image feature extraction: an overview. In: EUROCON 2005, international conference on computer as a tool, vol I, pp 183–186. https://doi.org/10.1109/eurcon.2005.1629889
Aadhaar Block: An Authenticated System for Counterfeit Aadhaar Enrolment in Citizen Services Using Blockchain N. Veena and S. Thejaswini
Abstract In past decades, people had to carry a bundle of documents to acquire citizen beneficiary services such as opening a bank account, the issue of a new SIM card, a new gas connection, Pradhan Mantri Jan Dhan Yojana, etc. In this digital era, Aadhaar is the driving force for gaining all citizen services. Unfortunately, many security breaches have occurred by compromising the Aadhaar details stored in the database. Securing Aadhaar data and mitigating the misuse of others' Aadhaar data by unauthenticated users to gain citizen services is the main concern. Hence, a secure and authenticated system is required to store and use Aadhaar details securely by authenticated users. In this paper, we propose a secure authenticated system that uses IPFS to store the customers' fingerprints and blockchain to store the Aadhaar details of registered citizens. In this system, biometrics are integrated with blockchain to secure Aadhaar details and authenticate customers before providing any citizen service. The system also generates the Aadhaar card, after passing a redundancy check, for customers who do not yet have one. The system is tested and verified for multiple citizen services. Keywords Blockchain · Biometric · Aadhaar card · Ethereum · Smart contract · InterPlanetary File System (IPFS)
N. Veena (B) · S. Thejaswini
Siddaganga Institute of Technology, Tumakuru, Karnataka, India
e-mail: [email protected]
S. Thejaswini e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_35

1 Introduction The Aadhaar card is a twelve-digit unique number issued to citizens of India. It is a centralized and universal identification number, issued by the Unique Identification Authority of India (UIDAI). The Aadhaar card basically stores
an individual's personal details in a government database. The government takes care of citizens' welfare through Aadhaar, so the Aadhaar card is very important for an individual to get various benefits from the government, such as:
• The subsidy amount in the PAHAL scheme is credited to the account linked with Aadhaar.
• The Aadhaar card helps to remove fake voters, as it is linked to the voter's ID card.
• It is a mandatory document to get a new SIM card, a new gas connection, or a passport, and to create an account in Jan Dhan Yojana.
• Even pensioners now have to link their Aadhaar with their offices to receive their pension.
Since every individual may get many benefits using Aadhaar, misusing another person's Aadhaar to obtain those benefits has become an everyday activity among all categories of people.
1.1 Motivation In July 2018, the Telecom Regulatory Authority of India (TRAI) chairman R. S. Sharma posted his Aadhaar number on Twitter and challenged hackers to do him harm if they could. Within 7 h, ethical hackers uploaded a screenshot showing they had deposited $1 to Sharma using an Aadhaar-enabled service, along with 14 other items, including Sharma's mobile number. According to the Times of India, the Unique Identification Authority of India filed a complaint with the Delhi Police Cyber Cell on February 24, 2020, alleging that Axis Bank Limited, Suvidha Infoserve, and eMudhra had improperly stored biometric data and used Aadhaar authentication without authorization. A group in Kanpur was allegedly operating a scheme to create bogus Aadhaar cards, according to a report; UIDAI stated that its systems discovered unusual activity and reported it as such. According to a recent investigation by The Tribune, unidentified people were willing to sell, for Rs 500, the Aadhaar card information of anyone possessing an Aadhaar number, and would print these Aadhaar cards for an extra Rs 300. All of the above incidents motivate us to adopt blockchain for its well-known security features and to propose a system that can prevent such unfortunate incidents.
1.2 Issues Addressed In this paper, we address the following issues. (1) Counterfeit Aadhaar enrolment: this is avoided through a redundancy check that compares the user's biometrics with the existing biometrics already registered and stored in the blockchain. (2) Hacking and misuse of others' Aadhaar card details to gain citizen-card-enabled services: this is avoided by authenticating the user before providing his Aadhaar details and the requested service.
2 Literature Survey This section presents and discusses several works related to the biometric authentication process based on blockchain technology. Delgado-Mohatar et al. [1] discuss the advantages of blockchain for biometrics and how biometrics and blockchain can complement each other: blockchain offers biometrics immutability, accountability, accessibility, and universality, while biometrics gives blockchain more robust frameworks for digital identity. The main idea of [2] is to decentralize the Aadhaar database. Because only a very small amount of data can be kept on-chain, they divide the huge database into numerous tiny pieces, create exact duplicates of the complete database, and store them across decentralized blockchain nodes. The raw data can be stored in any manner, for instance a file system or a relational database, and only its hash is kept in the blockchain. Acquah et al. [3] propose a system in which a symmetric peer-to-peer network and symmetric encryption are used to secure the fingerprint template. They first encrypt the fingerprint using the Advanced Encryption Standard (AES) technique before uploading it to the InterPlanetary File System (IPFS), a symmetrically distributed storage system; a decentralized blockchain is used to store the hash of the template. Priscilla and Devasena [4] propose AIMS (Aadhaar Identity Management System), which allows people to create and control their verifiable credentials without any centralized authority and provides self-sovereign identity to the people; the paper also compares the consensus algorithms that determine the performance and security of the blockchain system. Agrawal et al. [5] examine the Aadhaar project from the perspectives of privacy and security, pointing out several technological flaws and suggesting fixes.
Shortcomings and solutions to some problems, including unauthorized authentication, unauthorized identification using an Aadhaar number, and unauthorized access to CIDR data that results in profiling, monitoring, and surveillance are discussed.
Sankaranarayanan and George [6] create a blockchain for the Aadhaar database and implement a lightweight algorithm for efficiency, scalability, and optimization, along with a blockchain-securing algorithm. Kalyankar and Kumar [7] implement a dependable cloud-computing platform that converts manual signatures into digital signatures and stores all papers, together with their digital signatures, in the cloud; in this system, the user's identity is also verified using the already-existing UIDAI Aadhaar database, allowing users to access Cloud Sign services without having to keep track of ever-more-complex passwords. It gives an organization a safe place to store its data so that neither unauthorized parties nor public cloud service providers may access sensitive data. Bella Gracia et al. [8] note that giving residents a distinct identity through Aadhaar in order to provide national security has several benefits and has made signing up for new services simpler; Aadhaar's security might be strengthened, which could boost public trust in the system. With their approach, the system's existing problems can be resolved, and user data privacy and security can be guaranteed: the Block-Based Aadhaar preserves the simplicity of the services offered by Aadhaar without raising any security issues. Praveen et al. [9] adopt cutting-edge technologies, like IoT and blockchain, to securely store information in an encrypted form that is inaccessible to ordinary people. The Government of India can use this portal to keep all of a specific person's data, as it offers a high level of efficiency and security [10–15]. It can also serve as a centralized storage system, since when a user opens a bank account, his or her information may be extracted and confirmed by cross-checking the fingerprint with the already-recorded block information.
Overall, the portal uses the idea of blockchain and IoT technology to serve as a single point of contact for the efficient and safe storage of all sensitive information.
3 Proposed System 3.1 The Proposed Architecture The proposed system architecture, shown in Fig. 1, gives a detailed overview of the authentication system built to prevent counterfeit enrolment in citizen services. It consists of two phases: 1. a registration phase and 2. an authentication phase. Registration Phase The registration phase collects the details of a user that are required to generate an Aadhaar card; its workflow is shown in Fig. 2. First, the enrolment agencies have to get approval from the government to perform Aadhaar enrolment.
Fig. 1 System architecture
So, a request is sent to the registrar (the government) specifying agency details such as name and email id; the registrar enquires and approves the agency. At the Aadhaar registration counter, the user then provides details such as fingerprint, name, DOB, address, etc. To avoid creating a duplicate Aadhaar for a person, the scanned fingerprint is compared with the existing fingerprints in the IPFS network. If there is a match, the user is already registered and is notified with that message. If there is no match, the fingerprint image is stored safely in the IPFS network, which, being distributed in nature, provides high security. The other Aadhaar details are stored in the blockchain. The Aadhaar card is then issued, and the transaction is recorded with a unique transaction hash stored in the Ethereum blockchain. Authentication Phase When a user requests a service, for example a new SIM card, the seller performs the authentication process to authenticate the user/customer before providing the SIM card; the workflow of the authentication phase is shown in Fig. 3. In the authentication phase, a user who wants any citizen-card-enabled service has to undergo authentication and therefore inputs his fingerprint through a scanner. The scanned fingerprint is compared with the existing fingerprints in IPFS. If there is a match, the user is authenticated successfully, and the requested service
Fig. 2 Work flow of registration phase
Fig. 3 Work flow of authentication phase
will be provided. If no match is found, the user gets an unsuccessful-authentication message.
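The registration-time redundancy check and the authentication lookup described above can be sketched as follows. This is a toy model under loud assumptions: the real system matches fingerprint images via minutiae rather than exact hashes, and the IPFS store and Ethereum blockchain are replaced here by a dict keyed on a SHA-256 content hash (a stand-in for an IPFS CID) and a plain list of records:

```python
import hashlib

class AadhaarRegistry:
    """Toy sketch of the two phases; IPFS and the blockchain are stand-ins."""

    def __init__(self):
        self.ipfs = {}      # content-hash -> fingerprint template (off-chain store)
        self.chain = []     # appended "transactions" (on-chain records)

    def enrol(self, fingerprint: bytes, details: dict):
        cid = hashlib.sha256(fingerprint).hexdigest()   # stand-in for an IPFS CID
        if cid in self.ipfs:                            # redundancy check
            return "user is already registered"
        self.ipfs[cid] = fingerprint                    # store template off-chain
        tx = hashlib.sha256((cid + repr(sorted(details.items()))).encode()).hexdigest()
        self.chain.append({"tx": tx, "cid": cid, "details": details})
        return tx                                       # unique transaction hash

    def authenticate(self, fingerprint: bytes):
        cid = hashlib.sha256(fingerprint).hexdigest()
        return "authentication successful" if cid in self.ipfs else "unsuccessful authentication"

reg = AadhaarRegistry()
tx  = reg.enrol(b"template-1", {"name": "A", "dob": "2000-01-01"})
dup = reg.enrol(b"template-1", {"name": "B", "dob": "1999-01-01"})  # duplicate enrolment
ok  = reg.authenticate(b"template-1")   # registered fingerprint
no  = reg.authenticate(b"template-2")   # unknown fingerprint
```

The second enrolment attempt with the same fingerprint is rejected before anything is written, which is the behaviour that blocks counterfeit enrolment.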
3.2 Sequence Diagram of Proposed System First, the enrolment agencies have to get approval from the government to perform Aadhaar enrolment, so a request is sent to the registrar (the government) specifying details such as name and email id; the registrar enquires and approves the agency. At the Aadhaar registration counter, the user provides details such as fingerprint, name, DOB, address, etc. To avoid duplicate Aadhaar creation, the scanned fingerprint is compared with the existing fingerprints in the IPFS network: if there is a match, the user is already registered and receives that message; if there is no match, the details are stored safely in the IPFS network, which, being distributed, provides high security. The Aadhaar card is then issued, and the transaction is recorded with a unique transaction hash stored in the Ethereum blockchain. For authentication, the service provider requests the user's fingerprint through a scanner. The scanned fingerprint is compared with the existing fingerprints: if there is a match, the user is authenticated successfully and the requested service is provided; if no match is found, the user gets an unsuccessful-authentication message. The sequence diagram of the system is shown in Fig. 4.
4 Results and Experimental Setup HTML, CSS (Cascading Style Sheets), and Express JS were used to construct the front end. Smart contracts are written in the Solidity programming language, and Node.js handles the server. Remix IDE is used to deploy the smart contracts, and Ganache is used to generate local Ethereum blockchains. The blockchain is established, and the system is accessed through the local Web3 web interface; MetaMask is used to create a wallet, and account creation is done using Ganache and the Ethereum virtual interface.
4.1 Home Page The home page of the system is shown in Fig. 5. There are two menus: a user menu and an admin menu. The user menu has Request access for Aadhaar enrolment, add_Aadhaar, and view_Aadhaar actions; the admin menu contains the approve agency access action.
Fig. 4 Sequence diagram of the proposed model
4.2 Aadhaar Enrolment Enrolment agencies request Aadhaar enrolment access by providing an email id and an Ethereum blockchain account address. After getting approval, they perform registration by collecting basic details, store them in the IPFS network, and issue the Aadhaar card; the transaction hash is stored in the Ethereum blockchain. To avoid duplication, they check whether the user's biometric is already stored in the IPFS network; if there is a match, registration fails and a "user is already registered" message is displayed. The process is shown in Figs. 6 and 7.
Fig. 5 Home page
Fig. 6 a Request access for enrolment. b Approving request
4.3 Authentication First, service providers create a wallet by giving their details; wallet creation stores the Ethereum account address in the blockchain, which helps to track them if needed. A user who wants to get authenticated inputs his fingerprint through a scanner. The scanned fingerprint is compared with the existing fingerprints: if there is a match, the user is authenticated successfully and the requested service is provided; if there is no match, the user gets an unsuccessful-authentication message. The authentication process is shown in Fig. 8.
Fig. 7 Aadhaar registration with gas amount confirmation
4.4 Minutiae-Based Fingerprint Matching In our proposed system, minutiae-based matching is employed for fingerprint matching and verification. The first step is to enhance the fingerprint image using the fast Fourier transform. A binary image is then created using the Python library OpenCV. After that, the image is thinned by importing the skeletonize and thin functions, and the minutiae points are extracted. To acquire a matching score, the minutiae sets of the two fingerprints are finally matched. Minutiae points are the main components of a fingerprint image; they are used to match fingerprints and to determine the distinctiveness of a fingerprint image. Minutiae-point extraction and matching are shown in Fig. 9. In this method, the number of matching minutiae points is counted, and if the count is above 12, the prints are considered to be from the same finger.
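The count-and-threshold decision rule above can be sketched as follows. Positions only are compared here; a real minutiae matcher first aligns the two prints and also compares ridge angle and minutia type, and the 4-pixel tolerance is an illustrative assumption:

```python
def matching_score(minutiae_a, minutiae_b, tol=4.0):
    """Count minutiae in A that have a counterpart in B within `tol` pixels.
    Minutiae are (x, y) points; each point in B is matched at most once."""
    matched, used = 0, set()
    for (xa, ya) in minutiae_a:
        for j, (xb, yb) in enumerate(minutiae_b):
            if j not in used and (xa - xb) ** 2 + (ya - yb) ** 2 <= tol ** 2:
                used.add(j)
                matched += 1
                break
    return matched

def same_finger(minutiae_a, minutiae_b, threshold=12):
    """Decision rule from the text: more than 12 matching points => same finger."""
    return matching_score(minutiae_a, minutiae_b) > threshold

a = [(i * 3, i * 2) for i in range(15)]       # 15 extracted minutiae
b = [(x + 1, y) for (x, y) in a]              # same finger, slight shift
c = [(x + 100, y + 100) for (x, y) in a]      # a different finger
```

With these toy points, `a` and `b` share all 15 minutiae within tolerance and pass the 12-point threshold, while `a` and `c` share none.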
Fig. 8 a Wallet creation. b Scanning fingerprint for authentication. c Successful authentication. d Aadhaar data displayed. e Unsuccessful authentication
5 Conclusion and Future Enhancement

Securing Aadhaar details is a very challenging issue in the digital world. As per the survey conducted, existing systems store Aadhaar details in centralized databases secured with standard security mechanisms, and these systems suffer various drawbacks. To overcome them, a security policy using biometrics (fingerprint) and blockchain is proposed for the first time, providing high security for the storage and usage of Aadhaar details through the IPFS network. The proposed authentication mechanism is therefore efficient, as it handles counterfeit Aadhaar enrolment and misuse of Aadhaar details. The scope of the proposed system is currently limited in three respects: it works with a small number of users, authentication is based on a single biometric feature (fingerprint), and anyone who knows the content ID and hash of an IPFS file can access that file. The proposed system can therefore be enhanced along these three aspects. First, the number of users can be increased so that the system can be implemented at the district level. Second, additional biometric features such as facial recognition and iris features can be used. Third, IPFS content can be encrypted before storing it in IPFS to enhance security.

Fig. 9 a Minutiae points extraction. b Minutiae points matching
Enhanced Human Action Recognition with Ensembled DTW Loss Function in CNN LSTM Architecture D. Dinesh Ram, U. Muthukumaran, and N. Sabiyath Fatima
Abstract Human action recognition is a concept that involves acquiring information based on the sequence of movements performed by the target. This recognition model is used to recognise elderly people's actions, monitor their anomalies, and provide appropriate guidance as early as possible to avoid further harm. The algorithm refines HAR and provides precise decisions for predicting emergencies while handling elderly people. It uses Dynamic Time Warping ensembled with mean absolute error as the loss function in a CNN LSTM neural network. DTW is used in the loss function to relate two sequences of actions treated as waves; as a result, the model performs well at any speed of action as long as the sequence of movements remains the same. Variations in the sequence of movements for an action are further countered by the mean absolute error. Thus, the approach provides an accuracy of 75% on the chosen dataset, which is considerably higher than the accuracy produced by DTW or MAE as a standalone loss function.

Keywords Human action recognition · Dynamic time warping · Mean absolute error · Ensembled loss function · Three-dimensional convolutional neural network · Convolutional neural network long short-term memory
1 Introduction

Human action recognition was, and continues to be, one of the most intriguing topics in the computer vision field. Its numerous applications, including video surveillance and content-based video retrieval, have solidified its position as one of the most sought-after topics. However, human action recognition has proven to be a very resistant problem, and there is ample evidence of why it is so difficult.

D. D. Ram · U. Muthukumaran · N. S. Fatima (B) Department of Computer Science and Engineering, B.S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_36
D. D. Ram et al.
To begin with, humans are flexible beings with a high degree of elasticity, and human bodies develop an endless number of variants for each fundamental action. Second, no two people are the same in body form, volume, and gesture style. Variations in perspective, lighting, shadow, self-occlusion, deformation, noise, clothing, and other uncertainties add to the complexity of these challenges. Human motion is typically seen as a continual evolution of the spatial configuration of body posture. If body joints can be reliably extracted and tracked from sequences of depth maps, action recognition can be achieved using the tracked joint positions. One of the main concerns of action recognition is therefore the need to reliably extract and keep track of body joints. Due to the enormous scale of the problem, researchers end up resorting to assumptions and approximations, which ultimately compromise accuracy. The proposed ensembled loss approach tackles the main problems of this subject with robustness. Ensemble approaches in statistics and machine learning combine numerous learning algorithms to improve predictive accuracy; they have been shown to transform weak learners, with accuracies only slightly better than random guessing, into arbitrarily accurate strong learners. The loss function proposed in this paper makes it possible to compare similar human actions successfully without compromising accuracy. The framework is meant as an extension to simple neural networks, with the loss function replaced by an ensemble of loss functions.
2 Literature Survey

Trelinski et al. [1] presented an algorithm that extracts features from depth maps using dynamic time warping, classified further using ensemble classifiers. However, DTW can be better used as a loss function, as it finds the distance difference for synchronisation, and using a CNN LSTM instead of a 1D CNN with DTW further enhances the responsiveness of the model.

Seto et al. [2] proposed a template-based feature extraction process using dynamic time warping so that domain knowledge can be avoided. Although the proposed method can be used efficiently in sensor-based HAR, it cannot be used in video-based HAR, as video is multimodal data; DTW needs to be modified to learn more complex activities, and the computational cost of multivariate DTW is high.

Pham et al. [3] used 3D skeletal models for HAR, in which the features were based on the relative angles of the body parts. The model performs best, in both accuracy and computational complexity, on Single Motion Actions (SMA) and Multiple Motion Actions (MMA).

Celebi et al. [4] worked on a weighted DTW method where the weights are calculated from the selection of joints most involved during an action, and the joint weights are optimised by a discriminant ratio. Although this method works efficiently on simple gestures, if the length of the feature vectors increases, the computation increases drastically.

Sempena et al. [5] used sequential single-layered dynamic time warping for human activity recognition. They extracted the feature vector using the orientation of the human joints in each frame, as joint orientation is invariant to human body size. The model can be efficient for a single, simple human action, but deep neural networks will work better.

Cuturi et al. [6] presented a method that overcomes the discrepancies of basic dynamic time warping, achieved by smoothing Bellman's recursion. The resulting Soft-DTW works well as a loss function; however, the results can be further enhanced by ensembling the function.

Mohanavel et al. [7] used dynamic time warping as a pre-processing algorithm, whose results are fed into the classification algorithm to yield a better result, along with other pre-processing tools such as PCA and ANOVA. However, this may give better results when dealing with numerical datasets rather than pictorially represented sequential datasets, which require deep learning and classification algorithms to identify the representations; with DTW as a loss function, the accuracy can then be enhanced.

Hajiabadi et al. [8] presented a novel loss function based on ensemble techniques for text classification tasks, which may be viewed as an early attempt to investigate the application of ensemble loss functions. The suggested loss function exhibits an improvement over well-known individual loss functions, but it does not deal well with large outliers.

Chaaraoui et al. [9] presented an algorithm in which activity recognition is done with the help of human silhouettes and sequences of poses. They used DTW to increase the accuracy on non-temporal data.
Although reaching a higher accuracy overall, this algorithm would struggle to distinguish actions that share similar fragments of the same movements, such as running forward and backward. Also, contour classification using silhouettes would fail in dark scenarios even when the background of the image is removed.

Manzi et al. [10] proposed a dynamic method using clustering for human activity recognition. The model first recognises the features of small postures from different activities, and the activity is then classified by finding the minimum number of frames required. This method provided 99.8% accuracy; however, its implementation cost is high compared to the proposed method.

Bhambri et al. [11] proposed a system for human anomaly detection that uses an R-CNN to detect anomalies and helps prevent these actions when deployed in a real-life scenario. However, the R-CNN only detects objects; it cannot predict or understand the sequence of actions, and certain predicted anomalies could have a different cause from the intended one, such as taking out a gun to shoot versus merely holding it. Thus, to eliminate such false positives in anomaly detection, it is better to implement an action recognition algorithm instead of an object recognition algorithm.
Wu et al. [12] put forward a method to recognise human activity using depth videos with spatial and temporal characteristics. They first built a hierarchical difference image and provided it to a pretrained CNN, and the network classifies human activities by combining the videos. However, the recognition accuracy falls short of skeleton-based models, owing to the limited available dataset.

Song et al. [13] worked on sequential and hierarchy-based recognition of human activity. For sequential learning, they used conditional random fields (CRF) with latent variables for the hidden spatio-temporal dynamics. Non-linear gate functions were also implemented to learn features from each layer, and this was repeated to produce a summary of the hierarchical sequence. However, little attention is given to modelling collective or group behaviour across several trajectories.

Brown et al. [14] worked on a solution to the time alignment problem that occurs in recognising isolated words. It was demonstrated that search methods over a class of ordered graphs, predominantly utilised in the AI field, may be easily applied to the temporal alignment problem. Although the proposed method reduces the computation, it still exhibits some path cost differences, which matter when DTW is used as a loss function, since the main aim of a loss function is to reduce the cost, not increase it.

Surakhi et al. [15] analyse three strategies for determining the best time-lag value for a time-series forecasting model: the statistical autocorrelation function, a hybrid optimised LSTM and genetic algorithm, and a parallel dynamic selection based on the LSTM approach. The goal is to find an appropriate time lag that may be employed in forecasting to provide more precise output and decrease the training process's non-linear dimensionality. The lag value is highly significant and volatile: smaller values affect accuracy by providing insufficient information, while extremely large values cause performance drops and also reduce accuracy.

Ranganathan [16] proposed a system that identifies real-time human motion by employing spatial-temporal depth details (STDD) along with a random forest used at the last stage for movement classification, which performs better than results obtained using standard video frames to capture human movements. However, there are some difficulties in dealing with the dimensionality, and an efficient realisation process is needed.
3 Proposed Methodology

The proposed method, the ensembled dynamic time warping loss function, uses dynamic time warping and mean absolute error ensembled together as a loss function to yield much better accuracy and significantly reduce the loss produced by the system. The existing approach of using dynamic time warping alone as a loss function performs poorly when multiple outliers exist in the dataset, while MAE can handle such outliers but lacks the accuracy needed to employ the trained algorithm in real-time human action recognition. The proposed loss function can handle such outlier exceptions in the dataset and produce higher accuracy by using the dynamic time warping technique for recognising actions (sequential data).

The ensembled dynamic time warping loss function combines multiple loss functions with multivariate dynamic time warping. It uses the Euclidean difference along with the dynamic time warping method to get the best yield from the neural networks used for motion detection, and it enhances the ability to optimise the predicted results. The major concept is to ensemble a distance-based loss function, which can map the maxima and minima produced by a sequence of movements in the form of a wave and so obtain a better difference, with a Euclidean loss function, which performs well for any divergence in the test data used for prediction — for instance, when the dataset has outliers, as in actions whose sequence of movements varies slightly from the ideal sequence. Further, the distance loss function is integrated with the SoftMax layer for multiple-class prediction instead of the cross-entropy function usually paired with SoftMax layers. The cross-entropy function is a convex loss function used in the prediction layer, i.e., the layer after the fully connected layer; it tunes the parameters, as given in (1) and (2), based on the difference between the prediction made by the SoftMax layer and the actual value. Artificial neural networks have parameters, the weights and biases, which need to be changed to the values most suitable for the model.
In this case, the initial weight and bias are iteratively updated using the derivative of the loss function with respect to each parameter, scaled by the learning rate α, as per (1) and (2):

wt = wt − α (∂Lv/∂wt)  (1)

bi = bi − α (∂Lv/∂bi)  (2)

Here, wt → weight, bi → bias, Lv → loss value. Thus, using a convex function as a loss function in the middle of the neural network results in undefined outputs from the neurons of the neural network layers, depending on the activation function.
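The update rule of (1) and (2) can be sketched for scalar parameters as follows; the function name and the learning rate value are illustrative assumptions.

```python
def sgd_step(wt, bi, grad_wt, grad_bi, alpha=0.01):
    """One plain gradient-descent update of weight and bias, as in
    (1) and (2): each parameter moves against its loss gradient,
    scaled by the learning rate alpha."""
    return wt - alpha * grad_wt, bi - alpha * grad_bi
```

In a full training loop, `grad_wt` and `grad_bi` would come from backpropagating the ensembled loss through the network.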
k = Σ_{i=1}^{n} il_i · wt_i + bi  (3)

where il → input layer, 1 to n is the total number of input parameters, i → hidden-layer index, and k → computation-layer output.

output = av(k)  (4)

From (3) and (4),

output = av( Σ_{i=1}^{n} il_i · wt_i + bi )  (5)

where av() → activation layer. Since a non-linear activation layer is used in the neural network for multi-class classification prediction, as shown in (5), local extrema are introduced into the optimisation. Moreover, this eliminates the convexity problem produced by the cross-entropy loss function of the SoftMax layer: as neural networks are complex non-linear models, the ensembled loss function finds local extrema (minima, maxima) instead of getting stuck searching for the global extremum.
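The single-neuron forward pass of (3)–(5) can be sketched as follows; tanh stands in for the unspecified activation av(), and the names are illustrative.

```python
import numpy as np

def neuron_output(il, wt, bi, av=np.tanh):
    """k = sum_i il_i * wt_i + bi, then a non-linear activation av(k),
    matching (3)-(5). tanh is the default here because the chapter's
    CNN LSTM layers use tanh activation."""
    k = float(np.dot(il, wt) + bi)
    return av(k)
```

Passing the identity for `av` recovers the pre-activation value k of (3).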
3.1 Working

The steps from the data flow diagram (Fig. 1) are as follows:

1. START
2. Collection of data
3. Pre-processing of stream data
4. Feature extraction of the sequence data (selecting frames)
5. One-hot encoding of the features and updating of the dataset
6. Neural network construction
7. SoftMax layer for multiple-class predictions
8. Implementation of the ensembled loss function
9. Accuracy metrics
10. STOP
Fig. 1 Working of ensembled DTW loss function in a neural network
The collected video data for human action recognition is pre-processed by label-encoding each action on its respective frames and selecting an appropriate time interval at which frames are accumulated as training data for that particular action. Frame encoding is achieved by splitting the training video into frames, capturing the first frame for each class (action), and adding the class name and index to the frame, which makes it easier to train various classes without confusion.

Labelling alone, however, is not enough to reduce the complexity of training such sequential image data; another step is required before training, namely feature extraction. As a CNN LSTM is used as the neural network model, it requires frames at equal intervals from each other for a class, i.e., the frames selected to train an action should capture all key motions, spanning one complete cycle of that action. This is done by setting the frequency rate of frame capture and selecting the total number of sequential frames used for training a class. After extraction, this process is repeated for all classes used to train the model. The captured frames are then normalised to improve training, so that all pixel values of each frame in the RGB matrix lie between 0 and 1. Once normalised, the frames are appended back to the frame lists contributing to each action. This completes feature selection for the pre-processed human action data.

The next step is to construct a neural network that trains on the above data, created with the help of various packages. The network should be capable of processing each image as well as establishing connections between the sequence of frames; convolutional layers, along with padding and pooling, are combined with long short-term memory to process the video. A SoftMax layer is added, which uses cross-entropy as its loss function; it maps the values predicted by the neural network, i.e., the resultant fully connected layer's output, to a class or action used for detection.

This entire human action recognition process can be optimised by using an appropriate loss function and optimiser. Adam's optimiser is coupled with the ensembled dynamic time warping loss function in the current model. This optimisation technique is a stepping stone to the ensembled loss function concept: it combines two gradient descent methods, root mean square propagation and momentum. It uses an exponential moving average instead of the cumulative sum of squared gradients, which makes it adaptable, while momentum takes into account the exponentially weighted average of the gradients.
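The frame selection and normalisation described above can be sketched as follows; the sequence length and the use of evenly spaced indices are illustrative assumptions (the chapter instead fixes a frame-capture frequency), and the video is assumed to be a list of uint8 RGB frames.

```python
import numpy as np

def select_frames(video, seq_len=20):
    """Pick `seq_len` frames evenly spaced across the clip, so the
    selected sequence spans one full cycle of the action, then scale
    pixel values from 0-255 into the 0-1 range for training."""
    total = len(video)
    idx = np.linspace(0, total - 1, seq_len).astype(int)
    return np.stack([video[i] for i in idx]).astype(np.float32) / 255.0
```

The resulting array of shape (seq_len, height, width, 3) is what gets appended to the frame list of the corresponding action class.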
This approach speeds up the gradient descent algorithm, as using averages accelerates convergence to the minima:

ak_t = β1 · ak_{t−1} + (1 − β1) · (∂Lv/∂wt_t)  (6)

pg_t = β2 · pg_{t−1} + (1 − β2) · (∂Lv/∂wt_t)²  (7)

where β1, β2 → decay constants, ak_t → aggregate of gradients at time t, and pg_t → sum of squares of past gradients. The bias-corrected parameters ak̂_t and pĝ_t are derived from (6) and (7); using them in the backpropagation equation results in

wt_{t+1} = wt_t − ak̂_t · α / (√pĝ_t + ε)  (8)
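One Adam step following (6)–(8) can be sketched in NumPy as below; the default hyper-parameters (β1 = 0.9, β2 = 0.999, ε = 1e−8) are the usual Adam values, assumed rather than stated in the chapter.

```python
import numpy as np

def adam_step(wt, grad, ak, pg, t, alpha=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update following (6)-(8): exponential moving averages
    of the gradient (ak) and its square (pg), bias-corrected, then a
    scaled step against the gradient."""
    ak = beta1 * ak + (1 - beta1) * grad           # (6)
    pg = beta2 * pg + (1 - beta2) * grad ** 2      # (7)
    ak_hat = ak / (1 - beta1 ** t)                 # bias correction
    pg_hat = pg / (1 - beta2 ** t)
    wt = wt - alpha * ak_hat / (np.sqrt(pg_hat) + eps)   # (8)
    return wt, ak, pg
```

The caller keeps `ak` and `pg` across iterations and increments the step counter `t`, which drives the bias correction.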
The loss function is merged with (8), derived from Adam's optimiser. Dynamic time warping is initially used as one of the loss functions to be ensembled, and this loss function is constructed from scratch. The value predicted by the model is taken as one sequence of data points and analysed against the actual sequence produced for an action: the matrix output of the fully connected layer, after processing through the neural network, is compared with the convoluted sequential representation of the actual action. The indices of the two comparable sequences are then aligned. Human actions here are two-dimensional: one dimension is the convolutional value of the processed frame, and the second is the timeline holding the knowledge gained from the sequence of movement (the arrangement order of the selected frames). Thus, multivariate dynamic time warping is used. The loss is calculated by finding the distance between comparable coordinates based on body posture, with (p, q) the coordinates in the image and z the coordinate along the timeline.
DTW(i, j) = dist(p_i, q_i) + min{ DTW(i, j − 1), DTW(i − 1, j), DTW(i − 1, j − 1) }  (9)

dist(p_i, q_i) = (p_m − p_n)² + (q_m − q_n)²  (10)
where min(a, b) → the minimum of a and b, DTW() → dynamic time warping, dist() → distance, (i, j) are the indices into the two sequences, and (p, q) are the coordinates used to find the distance.

As per (9), the min function takes the minimum of the three dynamic-time-warped values of the previously calculated adjacent cells. For multivariate DTW, which in this case involves three vectors/dimensions, the DTW is first calculated between p and q, and then between DTW(p, q) and z, as per (10). This calculation captures the synchronisation of the action while also improving prediction precision based on both body position and the timeframes.

This loss function usually yields a significant decrease in the loss for each training epoch. To improve it further, mini-batches can be incorporated into the loss function to reduce its complexity when calculating for an action that takes much longer to recognise, or when multiple positions on the person's body must be monitored.

Afterwards, this loss function is ensembled with other loss functions to eliminate the flaws of dynamic time warping and produce a more suitable, less complex loss function, one that not only lowers the loss further but also improves accuracy and decreases training time. The ensemble uses one of the traditional ways of finding the loss, a Euclidean distance loss function, in which the loss is simply the difference between two values. This simpler form of loss function helps to handle minute changes and increases prediction accuracy while reducing the computation. The one used here is the mean absolute error: the loss is the absolute difference between the predicted and actual values, as per (11). This is the simplest way of calculating the loss, and as it uses the absolute value |x|, the generated value is always positive, which eliminates any negative value produced by the other loss functions used for ensembling and thus the flaws of its companion loss functions. The ensembled loss function compares the values produced by mean absolute error and dynamic time warping and selects the value most appropriate to the model, as in (12):

MAE = |v̂ − v|  (11)

where v̂ → predicted value, v → actual value, MAE → mean absolute error, and |a| → absolute value of a.

LvE = min( MAE(v̂, v), DTW(v̂, v) )  (12)
where min(a, b) → the minimum of a and b. Thus, the loss is significantly reduced, and the accuracy of the current human action recognition model using the convolutional neural network long short-term memory is also improved.
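The pieces of (9)–(12) can be sketched together in NumPy as below. This is an illustrative, non-differentiable version on 1-D sequences: a trainable Keras implementation would need a soft approximation of the min in the DTW recurrence, and the function names are assumptions, not the authors' code.

```python
import numpy as np

def mae_loss(pred, actual):
    """(11): mean of |predicted - actual| over the sequence."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(actual))))

def dtw_loss(pred, actual):
    """Alignment cost of the two sequences per the recurrence (9):
    squared local distance plus the cheapest neighbouring cell."""
    n, m = len(pred), len(actual)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (pred[i - 1] - actual[j - 1]) ** 2
            D[i, j] = cost + min(D[i, j - 1], D[i - 1, j], D[i - 1, j - 1])
    return float(D[n, m])

def ensembled_loss(pred, actual):
    """(12): take whichever of the two losses is smaller."""
    return min(mae_loss(pred, actual), dtw_loss(pred, actual))
```

Note how DTW tolerates speed changes (a warped copy of a sequence costs 0), while MAE caps the penalty when large outliers inflate the DTW cost.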
4 System Architecture

The human action recognition system feeds the appropriate action data to the model. The data is pre-processed as described in the working of the proposed methodology: the required frames are extracted from the video for training, and label encoding of each frame, based on the actions, is done with the help of one-hot encoding. Once the selected frames are encoded, the pre-processed data is fed into two different neural networks (a three-dimensional convolutional neural network and a convolutional neural network long short-term memory) to make the desired predictions. As shown in Fig. 2, the two systems use their own architectures, loss functions, and optimisers to yield the best possible outcome. After the models are implemented, predictions are made by the SoftMax layer, where the likelihood of each category or class is determined for the given sequential data. The predicted classes are then evaluated by the metrics in place: accuracy and loss value are evaluated for each model to find the best possible network for human action recognition.
4.1 Three-Dimensional Convolutional Neural Network (3D CNN)

Human activity recognition refers to the process of classifying series of accelerometer data, acquired via customised harnesses or smartphones, into identifiable, well-defined motions. Hand-constructing features from time-series data over fixed-size windows and developing machine learning models remain the two classic approaches, but such feature engineering requires a great deal of domain knowledge. There are a number of approaches to the human action recognition problem given a specific hypothesis and a multitude of datasets, but none of them is particularly effective in a real-world setting, and it is tricky to pinpoint the distinctive character of an action when the assumptions made beforehand fail to deliver information at the end of the scenario. Deep learning models that use a multi-layered, space-time technique on the input training data can learn features and compute outcomes from specific datasets without such assumptions, partially addressing these challenges. Drawing on the space-time motion information cuboid, 3D convolutional neural networks can depict human movements more precisely.

The framework of the inflated 3D CNN is as follows. The input is three-dimensional: two-dimensional video frames with time as the third dimension. Convolutional (CNN) layers with stride 2 come first, followed by numerous inception modules (convolutional layers with one max pooling layer, with concatenation as the main task). These modules are called inflated because they are stretched into the core of the model, and they can contain a variety of microarchitectures, including LSTM, two-stream designs, and others. Finally, a 1 × 1 × 1 convolutional layer is adopted for prediction, along with an average pooling layer.

Fig. 2 System architecture of the human action recognition model
4.2 Mean Squared Error (MSE)

MSE = (1/N) Σ_{i=1}^{N} (v_i − v̂_i)²  (13)

where v is the observed value, v̂ is the predicted value, and N is the number of data points.
To avoid overfitting, the model is put through a series of tests to ensure its correctness and generality. The mean squared error (MSE) is the average squared error over a dataset. The MSE has a different meaning from the residual error: the residual error measures how well the regression model predicts each individual data point, whereas the MSE measures how well the regression model predicts the entire dataset. A data set may have many residual errors but only one MSE. The mean squared error is always non-negative because it is calculated from squared values, as shown in (13).
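Equation (13) translates directly into NumPy:

```python
import numpy as np

def mse(v, v_hat):
    """(13): average of squared residuals over the dataset;
    always non-negative."""
    v, v_hat = np.asarray(v, dtype=float), np.asarray(v_hat, dtype=float)
    return float(np.mean((v - v_hat) ** 2))
```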
4.3 Convolutional Neural Network Long Short-Term Memory (CNN LSTM)

Neural networks that contain one or more convolutional layers in their architecture are called convolutional neural networks. These networks are used for image processing, object detection, image classification, and similar tasks, and Convolutional Neural Networks (CNN) are the standard choice for image recognition. CNNs have surpassed state-of-the-art performance in multiple areas and have also been used for HAR and ubiquitous computing; this approach to HAR is quite popular and has fared far better on video and image data. The CNN and LSTM are used in conjunction: the CNN learns spatial characteristics while the LSTM performs sequential modelling. An action is a series of movements; these movements are captured and examined by the CNN, and the LSTM analyses the sequence in which they take place. An LSTM contains a series of gates that control the flow of sequential data through the model, including its entry, storage, and exit. There are mainly three gates: the “forget”
gate, the “input” gate, and the “output” gate. These gates act as filters, and each is itself a small neural network. LSTMs differ from traditional feedforward neural networks in that they have feedback connections. This form allows an LSTM to process an entire sequence without considering each point separately, holding on to important information about previous data points to aid in processing new ones. A combination of CNN and LSTM is used to classify human activity with a higher accuracy rate. It is also preferable to the 3D CNN because of its lower complexity and minimal computational requirements. The structure of the CNN LSTM model is as follows: the input is a resized and augmented video; the CNN learns spatial features while the LSTM performs sequential modelling; the model consists of three pairs of convolutional LSTM layers with tanh activation, each followed by a 3D pooling layer; a flatten layer follows; and a final dense layer with SoftMax activation outputs the HAR classification.
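The described structure can be sketched with Keras. The filter counts, kernel size, and input resolution below are illustrative assumptions, not the authors' exact configuration; only the overall layout (three ConvLSTM/3D-pooling pairs with tanh, flatten, SoftMax dense) follows the text.

```python
import tensorflow as tf

NUM_CLASSES = 101                  # UCF101 action categories
FRAMES, H, W, C = 16, 64, 64, 3    # illustrative input resolution

# Three pairs of ConvLSTM (tanh) + 3D pooling, then flatten + softmax.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(FRAMES, H, W, C)),
    tf.keras.layers.ConvLSTM2D(4, 3, padding="same", activation="tanh",
                               return_sequences=True),
    tf.keras.layers.MaxPooling3D(pool_size=(1, 2, 2)),
    tf.keras.layers.ConvLSTM2D(8, 3, padding="same", activation="tanh",
                               return_sequences=True),
    tf.keras.layers.MaxPooling3D(pool_size=(1, 2, 2)),
    tf.keras.layers.ConvLSTM2D(16, 3, padding="same", activation="tanh",
                               return_sequences=True),
    tf.keras.layers.MaxPooling3D(pool_size=(1, 2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
print(model.output_shape)  # (None, 101)
```

Pooling only over the spatial axes (`pool_size=(1, 2, 2)`) keeps the full frame sequence available to each subsequent ConvLSTM layer.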
5 Results

The tools used in simulation to obtain the methodology's results are Google Colab, TensorFlow, OpenCV, matplotlib, pandas, and NumPy. The model is trained on the UCF101 dataset, an extension of the UCF50 dataset containing 101 categories of actions. Human action recognition is implemented in two different neural networks to obtain the best possible results: the 3D CNN and the CNN LSTM are deployed with the same dataset and the same pre-processing techniques. The comparisons in Table 1 are based on the implementation of the two networks. As depicted in Table 1, the computation power and training time are much lower for the CNN LSTM; this network is therefore preferred for human action recognition and for implementing the ensembled loss function on the chosen dataset. The ensembled loss function is evaluated by implementing it in the CNN LSTM model: three CNN LSTMs are trained with three different loss functions as a parameter, to compare the loss functions' capability based on their outcomes. As the ensembled loss function uses both Mean Absolute Error and Dynamic Time Warping as its component losses, proper validation requires comparing the ensembled loss

Table 1 Comparison of the two neural networks

Feature           | Three-dimensional convolutional neural network (3D CNN) | Convolutional neural network—long short-term memory (CNN LSTM)
Computation power | High                                                    | Low
Time for training | High                                                    | Comparatively low
Accuracy          | Low                                                     | Comparatively high
Table 2 Loss function comparison table

Metrics      | Ensembled DTW loss | Dynamic time warping (DTW) | Mean absolute error (MAE)
Accuracy (%) | 75                 | 62.5                       | 35
Loss value   | 0.044              | 0.54                       | 0.36
function with networks trained using each of these two as an independent loss function. The model is trained with the above-mentioned loss functions, and the observed metrics are recorded in Table 2. Based on these observations, it is evident that the ensembled loss function outperforms the two individual loss functions for the human action recognition model. The higher accuracy and minimal loss value show how ensembling the two loss functions improves on the accuracy and loss value produced by each independent loss function, eliminating their respective flaws: accuracy for MAE and loss value for DTW.
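A minimal sketch of such an ensembled loss, combining a classic DTW distance with MAE. The equal 50/50 weighting `alpha` is an assumption for illustration, as the paper does not state the exact combination rule.

```python
def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def mae(a, b):
    """Mean absolute error between equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def ensembled_dtw_loss(y_true, y_pred, alpha=0.5):
    """Weighted combination of DTW and MAE (illustrative weighting)."""
    return alpha * dtw_distance(y_true, y_pred) + (1 - alpha) * mae(y_true, y_pred)

a = [0.0, 1.0, 2.0, 1.0]
b = [0.0, 1.0, 1.0, 2.0, 1.0]   # same shape, slightly time-shifted
print(dtw_distance(a, a))        # 0.0
print(dtw_distance(a, b))        # 0.0 -- DTW warps away the time shift
```

The time-shifted pair shows why DTW complements a point-wise error: an element-wise metric would penalise the shift heavily, while DTW aligns the sequences first.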
5.1 Evaluation Metrics

These are the features used to evaluate the difference between the models trained with the three different loss functions.

Accuracy percentage
This metric uses the categorical prediction made by the SoftMax layer for the given input. It is calculated as the percentage agreement between the predicted categorical value and the labelled value for the frames constructed during data pre-processing. After training, the final accuracy is considered for evaluation, and these values are further used in data visualisation to give a better understanding of the results, as shown in the accuracy bar graph in Fig. 3. Based on the graph, the ensembled DTW loss function improves on plain Dynamic Time Warping as a loss function by 12.5% in accuracy, a significant boost considering the amount of data used for training. Accuracy is lowest for Mean Absolute Error, at 35% for the same number of epochs as the other two loss functions. This suggests that traditional Euclidean methods are not the right approach for the loss function in this case; ensembling with dynamic time warping more than doubles the accuracy (75%) on the same dataset.

Loss Value
One of the main metrics for evaluating the loss functions is the difference between the losses they produce over the total number of epochs,
Fig. 3 Accuracy graph
Fig. 4 Loss value graph
when trained in the CNN LSTM architecture for human action recognition, with other parameters such as the learning rate, the total number of epochs, and the mini-batch size kept the same. For evaluation, the final loss value (after all epochs) of each loss function is compared using simple data visualisation; based on the extracted values, the loss bar graph is constructed as seen in Fig. 4. The graph shows that the ensembled DTW loss function produces the least loss (deviation between the predicted and actual values) for the given dataset with the same number of epochs and mini-batches.
This makes the ensembled DTW loss function considerably more capable than MAE and DTW as loss functions. The loss for DTW alone is far higher than for Mean Absolute Error; to reduce its value from 0.54, the ensembled loss function is deployed, and the resulting loss of roughly 0.044 is much lower than that of Mean Absolute Error as well, making it the most efficient function for achieving the lowest loss value in the human action recognition model.
6 Conclusion

The ensembled DTW loss function outperforms both dynamic time warping and mean absolute error on the evaluation metrics, increasing the model's accuracy and response time and enabling the system to recognise actions more efficiently when monitoring elderly people in old-age homes. The computation time for the proposed methodology is 92 min. In the future, the proposed loss function can be applied to other tasks or to a much larger dataset to validate its performance on different inputs. With further innovation, this system can serve as a regular monitoring system for all sorts of action-based fields.
References

1. Trelinski J, Kwolek B (2021) CNN-based and DTW features for human activity recognition on depth maps. Neural Comput Appl 33:14551–14563. https://doi.org/10.1007/s00521-021-06097-1
2. Seto S, Zhang W, Zhou Y (2015) Multivariate time series classification using dynamic time warping template selection for human activity recognition, pp 1399–1406. https://doi.org/10.1109/SSCI.2015.199
3. Pham H, Le K, Le H (2014) Human action recognition using dynamic time warping and voting algorithm. VNU J Comput Sci Commun Eng 30:22–30
4. Celebi S, Aydin AS, Temiz TT, Arici T (2013) Gesture recognition using skeleton data with weighted dynamic time warping. In: Proceedings of the international conference on computer vision theory and applications (VISAPP), pp 620–625
5. Sempena S, Maulidevi NU, Aryan PR (2011) Human action recognition using dynamic time warping. In: Proceedings of the 2011 international conference on electrical engineering and informatics, pp 1–5. https://doi.org/10.1109/ICEEI.2011.6021605
6. Cuturi M, Blondel M (2017) Soft-DTW: a differentiable loss function for time-series. In: Working Papers 2017-81, Center for Research in Economics and Statistics. PMLR, vol 70, pp 894–903
7. Mohanavel A, Danaraj DR, Fatima NS (2022) Classification of human emotion using DT-SVM algorithm with enhanced feature selection and extraction. Webology 19(1). ISSN: 1735-188X. https://doi.org/10.14704/WEB/V19I1/WEB19233
8. Hajiabadi H, Molla-Aliod D, Monsefi R (2017) On extending neural networks with loss ensembles for text classification. In: Proceedings of Australasian language technology association workshop, pp 98–102
9. Chaaraoui AA, Climent-Pérez P, Flórez-Revuelta F (2013) Silhouette-based human action recognition using sequences of key poses. Pattern Recogn Lett 34(15):1799–1807. ISSN: 0167-8655. https://doi.org/10.1016/j.patrec.2013.01.021
10. Manzi A, Dario P, Cavallo F (2017) A human activity recognition system based on dynamic clustering of skeleton data. Sensors 17(5):1100. https://doi.org/10.3390/s17051100
11. Bhambri P, Bagga S, Priya D, Singh H, Dhiman HK (2020) Suspicious human activity detection system. J IoT Soc Mob Anal Cloud 2(4):216–221
12. Wu H, Ma X, Li Y (2022) Spatiotemporal multimodal learning with 3D CNNs for video action recognition. IEEE Trans Circuits Syst Video Technol 32(3):1250–1261. https://doi.org/10.1109/TCSVT.2021.3077512
13. Song Y, Morency L, Davis R (2013) Action recognition by hierarchical sequence summarization. In: 2013 IEEE conference on computer vision and pattern recognition, pp 3562–3569. https://doi.org/10.1109/CVPR.2013.457
14. Brown MK, Rabiner LR (1982) An adaptive, ordered, graph search technique for dynamic time warping for isolated word recognition. IEEE Trans Acoust Speech Signal Process 30:535–544
15. Surakhi O, Zaidan MA, Fung PL, Hossein Motlagh N, Serhan S, AlKhanafseh M, Ghoniem RM, Hussein T (2021) Time-lag selection for time-series forecasting using neural network and heuristic algorithm. Electronics 10:2518. https://doi.org/10.3390/electronics10202518
16. Ranganathan G (2020) Real life human movement realization in multimodal group communication using depth map information and machine learning. J Innov Image Process (JIIP) 2(02):93–101
Link Prediction Using Fuzzy Computing Model by Analyzing Social Relationship in Criminal Networks M. R. Sumalatha, Lakshmi Harika Palivela, G. Aishwarya, M. Roshin Farheen, and Aadhithya Raj Madhan Raj
Abstract One of the most popular topics in network analysis is the prediction of links that go missing or might appear in the future in network graphs. The main challenge for the link prediction algorithms proposed so far lies in the structural characteristics of different networks, which vary dynamically. Often the available data are not enough to solve a forensic investigation, which then requires crucial and extensive human skills. Fuzzy logic can deal with information that is uncertain, imprecise, ambiguous, partially true, or without definite limitations. The proposed conceptual learning framework selects a better prediction technique for a given network based on its features, built upon analyzing the social relationship in terms of the nodes and edges of network theory. Moreover, fuzzy computing can aid system modeling by offering a potential method for resolving conflicts between various criteria and for improved option evaluation. This article proposes novel fuzzy-logic-driven similarity-based indices for criminal link prediction in a time-series network, improving the algorithms used to estimate connection strength and/or forecast how criminal networks will evolve. Results indicate that, in comparison with the crisp technique, the fuzzy-based technique yields more accurate predictions by characterizing the network properties in a better way.

Keywords Link prediction · Criminal network · Fuzzy · Computing · Forensics · Classifier
1 Introduction Forensics is the process of recovering, investigating, reviewing, and interpreting data in connection to crime. Analysis of social relationships helps to identify gaps and potential connections by providing data for report production [1]. Therefore, it is essential to create a model for performing forensic investigation, which includes M. R. Sumalatha · L. H. Palivela (B) · G. Aishwarya · M. R. Farheen · A. R. M. Raj Information Technology, Madras Institute of Technology, Anna University, Chennai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_37
predicting criminal behavior and intent through efficient network studies that possess transmission capacity. Two key areas of link prediction that indicate potential interactions between people are mining missing links and predicting upcoming linkages [1–20]. It aids legal firms in determining the root of a crime, finding missing evidence, and establishing potential criminal intent. The process starts with analyzing the criminal network, using methods and strategies for the hierarchical study of a criminal network that grew out of social network analysis models and metrics [2]. Social networks cover a variety of activities, including drug trafficking networks, scientific partnerships, and even SMS messages among users of social media platforms such as Twitter, Instagram, and WhatsApp [3, 4]. Graphs portray the architecture of these network communities, with the actors or participants as nodes and their relationships or social interactions as edges or links [5, 6]. A major challenge in studying the topology of these community networks is predicting the construction of new linkages, the recurrence of prior links, and the destruction of existing links; such link prediction helps anticipate the evolution of these communities' behavioral patterns [7]. SNA-metrics-based link prediction may use both topological metrics and metrics based on the content of network-oriented domains [8]. These content-based topological metrics fall into three categories: Katz-based, random-walk, and neighborhood-based. Common neighbors, Adamic-Adar, and the Jaccard index are a few examples of neighborhood-based metrics. Time-evolving networks are dynamic networks whose topology changes over time [9]. In addition, a time-series perspective on link prediction analysis can give researchers a deeper understanding of how networks develop.
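The neighborhood-based metrics named above can be computed directly from adjacency sets. The toy graph below is illustrative, not drawn from any dataset in this paper:

```python
import math

# Toy undirected network as adjacency sets (symmetric by construction).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}

def common_neighbors(u, v):
    """Nodes adjacent to both u and v."""
    return adj[u] & adj[v]

def jaccard(u, v):
    """Shared neighbours as a fraction of all neighbours of u or v."""
    return len(adj[u] & adj[v]) / len(adj[u] | adj[v])

def adamic_adar(u, v):
    """Rare shared neighbours count more: weight each by 1/log(degree)."""
    return sum(1.0 / math.log(len(adj[z])) for z in common_neighbors(u, v))

# Score the unlinked pair (a, d): both indices suggest a likely link.
print(common_neighbors("a", "d"))        # {'b', 'c'} (set order may vary)
print(jaccard("a", "d"))                 # 1.0
print(round(adamic_adar("a", "d"), 3))
```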
The dynamics of complex networks have become the subject of several studies [10, 11]. The framework of the criminal network changes over time, reflecting the impact of chronological data on real-world actions, which adds to the link prediction challenges [12]. Hence, a link prediction model is created and trained using fuzzy computations and the spatio-temporal properties of the criminal network dataset to predict missing values and future potential relationships between two entities that remain unnoticed.
2 Related Works

Cao et al. [13] proposed a heuristic chaotic ant colony optimization (CACO)-based link prediction model. The model exploits quasi-local topological data of the networks, integrating chaotic perturbation with the CACO model to gain high scalability and reasonably low time complexity, and achieves a satisfactory trade-off between prediction accuracy and robustness. However, the trade-off among other evaluation metrics is not specifically optimized for each network type, as the mechanism varies; the integration of weight and topology for specific networks needs further enhancement.
Link Prediction Using Fuzzy Computing Model by Analyzing Social …
511
Mahrous et al. [14] proposed a fuzzy-enabled blockchain framework. The article focuses on techniques that deal effectively with the way forensic investigators use IoT device traces; by employing fuzzy hashing, it is possible to identify incriminating text documents that might remain undetected when traditional hashing techniques are employed. Its execution time and complexity still need to be improved. Nisioti et al. [15] proposed DISCLOSE, a data-driven decision support framework for optimizing the challenges that occur during a cyber forensic investigation. In addition to assisting investigations, DISCLOSE can boost inquiry effectiveness while lowering budget requirements. Lim et al. [16] developed a Time-based Deep Reinforcement Learning (TDRL) model: by applying the DRL approach to comparatively small crime datasets that change over time, a time-based link prediction model was proposed. The DRL approach could be used to build ML models in many other domains where dataset sizes remain inadequate, although the accuracy of negative link prediction is limited. Lim et al. [17] developed fusion-based deep reinforcement learning (FDRL). In that study, a metadata-fusion approach in criminal networks (FDRL-CNA) was contrasted with a purely time-evolving DRL model (TDRL-CNA) without metadata fusion for link prediction; prediction accuracy increased when metadata fusion was accounted for, although knowledge acquisition is not taken into account in determining the suggested inspections. Raj et al. [18] developed a secure content transfer model for social networks. Most existing works focus on local algorithms for topology information and feature extraction, which results in low time complexity but relatively low prediction accuracy, whereas the global algorithms proposed so far achieve better prediction accuracy at very high computational complexity.
Only a few models have presented metadata fusion methods, which account for better evidence in time-evolving social networks and real-world networks that vary dynamically in terms of size. Other articles [19, 20] have developed hybrid models to improve prediction and classification accuracy. The success of link prediction analysis in diverse networks may help to explain the process of network evolution.
3 Link Prediction Using Fuzzy Computing Model (LPFCM)

The system architecture in Fig. 1 shows the flow of the proposed system through four modules, namely the network analytics, fuzzy computation, link prediction, and tuning modules, which together comprise the decision support system. The network analytics module analyzes the social relationships among the individuals present in the input data's network. The fuzzy computation module calculates the similarity indices and enhances the link weight and topology fusion from the extracted information with respect to the criminal network [21]. The link prediction module uses an ensemble approach to precisely uncover the missing and future
Fig. 1 LPFCM system architecture
links in an efficient way. The tuning module enables a parameter-search and tuning procedure that achieves a better trade-off between evaluation metrics and performance for the given criminal network type, providing consistent results. The decision support system emerges from the information analyzed among the individuals in the crime network. To predict the missing and future potential relationships between two entities in a social network that have gone unnoticed, the potential findings at a crime scene are used for report generation. A link prediction model is thus developed and trained using fuzzy computations and the spatio-temporal features of the criminal network dataset, and the resulting reports are then utilized to solve the crimes.
3.1 Network Analytics Module

This module first visualizes the graph data, which helps to understand the outliers, trends, and patterns in the time-evolving criminal network of the caviar data over a period of 2 years. The network's observable linkages are divided into training and probing edge sets, the latter accounting for 30% of the total edges, and the social interaction (the correlation among the nodes) is analyzed using similarity metrics such as page rank, betweenness centrality, and eigenvector centrality. Here, the average degree of a graph measures the number of edges relative to the number of nodes, while network density measures the frequency of direct connections in a social network.
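The two quantities just defined follow directly from the node and edge counts. A small sketch; the 110-node/205-edge figures anticipate the caviar communication network described in Sect. 4.1:

```python
def average_degree(n_nodes, n_edges):
    """Average degree of an undirected graph: 2|E| / |V|."""
    return 2 * n_edges / n_nodes

def network_density(n_nodes, n_edges):
    """Share of possible direct connections that actually exist:
    2|E| / (|V| * (|V| - 1))."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))

# Roughly 110 nodes and 205 weighted edges, as in the caviar network.
print(round(average_degree(110, 205), 2))    # 3.73
print(round(network_density(110, 205), 4))   # 0.0342
```

The low density (about 3% of possible ties realised) is typical of covert networks, where actors limit direct contacts.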
Given a network G^t = (V, E) and a candidate open triad Y^t, the formation of triadic closure is given by f as follows:

f : (G^t, Y^t, X^t)_{t=1,…,T} → Y^{T+1}
Page rank, a centrality metric for nodes in the network, ranks the nodes in the graph based on their positions in the network; it is a special case of eigenvector centrality, given by the ratio of incoming links to total links present in a network. Betweenness centrality measures a node's influence on the information flow in a network, and eigenvector centrality measures a node's influence inside the network. Cliques are often made up of people who are tightly related to one another with similar interests but are unconnected to others outside the group. According to network theory, cliques are maximally complete subgraphs, where every node is directly linked to every other node. The term "maximal" denotes that no further nodes may be included in the clique without reducing its connectivity.

Algorithm I
Input: graph G, size n
Output: maximal clique
Clique(G, n)
  Leaf_node(|G| = 0)
  if (maxi_size > n)
    maxi_size = n
    return
  Upper bound fail
  Generate upper bound u(G)
  if (u(G) + size Mt(pxi))
    SET C3 = 0
  ELSE
    SET C3 = 1
  ENDFOR
END FOR
SET the connection output to the total of each node's step-by-step outputs
SET the output to the lowest value possible for all phases
END FOR
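Maximal cliques as defined above can also be enumerated with the standard Bron–Kerbosch recursion; this is a compact illustrative alternative to Algorithm I, on a toy graph rather than the caviar network:

```python
def bron_kerbosch(R, P, X, adj, out):
    """Enumerate maximal cliques: R is the growing clique, P the
    candidates that extend it, X the nodes already excluded.
    A clique is maximal exactly when both P and X are empty."""
    if not P and not X:
        out.append(set(R))
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P.remove(v)
        X.add(v)

# Toy graph: edges 1-2, 1-3, 2-3, 2-4, 3-4.
adj = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2, 4},
    4: {2, 3},
}
cliques = []
bron_kerbosch(set(), set(adj), set(), adj, cliques)
print(sorted(map(sorted, cliques)))  # [[1, 2, 3], [2, 3, 4]]
```

Note that neither triangle can absorb the other's remaining node, which is exactly the "maximal" condition described in the text.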
3.3 Link Prediction Module

According to the fuzzy rules defined in the fuzzy computation module, features such as similarity metrics are calculated from the graph and fed as input features along with the respective nodes, yielding the modified final dataset. The similarity metrics calculated from the fuzzy rules are the clustering coefficient, Jaccard distance, and cosine distance; page rank and the weakly connected component are fed as additional features. The final data with the computed features is then given as input to the ensemble model, which predicts whether a link between two nodes will be established in the future. The stacked ensemble model takes the revised dataset as input. Using the predictions of several models, stacking builds a new model that treats the aggregation itself as a learning activity, with a final estimator optimizing the aggregation procedure. The advantages of stacking in the 20-times-looped 10-fold cross-validation assessment are shown in Fig. 2, where the blue bar displays how well the ensemble model performs. The heterogeneous ensemble comprises seven base classifiers: logistic regression (LR), k-nearest neighbors (KNN), decision trees (DT), Gaussian Naive Bayes (NB), artificial neural network (ANN), support vector classifier (SVC), and quadratic discriminant analysis (QDA). As the final estimator in stacking, logistic regression performs the aggregation with good accuracy. Stacking enhances the effectiveness of a heterogeneous ensemble, but it limits the number of model types that can be used to build the ensemble. This issue can be addressed by using different hyper-parameters within a single model type, adding variation by randomly selecting the hyper-parameters for the ensemble members. The following options are considered: kernel: [rbf, linear, poly]; C: [0.04, 0.2, 0.3]; and gamma: [0.2, 0.5]. Gamma is the kernel parameter when the rbf or poly kernel is selected.
The SVC regularization parameter is called C. The potential of this tactic is depicted by the red bars in Fig. 3: the SVC is the only model examined, and variation is added by randomly selecting the hyper-parameters for the ensemble members. Choosing hyper-parameters at random can occasionally produce subpar models. However, the performance of the stacked
Fig. 2 Comparison of the accuracy of the heterogeneous ensemble and the stacking equivalent
Fig. 3 Comparison of the accuracy of the proposed heterogeneous ensemble model
ensembles with 7 and 20 estimators is good. Generating stacking ensembles with several base estimators is therefore possible using this hyper-parameter method. Such stacking ensembles based on hyper-parameter selection have to be tuned manually, because some hyper-parameter combinations produced very subpar estimators. It should be noted that the size-20 stacked ensemble is not superior to the size-7 ensemble, possibly because the final estimator has a tendency to overfit.
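A hedged sketch of this hyper-parameter-randomized stacking in scikit-learn. The synthetic data, estimator names, and the choice of seven members are illustrative assumptions standing in for the paper's actual features:

```python
import random
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

random.seed(0)

# Draw base SVCs with randomly chosen hyper-parameters, matching the
# search space quoted above (kernel, C, gamma).
kernels = ["rbf", "linear", "poly"]
Cs = [0.04, 0.2, 0.3]
gammas = [0.2, 0.5]
estimators = [
    (f"svc{i}", SVC(kernel=random.choice(kernels),
                    C=random.choice(Cs),
                    gamma=random.choice(gammas)))
    for i in range(7)
]

# Logistic regression aggregates the base predictions (final estimator).
stack = StackingClassifier(estimators=estimators,
                           final_estimator=LogisticRegression(max_iter=1000))

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
stack.fit(X_tr, y_tr)
print(round(stack.score(X_te, y_te), 2))
```

Because the SVCs are built without `probability=True`, `StackingClassifier` falls back to their decision-function outputs as meta-features, which avoids the cost of probability calibration.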
4 Results

The experiments were run on an Intel® processor with 16 GB of RAM, and all source code scripts were written in Python using Anaconda. The NetworkX package is used for network analysis.
4.1 Dataset Used

The caviar investigation focused on a network of cocaine traffickers operating from Montreal [22]. From 1994 to 1996, a combined investigation targeted the network; during this period, 11 importation consignments were recovered, but no arrests were made until the investigation's completion. The case presents a unique chance to examine how a criminal network phenomenon emerged and how law enforcement dealt with it. The inherent investigative methodology enables both an assessment of changes in the network structure and a close examination of how nodes in the network respond and adjust to the expanding restrictions placed on them. Information provided as testimony throughout the trials of 22 caviar network members served as the main source of data: 4,279 paragraphs (more than 1,000 pages) of data indicating electronically recorded
phone calls between network users. From these transcripts, the communication matrix for the drug smuggling operation was built over the course of the investigation. Not everyone caught in the monitoring net was involved in the trafficking scheme: 318 people were identified once the names in the monitoring data were extracted in full, of whom 208 were not connected to the trafficking schemes. In the numerous transcripts of discussions, most of these were merely named but never located; others who were located were not obviously participating, or lacked a distinct role in the network (e.g., legitimate entrepreneurs or family members). Therefore, the complete network comprised 110 members. Criminals communicate with one another through ties, with values showing the volume of conversation; the data come from police wiretapping. With around 110 nodes and weighted links, the communication network contains 205 edges.
4.2 Results Analysis

The area under the curve (AUC) metric for link prediction, with a value ranging from zero to one, indicates the prediction quality of each learning model: a higher AUC means a more accurate prediction. In our setting, positives correspond to links and negatives to non-links, and the true and false positive rates are evaluated to compute the AUC. As the final estimator in stacking, logistic regression performs the aggregation with good accuracy, achieving an AUC score of 93%. The proposed ensemble model with fuzzy-based calculation of similarity indices yields a training accuracy of 99% and a testing accuracy of 92%; after tuning, it yields a training accuracy of 97% and a testing accuracy of 93%, as shown in Table 1. The advantages of stacking in the 20-times-looped 10-fold cross-validation assessment are shown in Fig. 2, where the blue bar displays how well the ensemble model performs. It should be noted that the size-20 stacked ensemble is not superior to

Table 1 Classification report

          |        Ensemble model         |  Ensemble model after tuning
Metric    |   Training    |   Testing     |   Training    |   Testing
          | T_S   | F_S   | T_S   | F_S   | T_S   | F_S   | T_S   | F_S
Precision | 0.99  | 1.00  | 0.87  | 1.00  | 0.95  | 0.98  | 0.89  | 0.97
Recall    | 1.00  | 0.99  | 0.98  | 0.99  | 0.98  | 0.95  | 0.98  | 0.98
F1-Score  | 1.00  | 1.00  | 0.92  | 0.91  | 0.97  | 0.97  | 0.93  | 0.92
Accuracy  | 0.99          | 0.92          | 0.97          | 0.93

T_S: True Samples, F_S: False Samples
the size-7 ensemble. This could be because the final estimator has a tendency to overfit. Stacking enhances the performance of heterogeneous ensemble models, and the accuracy of the classifiers in the ensemble is shown in Fig. 3.
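The AUC metric used in this results analysis has a simple rank-based reading: the probability that a randomly chosen true link is scored above a randomly chosen non-link. A minimal sketch with illustrative scores, not values from the caviar experiments:

```python
def auc(scores_pos, scores_neg):
    """Rank-based AUC: fraction of (positive, negative) pairs where the
    positive (a true link) outranks the negative (no link).
    Ties count as half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Illustrative predicted link scores for node pairs.
positives = [0.9, 0.8, 0.35]   # pairs with an actual link
negatives = [0.4, 0.3, 0.1]    # pairs with no link
print(auc(positives, negatives))
```

A perfect ranker scores 1.0 and a random one about 0.5, which is why the 0.93 AUC reported above indicates strong separation of links from non-links.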
5 Conclusion

Machine learning, which includes feature extraction and sample learning, is a promising technique for link prediction in graph networks. By combining fuzzy similarity features with ensemble learning, this technique provides an excellent solution for link prediction. The developed fuzzy-based link prediction model identifies the edges or links that could emerge from and fade out of the network over time, and fuzzy-based training improves the model's performance in predicting the presence of negative connections as well as the formation of links over time. This model proves more accurate than the crisp-based method. The work can be extended by including data fusion from criminal network analysis (CNA)-related data sources, such as court decisions, deaths, arrests, and phone taps, to assess the impact of data fusion on the link prediction model's predictive accuracy; incorporating such auxiliary information sources is anticipated to yield a model with improved predictive accuracy and to enhance law enforcement agencies' efforts to disrupt criminal networks. To further increase the accuracy of the classification algorithms embedded in the fuzzy-based link prediction model, another feasible future work is to create a new prediction strategy by integrating several existing approaches, guided by the correlation between prediction accuracy and performance metrics. The work can also be expanded to a wide variety of networks beyond criminal cases. Future goals additionally involve developing specialized CNA for the construction of weights.
Estimation of Maximum Power Operating Point for Single-Diode Solar Photovoltaic Model Using Python Programming

Christina Thottan and Haripriya Kulkarni
Abstract Solar energy has a very positive impact on the sustainable development of a country. Prediction of the maximum power point under various weather conditions is required to obtain maximum efficiency from solar PV. Maximum power tracking requires various inputs to estimate the maximum operating power point of a solar photovoltaic system. This paper proposes using Python programming to estimate the solar cell current required to find the maximum operating power point of a solar photovoltaic cell. Python programming is used due to its merits compared with conventional controller-based maximum operating power point tracking methods. The method is implemented by applying numerical methods to a single-diode solar photovoltaic model. The estimated parameters, such as current and power, are compared and verified against the observed datasheet values.

Keywords Solar photovoltaic (PV) cell · Maximum power operating point · Python · Single diode solar photovoltaic model
1 Introduction

The current scenario of the energy sector shows many nations around the world promoting renewable energy, since it provides a clean, safe, and efficient source of energy. Solar energy is a major contributor to renewable power production due to its abundant availability. It has other favorable benefits such as zero CO2 emissions, low maintenance, and low operating costs. But the output of solar PV depends heavily on weather conditions such as cloud cover, light intensity, temperature, and humidity. It is hence very challenging to get maximum efficiency from solar PV under changing weather [1]. The efficiency of solar PV can to a large extent be improved

C. Thottan (B) · H. Kulkarni
Department of Electrical Engineering, Marathwada Mitra Mandal's College of Engineering, Pune, India
e-mail: [email protected]
H. Kulkarni e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_38
by parameters such as the material used for manufacturing the solar PV, the converter technology used, and the correctness in tracing the maximum power point of the system [2]. Increasing the performance of a solar cell by changing the PV material or the converter used requires exhaustive research and is hence the more difficult task. Thus, tracing the maximum power point on the IV characteristics is the most economical and simple way to increase the efficiency of solar PV. To predict the characteristics of solar PV under different weather conditions, precise modeling of the solar PV is required. Equivalent diode models are the most widely used solar PV models; there are basically two types, the single-diode model and the double-diode model [2–6]. This paper uses the single-diode model rather than the double-diode model, since it has the advantage of less computational time although both have a similar capacity to extract unknown parameters [7]. There are various methods to track the maximum power point, such as Hill Climbing/P&O, Incremental Conductance, Fractional Open-Circuit Voltage, Fractional Short-Circuit Current, Ripple Correlation Control (RCC), Current Sweep, Fuzzy Logic Control, and neural networks. These methods differ in complexity, rate of convergence, cost, difficulty of implementation, and effectiveness [8]. Two major concerns in tracking the maximum power point are the speed and the simplicity of the method for further implementation. A method such as Fuzzy Logic Control offers high-speed tracking [9, 10], but the control technique is more complex. Analysis of solar PV characteristics can be done using the MATLAB programming tool [11–13], but it has the major disadvantage of being difficult to integrate with hardware systems. Python programming has been used in the literature to implement various techniques related to parameter extraction and maximum power determination of solar PV.
Reference [14] uses Python to control a data acquisition board to determine the IV curve of a solar cell. The power loss of a PV module is calculated using Python in [15], where it is observed that Python is time-efficient compared with other simulation tools. Determination of the maximum power point requires voltage and current inputs. This paper proposes a simpler method that senses only the voltage and computes the corresponding current to determine the maximum power point of a single-diode model. The solar PV cell current is found numerically using an algorithm based on the bisection method. The complete implementation of this technique is done using Python programming rather than MATLAB programming.
2 Solar Cell Model

2.1 Single Diode Model of Solar Cell

A photovoltaic cell is ideally a PN-junction semiconductor. When sunlight falls on it, a current is produced that varies proportionally to the irradiance at that instant. Figure 1
Fig. 1 Equivalent circuit of single diode model of solar PV cell
shows the equivalent circuit of a photovoltaic cell with one diode. The series and parallel resistances account for the internal dissipations in the cell. Applying KCL to the equivalent circuit,

I = Ip − ID − IRsh   (1)

Substituting the expressions for ID and IRsh, we get

I = Ip − I0 [exp((V + I·Rs)/(m·VT)) − 1] − (V + I·Rs)/Rsh   (2)

where

I — cell current (A)
Ip — photo-generated current (A)
I0 — diode reverse saturation current (A)
q — electron elementary charge (1.602 × 10−19 C)
K — Boltzmann constant (1.381 × 10−23 J/K)
T — cell temperature (K)
m — diode ideality factor
Rs — cell series resistance (Ω)
Rsh — cell shunt resistance (Ω)
VT — voltage equivalent of temperature (V) = KT/q
V — cell output voltage (V)
2.2 IV Characteristics of Solar Cell

A preliminary test was conducted on the solar cell to study its IV and PV characteristics and their parameters. The voltage and current were recorded by varying a resistive load to plot the IV characteristics (shown in Fig. 2) and the PV characteristics (shown in Fig. 3). The typical plot of solar cell characteristics can thus be drawn as shown in Fig. 4. The following technical parameters play an important role in the working of a solar cell. These details are provided by the manufacturer of the solar module.
• Open circuit voltage (Voc): the voltage of the photovoltaic module when no load is connected to it.
• Short circuit current (Isc): the current of the photovoltaic module when the device is shorted.
• Maximum power (Pm): the maximum power that can be produced under given weather conditions. It is the product of the maximum power current (Imp) and the maximum power voltage (Vmp).

Fig. 2 IV characteristics of solar cell
Fig. 3 PV characteristics of solar cell
Fig. 4 Solar cell characteristics
3 Implementation Using Python

The parameter extraction of solar PV can be done using many tools. The work in this paper proposes using Python programming for parameter extraction of the solar PV module due to the advantages mentioned below.
• Python is free and open-source software. It can be downloaded without any cost, and the source code can be modified as per the requirements of the end user, unlike many proprietary packages.
• For a researcher, using open-source software in general brings significant integrity and accountability to the entire work.
• Python is an interpreted language and hence can be used on all major operating system platforms and CPU architectures.
• Different microcontrollers can easily be used with Python due to its simple programming interface.
4 Proposed Model for Estimating Current and Power for Single-Diode Solar Photovoltaic Model

4.1 Maximum Power Determination for Single-Diode Solar Photovoltaic Cell

Figure 5 shows a schematic diagram of a solar PV system connected to a load. The MPPT system determines the maximum power point, but it requires input from sensors that measure voltage and current. The proposed model eliminates the need for a
Fig. 5 Schematic diagram of the system (blocks: Solar PV, MPPT, regulator, battery, inverter, load; current estimation from measured voltage)
Table 1 Technical parameters for TSM-PC05 solar PV module

Parameter    Value
Ip           8.314 A
I0           1.14 × 10−9 A
m            1.06
Rs           0.290 Ω
Rsh          682 Ω
q            1.602 × 10−19 C
K            1.381 × 10−23 J/K
T            25 °C = 298.15 K
VT = KT/q    0.0257 V
Vmp          29.3 V
current sensor and requires only the voltage input. The current is calculated by solving the single-diode current equation (Eq. 2) using the bisection numerical method. For the analysis, the parameters of the TSM-PC05 solar PV module listed in Table 1 are substituted into the equation.
4.2 Bisection Method

This numerical method applies the intermediate value theorem repeatedly. Figure 6 shows a continuous function f(x) between a and b. For definiteness, let f(a) be negative and f(b) be positive. As per the bisection rule, the first approximation of the root is given as
Fig. 6 Bisection method
x1 = (a + b)/2   (3)
If f(x1) = 0, then x1 is a root of f(x). Otherwise, the root lies between x1 and a, or between x1 and b, depending on whether f(x1) is positive or negative. The next approximation of the root is then found from the new interval.
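The procedure above can be sketched in Python, applied to the implicit single-diode equation (Eq. 2) at the maximum-power voltage. The module parameters are those of Table 1; the number of series cells Ns = 60 is an assumption (the text does not state the module's cell count), so the root found here need not reproduce the paper's value of 8.2712 A exactly.

```python
import math

def bisect(f, a, b, tol=1e-4, max_iter=100):
    """Bisection: repeatedly halve [a, b] while keeping a sign change."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "f(a) and f(b) must have opposite signs"
    for _ in range(max_iter):
        x = 0.5 * (a + b)          # Eq. (3): x1 = (a + b)/2
        fx = f(x)
        if fx == 0 or (b - a) / 2 < tol:
            return x
        if fa * fx < 0:            # root lies between a and x
            b = x
        else:                      # root lies between x and b
            a, fa = x, fx
    return 0.5 * (a + b)

# Residual of the single-diode equation (Eq. 2) at V = Vmp.
# Parameters from Table 1; Ns = 60 series cells is an assumption.
Ip, I0, m, Rs, Rsh = 8.314, 1.14e-9, 1.06, 0.290, 682.0
VT, Vmp, Ns = 0.0257, 29.3, 60

def residual(I):
    Vd = Vmp + I * Rs              # voltage across the diode branch
    return Ip - I0 * (math.exp(Vd / (Ns * m * VT)) - 1) - Vd / Rsh - I

Imp = bisect(residual, 7.0, 9.0, tol=1e-6)   # f(7) > 0, f(9) < 0 brackets the root
Pm = Vmp * Imp
print(f"Imp = {Imp:.4f} A, Pm = {Pm:.2f} W")
```

With a tighter tolerance the bisection needs more iterations (about 21 for tol = 1e-6 on a width-2 bracket), which is why the paper notes that faster-converging numerical methods could improve the convergence rate.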
4.3 Simulation Results

The solution of the PV equation (2), after substituting the values of Table 1, using the bisection method is given in Table 2. The approximate value of the root after 11 iterations is 8.2712. Hence the maximum power current Imp is 8.2712 A, and the maximum power Pm is

Pm = Vmp · Imp = 29.3 × 8.2712 = 242.346 W

A comparison of the values calculated using the bisection method with the datasheet values is given in Table 3. It can be seen that the calculated and datasheet values agree closely, with an error of about 3% after 11 iterations. The convergence rate can be improved by using other numerical methods.
Table 2 Values at different iterations

i    a       b       xi      f(xi)
1    8       8.5     8.25    0.0214
2    8.25    8.5     8.3750  −0.2286
3    8.25    8.3750  8.3125  −0.0411
4    8.25    8.3125  8.2812  −0.0099
5    8.25    8.2812  8.2656  0.0057
6    8.2656  8.2812  8.2734  −0.0021
7    8.2656  8.2734  8.2695  0.0018
8    8.2695  8.2734  8.2715  −0.0001
9    8.2695  8.2715  8.2705  0.0008
10   8.2705  8.2715  8.2710  0.0004
11   8.2710  8.2715  8.2712  0.0001
Table 3 Comparison of calculated values with datasheet values of TSM-PC05 solar PV module at irradiance 1000 W/m2 and cell temperature 25 °C

                  Imp (A)   Pm (W)
Calculated value  8.2712    242.346
Datasheet value   8.03      235
5 Conclusion

This paper proposes using Python programming to estimate the solar cell current for determining the maximum power operating point of a solar module. The method reduces the use of sensors by calculating the real-time current from the measured output cell voltage. The proposed method will thus contribute to reducing the cost and the maximum-power estimation time of a solar PV module. The calculated values of Imp and Pm were compared with the observed datasheet values and found to have a small error of about 3%; hence the result can be accepted. The key concluding points from the study are as follows:
• Python programming offers a simple, effective, and economical method for determining the maximum operating point of the solar PV model.
• This study was carried out using the bisection method. The convergence rate can be further improved by using other numerical methods.
• Multiple-root numerical methods can be used for estimating more PV model parameters.
• The approach can be applied to other solar PV models.
References

1. Li G, Li M, Taylor R, Hao Y, Besagni G, Markides CN (2022) Solar energy utilisation: current status and roll-out potential. Appl Therm Eng 209:118285. ISSN: 1359-4311
2. Farivar G, Asaei B (2011) A new approach for solar module temperature estimation using the simple diode model. IEEE Trans Energy Conv 26:1118–1126
3. Barth N, Jovanovic R, Ahzi S, Khaleel MA (2016) PV panel single and double diode models: optimization of the parameters and temperature dependence. Solar Energy Mater Solar Cells 148:87–98. ISSN: 0927-0248
4. Jordehi AR (2016) Parameter estimation of solar photovoltaic (PV) cells: a review. Renew Sustain Energy Rev 61:354–371
5. Reis LRD, Camacho JR, Novacki DF (2017) The Newton–Raphson method in the extraction of parameters of PV modules. In: International conference on renewable energies and power quality (ICREPQ'17), RE&PQJ, vol 1, no 15
6. Gnetchejo PJ, Essiane SN, Ele P, Wamkeue R, Wapet DM, Ngoffe SP (2019) Important notes on parameter estimation of solar photovoltaic cell. Energy Convers Manag 197:111870
7. Shannan NM, Yahaya NZ, Singh B (2013) Single-diode model and two-diode model of PV modules: a comparison. In: IEEE international conference on control system, computing and engineering, Penang, Malaysia
8. Esram T, Chapman PL (2007) Comparison of photovoltaic array maximum power point tracking techniques. IEEE Trans Energy Convers 22(2)
9. Chandramouli A, Sivachidambaranathan V (2019) Extract maximum power from PV system employing MPPT with FLC controller. J Electr Eng Autom (EEA) 1(4)
10. Raghappriya M, Devadharshini KM, Karrishma S (2022) Fuzzy logic based maximum power point tracking of photovoltaic system. J Innov Image Process 4(1):49–60
11. Siddique HA, Xu P, De Doncker RW (2013) Parameter extraction algorithm for one-diode model of PV panels based on datasheet values. In: 2013 International conference on clean electrical power (ICCEP). IEEE
12. Ran Z, Hui-Jun X, Zhi-Ying Z, Shun-Hua Z (2009) A simplified double-exponential model of photovoltaic module in MATLAB. In: International conference on energy and environment technology, vol 3, pp 157–160
13. Premkumar M, Sowmya R, Umashankar S, Pradeep J (2020) An effective solar photovoltaic module parameter estimation technique for single-diode model. IOP Conf Ser: Mater Sci Eng
14. Reyes-Ramírez I, Fonseca-Campos J, Mata-Machuca JL (2018) Measurement of the current–voltage curve of photovoltaic cells based on a DAQ and Python. Renew Energy Power Quality J (RE&PQJ)
15. Khari SA, Ismail AL, Lokman HA, Shivan SA (2021) Power loss calculation of photovoltaics using Python. Comput Inform 1(2)
Hardware Trojan Modelling on an FPGA-Based Deep Neural Network Accelerator

Gundlur Bhanu Vamsi Krishna, Karthi Balasubramanian, and B. Yamuna
Abstract Neural networks have started proliferating in various applications, including ones where security cannot be compromised. Training high-performance neural network models demands substantial hardware and is also very time consuming. This forces users to rely on third-party companies for training neural networks, exposing the trained model to unscrupulous hands and reducing its trustworthiness. It has been reported in the literature that samples carrying malicious Trojan triggers can be mixed with the training data, leaving the trained network embedded with hidden functionalities that can be activated by specific Trojan patterns. Hence it is essential to understand the possibilities of Trojan attacks on local systems. This work proposes a Trojan model for a deep neural network (DNN) targeting FPGA platforms. Insertion of a simple Trojan in the activation module of a neuron resulted in a 26% decrease in the accuracy of the DNN. This work brings out the need for more efficient defense mechanisms against such Trojans.

Keywords Neural network · Hardware trojan · VLSI · Hardware implementation · Pattern recognition
1 Introduction

With connected technologies, including the Internet of Things (IoT), evolving very rapidly, it becomes imperative to protect communication and end-user systems from threats posed by both external software and internal hardware. Further, globalized business models of IC manufacturing have increasingly exposed ICs to being implanted with malicious circuits during the foundry process. Hardware Trojans are such malicious circuits, providing a back-door entry to adversaries and causing undesired

G. B. Vamsi Krishna (B) · K. Balasubramanian · B. Yamuna
Department of Electronics and Communication Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_39
behaviour in the infected devices [1–4]. The adversary may use the Trojans to leak sensitive information or to cause major malfunctioning of the device. Several testing techniques have been designed for detecting Trojans, but finding a single solution for all types of hardware Trojans is difficult. Side-channel and logic testing are the two standard approaches for Trojan detection, where a golden circuit and its output are used to compare against the results obtained from the circuit under test [5]. Trojans are typically designed to activate rarely and to have a small physical footprint, to ensure that they are not detected by logic- and side-channel-based techniques.

Deep neural networks (DNNs) are artificial neural networks with multiple layers sandwiched between the input and output layers. The basic constituents of these networks are neurons, weights, biases, synapses, and functions [6–8]. Neural network inputs, such as pixel values or sound samples, are generally positive numbers. The weights and their bias values may be either positive or negative, possibly with a fractional component. For dealing with signed numbers, either signed-magnitude or 2's-complement representation is used. To handle fractions, either single- or double-precision IEEE floating-point representations or fixed-point representations may be used. Floating-point numbers can represent very large values and may provide better precision, but their implementation and processing are difficult, as they require more hardware resources [9]. Fixed point, on the other hand, is easy to implement, but the range of numbers is restricted. In this representation the total number of available bits needs to be shared between the integer and fractional parts, resulting in a tradeoff between accuracy and resource utilization.
The use of fixed-point representation is recommended for DNNs, as input values are generally normalized (to between 0 and 1, or −1 and 1), so there is no need to represent large numbers. A drawback of this representation is a slight degradation in accuracy, which can be minimized by managing the overflow errors. Traditionally, training and inference engines for neural networks have been realized on general-purpose processors. But in recent times, researchers have started exploring ASIC and FPGA implementations of neural networks, due to the high power consumption and resource utilization of the models. The emergence of the Internet of Things has made it imperative to build machine learning tools into low-power mobile devices. Towards this goal, computation of training algorithms such as backpropagation [10] can be performed on cloud servers and the weights sent to low-power mobile devices with ASIC or FPGA chips that perform the inference operations. Most neural networks use non-linear functions such as the sigmoid or hyperbolic tangent as activation functions. Generating circuits for these functions is very challenging and consumes extra resources, so the pre-calculated values are generally stored in a ROM (LUTs, Look-Up Tables), since the range of the activation-function input x is known. These LUT ROMs are built either from block RAMs or from distributed RAM (flip-flops and LUTs). One example of a DNN on a configurable platform is described by Vipin in [11], which forms the basis for the current work.
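A minimal sketch of how such an activation LUT might be pre-computed is shown below. The 16-bit word with 14 fractional bits, the 1024-entry ROM depth, and the ±8 input range are all illustrative assumptions; the actual bit widths and depths of the design are not stated in the text.

```python
import math

FRAC_BITS = 14            # assumed fixed-point format: 14 fractional bits
DEPTH = 1 << 10           # assumed ROM depth: 1024 entries
X_MIN, X_MAX = -8.0, 8.0  # assumed input range covered by the LUT

def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize a real number to a fixed-point integer (round to nearest)."""
    return int(round(x * (1 << frac_bits)))

def from_fixed(n, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a real number."""
    return n / (1 << frac_bits)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Pre-compute the ROM contents: entry k holds sigmoid(x_k) in fixed point,
# where x_k sweeps the assumed input range uniformly.
rom = [to_fixed(sigmoid(X_MIN + k * (X_MAX - X_MIN) / (DEPTH - 1)))
       for k in range(DEPTH)]

def lut_sigmoid(x):
    """Approximate sigmoid(x) by indexing the pre-computed ROM."""
    x = min(max(x, X_MIN), X_MAX)   # saturate out-of-range inputs
    k = round((x - X_MIN) / (X_MAX - X_MIN) * (DEPTH - 1))
    return from_fixed(rom[k])
```

With this depth and input range the LUT step is about 0.016, so the worst-case approximation error stays well below 1%, illustrating the accuracy/resource tradeoff mentioned above.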
Trojans in Deep Neural Networks

With the improvement in computational power, neural networks are becoming deeper and larger, dramatically increasing the time, data, and hardware required for training. This has forced massive outsourcing of the training activity to third-party vendors, leading to the popular machine-learning-as-a-service (MLaaS) business model [12]. As a consequence, underground business activities with the malicious intent of introducing Trojans into trained models have mushroomed. The transparency of MLaaS is thus a major issue that casts doubt on the trustworthiness of the trained models. Previous research works highlighting this severe issue can be found in [13–18]. The methodologies for inserting neural Trojans include training-data poisoning and algorithm modification. Training-data poisoning mixes malicious training data with regular training data. This makes the network sensitive to triggered Trojans while maintaining normal behaviour at all other times. Algorithm-modification-based Trojan injection, on the other hand, modifies the behaviour of the algorithm for a subset of neurons. It flips or rewrites chosen bits in the network's binary code depending on the triggering pattern and the Trojan's functionality [11]. This work is focused on designing an algorithm-modifying Trojan model for the deep neural network designed in [11] and analysing the effectiveness of the designed Trojan. The paper is organized as follows. Section 2 discusses the DNN model implementation, while Sect. 3 introduces the proposed Trojan model. Section 4 deals with the results, and the paper concludes in Sect. 5.
2 Deep Neural Network Implementation

A fully connected DNN with an input layer, multiple hidden layers, and an output layer is implemented in the design. Each neuron is connected to every neuron in the adjacent layers. The input data is fed to the neurons in the input layer; in each neuron, every input is multiplied by its corresponding weight, the products are summed, and the bias value is added to the sum. The resultant sum is given as an input to the activation function, whose output is the output of that single neuron and is fed as an input to all the neurons in the next layer. Different activation functions such as sigmoid, Rectified Linear Unit (ReLU), and hardmax may be used. At the output layer, the resulting data can be read out and used according to the application for which the DNN is designed [19]. Figure 1 shows a DNN with one input layer, three hidden layers, and an output layer. As in [11], the MNIST data set is used for the analysis.
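The per-neuron computation described above (weighted sum of the inputs, plus bias, passed through an activation function) can be sketched as follows. The layer sizes here are illustrative only, not the MNIST network used in the paper.

```python
import math
import random

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias, act=relu):
    """One neuron: multiply-accumulate the inputs and weights,
    add the bias, then apply the activation function."""
    s = sum(i * w for i, w in zip(inputs, weights)) + bias
    return act(s)

def layer(inputs, weight_matrix, biases, act=relu):
    """One fully connected layer: every neuron sees every input."""
    return [neuron(inputs, w, b, act) for w, b in zip(weight_matrix, biases)]

# Illustrative 3-input -> 2-neuron hidden -> 1-neuron output network
# (sizes are arbitrary; weights are random stand-ins for trained values).
random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
b1 = [0.1, -0.2]
W2 = [[0.5, -0.3]]
b2 = [0.05]

x = [0.2, 0.7, 0.1]
hidden = layer(x, W1, b1)                 # ReLU hidden layer
out = layer(hidden, W2, b2, act=sigmoid)  # sigmoid output layer
```

In the hardware design each `neuron` call corresponds to one MAC unit plus one activation-ROM lookup, and each `layer` call to one bank of parallel neurons.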
Fig. 1 A fully connected deep neural network with 3 hidden layers
2.1 Neuron Architecture

Figure 2 shows the architecture of a neuron, which resembles a typical MAC (multiply–accumulate) unit. The input is passed through a delay, multiplied by the corresponding weight, and added to the previous product. The MAC output is added to the bias value stored in the bias memory and then given as an input to the activation function, which is implemented as a ROM with pre-calculated values stored for the different functions. The output of a neuron is fetched from this memory based on the MAC output value; the output of the activation function is the required neuron output, as shown in Fig. 2. As the neuron module is parametrized, the same module can be used for any number of neuron inputs, which provides an efficient implementation for various DNN applications. Every layer of the network is made up of a number of neurons, and a deep layer requires connectivity to every neuron of the previous layer. This is not possible directly with the standard neuron model, since there is only one data
Fig. 2 Neuron architecture
interface for each neuron. To overcome this issue, a shift register is used to store data from each layer and shifted to the required layer at every clock edge. Figure 3 shows the layer architecture including the shift registers.
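The MAC datapath of Fig. 2 can be sketched behaviourally in fixed-point arithmetic. The Q-format with 8 fractional bits used here is an illustrative assumption; the actual bit widths of the design are not stated in this section.

```python
FRAC = 8  # assumed Q-format: signed integers with 8 fractional bits

def q(x):
    """Encode a real value as a Q.8 fixed-point integer."""
    return int(round(x * (1 << FRAC)))

def mac(inputs_q, weights_q, bias_q):
    """Multiply-accumulate as in the neuron datapath of Fig. 2:
    each product carries 2*FRAC fractional bits, so it is rescaled
    back to Q.8 before accumulation; the bias is added at the end."""
    acc = 0
    for i, w in zip(inputs_q, weights_q):
        acc += (i * w) >> FRAC   # rescale product back to Q.8
    return acc + bias_q

inputs  = [q(v) for v in (0.50, 0.25)]
weights = [q(v) for v in (0.50, -1.00)]
bias    = q(0.125)
result = mac(inputs, weights, bias)
print(result / (1 << FRAC))   # prints 0.125
```

The right-shift after each multiply mirrors the truncation a fixed-point hardware multiplier performs, and is the point where the accuracy/resource tradeoff discussed in the introduction shows up.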
3 Proposed Trojan Model for DNN

Trojans are generally designed such that they occupy very low area and dissipate minimum power in order to be surreptitious in nature. In this work, an algorithm-modification Trojan is proposed for the DNN model presented in [11]. A Trojan is inserted at the output of the activation function of a neuron in layers 1 and 3. The Trojan is triggered when the output of either the ReLU or the sigmoid function crosses a predefined threshold value. Once triggered, the output defaults to a wrong value, thus corrupting the final decision. Figure 4 shows the hardware implementation of the Trojan, from which it can be seen that the Trojan logic is very simple, potentially giving it the ability to evade detection techniques such as side-channel testing.
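The behaviour of such a threshold-triggered Trojan can be sketched as follows. The trigger threshold and the forced output value here are illustrative assumptions, not the values used in the actual hardware design.

```python
def relu(x):
    return max(0.0, x)

TROJAN_THRESHOLD = 0.9   # illustrative trigger level (not stated in the paper)
FORCED_OUTPUT = 0.0      # illustrative corrupted value

def trojaned_relu(x, enabled=True):
    """ReLU with the proposed Trojan behaviour: when the activation output
    crosses the threshold, the Trojan triggers and the neuron emits a
    wrong value; otherwise the neuron behaves normally."""
    y = relu(x)
    if enabled and y > TROJAN_THRESHOLD:
        return FORCED_OUTPUT
    return y
```

Because the Trojan fires only for large activations, the network behaves normally on most inputs, which is what makes logic testing and side-channel comparison against a golden circuit ineffective.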
Fig. 3 Layer architecture
Fig. 4 Schematic of trojan
4 Results and Discussion

As described in [11], the DNN was designed in Verilog, simulated using ModelSim, and synthesized with Xilinx Vivado. From the MNIST dataset, thirty thousand images of handwritten digits were used for training, and ten thousand digits were chosen for testing. The characteristics of the network are as follows:
Table 1 Accuracy of implemented design for two different activation functions and its corresponding Trojan analysis

                ReLU (%)  Sigmoid (%)
Without Trojan  88.7      94.2
With Trojan     62.5      68.1

Table 2 Resource utilization summary

Resource  Without Trojan  With Trojan  Available  Utilization% (without)  Utilization% (with)
LUT       6287            6335         53200      11.82                   11.91
FF        6653            6653         106400     6.25                    6.25
BRAM      15              15           140        10.71                   10.71
DSP       160             160          220        72.73                   72.73
IO        105             105          200        52.5                    52.5
BUFG      1               1            32         3.13                    3.13
• An input layer containing 784 neurons.
• One hidden layer with ten neurons and two hidden layers with thirty neurons each.
• An output layer with ten neurons.

The design was simulated with and without the Trojan and the results analyzed. Table 1 shows the effect of the Trojan on the accuracy of the model for the ReLU and sigmoid activation functions. It can be seen that the Trojan reduces the testing accuracy by a significant amount (a relative drop of about 30% for the ReLU function and 28% for the sigmoid function). To study the feasibility of the Trojan model evading detection, the occupied area was analyzed using Vivado and the results tabulated in Table 2. It is observed that the addition of the Trojan causes a change of only 0.7% in the number of LUTs used, while there are no changes in the number of flip-flops, RAMs, and other FPGA blocks. This shows the stealthiness of the proposed Trojan, which is therefore expected to evade detection.
5 Conclusion

This work dealt with the design and analysis of a deep neural network, and a Trojan model has been proposed for it. With an appreciable dip in the accuracy values, the effectiveness of the Trojan in disrupting the regular functionality was shown. To study the surreptitious nature of the Trojan, an area analysis was done, and it was shown that, apart from a 0.7% change in the number of LUTs used, there is
no other change in the resource utilization. It is envisaged that this preliminary work will pave the way for identifying other possible Trojan models for DNNs and for designing effective security measures.
References

1. Colins D (2007) Trust in integrated circuits (TIC). DARPA Solicitation BAA07-24
2. Chakraborty RS, Narasimhan S, Bhunia S (2009) Hardware Trojan: threats and emerging solutions. In: 2009 IEEE international high level design validation and test workshop (HLDVT). IEEE, pp 166–171
3. Kakkara V, Balasubramanian K, Yamuna B, Mishra D, Lingasubramanian K, Murugan S (2020) A Viterbi decoder and its hardware Trojan models: an FPGA-based implementation study. PeerJ Comput Sci 6:e250
4. Maruthi V, Balamurugan K, Mohankumar N (2020) Hardware Trojan detection using power signal foot prints in frequency domain. In: 2020 international conference on communication and signal processing (ICCSP). IEEE, pp 1212–1216
5. Narasimhan S, Du D, Chakraborty RS, Paul S, Wolff FG, Papachristou CA, Roy K, Bhunia S (2013) Hardware Trojan detection by multiple-parameter side-channel analysis. IEEE Trans Comput 62(11):2183–2195
6. Bhardwaj A, Di W, Wei J (2018) Deep learning essentials: your hands-on guide to the fundamentals of deep learning and neural network modeling. Packt Publishing Ltd
7. Yadav P, Menon N, Ravi V, Vishvanathan S, Pham TD (2022) EfficientNet convolutional neural networks-based Android malware detection. Comput Secur 115:102622
8. Pothina H, Nagaraja K (2023) Artificial neural network and math behind it. In: Smart trends in computing and communications. Springer, pp 205–221
9. Nikhila S, Yamuna B, Balasubramanian K, Mishra D (2019) FPGA based implementation of a floating point multiplier and its hardware Trojan models. In: 2019 IEEE 16th India council international conference (INDICON). IEEE, pp 1–4
10. Amrutha J, Ajai AR (2018) Performance analysis of backpropagation algorithm of artificial neural networks in Verilog. In: 2018 3rd IEEE international conference on recent trends in electronics, information & communication technology (RTEICT). IEEE, pp 1547–1550
11. Vipin K (2019) ZyNet: automating deep neural network implementation on low-cost reconfigurable edge computing platforms. In: 2019 international conference on field-programmable technology (ICFPT). IEEE, pp 323–326
12. Ribeiro M, Grolinger K, Capretz MA (2015) MLaaS: machine learning as a service. In: 2015 IEEE 14th international conference on machine learning and applications (ICMLA). IEEE, pp 896–902
13. Barni M, Kallas K, Tondi B (2019) A new backdoor attack in CNNs by training set corruption without label poisoning. In: 2019 IEEE international conference on image processing (ICIP). IEEE, pp 101–105
14. Clements J, Lao Y (2019) Hardware Trojan design on neural networks. In: 2019 IEEE international symposium on circuits and systems (ISCAS). IEEE, pp 1–5
15. Clements J, Lao Y (2018) Backdoor attacks on neural network operations. In: 2018 IEEE global conference on signal and information processing (GlobalSIP). IEEE, pp 1154–1158
16. Geigel A (2013) Neural network Trojan. J Comput Secur 21(2):191–232
17. Gu T, Liu K, Dolan-Gavitt B, Garg S (2019) BadNets: evaluating backdooring attacks on deep neural networks. IEEE Access 7:47230–47244
Hardware Trojan Modelling on a FPGA …
18. Li S, Zhao BZH, Yu J, Xue M, Kaafar D, Zhu H (2019) Invisible backdoor attacks against deep neural networks. arXiv preprint arXiv:1909.02742
19. Haoxiang W, Smys S et al (2021) Overview of configuring adaptive activation functions for deep neural networks - a comparative study. J Ubiquitous Comput Commun Technol (UCCT) 3(01):10–22
Sustainable Farming and Customized Livestock Management Using Internet of Things S. A. Sivakumar, B. Maruthi Shankar, M. Mahaboob, N. Adhish, R. Dineshkumar, and N. Rahul
Abstract A wide variety of livestock are managed on agricultural farms by farmers. As the cattle are not fixed to one location, manual monitoring and inspection of livestock are performed. However, it is a tedious task. Tracking and monitoring the position of any object can be performed on a real-time basis by attaching it to a satellite navigation device. The physical intervention of the farmers is essential to stop the cattle from crossing beyond specific regions. Fencing and visual livestock tracking are challenging and time-consuming tasks. In this paper the movement of animals is monitored by location-aware devices in smart farming and alerts are raised when the animals cross the pasture, farm or geofence boundary using an IoT-based Model. GPRS and IoT technologies are used for creating a geographical safe zone for the cattle by assigning them with dedicated IoT sensors. Without the need for physical intervention, livestock management can be performed by farmers using S. A. Sivakumar (B) Department of Electronics and Communication Engineering, Dr. N.G.P. Institute of Technology, Coimbatore, India e-mail: [email protected] B. Maruthi Shankar Associate Professor, Department of Electronics and Communication Engineering, Sri Krishna College of Engineering and Technology, Coimbatore, India e-mail: [email protected] M. Mahaboob Assistant Professor, Department of Electronics and Communication Engineering, Sri Eshwar College of Engineering, Coimbatore, India e-mail: [email protected] N. Adhish · R. Dineshkumar · N. Rahul UG Scholar, Department of Electronics and Communication Engineering, Sri Krishna College of Engineering and Technology, Coimbatore, India e-mail: [email protected] R. Dineshkumar e-mail: [email protected] N. Rahul e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. 
(eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_40
S. A. Sivakumar et al.
the proposed system with easy remote monitoring and control of cattle movement. Livestock health, well-being and location-based data are collected by the system in an automated and continuous manner. This system also proves to be a reliable and low-cost solution to this challenge. Keywords Internet of things · GPRS · Livestock management · Sustainable farming · Geofencing
1 Introduction

Industrialization and rapid population growth have led to a reduction in farmland on a global level. The significance of agriculture cannot be overstated, as food is an essential requirement for all individuals. Climate change and the declining number of farmers and agriculturists are some of the other challenges in the agricultural domain. In order to meet the growing food demand of the ever-increasing global population, it is essential to maintain progressiveness and stability in the agricultural industry. Until the eighteenth century, agriculture was the major driving factor of the global economy [1]. With the invention of the steam engine, the industrial revolution began during the 1760s. During this period, owing to the socio-economic benefits of this revolution and large-scale mechanization, several farmers abandoned their farms and moved to urban regions. More farmland was abandoned as migration accelerated over the following two centuries with the second and third industrial revolutions. During the 1950s, the abandonment of farmlands in several regions across the globe was analyzed by the authors in [2]. According to the study, this trend is expected to continue in the future, and the phenomenon is more significant in the advanced regions of the world. Sophisticated technologies like quantum computing, robotics, Unmanned Aerial Vehicles (UAVs), the Internet of Things (IoT), Artificial Intelligence (AI), machine learning, and so on are at the peak of the fourth industrial revolution today [3]. These technologies have impacted our everyday lives in a significant manner. Along with agriculture, almost every other domain has been influenced by these technologies. Management of water, livestock, crop, and soil resources is performed in an efficient manner with machine learning and artificial intelligence [4].
Crop yield analysis, farmland management, insect detection, plant disease detection, and several such agricultural issues are addressed through computer vision techniques. The impact of IoT-based technologies on agriculture has been reviewed by several researchers [5]. UAVs and cheap sensors have been developed, contributing to a significant boost in precision farming. A satellite navigation system is used to determine position using a GPS receiver or satellite navigation devices. BeiDou by China, the GLObal NAvigation Satellite System (GLONASS) by Russia, Galileo by Europe, and the Global Positioning
System (GPS) by the United States are the four satellite navigation systems that are currently active and provide global coverage [6]. Tracking and monitoring the position of any object can be performed on a real-time basis by attaching it to a satellite navigation device. To establish a geofence, a geographic area on the planet is defined as a closed polygon [7]. When an object passes through the geofence-defined area, alerts are triggered by the geofence using a location-sensitive device. The future of mankind is shaped by several exciting technologies like the IoT. Mechanical devices, computers, sensors, and other such interconnected, uniquely identifiable devices operate together in IoT for data collection. This data is stored in the cloud and processed by smart algorithms for specific applications. In recent years, several applications in almost all domains have made use of IoT; some such applications are reviewed in detail in [8]. Another significant aspect of farming is livestock monitoring. Traditionally, physical fences were constructed and confined farms were used for manual monitoring of cattle. Automatic tracking and monitoring of cattle is made possible with advanced technologies. The position of cattle can be tracked extensively using GPS and navigation satellites [9]. Hassle-free and cost-efficient real-time cattle monitoring can be performed using UAVs. Low Power Wide Area Network (LPWAN), wireless sensor network (WSN), and radio-frequency identification (RFID) are some of the potential technologies that can be used for monitoring farm animals in a confined region by establishing virtual fences.
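The polygon-based geofence check described above can be sketched with a standard ray-casting point-in-polygon test; this is an illustrative sketch, not the paper's implementation, and the coordinates below are hypothetical:

```python
def inside_geofence(point, polygon):
    """Ray-casting test: True if a (lon, lat) point lies inside the closed polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray extending to the right of the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical rectangular pasture and two animal positions.
pasture = [(76.95, 10.40), (76.99, 10.40), (76.99, 10.44), (76.95, 10.44)]
print(inside_geofence((76.97, 10.42), pasture))  # True: within the fence
print(inside_geofence((77.05, 10.42), pasture))  # False: raise an alert
```

A location-aware device would run this check on each GPS fix and trigger the alert when the result flips to False.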
2 Literature Review

The movement of animals is monitored by location-aware devices in smart farming, and alerts are raised when the animals cross the pasture, farm, or geofence boundary. The farm animals' well-being and health are additionally monitored using several IoT sensors. The position of the attached object can be tracked using a satellite navigation system-based device according to [10]. A cellular, wireless, or radio frequency network-based wireless transmission medium can be used for transmitting the position of the object. The geographic area can be divided into the required number of grids in the geofencing scheme described in [11]. The location monitoring resources are optimized using a grid structure, as the desired goal is otherwise achieved only through extensive calculations for the complex polygon. Several researchers have published articles regarding remote livestock tracking over the last decade. Various cutting-edge technologies are used for remote livestock monitoring and tracking, and enhanced solutions have been developed to address this issue with the evolution of technology. Despite various attempts toward automation of livestock management, several issues still exist related to the optimization of geofencing. A tracking device is fixed on the animal, and light electric shocks and auditory feedback are provided in the geofencing application presented in [12]. The animal is effectively guided from one place to another using these feedback mechanisms. The animal can
be kept within the geofence using this system. Along with location tracking, the well-being and health of the animal can be monitored through an implantable tracking device presented in [13]. Owner contact information, past medication, diseases, surgeries, vaccination, and other information about the animal can also be stored in this device. Real-time location monitoring also helps in safeguarding the animals against theft [14]. The chances of animal theft are largely minimized by the systems proposed by researchers so far. Cellular network-based communication technology, RFID tags, and a centralized livestock database are proposed in a collective system by researchers in [15]. Animal theft is identified by employing several heuristics in the proposed system. If the animal is away from the geofence for grazing, the geographic location of the animal is tracked and it is marked as stolen by the heuristics of the system [16]. Also, if the location of the animal is extremely different from that of the other animals in the herd, the location is tracked and the animal is marked as lost or stolen by the heuristics of the system. Animal identification and tracking is performed using unmanned aerial vehicles or wireless sensor networks in a similar system proposed in [17].
3 Proposed Model

Various playing, sleeping, and grazing patterns exist for different cattle varieties in a paddock. When compared to buffaloes and cows, the food intake and digestion systems of sheep and goats are more active. Communication and navigation are performed using GPRS and satellites through the GPS and IoT sensors employed in most of the livestock management systems that currently exist. The proposed work is carried out on semi-intensive livestock farming. Communication bandwidth and device energy are consumed by the GPRS and GPS sensors. Other than the animals with genetic diversity in the same herd, all livestock categories are installed with the same set of sensors. The livestock movement is tracked by conventional tracking systems without specific geographical boundaries. However, when animals stray far away from the regions with major access points, tracking them becomes challenging. An enhanced livestock management system is proposed in this paper to address this issue. The livestock is provided with a geographical safe zone defined by the farmers based on their convenience. When the cattle stray beyond the boundary defined by the farmers, the farmers are notified with a message about the location of the cattle. Based on the genetic diversity of various animals, communication and navigation are controlled automatically. The overview of the proposed system is presented as a conceptual framework in Fig. 1. The geographical safe zone of the livestock is represented by the elliptical boundary, where ultrasonic sensors are installed to identify cattle movements. The presence of livestock is discovered and the distance is estimated by the propagation of ultrasonic sound waves. The communication and navigation system is turned on when
Fig. 1 Block diagram of the proposed work
the predefined safe distance threshold is crossed by the cattle. Satellite navigation is used for locating the animals through navigation sensors that are attached to all the animals in the herd. The distance of each animal is estimated relative to the geographical safe zone boundary, and the farmer receives an alert when the distance of the animal nears the predefined threshold value. When the animals are away from the safe zone for a specific time period, the exact location of the animal is provided to the farmer on a continuous basis. When the animal is in a static state, communication and navigation are suspended by the motion sensor for energy optimization. This also enables efficient utilization of the communication bandwidth. Auditory feedback is not included in the system, as it may alert intruders in the case of animal theft. The ultrasonic sensors are installed on the data transmission poles. In future, a camera can be installed to record activity when movement across the safe zone boundary is detected. When the cattle cross the common access point, tracing is performed through physical exertion in the conventional systems for tracking livestock. Unlike these approaches, the safe zone is defined by the farmer, enabling prompt, secure, and convenient livestock management in the proposed system. Significant efficiency in the utilization of communication channels and in power consumption is achieved by switching the system into inactive mode based on the information from the motion sensor, especially in the case of livestock that does not move frequently. The location threshold is unlikely to be exceeded, and the livestock will remain within the safe zone with maximum probability, when a substantial safe zone is maintained by the farmers. The power supply for the sensing devices is equipped with a solar panel to enhance the sustainability of the proposed system.
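The distance-based alerting described above can be sketched as follows; this is an illustrative sketch rather than the authors' implementation, and the safe-zone centre, radius, and warning margin are hypothetical values:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS fixes."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def check_animal(fix, zone_centre, zone_radius_m, alert_margin_m=50.0):
    """Return 'ok', 'warn' (nearing the boundary) or 'alert' (outside the safe zone)."""
    d = haversine_m(fix[0], fix[1], zone_centre[0], zone_centre[1])
    if d > zone_radius_m:
        return "alert"
    if d > zone_radius_m - alert_margin_m:
        return "warn"
    return "ok"

centre = (10.42, 76.97)                               # hypothetical safe-zone centre
print(check_animal((10.4201, 76.9701), centre, 500))  # ok: well inside the zone
print(check_animal((10.4260, 76.9700), centre, 500))  # alert: outside the zone
```

In a deployed system, the motion sensor would gate how often `check_animal` is evaluated, so a stationary animal costs neither bandwidth nor energy.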
Algorithm: Hybrid LSTM Algorithm for Battery Life Prediction
1. Start
2. Initialization:
   Initialize 'X' animals
   Initialize 'H' herd
   Initialize 'S' safezone
3. Define: H = {H1, H2, H3, H4, H5, …, Hi}

(6)
where Ti is the threshold value of the Nth window.
P. Yugander et al.
The average of the weights (η_x) of the image can be determined by using the mean of the window and the final assigned weights of the image. It is represented by Eq. (7):

$$\eta_x = i_x + \frac{1+\max(\varepsilon_x)}{2+\max(\varepsilon_x)} \cdot \frac{1}{P_M-1}\sum_{k\in P_M} i_k \quad (7)$$
where i_x is the xth pixel of the Nth window and i_k is the LVC of the ith pixel. To enhance the segmentation speed and accuracy, the Euclidean distance in the objective function is replaced with a regularization parameter, as given in Eq. (8):

$$\|\varepsilon(i_x) - \varepsilon(h_y)\|^2 = A(i_x, i_x) + A(h_y, h_y) - 2A(i_x, h_y) \quad (8)$$
where A is the aforementioned kernel, represented by Eq. (9). The kernel width δ can be determined using the area of the Nth window (c_x) and the average area of the N windows, as represented by Eq. (10):

$$A(i_x, h_y) = \exp\left(-\frac{\|i_x - h_y\|^2}{2\delta^2}\right) \quad (9)$$

$$\delta = \left[\sum_{x=1}^{P}\frac{(c_x - \bar{c})^2}{P-1}\right]^{1/2} \quad (10)$$
Using Eqs. (9) and (10), the ARKFCM objective function can be minimized; it is represented by Eq. (11):

$$J_{ARKFCM} = 2\left[\sum_{x=1}^{P}\sum_{y=1}^{d} g_{xy}^{m}\,(1 - A(i_x, h_y)) + \varepsilon_x \sum_{x=1}^{P}\sum_{y=1}^{d} g_{xy}^{m}\,(1 - A(i_x, h_y))\right] \quad (11)$$

For the above objective function, the membership function (g_{xy}) and the clustering center (h_y) are mathematically represented using Eqs. (12) and (13):

$$g_{xy} = \frac{\left[(1 - A(i_x, h_y)) + \varepsilon_x (1 - A(i_x, h_y))\right]^{-1/(m-1)}}{\sum_{z=1}^{d}\left[(1 - A(i_x, h_z)) + \varepsilon_x (1 - A(i_x, h_z))\right]^{-1/(m-1)}} \quad (12)$$

$$h_y = \frac{\sum_{x=1}^{P} g_{xy}^{m}\left(A(i_x, h_y) + \varepsilon_x A(i_x, h_y)\right) i_x}{\sum_{x=1}^{P} g_{xy}^{m}\left(A(i_x, h_y) + \varepsilon_x A(i_x, h_y)\right)} \quad (13)$$

where m is the fuzziness degree.
Noisy Brain MR Image Segmentation Using Modified Adaptively …
The required components from the MR image can be extracted using Eq. (11). If J_ARKFCM > ε_x, the pixel is treated as a WM pixel, and the remaining pixels are treated as GM pixels. The main steps involved in ARKFCM are as follows:

Step 1: Initialize the threshold parameter T_i.
Step 2: Determine the average weights η_x.
Step 3: Calculate the kernel width δ.
Step 4: Determine the membership function g_xy and the cluster center h_y.
Step 5: If λ_x > ε_x or η_x < δ, then stop; otherwise update J_ARKFCM.
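As a numerical illustration of Eqs. (9) and (10), a small NumPy sketch (the window areas are synthetic, and the variable names mirror the text):

```python
import numpy as np

def kernel_width(window_areas):
    """Eq. (10): delta from the spread of the window areas around their mean."""
    c = np.asarray(window_areas, dtype=float)
    return float(np.sqrt(np.sum((c - c.mean()) ** 2) / (len(c) - 1)))

def gaussian_kernel(i_x, h_y, delta):
    """Eq. (9): A(i_x, h_y) = exp(-|i_x - h_y|^2 / (2 delta^2))."""
    return float(np.exp(-np.abs(i_x - h_y) ** 2 / (2 * delta ** 2)))

delta = kernel_width([60.0, 64.0, 68.0, 72.0])     # synthetic window areas
print(round(delta, 3))                             # adaptive kernel width
print(round(gaussian_kernel(0.8, 0.5, delta), 4))  # pixel/centroid similarity
```

A similar pixel and cluster centre (kernel value near 1) keeps the membership in Eq. (12) high, while dissimilar pairs are penalized in the objective of Eq. (11).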
3 Adaptive Weighted Mean Filter

The MR images are usually affected by SPN due to various reasons, such as faulty memory locations, malfunctioning camera sensors, and coil resistance [14]. Doctors are not able to identify diseases from noisy images, so image denoising is the primary and essential step in MR image processing. Numerous algorithms have been proposed to denoise images, but they fail to process images affected by high-level SPN. This paper proposes an AWMF algorithm to denoise the high-level noise-affected images [15]. The basic principle of the AWMF algorithm is to reduce the detection errors and replace the noisy pixels with better values than the median. For this purpose, we introduce a weighted window W, represented by Eq. (14):

$$Q_{x,y}^{mean}(w) = \begin{cases} \dfrac{\sum_{(s,t)\in Q_{x,y}(w)} b_{s,t}\, f_{s,t}}{\sum_{(s,t)\in Q_{x,y}(w)} b_{s,t}}, & \sum_{(s,t)\in Q_{x,y}(w)} b_{s,t} \neq 0 \\ -1, & \text{otherwise} \end{cases} \quad (14)$$
where (s, t) are the pixel coordinates, Q_{x,y}(w) is the adaptive window, b_{s,t} is the weighted window, and f_{s,t} is the sub-image. The weight b_{s,t} is given by Eq. (15):

$$b_{s,t} = \begin{cases} 1, & Q_{x,y}^{min}(w) < f_{s,t} < Q_{x,y}^{max}(w) \\ 0, & \text{otherwise} \end{cases} \quad (15)$$

where Q_{x,y}^{min}(w) is the minimum pixel value in the window and Q_{x,y}^{max}(w) is the maximum pixel value in the window. In this method, the adaptive window is used to perform filtering. The adaptive window is initialized with size 3 × 3. After performing the filter operation, the minimum and maximum values are found, and then the window size is increased. This process continues until the minimum and maximum values are equal in two successive windows.
606
P. Yugander et al.
If the above condition is satisfied, the center pixel is considered noise-free. If it fails to satisfy the above condition, the center pixel is considered a noise pixel and is replaced with the weighted mean of the window. The basic principle of AWMF is the adaptive window size: in normal spatial-domain filters the window size is fixed, whereas in the proposed AWMF method the window size is variable and is varied according to the minimum and maximum pixels in the image. High-level SPN is completely eliminated by employing this adaptive window size concept. The proposed algorithm is summarized as follows:

Step 1: Initialize w = 1, h = 1, w_max = 40.
Step 2: Calculate Q_{x,y}^{min}(w), Q_{x,y}^{max}(w), Q_{x,y}^{mean}(w), and the corresponding values for window w + h.
Step 3: If Q_{x,y}^{min}(w) = Q_{x,y}^{min}(w + h), Q_{x,y}^{max}(w) = Q_{x,y}^{max}(w + h), and Q_{x,y}^{mean}(w) ≠ −1, go to Step 4; otherwise set w = w + h.
Step 4: If Q_{x,y}^{min}(w) < f_{x,y} < Q_{x,y}^{max}(w), f_{x,y} is noise-free; otherwise it is replaced with Q_{x,y}^{mean}(w).
Step 5: If w ≤ w_max, go to Step 2.
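A minimal NumPy sketch of the per-pixel adaptive window logic described above (the image is synthetic, border handling is simplified, and this is an illustration rather than the authors' implementation):

```python
import numpy as np

def awmf_pixel(img, x, y, w_max=40):
    """Grow the window until the min/max of two successive windows agree and the
    weighted mean of Eq. (14) is defined; keep the pixel if it lies strictly
    between min and max, otherwise replace it with the weighted mean."""
    w = 1
    while w <= w_max:
        win = img[max(0, x - w):x + w + 1, max(0, y - w):y + w + 1]
        big = img[max(0, x - w - 1):x + w + 2, max(0, y - w - 1):y + w + 2]
        if win.min() == big.min() and win.max() == big.max():
            lo, hi = float(win.min()), float(win.max())
            inliers = win[(win > lo) & (win < hi)]   # pixels with weight b = 1
            if inliers.size:                         # Q_mean defined (not -1)
                if lo < img[x, y] < hi:
                    return float(img[x, y])          # noise-free pixel, keep
                return float(inliers.mean())         # replace the noisy pixel
        w += 1
    return float(img[x, y])

img = np.tile(np.arange(7.0), (7, 1)) + 100.0        # smooth ramp background
img[3, 3] = 255.0                                    # one salt-noise pixel
print(awmf_pixel(img, 3, 3))                         # restored towards ~103.5
print(awmf_pixel(img, 1, 1))                         # 101.0, left unchanged
```

Note that the extreme values (here 100 and 255) are excluded from the weighted mean, which is why the filter can remove even high-level SPN that a plain median filter would pass through.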
4 Results and Discussion

Extraction of GM, WM, and CSF from noisy MR images is a challenging task in biomedical image processing. In this paper, we have proposed a modified ARKFCM algorithm for segmenting noisy MR images. The proposed modified ARKFCM algorithm was tested using an Intel Core i9 processor, 8 GB RAM, and the Windows 10 operating system, with the programs run in MATLAB 2022a. The BrainWeb database is used to test the proposed algorithm. The modified ARKFCM method has been applied based on statistical feature (SF)-based values and gray values (GV). In the SF method, the similarity and diversity of sample sets were used to extract the required components. Initially, the MR image is divided into 64 × 64 sub-images. Using Eq. 2, these are converted into similarity and diversity sets. The type of component is decided by the ε_x value. In SF-based extraction, the following metrics were considered: Jaccard similarity (JS) and the Dice coefficient (DC). In GV-based extraction, segmentation accuracy, over-segmentation, under-segmentation, and incorrect segmentation were considered as metrics. Among the above-mentioned parameters, JS and DC produced good results. The average accuracy using JS and DC is shown in Table 1. The proposed algorithm was tested using the following metrics, which are depicted in Eqs. (16)–(23). Mathematically, JS can be represented as follows:

$$JS(J_1, J_2) = \frac{|J_1 \cap J_2|}{|J_1 \cup J_2|} \quad (16)$$
Table 1 The average quantitative accuracy (JS and DC) of the noisy MR image segmentations using the proposed method

Noise (%) | White matter (JS / DC) | Gray matter (JS / DC) | Cerebrospinal fluid (JS / DC)
10  | 93.46 / 89.12 | 87.56 / 79.92 | 87.64 / 71.23
20  | 91.23 / 89.86 | 89.99 / 79.83 | 85.45 / 74.53
30  | 81.21 / 80.14 | 86.75 / 72.34 | 81.21 / 70.01
40  | 91.23 / 73.25 | 89.74 / 76.77 | 80.01 / 68.98
50  | 89.23 / 79.89 | 76.57 / 78.32 | 76.54 / 69.98
60  | 83.45 / 79.45 | 65.48 / 71.23 | 73.45 / 65.67
70  | 89.89 / 69.87 | 70.12 / 67.23 | 78.99 / 65.43
80  | 71.23 / 70.12 | 73.45 / 69.98 | 69.87 / 67.89
90  | 78.93 / 65.43 | 78.93 / 67.89 | 69.87 / 61.23
100 | 83.47 / 67.56 | 72.31 / 62.34 | 65.51 / 60.03
Mathematically, DC can be represented as:

$$DC(J_1, J_2) = \frac{2|J_1 \cap J_2|}{|J_1| + |J_2|} \quad (17)$$

where J_1 and J_2 are two sets.

$$\text{Segmentation accuracy} = \frac{\text{correctly identified pixels}}{\text{total number of pixels}} \times 100 \quad (18)$$

$$\text{Under segmentation} = \frac{M_{fp}}{M_n} \quad (19)$$

$$\text{Over segmentation} = \frac{M_{fn}}{M_p} \quad (20)$$

$$\text{Incorrect segmentation} = \frac{M_{fp} + M_{fn}}{M} \quad (21)$$
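Equations (16) and (17) can be checked on small binary masks; a brief NumPy sketch with synthetic masks:

```python
import numpy as np

def jaccard(a, b):
    """Eq. (16): |A intersect B| / |A union B| for boolean masks."""
    a, b = a.astype(bool), b.astype(bool)
    return float((a & b).sum() / (a | b).sum())

def dice(a, b):
    """Eq. (17): 2 |A intersect B| / (|A| + |B|)."""
    a, b = a.astype(bool), b.astype(bool)
    return float(2 * (a & b).sum() / (a.sum() + b.sum()))

seg = np.zeros((4, 4), dtype=bool)
seg[1:3, 1:3] = True        # predicted tissue mask (4 pixels)
ref = np.zeros((4, 4), dtype=bool)
ref[1:3, 1:4] = True        # ground-truth mask (6 pixels)
print(round(jaccard(seg, ref), 3))  # 0.667
print(round(dice(seg, ref), 3))     # 0.8
```

Both metrics reach 1 for a perfect overlap; Dice weighs the intersection more heavily, which is why its values in Table 1 differ from the Jaccard column.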
The average segmentation accuracy of the noisy MR image segmentation is shown in Table 2, where M_fp is the number of pixels segmented but not belonging to the same cluster, M_fn is the number of pixels not segmented properly but belonging to the same cluster, M_n is the number of pixels from the same cluster, M_p is the number of pixels not belonging to a cluster, and M is the total number of pixels in the image.

Mean square error (MSE):
Table 2 The average segmentation accuracy of the noisy MR image segmentation using the proposed method

Noise (%) | Under segmentation (%) | Over segmentation (%) | Incorrect segmentation (%)
10  | 65.34 | 96.93 | 42.45
20  | 66.23 | 96.21 | 39.34
30  | 61.56 | 94.32 | 39.21
40  | 58.43 | 92.21 | 38.12
50  | 54.32 | 90.01 | 32.21
60  | 53.22 | 87.27 | 30.01
70  | 43.23 | 83.22 | 29.81
80  | 34.23 | 82.43 | 19.82
90  | 31.11 | 81.23 | 12.21
100 | 26.32 | 80.21 | 10.82
MSE is calculated as the average of the squared difference between the gray values of the input image and the output image pixels:

$$MSE = \frac{1}{M \times N}\sum_{r=1}^{y}\sum_{s=1}^{z}\left[T(r,s) - \tilde{T}(r,s)\right]^2 \quad (22)$$
where \tilde{T}(r,s) is the output image and T(r,s) is the input image.

Peak signal-to-noise ratio (PSNR): PSNR is used to measure the quality of the MR image after reconstruction:

$$PSNR = 10 \log_{10}\frac{N^2}{MSE} \quad (23)$$
where N is the number of gray levels. The average PSNR of the noisy MR image segmentation is shown in Table 3.

Table 3 The average PSNR of the noisy MR image segmentation using the proposed method

Noise (%) | 10    | 20    | 30    | 40    | 50    | 60    | 70    | 80    | 90    | 100
PSNR (dB) | 36.21 | 34.51 | 33.01 | 32.21 | 30.02 | 29.32 | 28.34 | 26.03 | 21.22 | 20.01
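Equations (22) and (23) can be sketched as follows; the images are synthetic, and note that the text defines the peak as N^2 with N gray levels (kept here), although (N − 1)^2 is also a common convention:

```python
import numpy as np

def mse(ref, out):
    """Eq. (22): mean squared error between input and output images."""
    ref = np.asarray(ref, dtype=float)
    out = np.asarray(out, dtype=float)
    return float(((ref - out) ** 2).mean())

def psnr(ref, out, levels=256):
    """Eq. (23): PSNR = 10 log10(N^2 / MSE), N being the number of gray levels."""
    return float(10 * np.log10(levels ** 2 / mse(ref, out)))

ref_img = np.full((8, 8), 120.0)   # clean synthetic image
noisy = ref_img.copy()
noisy[0, 0] = 255.0                # a single salt pixel
print(round(mse(ref_img, noisy), 3))
print(round(psnr(ref_img, noisy), 2))
```

As the noise percentage grows, MSE grows and PSNR falls, which matches the monotone decrease seen in Table 3.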
A total of 50 structural MR images from the BrainWeb database were used to test the proposed algorithm. Among the fifty images, one sample segmented image is shown in Fig. 1. The input MR image with 60% SPN is shown in Fig. 1a. Initially, the ARKFCM algorithm was applied to the noisy MR images; the extracted WM, GM, and CSF are shown in Fig. 1b, c, and d, respectively. It is clearly observed that the ARKFCM algorithm fails to extract WM, GM, and CSF from the noisy MR images. The proposed AWMF algorithm was then applied to the input noisy image, and the denoised image is shown in Fig. 1e. The denoised image is passed to ARKFCM to extract the required components. Using the proposed algorithm, WM, GM, and CSF were extracted correctly; the extracted components are shown in Fig. 1f–h.
5 Conclusion

This paper proposes a novel algorithm for noisy MR image segmentation. Numerous algorithms have been proposed for noisy MR image segmentation, but they only work in the presence of low-level SPN. In this paper, we have proposed a modified ARKFCM which extracts WM, GM, and CSF under high-level noise conditions. Our proposed algorithm was tested using the BrainWeb database, was successfully applied to structural MR images, and provided good performance in terms of the JS and DC metrics. The obtained average accuracy based on Jaccard similarity was 85.33%, 79.09%, and 76.85% for WM, GM, and CSF, respectively. Also, the obtained average accuracy based on the Dice coefficient was 76.47%, 72.59%, and 67.5% for WM, GM, and CSF, respectively. This algorithm can be extended to functional MR images, 3D MR images, PET images, and CT scan images. It can be
Fig. 1 a Noisy MR image, b CSF region, c GM region, d WM Region, e Denoised MR image, f CSF region after AWMF, g GM region after AWMF, and h WM Region after AWMF
applied to identify abnormalities in MR images associated with different types of neurodevelopmental disorders.
References

1. Malathi M, Sinthia P (2018) MRI brain tumour segmentation using hybrid clustering and classification by back propagation algorithm. Asian Pac J Cancer Prev 19(11):3257–3263
2. Sharma N, Aggarwal LM (2010) Automated medical image segmentation techniques. J Med Phys 35(1):3–14
3. Zhuang Y, Liu H, Song E, Ma G, Xu X, Hung CC (2022) APRNet: a 3D anisotropic pyramidal reversible network with multi-modal cross-dimension attention for brain tissue segmentation in MR images. IEEE J Biomed Health Inform 26(2):749–761
4. Hosseini M, Nazem-Zadeh MR, Mahmoudi F, Ying H, Soltanian-Zadeh H (2014) Support vector machine with nonlinear-kernel optimization for lateralization of epileptogenic hippocampus in MR images. In: 36th annual international conference of the IEEE engineering in medicine and biology society. IEEE Press, Chicago, USA, pp 1047–1050
5. Clcek C, Akan A (2018) Gray and white matter segmentation method in MRI images for ADHD detection. In: 2018 electric electronics, computer science, biomedical engineerings' meeting. IEEE Press, Istanbul, Turkey, pp 1–4
6. Yugander P, Jagannath M (2021) Structural neuroimaging findings in autism spectrum disorder: a systematic review. Res J Pharm Technol 14(4):2341–2347
7. Karuppusamy P (2020) Hybrid manta ray foraging optimization for novel brain tumor detection. J Soft Comput Paradig 2(3):175–185
8. Vivekanandam B (2021) Automated multimodal fusion technique for the classification of human brain on Alzheimer's disorder. J Electr Eng Autom 3(3):214–229
9. Feng Y, Chen W (2004) Brain MR image segmentation using fuzzy clustering with spatial constraints based on Markov random field theory. In: Yang GZ, Jiang TZ (eds) Medical imaging and augmented reality (MIAR 2004). LNCS, vol 3150. Springer, Berlin, pp 188–195
10. Cherfa I, Mokraoui A, Mekhmoukh A, Mokrani K (2020) Adaptively regularized kernel-based fuzzy C-means clustering algorithm using particle swarm optimization for medical image segmentation. In: 2020 signal processing: algorithms, architectures, arrangements, and applications. IEEE Press, Poznan, Poland, pp 24–29
11. Wan C, Ye M, Yao C, Wu C (2017) Brain MR image segmentation based on Gaussian filtering and improved FCM clustering algorithm. In: 10th international congress on image and signal processing, biomedical engineering and informatics. IEEE Press, Shanghai, China, pp 1–5
12. Abdel-Dayem AR, Hamou AK, El-Sakka MR (2004) Novel adaptive filtering for salt-and-pepper noise removal from binary document images. In: Campilho A, Kamel M (eds) Image analysis and recognition (ICIAR 2004). LNCS, vol 3212. Springer, Heidelberg, pp 191–199
13. Elazab A, Wang C, Jia F, Wu J, Li G, Hu Q (2015) Segmentation of brain tissues from magnetic resonance images using adaptively regularized kernel-based fuzzy C-means clustering. Comput Math Methods Med 19(2):1–12
14. Jeon SW, Kwack KS, Yun JS, Gho SM, Park S (2020) Salt-and-pepper noise sign on fat-fraction maps by chemical-shift-encoded MRI: a useful sign to differentiate bone islands from osteoblastic metastases - a preliminary study. Am Roentgen Ray Soc 214(5):1139–1145
15. Mohan J, Guo Y, Krishnaveni V, Jeganathan K (2012) MRI denoising based on neutrosophic Wiener filtering. In: IEEE international conference on imaging systems and techniques proceedings. IEEE Press, Manchester, UK, pp 327–331
Deep Learning with Metadata Augmentation for Classification of Diabetic Retinopathy Level Maksym Shulha, Yuri Gordienko, and Sergii Stirenko
Abstract Diabetic retinopathy (DR) is one of the most important and embarrassing problems in the medical, psychological, and social aspects of the working-age population in the world. The DR severity classification problem for a single-modality (image input) model and a multi-modality (image and text inputs) model is considered on the basis of the RetinaMNIST dataset. The influence of additional data, such as subjective "patient" opinions or "expert" opinions about the patient's health state (which provide "data leakage" on some classes), can be helpful in some practical situations. These opinions were simulated by additional (augmented) metadata from simulated questionnaires. As a result, the following variants of input values and the corresponding models were prepared: a single-modality model (SM) with input images only, and multi-modality models with input images and opinion text, namely a multi-modality model with patient opinion (MP), a multi-modality model with expert opinion (ME), and a multi-modality model with patient and expert opinions (MPE). All these multi-modality models (MP, ME, MPE) allowed us to reach various statistically significant improvements of classification performance by AUC value for all classes in the range from 4% to 27%, which lies well beyond the standard deviation of 2–3% measured by cross-validation and can therefore be considered significant. In general, this approach based on metadata augmentation, namely the usage of additional modalities with "data leakage" on the extreme classes, for example, the lowest (Class 0) and highest (Class 4) DR severity, and their combinations, could be a useful strategy for better classification of some hard-to-classify DR severities, such as Classes 1–3 here, and in a more general context.
Keywords Multi-class classification · Neural networks · Deep learning · Metadata augmentation · Multimodal model · Retina · Diabetic retinopathy Supported by “Knowledge At the Tip of Your fingers: Clinical Knowledge for Humanity” (KATY) project funded from the European Union’s Horizon 2020 research and innovation program under grant agreement No. 101017453. M. Shulha (B) · Y. Gordienko · S. Stirenko National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Kyiv, Ukraine e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_46
M. Shulha et al.
1 Introduction Diabetic retinopathy (DR) is a frequent consequence of diabetes and a cause of visual impairment that affects the adult population worldwide [7, 18, 25]. Periodic eye screening is necessary for early diagnosis and timely treatment of DR to prevent blindness. This can be accomplished by the development and implementation of effective computer-aided screening programs. The medical community is actively using the potential of artificial intelligence (AI) methods in medicine [4, 6, 10]. In recent years, deep neural networks (DNNs) have demonstrated their performance and applicability for Computer-Aided Detection (CADe) and Computer-Aided Diagnosis (CADx). AI-related methods can automate and significantly accelerate screening programs by automatically processing medical data without the involvement of medical personnel at the screening stage, making screening available on a regular basis worldwide. In this work, some approaches using the methods of computer vision and deep neural networks (DNNs) are proposed to provide modern advanced AI-based medical services targeted at DR classification. In this paper, we are interested in determining the severity level of DR from dual-view retinal fundus images by deep learning (DL) approaches to help identify possible negative variants of the disease. Section 2 contains a description of the state of the art, Sect. 3 describes the dataset, models, experiments, and the whole workflow, Sect. 4 gives the results obtained during the experiments, and Sect. 5 discusses and summarizes the results.
2 Background and Related Works According to the World Health Organization [28], more than 11 million people suffer from vision impairment caused by glaucoma, trachoma, and DR, where DR causes 2.6% of blindness cases at the world scale [4, 28]. Several DL methods based on DNNs were recently applied to medical imaging as a whole [1, 6] and to diabetic retinopathy in particular [11, 15, 23]. Several recent reviews contain thorough studies of such attempts at diabetic retinopathy classification [2–4]. The following promising approaches should be emphasized. For example, a data-driven DL algorithm was developed as a novel diagnostic tool for automated DR detection that processed color fundus images and classified them as healthy (no DR) or having DR, identifying relevant cases for medical referral [11]. Several computer vision and DL models were used and compared for quantifying features such as blood vessels, fluid drip, exudates, hemorrhages, and microaneurysms into different classes [9]. A multistage approach to transfer learning was proposed that can be used as a screening method for early detection of DR with sensitivity and specificity of 0.99 [30]. Two versions of a DL system were created to predict the development of DR in patients with diabetes, where the input for the two versions was either a set of three-field or one-field colour fundus photographs [5].
Recently, a large retinal image dataset, DeepDR (Deep Diabetic Retinopathy), was proposed to the scientific community in the framework of the IEEE DeepDR Diabetic Retinopathy Image Dataset (DeepDRiD) competition [17] with 5 classes: (a) no apparent proliferative—Class 0, (b) mild proliferative—Class 1, (c) moderate proliferative—Class 2, (d) severe proliferative—Class 3, and (e) proliferative diabetic retinopathy—Class 4 (Fig. 1). It was targeted to further promote early diagnosis precision and robustness in practice on the basis of dual-view fundus images from the same eyes, e.g., with the optic disc as the center and with the fovea as the center, to classify and grade DR lesions. The expected results should outperform the state-of-the-art models built with single-view fundus images. Also, fundus images of various quality were included in the DeepDR dataset to reflect the real scenario in practice. This initiative was supported by many other similar competition-like activities to foster the creativity and popularity of DL-related approaches in healthcare and for DR classification in particular. Recently, many other professional and educational datasets with medical data appeared that were actively used to develop and test new DL methods. For example, MedMNIST v2, a large-scale MNIST-like dataset collection of standardized biomedical images, including 12 datasets of two-dimensional (2D) images and 6 datasets of three-dimensional (3D) images, was recently proposed for research and educational purposes (Fig. 2) [31, 32]. The images have the corresponding classification labels, so that no background knowledge is required from users. MedMNIST v2 is designed to perform classification on lightweight 2D and 3D images with various dataset scales (from 10^2 to 10^5) and different tasks (binary/multi-class, ordinal regression, and multi-label).
The resulting dataset, consisting of 7 × 10^5 2D images and 10^4 3D images in total, could support numerous research and educational activities in biomedical image analysis, computer vision, and machine learning (ML). Several baseline methods were benchmarked on MedMNIST v2, including 2D and 3D DNNs, as well as open-source and commercial AutoML tools.
3 Methodology 3.1 Dataset For the purpose of this work, the RetinaMNIST dataset (a part of MedMNIST) is based on the DeepDRiD challenge, which provides a dataset of 1600 retina fundus images [17]. The images are labeled by a 5-level grading of DR severity, where the medical details of the diagnostics are given elsewhere [17]. The source training set was split with a ratio of 9:1 into training and validation sets, and the source validation set was used as a test set. This split ratio was used to allow comparison of the current results with the previous ones [31, 32].
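The 9:1 split of the source training set can be reproduced with a standard stratified split; a minimal sketch using scikit-learn, where the label array is a random placeholder standing in for the 1080 RetinaMNIST training labels:

```python
# Sketch: reproduce the 9:1 train/validation split of the 1080-image
# training part. The labels here are random placeholders, not the real
# RetinaMNIST annotations.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
labels = rng.integers(0, 5, size=1080)  # 5 DR severity classes (0..4)
indices = np.arange(len(labels))

train_idx, val_idx = train_test_split(
    indices, test_size=0.1, stratify=labels, random_state=42)
# 1080 images -> 972 for training, 108 for validation
```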
Fig. 1 Examples of images for the multi-class classification problem of the IEEE DeepDR Diabetic Retinopathy Image Dataset (DeepDRiD) competition [17] with 5 classes: a no apparent proliferative—Class 0, b mild proliferative—Class 1, c moderate proliferative—Class 2, d severe proliferative—Class 3, and e proliferative diabetic retinopathy—Class 4. The image sizes are 1736 × 1824 pixels
Fig. 2 Examples of images for the multi-class (5 classes) classification problem in MedMNIST dataset with image sizes of 28 × 28
The dataset consists of 1600 pictures of the retina fundus, which are split into 5 types with regard to the observed level of DR [17]: no apparent DR—Class 0 (Fig. 1a), mild non-proliferative DR—Class 1 (Fig. 1b), moderate non-proliferative DR—Class 2 (Fig. 1c), severe non-proliferative DR—Class 3 (Fig. 1d), and proliferative DR—Class 4 (Fig. 1e). The source images with the large size of 3 × 1736 × 1824 were center-cropped and resized to 3 × 28 × 28 (Fig. 2). All these images were then pre-processed from the small size of 3 × 28 × 28 to the slightly larger size of 3 × 32 × 32 because of the requirements of the standard NN architectures used (see their description below). Finally, the RetinaMNIST dataset (as a part of MedMNIST) contains train (1080 images), validation (120 images), and test (400 images) parts. Exploratory data analysis (EDA) was performed to understand the data representation by classes. The distribution of images by classes in counts (Fig. 3a) and proportions (Fig. 3b) is shown below, where a pronounced bias is evident. To counteract such an unbalanced configuration, class weights should be applied that are adequate to the actual representation of images by classes. It should be noted that the same distribution of images by classes was preserved for the train, validation, and test subsets of the dataset (Fig. 3b).
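Such class weights can be computed inversely proportional to class frequency (the "balanced" heuristic); a sketch with hypothetical per-class counts, since the exact RetinaMNIST counts are read off Fig. 3:

```python
# Sketch: class weights inversely proportional to class frequency
# (sklearn's "balanced" formula). The counts below are illustrative
# placeholders, not the exact RetinaMNIST class distribution.
import numpy as np

class_counts = np.array([540, 120, 250, 100, 70])  # hypothetical counts per class
n_samples, n_classes = class_counts.sum(), len(class_counts)

# weight_c = n_samples / (n_classes * count_c)
class_weights = n_samples / (n_classes * class_counts)
weight_dict = {c: w for c, w in enumerate(class_weights)}
```

With such weights, each class contributes equally to the loss regardless of its representation in the training set.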
Fig. 3 Distribution of images by classes in: a counts, b proportions
To visualize the similarity and dissimilarity of retina images for different classes, the well-known t-distributed stochastic neighbor embedding (t-SNE) technique was applied [22, 26]. It allowed us to embed the high-dimensional retina image data into a 3D space, which can then be visualized in a scatter plot (Fig. 4). This t-SNE map of retina images in 3D space allows us to visually understand the very close similarity of objects from different classes that can hardly be distinguished by simple approaches. It should be emphasized that despite the quite low visual quality of the simplified RetinaMNIST dataset (32 × 32) (Fig. 2) in comparison to the original DeepDRiD dataset (1736 × 1824) (Fig. 1), some classes like Class 0 (Fig. 4, red color) and Class 4 (Fig. 4, dark blue color) have visually distinguishable location regions. At the same time, other classes (like Class 1, Class 2, Class 3) are hardly distinguishable
Fig. 4 Visualisation of the t-SNE model for the retina images by three-dimensional (3D) points, where similar retina images are modeled by nearby points and dissimilar retina images are mapped to distant points: a XY-projection, b XZ-projection, c YZ-projection, d 3D representation. The colors correspond to 5 classes of retina types (see description in the text)
and the more sophisticated DL-based approaches should be used to resolve the DR severity classification task. The influence of additional data, like a subjective patient opinion or expert opinions about the patient's health state, can be helpful in some practical situations. These opinions were simulated by additional (augmented) metadata from simulated questionnaires to get: • a "patient" opinion as an answer to the question "do you feel healthy?" with 2 possible answers: "healthy" (for Class 0) or "ill" (for classes from 1 to 4), where any patient discomfort is roughly classified as an "ill" state, • an "expert" opinion as an answer to the question "do you see a severe illness state?" with 2 possible answers: "ill" (for classes from 0 to 3) or "severe" (for Class 4), where any non-severe proliferative state is roughly classified as an "ill" state.
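These two simulated questionnaires are deterministic functions of the class label, and can be sketched as:

```python
# Sketch: generate the simulated "patient" and "expert" opinion metadata
# from the DR severity class (0..4), following the two rules above.
def patient_opinion(dr_class: int) -> str:
    # "do you feel healthy?": "healthy" only for Class 0, otherwise "ill"
    return "healthy" if dr_class == 0 else "ill"

def expert_opinion(dr_class: int) -> str:
    # "do you see a severe illness state?": "severe" only for Class 4
    return "severe" if dr_class == 4 else "ill"

# One metadata record per class: (class, patient text, expert text)
metadata = [(c, patient_opinion(c), expert_opinion(c)) for c in range(5)]
```

Note that each opinion alone pins down exactly one extreme class (Class 0 for the patient, Class 4 for the expert), which is precisely the "data leakage" on the extreme classes.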
Table 1 Input values for different models in Fig. 5

Model                   | Inputs                    | Class 0                 | Class 1             | Class 2             | Class 3             | Class 4
Single                  | input_image               | Fig. 1a                 | Fig. 1b             | Fig. 1c             | Fig. 1d             | Fig. 1e
MP (patient)            | input_image + input_text  | Fig. 1a + healthy       | Fig. 1b + ill       | Fig. 1c + ill       | Fig. 1d + ill       | Fig. 1e + ill
ME (expert)             | input_image + input_text  | Fig. 1a + ill           | Fig. 1b + ill       | Fig. 1c + ill       | Fig. 1d + ill       | Fig. 1e + severe
MPE (patient + expert)  | input_image + input_text  | Fig. 1a + healthy + ill | Fig. 1b + ill + ill | Fig. 1c + ill + ill | Fig. 1d + ill + ill | Fig. 1e + ill + severe
In general, the patient and expert opinions can be much more complicated and contain similar words even with different semantics. In this work they were extremely simplified to understand the influence of the additional modality on the metrics of the DL procedure. In the future, more complicated text opinions can be used to study such influence from more realistic points of view. As a result, the following variants of input values (column "Model") were prepared and shown in Table 1: • Single modality model (SM) - input consists of images only (input_image), • Multi modality model with Patient opinion (MP) - input consists of images and patient opinion text, namely, input_image + input_text like "healthy" or "ill" (line "MP"), • Multi modality model with Expert opinion (ME) - input consists of images and expert opinion text, namely, input_image + input_text like "ill" or "severe" (line "ME"), • Multi modality model with Patient and Expert opinions (MPE) - input consists of images and both patient and expert opinion texts, namely, input_image + input_text like ("healthy" or "ill") and ("ill" or "severe") (line "MPE").
3.2 Models The creators of MedMNIST benchmarked several standard deep learning methods and AutoML tools, including DNNs with the ResNet architecture [16] with early-stopping strategies on the validation set, and open-source and commercial AutoML tools. Here, several DNN architectures on the basis of ResNet50 (to be compared with the previous results [31, 32]) were used to work with input data of various modalities (Table 1), on the basis of a standard convolutional neural network (CNN) to process image inputs and a recurrent neural network (RNN) to process text inputs: • single modality: CNN with ResNet50 architecture (Fig. 5a), • multi modality: CNN with ResNet50 architecture and RNN with LSTM architecture (Fig. 5b).
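A multimodal model of this shape can be sketched with the Keras functional API; the layer sizes follow the tensor shapes shown in Fig. 5b, while the vocabulary size and activation functions are assumptions not fixed by the text:

```python
# Sketch (assumed details: vocabulary size, activations): a two-branch
# model fusing a ResNet50 image branch with an Embedding+LSTM text branch,
# following the tensor shapes of Fig. 5b.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_multimodal_model(num_classes=5, vocab_size=4):
    # Image branch: 32x32 RGB image -> ResNet50 -> flatten to 2048 features
    input_image = layers.Input(shape=(32, 32, 3), name="input_image")
    backbone = tf.keras.applications.ResNet50(
        weights=None, include_top=False, input_shape=(32, 32, 3))
    x_img = layers.Flatten()(backbone(input_image))

    # Text branch: one opinion token -> Embedding -> LSTM with 8 units
    input_text = layers.Input(shape=(1,), name="input_text")
    x_txt = layers.Embedding(vocab_size, 1)(input_text)
    x_txt = layers.LSTM(8)(x_txt)

    # Fusion: 2048 + 8 = 2056 features -> dense 512 -> 5-class softmax
    x = layers.Concatenate()([x_img, x_txt])
    x = layers.Dense(512, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return Model(inputs=[input_image, input_text], outputs=outputs)

model = build_multimodal_model()
```

Dropping the text branch and feeding the flattened ResNet50 features straight into the dense layers yields the single modality variant of Fig. 5a.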
Fig. 5 Visualisation of the models used: a single modality by CNN with ResNet50 architecture, b multi modality by CNN with ResNet50 architecture and RNN with LSTM architecture
To compare the multimodal approaches proposed here with the benchmarked results on the multi-class classification problem [31, 32], the popular standard DNN model with ResNet architecture, namely ResNet50, was used in this work. The standard architecture of the DNNs used was partially changed to accommodate the number of classes, namely: the last classification layer was removed, a dense layer with 512 nodes was added, and a classification dense layer with 5 nodes was added, which determines the final image category. The DNN models were trained with initial random weights for 100 epochs because, as it turned out, that was enough for saturation of validation metric growth in many cases. The optimization was performed by the Adam algorithm with the learning rate equal to 0.001, the exponential decay rate for the 1st moment equal to 0.9, the exponential decay rate for the 2nd moment equal to 0.99, and the epsilon constant equal to 10^−7 [19]. The batch size for the initial runs was set to 128 to allow comparison with the previous results [31, 32]. Accuracy, AUC, and loss were chosen as the main metrics.
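These optimizer settings correspond to the following configuration (a sketch assuming the tf.keras implementation of Adam):

```python
# Sketch: Adam optimizer configured with the hyperparameters listed above.
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.001,  # learning rate
    beta_1=0.9,           # exponential decay rate for the 1st moment
    beta_2=0.99,          # exponential decay rate for the 2nd moment
    epsilon=1e-7,         # numerical-stability constant
)
config = optimizer.get_config()
```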
3.3 Workflow The whole workflow was implemented in a cross-validation [24] regime and a further out-of-fold regime, where: • the cross-validation regime (CV) was performed on the basis of the training part (1080 images) of the original RetinaMNIST dataset, which was divided into six folds, where the best model after training on the CV-train part (900 images) was determined by AUC value after validation on the CV-validation part (180 images) for each fold; then metrics were measured for each fold on the test part of the original RetinaMNIST dataset (400 images), and their means and standard deviations were calculated to understand the statistical reliability of the data obtained, • the out-of-cross-validation regime (OoCV) was implemented by training on the whole train part (1080 images) of the original RetinaMNIST dataset, selecting the best model by AUC value after validating on the validation part (120 images) of the original RetinaMNIST dataset, and finally testing on the test part (400 images) of the original RetinaMNIST dataset.
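The fold layout of the CV regime can be sketched with scikit-learn's KFold (the shuffle seed is an assumption):

```python
# Sketch: six-fold split of the 1080-image training part,
# giving 900 CV-train / 180 CV-validation images per fold.
import numpy as np
from sklearn.model_selection import KFold

indices = np.arange(1080)
folds = list(KFold(n_splits=6, shuffle=True, random_state=42).split(indices))
fold_sizes = [(len(cv_train), len(cv_val)) for cv_train, cv_val in folds]
```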
4 Results 4.1 Single Modality Model Cross-Validation Study. The SM model, actually a CNN with ResNet50 architecture (Fig. 5a), was trained for 100 epochs in the sixfold CV regime. The histories of some training metrics are shown in Fig. 6, where the horizontal axis contains the number of epochs, and the vertical axis the accuracy (Fig. 6a) and loss (Fig. 6b) values. The histories demonstrate very quick learning with lower standard deviations of loss at the initial learning stage (up to 18 epochs). Then the standard deviations of loss become much higher at the later learning stage, with loss growth that can be considered as over-fitting (after 50 epochs). The confusion matrix was calculated for the mean and standard deviation of the absolute (Fig. 7a) and normalized (Fig. 7b) numbers of retina images obtained after the sixfold CV regime.
Fig. 6 History of accuracy (a) and loss (b) after sixfold cross-validation for single modality model
Also, the receiver operating characteristic (ROC) curves were constructed, and area under curve (AUC) values were calculated with the mean and standard deviation of AUC, as shown in Fig. 8a. Out-of-Cross-Validation Study. The SM model (CNN with ResNet50 architecture shown in Fig. 5a) was trained for 100 epochs in the OoCV regime. Again, the ROC curves were constructed, and AUC values were calculated with the mean and standard deviation of AUC, shown in Fig. 8b.
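The per-class and averaged AUC values can be computed from the predicted class probabilities in a one-vs-rest fashion; a sketch on toy data (400 random labels standing in for the test set):

```python
# Sketch: one-vs-rest ROC AUC per class plus macro/micro averages,
# computed on toy data (random labels/probabilities, not real predictions).
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

rng = np.random.default_rng(0)
y_true = rng.integers(0, 5, size=400)        # toy labels for 5 classes
y_prob = rng.random((400, 5))
y_prob /= y_prob.sum(axis=1, keepdims=True)  # rows sum to 1 like softmax output

y_bin = label_binarize(y_true, classes=[0, 1, 2, 3, 4])
per_class_auc = [roc_auc_score(y_bin[:, c], y_prob[:, c]) for c in range(5)]
macro_auc = roc_auc_score(y_bin, y_prob, average="macro")
micro_auc = roc_auc_score(y_bin, y_prob, average="micro")
```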
Fig. 7 Confusion matrix for absolute (a) and normalized (b) values after sixfold CV for SM model
4.2 Multi Modality with Patient Opinion The Multi modality with Patient opinion (MP) model, actually a CNN with ResNet50 and LSTM architectures shown in Fig. 5b, was trained for 100 epochs in the sixfold CV and OoCV procedures. The ROC curves were constructed, and AUC values were calculated with the mean and standard deviation of AUC, as shown in Fig. 9.
Fig. 8 ROC curves with mean and standard deviation values of AUC for SM model: a after sixfold CV, and b after OoCV
Fig. 9 ROC curves with mean and standard deviation values of AUC for MP model: a after sixfold CV, and b after OoCV
4.3 Multi Modality with Expert Opinion The Multi modality with Expert opinion (ME) model, actually a CNN with ResNet50 and LSTM architectures shown in Fig. 5b, was trained for 100 epochs in the sixfold CV and OoCV procedures. The ROC curves were constructed, and AUC values were calculated with the mean and standard deviation of AUC, as shown in Fig. 10.
4.4 Multi Modality with Patient and Expert Opinions The Multi modality with Patient and Expert opinions (MPE) model, actually a CNN with ResNet50 and LSTM architectures shown in Fig. 5b, was trained for 100 epochs
Fig. 10 ROC curves with mean and standard deviation values of AUC for ME model: a after sixfold CV, and b after OoCV
Fig. 11 ROC curves with mean and standard deviation values of AUC for MPE model: a after sixfold CV, and b after OoCV
in the sixfold CV and OoCV procedures. The ROC curves were constructed, and AUC values were calculated with the mean and standard deviation of AUC, as shown in Fig. 11.
5 Discussion and Conclusions As a result of this work, it was found that the SM models demonstrate the lowest values (emphasized by the italic font in Tables 2 and 3) of the AUC metric both for the CV (Table 2) and OoCV (Table 3) regimes. All multimodal models allowed us to get higher AUC values for each class. The highest AUC values are emphasized by the bold font in Tables 2 and 3. Empty cells in Tables 2 and 3 mean that the additional text input (by patient and/or expert) allows us to determine the corresponding classes directly, so they are useless for our discussion.

Table 2 AUC mean and standard deviation values after sixfold CV

Model   Class 0      Class 1      Class 2      Class 3      Class 4      Macro         Micro
Single  0.806 ±0.02  0.597 ±0.03  0.667 ±0.02  0.729 ±0.03  0.627 ±0.03  0.685 ±0.075  0.784 ±0.014
MP      –            0.802 ±0.03  0.788 ±0.02  0.797 ±0.01  0.723 ±0.03  0.819 ±0.089  0.914 ±0.009
ME      0.818 ±0.01  0.599 ±0.04  0.700 ±0.02  0.759 ±0.02  –            0.763 ±0.114  0.831 ±0.016
MPE     –            0.797 ±0.04  0.803 ±0.03  0.811 ±0.03  –            0.870 ±0.083  0.928 ±0.015

Table 3 AUC values after OoCV testing

Model   Class 0  Class 1  Class 2  Class 3  Class 4  Macro  Micro
Single  0.737    0.661    0.605    0.719    0.487    0.642  0.753
MP      –        0.727    0.767    0.815    0.760    0.810  0.874
ME      0.779    0.538    0.652    0.712    –        0.724  0.776
MPE     –        0.791    0.844    0.856    –        0.896  0.938
5.1 Cross-Validation Study As to the performance observed on the separate classes after CV, the MP model allowed us to reach the highest AUC value among all models for Class 1 (with an increase of AUC by 20.5% from 0.597 up to 0.802) and Class 4 (with an increase of AUC by 9.6% from 0.627 up to 0.723). These improvements are well beyond the limits of the standard deviation of 3% and can be estimated as significant ones. In a similar way, the ME model allowed us to reach the highest AUC value among all models for Class 0 (with an increase of AUC by 1.2% from 0.806 up to 0.818), but this increase is within the limits of the standard deviation of 1–2% and cannot be estimated as a significant one. The MPE model allowed us to reach the highest AUC value among all models for Class 2 (with an increase of AUC by 13.6% from 0.667 up to 0.803) and Class 3 (with an increase of AUC by 8.2% from 0.729 up to 0.811). These improvements are well beyond the limits of the standard deviation of 2–3% and can be estimated as significant ones.
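The quoted percentages are absolute AUC differences expressed in percentage points, as a quick check confirms:

```python
# The reported "increase of AUC by X%" values are absolute differences
# (in percentage points) between the best multimodal AUC and the SM AUC.
cv_improvements = {
    "Class 1 (MP)":  (0.597, 0.802),
    "Class 2 (MPE)": (0.667, 0.803),
    "Class 3 (MPE)": (0.729, 0.811),
    "Class 4 (MP)":  (0.627, 0.723),
}
gains = {k: round((best - sm) * 100, 1) for k, (sm, best) in cv_improvements.items()}
```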
5.2 Out-of-Cross-Validation Study As to the performance observed on the separate classes after OoCV, the MP model allowed us to reach the highest AUC value among all models for Class 4 (with an increase of AUC by 27.3% from 0.487 up to 0.760). This improvement is well beyond the limits of the standard deviation of 3% (measured by CV) and can be estimated as a significant one. The ME model allowed us to reach the highest AUC value among all models for Class 0 (with an increase of AUC by 4.2% from 0.737 up to 0.779), and this increase is beyond the limits of the standard deviation of 1–2% and can be estimated as a significant one. The MPE model allowed us to reach the highest AUC value among all models for Class 1 (with an increase of AUC by 13% from 0.661 up to 0.791), Class 2 (with an increase of AUC by 23.9% from 0.605 up to 0.844), and Class 3 (with an increase of AUC by 13.7% from 0.719 up to 0.856). These improvements are well beyond the limits of the standard deviation of 2–3% and can be estimated as significant ones. The similar improvements of the macro and micro AUC values cannot be estimated and discussed in the same manner, because they are heavily dependent on the classes explicitly "meta-labeled" by the additional text input from the patient and/or expert (denoted by empty cells in Tables 2 and 3) for the corresponding models. Such explicit "meta-labeling" provides the data leakage that allows us to predict those classes with certainty and significantly improves the macro and micro AUC values. Comparison of the absolutely best AUC values (emphasized by the red font in Tables 2 and 3) for CV (Table 2) and OoCV (Table 3) could create a distorted impression of a higher increase of performance after OoCV. The problem is that selection of the best OoCV models was performed after a single validation attempt (on 120 images), in comparison to selection of the best CV models performed after 6 validation attempts (on 180 images each) and their averaging.
This means that the results from Table 2 are statistically reliable within the limits of the standard deviations, while the results from Table 3 are only examples of possible fluctuations of the metrics (AUC here).
5.3 Conclusion In conclusion, this approach based on metadata augmentation, namely, usage of additional modalities (various text descriptions of images here) with "data leakage" on the extreme classes, for example, the lowest (Class 0) and highest (Class 4) DR severity, and their combinations, could be a useful strategy for the better classification of some hardly classified DR severities, like Classes 1–3 here. Of course, the results reported here were obtained on the playground dataset (RetinaMNIST) without many additional generalization techniques (like data augmentation, dropout, reducing the complexity of models, etc.). This was done to exclude the side effects of these influences and to concentrate on the impact of "metadata augmentation" with data leakage on some classes in favor of the better classification of other classes. In our further research, the results obtained here will be verified on the full version of the RetinaMNIST dataset [17] with additional generalization techniques that were successfully applied by us for real-life medical data, like data augmentation [27] and image size optimization with initial weights pretrained on the ImageNet dataset [29]. Special attention should be paid to reducing the complexity of the models in view of the availability of small and sufficiently powerful architectures that can be ported to Edge Computing devices with limited computational resources [13, 14]. As was previously shown, further optimization of DNNs can be reached by tuning the model complexity and size [8, 12], the types of components used (like various activation functions [21]), the batch size [20], and other methods. This study demonstrates that an additional improvement can be obtained by including an additional modality that is contextually relevant to the targeted objects.
References
1. Alienin O, Rokovyi O, Gordienko Y, Kochura Y, Taran V, Stirenko S (2022) Artificial intelligence platform for distant computer-aided detection (CADe) and computer-aided diagnosis (CADx) of human diseases. In: The international conference on artificial intelligence and logistics engineering. Springer, pp 91–100
2. Alyoubi WL, Shalash WM, Abulkhair MF (2020) Diabetic retinopathy detection through deep learning techniques: a review. Inf Med Unlocked 20:100377
3. Asiri N, Hussain M, Al Adel F, Alzaidi N (2019) Deep learning based computer-aided diagnosis systems for diabetic retinopathy: a survey. Artif Intell Med 99:101701
4. Atwany MZ, Sahyoun AH, Yaqub M (2022) Deep learning techniques for diabetic retinopathy classification: a survey. IEEE Access
5. Bora A, Balasubramanian S, Babenko B, Virmani S, Venugopalan S, Mitani A, de Oliveira Marinho G, Cuadros J, Ruamviboonsuk P, Corrado GS et al (2021) Predicting the risk of developing diabetic retinopathy using deep learning. Lancet Dig Health 3(1):e10–e19
6. Chen YW, Jain LC (2020) Deep learning in healthcare. Springer
7. Cunha-Vaz JG (2011) Diabetic retinopathy. World Scientific
8. Doms V, Gordienko Y, Kochura Y, Rokovyi O, Alienin O, Stirenko S (2021) Deep learning for melanoma detection with testing time data augmentation. In: The international conference on artificial intelligence and logistics engineering. Springer, pp 131–140
9. Dutta S, Manideep B, Basha SM, Caytiles RD, Iyengar N (2018) Classification of diabetic retinopathy images by using deep learning models. Int J Grid Distrib Comput 11(1):89–106
10. Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K, Cui C, Corrado G, Thrun S, Dean J (2019) A guide to deep learning in healthcare. Nat Med 25(1):24–29
11. Gargeya R, Leng T (2017) Automated identification of diabetic retinopathy using deep learning. Ophthalmology 124(7):962–969
12. Gordienko Y, Kochura Y, Taran V, Gordienko N, Bugaiov A, Stirenko S (2019) Adaptive iterative pruning for accelerating deep neural networks. In: 2019 XIth international scientific and practical conference on electronics and information technologies (ELIT). IEEE, pp 173–178
13. Gordienko Y, Kochura Y, Taran V, Gordienko N, Rokovyi A, Alienin O, Stirenko S (2020) Scaling analysis of specialized tensor processing architectures for deep learning models. In: Deep learning: concepts and architectures. Springer, pp 65–99
14. Gordienko Y, Kochura Y, Taran V, Gordienko N, Rokovyi O, Alienin O, Stirenko S (2021) "Last mile" optimization of edge computing ecosystem with deep learning models and specialized tensor processing architectures. In: Advances in computers, vol 122. Elsevier, pp 303–341
15. Grauslund J (2022) Diabetic retinopathy screening in the emerging era of artificial intelligence. Diabetologia, pp 1–9
16. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
17. IEEE (2020) The 2nd diabetic retinopathy - grading and image quality estimation challenge. https://isbi.deepdr.org/data.html. Accessed 30 July 2022
18. Kertes PJ, Johnson TM (2007) Evidence-based eye care. Lippincott Williams & Wilkins
19. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980
20. Kochura Y, Gordienko Y, Taran V, Gordienko N, Rokovyi A, Alienin O, Stirenko S (2019) Batch size influence on performance of graphic and tensor processing units during training and inference phases. In: International conference on computer science, engineering and education applications. Springer, pp 658–668
21. Kochura Y, Stirenko S, Gordienko Y (2017) Comparative performance analysis of neural networks architectures on H2O platform for various activation functions. In: 2017 IEEE international young scientists forum on applied physics and engineering (YSF). IEEE, pp 70–73
22. Van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(11)
23. Parthasharathi G, Premnivas R, Jasmine K et al (2022) Diabetic retinopathy detection using machine learning. J Innov Image Process 4(1):26–33
24. Refaeilzadeh P, Tang L, Liu H (2009) Cross-validation. Encycl Database Syst 5:532–538
25. Scanlon PH, Sallam A, Van Wijngaarden P (2017) A practical manual of diabetic retinopathy management. Wiley
26. Schmidt P (2017) Cervix EDA and model selection. Notebook. https://www.kaggle.com/philschmidt. Accessed 30 June 2022
27. Statkevych R, Gordienko Y, Stirenko S (2022) Improving U-Net kidney glomerulus segmentation with fine-tuning, dataset randomization and augmentations. In: International conference on computer science, engineering and education applications. Springer, pp 488–498
28. Team W (2019) World report on vision. World Health Organization
29. Tomko M, Pavliuchenko M, Pavliuchenko I, Gordienko Y, Stirenko S (2022) Multi-label classification of cervix types with image size optimization for cervical cancer prescreening by deep learning. In: 4th international conference on inventive computation and information technologies. Springer
30. Tymchenko B, Marchenko P, Spodarets D (2020) Deep learning approach to diabetic retinopathy detection. arXiv:2003.02261
31. Yang J, Shi R, Ni B (2021) MedMNIST classification decathlon: a lightweight AutoML benchmark for medical image analysis. In: IEEE 18th international symposium on biomedical imaging (ISBI), pp 191–195
32. Yang J, Shi R, Wei D, Liu Z, Zhao L, Ke B, Pfister H, Ni B (2021) MedMNIST v2: a large-scale lightweight benchmark for 2D and 3D biomedical image classification. arXiv:2110.14795
Advanced Vehicle Detection Heads-Up Display with TensorFlow Lite K. Mohamed Haris, N. Sabiyath Fatima, and Syed Abdallah Albeez
Abstract The real-time heads-up display (HUD) gives the human operator a new sense of and perspective on driving. The proposed heads-up display (HUD) detects vehicles, civilians, and obstacles while allowing the user to stay focused on driving. It has a navigator map to keep the user aware of their current location and to track their movement. It also tracks speed with an overlaid speedometer and offers a fail-safe emergency protocol that alerts the emergency contacts when a crash is detected. All these features are accessible to almost everyone who has a mobile device, and the system is very cost-effective. The heads-up display serves as an effective medium between the car and the human operator. The operator can select any of three models, SSD, YOLO v2, or YOLO v3, based on their device's performance level. On a lower-end device, the SSD model is faster and more efficient; on a high-end device, performance can be improved by using a higher YOLO version. Keywords Heads-up display · HUD · Software application · Flutter · TensorFlow lite · Object detection · Vehicle detection
1 Introduction A simple miscalculation while driving can lead to hazardous accidents. In recent years, self-driving cars have become a booming trend, and it is a common belief that self-driving cars are safer. Contrary to this belief, self-driving cars are more prone to accidents than manually driven cars, although the accidents are less severe. On average, there are 9.1 self-driving car accidents per sixteen lakh kilometres driven, whereas manually driven cars have comparatively fewer, at 4.1 crashes per sixteen lakh kilometres driven. To prevent such a large number of accidents, it is better to combine the computer with the human through an interactive medium. K. M. Haris · N. S. Fatima (B) · S. A. Albeez Department of Computer Science and Engineering, B.S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_47 In this instance, a user-customizable application compatible as a
K. M. Haris et al.
wearable using virtual reality gear is developed and discussed in this paper. The application addresses most of the common variables behind self-driving accidents, such as complex real-life driving situations, the lack of self-driving regulations, and a false sense of security. Until now, the heads-up display (HUD) has largely been a sci-fi concept. A heads-up display application in which the user can completely customize their view makes the HUD useful for virtually limitless purposes, opening doors to applications from microscopic robotic surgery to next-generation military headgear/helmets.
2 Related Works In 2021, Ma et al. [1] carried out a preliminary study on whether a heads-up display helps to improve driving performance and promote driving-safety awareness. The study mainly examined the use of a heads-up display that overlays visuals on the windshield. The viewing angle at which the driver sees this information is adjusted to the current driving scenario based on speed and certain other metrics. The driver's eye movements are tracked using infrared light, but long-term exposure may affect eye health, and the proposed user interface is not very intuitive for a driver. The main limitation of the system is the cost of setting up such a hardware visual overlay on the windshield, which is not widely affordable; moreover, the interface may not be suitable for all weather and lighting conditions. In 2021, Taheri Tajar et al. [2] proposed a vehicle detection model based on the pre-trained YOLO. The model is pruned to make the algorithm faster and more efficient for vehicle detection; the customized YOLO detects six types of vehicles and improves on the default pre-trained YOLO. The limitation is that the model could still be quantized with custom training to make it even faster, allowing detection at more than 16 frames per second. In 2020, Wu et al. [3] proposed a study that uses a laser-based heads-up display to present information to the driver. The system consists of a laser projector that takes the original image as input and projects it onto two right-angle prisms placed directly in front of the projector. The image is then split into three by a diffuser, and the split image is aligned with the driver's viewport using a virtual projection lens. The virtual image is displayed on the windshield in such a way that it appears 160 cm behind the windshield through reflection.
The virtual image shows speed, charge, and indicators, which does little to aid the driver in the context of driving. The limitation of this system is that it does not display richer information such as object detection, speed control, or navigation; in other words, the system is short on features. Installing such projectors and lenses to form the virtual image would also make the product unaffordable for most users.
In 2019, Park and Park [4] conducted a brief study on how to improve HUD design, reviewing the interface designs of HUDs from leading automobile manufacturers to ensure improved driver safety. After a thorough review of design principles, information characteristics, and related areas, directions were suggested for future research to determine how much information a HUD safety system can present, and at how many levels, without compromising driver safety. In 2015, Pauzie [5] proposed a system that helps the driver make the best decision by presenting useful information, such as traffic and weather, on the windshield. This lets the driver make informed decisions quickly and take the right action while avoiding distraction or information overload, which in turn improves the driver's reaction time. The heads-up display is again projected on the car's windshield and offers features such as lane keeping and navigation-sign detection; AR technology is used to anticipate upcoming turns by prediction. The limitation of these systems is that the features were only ever tested in a simulator, whereas real driving involves many factors and variables that also require human cognitive action to make good decisions. It is therefore advisable not to rely on path detection, as animals or civilians may be missed by the detection and cause accidents; no prediction algorithm is more accurate than actual human cognitive prediction. In 2018, Cano et al. [6] demonstrated in a study that the driver's attention cannot be divided between the HUD and the road.
While the HUD provides the driver with important information without requiring a change of gaze, it also increases the chance of unintentionally distracting the driver with overwhelming information. A HUD is less stressful than an HDD (heads-down display), letting drivers respond and make decisions more quickly with better speed maintenance, since constant speed information is in their view. Several applications have been developed for this HUD purpose, but they all present a mirrored view of the information, placed in front of a windshield to be reflected at high brightness. The limitation is that not all vehicles have windshields; on a bike, for example, the mobile phone cannot be placed in front of a mirror, and the information displayed is quite limited. In 2015, Mahajan et al. [7] studied the techniques used in vehicle HUDs, specifically due to their novelty. A HUD usually consists of an image source, lenses, and a combiner surface such as the windshield, with some variations owing to the lack of discourse on human-factor issues. HUDs help in low-visibility conditions, such as cold weather or night time, when it can be hard to make out vehicles or the next turn, and they also make navigation more intuitive. The future of HUDs looks promising, with several industries trying to bring the heads-up display to drivers. The limitation is that the cost of implementation currently makes it difficult to fit HUDs in cars, despite their ability to improve safety and keep drivers' focus on the road. In 2020, Lee et al. [8] proposed an AR HUD system with an eye-tracking camera. The system contains a camera placed right in front of the user to detect the user's eye
movement. The AR HUD projects light onto a mirror placed at a specific angle so that it is displayed on the windshield. The light-rendered elements on the windshield are reconfigured based on the user's head or eye movement, which is tracked by the eye-tracking camera. The displayed information has a specific depth, used to render the light rays under different lighting environments so that it appears embedded in 3D using AR. The limitation is that such a system is complex and expensive to implement on a car or similar vehicle. In 2021, Currano et al. [9] conducted a study to explore and better understand how a driver's situational awareness or judgment is influenced by HUD visualizations, by designing and testing an AR (augmented reality) HUD in the windshield. The tests consisted of driving scenarios and three types of HUDs; each subject was exposed to one of the three HUDs over two driving scenarios. The results showed that filling the windshield with increased contextual complexity has an impact on the driver's situational awareness. The lesson is that the heads-up display must not disturb the user; the overlay should be transparent enough that the user can still see and interpret the road. In 2007, Choi et al. [10] proposed a vehicle detection system that uses CCTV as its input source. The system counts the vehicles present and tracks vehicles in-lane and changing lanes by combining scale-invariant feature transform tracking with quadtree segmentation and a two-dimensional view of the traffic from the CCTV. The limitation is that detection happens only at the fixed point where the CCTV is installed, whereas the system could be mobile and detect vehicles in real time. In 2006, Sun et al.
[11] proposed a system that detects vehicles using a camera mounted on a moving car rather than at a fixed point. The system uses optical, active, and passive sensors to detect imminent collisions and alert users. The limitation is that the interpreted detections could provide the user with richer context rather than just a collision alert. In 2021, Haris et al. [12] proposed a Flutter application that sends SOS alert signals to rescuers near the victim's area to counter suspicious activity. The limitation is that the application alerts only rescuers who have access to that application; it could additionally alert people through carrier data and SMS to ensure the message reaches rescuers in an emergency. In 2014, Jakkhupan [13] briefly discussed the advantages of using the haversine formula to calculate speed: the GPS module collects the last two latitude/longitude pairs, and the speed is computed from the distance covered between them with respect to time. In 2019, Boukhary and Colmenares [14] proposed a clean Flutter mobile application development architecture that separates the business logic from the user interface for efficient data handling and state management. In 2011, Zandbergen and Barbeau [15] conducted a series of tests to determine which GPS techniques are faster and more efficient. The tests found that
a high-sensitivity GPS-enabled mobile phone is more efficient at pinpointing location, has fewer errors, and performs on par with a recreational-grade GPS receiver. In 2020, Sabiyath Fatima et al. [16] proposed an Android system with a better-performing recycler view. The recycler view helps create list or grid views for large sets of values and adapts automatically by setting its height and width dynamically according to the available content. In 2016, Steinberger et al. [17] and, in 2021, Szabo et al. [18] proposed speedometer applications placed under the windshield that detect and show the user the exact speed at which they are travelling. The limitation is that the placement of the phone makes the user constantly look down while driving, which can be distracting or illegible in certain lighting conditions and for people with vision problems. In 2011, Bai [19] proposed a portable HUD that displays information from the GPS, accelerometer, and range finder on the windshield using a vacuum fluorescent display, with brightness adjustment achieved via a CdS photocell. The limitations of such a system are that the specific components make the HUD complex and very expensive, the display consumes a lot of power, and the infrared range finder has a very limited range of 150 cm. In 2003, Imamura et al. [20] proposed a system that fuses the vehicle's local sensors with differential GPS for real-time vehicle positioning with good accuracy. The limitation of such a system is the lack of portability due to its dependence on the vehicle's local sensors; it cannot be used with vehicles lacking any of the required sensors. In 2021, Burta et al.
[21] proposed a system that provides a heads-up display on the car windshield using an iPhone's display, improving driver safety by avoiding a change of line of sight when looking down at information such as speed and time. The limitation is that the system is limited in features, and the heads-up display is very small because an iPhone is used as the windshield projection medium, which negatively impacts its immersiveness and visibility. In 2016, Chouksey and Sirsikar [22] proposed a system in which the user inputs the current location and destination, and the corresponding navigation symbols are displayed under the windshield. The limitation of such a system is that it is very limited in features, displaying only navigation symbols, and is less immersive during navigation since the user has to check the map constantly, which in turn affects driving. In 2014, Yoon et al. [23] proposed a system that uses input from a camera and sensors installed on a vehicle to recommend lanes and provide navigation instructions to the driver on a transparent screen installed on the windshield. Its limitations are that all the components are very expensive, making the end product unaffordable, and the instruction layout is so congested that it may overwhelm the driver and block the view.
In 1997, Nishizawa et al. [24] proposed a system that monitors traffic and provides collision warnings to the driver. The system uses laser radar sensors to estimate traffic on the road ahead and to detect and track vehicles up to 100 m away. Its limitations are that performance degrades when dirt accumulates on the laser radar sensor's housing, creating the need for sensor wipers, and that directional resolution is limited when using a single sensor; the system is also limited in features and driver immersion and is not easily affordable. In 2018, Walenchok et al. [25] proposed a system for displaying navigation information while the driver is on the road. It uses a system unit consisting of a single-board computer connected to a display and powered by a portable battery. The user interacts with the system through a GPS mobile app, which serves as the medium for entering the location and destination over Bluetooth; in response, navigation instructions are projected onto the windshield using the system unit's display as a projector. The limitations of this system are that the system unit has to be installed and adjusted before use, which affects portability, and, being a projection system, it is not cost-effective; it also offers no additional functionality to the driver.
3 Working Figure 1 shows the system architecture of the heads-up display application. The system executes in the following steps. Step 1: The user enters their mobile number and triggers the OTP from Firebase, choosing to sign up or log in as shown in Figs. 2 and 3. Step 2: Firebase sends the OTP to the specified mobile number. Step 3: The user enters the received OTP. If the OTP is valid, the user is redirected to the next page depending on whether they are an existing or a new user; if the OTP is invalid, the app does not redirect the user to the next screen, as shown in Fig. 4. Step 4: If the user already exists, their details are fetched from the Firestore database and they are redirected to the home page listing the models SSD, YOLO v2, and YOLO v3. Step 5: If the user is new, they are redirected to the sign-up page, where basic details and emergency contacts are collected and sent to the Firebase database. The register screen appears as shown in Fig. 5. These data authenticate the user and are stored in Firebase. Step 6: After account creation, the driver/user may start using the application through a VR headgear, which acts as a viewport for the driver/user through which
Fig. 1 System architecture of the heads-up display
Fig. 2 Onboard screen
Fig. 3 Signup screen
Fig. 4 OTP screen
Fig. 5 Register screen
the driver/user will be able to concentrate more on driving, with real-time data constantly fed into their heads-up display. Step 7: The modules work together, each serving as input to the others. Whenever the object detection algorithm detects a car or other vehicle with a confidence score above 80%, the speedometer suggests slowing down, since there is a vehicle in the upfront range / close proximity. Step 8: The object detection algorithm is fine-tuned so that it detects not only vehicles but also civilians/pedestrians and common objects, giving a detailed picture of the user's surroundings. The user interface (UI) is designed to be clearly visible in both daytime and night time. The user can pick any of the listed object detection algorithms to activate the HUD, as shown in Fig. 6. Fig. 6 Home screen
Fig. 7 SOS alerts
Step 9: The HUD application also uses the built-in gyroscopic sensors to detect sudden crash-like movements. If the gyroscopic sensor detects such a sudden motion change, it sends an SMS alert to the emergency contacts the user stored when creating the account with the application, as shown in Fig. 7. Step 10: These contact details are stored in the Firebase database, and to reduce API calls, data handling is implemented with a local SQLite database. The application thus works efficiently, making API calls only when there is no data stored in the local database. The bounding boxes are calculated with the help of the feature map, which marks the centre point based on at least one ground truth. Step 11: The user interface of the HUD (heads-up display) is designed to be easy to use and informative. The widgets are set to a transparency level through which the user can see, so they do not block the user's view, as shown in Fig. 8. In terms of security, the application stores information in an encrypted database that cannot be accessed by anyone other than the current user, and an authentication token is used for any response returned to the user.
Fig. 8 Heads-up display screen
Pseudocode for Object Detection

START
  SWITCH (MODEL):
    CASE SSD: INITIALIZE SSD
    CASE V2: INITIALIZE YOLOV2
    CASE V3: INITIALIZE YOLOV3
    DEFAULT: INITIALIZE SSD
  INPUT CAMERA FEED TO THE SELECTED MODEL
  MODEL SCANS AND GRIDS THE IMAGE
  DETECT OBJECTS IN A SINGLE PASS THROUGH THE CONVOLUTIONAL NETWORK
  FIND PROBABILITY / CONFIDENCE SCORE
  CALCULATE HEIGHT AND WIDTH OF THE BOUNDING BOX
  FIND HORIZONTAL AND VERTICAL CENTER-POINT COORDINATES WITH RESPECT TO THE BOUNDING BOX
CONTINUE
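The final steps of the pseudocode (confidence filtering, box dimensions, and centre point) can be sketched as follows. The (xmin, ymin, xmax, ymax, score, label) tuple layout and the 80% cut-off from Step 7 are illustrative assumptions, not the exact TensorFlow Lite output format.

```python
# Sketch of the pseudocode's final steps: keep only detections above the
# confidence cut-off and derive each bounding box's size and centre point.
# The tuple layout of `detections` is an assumption for illustration.
def postprocess(detections, min_score=0.8):
    results = []
    for xmin, ymin, xmax, ymax, score, label in detections:
        if score < min_score:          # Step 7 uses an 80% confidence cut-off
            continue
        width = xmax - xmin            # CALCULATE HEIGHT AND WIDTH
        height = ymax - ymin
        cx = xmin + width / 2          # FIND ... CENTER POINT
        cy = ymin + height / 2
        results.append({"label": label, "score": score,
                        "box": (width, height), "center": (cx, cy)})
    return results
```

In the app itself, these values would feed the overlay widgets that draw boxes on the HUD.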
Pseudocode for Location and Crash Updates

START
  INITIALIZE GYROSCOPE AND GPS SENSOR
  FIND CURRENT LOCATION AND TRACK
  SET LISTENER FOR LOCATION CHANGES
  SET LISTENER FOR CRASH-LIKE GYROSCOPIC MOVEMENTS
  IF (LOCATION CHANGED): UPDATE LOCATION
  IF (CRASH DETECTED): ALERT EMERGENCY CONTACTS THROUGH SMS
TERMINATE
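The crash branch above can be sketched as a threshold check on the gyroscope's angular-velocity magnitude. In the real app the samples come from a Flutter sensor plugin and the alert goes out over SMS; here the sample stream, the 8.0 rad/s threshold, and the alert strings are illustrative assumptions.

```python
import math

CRASH_THRESHOLD = 8.0  # rad/s magnitude treated as a crash-like spike (assumed)

def is_crash(gyro_sample):
    """Return True when the angular-velocity magnitude exceeds the threshold."""
    x, y, z = gyro_sample
    return math.sqrt(x * x + y * y + z * z) >= CRASH_THRESHOLD

def monitor(samples, emergency_contacts):
    """Scan (x, y, z) samples; alert every stored contact on the first spike."""
    for sample in samples:
        if is_crash(sample):
            return [f"SMS sent to {contact}" for contact in emergency_contacts]
    return []
```

A real implementation would also debounce repeated spikes so a single crash does not trigger multiple alert bursts.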
Pseudocode for Calculating Speed

START
  INITIALIZE VARIABLES LAT1 AND LONG1 WITH THE CURRENT LOCATION
  INITIALIZE TIME1 WITH THE CURRENT TIME
  ON LOCATION CHANGE ():
    SET LAT2 AND LONG2 TO THE CURRENT LOCATION
    SET TIME2 TO THE CURRENT TIME
    USING THE HAVERSINE FORMULA, CALCULATE THE DISTANCE BETWEEN THE TWO LATITUDE/LONGITUDE PAIRS
    CALCULATE THE TIME DIFFERENCE BETWEEN THE TWO TIME VARIABLES
    CALCULATE SPEED USING FORMULAS (1), (2) AND (3)
    REPLACE LAT1, LONG1 AND TIME1 WITH LAT2, LONG2 AND TIME2 RESPECTIVELY
    RESET LAT2, LONG2 AND TIME2 TO NONE
Speed Calculation Formula

Speed = Distance/Time (1)

Distance = r ∗ c (2)

• r is the radius of the earth (r is taken as 6371 to calculate kilometres).

c = 2 ∗ atan2(√a, √(1 − a)) (3)

where a = sin²((lat2 − lat1)/2) + sin²((long2 − long1)/2) ∗ cos(lat1) ∗ cos(lat2)

• All latitudes and longitudes inside the sin and cos functions are in radians.
• lat1 and long1 represent the initial current location.
• lat2 and long2 represent the newly reached current location.
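Formulas (1)–(3) can be combined into a small haversine speed routine. This is a sketch, not the app's actual Dart code; coordinates are in degrees, elapsed time in hours, and the result in km/h, with r = 6371 km as stated above.

```python
import math

R = 6371.0  # Earth radius in kilometres, as in formula (2)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS fixes, formulas (2) and (3)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))  # formula (3)
    return R * c                                         # formula (2)

def speed_kmph(lat1, lon1, t1, lat2, lon2, t2):
    """Formula (1): distance between two fixes over the elapsed time."""
    return haversine_km(lat1, lon1, lat2, lon2) / (t2 - t1)
```

For example, one degree of longitude along the equator spans about 111.19 km, so covering it in two hours gives roughly 55.6 km/h.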
4 Result and Discussion The heads-up display application is a real-time working project rather than a simulator, unlike other similar projects, and it is accessible to everyone who has a smartphone. In terms of performance, however, the project can be improved through the choice of model used within the application. The heads-up display application uses three different models: SSD, YOLO v2, and YOLO v3. The following analysis shows how each model performs with the application in terms of speed and accuracy. The three models were put to the test, and each value was measured over three repetitions to obtain an accurate mean. Table 1 shows the finalized values for each category. The measured categories are FPS (frames per second), mAP (mean average precision), accuracy, and Min TPF (minimum time per frame).

Table 1 Comparative analysis of SSD, YOLO v2 and v3 models

Models     FPS    mAP     Accuracy   Min TPF
SSD        59     0.25    72.1       0.17
YOLO v2    45     0.11    63.1       0.42
YOLO v3    30     0.35    80.3       0.84
Fig. 9 Comparative analysis of frames per second
4.1 Frames Per Second (FPS) Figure 9 shows the comparison of frames per second between SSD, YOLO v2, and YOLO v3. SSD delivers the best frame rate of the three: it detects at 59 frames per second, followed by YOLO v2 at 45 and YOLO v3 at 30. Compared with YOLO v2 and YOLO v3, SSD provides a 31.1% and a 96.66% increase in frames per second, respectively. This data helps in deciding which model to select for devices with less processing power: on a high-end device the highest YOLO version can be used, which will still be fast enough for that device, while for low-end and mid-range devices SSD is more suitable.
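The device-dependent selection rule described above can be sketched as follows; the tier labels are illustrative assumptions, and the benchmark FPS figures are taken from Table 1.

```python
# Sketch of the model-selection rule: SSD for low- and mid-tier devices,
# the highest available YOLO version otherwise. Tier names are assumed.
BENCHMARK_FPS = {"SSD": 59, "YOLOv2": 45, "YOLOv3": 30}  # from Table 1

def pick_model(device_tier):
    if device_tier in ("low", "mid"):
        return "SSD"       # highest per-frame throughput
    return "YOLOv3"        # high-end devices can afford the heavier model
```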
4.2 Mean Average Precision Comparing the mean average precision of SSD, YOLO v2, and YOLO v3 as shown in Fig. 10, YOLO v3 has the best mean average precision of the three. YOLO v2 has a 68% lower mean average precision than YOLO v3, while YOLO v3 has a 40% higher mean average precision than SSD. Fig. 10 Comparative analysis of mean average precision
Fig. 11 Comparative analysis of accuracy percentage
4.3 Accuracy Percentage Comparing the accuracy percentages as shown in Fig. 11, YOLO v3 has the best accuracy of the three models. YOLO v3's accuracy is 27.25% higher than YOLO v2's and 11.37% higher than SSD's.
4.4 Minimum Time Per Frame Comparing the minimum time per frame in seconds as shown in Fig. 12, SSD takes the least time to process a frame of the three models. SSD's minimum time per frame is 59.52% and 79.76% lower than that of YOLO v2 and YOLO v3, respectively.
Fig. 12 Comparative analysis of minimum time per frame
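The percentage figures quoted in Sects. 4.1–4.4 follow the standard percentage increase/decrease definitions, which can be checked with a small helper (a sketch for verification, not part of the application):

```python
# Percentage increase is relative to the smaller (base) value;
# percentage decrease is relative to the larger (base) value.
def pct_increase(new, base):
    return (new - base) / base * 100

def pct_decrease(base, new):
    return (base - new) / base * 100
```

For instance, SSD's 59 FPS over YOLO v2's 45 FPS is a 31.1% increase, and SSD's 0.17 s minimum time per frame against YOLO v2's 0.42 s is a 59.52% decrease.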
Therefore, it is clear that the application performs its fastest detections when the SSD model is used, and besides faster detection, SSD still predicts at a decent level compared with YOLO v3. YOLO v3 is the model with the highest accuracy and precision, but it takes more time to detect objects. YOLO v2, meanwhile, has poor accuracy and slower detection than SSD. Overall, SSD is the most viable model for the heads-up display application.
5 Conclusion and Future Work The developed heads-up display application serves as a viable object detection solution in a mobile application. The heads-up display detects not only vehicles and civilians but any objects it is trained to detect. Unlike several other heads-up display projects, this application is not a mere simulator but a real-time application that can detect objects, locate the user, and show the current speed in real time. The future scope of this project is vast: the TensorFlow model can be custom trained to fit any required need. With respect to vehicle detection, the application could further be developed with a path detection/guidance feature that helps the driver/user know when to make turns, including quick turns, and maps out the entire path to the destination the user needs to reach.
References 1. Ma X, Jia M, Hong Z, Kwok APK, Yan M (2021) Does augmented-reality head-up display help? A preliminary study on driving performance through a VR-simulated eye movement analysis. IEEE Access 9:129951–129964 2. Taheri Tajar A, Ramazani A, Mansoorizadeh M (2021) A lightweight Tiny-YOLOv3 vehicle detection approach. J Real-Time Image Process 1–13 3. Wu PJ, Chuang CH, Chen CY, Wu JH, Lin BS (2020) An augmented reality head-up display system with a wide-view eyebox. Int J Optics 4. Park J, Park W (2019) A review on the interface design of automotive head-up displays for communicating safety-related information. In: Proceedings of the human factors and ergonomics society annual meeting, vol 63, no. 1. SAGE Publications, Los Angeles, CA 5. Pauzie A (2015) Head up display in automotive: a new reality for the driver. In: International conference of design, user experience, and usability. Springer, Cham 6. Cano E, González P, Maroto M, Villegas D (2018) Head-up displays (HUD) in driving. Hum-Comput Interact 1–7 7. Mahajan SM, Khedkar SB, Kasav SM (2015) Head up display techniques in cars. Int J Eng Sci Innov Technol 4(2):119–124 8. Lee J-H, Yanusik I, Choi Y, Kang B, Hwang C, Park J, Nam D, Hong S (2020) Automotive augmented reality 3D head-up display based on light-field rendering with eye-tracking. Opt Exp 28(20):29788–29804 9. Currano R, Park SY, Moore DJ, Lyons K, Sirkin D (2021) Little road driving HUD: heads-up display complexity influences drivers' perceptions of automated vehicles. In: Proceedings of the 2021 CHI conference on human factors in computing systems
10. Choi J, Sung K, Yang Y (2007) Multiple vehicles detection and tracking based on scale-invariant feature transform. In: 2007 IEEE intelligent transportation systems conference, pp 528–533. https://doi.org/10.1109/ITSC.2007.4357684 11. Sun Z, Bebis G, Miller R (2006) On-road vehicle detection: a review. IEEE Trans Pattern Anal Mach Intell 28(5):694–711. https://doi.org/10.1109/TPAMI.2006.104 12. Haris KM, Fatima NS (2021) Sentinel—a neighbourhood based live location streaming safety app for women and children. Revista Geintec-Gestao Inovacao E Tecnologias 11(4):2273–2293 13. Jakkhupan W (2014) A prototype of mobile speed limits alert application using enhanced HTML5 geolocation. In: International conference on computational collective intelligence. Springer, Cham 14. Boukhary S, Colmenares E (2019) A clean approach to flutter development through the flutter clean architecture package. In: 2019 international conference on computational science and computational intelligence (CSCI), pp 1115–1120. https://doi.org/10.1109/CSCI49370.2019.00211 15. Zandbergen P, Barbeau S (2011) Positional accuracy of assisted GPS data from high-sensitivity GPS-enabled mobile phones. J Navig 64(3):381–399. https://doi.org/10.1017/S0373463311000051 16. Sabiyath Fatima N, Steffy D, Stella D, Nandhini Devi S (2020) Enhanced performance of android application using recycler view. In: Advanced computing and intelligent engineering. Springer, Singapore, pp 189–199 17. Steinberger F, Proppe P, Schroeter R, Alt F (2016) CoastMaster: an ambient speedometer to gamify safe driving. In: Proceedings of the 8th international conference on automotive user interfaces and interactive vehicular applications, pp 83–90 18. Szabo R, Gontean A, Burta A (2021) The development of a head-up display (HUD) app on the android mobile operating system. In: 2021 IEEE 27th international symposium for design and technology in electronic packaging (SIITME), pp 5–8. https://doi.org/10.1109/SIITME53254.2021.9663719 19. Bai B (2011) Portable heads up display. ECET, vol 496. Spring 20. Imamura M, Kobayashi K, Watanabe K (2003) Real time positioning by fusing differential GPS and local vehicle sensors. In: SICE 2003 annual conference (IEEE Cat. No. 03TH8734), vol 1, pp 778–781 21. Burta A, Szabo R, Gontean A (2021) The creation method of a head-up display for cars using an iPhone. In: 2021 fifth world conference on smart trends in systems security and sustainability (WorldS4), pp 303–306. https://doi.org/10.1109/WorldS451998.2021.9513996 22. Chouksey S, Sirsikar S (2016) A prototype of low cost heads up display for automobiles navigation system. In: 2016 international conference on computing, analytics and security trends (CAST), pp 205–210. https://doi.org/10.1109/CAST.2016.7914967 23. Yoon C, Kim K, Baek S, Park S (2014) Development of augmented in-vehicle navigation system for head-up display. In: 2014 international conference on information and communication technology convergence (ICTC), pp 601–602. https://doi.org/10.1109/ICTC.2014.6983221 24. Nishizawa S, Cheok K, Smid E, Berge M, Lescoe M (1997) Heads-up-display collision warning and traffic monitoring system 25. Walenchok A, Seifert N, Reed J, Humphrey J (2018) Navigational heads-up display. In: Williams honors college, honors research projects, p 690 26. Raj JS (2021) Blockchain framework for communication between vehicle through IoT devices and sensors 3(2):93–106. Accessed March 2021 27. Chen JIZ, Zong JI (2021) Automatic vehicle license plate detection using K-means clustering algorithm and CNN. J Electr Eng Autom 3(1):15–23
A Survey of Vehicle Trajectory Prediction Based on Deep Learning Models Manish, Upasana Dohare, and Sushil Kumar
Abstract In the field of smart urban traffic networks, vehicle trajectory prediction (VTP) is a crucial research area for driving assistance and autonomous vehicles. Whether a vehicle is autonomous or manually driven, trajectory prediction involves forecasting its future locations and turns (left or right). Vehicle trajectory prediction is more complex than pedestrian trajectory prediction: it involves the human decision-making process and many factors that affect the predicted trajectory. The problem is therefore non-linear and can be cast as classification or regression. Recent developments in artificial intelligence, in particular machine learning and deep learning models, have enabled researchers to provide promising solutions to the problem in different traffic situations. In this paper, we first provide a taxonomy of the popular machine learning and deep learning models that have been used to solve the vehicle trajectory prediction problem, followed by a discussion of selected deep learning models. Secondly, we list public datasets for the study of vehicle trajectory prediction and the performance metrics researchers use to measure the performance of the models. Finally, we discuss the limitations of deep learning models. After reading this paper, one can start initial research in the area of vehicle trajectory prediction. Keywords Artificial intelligence · Vehicle trajectory prediction · Machine learning · Deep learning · Spatial · Temporal · Spatiotemporal models
Manish (B) · S. Kumar School of Computer & Systems Sciences, Jawaharlal Nehru University, New Delhi, India e-mail: [email protected] S. Kumar e-mail: [email protected] U. Dohare School of Computing Science & Engineering, Galgotias University, Uttar Pradesh, Greater Noida, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_48
1 Introduction
The Sustainable Development Goals explicitly point toward sustainable transport practice under SDG-11, which says: "Make cities and human settlements inclusive, safe, resilient, and sustainable." This is crucial for creating greater social and economic integration while protecting the environment and enhancing social fairness, health, urban resilience, urban–rural connections, and rural production. For over one hundred years, vehicles have been an essential means of transportation worldwide, and they have grown in both importance and number on the roads. Traffic situations have consequently become so complex that a small lane change or turn can cause accidents and traffic congestion.
1.1 Challenges in Vehicular Motion
Traditional driving relies on the driver's judgment of the road environment; the driver controls and modulates the vehicle's driving controls accordingly. This process carries great uncertainty in many situations [1], and the driver's quick judgment and response time are often inadequate. This uncertainty creates two challenges:
• Too little information about the situation reaches the driver, which affects the judgment, and
• The driver must process a considerable amount of data, so the decision takes more time.
1.2 Solutions for Challenges of Vehicular Motion
One solution is the autonomous vehicle: it perceives all the road factors and then follows SLAM's predetermined rules for a collision-free ride to the destination. Another solution is a system that predicts vehicles' trajectories and prescribes the speed and route for the ego-vehicle. Trajectory prediction belongs to a complex problem class. It requires managing computational resources, i.e., space and time; handling dependency on a specific type of environment for robustness of the model; and accounting for the influence of other vehicles' interactions on the trajectories of the target and ego-vehicle. Hence, efficient and accurate trajectory prediction in real-time traffic scenarios is a big challenge. The efficiency of such models is measured by their accuracy, adaptability to new traffic environments, and prediction horizon [2]. A model should also consider the impact of other vehicles over the prediction horizon.
1.3 Applications of Trajectory Prediction
These predictions maximize resource utilization and help reduce traffic congestion and accidents. For drivers, such systems provide actual journey times and alternative routes. It is evident and reasonable for the related departments, actors, and traffic applications to make decisions according to the predicted future traffic conditions rather than considering only present traffic information [3].
2 Background
2.1 Problem Statement for VTP
The trajectory prediction problem involves calculating future time-step values from past time steps while minimizing a loss function on the real-time outputs. A trajectory is the path traced by a body (here, a vehicle) over a time duration (Fig. 1). We can view a trajectory as a vector of coordinates: V = {(Xi, Yi) : i ∈ {1, 2, 3, …, n}}. A time function T maps the points of the trajectory to timestamps; with a constant sampling interval C, T(i) − T(j) = C · (i − j).
Fig. 1 Trajectory prediction in traffic (peripheral, leading, and target vehicles on X–Y axes; the past trajectory runs to (Xm, Ym), with the actual and predicted trajectories continuing to (Xn, Yn) and (Xn′, Yn′))
The trajectory prediction problem, step by step, is as follows:
• Given observed trajectory points {(Xi, Yi) : i ∈ [1, 2, …, m]},
• predict the trajectory points {(Xi, Yi) : i ∈ [m + 1, m + 2, …, n]}, where m < n (Fig. 1), and
• minimize the error metrics with respect to the actual trajectory v ∈ V.
Generating trajectory points is equivalent to a forecasting or prediction problem; the only difference is that we seek a trajectory vi ∈ V rather than optimizing some error metric directly. The trajectory prediction problem has different variants depending on the prediction time horizon.
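The notation above can be made concrete with a minimal sketch (the interval C, the trajectory values, and the error function are illustrative choices of ours, not taken from any cited work):

```python
import numpy as np

# A trajectory as a sequence of (x, y) points sampled at a constant
# interval C, so that T(i) - T(j) = C * (i - j).
C = 0.1                                   # sampling interval in seconds (toy value)
trajectory = np.array([[t, 0.5 * t] for t in range(10)], dtype=float)

m = 6                                     # number of observed points
observed = trajectory[:m]                 # {(X_i, Y_i) : i in 1..m}
to_predict = trajectory[m:]               # {(X_i, Y_i) : i in m+1..n}

def l2_error(pred, actual):
    """Mean Euclidean distance between predicted and actual points."""
    return float(np.mean(np.linalg.norm(pred - actual, axis=1)))

# A perfect predictor would drive this error to zero.
print(l2_error(to_predict, to_predict))   # 0.0
```

Any of the models surveyed below can be seen as a function mapping `observed` to an estimate of `to_predict` while minimizing such an error.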
2.2 Challenges of Vehicle Trajectory Prediction
The challenges of vehicle trajectory prediction depend on the interactions of the agents in the environment and on the assumptions made before prediction.
• V2I interaction: the dependence of the vehicle trajectory on the environment, such as physical space or the temporal window.
• V2V interaction: the mutual influence of vehicle trajectories, (a) through physical restrictions, when other vehicles cross the trajectory, and (b) through traffic rules, lanes, and speed.
• Multimodality: a prediction problem has multiple plausible outputs, and the system has inherent randomness. These cannot be ignored, as they are directly connected to the system's aleatoric and epistemic uncertainty; the remedy is to add more data or information to the system.
• Generalizability: a method is evaluated by its ability to predict all possible real-time vehicle scenarios on the road.
• Risk assessment: the trade-off between prediction-model completeness and real-time constraints.
• Choice of prediction model: the model selection depends on the chosen risk assessment method.
Previous works mainly focused on solving the first two challenges; the rest were targeted later as the field advanced.
3 Related Work
3.1 Different Approaches for Trajectory Prediction Environment Assessment
Mathematical modelling of the situation is required to assess the environment associated with the problem, which helps to predict how the situation will change or evolve. There are three categories of approaches to vehicle trajectory/motion prediction [4].
Physics-based model. This model considers only the laws of physics: the vehicle's motion is constrained and assessed purely according to physical laws, and the future state of the automobile, i.e., its position, speed, and direction, is predicted using dynamic and kinematic models.
Maneuver-based model. This model treats each vehicle as an independent maneuvering unit: the motion of a vehicle is assumed to be a sequence of maneuvers independent of the other vehicles in the road network. These models consider future movements in advance, inferring the driver's intention or maneuver, such as going left, right, or straight. This property yields a more reliable and essential trajectory at first hand.
Interaction-aware motion model. These models treat maneuvering vehicles as mutually interacting entities. This mutual interaction leads to an improved representation of the associated risk in real-life situations. Interaction-aware models are trajectory-prototype models in nature, with dynamic Bayesian networks (DBNs) as the base model. Such interaction-aware models give a better assessment of the environment: during matching, mutual influences can be considered, since they are assumed to help avoid collisions. Trajectories that lead to crashes can be penalized in the process, and safe ones can be filtered.
Thus, models based on DBNs, such as asymmetric coupled hidden Markov models (CHMMs), are used to model the pairwise dependencies among the vehicles' maneuvers [5] with less complex computations. Interaction-aware motion models are more reliable and allow longer-term predictions than the previously stated models.
3.2 Limitations of the Trajectory Prediction Environment Assessment Models
The motion prediction models described above have advantages in assessing situation-related risk, yet each has its limitations. The limitation of physics-based models lies in the assumptions made while calculating vehicle motion: the model is highly inefficient and far removed from interactive situations
in real-time scenarios, since the laws of physics assume ideal conditions that do not hold in real road scenes. The maneuver-based model is limited by its starting pre-assumptions, as no vehicle in a real road scenario moves independently of the others. Its further limitations are the exhaustiveness of computing all the trajectories, which is expensive, and its poor compatibility with real-time risk assessment. The interaction-aware model has one fundamental problem: there is no direct technique to capture the influence of any two vehicles on each other's trajectories.
Contribution to Work
This paper provides a survey of machine learning and deep learning models for vehicle trajectory prediction. The contributions of this paper fall into four parts:
• First, we provide a taxonomy for categorizing machine learning and deep learning models in the VTP field.
• Second, we list some publicly available online datasets with their summarized details.
• Third, we present performance metrics used for accuracy evaluation of the different prediction models in different papers.
• Fourth, we discuss some limitations of all artificial-neural-network-based learning models.
The last section of the paper presents the conclusion and future directions for research.
4 Vehicle Trajectory Prediction Models
Machine learning and deep learning algorithms have made significant advancements on the vehicle trajectory prediction problem. Based on situation assessment and model choice, VTP models are classified as classical (physics-based) models and deep learning models.
4.1 Classical or Physics-Based Models
Most techniques here involve two phases to address VTP: modelling the state and generating an initial-stage trajectory. In the first phase, the trajectory history is modelled using a single model. In principle, all the features could be represented in one neural network to make predictions; in practice this is challenging, as no single model can accommodate the diversity of all circumstances. Most machine learning techniques have two components: classification and regression. In the context of vehicle prediction, classification entails figuring out the driver's intentions, while regression entails forecasting the vehicle's course in the
future. In the past, kinematic models were primarily used to describe how the vehicle moved. The most straightforward models assumed constant acceleration, constant turn rate, and similar parameters. Despite being straightforward, these models are not precise enough for real-time driving situations. Conventional machine learning methods such as Bayesian networks, Markov models [6], support vector machines [6], direct vehicle models [7], maneuver-specific models [7], Gaussian process models, Gaussian mixture models [8], and polynomial fitting [9] are used for vehicle trajectory prediction to predict a state or a point in space or on the road. These models solve predictions involving classification decisions, such as whether to change lanes. The problem of confidence in the prediction is severe in these models and hard to avoid; cross-validation and sampling methods such as Monte Carlo and K-fold [10] reduce such issues to some extent. The traditional approaches are not reliable for long-term prediction and do not fully solve the non-linearity of vehicle trajectory prediction. Researchers have therefore demonstrated that a variety of data-driven methodologies can address the non-linearity issue; the average prediction used by data-driven techniques lowers the regression error.
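The simplest kinematic baseline mentioned above can be sketched as a constant-velocity extrapolation (an illustrative toy, with the time step and sample points chosen by us):

```python
import numpy as np

def constant_velocity_predict(observed, horizon, dt=1.0):
    """Physics-based baseline: extrapolate with the last observed velocity.

    observed : (m, 2) array of past (x, y) positions sampled every dt seconds.
    horizon  : number of future steps to predict.
    """
    velocity = (observed[-1] - observed[-2]) / dt        # last finite difference
    steps = np.arange(1, horizon + 1).reshape(-1, 1)     # 1, 2, ..., horizon
    return observed[-1] + steps * velocity * dt          # straight-line rollout

past = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]])
print(constant_velocity_predict(past, horizon=3))
# -> [[3.  1.5]
#     [4.  2. ]
#     [5.  2.5]]
```

Such a model is precise only while the vehicle keeps its speed and heading, which is exactly why it fails in interactive, real-time driving situations.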
4.2 Deep Learning Models
Deep learning models are subcategorized by the dependencies they address: spatial, temporal, and hybrid (spatiotemporal), as shown in Fig. 2.
Spatial Dependency Models. The tendency of nearby areas to influence each other and share characteristics is known as spatial dependency (Goodchild, 1992, p. 33).
Convolutional Neural Network. Multiple researchers have used convolutional neural networks (CNNs) to detect the spatial relationships between vehicles on roads using two-dimensional traffic data [11]. Based on locations in the 2-D layout, various road circumstances are transformed into a map, and the CNN extracts the spatial correlations between scenarios to anticipate a trajectory.
Graph Convolutional Network (GCN). A traditional CNN can only model n-dimensional linear-space (Euclidean) data. A GCN [11], by contrast, can model non-Euclidean data with spatial structure, which better suits the design of a traffic road network. There are broadly two families of GCN methods: spectral techniques and spatial techniques.
Attention. The attention mechanism was first introduced for natural language processing (NLP) and then became popular in other fields. Different streets affect the traffic state of a road to varying degrees. The key idea is to dynamically assign different weights to different regions at different time stamps using a spatial attention component. The attention mechanism is utilized for rush-hour traffic speed prediction [12] to dynamically capture and adapt to the spatial interrelations between targeted regions and their first-hop neighbors in the road network.
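As a rough illustration of how a GCN propagates information over a small vehicle graph, the standard propagation rule can be sketched as follows (a generic numpy toy with made-up weights, not the architecture of any cited work):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    A : (n, n) adjacency matrix of the vehicle/road graph (non-Euclidean data).
    H : (n, f_in) node features; W : (f_in, f_out) learned weights.
    """
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))     # symmetric normalization
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Three vehicles: 0-1 and 1-2 interact; features are (x, y, speed).
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.array([[0.0, 0.0, 10.0], [1.0, 0.5, 12.0], [2.0, 1.0, 9.0]])
W = np.ones((3, 2)) * 0.1                      # toy weights, not trained
print(gcn_layer(A, H, W).shape)                # (3, 2)
```

Each vehicle's output feature is a normalized mixture of its own state and its neighbors' states, which is what lets GCN-based models represent interactions on a road network.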
Fig. 2 Classification diagram of vehicle trajectory prediction models
GAN-Vehicle Trajectory Prediction. City traffic with a high density of vehicles on the roads needs special models to forecast trajectories and traffic conditions. Zhao et al. [13] suggest a generative-adversarial-network-based trajectory prediction approach, called GAN-VEEP. The model uses a vehicle coordinate transformation to address the complex spatial dependency in the topology of metropolitan roads; the neural network prediction model then learns from how drivers behave, improving the driving trajectory based on the driver's psychology.
Social Pooling (LSTM). This method is unique in comprising both local and non-local operations [14]. A multi-head attention mechanism for non-local operations captures the relative effects and the significance of each vehicle, while local blocks capture close-by interactions between vehicles. The model uses an LSTM encoder-decoder and models the interactions among all nearby cars through a social pooling mechanism.
Stacked Auto-Encoders (SAEs). Data fusion and compression using stacked auto-encoders is a typical unsupervised learning technique that employs backpropagation for feedback. Jiang et al. [15] show that SAEs with a filtering mechanism give better predictions.
Transformer. In [16], a transformer-based model considers nearby vehicles' interactions. Distinct from techniques that merely consider the spatial interactions among observed trajectories in both the encoding and decoding stages, this model also considers potential spatial connections between future trajectories during decoding.
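The grid-based social pooling idea can be sketched as follows (our simplified version with sum-pooling and made-up grid parameters, not the exact mechanism of [14]):

```python
import numpy as np

def social_pool(target_xy, neighbors_xy, neighbor_states, grid=(4, 4), cell=5.0):
    """Scatter neighbor encoder states into a spatial grid around the target.

    Cells that contain a neighbor accumulate its hidden state; empty cells
    stay zero.  The flattened tensor summarizes nearby traffic for a decoder.
    """
    h = neighbor_states.shape[1]
    pooled = np.zeros((grid[0], grid[1], h))
    half = np.array(grid) / 2.0
    for xy, state in zip(neighbors_xy, neighbor_states):
        idx = np.floor((xy - target_xy) / cell + half).astype(int)
        if 0 <= idx[0] < grid[0] and 0 <= idx[1] < grid[1]:
            pooled[idx[0], idx[1]] += state       # sum-pool states per cell
    return pooled.reshape(-1)                     # fed to an LSTM decoder

target = np.array([0.0, 0.0])
neighbors = np.array([[3.0, 2.0], [-6.0, 1.0], [40.0, 0.0]])  # last is out of range
states = np.ones((3, 8))                          # toy 8-dim encoder states
print(social_pool(target, neighbors, states).shape)   # (128,)
```

The decoder thus sees not only the target's own history but also a fixed-size summary of where its neighbors are and what their encoded histories look like.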
Temporal Dependency Models. In these models, a mapping takes place between time stamps and trajectory points. They are efficient for processing data series and ensure temporal coherence in prediction. Some popular deep learning models using temporal dependencies for VTP follow.
Temporal CNN. In a typical effort in traffic research, [17] employed an alternative version of the CNN that applies solely convolutional layers to concurrently capture and extract spatiotemporal characteristics from graph-structured time series data. To capture complex temporal features, the model employs the CNN's dilated causal form as the temporal convolution layer [12], which makes temporal feature capture possible with a temporal convolutional neural network.
RNN, GRU, LSTM. Recurrent neural network models help process sequential data, as do their variants LSTM and gated recurrent units (GRUs), which are standard neural networks for sequence processing. With inputs from previous states, an RNN predicts time series data [18]. LSTM and GRU models take as input the prior trajectory or positional data of the surrounding vehicles and output the location and speed of the ego-vehicle. The encoder-decoder design is a functional working unit in RNN-based models and is helpful for sequence learning, so such models can make traffic predictions. However, information loss occurs in encoder-decoder-based models due to their fixed-length semantic vectors.
Attention Recurrent Neural Network. In an ARNN, an attention mechanism extends the model along the time axis: it produces an output sequence after selecting among the hidden states of the encoder [19]. This mechanism also supports long-term sequence data. The attention mechanism is used to overcome the problem of external input and to formulate the network traffic state so that heterogeneous input sources can be fed into the VTP.
Transformers.
Simpler, non-recurrent structures based on transformer networks model each road agent's trajectory without complicated interaction terms. They do so by including enhanced data (such as the position and direction of movement), applied to the challenge of vehicle trajectory prediction in urban traffic with prediction horizons of up to 5 s. The model in [20] uses these basic transformer structures to forecast the trajectory.
Hybrid or Spatiotemporal Models. Purely spatial and purely temporal techniques do not account for how interdependent the two aspects are; as a result, two-way joint relationships are disregarded. Hybrid strategies that overcome this drawback by using both temporal and spatial information have produced results superior to the earlier methods.
STA-LSTM. RNNs cannot explain how the model produces a prediction from long-term past trajectory information, including the trajectories of neighboring vehicles. Lin et al. [21] put forward an enhanced hybrid model, spatiotemporal attention long short-term memory (STA-LSTM), in which an LSTM with spatial-temporal attention answers these questions for vehicle trajectory prediction. STA-LSTM achieves prediction accuracy equivalent to other cutting-edge algorithms and uses attention weights to explain how past trajectories and nearby vehicles affect target vehicles.
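The attention weighting at the heart of the ARNN- and STA-LSTM-style models above can be sketched as follows (an illustrative numpy toy; the dot-product scoring and the numbers are our assumptions, not the exact formulation of [19] or [21]):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def temporal_attention(encoder_states, query):
    """Score each past hidden state against a query and return the
    attention weights plus the weighted context vector."""
    scores = encoder_states @ query          # dot-product scoring per time step
    weights = softmax(scores)                # weights sum to 1 over time steps
    context = weights @ encoder_states       # weighted combination of states
    return weights, context

# Toy encoder states for 4 past time steps, hidden size 3 (made-up numbers).
states = np.array([[0.1, 0.0, 0.0],
                   [0.0, 0.2, 0.0],
                   [0.0, 0.0, 0.9],
                   [0.0, 0.0, 1.0]])
weights, context = temporal_attention(states, query=np.array([0.0, 0.0, 1.0]))
print(weights.round(3))   # later, better-matching steps get more weight
```

Because the weights are explicit and normalized, they can be inspected directly, which is what lets STA-LSTM explain which past time steps and which neighboring vehicles drove a given prediction.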
DLM. For unsupervised learning of the interaction between vehicles and the associated traffic hazards, [2] proposes a dual learning model (DLM) to predict the trajectory of the detected vehicle. This data-driven strategy uses an LSTM encoder-decoder structure with two input channels exploiting risk maps and occupancy maps. The driving force behind this model is the efficiency of the LSTM network in simulating nonlinear temporal relationships in sequence prediction, and the model produces accurate trajectory predictions.
GSTCN. Predicting the trajectories of nearby automobiles is an essential step in decision-making and planning for self-driving cars. Sheng et al. [22] propose a graph-based spatial-temporal convolutional network (GSTCN) for predicting the future trajectory distributions of all neighboring automobiles from historical data. GSTCN handles spatial interactions with a graph convolutional network (GCN), while a CNN extracts temporal features. The space-time features are encoded and decoded by GRUs to generate future trajectory distributions. A new weighted adjacency matrix describes the magnitude of the interactions between vehicles, and [22] shows the effectiveness of the GSTCN model on public datasets.
SpecTGNN. Cao et al. [23] suggest a spectral temporal graph neural network that can simultaneously capture temporal dependency and inter-agent correlations. SpecTGNN integrates, in two streams, agent graphs carrying dynamic state information and environment graphs with characteristics assembled from perspective pictures. The model uses spectral graph convolution, temporal convolution, and Fourier graph transformation; to reduce the impact of error propagation over long horizons, it also incorporates spatiotemporal multi-head attention.
RRNN. A relational recurrent neural network (RRNN) handles the challenge of predicting how a vehicle will move in the future.
Messaoud et al. [24] suggest an encoder-decoder architecture based on RRNNs: the encoder examines the patterns in historical trajectory sequences, and the decoder generates the sequence of future trajectory points. The uniqueness of this network lies in blending the LSTM's strength at expressing the temporal evolution of trajectories with the power of the attention mechanism to model the vehicles' movements relative to each other.
GAN-3. In multi-modal circumstances, the GAN-3 model [25] can be utilized to produce multiple predictions. The GAN-3 model depicts numerous different behaviors, which makes the predictions more accurate and unaffected by the multi-modality issue. In instances where the effects of multimodality are more pronounced, generative models outperform LSTM-based models [25].
5 Public Datasets
5.1 FCD
Floating car data (FCD), also referred to as floating cellular data, is a method for calculating the speed of traffic on the road network [25]. This dataset takes data from mobile phones, so every mobile phone in a vehicle works as a sensor on the road network. Compared to model-based techniques, FCD is more responsive to traffic-related events such as traffic jams. The dataset includes data from four metropolitan cities.
5.2 Argoverse
This widely used collection of public datasets provides detailed maps for experimenting with, testing, and training self-driving cars in the road environment. The dataset includes over 300 K HD two-dimensional bird's-eye-view pictures of Pittsburgh and Miami [26]. The prediction objective is to give the motion of a single object for the next 3 s using the prior 2 s of motion points.
5.3 T-Drive
Here, 'T' stands for trajectory. This public dataset of trajectories was collected under a Microsoft Research Asia project, which gathered GPS tracks from 30,000 taxis over three months [27]. The project is based on traffic in Beijing, China.
5.4 NuScenes
NuScenes is a huge public dataset for autonomous cars. In NuScenes [28], trajectories are given in the Euclidean Cartesian coordinate system at a rate of 2 Hz. The dataset compiles actual driving scenarios from Singapore and Boston, where left-hand and right-hand traffic regulations, respectively, are in effect.
5.5 NGSIM
The most popular dataset among researchers and programmers is Next Generation Simulation (NGSIM) [3]. This dataset is a collection of several other public datasets, captured using digital cameras and an application named NGVIDEO.
6 Evaluation Metrics
In highly uncertain scenarios no single correct solution can be found, yet many plausible solutions can be generated that minimize the prediction error. Standard metrics are therefore used to evaluate the prediction error; this section lists some of them.
6.1 ADE
Average Displacement Error measures the root-mean-square error between the predicted and actual trajectory points.
6.2 FDE
Final Displacement Error gives the distance between the actual final trajectory point and the final point predicted by the model.
6.3 ADE (K)
The average L2 distance between the best-predicted trajectory and the actual one, where the best-predicted (optimum) trajectory is the one with the smallest endpoint error among the 'K' predicted trajectories.
6.4 FDE (K)
The L2 distance between the endpoint of the best-predicted trajectory and the actual final location, where the best trajectory is the one with the smallest endpoint error among the 'K' predicted trajectories.
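These four metrics can be computed in a few lines (a sketch with toy trajectories of our own; axis conventions follow the notation of Sect. 2):

```python
import numpy as np

def ade(pred, actual):
    """Average Displacement Error: mean L2 distance over all time steps."""
    return float(np.mean(np.linalg.norm(pred - actual, axis=-1)))

def fde(pred, actual):
    """Final Displacement Error: L2 distance at the last time step."""
    return float(np.linalg.norm(pred[-1] - actual[-1]))

def min_ade_fde(preds_k, actual):
    """ADE(K)/FDE(K): evaluate the best of K predicted trajectories,
    where 'best' means smallest endpoint error."""
    endpoint_errors = [fde(p, actual) for p in preds_k]
    best = preds_k[int(np.argmin(endpoint_errors))]
    return ade(best, actual), fde(best, actual)

actual = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
pred = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
print(ade(pred, actual), fde(pred, actual))   # ~0.667 and 1.0
```

ADE rewards being close along the whole horizon, while FDE only scores the endpoint; the (K) variants credit a multi-modal model as long as one of its K hypotheses is good.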
6.5 N-ADE/N-FDE
ADE/FDE tends to be larger for longer trajectories, so these metrics are biased toward them. To overcome this limitation, the normalized metrics N-ADE/N-FDE are used [25]. The normalized metrics are largely independent of trajectory characteristics; however, they can only be utilized in aggregate settings, such as GAN models, because they are useless when applied to a single trajectory.
6.6 Miss Rate
Miss Rate [29] assesses the percentage of undesirable results among all predicted solutions. A miss happens when the endpoint error exceeds a threshold value, usually 2.0 m in the case of VTP.
6.7 DAC
Drivable Area Compliance measures the feasibility of the trajectories proposed for VTP. If a given model generates 'X' alternative future trajectories and 'Y' of them leave the drivable region at some point in time, its DAC is (X − Y)/X. DAC is therefore the ratio of the number of future trajectories that stay within the drivable area to the total number of generated trajectories [26].
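Both miss rate and DAC reduce to simple counting (a sketch with toy values of our own; the 2.0 m threshold follows the convention above):

```python
import numpy as np

def miss_rate(preds_k, actual, threshold=2.0):
    """Fraction of predicted trajectories whose endpoint error exceeds
    the threshold (2.0 m is the usual choice for VTP)."""
    errors = [np.linalg.norm(p[-1] - actual[-1]) for p in preds_k]
    return float(np.mean([e > threshold for e in errors]))

def dac(num_generated, num_off_road):
    """Drivable Area Compliance: share of generated trajectories that
    stay inside the drivable region, (X - Y) / X."""
    return (num_generated - num_off_road) / num_generated

actual = np.array([[0.0, 0.0], [5.0, 0.0]])
preds = [np.array([[0.0, 0.0], [5.5, 0.0]]),   # endpoint error 0.5 m: a hit
         np.array([[0.0, 0.0], [9.0, 0.0]])]   # endpoint error 4.0 m: a miss
print(miss_rate(preds, actual))                # 0.5
print(dac(num_generated=10, num_off_road=2))   # 0.8
```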
7 Limitations of Deep Learning Models
7.1 Pre-processing of Data
The data pre-processing and filtering steps significantly impact the models' accuracy, yet authors of ANN-based techniques rarely address such pre-processing; instead, they focus primarily on designing the model's neural network structure.
7.2 High Demand for Datasets
A significant disadvantage of deep learning algorithms is that they need massive datasets for training, and they need still more data to produce reliable results as the model size grows. Data augmentation helps to some extent, but more data is always the preferable way to reduce prediction error. Furthermore, training a deep learning model on such complex data is expensive.
7.3 Generalization of the Model
Even highly trained deep learning models can fail when applied to datasets other than their benchmark dataset. This suggests that the models lack common sense, or fail to identify the context of the dataset they were trained on. Deep learning models are very good at classifying images and predicting sequences; they can even produce data that reproduces another pattern, as is evident in GANs. They do not, however, generalize to all supervised learning problems.
7.4 Undefined Nature of the Models
Deep learning algorithms are known for their opaqueness, the "black box" issue: they cannot explain their results, which makes it challenging to debug them or to understand how they arrive at their judgments.
8 Conclusion
In this paper, we extensively surveyed machine learning and deep learning models for vehicle trajectory prediction. More precisely, we first provided a taxonomy and summary of current trajectory forecasting techniques, categorized by the dependencies used to model the environment parameters for prediction; the same dependencies underlie risk and uncertainty assessment. We then listed popular publicly available datasets, followed by the performance metrics used to measure the performance of the different prediction models in VTP. Finally, we discussed some significant challenges of VTP to highlight potential study gaps and possibilities for future work. After reading this paper, readers can promptly understand the concepts of VTP and start their own research; researchers in this discipline can use this paper as a valuable resource and source of references for pertinent studies.
References
1. Qu R, Huang S, Zhou J, Fan C, Yan Z (2022) The vehicle trajectory prediction based on ResNet and EfficientNet model. arXiv:2201.09973
2. Khakzar M, Rakotonirainy A, Bond A, Dehkordi SG (2020) A dual learning model for vehicle trajectory prediction. IEEE Access 8:21897–21908. https://doi.org/10.1109/ACCESS.2020.2968618
3. Gu J, Wang Z, Kuen J, Ma L, Shahroudy A, Shuai B, Liu T, Wang X, Wang L, Wang G, Cai J, Chen T (2015) Recent advances in convolutional neural networks. arXiv:1512.07108
4. Lefèvre S, Vasquez D, Laugier C (2014) A survey on motion prediction and risk assessment for intelligent vehicles. ROBOMECH J 1. http://www.robomechjournal.com/content/
5. Brand M, Oliver N, Pentland A (1997) Coupled hidden Markov models for complex action recognition. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 994–999
6. Kumar P, Perrollaz M, Lefevre S, Laugier C (2013) Learning-based approach for online lane change intention prediction. In: 2013 IEEE intelligent vehicles symposium (IV). IEEE, pp 797–802
7. Houenou A, Bonnifait P, Cherfaoui V, Yao W (2013) Vehicle trajectory prediction based on motion model and maneuver recognition. In: 2013 IEEE/RSJ international conference on intelligent robots and systems. IEEE, pp 4363–4369
8. Schreier M, Willert V, Adamy J (2014) Bayesian, maneuver-based, long-term trajectory prediction and criticality assessment for driver assistance systems. In: 17th international IEEE conference on intelligent transportation systems (ITSC). IEEE, pp 334–341
9. Deo N, Rangesh A, Trivedi MM (2018) How would surround vehicles move? A unified framework for maneuver classification and motion prediction. IEEE Trans Intell Veh 3(2):129–140
10. Fushiki T (2011) Estimation of prediction error by using K-fold cross-validation. Stat Comput 21(2):137–146
11. Li Y, Shahabi C (2018) A brief overview of machine learning methods for short-term traffic forecasting and future directions.
Sigspatial Spec 10(1):3–9 12. Fang S, Zhang Q, Meng G, Xiang S, Pan C (2019). GSTNet: global spatial-temporal network for traffic flow prediction. In: IJCAI, pp 2286–2293 13. Zhao L, Liu Y, Al-Dubai AY, Zomaya AY, Min G, Hawbani A (2020) A novel generationadversarial-network-based vehicle trajectory prediction method for intelligent vehicular networks. IEEE Internet Things J 8(3):2066–2077 14. Messaoud K, Yahiaoui I, Verroust-Blondet A, Nashashibi F (2019) Non-local social pooling for vehicle trajectory prediction. In: 2019 IEEE ıntelligent vehicles symposium (IV). IEEE, pp 975–980 15. Jiang H, Chang L, Li Q, Chen D (2019) Trajectory prediction of vehicles based on deep learning. In: 2019 4th ınternational conference on ıntelligent transportation engineering (ICITE), pp 190–195. IEEE 16. Li X, Xia J, Chen X, Tan Y, Chen J (2022) SIT: a spatial interaction-aware transformer-based model for freeway trajectory prediction. ISPRS Int J Geo Inf 11(2):79 17. Yu B, Yin H, Zhu Z (2017) Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. arXiv:1709.04875 18. Gers FA, Schmidhuber J, Cummins F (2000) Learning to forget: continual prediction with LSTM. Neural Comput 12(10):2451–2471 19. Zheng C, Fan X, Wang C, Qi J (2020) Gman: a graph multi-attention network for traffic prediction. In: Proceedings of the AAAI conference on artificial intelligence, vol. 34, No. 01, pp 1234–1241 20. Li Y, Moura JM (2019) Forecaster: a graph transformer for forecasting spatial and timedependent data. arXiv:1909.04019
664
Manish et al.
21. Lin L, Li W, Bi H, Qin L (2021) Vehicle trajectory prediction using LSTMs with spatialtemporal attention mechanisms. IEEE Intell Transp Syst Mag 14(2):197–208 22. Sheng Z, Xu Y, Xue S, Li D (2022) Graph-based spatial-temporal convolutional network for vehicle trajectory prediction in autonomous driving. IEEE Trans Intell Transp Syst 23. Cao D, Li J, Ma H, Tomizuka M (2021) Spectral temporal graph neural network for trajectory prediction. In: 2021 IEEE ınternational conference on robotics and automation (ICRA). IEEE, pp 1839–1845 24. Messaoud K, Yahiaoui I, Verroust-Blondet A, Nashashibi F (2019) Relational recurrent neural networks for vehicle trajectory prediction. In: 2019 IEEE ıntelligent transportation systems conference (ITSC). IEEE, pp 1813–1818 25. Rossi L, Ajmar A, Paolanti M, Pierdicca R (2021) Vehicle trajectory prediction and generation using LSTM models and GANs. PLoS ONE 16(7):e0253868. https://doi.org/10.1371/journal. pone.0253868 26. Liu J, Mao X, Fang Y, Zhu D, Meng MQH (2021) A survey on deep-learning approaches for vehicle trajectory prediction in autonomous driving. In: 2021 IEEE ınternational conference on robotics and biomimetics (ROBIO). IEEE, pp 978–985 27. Pecher P, Hunter M, Fujimoto R (2016) Data-driven vehicle trajectory prediction. In: Proceedings of the 2016 ACM SIGSIM conference on principles of advanced discrete simulation, pp 13–22 28. Caesar H, Bankiti V, Lang AH, Vora S, Liong VE, Xu Q, Beijbom O (2020) Nuscenes: a multimodal dataset for autonomous driving. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11621–11631 29. EvalAI: Evaluating state of the art in AI (n.d.) EvalAI; eval.ai. https://eval.ai/web/challenges/ challenge-page/454/evaluation. Accessed 12 Aug 2022
A Lightweight Intrusion Detection Model for In-vehicular CAN Networks D. S. Divya Raj, G. Renjith, and S. Aji
Abstract The Intelligent Transport System (ITS) is an important development in this technological era. Effective communication in ITS is enabled by different network systems and technologies, and as a result, driverless vehicles are becoming the target of cyber-attacks. The CAN bus is mainly used in ITS for internal communication and is vulnerable to cyber-attacks. The CAN protocol gives minimal information for the design of intelligent intrusion-detection algorithms. The existing intrusion detection mechanisms are computationally expensive and may not be suitable for low-end ECUs in vehicles. In this work, we have given importance to the data part of the frame while preparing the dataset for the experiments. We have explored the performance of several classical machine learning algorithms, namely Random Forest, XGBoost, LightGBM, Naïve Bayes, and Decision Trees, with the refined dataset, and used the publicly available dataset 'Car Hacking: Attack and Defence Challenge' for the experimental evaluation. In the results, the Random Forest algorithm attained the most significant accuracy of 95% and an F1-score of 95%. Compared to the computationally complex algorithms, we could achieve comparable and significant results with classical machine learning algorithms, which can be easily ported to ECUs in vehicles. Keywords Intrusion detection · Controller area network · Random forest algorithm · Lightweight intrusion detection
D. S. Divya Raj · G. Renjith (B) · S. Aji Department of Computer Science, University of Kerala, Thiruvananthapuram, Kerala, India e-mail: [email protected] D. S. Divya Raj e-mail: [email protected] S. Aji e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_49
1 Introduction
ITS is redefining the ways we move on our roadways through the use of information and communication technology in the transport sector. The vehicular network is a challenging category that uses networking to strengthen communication between in-vehicle devices, and it is also used to enhance communication between vehicles and the infrastructure. There are many Electronic Control Units (ECUs) in ITS vehicles, and there is consistent communication between the ECUs. Intra-vehicular communication is mainly of two broad types, wired or wireless network based. The Controller Area Network (CAN) is a serial network technology designed mainly for the automobile industry; it was initially used in European-made cars, and now it is used in all ITS-related fields worldwide. Traditional intrusion detection algorithms use association rule mining to detect intrusion behaviors in the network [18]. CAN is an intra-vehicular network bus that connects and communicates with every component, like the ECU, audio system, airbag, or any other component that can be attached to a vehicle. Each ECU carries information and shares it with another ECU as and when needed, and there can be up to 70 ECUs placed in a car. Figure 1 shows the basic structure of a CAN message frame, and Table 1 gives the description of the elements in the CAN frame. Nowadays, cars are implemented with many technologies like Bluetooth, Wi-Fi networks, and smartphone plug-ins. CAN data packets are broadcast to all the nodes on the CAN bus, and this is one of the critical reasons for attempts to attack the CAN bus. This work comprehensively studies the different kinds of attacks
Fig. 1 CAN data frame
Table 1 Description of elements in the CAN frame
Start of Frame (SOF): Represents the beginning of the frame
Identifier (ID): Unique identity; indicates the intention of the message
Remote Transmission Request (RTR): Recessive in remote frames
Identifier Extension (IDE): For extended frame format
Reserved Bit (R): For possible use by a future amendment
Data Length Code (DLC): The length of the data
Data: The actual data
Cyclic Redundancy Check (CRC): For error detection
Acknowledgement (ACK): Receiver asserts dominant
End of Frame (EOF): End of the frame
Interframe Space (IFS): Must be recessive
that can be attempted in CAN. Our studies observed that there are insufficient features in the communication frame that can characterize the genuineness of a message. The publicly available dataset 'Car Hacking: Attack and Defence Challenge' [16] has been taken for experimental evaluation. We have concentrated on the data part of the frame to prepare data for the analysis. We have designed an intrusion detection system and experimented with various machine learning methods: Random Forest, XGBoost, LightGBM, Naïve Bayes, and Decision Tree. Each algorithm is evaluated with accuracy, F1-score, precision, and recall. The tree-based ensemble classifiers mostly give the best accuracy. The rest of the paper is organized as follows: Sect. 2 presents the literature review and the existing methods. The following section gives an overall idea of the proposed methodology. A detailed explanation of the dataset and experiments is given in the succeeding section. The conclusion of the work is presented in the last section.
2 Literature Review
Along with technological advancements, attempts to attack the CAN network are also increasing. This section discusses the related works regarding network intrusion detection in CAN. Compared to other areas, research in ITS is in its primitive stages, and only some significant works exist in CAN intrusion detection. Regarding earlier Intrusion Detection Systems, Hossain et al. [1] introduced a model based on Long Short-Term Memory (LSTM). Their model can detect network attacks and effectively categorize normal and attack data with a high accuracy of 99.99%. In this model, they implemented binary and multiclass classification models. Moreover, hyper-parameter tuning is also used to investigate the detection performance for attacks on CAN buses. Hiroyuki et al. [2] proposed a robust IDS using a Convolutional Neural Network (CNN) deep learning approach. The classifier is very efficient in detecting CAN bus attacks, with the highest accuracy of 99% and a detection rate of 99%; the experiment was mainly tested on four types of attack, i.e., DoS, fuzzing, RPM, and spoofing attacks. Hanselmann et al. [3] explored a neural network architecture on CAN for detecting intrusions. The experiment was based on a novel unsupervised learning approach, which could detect known and unknown intrusion scenarios. The accuracy of the model is 99%, and the detection rate is larger than 0.70. Ning et al. [4] introduced a LOF (Local Outlier Factor) based intrusion detection method that could diminish the false detection rate. They mainly focused on three types of attacks on two different vehicles and carried out intrusion detection investigations by employing the voltage feature of the CAN bus.
While evaluating the performance of the model with SVM, an average detection rate of 87.9% is reported for two independent vehicles, and the average false detection rate was about 3.77%, whereas the highest false detection rate was 17.5%. Javed et al. [5] explored a novel approach named CANintelliIDS, which enforced vehicle intrusion attack detection on the CAN bus. CANintelliIDS combines a CNN
and an attention-based Gated Recurrent Unit (GRU) system. The experiments were performed under three main types of attacks and evaluated with accuracy, precision, recall, and F1-score. DoS, fuzzy, impersonation attack, and attack-free states are detected with accuracies of 94.06%, 95.09%, 94.01%, and 93.79%, respectively, and the average F1-score was 94%. Chiscop et al. [6] implemented a machine learning-based intrusion detection method for CAN networks. This model gives a temporal convolutional network-based solution that learns the normal behavior of CAN signals and is able to differentiate them from malicious ones. Gmiden et al. [7] proposed a simple intrusion detection method for the CAN bus. This procedure focused on scanning the time intervals of CAN messages. The principal idea is to implement an IDS which identifies the CAN ID and then calculates the time interval from the most recent message with that ID. The time interval of the arriving message is compared with that of the last message, and an alert is raised if the interval is smaller than the normal one. Young et al. [8] implemented a technique that is based on constant message frequencies across vehicle driving modes and that does not need changes to the prevailing CAN network. Suwwan et al. [9] proposed intrusion detection for CAN using deep learning techniques. They implemented a simple 1D CNN, Long Short-Term Memory (LSTM), and GRU, and could score the maximum F1-score while limiting the data to the driving state. Yinan et al. [10] introduced a signature-based IDS for the in-vehicle CAN bus network. Yijie Xun et al. [11] designed an external IDS based on vehicle voltage signals, named VehicleEIDS. Yuchu et al. [12] proposed a hybrid similar neighborhood robust factorization machine model for CAN bus intrusion detection in the in-vehicle network. The model has AUC values of 0.9216 and 0.901 and AUPR values of 0.9194 and 0.9018 on two real datasets, respectively. Aliyu et al.
[13] implemented a blockchain-based federated forest for an SDN-enabled in-vehicle network IDS. Yang et al. [14] proposed a tree-based intelligent IDS for the Internet of Vehicles. Hyun et al. [15] implemented in-vehicle network intrusion detection using a deep CNN; the research utilized the datasets and deployed the system in a real vehicle to evaluate it. Sivaganesan et al. [17] implemented an efficient routing protocol with collision avoidance in vehicular networks, basically for handling the traffic in the network. Table 2 abstracts the studies we have carried out as part of the work in this paper. We have observed that the dataset 'Car Hacking: Attack and Defence Challenge' [16] has not been considered in the studies listed in Table 2, except for Suwwan et al. [9], who used a reduced dataset for training and testing derived from the original dataset [16]. In our proposed work, we have taken the entire dataset [16] for the analysis and implementation of the lightweight intrusion detection model.
Table 2 Comparison of works
Work | Method | Result
Javed et al. [5] | CNN and attention-based GRU | F1-score: 94.38%
Chiscop et al. [6] | CANet | Accuracy: 96%
Suwwan et al. [9] | LSTM | F1-score: 100%
Yinan et al. [10] | Signature-based system | Detection rate: 98.2%
Xun et al. [11] | Based on vehicle voltage signals | Accuracy: 97%
Aliyu et al. [13] | Blockchain-based | Detection rate: 98.1%
Hossain et al. [1] | LSTM algorithm | Accuracy: 99.99%
Hiroyuki et al. [2] | CNN deep learning algorithm | Accuracy: 99.99%
Hanselmann et al. [3] | Novel unsupervised learning approach | Accuracy: 99.99%
Yuchu et al. [12] | Hybrid similar neighborhood robust factorization machine model | AUC value of 0.92
Hyun et al. [15] | Deep convolutional neural network | F1-score: 99.85%
Ning et al. [4] | SVM | Accuracy: 87.9%
Young et al. [8] | Based on frequencies | Accuracy: 97%
Yang et al. [14] | Tree-based intelligent intrusion detection system | Accuracy: 99.86%
Gmiden et al. [7] | Simple intrusion detection method | Accuracy: 95%
3 Proposed Methodology
The proposed architecture consists of two main phases: data preparation and intrusion detection using machine learning methods. Figure 2 shows the detailed architecture and workflow of the proposed strategy. In the first phase, the data pre-processing step includes data splitting and conversion. The dataset basically contains timestamp, arbitration ID, DLC, data, class, and sub-class fields, but we excluded some of them and considered only the DLC, data, and class fields for analysis. The data field has hexadecimal values, each byte separated by a single space, and is split into eight entries. The hexadecimal data is further converted into decimal format. The data imbalance in the dataset may lead to inconsistent behavior in the machine learning algorithms; the random over-sampling task in our proposed architecture is intended to solve this problem. The training module gives a trained model as output after the training process of the concerned machine learning strategy. The test dataset is used to evaluate our proposed IDS, where the converted data is given to the trained model to detect intrusions.
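The splitting, padding, and hexadecimal-to-decimal conversion described above can be sketched as follows. This is a minimal illustration; the function name and the zero-padding convention are our assumptions, not the authors' exact code.

```python
def prepare_data_field(data_hex, n_bytes=8):
    """Split a space-separated hexadecimal CAN data field into n_bytes
    entries, pad missing bytes with 0, and convert each byte to decimal."""
    parts = data_hex.split()
    parts += ["00"] * (n_bytes - len(parts))      # zero-pad short frames
    return [int(b, 16) for b in parts[:n_bytes]]  # hex -> decimal

# A full 8-byte payload and a short payload that needs padding:
print(prepare_data_field("06 25 05 30 FF CF 71 55"))  # [6, 37, 5, 48, 255, 207, 113, 85]
print(prepare_data_field("85 A6"))                    # [133, 166, 0, 0, 0, 0, 0, 0]
```

Each resulting entry would then populate one of the eight data columns used as features by the classifiers.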
Fig. 2 Architecture diagram
4 Experiments and Result
4.1 Dataset
We have used the publicly available dataset 'Car Hacking: Attack and Defence Challenge 2020' [16] for the experiments. The dataset contains separate collections for training and testing data. The data is collected from driving and stationary vehicles, and the same is noted in the dataset. The driving data includes common motions in driving, like accelerating and turning the steering wheel. The stationary data is collected when the car engine is on but the vehicle is not moving, in park gear. There are six separate files in the dataset, and we have merged them all into one repository for ease of processing. There are six fields in the data: timestamp, arbitration ID, DLC, data, class, and subclass. The CAN message initiating time is referred to as the timestamp. The arbitration ID, also called the CAN ID, is the unique id of each CAN message; it is used to identify each message uniquely and is always a hexadecimal value. The data field represents the data or the message, which is in hexadecimal format. The length of the data field is called the DLC (Data Length Code). Class and subclass both show whether the CAN message belongs to attack or normal data; the subclass mentions the type of attack. The data field in the dataset has been taken for our analysis purpose. It contains hexadecimal data, which is converted into decimal format for the machine learning algorithms. Padding of 0's in the missing data part is also carried out before splitting it into eight chunks. Each byte of data is assigned to a column from
Table 3 Dataset format
DLC | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | Class
8 | 06 | 25 | 05 | 30 | FF | CF | 71 | 55 | 0
8 | 85 | A6 | B4 | 52 | 6D | CB | 0E | 74 | 1

Table 4 Data description before class balancing
Data | Training | Testing
Total data | 3672151 | 2097150
Normal data | 3372743 | 1874773
Attack data | 299408 | 222377
D0 to D7. Normal data is marked as 0 and attack data as 1 in the class field. There are ten fields in the dataset that we have used for the machine learning activities: DLC, Data [D0-D7], and Class, which are represented in Table 3. The class field indicates the anomalies injected into the CAN bus, which can be of four kinds: Flooding, Spoofing, Replay, and Fuzzing. A flooding attack mainly focuses on denying the services of the CAN bus by continuously sending many messages, so that all the nodes on the bus become unusable and the service of the CAN bus slows down. In a spoofing attack, an attacker injects a message to control a desired function after reverse-engineering the vehicle traffic; detecting a spoofing attack is very difficult. In a replay attack, messages can be stored for a long time and broadcast after the validity of the message expires. In a fuzzing attack, the attacker has only insufficient information about the CAN messages and injects messages randomly to complicate the system's working. The training dataset has a total collection of 3672151 samples, of which 3372743 are normal data and 299408 belong to attacks. After merging the two files in the testing dataset, we obtained a total of 2097150 testing samples, with 1874773 and 222377 samples in the normal and attack categories. There is a considerable difference between the distribution of samples for the classes in both the training and testing datasets, as mentioned in Table 4. This kind of unbalanced dataset may lead to the minority class being avoided completely in the learning phase. It is a severe issue in machine learning because the minority classes should have a relevant contribution in the knowledge refinement phase and in the reliability of the predictions at later stages. The Random Over Sampling technique converts the unbalanced dataset into a balanced one.
Random oversampling involves randomly choosing examples from the minority class and adding them to the training dataset. The increase in the number of samples for the minority class can also result in a hike in the computational cost, because the model has to consider the same samples in many situations within a single iteration. We applied Random Over Sampling to the training data and got a total of 6745486 training samples: 3372743 normal data and 3372743 data in the attack class. In the 3749546 samples of testing data, 1874773 were found to be normal and 1874773 in the attack group, as mentioned in Table 5.

Table 5 Data description after balancing
Data | Training | Testing
Total data | 6745486 | 3749546
Normal data | 3372743 | 1874773
Attack data | 3372743 | 1874773
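The random-oversampling step described above can be sketched as follows. This is a minimal stand-in written from scratch (the function name and seed handling are our assumptions); a real pipeline might use a library such as imbalanced-learn instead.

```python
import random

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(v) for v in by_class.values())
    X_res, y_res = [], []
    for label, samples in by_class.items():
        extra = [rng.choice(samples) for _ in range(target - len(samples))]
        for xi in samples + extra:
            X_res.append(xi)
            y_res.append(label)
    return X_res, y_res

X = [[0], [1], [2], [3], [4]]
y = [0, 0, 0, 0, 1]            # 4 normal samples vs. 1 attack sample
X_res, y_res = random_oversample(X, y)
print(y_res.count(0), y_res.count(1))  # 4 4
```

Applied to the training set, this is what turns the 3372743/299408 split of Table 4 into the balanced 3372743/3372743 split of Table 5.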
4.2 Experiment with ML Algorithms and Results Analysis
We conducted a set of detailed experiments to validate the efficiency of the proposed strategy. Three different sets of experiments with different combinations of training and testing datasets were conducted throughout the work. Five classical machine learning algorithms were selected for intrusion detection. We found that there are a couple of new-generation algorithms in the machine learning paradigm, but most of them are computationally expensive. We are trying to explore the theoretical stability of the selected ML algorithms with the dataset we have refined from the original dataset. The Random Forest (RF) is one of the algorithms in our bucket of ML methods. The RF makes good predictions, handles large datasets effectively and consistently, can give high accuracy over large datasets, and does not require data scaling. LightGBM is the second method in the list; it is a histogram-based algorithm that is very quick in the training procedure. It uses minimal memory because it replaces continuous values with discrete bins, and it is compatible with large datasets. XGBoost is a tree-based and highly flexible ensemble ML algorithm. It has an in-built capability to handle missing values in the dataset and permits the user to execute cross-validation at every cycle of the boosting process. After XGBoost, the Decision Tree algorithm comes to the list. The Decision Tree is a very simple and efficient ML method. Naïve Bayes can easily handle both continuous and discrete data; this algorithm is computationally fast and thus easily comes to our list of algorithms. We used the balanced training and testing datasets in the first set of experiments. The overall result of the algorithms is shown in Fig. 3. It is noted from the experiments that the random forest classifier performed well compared with the other models.
The best values across all performance measures were achieved by RF, with an accuracy of 94%, an F1-score of 95%, and a precision of 97%. It is interesting to note that the precision of most of the methods, except Naïve Bayes,
Fig. 3 Performance of different ML algorithms with balanced training and testing datasets
Fig. 4 AUC and ROC curve of random forest algorithm
is recorded above 90%, with a best of 97% for RF. We can see that the decision tree algorithm also performed well in the experiments and obtained better performance than the other three methods in everything except recall. In the experiments, we also evaluated the performance of the classification models using the AUC-ROC curve, which helps us determine the capability of our model. Compared with the other models, the random forest classifier achieved the best AUC value, and its AUC-ROC curve is shown in Fig. 4. The LightGBM and XGBoost algorithms also performed well, recording AUC values above 90%; their AUC-ROC curves are shown in Fig. 5. The decision tree and Naïve Bayes algorithms achieved AUC values below 90%, and their AUC-ROC curves are shown in Fig. 6. The detection time of each algorithm is very important in evaluating performance and complexity. The random forest algorithm has a detection time of 0.2 s, which is the minimum compared to the other ML algorithms. The highest detection time is observed for the XGBoost algorithm, around 1.2 s. This clearly shows that our model is computationally simple and lightweight while keeping better performance. The detection times of all five algorithms are shown in Fig. 7.
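The four evaluation measures used above can be computed directly from the confusion-matrix counts. The sketch below is an illustrative stand-alone implementation (labels 1 = attack, 0 = normal are our convention); in practice library routines such as those in scikit-learn compute the same quantities.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1-score for a binary IDS,
    with attack = 1 and normal = 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # one missed attack, one false alarm
acc, prec, rec, f1 = binary_metrics(y_true, y_pred)
print(acc, prec, rec, f1)  # 0.75 0.75 0.75 0.75
```

Detection time, the remaining measure, is simply the wall-clock time of the model's predict call on the test set.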
Fig. 5 AUC and ROC curve of LightGBM and XGBoost algorithm
Fig. 6 AUC and ROC curve of decision tree and Naïve Bayes algorithms
Fig. 7 Detection time (in sec) for different algorithms
Table 6 Tenfold cross-validation result (Training dataset)
Folds | Accuracy (%) | F1-score (%) | Recall (%) | Precision (%)
1st Fold | 93.17 | 93.13 | 92.73 | 93.46
2nd Fold | 94.75 | 94.65 | 92.76 | 96.62
3rd Fold | 94.80 | 94.70 | 92.83 | 96.64
4th Fold | 94.84 | 94.70 | 92.75 | 96.75
5th Fold | 94.82 | 94.71 | 92.72 | 96.77
6th Fold | 94.18 | 94.09 | 92.80 | 96.41
7th Fold | 94.62 | 94.52 | 92.80 | 96.31
8th Fold | 94.79 | 94.68 | 92.78 | 96.65
9th Fold | 94.81 | 94.70 | 92.79 | 96.69
10th Fold | 94.89 | 94.78 | 92.84 | 96.82
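The tenfold protocol summarised in Table 6 can be sketched as follows. This is a generic k-fold index splitter written for illustration (the shuffling and seed are our assumptions, not the authors' exact procedure).

```python
import random

def kfold_indices(n, k=10, seed=42):
    """Yield (train, test) index lists for k-fold cross-validation:
    shuffle the indices once, cut them into k folds, and use each fold
    in turn as the held-out test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(kfold_indices(100, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 90 10
```

For each of the ten splits, the classifier is retrained on the train indices and scored on the test indices, giving one row of Table 6.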
Fig. 8 Graphical representation of tenfold cross-validation on a testing dataset
In order to measure the consistency of the proposed method, we conducted another series of experiments with the training and testing datasets separately. In the first exercise, we conducted tenfold cross-validation with the training dataset; the results obtained are given in Table 6. Similarly, the test dataset alone was used in the subsequent tenfold cross-validation experiments, whose results are shown in Fig. 8. The results of the experiments with the 6745486 samples of the training dataset in tenfold cross-validation show the consistency of our proposed method. We can see that the best accuracy noted in the ten iterations is 94.89 and the worst is 93.17, which are only about 1.72% apart. Almost the same pattern is seen in the F1-score as well. The recall was the most consistent measure in the experiments, and only the precision gave a higher margin of 3.36%. In the experiments with test data, a total sample of
Fig. 9 Graphical representation of tenfold cross-validation on combined dataset
3749546 is used for the tenfold cross-validation. We can see an almost similar pattern in Fig. 8, where the highest margin in the results was noted in precision, with 1.14%, and the lowest margin was recorded in the F1-score, with 0.84%. These figures explain the consistency of the proposed method throughout the different combinations of datasets for both training and testing separately. We prepared the next experimental setup to enlarge the volume of the dataset by combining the training and testing datasets. With more than one crore (ten million) samples in the collection, we executed cross-validation to test the consistency of our strategy by combining the characteristics of the test data with the training dataset. It is observed in Fig. 9 that the performance of recall is very consistent, with only a 0.1% difference between the best and worst performances. The precision has a bit of inconsistency, with a 3.42% deviation, and the deviation is below 2% for the other two measures. The results with this big collection of data show the consistency of the proposed method. To maximize model performance, we did hyper-parameter tuning on the five ML algorithms and got better results. All algorithms showed improvement in the accuracy measure. Better accuracy was obtained for the random forest algorithm, with a 0.14% improvement. The highest improvement, 3.12%, was observed when tuning the LightGBM algorithm. Details of the tuning parameters and the corresponding results of the different algorithms are presented in Table 7.
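The tuning procedure behind Table 7 can be sketched as an exhaustive grid search. The grid below mirrors the random forest parameter names from Table 7, while `toy_score` is a deliberately hypothetical placeholder: a real run would train the classifier with each parameter combination and return its cross-validated accuracy instead.

```python
from itertools import product

grid = {
    "max_depth": [None, 55, 100],
    "n_estimators": [61, 100],
    "min_samples_leaf": [1, 2],
}

def toy_score(params):
    # Placeholder scoring function for illustration only: a real search
    # would fit the model with `params` and return validation accuracy.
    return sum(v for v in params.values() if isinstance(v, int))

# Enumerate every combination in the grid and keep the best-scoring one.
candidates = [dict(zip(grid, combo)) for combo in product(*grid.values())]
best = max(candidates, key=toy_score)
print(best)
```

With 3 x 2 x 2 = 12 combinations the search is cheap here; for the larger grids implied by Table 7, a randomized search over the same space is a common alternative.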
5 Conclusion
Nowadays, cars are getting more connected through ECUs and their networks, with new applications that open the way for attacks on the CAN bus of automobiles. Such attacks may cause serious harm to traffic safety and security. To effectively identify those attacks, we have presented five lightweight algorithms to detect intrusions in a CAN network and verified which algorithm can effectively identify the anomalies from the refined dataset. We implemented Random Forest, XGBoost, LightGBM, Naïve Bayes, and Decision Tree on a total of 10495032 samples. From the results, we can
Table 7 The accuracy before and after tuning
Algorithm | Default values | Default accuracy (%) | Tuned values | Accuracy after tuning (%)
Random Forest | max_depth = None, n_estimators = 100, min_samples_leaf = 1 | 95 | max_depth = 55, n_estimators = 61, min_samples_leaf = 2 | 95.14
XGBoost | max_depth = 3, gamma = 0, reg_alpha = 0, min_child_weight = 1 | 85.80 | max_depth = 12, gamma = 8, reg_alpha = 161, min_child_weight = 2 | 86.68
LightGBM | num_leaves = 31, feature_fraction = 1.0, bagging_freq = 0, min_child_samples = 20 | 86 | num_leaves = 250, feature_fraction = 0.595, bagging_freq = 7, min_child_samples = 100 | 89.12
Decision tree | max_depth = None, splitter = best | 87.08 | max_depth = 100, splitter = random | 89.27
Naïve Bayes | var_smoothing = 1e-9 | 60.20 | var_smoothing = 0.01 | 62.94
conclude that the Random Forest algorithm is the best because it is able to detect intrusions and performs well on all sets of measurements, with accuracy and F1-score of 95%. Moreover, the consistency of the proposed model has also been verified with the cross-validation test. The performance of recall is very consistent, with only a 0.1% difference between the best and worst performances.
References
1. Hossain MD, Inoue H, Ochiai H, Fall D, Kadobayashi Y (2020) Long short-term memory-based intrusion detection system for in-vehicle controller area network bus. In: IEEE 44th annual computers, software, and applications conference
2. Hossain MD, Inoue H, Ochiai H, Fall D, Kadobayashi Y (2020) An effective in-vehicle CAN bus intrusion detection system using CNN deep learning approach. In: IEEE global communications conference
3. Hanselmann M, Strauss T, Dormann K, Holger U (2020) CANet: an unsupervised intrusion detection system for high dimensional CAN bus data. IEEE Access
4. Jing N, Wang J, Liu J, Kato N (2019) Attacker identification and intrusion detection for in-vehicle networks. IEEE Commun Lett 23(11)
5. Javed AR, Rehman SU, Ullahkhan MU, Alazab M, Reddy T (2021) CANintelliIDS: detecting in-vehicle intrusion attacks on a controller area network using CNN and attention-based GRU. IEEE Trans Netw Sci Eng 8(2)
6. Chiscop I, Gazdag A, Bosman J, Biczok G (2021) Detecting message modification attacks on the CAN bus with temporal convolutional networks. In: 7th international conference on vehicle technology and intelligent transport systems
7. Gmiden M, Gmiden MH, Trabelsi H (2016) An intrusion detection method for securing in-vehicle CAN bus. In: 17th international conference on sciences and techniques of automatic control & computer engineering
8. Young C, Olufowobi H, Bloom G, Zambreno J (2019) Automotive intrusion detection based on constant CAN message frequencies across vehicle driving modes. In: ACM workshop on automotive cybersecurity. https://doi.org/10.1145/3309171.3309179
9. Suwwan R, Alkafri S, Elsadek L, Afifi K, Zualkerman I, Aloul F (2021) Intrusion detection for CAN using deep learning techniques. In: Proceedings of the international conference on applied cyber security
10. Shiiyi J, Yinan X, Chung JG (2021) Signature-based intrusion detection system (IDS) for in-vehicle CAN bus network. In: IEEE international symposium on circuits and systems
11. Xun Y, Zhao Y, Liu J (2021) VehicleEIDS: a novel external intrusion detection system based on vehicle voltage signals. IEEE Internet Things J
12. He Y, Jia Z, Hu M, Cui C, Cheng Y, Yang Y (2020) The hybrid similar neighborhood robust factorization machine model for CAN bus intrusion detection in the in-vehicle network. IEEE Trans Intell Transp Syst
13. Aliyu I, Feliciano MC, Engelenburg SV, Kim DO, Lim CG (2021) A blockchain-based federated forest for SDN-enabled in-vehicle network intrusion detection system. IEEE Access. https://doi.org/10.1109/ACCESS.2021.3094365
14. Yang L, Moubayed A, Hamieh I, Shami A (2019) Tree-based intelligent intrusion detection system in internet of vehicles. In: IEEE global communications conference
15. Song HM, Woo J, Kim HK (2020) In-vehicle network intrusion detection using deep convolutional neural network. Veh Commun 21
16. Kim HK (2021) Car hacking: attack & defense challenge 2020 dataset. https://ieee-dataport.org/open-access/car-hacking-attack-defense-challenge-2020
17. Sivaganesan D (2019) Efficient routing protocol with collision avoidance in vehicular networks. J Ubiquitous Comput Commun Technol
18. Xiao Y, Xing C, Zhang T, Zhao Z (2019) An intrusion detection model based on feature reduction and convolutional neural networks. In: Artificial intelligence for physical-layer wireless communications
Vein Pattern-Based Species Classification from Monocotyledonous Leaf Images with Deep Transfer Learning Abdul Hasib Uddin, Sharder Shams Mahamud, Abdullah Al Noman, Prince Mahmud, and Abu Shamim Mohammad Arif
Abstract Species classification based on vein patterns from Dicotyledonous leaf images is a popular recent research topic. However, no comparable work exists for Monocotyledonous plants. In this paper, we develop our own dataset containing center-focused leaf images from three Monocotyledonous plant species, namely Cocos nucifera, Eichhornia crassipes, and Musa musa. For benchmarking, we apply twenty-two renowned deep learning models with ImageNet-based transfer learning. For four models, DenseNet121, ResNet50, InceptionResNetV2, and Xception, the validation accuracy on our dataset exceeded 90%. Moreover, we provide visualizations of the extracted features from the last ReLU activation layer using Grad-CAM, Grad-CAM++, Score-CAM, Faster-Score-CAM, and Guided Back-propagation. The results and visualizations indicate that Monocotyledonous plant species can be efficiently classified solely from their leaf vein patterns. This work will enable researchers to explore monocotyledonous plant species classification further. Keywords Vein pattern · Monocotyledonous plants · Deep learning · Transfer learning · Feature visualization
1 Introduction There are around 400,000 plant species in the world [1]. Identifying such a massive number of plants is difficult, so classification by botanists or other specialists in the field is not practical at scale. Moreover, some plants share many common properties, which makes the classification process much harder. Hence, plant classification and identification using computing devices have attracted much attention from the research community. Currently, deep learning occupies a significant place in machine learning, because it extracts important features automatically and does not need A. H. Uddin (B) · S. S. Mahamud · A. Al Noman · P. Mahmud · A. S. M. Arif Computer Science and Engineering Discipline, Khulna University, Khulna, Bangladesh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_50
manual feature extraction. Deep learning algorithms can also achieve much higher accuracy. In leaf classification, leaf shape is generally used as the dominant feature. Nonetheless, leaves also provide additional features, such as the type of venation. We built our dataset on the premise that the order of venation can be a dominant feature for classifying leaf species. Leaf veins act as the fingerprint of a plant species: no two plant species have the same leaf vein pattern. Therefore, we hypothesize that species can be determined from leaf veins, even for monocotyledonous plants, whose parallel venation can itself act as a dominant feature. In this paper, we present a new leaf vein image dataset of monocotyledonous plants available in Bangladesh. In the rest of the paper, we present related works in Sect. 2. Section 3 describes dataset preparation in detail. The methodology is described in Sect. 4, the corresponding results along with a comparison of model performances are presented in Sect. 5, and related discussions along with class activation map visualizations are provided in Sect. 6. Finally, Sect. 7 concludes the research and mentions some possible future contributions.
2 Related Works Recently, the classification of plants from leaf images has become an efficient means of agricultural research. Researchers have used different classification methods and numerous types of features, with promising outcomes in related fields. Characteristics such as color variation, aspect ratio, and the vein pattern of leaves have been used to classify specific plants, and both machine learning and deep learning algorithms have been applied to plant species recognition. Many researchers apply machine learning to classify different types of leaves. Ali et al. proposed an efficient scheme to identify leaf types using RBFNN, SVM, and SAA-based SVM [2]. Many researchers also use deep learning techniques to classify plant leaves. Jing et al. proposed a deep learning-based technique in which leaf images were pre-processed and features were extracted by pre-trained AlexNet, fine-tuned AlexNet, and D-Leaf models [3]. These features were then classified by five different machine learning methods. A new five-layer CNN architecture for leaf classification was developed by Habibollah et al. [4], with ReLU/ELU used as the activation function after each convolution layer. Saleem et al. presented a novel segmentation scheme that segments disease from mango leaves by considering the vein pattern of the leaves [5]. Recently, the classification of plants from leaf vein parameters was studied by Guruprasad et al. [6], who analyzed venation patterns in leaf images of three different trees, namely Jackfruit, Mango, and Peepal. A machine learning-based method of plant species identification using leaf venation was proposed by Ambarwari et al. [7]. Grinblat et al. have shown that the classification of legume species (white bean, red bean, and soybean)
can be done using leaf vein image features [8]. Lee et al. proposed a deep learning approach to quantify discriminative leaf features [9], showing that for dicotyledonous leaves, shape is not the dominant feature; rather, the different orders of venation are the most important feature. However, the existing works do not include monocotyledonous plants. We show in this study that for monocotyledonous leaves, different orders of venation may also be the key feature. Huang et al. proposed a densely connected network architecture for image classification [10]. Yu et al. deployed DenseNet201 for breast cancer diagnosis [11]. A recursive network was proposed by Lin et al. [12]. He et al. applied ResNets to image classification [13]. Manoharan established a herbal plant classifier utilizing a double-layer authentication process [14]. Dhaya applied machine learning models to identify Fusarium oxysporum from tomato plant leaves [15].
3 Dataset Preparation The steps we have followed in this work are shown in Fig. 1.
3.1 Data Collection We collected leaves from three different plant species (Cocos nucifera, Eichhornia crassipes, and Musa musa) of the Monocotyledonous group. We considered three growth stages of leaves: young, mid-aged, and old. Each set of leaf images has two subsets based on the leaf side (front and reverse). We show sample images from each plant in Fig. 2. Hardware augmentation was performed to collect these image data. We used a high-frequency camera to capture video. We placed each leaf on white paper and Fig. 1 Block schematic of the workflow
Fig. 2 Sample images of a Cocos nucifera, b Eichhornia crassipes, and c Musa musa
rotated the paper while capturing the video. Then, we extracted images at a one-frame-per-second rate using Python scripting and OpenCV. After that, we cropped the center patch from each extracted image using NumPy slicing so that no image holds the entire leaf shape. This was done to ensure that the neural networks cannot focus on any geometric features. The statistics of our Monocotyledonous leaf vein dataset are provided in Table 1. The entire dataset contains a total of 24,000 images: 9000 from Cocos nucifera, 9000 from Eichhornia crassipes, and 6000 from Musa musa. For Cocos nucifera, there are six types of images, namely front-young, front-middle-aged, front-old, reverse-young, reverse-middle-aged, and reverse-old, with 1500 images per type. Similarly, Eichhornia crassipes has six types of images, each containing 1500 images. Musa musa has four types of images (front-young, front-old, reverse-young, and reverse-old), each containing 1500 images. The dataset can be found here: https://doi.org/10.34740/kaggle/dsv/2842086
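The center-patch cropping step described above can be sketched with plain NumPy slicing. The helper name and the frame dimensions below are our own choices for illustration; only the 384 × 384 patch size comes from the dataset description.

```python
import numpy as np

def center_crop(frame: np.ndarray, size: int = 384) -> np.ndarray:
    """Crop a size x size patch from the center of an H x W (x C) frame."""
    h, w = frame.shape[:2]
    if h < size or w < size:
        raise ValueError("frame is smaller than the requested patch")
    top = (h - size) // 2
    left = (w - size) // 2
    return frame[top:top + size, left:left + size]

# Example: a hypothetical 1080 x 1920 RGB video frame yields a 384 x 384 x 3 patch.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
patch = center_crop(frame)
print(patch.shape)  # (384, 384, 3)
```

Because the crop is a pure slice, it never contains the leaf outline, which is what prevents the networks from exploiting geometric (shape) features.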
3.2 Filename Format Every group of images is saved in a separate folder. Each file is named as follows: <cotyledon-type>_<species-first-name>_<leaf-age>_<leaf-side>.<channel>.<image-height>×<image-width>p.<image_no>.<file-format> For example, the file name ‘Monocot_Cocos_midAged_front.rgb.384 × 384p.1.png’ denotes an image from the Monocot group and the species Cocos nucifera, taken from the front side of a mid-aged leaf; the image has 3 (RGB) channels, and its height and width are 384 and 384, respectively.
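The filename format can be parsed back into its fields with a short regular expression. This parser is our own illustration, not part of the dataset tooling; it assumes the multiplication sign '×' between height and width, as in the example filename.

```python
import re

# Pattern for names such as 'Monocot_Cocos_midAged_front.rgb.384 × 384p.1.png'.
FILENAME_RE = re.compile(
    r"(?P<cotyledon>[^_]+)_(?P<species>[^_]+)_(?P<age>[^_]+)_(?P<side>[^.]+)"
    r"\.(?P<channel>[^.]+)\.(?P<height>\d+)\s*×\s*(?P<width>\d+)p"
    r"\.(?P<image_no>\d+)\.(?P<fmt>\w+)"
)

def parse_leaf_filename(name: str) -> dict:
    """Split a dataset filename into its named fields."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized filename: {name}")
    return m.groupdict()

info = parse_leaf_filename("Monocot_Cocos_midAged_front.rgb.384 × 384p.1.png")
print(info["species"], info["age"], info["side"])  # Cocos midAged front
```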
Table 1 Statistics for the Monocotyledonous leaf vein dataset. Each image in this dataset is of size 384 × 384 with 3 channels (RGB)

| Cotyledon-type | Species | Leaf age | Leaf side | # images |
|---|---|---|---|---|
| Monocotyledon | Cocos nucifera | Young | Front | 1500 |
| | | Mid-aged | Front | 1500 |
| | | Old | Front | 1500 |
| | | Young | Reverse | 1500 |
| | | Mid-aged | Reverse | 1500 |
| | | Old | Reverse | 1500 |
| | Total from Cocos nucifera | | | 9000 |
| | Eichhornia crassipes | Young | Front | 1500 |
| | | Mid-aged | Front | 1500 |
| | | Old | Front | 1500 |
| | | Young | Reverse | 1500 |
| | | Mid-aged | Reverse | 1500 |
| | | Old | Reverse | 1500 |
| | Total from Eichhornia crassipes | | | 9000 |
| | Musa musa | Young | Front | 1500 |
| | | Mid-aged | Front | 0 |
| | | Old | Front | 1500 |
| | | Young | Reverse | 1500 |
| | | Mid-aged | Reverse | 0 |
| | | Old | Reverse | 1500 |
| | Total from Musa musa | | | 6000 |
| Total from Monocotyledon | | | | 24,000 |
4 Methodology We applied 22 ImageNet pre-trained models to our dataset. In this section, we describe the dataset split and a few of the model structures.
4.1 Dataset Preparation We split our developed dataset as described in Sect. 3 into 50% training (12,000 images) and 50% validation (12,000 images). Thus, from ‘Cocos nucifera’, we have split the dataset into 4500 images for training and 4500 images for validation; similarly, from ‘Eichhornia crassipes’, 4500 for training and 4500 images for validation; finally, from ‘Musa musa’, 3000 images for training and 3000 images for validation.
Each image’s height and width are 384 and 384, respectively. The images contain three channels, all of which are used during model training. In residual blocks, the features from previous layers are added, while in dense blocks the features from previous layers are concatenated to preserve the most important information. Thus, both ResNets and DenseNets perform effectively on many visual recognition tasks.
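The difference between the two feature-fusion styles can be illustrated with a small NumPy sketch (shapes are our own illustrative choices): element-wise addition keeps the channel dimension fixed, while concatenation grows it, so dense blocks accumulate earlier features.

```python
import numpy as np

x = np.random.rand(1, 8, 8, 32)   # a feature map: batch, height, width, channels
fx = np.random.rand(1, 8, 8, 32)  # output of some transform of x (same shape)

residual = x + fx                         # ResNet-style fusion: element-wise add
dense = np.concatenate([x, fx], axis=-1)  # DenseNet-style fusion: channel concat

print(residual.shape)  # (1, 8, 8, 32) -- channel count unchanged
print(dense.shape)     # (1, 8, 8, 64) -- channels accumulate
```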
5 Experimental Results We trained several deep learning models with a batch size of 32 on our dataset and achieved promising performance. Figures 3 and 4 present the training and validation loss curve and accuracy curve, respectively, of one of our best-performing models (DenseNet201). The summary of the performances of the applied models is provided in Table 2. Each model was pre-trained on the ImageNet dataset, and we trained every model on Fig. 3 Training and validation loss curve of DenseNet201
Fig. 4 Training versus validation accuracy curve of DenseNet201 model
Table 2 Results for plant species identification using 22 deep learning models. The last three columns give the number of correctly classified images from the validation set per class

| Model | # Layers | Test accuracy | Test loss | Cocos nucifera (of 4500) | Eichhornia crassipes (of 4500) | Musa musa (of 3000) |
|---|---|---|---|---|---|---|
| VGG16 | 21 | 0.8404 | 2.2638 | 4448 | 4164 | 1473 |
| DenseNet121 | 429 | 0.9182 | 0.2226 | 4443 | 3948 | 2627 |
| DenseNet169 | 597 | 0.8231 | 0.7769 | 4497 | 3963 | 1417 |
| DenseNet201 | 709 | 0.7894 | 1.409 | 4487 | 4486 | 500 |
| ResNet50 | 177 | 0.9252 | 0.2184 | 4423 | 4403 | 2276 |
| ResNet50V2 | 192 | 0.8827 | 0.4753 | 4493 | 3680 | 2420 |
| ResNet101 | 347 | 0.8693 | 0.4928 | 4494 | 4465 | 1473 |
| ResNet152 | 517 | 0.7957 | 0.5006 | 4192 | 3814 | 1542 |
| ResNet152V2 | 566 | 0.8693 | 0.3697 | 4442 | 3757 | 2233 |
| InceptionResNetV2 | 782 | 0.9019 | 0.3304 | 4485 | 4481 | 1857 |
| Inception V3 | 313 | 0.8363 | 0.5933 | 4385 | 4499 | 1152 |
| Xception | 134 | 0.9113 | 0.5241 | 4497 | 4499 | 1940 |
| MobileNet | 88 | 0.8656 | 1.0646 | 4489 | 4490 | 1408 |
| MobileNetV2 | 156 | 0.8687 | 1.1449 | 4287 | 4198 | 1940 |
| EfficientNetB0 | 239 | 0.375 | 21,755.4746 | 0 | 4500 | 0 |
| EfficientNetB1 | 341 | 0.375 | 568,934.9375 | 4500 | 0 | 0 |
| EfficientNetB2 | 341 | 0.375 | 589,224.0625 | 4500 | 0 | 0 |
| EfficientNetB4 | 386 | 0.375 | 1773.0724 | 4500 | 0 | 0 |
| EfficientNetB3 | 476 | 0.3859 | 73,714.0859 | 4500 | 0 | 131 |
| EfficientNetB5 | 578 | 0.2608 | 5646.2075 | 247 | 1161 | 1722 |
| EfficientNetB6 | 668 | 0.375 | 80,638.2734 | 4500 | 0 | 0 |
| EfficientNetB7 | 815 | 0.3861 | 1225.3704 | 4489 | 0 | 144 |
our dataset until no improvement in validation loss was observed for 5 consecutive epochs. Table 2 summarizes the final results of monocotyledon plant identification using the 22 deep learning models, and Table 3 summarizes the precision, recall, and F1-score values for the deployed models on our dataset.
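The stopping criterion ("no improvement in validation loss for 5 consecutive epochs") can be expressed as a small tracker. This is a sketch of the usual early-stopping logic, not the authors' exact training code; the class name and the example loss sequence are our own.

```python
class EarlyStopping:
    """Stop training when validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience: int = 5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=5)
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.70]
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        print(f"stopping after epoch {epoch}")  # stopping after epoch 7
        break
```

Note that the last improvement (0.70 at epoch 8) is never seen: the run stops at epoch 7, after the fifth consecutive non-improving epoch.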
6 Discussion To examine performance, we applied different existing models to our own dataset and obtained above 90% accuracy with four models: DenseNet121 (91.82%), ResNet50 (92.52%), InceptionResNetV2 (90.19%), and Xception
Table 3 Performance metrics (precision P, recall R, F1-score) for the applied models on our dataset

| Model | Cocos nucifera P | R | F1 | Eichhornia crassipes P | R | F1 | Musa musa P | R | F1 |
|---|---|---|---|---|---|---|---|---|---|
| VGG16 | 0.95 | 0.99 | 0.97 | 0.73 | 0.93 | 0.82 | 0.92 | 0.49 | 0.64 |
| DenseNet121 | 1.00 | 0.99 | 0.99 | 0.90 | 0.88 | 0.89 | 0.82 | 0.88 | 0.85 |
| DenseNet169 | 0.85 | 1.00 | 0.92 | 0.79 | 0.88 | 0.83 | 0.82 | 0.47 | 0.60 |
| DenseNet201 | 1.00 | 1.00 | 1.00 | 0.64 | 1.00 | 0.78 | 1.00 | 0.17 | 0.29 |
| ResNet50 | 1.00 | 0.98 | 0.99 | 0.86 | 0.98 | 0.91 | 0.93 | 0.76 | 0.84 |
| ResNet50V2 | 1.00 | 1.00 | 1.00 | 0.86 | 0.82 | 0.84 | 0.75 | 0.81 | 0.78 |
| ResNet101 | 1.00 | 1.00 | 1.00 | 0.75 | 0.99 | 0.85 | 0.98 | 0.49 | 0.65 |
| ResNet152 | 0.92 | 0.93 | 0.92 | 0.68 | 0.85 | 0.76 | 0.84 | 0.51 | 0.64 |
| ResNet152V2 | 0.96 | 0.99 | 0.97 | 0.82 | 0.83 | 0.83 | 0.80 | 0.74 | 0.77 |
| InceptionResNetV2 | 1.00 | 1.00 | 1.00 | 0.80 | 1.00 | 0.89 | 0.99 | 0.62 | 0.76 |
| Inception V3 | 1.00 | 0.97 | 0.99 | 0.70 | 1.00 | 0.82 | 1.00 | 0.38 | 0.55 |
| Xception | 1.00 | 1.00 | 1.00 | 0.81 | 1.00 | 0.90 | 1.00 | 0.65 | 0.79 |
| MobileNet | 1.00 | 1.00 | 1.00 | 0.74 | 1.00 | 0.85 | 1.00 | 0.47 | 0.64 |
| MobileNetV2 | 1.00 | 0.95 | 0.98 | 0.78 | 0.93 | 0.85 | 0.83 | 0.65 | 0.73 |
| EfficientNetB0 | 0.00 | 0.00 | 0.00 | 0.38 | 1.00 | 0.55 | 0.00 | 0.00 | 0.00 |
| EfficientNetB1 | 0.38 | 1.00 | 0.55 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| EfficientNetB2 | 0.38 | 1.00 | 0.55 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| EfficientNetB4 | 0.38 | 1.00 | 0.55 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| EfficientNetB3 | 0.38 | 1.00 | 0.55 | 0.00 | 0.00 | 0.00 | 1.00 | 0.04 | 0.08 |
| EfficientNetB5 | 0.25 | 0.05 | 0.09 | 0.25 | 0.26 | 0.25 | 0.27 | 0.57 | 0.37 |
| EfficientNetB6 | 0.38 | 1.00 | 0.55 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| EfficientNetB7 | 0.39 | 1.00 | 0.56 | 0.00 | 0.00 | 0.00 | 0.30 | 0.05 | 0.08 |
(91.13%). We also obtained above 80% accuracy with VGG16 (84.04%), VGG19 (88.59%), DenseNet169 (82.31%), ResNet50V2 (88.27%), ResNet101 (86.93%), ResNet101V2 (85.37%), Inception V3 (83.63%), MobileNet (86.56%), and MobileNetV2 (86.87%). The worst results on our dataset came from the EfficientNetB0 to EfficientNetB7 models, all below 40% accuracy. With a test accuracy of 92.52%, ResNet50 performs noticeably better than the other models applied to our dataset. Just as species can be determined from leaf veins for dicotyledonous plants, these results indicate that the same is possible for monocotyledonous plants: like the reticulated venation of dicotyledonous leaves, the parallel venation of monocotyledonous leaves acts as a dominant feature. Additionally, for visual explanation, we visualized the final activation outputs in Figs. 5 and 6 using class activation maps, namely Grad-CAM, Grad-CAM++, Score-CAM, Faster-Score-CAM, and Guided Back-propagation. From Figs. 5
Fig. 5 Visualization of the vein pattern as the extracted feature for the front side of a young Eichhornia crassipes leaf (DenseNet201)
and 6, we can conclude that although the image patches of monocot plant leaves do not hold entire leaf shapes, the neural networks automatically treat the different degrees of vein patterns as the dominant feature, even though monocot plant leaves do not have complex vein patterns of multiple degrees.
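As a quick consistency check, the reported test accuracies follow directly from the per-class counts of correctly classified validation images in Table 2. For ResNet50:

```python
# Correctly classified validation images for ResNet50 (counts from Table 2).
correct = {"Cocos nucifera": 4423, "Eichhornia crassipes": 4403, "Musa musa": 2276}
totals = {"Cocos nucifera": 4500, "Eichhornia crassipes": 4500, "Musa musa": 3000}

accuracy = sum(correct.values()) / sum(totals.values())
print(round(accuracy, 4))  # 0.9252
```

The same calculation reproduces the accuracy column for the other models, which confirms that "test accuracy" in Table 2 is overall accuracy on the 12,000-image validation set.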
7 Conclusion and Future Works In this manuscript, we introduced a novel image dataset for monocotyledon plant species identification and classification using only leaf vein patterns. Moreover, we trained several well-known deep learning models to classify the plant species. A total of 24,000 images were used for model training and testing. The results show a classification accuracy of up to 92.52% using the ResNet50 model, indicating that deep learning models can reliably be used for plant species identification and classification from leaf veins. Moreover, the visualizations signify that the different structures of venation can be the dominant feature of the Monocotyledonous group.
Fig. 6 Visualization of the vein pattern as the extracted feature for the reverse side of a young Eichhornia crassipes leaf (DenseNet201)
Hence, for Monocotyledonous plants, the venation of leaves can be used for species classification. Further study may be performed using different deep learning algorithms and larger image datasets, which may lead to better results. The total number of parameters in convolutional networks is huge, which demands more resources and time; however, the number of parameters can be reduced by applying dropout. Additionally, developing new algorithms and improving image quality may lead to better performance. Above all, it will be possible to classify Monocotyledonous plant species based on the venation of leaves. Acknowledgements This work is funded by the division of Information and Communication Technology (ICT), Ministry of Posts, Telecommunications and Information Technology, Government of the People’s Republic of Bangladesh.
References
1. Govaerts R (2001) How many species of seed plants are there? Taxon 50(4):1085–1090
2. Ahmed A, Hussein SE (2020) Leaf identification using radial basis function neural networks and SSA based support vector machine. PLoS ONE 15(8):e0237645. https://doi.org/10.1371/journal.pone.0237645
3. Tan JW, Chang S-W, Abdul-Kareem S, Yap HJ, Yong K-T (2020) Deep learning for plant species classification using leaf vein morphometric. IEEE/ACM Trans Comput Biol Bioinf 17(1):82–90. https://doi.org/10.1109/TCBB.2018.2848653
4. Habibollah Agh. A convolutional neural network with a new architecture applied on leaf classification. IIOAB J 7(5). ISSN 0326-0331
5. Saleem R, Shah JH, Sharif M, Yasmin M, Yong H-S, Cha J (2021) Mango leaf disease recognition and classification using novel segmentation and vein pattern technique. Appl Sci 11(24):11901. https://doi.org/10.3390/app112411901
6. Samanta G, Chakrabarti A, Bhattacharya BB (2020) Extraction of leaf-vein parameters and classification of plants using machine learning. In: Proceedings of international conference on frontiers in computing and systems, pp 579–586. https://doi.org/10.1007/978-981-15-7834-2_54
7. Ambarwari A, Adrian QJ, Herdiyeni Y, Hermadi I (2020) Plant species identification based on leaf venation features using SVM. TELKOMNIKA (Telecommunication Computing Electronics and Control) 18(2):726. https://doi.org/10.12928/TELKOMNIKA.V18I2.14062
8. Grinblat GL, Uzal LC, Larese MG, Granitto PM (2016) Deep learning for plant identification using vein morphological patterns. Comput Electron Agric 127:418–424
9. Lee SH, Chan CS, Mayo SJ, Remagnino P (2017) How deep learning extracts and learns leaf features for plant classification. Pattern Recogn. https://doi.org/10.1016/j.patcog.2017.05.015
10. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708
11. Yu X, Zeng N, Liu S, Zhang Y-D (2019) Utilization of DenseNet201 for diagnosis of breast abnormality. Mach Vis Appl 30:1135–1144
12. Lin M, Chen Q, Yan S (2014) Network in network. arXiv:1312.4400v3 [cs.NE]
13. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
14. Manoharan JS (2021) Flawless detection of herbal plant leaf by machine learning classifier through two stage authentication procedure. J Artif Intell Capsule Netw 3(2):125–139
15. Dhaya R (2020) Flawless identification of fusarium oxysporum in tomato plant leaves by machine learning algorithm. J Innov Image Process (JIIP) 2(04):194–201
A Scalable Distributed Query Framework for Unstructured Big Clinical Data: A Case Study on Diabetic Records Ahmet Sayar
Abstract Nearly 80% of the information in the healthcare sector is unstructured data, and this percentage is steadily rising. Traditional relational database solutions are not effective for analyzing and querying this type of data. In this research, we present a distributed and scalable big data framework for searching and analyzing unstructured clinical data. The framework was developed using the open source Hadoop and Hive libraries and is based on the MapReduce paradigm. The effectiveness of the proposed architecture is demonstrated by running various queries on actual records from diabetic clinics. The framework is also compared with relational databases in terms of response times. Keywords Big data · Hadoop · Hive · Querying diabetes records
1 Introduction The size of clinical data is growing exponentially with each passing day. Big data is characterized by high volume (data scale), high velocity (rate of change), high variety (different data types), and high veracity (data uncertainty) [17]. The enormous growth in data is largely the result of new technologies, which produce and gather massive volumes of structured, semi-structured, or unstructured data. These sources include electronic medical records (EMR), social media, blogs, websites, and scientific sensors such as medical imaging, biometrics data, and DNA research. These datasets raise storage, analytical, and visualization issues [16]. With relational database systems, maintaining, querying, and analyzing such large datasets is becoming more difficult and occasionally impossible. Distributed systems provide parallel solutions to these issues [1, 9]. These systems must be scalable, supporting both the addition of new conventional processors (or "nodes") and the execution of numerous jobs at once [6, 19]. A Distributed File System (DFS), a novel type of file A. Sayar (B) Computer Engineering Department, Kocaeli University, 41380 İzmit, Turkey e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_51
system that manages storage across a network of computers, serves as the foundation of these systems [8]. Data replication is a benefit of DFS: distributing data over ordinary computing nodes provides protection against frequent media failures. Hadoop uses the Hadoop Distributed File System (HDFS), an open source file system. HDFS offers the capacity to run on clusters and store very large files [20, 24]. The Hadoop ecosystem also offers numerous other big data solutions, such as Hive, a warehousing solution built on top of Hadoop with which SQL users can perform ad hoc queries, summarization, and data analysis. In this paper, we present a Hadoop and Hive-based architecture and implementation for handling sizable clinical datasets. Additionally, a case study using the proposed architecture is carried out on records of 70,000 inpatient diabetes encounters. The same queries are also run on the MySQL relational database to assess how quickly the proposed architecture responds to queries. There are few publications in the literature that use the MapReduce programming framework to accelerate data analytics on various sorts of clinical big datasets; publicly accessible examples include clinical datasets, biometric datasets, bioinformatics datasets, biomedical signal datasets, and biomedical image datasets. We concentrate on the analysis and querying of clinical big datasets in this research. In their survey of existing Hadoop-based clinical big data applications and implementations, Mohammed et al. [14] also discuss adjacent medical health informatics domains. Using Pig Latin [15], Horiguchi et al. [11] develop user-customized functions for processing massive volumes of administrative data. The response times and scalability of the built functions were evaluated on a sizable claims database.
We are concerned with querying large-scale data, whereas they focus on translating large-scale data into a wide table format. To manage enormous amounts of clinical data related to heart disease, Wang et al. [23] offer a novel system called the Clinical Data Managing and Analyzing System, which combines an XML database with the HBase/Hadoop infrastructure. Doctors can examine this data using distributed data mining methods based on the Apache Mahout library [3]. To detect unproven cancer treatments on the health Web, Aphinyanaphongsa et al. [4] propose machine learning models, presenting techniques and empirical data demonstrating strong accuracy and generalizability. The MapReduce distributed architecture is used to apply the proposed method to billions of documents on the health Web, and experiments indicate promising outcomes. The following two publications demonstrate the applicability of machine learning and data mining techniques to huge clinical datasets. Multivariable logistic regression was employed by Strack et al. [21] to fit the correlation between early readmission and HbA1c level. The primary distinction between Strack's research and ours is our use of a Hadoop and Hive-based architecture for some of the queries, along with the analysis of records from over 70,000 inpatient diabetes visits. The related research listed above makes use of Mahout, Pig Latin, Hadoop MapReduce, or a mix of these. Kul et al. [12] implemented a real-time system that can identify Twitter opinions about the COVID-19 vaccine using Hadoop. All tweets are divided into three categories (Positive, Neutral, and
Negative). Sentiment analysis was conducted using Logistic Regression, Random Forest, Deep Neural Network, and Convolutional Neural Network models. Unlike our study, they do not use a query framework. To our knowledge, no prior effort has used Hive to evaluate clinical data. In this paper, we provide a method for querying diabetes records along with a case study. Regarding the querying of diabetes and other datasets, there are some solution approaches based on other big data technologies [18], and some research deals with scalability and performance issues in distributed querying of datasets and proposes algorithm-based solutions [7]. The focus of this paper is improving query response times by using open source big data technologies. The remainder of the paper is structured as follows. Preliminaries on Apache Hadoop, MapReduce, and Hive are presented in Sect. 2. Section 3 describes the proposed architecture, discussing Hive and the Hadoop APIs for efficiently analyzing huge clinical datasets. Section 4 demonstrates the proposed implementation along with the MySQL-based solution. The final section presents a conclusion and some recommendations for future improvements.
2 Preliminaries on Hadoop and Hive Hadoop's master-slave architecture consists of HDFS for storage and MapReduce for computing. Hadoop utilizes the MapReduce programming model to execute tasks in parallel [2, 5]. The Hadoop MapReduce framework's primary benefits include scalability, cost-effectiveness, flexibility, speed, and failure resilience [25]. Hive [22] is an open source data warehousing solution on the Hadoop platform. Its primary target users remain data analysts who are conversant with SQL and need to manage and analyze Hadoop-scale data. Databases, tables, rows, columns, schemas, and views are all supported by Hive. It is accessible interactively through a variety of means, including a Web GUI, a command-line interface (CLI), and programmatically from Java via Java Database Connectivity (JDBC). It runs an SQL-like language known as HiveQL. For illustration, a query to retrieve all diabetic patients whose age is 30 is as follows: SELECT * FROM diabetes WHERE age = 30; Hive enables us to write and begin executing MapReduce jobs in roughly the time it would take to write a main method in Java. A metastore, which normally lives in a relational database, is a new and crucial part of Hive that stores metadata. Figure 1 depicts the high-level architecture diagram of Hive.
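The MapReduce model that Hadoop executes can be illustrated without Hadoop at all. The following pure-Python sketch (with our own toy patient records) maps each record to an (age, 1) pair, groups the pairs by key as the shuffle phase would, and reduces each group by summing, yielding a count of patients per age:

```python
from collections import defaultdict

records = [
    {"name": "a", "age": 30}, {"name": "b", "age": 30},
    {"name": "c", "age": 45}, {"name": "d", "age": 30},
]

# Map phase: emit (key, value) pairs -- here, (age, 1) for each record.
mapped = [(r["age"], 1) for r in records]

# Shuffle phase: group values by key (Hadoop performs this between map and reduce).
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: aggregate each group -- here, sum the counts.
counts = {age: sum(values) for age, values in groups.items()}
print(counts)  # {30: 3, 45: 1}
```

In Hadoop, the map and reduce steps run in parallel across the cluster's nodes, and HiveQL queries like the one above are compiled into exactly this kind of job.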
Fig. 1 High-level architecture diagram of Hive
Java Database Connectivity (JDBC) and Open Database Connectivity (ODBC) are used for connecting to outside databases. ODBC is an open standard application programming interface (API) for accessing data in database management systems. Before describing HiveQL further, assume we have comma-separated patient personal information (data.txt) stored locally. To store this data in Hive, a table must be defined: hive> CREATE TABLE patients (name STRING, surname STRING, age INT); HiveQL statements are terminated with semicolons. "patients" is a three-column table: the first column, name, is of type STRING; the second, surname, is also of type STRING; and the third, age, is of type INT. Additionally, Hive supports a variety of types, including strings, booleans, signed integers, floating point numbers, maps, arrays, and structs [13]. Managing and defining tables in HiveQL is comparable to a traditional Relational Database Management System (RDBMS). One of Hive's greatest features is that the user does not even need to be familiar with MapReduce; the user simply issues queries in a language similar to SQL. The query result can be printed to the screen or stored on disk. The proposed architecture for processing clinical big data is provided in the following section, and the case study section uses queries similar to those in this section to illustrate the proposed methodology.
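The semantics of the example table and query can be mimicked in a few lines of Python, which is roughly what a single map task does with one split of data.txt. The file contents here are invented for illustration:

```python
import csv
import io

# Invented comma-separated contents of data.txt (name, surname, age).
data_txt = "alice,smith,30\nbob,jones,41\ncarol,lee,30\n"

# Equivalent of: CREATE TABLE patients (name STRING, surname STRING, age INT);
rows = [
    {"name": n, "surname": s, "age": int(a)}
    for n, s, a in csv.reader(io.StringIO(data_txt))
]

# Equivalent of: SELECT * FROM patients WHERE age = 30;
result = [r for r in rows if r["age"] == 30]
print([r["name"] for r in result])  # ['alice', 'carol']
```

Hive applies the table schema to the raw file at read time and distributes the filtering across map tasks, so the same predicate scales to files far larger than a single machine's memory.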
3 Proposed Architecture Figure 2 shows the major components of the proposed architecture based on Hadoop and Hive. As shown, the main components of the architecture are:
A Scalable Distributed Query Framework for Unstructured Big Clinical Data …
Fig. 2 Proposed architecture of processing of clinical big data
• Hive Client: (i) a Node.js application and (ii) the Hive Thrift Client, which allows Hive commands to be run from a wide range of programming languages.
• Hive Services: provide the interfaces through which clients with SQL skills run queries on huge volumes of data.
• Metastore: the component that stores all the structure information of the tables and partitions.
• Driver: the component that receives the queries.
• HDFS and MapReduce.
4 A Case Study: Large-Scale Querying of Diabetes Records

In this investigation, we compared the response times of our proposed method and a MySQL-based implementation using the Health Facts database [10], a nationwide data warehouse that gathers thorough clinical records from hospitals across the United States. Experimental tests were run on a system with a 2.3 GHz Intel i7 2820QM processor, 8 MB of cache, 6 GB of RAM, and the CentOS 5.4 64-bit operating system. For test purposes, five different queries, given in the Appendix, were performed. Each query was performed ten times, and the average processing times are displayed in Table 1. The first column shows the query number as given in the Appendix, the second column shows the average response time of the proposed system, and the last column gives the response time of the same query on MySQL. The first query's description, the HiveQL command, and the outcome are listed below; the Appendix contains the remaining four queries.

• Find each patient whose readmission day is 30 or less, and find the results of the A1c test. A1c test results show the range of the result or whether the test was
Table 1 Query response times of the proposed architecture on the diabetes dataset

Query number   Average query response time (s)
               On Hive     On MySQL
1              10,554      15,347
2              12,714      18,638
3              21,818      27,479
4              15,637      21,637
5              14,416      17,632
administered or not. There are four possible values for the result: "normal" if it was less than 7%, ">7" if the result was greater than 7% but less than 8%, ">8" if it was greater than 8%, and "none" if it wasn't measured.
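Reading the comma-separated values in Table 1 as decimal seconds (an assumption about the typesetting; the commas are taken as decimal marks), the per-query speedup of the Hive-based system over MySQL can be computed directly:

```python
# Average response times from Table 1, in seconds
# (comma decimals transcribed as decimal points).
hive_s  = [10.554, 12.714, 21.818, 15.637, 14.416]
mysql_s = [15.347, 18.638, 27.479, 21.637, 17.632]

# Per-query speedup of the proposed architecture over MySQL.
speedups = [m / h for m, h in zip(mysql_s, hive_s)]
mean_speedup = sum(speedups) / len(speedups)
```

Under this reading, every query runs faster on the proposed architecture, with a mean speedup of roughly 1.36x.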
5 Conclusion and Future Work The effective use of available time is an essential component of data management. The primary objective of this investigation is to supply researchers with open source technologies and a Hadoop and Hive-based architecture that is both time-efficient and effective in the management of enormous amounts of clinical data. The suggested method demonstrates that large-scale diabetic datasets housed in distributed file systems supplied by open source technologies may be queried scalably using traditional SQL commands. Here, we employ Hive and HDFS. To analyze and contrast the various response times of queries, an application built on top of a relational database has
been developed. The experimental results demonstrate that the time efficiency of the suggested architecture is superior to that of the relational design. The proposed architecture is straightforward and uncomplicated for researchers to utilize, and the test results indicate that it is a promising method for effectively querying substantial amounts of clinical data using MapReduce technology and Hive. Additional work is needed to assess the proposed system using significantly bigger datasets and cluster sizes in order to locate potential performance bottlenecks.
6 Other Queries

Query 2: Find the demographic (e.g., race) distributions of all patients whose readmission day is equal to or less than 30. Race values can be Caucasian, Asian, African American, Hispanic, and other. Other demographics such as sex and age can also be queried. Gender values can be male, female, and unknown/invalid. Age is grouped in 10-year intervals: [0, 10), [10, 20), . . ., [90, 100).

select race, count(race) as quantity from diabetic.dia_data where readmitted like '%<30%' group by (race);

Query 3: List the generic names of the five most commonly given diabetic medications.

Query 4: Find the number of primary diagnoses coded as the first three digits of ICD9 (848 distinct values). The group names and ICD9 codes of the primary diagnoses are as follows: Circulatory (390-459, 785), Respiratory (460-519, 786), Digestive (520-579, 787), Diabetes (250.xx), Injury (800-999), Musculoskeletal (710-739), Genitourinary (580-629, 788), Neoplasms (140-239), and others.

SELECT count(CASE WHEN diag_1 >= 390 AND diag_1 <= 459 THEN 1 END),
       count(CASE WHEN diag_1 >= 460 AND diag_1 <= 519 THEN 1 END),
       count(CASE WHEN diag_1 >= 520 AND diag_1 <= 579 THEN 1 END),
       count(CASE WHEN diag_1 >= 250 AND diag_1 < 251 THEN 1 END),
       count(CASE WHEN diag_1 >= 800 AND diag_1 <= 999 THEN 1 END),
       count(CASE WHEN diag_1 >= 710 AND diag_1 <= 739 THEN 1 END),
       count(CASE WHEN diag_1 >= 580 AND diag_1 <= 629 THEN 1 END),
       count(CASE WHEN diag_1 >= 140 AND diag_1 <= 239 THEN 1 END)
FROM diabetic.dia_data;

Algorithm 1:
2: for all (… 1), do
3: Estimate Sdeviant and Sstars using Eqs. (13) and (14)
4: Compute Sdev ← Sdeviant − Sstars // deviations from standards
5: Estimate Sdev(mean) using Eq. (11)
6: Estimate Spred using Eqs. (12) and (13)
7: Update Sdev(mean) using Algorithm 2
8: end for
Auditory Machine Intelligence for Incipient Fault Localization …
Algorithm 2. AMI Learning Algorithm:
1: Initialize the prediction output variable Spred; the time series input variable states as standards, Sstars; the deviant mean variable Sdev(mean); Sdiff(1) as the difference between Spred and Sdeviant+1; Sdiff(2) as the numerical difference between Sdev(mean) and |Sdiff(1)|; and lp as the corrective (additive) bias.
2: for all (s ∈ Sstars) do
3: if Sdiff(2) > 0
4: Sdev(mean) ← Sdev(mean) − Sdiff(1) // Weaken deviant mean by a factor |Sdiff(1)|
5: elseif Sdiff(2) < 0
6: Sdev(mean) ← Sdev(mean) + Sdiff(1) // Reinforce deviant mean by a factor |Sdiff(1)|
7: else
8: Sdev(mean) ← Sdev(mean) + lp
9: end if
10: end for

As can be seen in Listings 1 and 2, the Neuro-AMI processing is fine-tuned with respect to learning a set of deviant mean estimates of the predicted pattern. Hebbian learning is incorporated by adjusting (reducing or increasing) the deviant mean value using error-corrective estimates of the predicted values, i.e., the difference between the prediction and the actual value, which is seen as a deviant incremented at time step t + 1. The deviant mean is computed using Eq. (11). This conditioning, or regulatory, approach results in a more precise and accurate prediction estimate over time.
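The corrective deviant-mean update of Algorithm 2 can be sketched as a single scalar step. This is an illustrative reading of the listing, not the authors' implementation; the variable names and the use of the absolute prediction error follow the listing's comments:

```python
def update_deviant_mean(s_dev_mean, s_pred, s_deviant_next, lp=0.01):
    """One corrective step of the deviant-mean update (Algorithm 2, sketch).

    s_diff1 is the prediction error S_pred - S_deviant(t+1); s_diff2
    compares the current deviant mean with |s_diff1|; lp is the
    corrective (additive) bias. The lp default is an assumed value.
    """
    s_diff1 = s_pred - s_deviant_next
    s_diff2 = s_dev_mean - abs(s_diff1)
    if s_diff2 > 0:
        return s_dev_mean - abs(s_diff1)   # weaken the deviant mean
    elif s_diff2 < 0:
        return s_dev_mean + abs(s_diff1)   # reinforce the deviant mean
    return s_dev_mean + lp                 # tie: apply the additive bias
```

Repeated over the time series, this pulls the deviant mean toward the magnitude of the current prediction error, which is the conditioning behavior described above.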
3 Simulation Results 3.1 System Details Computer simulations were done using MATLAB/SIMULINK based on existing line parameters of a Transmission Line (TL) of a section of the Nigerian 330 kV network (Onitsha-Alaoji single circuit). A real-time emulator for realizing the predictive fault localization functions is shown in Fig. 3. Table 1 shows the TL parameter specifications used to generate the results.
B. A. Wokoma et al.
Fig. 3 Schematic of power transmission line for resonance simulation fault localization study
Table 1 Key TL specifications

Parameter                 Value/Specification   Unit
Line length               138                   km
Circuit type              Single                NA
Conductor cross-section   350                   mm²
Resistance                0.0390                Ω/km
Inductance                1.11                  mH/km
Capacitance               912.06                µF/km
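The per-kilometre values in Table 1 scale to lumped totals for the 138 km Onitsha-Alaoji line; a quick sanity check (assuming the resistance unit is Ω/km, as is conventional):

```python
# Per-kilometre line parameters from Table 1.
length_km = 138
r_per_km = 0.0390   # ohm/km
l_per_km = 1.11     # mH/km
c_per_km = 912.06   # uF/km (unit as given in Table 1)

# Lumped totals over the full line length.
r_total = r_per_km * length_km   # total series resistance, ohms
l_total = l_per_km * length_km   # total series inductance, mH
c_total = c_per_km * length_km   # total shunt capacitance, uF
```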
3.2 System Level Simulations—Under No-Fault and Faulted Conditions In this section, the task emphasizes the continuous prediction of the estimated Power Spectral Density (PSD) values at the next time step using data from the previous time step. This presents a continual learning time series problem that a machine learning technique must solve in real time. For the no-fault situation, the PSD prediction results using the Neuro-AMI continual learning predictor are shown in Fig. 4. As can be seen, there are noticeable deviations at the maxima and minima points (data points 250, 1250, 1750, 2800, and 3750). The response graphs in Fig. 4 indicate a close correlation between the actual PSD estimate and the predicted one. For a fault in the transmission line, we consider a fault after line 2 (see the schematic of Fig. 3). This corresponds to a fault at a location 40 km from the step-up transformer end and a fault resistance of 0.1 Ω. The resulting simulation is presented in the graph of Fig. 5. Just as in the no-fault case, the Neuro-AMI prediction results
Fig. 4 PSD prediction response compared to actual values during no-fault simulation (y-axis: voltage PSD; x-axis: data points; curves: predicted PSD and actual PSD)
follow the actual values closely. However, at peaks of about 25 V/kHz, 27 V/kHz, and 35 V/kHz, there are noticeable discrepancies between the actual and predicted estimates in the faulted case (see Fig. 5). In particular, deviations occur only at the maxima points in the faulted case.
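The PSD traces compared above come from spectral estimation of the simulated voltage signal. As a self-contained illustration of what a periodogram-style PSD estimate computes (this is not the authors' MATLAB implementation, just the textbook definition |DFT|²/(N·fs)):

```python
import math

def periodogram_psd(signal, fs=1.0):
    """Naive periodogram: |DFT|^2 / (N * fs) at non-negative frequencies."""
    n = len(signal)
    psd = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n)
                 for i, x in enumerate(signal))
        im = sum(-x * math.sin(2 * math.pi * k * i / n)
                 for i, x in enumerate(signal))
        psd.append((re * re + im * im) / (n * fs))
    return psd
```

For a constant (DC) signal all the power lands in the zero-frequency bin, which is the kind of spectral peak structure the Neuro-AMI predictor tracks from one time step to the next.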
3.3 Comparative System-Level Data-Driven Classification Using the VSB Dataset In this case, a small sample from the VSB power line dataset, provided by the VSB Technical University of Ostrava as a Kaggle competition dataset, was employed for comparison with the standard, well-known back-propagation-trained feed-forward Artificial Neural Network (BP-FFANN). The dataset consists of Partial Discharge (PD) signal measurements on the three power line phases. The comparative results are reported in terms of the Root Mean Squared Error (RMSE), as shown in Table 2, for the power lines and for the first 100 samples of the VSB dataset. The standard ANN (BP-FFANN) followed the usual convention of a training-testing data split, with 60% for training and 40% for testing from the considered 100 samples; the simulations were performed for 5 trials and the mean computed. Also, a bivariate data splitting method using a scheme earlier proposed in [17] was employed for the continual learning predictions in the proposed Neuro-AMI
Fig. 5 PSD prediction response compared to actual values during faulted conditions
Table 2 Comparative RMSE results using VSB dataset

Line   AMI RMSE   BP-FFANN RMSE
1      0.3742     0.5428
2      0.3742     0.4628
3      0.3464     0.6628
technique. Table 3 shows the individual trial errors for the different lines from the BP-FFANN predictions.

Table 3 BP-FFANN RMSE for different trial runs and lines using VSB dataset

Trials   BP-FFANN RMSE (1)   BP-FFANN RMSE (2)   BP-FFANN RMSE (3)
1        0.6363              0.3747              0.5027
2        0.4800              0.7221              0.6983
3        0.4730              0.5404              0.8559
4        0.3675              0.3745              0.4053
5        0.3571              0.7025              0.8518
4 Discussion and Conclusion

This research paper has proposed a continual learning neuronal auditory machine intelligence (Neuro-AMI) approach and simulation model for TL fault diagnosis in a power system transmission network. It has also presented initial results for the developed solution model, which show a good predictive response of the considered approach. As can be seen from the initial Power Spectral Density (PSD) simulations, the Neuro-AMI predictor closely matches the expected PSD values for a given continual learning task. Furthermore, the comparative results on a real-world case study dataset showed the superiority of the proposed technique over the conventional BP-FFANN. It is important to emphasize that the Neuro-AMI automatically and continually encodes a class structure from the input data, rather than following the convention of assigning class labels to input values (class-input pairing) as used in the BP-FFANN. In particular, this shows the advantage of the AMI technique in the context of small data analysis and continual learning, against the instabilities inherent in the conventional neural scheme. Currently, this research work is ongoing at the Department of Electrical Engineering, Rivers State University, Nigeria. Future work will incorporate real-time embedded microprocessor relaying logic to further enhance the proposed model's features. The proposed approach should also be applied to different line configurations, considering the variety of existing line lengths and the gradual variation of fault resistances and partial discharge signals in real time.

Acknowledgements Sincere thanks and acknowledgment go to Engr. E. N. Osegi of the SurePay Foundations Group, who assisted and provided code modifications and neural designs for the simulation experiments.
References

1. Negrao MML, da Silva PRN, Gomes CR, Gomes HS, Junior PV, Sanz-Bobi MA (2013) MCHO–A new indicator for insulation conditions in transmission lines. Int J Electr Power Energy Syst 53:733–741
2. Stefenon SF, Ribeiro MHDM, Nied A, Mariani VC, dos Santos Coelho L, da Rocha DFM, Grebogi RB, de Barros Ruano AE (2020) Wavelet group method of data handling for fault prediction in electrical power insulators. Int J Electr Power Energy Syst 123:106269
3. Tayeb EBM, Rhim OAAA (2011) Transmission line faults detection, classification and location using artificial neural network. In: 2011 international conference & utility exhibition on power and energy systems: issues and prospects for Asia (ICUE), pp 1–5
4. Roostaee S, Thomas MS, Mehfuz S (2017) Experimental studies on impedance-based fault location for long transmission lines. Protect Contr Modern Power Syst 2(1):1–9
5. Jembari NN, Yi SS, Utomo WM, Zin NM, Zambri NA, Mustafa F, ... Buswig YM (2019) IoT based three phase fault analysis for temporary and permanent fault detection. J Electr Power Electron Syst (2)
6. Mustari MR, Hashim MN, Osman MK, Ahmad AR, Ahmad F, Ibrahim MN (2019) Fault location estimation on transmission lines using neuro-fuzzy system. Procedia Comput Sci 163:591–602
7. Li M, Yu Y, Ji T, Wu Q (2019) On-line transmission line fault classification using long short-term memory. In: 2019 IEEE 12th international symposium on diagnostics for electrical machines, power electronics and drives (SDEMPED), pp 513–518
8. Contreras-Valdes A, Amezquita-Sanchez JP, Granados-Lieberman D, Valtierra-Rodriguez M (2020) Predictive data mining techniques for fault diagnosis of electric equipment: a review. Appl Sci 10(3):950
9. Shetty N (2021) A comprehensive review on power efficient fault tolerance models in high performance computation systems. J Soft Comput Paradigm 3(3):135–148
10. Amanuel T, Ghirmay A, Ghebremeskel H, Ghebrehiwet R, Bahlibi W (2021) Design of vibration frequency method with fine-tuned factor for fault detection of three phase induction motor. J Innov Image Process (JIIP) 3(1):52–65
11. Andresen CA, Torsaeter BN, Haugdal H, Uhlen K (2018) Fault detection and prediction in smart grids. In: 2018 IEEE 9th international workshop on applied measurements for power systems (AMPS), pp 1–6
12. Govindarajan S, Kim Y, Holbert KE (2015) A novel methodology for power cable monitoring using frequency domain analysis. In: 2015 North American power symposium (NAPS), pp 1–6
13. Glover JD, Sarma M, Overbye TJ (2012) Transmission lines: steady-state operation. Power Syst Anal Des 254–262
14. Lin K, Holbert KE (2009) Applying the equivalent pi circuit to the modeling of hydraulic pressurized lines. Math Comput Simul 79(7):2064–2075
15. Osegi EN, Anireh VI (2020) AMI: an auditory machine intelligence algorithm for predicting sensory-like data. Comput Sci 5(2):71–89
16. Osegi EN, Taylor OE, Wokoma BA, Idachaba AO (2020) A smart grid technique for dynamic load prediction in Nigerian power distribution network.
In: Pandit M, Srivastava L, Venkata Rao R, Bansal J (eds) Intelligent computing applications for sustainable real-world systems. ICSISCET 2019. Proceedings in adaptation, learning and optimization, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-030-44758-8_38
17. Osegi EN (2021) Using the hierarchical temporal memory spatial pooler for short-term forecasting of electrical load time series. Appl Comput Inform 17(2):264–278. https://doi.org/10.1016/j.aci.2018.09.002
18. Osegi EN, Jumbo EF (2021) Comparative analysis of credit card fraud detection in simulated annealing trained artificial neural network and hierarchical temporal memory. Mach Learn Appl 6:100080
The RRRS Methodology Using Self-Regulated Strategies with ICT to Homogenize the English Language

Verónica Elizabeth Chicaiza Redin, Sarah Iza-Pazmiño, Edgar Guadia Encalada Trujillo, and Cristina del Rocío Jordán Buenaño

Abstract The Research, Revision, Review, and Self-assessment (RRRS) methodology is used to establish self-regulated learning among the students of the National and Foreign Languages Career. This research work aims to analyze the application of this methodology using resources based on Information and Communication Technologies (ICT). The research is of an analytical experimental type with a quantitative analysis approach; several strategies were applied using ICT resources such as videos, mobile devices, and Web 2.0 tools, and the technique used was the application of experiments with digital and technological resources. The study was applied to twenty-nine students over a sixteen-week intervention period with two hours daily. The methodology applied for the development of the technological resources was ADDIE, which focuses on pedagogical development mediated by technology. As a result, it was possible to demonstrate the improvement achieved by applying ICT-based strategies and self-regulated learning, and it is concluded that the use of technological resources improves the learning of the English language and the skills that are evaluated in this language.

Keywords Education · ICT · Web 2.0 tools · Learning English · Self-regulated strategies
V. E. Chicaiza Redin (B) · S. Iza-Pazmiño · E. G. Encalada Trujillo · C. del Rocío Jordán Buenaño Facultad de Ciencias Humanas y de la Educación, Universidad Técnica de Ambato, Ambato, Ecuador e-mail: [email protected] S. Iza-Pazmiño e-mail: [email protected] E. G. Encalada Trujillo e-mail: [email protected] C. del Rocío Jordán Buenaño e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_65
1 Introduction

It is undeniable that learning the English language has many benefits for university students. When they learn a second language, they can expand their understanding of the world and other cultures. This, in turn, helps them to value their own culture and national identity, as well as the principles that constitute the core of good citizenship. Nowadays, speaking English has become a necessity. When students know English, they can enroll in international graduate programs or access better jobs. These are the main reasons students consider when learning English. Furthermore, professionals who know English are more competitive in their workplaces, which allows them economic mobility. However, according to [1], despite the importance of English, Ecuador has a very low level of proficiency in this language. According to an EF report in 2017, Ecuador ranked 55th out of 80 countries. To rank the countries, EF administers a test; that year, 85,000 people were evaluated, and the average score of Ecuadorians was 49.32 points out of 100. Regarding age groups, adults from 31 to 40 years old scored 47.26 on average, and people from 18 to 20 years scored 53.57. This data shows that young people have better English compared to adults [2]. In addition, Heredia affirms that "The EF test was also taken in 600 educational establishments in the country. 132,493 boys participated in this. The result was 49 out of 100 points, just like the adults. That is to say, a low level, according to the EF study" (2017). Efforts to improve the teaching of English in the country, train teachers, grant study abroad scholarships, and bring in native teachers have so far been minimal. The search for new alternatives to change this reality has become vital, given the necessity of improving the quality of education in teaching the English language. The need to learn English is increasing in Ecuador.
This has become an important tool for work and professional growth. Faced with this demand, the supply of private programs for teaching this language has intensified rapidly. The Ecuadorian public university, for its part, has also generated similar programs providing greater accessibility to students from all social strata in the country. English teachers are always exploring new methodological ways to facilitate the acquisition of this foreign language and improve their teaching practices. The researchers of the Pedagogy of National and Foreign Languages major at the Technical University of Ambato propose to design and implement the RRRS methodology with a variety of strategies with a focus on self-regulated learning with technological tools to homogenize the command of English in university students from zone 3 of the country. The English area aims to provide students with communicative, cognitive, and cultural tools combined with technology, for the appropriate use of the language in the academic context and in social interaction. The rapid advancement of information and communication technologies (ICTs) has made possible contributions to English language education over the last few decades. In fact, the use of technology provides students with unprecedented opportunities to practice English and engage in authentic language use environments [3].
For example, they can use Skype chat to interact [4], or social networking sites like Facebook or Twitter to practice speaking with native speakers or classmates. In addition, the integration of ICTs increases students' motivation due to multimedia capabilities that include visual, audio, and video aids [3]. However, although technology is present in educational spaces and students show skill in handling various technological tools, this does not mean that the student has control over their learning process. Therefore, the self-regulation of learning plays a crucial role because it allows the development of the executive functions necessary for planning, organization, the ability to concentrate, and memory. Likewise, it serves to assimilate technological tools as didactic-pedagogical tools, transforming the learning process into a planned, organized act with specific goals. In addition, the evaluation of the self-regulation of learning allows the student to assume a critical and participatory position in their learning environment. In this regard, in the article entitled "Teaching English through technology: some perspectives" [1], the authors explored the potential to identify relevant factors for language teaching using video games and Android applications. The discriminatory approach to language learning has gained paramount importance, and innovations are used to achieve the desired goals. Teaching English as a second language is a challenging task, and the shortcomings of the process can be addressed through the adoption of adequate methodologies. Teaching English through video games and Android applications stimulates students' interest and makes them improve their language learning skills in an enjoyable way.
Meanwhile, the Ministry of Education indicates that the basic principles of the curricular proposal for the integration of the foreign language in the national curriculum include, among others:
• The Communicative Language Approach: language is best learned as a means of interacting and communicating rather than as a body of knowledge to be memorized.
• International Standards: the curriculum is based on internationally recognized teaching levels and processes for language learning.
On the other hand, the current career application system and the limited number of quotas for admission to universities have caused several students to opt for the Pedagogy of National and Foreign Languages major as a bridge to access the career of their preference. In addition, the nomenclature of said career apparently gives a different idea of the academic offer than that expected by applicants. That is why many students quit their studies at the beginning, since they do not have the acceptable knowledge of the English language needed to understand the different subjects taught in this program. In view of the heterogeneous knowledge shown by students at the first levels in terms of their English language skills, the aim is to homogenize the command of this language through the study of self-regulated learning with technological tools among university students in zone 3 of the country, and in this way achieve a greater impact, benefiting a considerable number of the university student population of Ecuador [5]. It is evident, then, that the country needs English teachers with solid knowledge and updated methodologies, which allow them to train students who can develop communication skills in the English language.
This solid knowledge is acquired from the first levels of the careers that train English teachers, and if students do not have an acceptable level of mastery of communicative skills in the English language, the necessary foundations will not be laid for their correct training. The Pedagogy of National and Foreign Languages course aims to train English language teaching professionals for the different levels of the Ecuadorian educational system, and the purpose of this project is to contribute to that training through the implementation of a methodology that allows students of the first levels to self-level their knowledge of the English language. This will bring, as a direct benefit, an improvement in the academic performance of the students of the Pedagogy of National and Foreign Languages major, since by leveling their knowledge of English, they will be able to acquire the theoretical and methodological knowledge raised in the curriculum of the career in a more seamless and efficient way. Similarly, the Ecuadorian educational system will benefit from English teachers better trained to transfer to their students an independent level of communicative command upon finishing high school. The entire educational community will be the indirect beneficiary of this proposal.
2 State of the Art Self-regulated learning strategies are those applied by students to improve their academic performance. Self-regulation has been defined as “the control that the subject performs over his thoughts, actions, emotions and motivation through personal strategies to achieve the objectives that he has established” [6]. Higher education institutions have implemented an autonomous learning component in their programs, thus recognizing the importance of self-regulated learning [7]. Students use this time in a variety of ways, and teachers are expected to assign homework to their students with the goal of solidly building their knowledge. Students are given the freedom to decide what self-study strategies they will use during these periods of autonomous work, but these processes are not closely monitored by teachers, and grades largely reflect students’ differing levels of success at the time of applying these strategies [8]. Learning a second language is a complex process in which several factors intervene. One such factor is the direct relationship between using self-regulation strategies for learning and motivation, which students from different educational levels can also use. On the other hand, there is a wide variety of technological resources available for learning the English language on the Internet. Often called Open Educational Resources, these resources allow students to locate the necessary materials to improve their learning of a second language, but the intervention of the teacher is necessary to guide the student in the selection of the necessary tools for their learning, since many open educational resources do not meet the needs of users, despite having a direct impact on the development of the current knowledge society [9].
Although students must develop their own strategies for self-regulation of learning, it is important that teachers are also involved in the process of applying these strategies, making use of the technological resources at their disposal. In fact, it is known that the more teachers master the use of information technologies, the more they will use them with their students [10]. However, in addition to the use of technological strategies that improve teaching practice, it is necessary for students to use self-regulation strategies for their learning that allow them to incorporate technological tools and thus develop their proficiency in the English language, despite the fact that different self-efficacy profiles can be found in the same course [11]. Therefore, it is possible to identify the importance of a methodology that provides tools to facilitate the self-regulated learning of the students of the first levels of the National and Foreign Languages Pedagogy Major, since it has been identified that self-regulated learning strategies allow achieving true learning in students at the university level.
2.1 ICT and Foreign Language Since the middle of the last century, according to [9], accelerated growth in the demand for tertiary consumption services began to be observed as a consequence of industrialization. This growth, in turn, contributed to the gestation of a technological revolution, and with it, to the generation of changes that have been reflected in the transformation of various structures and social processes that encompass the economic, social, and educational spheres [13]; these changes have triggered some problems due to the growing gap between educational systems and technological development, since up to now individuals have not been adequately equipped with digital skills necessary to search, select, prepare and disseminate information, particularly those according to the demands of the current world of work. To address the above problem, among other actions, several attempts have been made in different contexts to incorporate Information and Communication Technologies (ICT) into education, especially in the field of foreign language teaching; an area of education that has traditionally been open and flexible to innovations and the incorporation of new tools to support teaching and learning processes. An example of this is the case of Mexico, a country where for several decades, the government has made numerous attempts to incorporate programs for the integration of technologies and the teaching of English in public primary schools [4], which are described below. As far as ICTs are concerned, the first program to incorporate technology in primary schools, named Introduction of Electronic Computing in Basic Education (COEEBA-SEP) was implemented in 1990 and ended in 1993. This program consisted of four modalities: the installation of computer equipment in the classrooms, the establishment of computer laboratories, the creation of computer training workshops, and the construction of training centers [11]. 
Later, in 1995, the first audiovisual media implementation program was launched, called Satellite Education (EDUSAT).
In the case of ICT, several studies stand out: that of Andrade-Pulido on the use of computers by teachers; that of Domínguez, Cisneros, Suaste, and Vázquez on the factors that prevent the effective integration of ICTs in public schools of basic education; that of [11], who conducted a diagnosis of the digital skills and pedagogical practices of teachers in primary education within the framework of the Mi Compu.MX program; and the work of [3], who analyzed the digital inclusion and literacy program (PIAD). The research of [13] on teachers' beliefs about the use of technologies in education also stands out, as does that of [5] on the use of ICT by fifth- and sixth-grade students of primary education.
3 Methodology For the application of this methodology, four strategies mediated by the use of ICT were applied, all focused on improving the English language level of students in the National and Foreign Languages Pedagogy Career of the Technical University of Ambato. The research is of an analytical, experimental type, in which information and communication technologies were applied in each of the strategies: 1. Picture description using technology; 2. Digital storytelling for oral proficiency development and assessment; 3. Self-videos on cell phones used as mirrors; and 4. My experience visiting virtual museums, as in Figs. 1, 2, 3, and 4. For each strategy, the skills that students must apply to improve the four English language skills are described. For the implementation of the self-regulated learning methodology, the first step was to diagnose the students' actual level of English. The diagnosis was applied to a total of 29 students, 9 male and 20 female, aged between 17 and 21 years. In addition, the scores that each participant obtained in the different skills through an
Fig. 1 Strategy 1: picture description using technology
The RRRS Methodology Using Self-Regulated Strategies with ICT …
895
Fig. 2 Strategy 2: digital storytelling for oral proficiency development and assessment Source: https://images.app.goo.gl/Eqe2Fwb6NuovZgVX8
Fig. 3 Strategy 3: self-videos on cell phones used as mirrors
Fig. 4 Strategy 4: my experience visiting virtual museums
independent test are detailed. Once the experiments had been applied, the acceptability of these strategies among the students was measured, as well as the level of acceptability of the technology applied in each strategy, which was analyzed in the results obtained. So that these technological resources would be applied with a pedagogical purpose, they were developed through the ADDIE methodology (Analysis, Design, Development, Implementation, and Evaluation), in which each phase was evaluated to verify the fulfillment of the learning outcomes for each skill through self-regulated learning. ADDIE is also a methodology in which the teacher combines pedagogy with the correct use of technology so that the two together meet the established objectives.
4 Results Once the experimentation had been applied over a period of 16 weeks of classes with the students of the Pedagogy of National and Foreign Languages career, the data were analyzed. The students' pretest scores were taken into account to establish their true level of knowledge of the English language in the four basic skills, with the following results. After analyzing the data, the following English levels were established according to the Common European Framework of Reference for Languages (CEFR): 9 participants (31.0%) at a FAIL level, followed by 6 participants (20.7%) at an A1 FAIL level, 8 participants at an A2 Grade C level, 3 participants at an A2 Grade B level, and 3 participants at a B1 Grade A level, as in Table 1. After the intervention process with the self-regulated learning strategies, the posttest was applied; its results by skill indicate an average of 91.72 in Speaking, 72.83 in Reading, 83.59 in Writing, and 71.59 in Listening, as in Table 2. A significant increase is determined in the Speaking skill, with an average of 91.72 compared to the pretest average of 59.72. In the results by level according to the CEFR, 7 students (24.1%) are below level A1 and failed,

Table 1 Pretest levels

Level        Frequency   Percentage
Fail         9           31.0
A1 fail      6           20.7
A2 Grade C   8           27.6
A2 Grade B   3           10.3
B1 Grade A   3           10.3
Total        29          100.0
Table 2 Mean of the real level

                                   N    Minimum   Maximum   Mean    Std. deviation
Calculators speaking score post    29   66        100       91.72   8.413
Calculators' reading score post    29   27        100       72.83   20.468
Calculators writing score post     29   52        100       83.59   13.973
Calculators listening score post   29   28        100       71.59   23.332
Overall score post                 29   52        150       80.72   18.773
Valid N                            29

Table 3 Post-test level

Level        Frequency   Percentage
A1 fail      7           24.1
A2 Grade C   11          37.9
A2 Grade B   3           10.3
B1 Grade A   8           27.6
Total        29          100.0
11 students (37.9%) are at level A2 C, and 3 students (10.3%) at A2 B. Finally, 8 students (27.6%) are at the B1 A level, as in Table 3. Thus, in the posttest, most of the students reached an A2 C level according to the CEFR, together with a significant number of students who surpassed that level and reached the B1 A level. What is evident is that technology-based self-regulated learning strategies were very useful for improving the productive and receptive English language skills of the students of the National and Foreign Languages Pedagogy Career in Zone 3.
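The descriptive statistics reported in Table 2 (N, minimum, maximum, mean, and standard deviation) are standard computations over the per-student posttest scores. Since the raw per-student scores are not published, the short Python sketch below uses hypothetical values purely to illustrate the computation.

```python
import statistics

# Hypothetical posttest scores for one skill (the study's raw per-student data
# are not published); this only illustrates how the Table 2 descriptives are formed.
speaking_post = [66, 75, 80, 85, 88, 90, 92, 95, 98, 100]

def describe(scores):
    """Return the descriptive statistics reported per skill in Table 2."""
    return {
        "N": len(scores),
        "min": min(scores),
        "max": max(scores),
        "mean": round(statistics.mean(scores), 2),
        "std": round(statistics.stdev(scores), 3),  # sample standard deviation
    }

print(describe(speaking_post))
```

With the study's real 29-student score vectors in place of the hypothetical list, this function would reproduce each row of Table 2.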
4.1 Statement of the Hypothesis If p-value > 0.05, H0 is accepted; if p-value < 0.05, H1 is accepted. The significance obtained is 0.000 against a threshold of 0.05. Since 0.000 is less than 0.05, the alternative hypothesis H1, "Self-regulated learning strategies through ICT (information and communication technologies) improve the learning of English language skills," is accepted. Test statistics are shown in Table 4. Table 5 shows the comparison of the pretest and posttest ranges. In the Speaking skill, the pretest shows a range of 2.60 and the posttest a range of 4.50, a difference of 1.90. In the Reading skill, the pretest range is 2.55 and the posttest 2.14, a difference of −0.41. With 2.62 in the pretest and 3.47 in the posttest, the Writing skill shows a difference of 0.85. Finally, in the Listening
Table 4 Test statistics (a)

                   Speaking     Reading      Writing      Listening    Overall
                   post − pre   post − pre   post − pre   post − pre   post − pre
Z                  −4.709 (b)   −3.031 (b)   −1.386 (b)   −4.421 (b)   −4.165 (b)
Asymp. Sig.        0.000        0.002        0.166        0.000        0.000

(a) Wilcoxon signed-rank test
(b) Based on negative ranks
Table 5 Comparison of pre- and posttest results

                             Pretest range   Posttest range   Difference
Calculator speaking score    2.60            4.50             1.90
Calculator reading score     2.55            2.14             −0.41
Calculator writing score     2.62            3.47             0.85
Calculator listening score   2.22            2.07             −0.15
Overall score                2.49            2.83             0.34
skill, the pretest range of 2.22 against the posttest range of 2.07 gives a difference of −0.15. The overall score indicates a difference between the average pretest range of 2.49 and the posttest range of 2.83, a difference of 0.34. Therefore, the Speaking skill shows the largest improvement in range (1.90), followed by Writing (0.85). On the other hand, the Reading and Listening skills did not improve significantly, with negative range differences of −0.41 and −0.15, respectively.
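The significance values in Table 4 come from Wilcoxon signed-rank tests over paired pretest/posttest scores. The sketch below shows, on hypothetical paired scores (the study's raw data are not available), how the test's W statistic is formed; tie handling is omitted for brevity, and in practice scipy.stats.wilcoxon would be used, which also returns the p-values reported in Table 4.

```python
# Minimal pure-Python sketch of the Wilcoxon signed-rank W statistic.
def wilcoxon_w(pre, post):
    """Rank the absolute paired differences, sum the ranks of positive and
    negative differences, and return the smaller sum (the W statistic)."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]  # drop zero differences
    ranked = sorted(diffs, key=abs)
    pos = sum(r + 1 for r, d in enumerate(ranked) if d > 0)
    neg = sum(r + 1 for r, d in enumerate(ranked) if d < 0)
    return min(pos, neg)

# Hypothetical paired scores for five students
pre = [60, 55, 70, 62, 58]
post = [90, 80, 68, 85, 75]
print(wilcoxon_w(pre, post))
```

A small W relative to its null distribution yields a small p-value, which is how values such as 0.000 for Speaking in Table 4 lead to rejecting the null hypothesis.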
5 Conclusion According to the data obtained, the applied RRRS methodology (Research of information, Review and memorization, Revision of notes and/or books, and Self-assessment activities) gained an acceptance rate of 90–100%. Strategy 1: "Picture description using technology", Strategy 2: "Digital storytelling for oral proficiency development and assessment", Strategy 3: "Self-videos on cell phones used as mirrors", and Strategy 4: "My experience visiting virtual museums", developed with the RRRS methodology, have strengthened self-regulated learning.
The combination of the four strategies ("Picture description using technology", "Digital storytelling for oral proficiency development and assessment", "Self-videos on cell phones used as mirrors", and "My experience visiting virtual museums"), developed with the RRRS methodology and technological resources, has facilitated the learning process, contributed to the learning of English, and allowed the development of all four language skills (reading, writing, speaking, and listening). The rapid advancement of information and communication technologies (ICTs) has enabled contributions to English language education in recent decades. In fact, technology provides students with unprecedented opportunities to practice English and to participate in authentic language use environments [5]. For example, they can use Skype chat to interact, or social networking sites such as Facebook or Twitter to practice speaking with native speakers or classmates [1]. Furthermore, the integration of ICT increases students' motivation thanks to multimedia capabilities that include visual, audio, and video aids [4]. However, the mere presence of technology in educational spaces, and students' skill in handling various technological tools, does not mean that students have control over their own learning process. For that, the self-regulation of learning enables the development of the executive functions necessary for planning, organization, concentration, and memory.
Likewise, it will serve to assimilate technological tools as didactic-pedagogical tools, transforming the learning process into a planned, organized act with specific objectives. Acknowledgements Thanks to the Technical University of Ambato and the Directorate of Research and Development (DIDE acronym in Spanish) for supporting our research project Estrategias de aprendizaje auto regulado con herramientas tecnológicas para homogeneizar el dominio del inglés en estudiantes universitarios de la Zona 3 SFFCHE06 and being part of the research group: Research in Language and Education.
References 1. Ferede B, Elen J, Van-Petegem W, Bekele-Hund A, Goeman K (2022) A structural equation model for determinants of instructors’ educational ICT use in higher education in developing countries: evidence from Ethiopia. Comput Educ 104566 2. Hammoumi S, Zerhane R, Janati-Idrissi R (2022) The impact of using interactive animation in biology education at Moroccan Universities and students’ attitudes towards animation and ICT in general. Soc Sci Human Open 100293
3. Kozlova D, Pikhart M (2021) The use of ICT in higher education from the perspective of the university students. Procedia Comput Sci 2309–2317 4. Infante-Paredes R, Velastegui-Viteri S, Páez-Quinde C, Suarez-Mosquera W (2022) Easle educational platform and reading skills. In: IEEE global engineering education conference, EDUCON, pp 1609–1614 5. Páez-Quinde C, Armas-Arias S, Morocho-Lara D, Barrera GM (2022) Virtual environments web 2.0 as a tool for the development of the reading comprehension in the basic education area. In: IEEE global engineering education conference, EDUCON, pp 785–789 6. Martinek D, Zumbach J, Carmignola M (2020) The impact of perceived autonomy support and autonomy orientation on orientations towards teaching and self-regulation at university. Int J Educ Res 101574 7. Meuwissen A, Carlson SM (2019) An experimental study of the effects of autonomy support on preschoolers' self-regulation. J Appl Dev Psychol 11–23 8. Pozo DSB, Chicaiza RPM (2021) Gamificación: Reflexiones teóricas desde el enfoque empresarial. Religación: Revista de Ciencias Sociales y Humanidades 197–210 9. Páez-Quinde C, Infante-Paredes R, Chimbo-Cáceres M, Barragán-Mejía E (2022) Educaplay: una herramienta de gamificación para el rendimiento académico en la educación virtual durante la pandemia covid-19. Catedra 5(1):32–46 10. Ramírez-Rueda M, Cózar-Gutiérrez R, Roblizo Colmenero M, González-Calero J (2021) Towards a coordinated vision of ICT in education: a comparative analysis of Preschool and Primary Education teachers' and parents' perceptions. Teach Teach Educ 103300 11. Sadeghi K, Ballıdağ A, Mede E (2021) The washback effect of TOEFL iBT and a local English proficiency exam on students' motivation, autonomy and language learning strategies. Heliyon e08135 13. Zhou S, Rose H (2021) Self-regulated listening of students at transition from high school to an English medium instruction (EMI) transnational university in China. System 102644
Application of Artificial Intelligence in Human Resource Activities for Advanced Governance and Welfare K. Sivasubramanian , K. P. Jaheer Mukthar , Wendy Allauca-Castillo , Maximiliano Asis-Lopez , Cilenny Cayotopa-Ylatoma , and Sandra Mory-Guarnizo
Abstract Artificial Intelligence (AI) technology is becoming one of the master instruments of the new world for many companies worldwide. The future of learning and the development process will be bundled with innovations in the coming years. The application of AI helps to enhance the quality of employees through training and development at par with industrial needs. AI is involved in the functional areas of the human resource segment, such as recruitment, training, performance assessment, and employee retention. AI creates a next-gen technological workplace that succeeds through the seamless association between organisational systems and employees. Hence, human capital is not rendered obsolete; rather, its efficiency is bolstered by upcoming technology. In reality, AI offers organisations a good amount of freed-up resources for greater tasks. AI-enabled advanced software reads each employee based on the documents entered. Moreover, AI applications help to identify various issues of employees through staff data related to their stress level, late coming, unnecessary leave, and so on. AI has great market potential and is applicable not only in HR operations but also across the organisation. Therefore, AI helps HR professionals make the decision-making process easier and smarter. Keywords AI · Human resource · On the job training · Machine learning · Deep learning
K. Sivasubramanian · K. P. J. Mukthar (B) Kristu Jayanti College Autonomous, Bengaluru, India e-mail: [email protected] W. Allauca-Castillo · M. Asis-Lopez Universidad Nacional Santiago Antunez de Mayolo, Huaraz, Peru C. Cayotopa-Ylatoma · S. Mory-Guarnizo Universidad Señor de Sipán, Chiclayo, Peru © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_66
901
902
K. Sivasubramanian et al.
1 Introduction Managerial people work in various work environments and under various work pressures. AI helps in simplifying Human Resource (HR)-related activities and plays a significant role in different HR activities such as talent acquisition and performance assessment. Many companies, industries, and corporates across the world have clearly understood the significance of AI in human resource management. The Oracle and Future Workplace survey revealed that AI helps human resource professionals master new and advanced skills for the strategic management of people. Artificial intelligence is capable of supporting more strategic and wiser decisions than any other technology: for instance, automated employee benefits, new employee registration, and AI-enabled query resolution for team members. With these in place, the HR professional need not worry about manual entries. AI is defined as the capacity to create a computer system that does things humans can do. Moreover, AI is a form of machine learning technology that imitates human competencies and behaviour. It emphasises very significant features such as representation, decoding, inference, prediction, recovery, generalisation, curiosity, and creativity [1, 2]. Automated Regular Tasks: Many human resource activities are repetitive in nature. Automated AI solutions complete these repetitive tasks on a regular basis, helping to minimise the use of time and resources. Data Aggregation: In this competitive and fast-moving world, the HR professional cannot take a decision on an employee appointment just by viewing the resume and conducting the interview. An AI application can handle the whole recruitment process: sorting resumes, short-listing applications, analysing the data provided by candidates, and validating the originality of that data. In this way, HR can readily list the appropriate candidates for recruitment.
Hence, AI plays an important role in data aggregation. Employee Referrals: AI components help to enhance employee referrals and to analyse the significance of an employee referral programme in attracting quality employees. In addition, AI instruments analyse data on existing and previous referrals and recognise new applicants who are similar to existing successful employees. Learning and Development Process: The future of learning and the development process will be bundled with innovations in the coming years. AI instruments help to enhance the quality of employees through training and development at par with industrial needs. Learning will become specific and personalised to bridge employees' skill gaps.
Application of Artificial Intelligence in Human Resource Activities …
903
1.1 Objectives
• To identify the role and significance of AI in HR processes.
• To analyse the impact of AI on HR processes and employee performance.
• To suggest policy implications based on the results of this study.
2 Literature Review Almost all corporates are applying AI to enhance the efficiency of employees in the information and communication industries. These initiatives start with automated procedures in recruitment and continue through to the appraisal or performance analysis of the workers [3]. In this digitalised world, companies operate their HR management with appropriate AI-based software solutions [4]. In the highly competitive corporate world, mapping people's skill sets and responsibilities has many long-term, mid-term, and short-term organisational effects, and this is a common scenario across corporate organisational hierarchies. Overlaying this reality with enhanced automation at every touch point of HR functioning is needed, and it is quite significant to understand how talent has been mapped to work in order to obtain a progressive outcome. Nowadays, this happens through the superpower called artificial intelligence. An AI system is capable of forecasting and estimating a potential employee's durability and operative success [5]. AI is involved in the functional areas of the human resource segment such as recruitment, training, performance assessment, and employee retention [6]. The application of AI is very useful in the policy-making of the HR manager, which is the biggest advantage seen in current HR activities. AI gathers data from various sources and supplies them to the respective users; in policy terms, this may mean emphasising the requirement for change and modelling its impacts. AI provides appropriate evidence for decisions rather than the potentially biased judgment of an HR professional [7]. AI is used as an important tool in human resource allocation, administration, evaluation, and training [8]. AI has an impactful influence on HR management.
It is found that machine learning and AI offer various operational advantages to HR managers in the day-to-day processes of organisations [9]. Human resources are an important asset of the company; appropriate and effective use of HR helps the organisation reach its goals. In this scenario, the HR department must apply apt and innovative tools to acquire the proper workforce [10]. AI components differ from conventional software technology: they compute faster, are based on advanced-level algorithms, and deliver quality data, thus ensuring accurate process delivery [11]. It is observed that in-depth information mapping through machine learning helps to understand human activities better than other models [12]. A user's personality and behaviour on any social media network can be predicted through AI, based on
their posts, for enhanced delivery and accuracy [13]. Organisations face robust challenges, including employee-related issues, and companies are consequently in a crucial position to retain quality employees. In order to solve these employee-related challenges, the HR manager's task is simplified by applying AI components in HR processes [14].
3 Methodology The major scope of this study is to analyse the impact of AI on HR activities and employee performance. To achieve this goal, various literature related to AI applications and human resource development has been reviewed. The surveyed literature is presented sequentially to establish the similarities, dissimilarities, and scope of this study. This research is based on the observational research technique. Information for this study has been collected from published resources and from primary surveys. The field data were collected from leading users of AI in HR processes, such as Infosys, HCL, and Oracle, from their respective reports and through their HR professionals.
4 Analysis and Discussion AI in Recruitment: Artificial intelligence automates or streamlines a large portion of the recruitment process. It helps the HR team with the big challenge of employee recruitment during the selection process. AI applied in recruitment may reject almost 75% of candidate applications during software screening. The highly intelligent software screening process identifies the originality of qualifications, the candidate's background, skill measurement, speaking quality, and body language. Besides, AI gives notifications on current openings and quality requirements. This is helpful for candidates improving their experience and expertise to meet recruiters' needs, and useful on the demand side for filling positions with the appropriate employees. AI in "On the Job Training" (OJT): AI-enabled advanced software reads each employee based on the documents entered. It procures and saves all the relevant data about every employee in order to analyse, suggest, and provide appropriate training. AI in Retention: It is very hard to retain hard-working and talented employees. AI applications help to identify various issues of employees through staff data related to their stress level, late coming, unnecessary leave, and so on.
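The screening behaviour described above can be approximated by scoring each application against a required skill set and short-listing candidates above a cutoff. The sketch below is a hedged illustration only; the skill names, record fields, and cutoff are invented and do not reflect any vendor's actual screening software.

```python
# Illustrative required-skill set for a hypothetical opening
REQUIRED_SKILLS = {"python", "sql", "communication", "hr analytics"}

def screen(applications, cutoff=0.5):
    """Keep applications whose skill overlap with the requirement meets the cutoff."""
    shortlisted = []
    for app in applications:
        score = len(app["skills"] & REQUIRED_SKILLS) / len(REQUIRED_SKILLS)
        if score >= cutoff:
            shortlisted.append((app["name"], round(score, 2)))
    return shortlisted

apps = [
    {"name": "A", "skills": {"python", "sql", "communication"}},
    {"name": "B", "skills": {"java"}},
    {"name": "C", "skills": {"sql", "hr analytics"}},
]
print(screen(apps))  # candidates A and C pass the 0.5 cutoff; B is rejected
```

Real screening systems additionally parse free-text resumes and validate claims against external data, but the underlying score-and-cutoff shape is the same.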
4.1 Infosys Case AI has evolved from analytics to predictive mechanisms in recent times. The application of AI, Machine Learning (ML), and Deep Learning (DL) in updated and innovative AI technology can take in user data and deliver customised services, creating a huge impact on human capital management. Many leading corporates and Information Technology (IT) companies, including Infosys, have incorporated AI components for various human resource management issues. The application of AI mechanisms is broadly classified into three dimensions: analytical, predictive, and adaptive. Analytical Process The AI-enabled analytic systems aim to achieve greater data management and superior insight into internal data through machine learning and deep learning modules. Normally, HR data cannot be rendered as numerical values and is very difficult to evaluate using traditional technologies; innovative AI technology has proven that significant changes in this difficult process can easily address various HR-related issues. The analytical process in AI is shown in Fig. 1. Employee attrition is one of the biggest costs for the company from the HR perspective. A fair portion of investment is spent on hiring employees, training them, and making them fit into the company's structure, and these costs become losses when they leave the organisation. Now, major companies like Infosys are using AI-enabled technology to assess employee-related variables such as employee
Fig. 1 Analytical process in AI
Fig. 2 Predictive modelling
patterns of work, performance, compensation, delays in progression, and values. This mechanism helps the organisation and the HR team make the right interventions to support and motivate talented workers to stay. With respect to talent mapping, Infosys frequently relies on internal hires to fill available positions within the organisation, as there are many advantages to promoting from within. AI-enabled innovative technology in HR practices helps to identify talent matches, providing equal opportunity through predictive estimation and forecasting tools. Mapping the Key Responsibility Area (KRA) to expertise and experience in the related field helps to align with talent mapping. The predictive modelling is shown in Fig. 2.
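The attrition-focused predictive modelling described here amounts to mapping employee variables to a risk score. The following sketch combines illustrative features with hand-picked weights in a logistic score; the feature names and weights are assumptions, not Infosys's actual variables, and a production system would learn the weights (e.g. via logistic regression) from historical attrition data.

```python
import math

# Illustrative weights: a real model would fit these to historical attrition data.
WEIGHTS = {"overtime_hours": 0.04, "performance_drop": 0.8,
           "below_market_pay": 0.6, "years_since_promotion": 0.3}
BIAS = -2.0

def attrition_risk(employee):
    """Logistic score in (0, 1): higher means more likely to leave."""
    z = BIAS + sum(WEIGHTS[k] * employee[k] for k in WEIGHTS)
    return 1 / (1 + math.exp(-z))

stable = {"overtime_hours": 2, "performance_drop": 0,
          "below_market_pay": 0, "years_since_promotion": 1}
at_risk = {"overtime_hours": 30, "performance_drop": 1,
           "below_market_pay": 1, "years_since_promotion": 4}
print(attrition_risk(stable), attrition_risk(at_risk))
```

Scoring every employee this way lets HR rank the workforce by risk and target retention interventions, which is the "right intervention" use case described above.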
4.2 HCL Case Positive Impact of AI on HR Practices AI is pushing the boundaries of various advanced ML structures. This hyper-edge innovative technology enables machines to function with a high degree of autonomy.
Vital advantages of AI in HR:
a. It cuts down the time to perform an HR task.
b. It enables hyper-technology tasks with significant cost savings.
c. It functions day and night without breaks or interruptions.
d. It supplements the capabilities of differently abled professionals.
e. It has great market potential and is applicable not only in HR operations but also across the organisation.
f. It helps HR professionals make the decision-making process easier and smarter.
4.3 ORACLE Case AI applications help in determining the appropriate hiring time. In today's complex labour market, understanding the frequency of recruitment and the hiring of new people is highly valuable, and AI helps HR professionals project recruitment time slabs. AI applications also help in determining the best candidates, supporting talent matching and acquisition. The powerful tools of AI and ML help to scrutinise and identify the most appropriate candidates based on the company's requirements. They also help job seekers check recommended jobs according to their profiles and offer similar kinds of jobs based on their searches.
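The candidate-to-job matching described above can be illustrated with a simple set-overlap (Jaccard) ranking of open jobs against a candidate's skill profile. The job titles and skills below are invented for the example; the vendor's actual matching models are not public.

```python
def jaccard(a, b):
    """Jaccard similarity between two skill sets."""
    return len(a & b) / len(a | b)

# Hypothetical open positions and their required skills
jobs = {
    "HR Analyst":      {"sql", "excel", "hr analytics"},
    "Recruiter":       {"communication", "interviewing"},
    "People Data Sci": {"python", "sql", "ml", "hr analytics"},
}

def recommend(profile, k=2):
    """Return the top-k jobs ranked by similarity to the candidate's skills."""
    ranked = sorted(jobs, key=lambda j: jaccard(profile, jobs[j]), reverse=True)
    return ranked[:k]

print(recommend({"python", "sql", "hr analytics"}))
```

Ranking by similarity is what lets the system both surface the best candidates for a role and surface similar roles to a job seeker, as described for the Oracle case.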
4.4 Differentiation of Cases on the Classes Considered for Hiring Candidates
a. Infosys has created a huge impact on human capital management through innovative technology.
b. Infosys's AI application helps the HR team take the right interventions to support and motivate talented workers to stay.
c. HCL's AI application is targeted towards significant cost savings.
d. HCL's AI supplements the capabilities of differently abled professionals.
e. In the case of ORACLE, AI helps to scrutinise the best candidates.
f. ORACLE's AI application is used to recommend jobs to candidates.
5 Conclusion This paper reviews various literature and analyses the impact of AI on HR activities and employee performance. As organisations face robust challenges, including employee-related issues, AI plays a very important role in assisting HR professionals. Companies are in a crucial position to hire and retain quality employees. The application of AI helps to enhance the quality of employees through training and development in line with industrial needs. The field data collected from leading users of AI in HR processes, such as Infosys, HCL, and Oracle, from their respective reports and HR professionals, have been summarised.
References 1. Beer M, Boselie P, Brewster C (2015) Back to the future: implications for the field of HRM of the multi-stakeholder perspective proposed 30 years ago. Hum Resour Manag 54:427–438 2. Sandilya G, Shahnawaz G (2018) Index of psychological well-being at work—validation of tools in the Indian organisational context. Vis-J Bus Perspect 22(2):174–184 3. Verma R, Bandi S (2019) Artificial intelligence & human resource management in Indian IT sector. In: Proceedings of the 10th international conference on digital strategies for organisational success 4. Bharadwaj A, El Sawy OA, Pavlou PA, Venkatraman N (2013) Digital business strategy: toward a next generation of insights. MIS Q 37(2):471–482 5. Rajesh S, Kandaswamy U, Rakesh A (2018) The impact of artificial intelligence in talent acquisition lifecycle of organisations. Int J Eng Dev Res 6(2):709–717 6. Rubee Merlin P, Jayam R (2018) Artificial intelligence in human resource management. Int J Pure Appl Math 119(17):1891–1895 7. Reilly P (2018) The impact of artificial intelligence on the HR function. IES perspectives on HR 2018. IES Institute of Employment Studies, Member Paper 142 8. Arora S (2020) Revamping human resources with artificial intelligence. Int J Res Anal Rev 7(1):595–600 9. Chandra D (2022) Modernisation of human resources with artificial intelligence. Int J Creat Res Thoughts 10(5) 10. Baakeel OA (2020) The association between the effectiveness of human resource management functions and the use of artificial intelligence. Int J Adv Trends Comput Sci Eng 9(1.1):606–612 11. Bhardwaj G, Singh V, Kumar V (2020) An empirical study of artificial intelligence and its impact on human resource functions. In: International conference on competition, automation and knowledge management, Amity University. IEEE 12. Ranganathan G (2020) Real life human movement realization in multimodal group communication using depth map information and machine learning. J Innov Image Process (JIIP) 2(02):93–101 13. Valanarasu R (2021) Comparative analysis for personality prediction by digital footprints in social media. J Inf Technol 3(02):77–91 14. Anitha K, Shanthi V, Sam A (2021) Impact of artificial intelligence technique on employee well-being for employee retention. Int J Eng Res Technol
Model-Based Design for Mispa CountX Using FPGA M. Manasy Suresh, Vishnu Rajan, K. Vidyamol, and Gnana King
Abstract Blood analysis is a primary test for disease diagnostics, and blood samples are run on hematology analyzers. In clinical laboratories, automated hematology analyzers are among the most important instruments; they can perform hundreds of complete blood counts (CBC) per day in an automated way. Mispa CountX is India's first indigenously built Automated 3-Part Differential Hematology Analyzer, developed by Agappe Diagnostics Pvt. Ltd. in partnership with L&T Technology Services. It counts blood cells based on the electrical impedance principle, tests 60 samples per hour, and reports 22 parameters (including counts of Red Blood Cells (RBC), White Blood Cells (WBC), platelets, and hemoglobin) and 3 histograms. Mispa CountX is built around a TMS320F28377D microcontroller. The project is to build an FPGA design to control the motors of Mispa CountX in place of the TMS320F28377D microcontroller. The hardware architecture of a Digital Signal Processor (DSP) is not very flexible; DSPs are programmable only through software, and their fixed architecture is their limitation. Various DSP applications can be implemented in a single FPGA, which also offers hardware customization. MATLAB is used to design, simulate, and validate the algorithms; it automatically generates and optimizes Verilog HDL code based on the system model. Keywords Complete blood count · Digital signal processor · Hematology analyzer · Red blood cells · White blood cells
M. M. Suresh (B) · V. Rajan · K. Vidyamol · G. King Sahrdaya College of Engineering and Technology, Kodakara, Thrissur, Kerala, India e-mail: [email protected] V. Rajan e-mail: [email protected] K. Vidyamol e-mail: [email protected] G. King e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_67
1 Introduction The main tests in laboratories are the complete blood cell count and the white blood count. Previously, these tests were performed manually. Advances in laboratory instrumentation have changed the clinical industry: automated blood analysis has replaced manual analysis. Automated analyzers provide more accurate outputs than manual analysis, with a shorter turnaround time and lower cost. Some parameters that manual methods cannot deliver are possible with hematology analyzers. Wallace H. Coulter invented impedance technology, which opened a new era for hematology analyzers. Sysmex Corporation's analyzers initially used conductance technology and later shifted to impedance technology [1]. Hematology automation is accepted worldwide because of its accuracy and cell counting speed. Its major advantages are the reduction of manual workload and the ability to count cells in body fluids other than blood. Traditional microscopic methods can introduce errors in blood cell counting, and these errors can be reduced with advanced methods. A blood cell count is commonly prescribed because it directly indicates a person's overall physiological health. Differential blood counts and the CBC are widely used for disease diagnostics. Mispa CountX is India's first indigenously built automated 3-part differential hematology analyzer, developed by Agappe Diagnostics Pvt. Ltd. [2]. It counts blood cells based on the electrical impedance principle, tests 60 samples per hour, and reports 22 parameters (including the counts of RBC, WBC, platelets, and hemoglobin) and 3 histograms. Mispa CountX currently uses a TMS320F28377D microcontroller. This project builds an FPGA design for the motors and sensors of Mispa CountX in place of the TMS320F28377D. Mispa CountX consists of 3 stepper motors and 2 DC motors.
The blood counting sequence of CountX starts with blood aspiration and ends with waste discharge and cleaning. Each stage of blood counting is mainly controlled by motors and sensors. In the blood aspiration stage, blood is aspirated from the sample by an aspiration syringe and carried to the RBC and WBC counting tubs with the help of DC motors. The motor position is detected by a home position sensor, and a PWM signal is generated to change the speed and direction of the motor. Model-based design is used to design the counting sequence; with the help of the HDL Coder, the Simulink model can be converted directly into Verilog code. The hardware architecture of a DSP is not very flexible, whereas DSP blocks can be implemented in an FPGA [3]. An FPGA is made of a large number of gates and multiplexers; these logic blocks support complex computations, and multiplexers convert parallel data to serial data [4]. Compared to traditional DSPs and microcontrollers (MCUs), FPGAs and ASICs offer faster processing and more functionality to support more advanced features. The choice between an ASIC and an FPGA implementation depends on the application. An FPGA implementation can offer faster time-to-market and lower cost than an ASIC design, and FPGAs provide the added benefit of reconfigurability when the design specification changes. On the other hand, an ASIC
may be the right solution for a large-volume, very high-speed, or power-sensitive application. Model-Based Design is used to design and simulate a system in order to understand its behavior, and it improves the quality of the design. With model-based design, the model can be simulated under any conditions (with or without delay), and an instant view of the system is available at any time. Significant advantages of Model-Based Design are that it allows rapid design changes and moves the verification process to the very beginning of the design cycle [5]. This helps to detect system specification errors, design errors, and implementation errors early.
2 Related Works The Sysmex XE-2100 is an automated hematology system. According to the evaluation by Ruzicka et al. [6], the Sysmex XE-2100 can report 32 parameters and test 150 samples per hour, and it can provide NRBC and reticulocyte counts. The system detects forward- and side-scattered light with the help of a semiconductor laser. With a reagent flow, cytometric red cell lysis is performed for the five WBC measurements, and the cells are categorized based on side-scattered light information and fluorescence intensity characteristics. The UniCel DxH 800 is another automated hematology analyzer, developed for in vitro diagnostics in clinical laboratories. Raspadori et al. [8] reported the detection of plasma cell leukemia using the UniCel DxH 800, which performs the leukocyte differential with Flow Cytometric Digital Morphology technology. Tanaka [7] presented the XE-5000 automated hematology analyzer as an upgrade of the XE-2100. It uses the same semiconductor laser technology as the XE-2100 and can analyze and report 67 parameters. The XE-5000 uses the RF/DC detection method to measure hematopoietic progenitor cells (HPC) and to obtain information on immature white blood cells. Ragav et al. [9] carried out a comparative study of hematology analyzers. The Sysmex XN-2000 is an automated hematology analyzer whose platform combines two investigative modules that can be customized to suit a specific clinical application. The two modules run rack tests simultaneously and can accommodate any sample without slowing down; the modules work together seamlessly. Shu et al. [10] proposed an artificial-intelligence-based, reagent-free hematology analyzer. It is designed by training a two-step residual neural network on label-free images of isolated leukocytes acquired from a custom-built quantitative phase microscope.
The analyzer is clinically translatable and can also be deployed in resource-limited settings. Khan et al. [11] proposed a system for blood cell counting implemented in an FPGA with VHDL. Image processing is used for blood cell counting, and the system can store patient details. The image processing needs only a small number of images, and both the image processing and the data storage run entirely in the FPGA.
Hamouda et al. [12] carried out a project to count red blood cells in a sample. It is an automatic RBC counter that uses image processing: the images are processed by clustering, segmentation, and equalization, with the K-means clustering algorithm proposed for clustering. Zhao et al. [13] presented a real-time RBC counting system, a photoacoustic-based microfluidic system for RBC counting and osmolarity analysis. Photoacoustic (PA) detection is rapid and noninvasive and can count and characterize nearly 60 RBCs per second. Hanchinal and Kulkarni [14] proposed a system to control a DC motor using an FPGA. A fuzzy logic control system regulates the speed of the motor: PWM signals drive the motor, so the motor can run at a constant speed even when the load varies. Panda et al. [15] developed a DSP-based DC motor speed control system in which a fuzzy logic controller automatically varies the duty cycle to maintain a constant speed; a TMS320F28377S microcontroller controls the system. Gargees et al. [16] proposed an adjustable closed-loop DC motor speed control system based on the TMS320LF2812 digital signal processor, where the speed of the motor is controlled by changing the duty cycle of the PWM signal.
3 Mispa CountX Figure 1 shows a side view of the Mispa CountX. It uses a TMS320F28377D microcontroller. When the door open button is pressed, the blood placed inside a culture tube is aspirated using a motor and syringe mechanism. The RBC, WBC, and platelets are then counted using the Coulter principle (electrical impedance method) with the help of signal conditioning circuits. Whole blood is passed between two electrodes through an aperture so narrow that only one cell can pass through at a time. The impedance changes as each cell passes through, and the change in impedance is proportional to the cell volume, because the change in electrical resistance of a cell passing through a small aperture in a conductive liquid depends directly on the cell volume. These results are used for counting cells and measuring their volume. With the Coulter principle, the counts of RBCs, platelets, and WBCs, as well as the three-part differential WBC count, can be determined accurately with an accurate volume distribution. The analyzer uses a volumetric metering unit to control the count cycle and to ensure that a precise volume of sample is analyzed; the volumetric counting unit thus ensures the accuracy of the measurements. A volumetric board circuit checks the count completion time of the RBC and WBC during the sample test. The output of the phototransistor fluid inline sensor indicates
Fig. 1 Side view of Mispa CountX
whether any liquid is flowing through the tubes. The count time is calculated by measuring the time taken by the RBC/WBC fluid line to move from the sensor position at the top to the sensor position at the bottom (WBC) or middle (RBC). The phototransistor output voltage decreases, indicating the presence of liquid. The lysed WBC solution is exposed to 520 nm LED light within the WBC bath and is detected by a photodiode. The sample absorbs light when exposed to a beam of incident light, and the amount of light absorbed determines the total hemoglobin count.
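The Coulter counting logic described above can be sketched in software. The following Python fragment is an illustrative model only: the baseline, threshold, and volume calibration factor are hypothetical demonstration values, not the analyzer's firmware or calibration. Each excursion of the impedance signal above the threshold is treated as one cell, with the pulse height proportional to cell volume.

```python
# Illustrative model of Coulter-principle pulse counting; NOT the analyzer's
# firmware. baseline, threshold, and k_volume are hypothetical values.
def count_cells(impedance, baseline=0.0, threshold=2.0, k_volume=1.0):
    """Count impedance pulses; each pulse is one cell passing the aperture.

    The pulse height (impedance change) is treated as proportional to cell
    volume, per the Coulter principle. A pulse still open at the end of the
    trace is not counted.
    """
    count, volumes = 0, []
    in_pulse, peak = False, 0.0
    for z in impedance:
        dz = z - baseline
        if dz > threshold:
            in_pulse = True
            peak = max(peak, dz)
        elif in_pulse:
            # Signal dropped back toward baseline: one cell has passed.
            count += 1
            volumes.append(k_volume * peak)
            in_pulse, peak = False, 0.0
    return count, volumes
```

For example, a trace with two excursions above the threshold yields a count of two, with relative volumes given by the pulse peaks.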
4 Proposed System 4.1 Block Diagram The proposed system is built around an FPGA controller. Figure 2 shows the block diagram of the proposed system. There are 2 DC motors and 3 stepper motors for blood aspiration and mixing. A NEMA17 stepper motor with a 5 mm shaft length is used. An SSCMRRT015PDAA3 sensor, operating at 3 V, is used for pressure sensing. Home position sensors are mainly for detecting the current position of the
stepper motor; each is attached to the shaft of its stepper motor. The 5 motors are called the x-axis, y-axis, waste, diluent, and lyse motors. The x-axis motor provides to-and-fro motion along the x-axis, and the y-axis motor provides up-and-down movement. The diluent motor handles diluent transfer, the lyse motor handles lyse mixing, and the waste motor handles the transfer of waste. Diluent and lyse are transferred with the help of a syringe mechanism. The inline temperature sensor, attached to the fluid pipes, monitors the temperature of the fluid passing through it. The diluent heater and temperature sensor heat the diluent and measure the corresponding temperature. The RBC, WBC, and platelets are counted with signal conditioning circuits according to the Coulter principle: whole blood is passed between two electrodes through an aperture so small that only one cell can pass through at a time, and the impedance change as each cell passes is proportional to the cell volume, yielding a cell count and a volume measurement. This is because the change in electrical resistance of a cell passing through the small aperture in a conductive liquid depends directly on the cell volume. The Coulter principle thus helps in accurately determining the counts of RBCs, platelets, and WBCs, as well as the three-part differential WBC count with an accurate volume distribution. The lysed WBC solution is exposed to 520 nm LED light within the WBC bath and is detected
Fig. 2 Proposed system
by a photodiode. The sample absorbs light when exposed to a beam of incident light, and the amount of light absorbed determines the hemoglobin count.
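The photometric hemoglobin measurement described above follows the Beer-Lambert law. The sketch below illustrates the relationship; the extinction coefficient and optical path length passed in are placeholder parameters for demonstration, not the instrument's calibration constants.

```python
import math

# Hedged illustration of photometric hemoglobin estimation via the
# Beer-Lambert law (A = epsilon * c * l). epsilon and path_cm are
# placeholder parameters, not the analyzer's calibration constants.
def absorbance(i_incident, i_transmitted):
    """Absorbance A = log10(I0 / I)."""
    return math.log10(i_incident / i_transmitted)

def hemoglobin_concentration(i_incident, i_transmitted, epsilon, path_cm):
    """Solve Beer-Lambert for concentration: c = A / (epsilon * l)."""
    return absorbance(i_incident, i_transmitted) / (epsilon * path_cm)
```

A tenfold attenuation of the incident light corresponds to an absorbance of 1, and the estimated concentration scales linearly with absorbance.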
4.2 Software Implementation The software implementation of this work uses an FPGA and the Verilog programming language. MATLAB Stateflow is used to design the system. Simulink includes an HDL Coder package that converts the Simulink model directly into Verilog code. Flowchart. Figure 3 shows the flowchart of the system. The initial step is blood aspiration: when the DOOR OPEN switch is pressed, the blood placed inside a culture tube is aspirated using a syringe and motor mechanism. There are two baths, a WBC bath and an RBC bath. After aspiration, the blood is dispensed into the WBC bath, where diluent is mixed with the blood by a bubbling mechanism; the process is repeated in the RBC bath. The next process is counting. The Coulter principle is used to determine the counts of RBCs, platelets, and WBCs: the whole blood solution is passed between two electrodes through an aperture so narrow that only one cell can pass through at a time, and the change in impedance as each cell passes, being proportional to the cell volume, yields the cell count and volume. After counting the WBC and RBC, the baths are drained; the drained solution passes through the waste pump. Both baths are then cleaned using diluent, and this solution is also back-flushed with the waste pump. The process then repeats for the next test.
Fig. 3 Flowchart
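The counting cycle in the flowchart can be viewed as a simple state machine. The Python sketch below is an illustrative abstraction of the sequence; the state names are our own labels for the flowchart stages, not the actual Stateflow chart states.

```python
# Illustrative abstraction of the blood counting cycle as a state machine.
# State names are hypothetical labels for the flowchart stages.
SEQUENCE = ["ASPIRATE", "DISPENSE_WBC", "DISPENSE_RBC",
            "COUNT", "DRAIN", "CLEAN", "IDLE"]

class CountCycle:
    def __init__(self):
        self.state = "IDLE"

    def press_door_open(self):
        """The DOOR OPEN switch starts a new cycle from IDLE."""
        if self.state == "IDLE":
            self.state = SEQUENCE[0]
        return self.state

    def step(self):
        """Advance to the next stage; the cycle ends back in IDLE."""
        if self.state != "IDLE":
            self.state = SEQUENCE[SEQUENCE.index(self.state) + 1]
        return self.state
```

Pressing the door-open switch enters the aspiration stage, and repeated steps walk through dispensing, counting, draining, and cleaning before returning to idle for the next test.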
Fig. 4 Software design
Model-based Design. MATLAB is used for the software design. With Simulink Stateflow, the sequence diagram can be designed, and the Stateflow chart is converted directly into Verilog code by the HDL Coder. Figure 4 shows the overall software design. The HDL Coder is a MathWorks tool embedded in the MATLAB/Simulink environment that allows the designer to generate synthesizable Verilog code for FPGA and ASIC implementations. First, a model of the design is built using a combination of Simulink and Stateflow charts. Then:
• Simulate the whole model and check its overall behavior.
• Make the entire model HDL compliant; if needed, convert it to fixed point.
• Generate the corresponding Verilog code and synthesize it.
• Optimize and iterate the whole design to meet area-speed objectives.
• Generate the design using the integrated HDL Workflow Advisor for MATLAB and Simulink, targeting different tools.
Verify the generated Verilog code using HDL Verifier after code generation, and synthesize the code onto the target with the Vivado software. Vivado is an Integrated Design Environment (IDE) that provides an intuitive graphical user interface (GUI) with many powerful features. Through Vivado, the Zynq-7000 board can be connected.
5 Results and Discussion The software design is done using MATLAB. A Simulink Stateflow design is developed for each sequence. To load the program onto the FPGA board, the Simulink model is converted into Verilog code by the HDL Coder. The HDL code is
simulated and synthesized in the Vivado software. The main blood counting sequences are the startup sequence, priming all, blood counting, back flush, and drain bath.
5.1 PWM Generation Pulse width modulation (PWM) can be used to control the speed of the motor: by varying the duty cycle, the average signal value is varied. PWM is the best option for DC motor speed control. The duty cycle is the ratio of the pulse on-time to the period of the signal. With a 5 V supply voltage, a 0% duty cycle means 0 V, a 25% duty cycle represents 1.25 V, and 50% and 75% duty cycles correspond to 2.5 V and 3.75 V, respectively. So, by varying the duty cycle, the average DC voltage changes and the speed varies accordingly. Figure 5 shows PWM signals with different duty cycles. Figure 5a represents the clock input pulse, which is the input for generating the PWM; the PWM signal is generated with the help of Simulink blocks. Figure 5b represents PWM with a 50% duty cycle and Fig. 5c shows PWM with a 70% duty cycle. Here, a 12 V DC motor is used, and its speed is controlled with PWM. A DRV8825 is used as the motor driver, so the direction can be set clockwise or anti-clockwise. The PWM is given as input to the motor driver, and by setting the direction, the motor can be moved forward and backward.
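The duty-cycle arithmetic above can be expressed compactly. The helper functions below reproduce the quoted figures for a 5 V supply; this is a plain illustration of the relationship, not the Simulink PWM generator.

```python
# Plain arithmetic behind the quoted duty-cycle figures; an illustration of
# the relationship, not the Simulink PWM generator.
def duty_cycle(high_time, period):
    """Duty cycle = pulse on-time / signal period."""
    return high_time / period

def pwm_average_voltage(v_supply, duty):
    """Average DC voltage delivered to the motor at a given duty cycle."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty cycle must lie between 0 and 1")
    return v_supply * duty
```

For a 5 V supply, duty cycles of 25%, 50%, and 75% give average voltages of 1.25 V, 2.5 V, and 3.75 V, matching the values quoted in the text.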
5.2 Startup Sequence Startup is the machine's initial service routine. The Simulink model of the startup sequence is shown in Fig. 6. There are mainly 5 motors (the x-axis motor, y-axis motor, lyse syringe, waste syringe, and diluent syringe) and 17 valves. In the startup sequence, valve and motor initialization takes place; at this point, the position of each motor is checked.
5.3 Priming All The next step is priming. Figure 7 shows the Simulink chart of priming, the process in which the whole machine is cleaned. Valve and motor initialization is the first step; at that point, the position of each motor is verified with the help of a home sensor. Next is the lyse prime step, in which the lyse syringe moves down. There are two baths, the RBC and WBC baths, used for RBC, WBC, and hemoglobin counting. The next process is WBC prime, during which the diluent syringe moves up. In the RBC drain process, the diluent syringe moves down and the y-axis moves down to
Fig. 5 a Clock pulse, b PWM with 50% duty cycle, c PWM with 70% duty cycle
the WBC bath. The final process is needle prime: the diluent syringe moves up and the waste syringe moves down.
Fig. 6 Startup sequence
Fig. 7 Priming all
Fig. 8 Counting sequence
5.4 Counting For the counting sequence, the first step is the initialization of motors and valves. The counting sequence is shown in Fig. 8. The next process is WBC draining: the WBC solution is drained through a small aperture for counting the WBC and hemoglobin, while the waste syringe and the diluent syringe both move down. The same process is repeated in the RBC bath, where the diluent syringe moves up and the waste syringe moves down. Then waste disposal takes place, so the waste syringe moves down. At the same time, the RBC bath is filled with diluent for cleaning, since the diluent is basically used for cleaning; for that process, the diluent syringe moves up.
6 Conclusion A model-based design for CountX was implemented with an FPGA. By replacing the existing DSP controller, more efficiency and speed can be achieved. The main advantages of this system are that the FPGA is reprogrammable and has a reconfigurable architecture; low power consumption is another advantage of the proposed system. MATLAB is used for designing, simulating, and validating the system, and the HDL Coder directly converts the MATLAB design into Verilog code. This paper is mainly about converting the motor control system of the Mispa CountX from a DSP to an FPGA. In the future, the
whole system can be redesigned with an FPGA. Although an FPGA is costlier than a DSP, it is also faster.
References 1. Wilcox SE (1999) Automating the hematology laboratory–the Coulter Legacy. Lab Med 30(5):358–360. https://doi.org/10.1093/labmed/30.5.360 2. Vijayan DK, Jiby Krishna KG, Danniel D, Shaji A, Mathew J, John A, Sukumaran A, Varkey PK, Thomas T, Thomas R (2021) Mispa count X; the first indigenous Indian hematology 3-part analyzer. Int J Clin Biochem Res 8(4):265–273 3. Butorac M, Vucic M (2012) FPGA implementation of simple digital signal processor. In: 2012 19th IEEE international conference on electronics, circuits, and systems (ICECS 2012), pp 137–140. https://doi.org/10.1109/ICECS.2012.6463781 4. Bansal M, Singh H, Sharma G (2021) A taxonomical review of multiplexer designs for electronic circuits & devices. J Electron 3(02):77–88 5. Sharma S, Chen W (2009) Using model-based design to accelerate FPGA development for automotive applications. SAE Int J Passeng Cars - Electron Electr Syst 2(1):150–158 6. Ruzicka K, Veitl M, Thalhammer-Scherrer R, Schwarzinger I (2001) The new hematology analyzer Sysmex XE-2100. Arch Pathol Lab Med 125(3):391–396 7. Tanaka C (2008) Automated hematology analyzer XE-5000: overview and basic performance 8. Raspadori D, Sirianni S, Gozzetti A, Lauria F, Fogli C, Di Gaetano N (2011) Usefulness of Unicel DxH 800 cell population data in the detection of plasmacell leukemia: 2 cases report. Blood 118(21) 9. Ragav NVH, Sinduja P, Priyadharshini R (2021) Automated blood analysers and their testing principles: a comparative study. J Pharm Res Int 10. Shu X, Sansare S, Jin D, Zeng X, Tong KY, Pandey R, Zhou R (2021) Artificial-intelligenceenabled reagent-free imaging hematology analyzer. Adv Intell Syst 11. Khan Z, Pathan S, Pathan R, Ali Z, Khadase R, Alvi R (2017) Blood cell monitoring based on FPGA, implemented using VHDL. J Res Eng Appl Sci 02(04):151–154 12. Hamouda, Khedr A, Ramadan R (2012) Automated red blood cell counting. 1 13. 
Zhao W, Yu H, Wen Y, Luo H, Jia B, Wang X, Liu L, Li WJ (2021) Real-time red blood cell counting and osmolarity analysis using a photoacoustic-based microfluidic system. Lab Chip 21(13) 14. Hanchinal V, Kulkarni UM (2015) Automatic speed control of DC motor using FPGA 15. Panda A, Kahare S, Gawre SK (2020) DSP TMS320F28377S based speed control of DC motor. In: 2020 IEEE international students’ conference on electrical, electronics and computer science (SCEECS), pp 1–4. https://doi.org/10.1109/SCEECS48394.2020.133 16. Gargees R, Mansoor AK, Khalil R (2011) DSP based adjustable closed-loop DC motor speed control system. AL-Rafdain Eng J (AREJ) 19:66–76. https://doi.org/10.33899/rengj.2011. 26749
Analysis of Native Multi-model Database Using ArangoDB Rajat Belgundi, Yash Kulkarni, and Balaso Jagdale
Abstract ArangoDB, a native multi-model database, is one of the most promising next-generation database solutions for handling Big Data. Launched in 2011, it is a scalable, fully managed graph database, document store, and search engine in one database. ArangoDB is an amalgam of the three major NoSQL models: graph, document, and key-value pair. It is native in the sense that it supports all three of these NoSQL models under one roof with one query language, the Arango Query Language (AQL). Keywords ArangoDB · Arango Query Language (AQL) · Graph database · Native multi-model database · Not Only Structured Query Language (NoSQL) · Relational Database Management System (RDBMS) · Raft Consensus Protocol · SQL (Structured Query Language)
1 Introduction The rise of data-centric applications, decisions, and organizations has been tremendous. The 5 Vs of Big Data give a fair idea of the challenges faced in this data-driven industry: the sheer volume and variety of Big Data make it difficult to handle and to use properly. In the current data-driven industry, how effectively an organization handles Big Data plays a crucial role in keeping pace with the competition. Thus, a transition from traditional RDBMS to
R. Belgundi (B) · Y. Kulkarni · B. Jagdale MIT WPU School of Computer Engineering and Technology, Pune, India e-mail: [email protected] Y. Kulkarni e-mail: [email protected] B. Jagdale e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_68
multi-model databases is necessary. Data is stored in various formats and can be of three types: (1) structured, (2) semi-structured, and (3) unstructured. To handle this, multi-model databases are needed to bridge the gap between traditional schema-based databases and modern Big Data. The introduction of NoSQL databases was explosive and drove the change from schema-based to schema-free databases. Various NoSQL data models were subsequently introduced, and the need to use more than one model slowly became a requirement. The initial solution was to stack multiple single-model databases on top of one another, but this approach became very difficult and time-consuming as data grew. Its specific issues are complicated deployment, frequent upgrades for data consistency, and duplication. This problem, which many organizations had been facing, has been solved to a large extent by ArangoDB, a native multi-model database supporting three major NoSQL models (document, graph, and key-value pair) under one roof. The unique feature of ArangoDB is that it is a native multi-model database, meaning it has one core and one query language but multiple data models. In this paper, we aim to offer a review of ArangoDB by shedding light on its architecture, graph capabilities, AQL, the similarities to RDBMS that make ArangoDB easy to adopt, its clustered environment for scalability, ArangoDB with Kubernetes, ArangoDB on the cloud, and finally drivers and integrations for ArangoDB. Section 2 reviews relevant papers that helped in knowledge building and in understanding various graph databases. Section 3 aims to showcase ArangoDB and give a basic, clear understanding of the graph database ArangoDB.
Section 4 illustrates the application of ArangoDB in the Engineering domain with a use case. Section 5 has a conclusion that gives an overview and lists applications of ArangoDB.
2 Literature Review ArangoDB offered a one-of-a-kind solution to the multi-model database problem by integrating the major NoSQL models in one place. One needs to understand RDBMS and NoSQL before moving to multi-model databases in practice. The paper [1] studied RDBMS in comparison with NoSQL, analyzing the properties of relational databases against NoSQL databases to determine which better fulfills modern application demands; the report also highlights the difficulties of NoSQL. Relational databases are simple to set up, robust, consistent, and secure, but they are too rigid. NoSQL performs well in handling huge volumes of data and supports unstructured data but is less consistent and less secure. Database model selection needs to be done with the application to be developed in mind.
Big Data has risen in popularity, which means that RDBMS-based applications are incapable of providing fast access to huge amounts of data. Reference [2] shows that as enterprises switch to NoSQL databases for fast data access, it becomes important to select the right kind of NoSQL database; it introduces fifteen categories of NoSQL databases and proposes principles and examples for choosing the correct one. The authors of [3] have compared and analyzed five open-source NoSQL databases. The technologies in each of the five databases are also analyzed, and the pros and cons of each are noted; [3] also assists in picking a suitable database as per industry requirements. The authors in [4] have presented issues in multi-model databases and have attempted to showcase their wide implications. Reference [4] also states that the multi-model database is an old idea that has been reconditioned for Big Data use cases. Big Data has many aspects, of which 'variety' is one of the most important. Handling a variety of data and ensuring efficient storage and retrieval is a need of modern applications. The paper [5] shows the true meaning of multi-model databases along with hands-on illustrations for better understanding, and it sets the background for more research on multi-model databases. Looking back at the history of databases, [5] observes a trend toward databases storing and processing an increasing variety of data. As a result, the need arises for a multi-model database system able to manage multiple types of data simultaneously. Multi-model databases have recently found a wide range of applications and have become quite popular among the existing NoSQL databases. Many databases now claim to be multi-model databases.
The support for multiple models varies greatly: the ability to query the data in the different models varies, query-plan optimization needs changes across models, and the indexing of each model's structure must change as well. Owing to the unique V-characteristics of Big Data, several difficult tasks must be handled in order to provide efficient and effective data administration. This study [5] focuses on the 'variety' challenge of Big Data. One of the data models in ArangoDB is the graph data model, in which data storage, retrieval, and processing are done using graph concepts. Reference [6] assists in better understanding the inner workings of the graph data model in general: it is an elaborate review that covers the issues and concepts of graph databases, compares graph databases, and highlights the benefits and issues of each, also comparing them on performance and ease of use. Prior to the introduction of a native multi-model database like ArangoDB, applications generally used more than one NoSQL database to implement multi-model functionality. Managing multiple databases is not an efficient way to develop modern applications; a single database offering multiple data models is the best solution going forward. Reference [7] illustrates a native multi-model database and explains the concepts revolving around it. The paper refers to ArangoDB, the native multi-model database, and illustrates its concepts and features with the help of real-life use cases. The use cases showcase the capabilities of ArangoDB by
answering real-life queries, which indicates that ArangoDB is here to stay and can replace existing old-school databases. ArangoDB and multi-model databases find their best applications in use cases involving hierarchical data. The authors of [8] have researched the rise of multi-model databases and have shown that a single multi-model data management platform is a necessity for Big Data-based applications. Based on the conclusion of [8], ArangoDB seems to be the right choice as it is a native multi-model database. The reliability of RDBMS is high, but it faces constraints when Big Data steps in; the analyzed native multi-model database has shown the same reliability as RDBMS while tackling the Big Data problems. Reference [9] conceptually compares RDBMS with native multi-model databases, specifically MySQL and ArangoDB, since MySQL is the most widely used RDBMS. It gives a high-level overview of the features and differences of both database systems. Understanding the similarities and differences is crucial for easily adopting the features and establishing clarity regarding ArangoDB as a native multi-model database. All the key features and facilities offered by ArangoDB are explained in a clear, concise manner, and the paper concludes by explaining how a native multi-model database fulfills modern software development requirements. The author of [10] has showcased a real-world application of ArangoDB by solving a fraud detection use case with graph concepts and AQL queries; this serves as a practical guideline for using ArangoDB in real-world applications. References [11–14] provide information on ArangoDB's technologies and features, namely its architecture, data models, indexing concepts, and query performance. This helps in understanding the basics involved when using ArangoDB in applications.
The team of authors of [15] have studied and showcased applications of networking in the engineering and manufacturing industry. They have shown that graph concepts are a good approach to implement engineering use cases and associated informative visualization and analysis.
3 Native Multi-model Database

ArangoDB is an open-source, native multi-model database for documents, key-value pairs, and graphs. This database allows users to create high-performance applications by utilizing a simple SQL-like query language or JavaScript extensions. If necessary, ACID transactions can also be used. ArangoDB can be easily scaled both horizontally and vertically, making it apt for distributed systems.
Analysis of Native Multi-model Database Using ArangoDB
927
3.1 Data Storage Entities

In ArangoDB, all data records are referred to as documents. Documents can have a single attribute or multiple attributes. ArangoDB organizes documents into entities known as collections, which are analogous to tables in relational databases. Collections are made up of documents (records) and can contain one or more of them. The advantage here is that in a traditional RDBMS we would have to define the columns before storing records in a table, whereas in ArangoDB we can create collections directly because it is schema-free. Collections in ArangoDB are of two types:

(a) Document collection
(b) Edge collection

In the context of graphs, document collections are referred to as vertex collections. Every document inside a collection has _key, _id, and _rev attributes. The _key resembles the primary key in RDBMS. The _id attribute has the structure 'collection_name/_key' and is useful for joining node and edge collections via the _from and _to attributes. Edge collections are quite similar to document collections except that they have two special attributes, _from and _to: as the names suggest, _from points to the parent of the selected node and _to points to its child. The _rev attribute in every document maintains the document revision. Collections exist inside databases. There can be more than one database, each isolated from the others. The default database is _system, a special database that cannot be deleted; it is used to manage the creation and deletion of databases as well as database user management. Databases also contain Views. Views in ArangoDB are conceptually similar to views in RDBMS but have additional responsibilities and applications; their major use is in the NLP-based feature of ArangoDB called ArangoSearch.
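As a sketch of these entities, the AQL below inserts a vertex document and an edge document; the collection names "parts", "suppliers", and "supplies" are illustrative, not taken from the paper.

```aql
// Insert a vertex document into a document collection "parts".
// ArangoDB derives _id ("parts/p100") and generates _rev automatically.
INSERT { _key: "p100", name: "gearbox" } INTO parts

// Insert an edge document into an edge collection "supplies".
// _from and _to hold the full _id values of the connected vertices.
INSERT { _from: "suppliers/s1", _to: "parts/p100", since: 2021 } INTO supplies
```

Each statement would be issued as its own query against a running ArangoDB instance.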
3.2 ArangoDB Graph

In the programming world a graph is a data structure, whereas in mathematical terms it is a structure of nodes and edges. A node/vertex in an ArangoDB graph is a document in a document collection or in an edge collection; ArangoDB graph edges can themselves be used as vertices. Edge definitions are used to define the edge collections of the graph. Edge collections are similar to relation tables in SQL, while vertex collections are like data tables in SQL. Figure 1 shows (a) the direction of edges, (b) edge/graph traversal directions, and (c) a typical edge document. The ArangoDB graph supports graph algorithms such as
Fig. 1 a Direction of edges, b edge/graph traversal directions, c a typical edge document. Source ArangoDB Graph Course
(a) Traversal (inbound, outbound, DFS, BFS, weighted traversals, etc.)
(b) Shortest Path
(c) k-Shortest Paths
(d) Distributed Iterative Graph Processing (Pregel)
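As a hedged sketch, traversal and shortest-path queries over a hypothetical edge collection "supplies" might look like this (vertex identifiers are illustrative):

```aql
// Outbound traversal from a start vertex, visiting neighbors 1 to 3 edges away
FOR v, e, p IN 1..3 OUTBOUND "suppliers/s1" supplies
  RETURN v.name

// Shortest path between two vertices along outbound edges
FOR v IN OUTBOUND SHORTEST_PATH "suppliers/s1" TO "parts/p100" supplies
  RETURN v._key
```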
3.3 Arango Query Language (AQL)

AQL is used to access and modify data in ArangoDB. The Arango Query Language is a declarative language that uses keywords from English, which makes it easy for any learner to grasp and build queries. Another goal of AQL's design was client independence: the language and syntax are the same for all clients, regardless of programming language. AQL is a very flexible query language and is case-insensitive as well. AQL query statements are shown in Figs. 2, 3, and 4.

(I) For instance, a simple query to access records of students above 18 years of age.
Fig. 2 A typical AQL query to retrieve data from ArangoDB
Fig. 3 A typical AQL query to update existing records by adding an attribute
Fig. 4 A typical AQL query to update existing attribute value
Data modification in ArangoDB can be done using the 'UPDATE' keyword to modify an existing record, add an attribute, or update existing attribute values.

(II) For instance, an AQL query to add an attribute to an existing collection.
(III) For instance, an AQL query to update existing attribute values of students.
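Queries of the kind the figures describe might look like the following sketch (the "students" collection and its attributes are illustrative, not from the paper):

```aql
// (I) Retrieve records of students above 18 years of age (cf. Fig. 2)
FOR s IN students
  FILTER s.age > 18
  RETURN s

// (II) Add an attribute to every document in the collection (cf. Fig. 3)
FOR s IN students
  UPDATE s WITH { enrolled: true } IN students

// (III) Update an existing attribute value (cf. Fig. 4)
FOR s IN students
  FILTER s.age == 18
  UPDATE s WITH { age: s.age + 1 } IN students
```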
3.4 ArangoDB Storage Engine

At the core of ArangoDB is a storage engine responsible for keeping copies of documents in memory, persisting them on disk, and providing indexes and caches for high-speed queries. The sole storage engine available in ArangoDB versions 3.7 and higher is based on Facebook's RocksDB, an embeddable, persistent key-value store. RocksDB is log-structured and designed for fast storage. Because the RocksDB engine is optimized for huge data sets, steady insert performance can be achieved even when the data set is considerably larger than main memory. A few advantages of RocksDB are as follows:

(a) Document-level locks
(b) Large data set support
(c) Persistent indexes
3.5 Indexing for Query Performance and Optimization

Indexes provide fast access to documents, provided that the indexed attribute(s) are used in a query. ArangoDB indexes some system attributes by default, and users have the flexibility to define additional indexes on the non-system attributes of documents in collections. Most user-defined indexes are created by naming the attributes to be indexed. Some index types allow indexing just one attribute (such as the TTL index), while others allow indexing many attributes at once. The system attributes (_key, _id, _from, _to) are indexed by default in ArangoDB. Types of indexes in ArangoDB:

(1) Primary Index (default for node and edge collections)
(2) Edge Index (default for edge collections)
(3) Persistent Index
(4) TTL (time-to-live) Index
(5) Full-text Index
(6) Multi-dimensional Indexes
(7) Geospatial Indexes
(8) Vertex-centric Indexes
With indexes, we can reduce the number of unnecessary iterations in the query which reduces the query execution time significantly. The queries in AQL are optimized using an AQL query optimizer. The optimization will take place as long as the query result is not affected or modified in any way. The ‘explain’ method in ArangoDB shows the optimized query execution plan and also all the plans generated for query execution by the query optimizer.
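In the arangosh shell, creating a persistent index and inspecting the optimizer's plans might look like this sketch (collection and attribute names are illustrative, and a running ArangoDB instance is assumed):

```js
// Persistent index on a non-system attribute
db.students.ensureIndex({ type: "persistent", fields: ["age"] });

// Show the optimized execution plan; with allPlans, every candidate plan
db._explain("FOR s IN students FILTER s.age > 18 RETURN s");
db._explain("FOR s IN students FILTER s.age > 18 RETURN s", {}, { allPlans: true });
```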
3.6 ArangoDB Runtime Errors

If a syntactically invalid query is issued to the server, the server returns a parse error. ArangoDB can detect such errors during query inspection and terminate further processing, returning the error number and message so that the error can be resolved. If the parsing phase is successful, a query will open every collection it references; if any of the referenced collections is missing, execution is again terminated and an appropriate error message is returned. In some cases, running a query may also produce runtime errors or warnings that are not obvious from just reading the query, because queries can use data from potentially heterogeneous collections. A few examples of runtime query errors are as follows:

(1) Division by zero
(2) Invalid numeric value for arithmetic operations
(3) Invalid operands for arithmetic operations
(4) Invalid operands for logical operations
(5) Invalid variable and collection names
3.7 ArangoSearch

ArangoSearch is a C++-based full-text search engine. It allows users to combine two information retrieval techniques: (I) Boolean and (II) generalized ranking. Search results can be ranked by relevance using the Vector Space Model in conjunction with the BM25 or TF-IDF weighting schemes. The View concept is key to ArangoSearch: a single ArangoSearch View can contain documents from many collections, allowing complicated federated searches even over the entire graph.
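A sketch of an ArangoSearch query; the View name "suppliersView" and the "text_en" analyzer are illustrative assumptions:

```aql
// Full-text search over a View, ranked by BM25 relevance
FOR doc IN suppliersView
  SEARCH ANALYZER(PHRASE(doc.name, "acme"), "text_en")
  SORT BM25(doc) DESC
  LIMIT 10
  RETURN doc
```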
3.8 Arango Foxx Services

Arango Foxx Service is an API creation feature of ArangoDB. It is a framework, developed in JavaScript, for building data-centric HTTP microservices that run inside ArangoDB. Users can use Foxx Services to write data access and domain logic as microservices that run directly inside the database with native data access. The Foxx microservice framework makes it simple to add new HTTP endpoints to ArangoDB's own REST API using modern JavaScript and the same V8 engine used by Node.js and the Google Chrome web browser. Benefits of Arango Foxx Services:

(1) Unified data storage logic
(2) Reduced network overhead
(3) Sensitive data access

To create a service, we need a folder with two files:

(1) manifest (JSON file)
(2) index (JavaScript file)

The manifest JSON file contains nested key-value pairs that specify the engine and the ArangoDB version in use. In the index JavaScript file we set up a router that connects to the ArangoDB Foxx router. The router provides 'get' and 'post' methods along with request and response objects to handle incoming requests and generate output responses; our APIs are written in JavaScript in this index.js file. This folder structure is converted into a zip archive and uploaded to the web interface via the Services tab. Steps to start using the service:

(1) Press the 'Add Service' tab, then select Zip from the dialog's options. A mount route, which is the URL prefix at which the service will be mounted (for example, /getting-started), must be provided.
(2) Your service should be installed as soon as the upload starts once you select the zip archive using the file picker. Otherwise, click the Install button and wait until the box closes and the service appears in the list of services.
(3) To view the service's details, click anywhere on the card with your mount path on the label.
(4) Click on the ‘Try it out’ to check if your APIs are working or not.
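A minimal sketch of the two files described above, following the standard Foxx getting-started pattern (the endpoint name and contents are illustrative):

```js
// manifest.json:
//   { "engines": { "arangodb": "^3.9.0" } }

// index.js:
"use strict";
const createRouter = require("@arangodb/foxx/router");
const router = createRouter();
module.context.use(router);

// GET /hello on the mount path returns a JSON greeting
router.get("/hello", (req, res) => {
  res.json({ message: "Hello from Foxx" });
});
```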
3.9 Arango Deployment Modes

ArangoDB offers the following deployment modes:

(1) Single Instance
The simplest and most basic way of running ArangoDB: it just runs the ArangoDB server binary alone.
(2) Active Failover
In this mode, ArangoDB instances have a leader-follower architecture: one single-server instance is the Leader, with one or more instances as Followers. The ArangoDB Agency acts as a supervisor and elects a new Leader in case of a failure.

(3) Cluster
a. The cluster architecture of ArangoDB is designed to achieve Consistency and Partition tolerance, i.e., CP in the CAP theorem. It has a master/master design and no single point of failure. With reference to the CAP theorem, in the event of a network partition the database gives higher priority to internal consistency than to availability. Here, 'master/master' means that clients can send queries to any node and get the same database view regardless of the location of the client machines; 'no single point of failure' means that even if one machine fails totally, the cluster can continue to serve requests.
b. Structure of an ArangoDB Cluster
i. An ArangoDB Cluster is a collection of ArangoDB instances that communicate via the network. The Agency is a high-availability, resilient key/value store that implements the Raft consensus protocol; the configuration of the cluster is stored in the Agency at all times.
ii. ArangoDB instances in a Cluster take up the following three roles:
1. Agents (the brain of the cluster)
2. Coordinators (serve the clients)
3. DB-Servers (host the data; sharding is done here)
iii. An ArangoDB Cluster can be configured by changing the number of instances (Agents, Coordinators, DB-Servers) as the use case requires.

(4) Multiple Datacenters
3.10 ArangoDB Kubernetes Operator

ArangoDB has its own Kubernetes Operator, 'kube-arangodb', that can be deployed in a Kubernetes cluster to (I) manage ArangoDB deployments, (II) enable 'Persistent Volumes' on local storage for optimal storage performance, and (III) configure ArangoDB for inter-datacenter replication.
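With kube-arangodb installed, a deployment is requested through a custom resource; a minimal sketch (the resource name and API version are illustrative):

```yaml
apiVersion: "database.arangodb.com/v1"
kind: "ArangoDeployment"
metadata:
  name: "example-cluster"
spec:
  mode: Cluster   # or Single / ActiveFailover
```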
4 Use Case Analysis

In this section, we illustrate a use case from the engineering industry, where data on parts (manufactured and sold), manufacturing plants, suppliers, etc. is stored. Today this data is collected and stored, but little is done to use it to add value to the organization. The data
in this use case is related data: parts can belong to a specific location, parts can be supplied by various suppliers, and so on. Related data immediately points to graphs, which store relations between nodes via edges. This is what makes graph databases an excellent solution for storing, processing, analyzing, and visualizing data from the engineering industry in order to extract value from it. Another notable example in this industry is the Bill of Materials (BoM), which includes all the parts required in the assembly being analyzed. In this use case, the fuzzy search of the ArangoSearch module has been used to search for suppliers by name, which involves text-based search. Finally, we created APIs for the queries using ArangoDB's Foxx microservices. Let us look at a typical use case with a few basic queries in the ArangoDB query editor, shown in Figs. 5, 6, 7, and 8:

1. Retrieve parts when the inputs are location and part number.
2. Retrieve parts along with their suppliers based on part number (uses joins and edge collections).
3. Retrieve supplier details using ArangoSearch.
Fig. 5 AQL query to retrieve part records filtered on location and part number attributes
Fig. 6 AQL query to retrieve part records with their respective suppliers (join functionality)
Fig. 7 AQL query to retrieve supplier records using text-based search
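Since the figures are images, here are sketches in the spirit of the queries they show; the collection, edge, and View names ("parts", "suppliedBy", "suppliersView") and attribute names are illustrative assumptions:

```aql
// Fig. 5 style: parts filtered on location and part number
FOR part IN parts
  FILTER part.location == @location AND part.partNumber == @partNumber
  RETURN part

// Fig. 6 style: join parts to their suppliers through an edge collection
FOR part IN parts
  FOR supplier IN 1..1 INBOUND part suppliedBy
    RETURN { part: part.partNumber, supplier: supplier.name }

// Fig. 7 style: fuzzy supplier lookup through an ArangoSearch View
FOR s IN suppliersView
  SEARCH NGRAM_MATCH(s.name, @name, 0.7, "supplierNgram")
  RETURN s
```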
Fig. 8 Query results for above-mentioned queries
4.1 Query Execution Results

5 Future Scope

ArangoDB currently has applications in fraud detection, privacy, machine learning and AI pipelines, case management for law enforcement authorities, key and certificate management, botanical observatories, etc. This wide range of applications clearly shows that graph databases are here to stay, and ArangoDB is a stand-out among them. Alongside these broad applications, many features assist users in improving performance and efficiency with Big Data: the fine-tuning and optimization parameters for query performance and memory footprint help in building robust and efficient applications. ArangoDB deployment with the Kubernetes operator on cloud platforms is a big advantage that needs to be explored further, and many more cloud-native applications will be built in which ArangoDB has the potential to be an integral part. ArangoDB also offers Smart Joins, a smart way of organizing data and an example of well-planned data storage architecture; this feature needs to be explored further to attain better query and system performance. In this paper, we have illustrated a typical engineering and manufacturing industry use case and explored a new application for ArangoDB, the native multi-model graph database. As per [15], the Bill of Materials (BoM) is a crucial part of daily work in the engineering and manufacturing industry, and networking and graph concepts used to display a BoM as nodes and edges have great potential. This use case and others like it can be extended to graph machine learning, which holds great promise for the engineering and manufacturing industry; relevant work in this direction can be done using ArangoDB, as it offers a cloud database and machine learning based on graph concepts.
6 Conclusion

In this paper, we have seen the rise of data-centric applications and their relevance in the coming years. Big Data is characterized by aspects such as the five V's, which is why it needs dedicated handling. ArangoDB takes a step in the right direction by offering a native multi-model database, bringing three database models under one roof
coupled with the same query language (AQL) for all the database models. ArangoDB is a complete package in itself, in the sense that it has a simple, lucid, efficient query language; a graph database with well-defined data storage entities; a reliable, easy-to-use, and efficient API writing service (Foxx); deployment modes ensuring easy horizontal and vertical scaling; and well-defined indexes to optimize query performance. ArangoDB also offers the NLP-based search engine ArangoSearch for text-based analysis. In addition, ArangoDB has its own cloud environment, ArangoDB Oasis, which can be used for GraphML in machine learning applications.
References

1. Kunda D, Phiri H (2017) RDBMS vs NoSQL. Zambia ICT J
2. Chen J-K, Lee W-Z (2019) An introduction of NoSQL databases based on their categories and application industries. Algorithms 12:106
3. Amani N, Rajesh Y (2021) Comparative study of open-source NoSQL document-based databases. In: Senjyu T, Mahalle PN, Perumal T, Joshi A (eds) Information and communication technology for intelligent systems. ICTIS 2020. Smart innovation, systems and technologies, vol 195. Springer, Singapore
4. Płuciennik E, Zgorzałek K (2017) The multi-model databases – a review. In: Kozielski S, Mrozek D, Kasprowski P, Małysiak-Mrozek B, Kostrzewa D (eds) Beyond databases, architectures and structures. Towards efficient solutions for data analysis and knowledge representation. BDAS 2017. Communications in computer and information science, vol 716. Springer, Cham
5. Lu J, Holubova I (2019) Multi-model databases: a way to handle a variety of data. ACM Comput Surv 52
6. Das A, Mitra A, Bhagat SN, Paul S (2020) Issues and concepts of graph database and a comparative analysis on list of graph database tools. In: IEEE conference publication/ICCCI 2020
7. What is a multi-model database and why use it. ArangoDB white paper, April 2020
8. Lu J, Holubová I (2017) Multi-model data management: what's new and what's next? In: Proceedings of the 20th international conference on extending database technology, Venice, Italy, 21 Mar 2017, pp 602–605
9. Switching from RDBMS to ArangoDB. White paper, August 2018
10. Keen A (2020) Multi-model database identifies fraud at scale
11. ArangoDB. https://www.arangodb.com/docs/stable
12. ArangoDB data model. https://www.arangodb.com/docs/stable/data-modeling-concepts.html
13. ArangoDB indexing. https://www.arangodb.com/docs/stable/indexing.html
14. ArangoDB structure. https://www.arangodb.com/docs/stable/architecture.html
15. Cinelli M, Ferraro G, Iovanella A, Lucci G, Schiraldi MM (2017) A network perspective on the visualization and analysis of bill of materials. Int J Eng Bus Manag
A Review of Deep Learning-Based Object Detection: Current and Future Perspectives

Museboyina Sirisha and S. V. Sudha
Abstract Object detection, the task of detecting and localizing the objects present in images and videos, is one of the challenging fields of computer vision. It plays a crucial role in many real-time applications such as video surveillance, autonomous driving, and medical image processing. Key challenges in object detection, such as detecting small objects and addressing class imbalance, are tackled by various deep learning detection models. This review classifies object detection models into three main categories, namely two-stage, one-stage, and transformer-based detectors, discusses the recent advanced developments in object detection, and lists some of the most important works in each category. Benchmark datasets used for the object detection task and the metrics used for evaluating the performance of object detectors are listed. A performance comparison among various object detectors is plotted, showing how advanced detectors achieve better accuracy than existing example detectors.

Keywords Object detection · Convolutional neural networks · Two-stage detectors · One-stage detectors · Transformers
M. Sirisha (B) · S. V. Sudha
School of Computer Science and Engineering, VIT-AP University, Amaravathi, Andhra Pradesh, India
e-mail: [email protected]
S. V. Sudha
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Shakya et al. (eds.), Proceedings of Third International Conference on Sustainable Expert Systems, Lecture Notes in Networks and Systems 587, https://doi.org/10.1007/978-981-19-7874-6_69

1 Introduction

One of the foremost and most challenging techniques in computer vision is object detection. This method involves the detection of object instances in images or videos, such as vehicles, people, traffic signs, and common objects. It not only classifies the target object but also locates all instances of objects in images or videos. Among various real-world applications, some important ones are autonomous driving, video surveillance, object tracking, person identification, and medical image
processing, etc. Training a computer to recognize object categories remained challenging until the last decade. In this article, a comprehensive analysis is presented comparing deep learning-based detector models: two-stage, one-stage, and advanced models of object detection that use attention mechanisms and transformers. Below is a list of what this review contributes:

• An overview of the challenges of object detection and the metrics used to assess its performance.
• A comprehensive overview of the newest state-of-the-art detectors, including two-stage, single-stage, and transformer-based models.
• An exploration and analysis of advanced single-stage object detectors that are free from anchor generation, and of transformer-based detectors using attention mechanisms.

The remainder of the paper is structured as follows: In Sect. 2, we discuss the object detection task and its challenges, followed by surveys conducted in the respective area. Section 3 provides benchmark datasets and evaluation metrics specific to object detection. In Sect. 4, we list commonly used architectures and various types of object detection models, as well as advanced models. The results presented in Sect. 5 compare different widely used object detectors. Finally, future research directions are outlined in Sect. 6.
2 Literature Survey

2.1 Object Detection and Challenges

Object detection is one of the most promising areas of computer vision, as it aims to identify, classify, and localize all instances of a particular object in an image. Computer vision applications rely on object detection to deliver the most fundamental information: "What objects are located where?" Classification is the method of recognizing the object category in an image, whereas object detection extends the classification task by also creating bounding boxes around each instance of an object by predicting its location. If an image contains a number of object classes, object detection involves identifying the location and scale of each instance. Object detectors are intended to find all objects of a given class regardless of camera view, scale, pose, location, illumination conditions, or partial occlusion. An object detection network typically uses an image classification CNN (convolutional neural network) to detect objects: an image is passed through various convolutional and pooling layers, and the object's class is received as output. For each input image, a class is generated, and the final output contains all the objects of the image, detected by localizing them. Object detection tasks usually require both spatial and semantic information extracted from deep learning models, which assist
in classification and object positioning. Though the field of computer vision has seen many great achievements over the past decade, the networks in use still face certain key challenges in many real-time applications, such as object localization, viewpoint variation, occlusion, intra-class variation, multiple aspect ratios and spatial sizes, and cluttered or textured backgrounds.
2.2 Existing Works

Earlier object detectors were constructed using hand-crafted feature extraction and matching techniques such as Viola and Jones [1], HOG (Histogram of Oriented Gradients) [2], and DPM (Deformable Part Model) [3]. The disadvantages of these models are that they were quite slow and inaccurate and performed poorly on unknown datasets. Advances in deep learning and the application of convolutional neural networks (CNNs) [4] to image classification opened up new perspectives in visual perception. Analyzing the color and position characteristics of a target object for object recognition [5] and an improved recognition method based on extracting scene features from remote sensing images [6] are examples of work using advanced computer vision software for object recognition. Object detection has seen rapid growth with the advance of new tools and methods in recent times, and numerous object detector surveys have been published in the recent past, as shown in Table 1. The works in [7, 8, 12] cover object detection models that use convolutional neural networks; object detection models evaluated on the MSCOCO dataset are discussed in [9]; and [10, 11] review one-stage and two-stage object detector models. While most reviews have focused on two-stage and one-stage object detectors, few works have included anchor-free object detectors, and very few have added attention-based transformers. Surveys [13–15] include transformer-based detector models along with two-stage and one-stage ones. Our article examines all frameworks for object recognition, including anchor-free and current transformer-based models. All conventional detectors have become obsolete as deep learning models have grown in popularity, and since 2014 numerous deep learning-based object detectors have demonstrated outstanding detection accuracy.
Figure 1 depicts the transition of several object detectors from conventional to deep learning-based models, as well as the development of advanced transformer based detectors.
Table 1 Existing surveys and their works

1. Zhiqiang and Jun [7] (2017): Object detection is discussed with respect to regional proposal and regression techniques. The discussion, however, is limited to CNN-based detection methods.
2. Agarwal et al. [8] (2018): A number of deep neural network-based object detection algorithms were examined and evaluated, including R-CNN, Fast R-CNN, Faster R-CNN, FPN, and Mask R-CNN. The attention was mostly on the design distinctions and the variances in the datasets utilized by the various models.
3. Zhao et al. [9] (2019): Analyzes object detection models using the MSCOCO dataset up to 2019. Also included is a brief description of three common computer vision tasks: identification of pedestrians, detection of salient objects, and detection of faces.
4. Jiao et al. [10] (2019): Describes benchmark datasets and existing typical detection models. Presents a systematic overview of one-stage and two-stage object detectors. Traditional and novel applications are also listed.
5. Zou et al. [11] (2019): Discusses historical milestones in detector development, detection datasets, metrics, fundamental system building blocks, speed-up strategies, and advanced detection methods.
6. Bai [12] (2020): Discusses the state of the art of two-stage object detection research using convolutional neural networks. The study also examines the advantages and disadvantages of each algorithm and compares them.
7. Liu et al. [13] (2020): Compares anchor-based versus anchor-free deep learning detectors, analyzing detector effectiveness. Further highlights the progress of object detection's core technologies, including backbone improvement, NMS optimization, and positive/negative sample imbalance solutions.
8. Arkin et al. [14] (2021): Examines conventional neural networks in relation to object detection techniques, examines the convolutional neural networks used in these techniques, and introduces the Transformer, computer vision's latest novel technology.
9. Zaidi et al. [15] (2022): Reviews object detectors that are based on deep learning, presenting an overview of the benchmark datasets and assessment criteria applied to detection and recognition tasks. The paper also covers lightweight detectors for edge devices. (continued)
Table 1 (continued)

10. This article (2022): Describes the object detection task, common metrics, and widely used backbone architectures. Also covers traditional and deep learning-based object detection methods, including anchor-free detectors. In addition, advanced detector models based on the transformer principle are examined, showing how they fare better than other state-of-the-art models.
Fig. 1 Milestones of object detectors
3 Datasets and Metrics for Evaluation

3.1 Pascal VOC Dataset

The Pascal Visual Object Classes (VOC) challenge [16] was created to advance visual perception. Pascal VOC provides a standardized image dataset covering 20 classes that is commonly used for classification, object detection, and segmentation tasks. VOC07 consists of approximately 5k training images with over 12k labeled objects, while VOC12 consists of approximately 11k training images with over 27k labeled objects. Images have a width of 793 pixels, a height of 1123 pixels, and a depth of 3. Pascal VOC introduced mAP at 0.5 IoU as the means of measuring a model's performance.
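The IoU criterion behind this 0.5 threshold can be sketched in a few lines of generic Python (an illustration, not code from the paper):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# At the VOC threshold, a prediction counts as correct when IoU > 0.5.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.3333333333333333
```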
3.2 MSCOCO Dataset

The Microsoft Common Objects in Context dataset, referred to as MSCOCO [17], enables object recognition and segmentation based on a large number of high-quality images. It includes 91 categories of everyday objects in their natural environments, 82 of which have more than 5,000 labeled instances. The dataset contains a total of 328,000 images with 2,500,000 labeled instances.
3.3 ILSVRC Dataset

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) was introduced in 2010 [18]. The dataset was expanded to include over one million photos across 1,000 object classification categories. For the object detection task, more than 500k photos were hand-picked from 200 of these classes.
3.4 Visdrone-DET Dataset

VisDrone-DET [19] is a dataset of images taken from an aerial perspective by UAVs (Unmanned Aerial Vehicles). Its training, validation, and test-challenge packages contain 6471, 548, and 1580 images, respectively. The ten classes of the VisDrone-DET dataset include pedestrians, cars, bicycles, vans, tricycles, trucks, awning-tricycles, motor vehicles, and buses. Figure 2 shows example images taken from the four datasets.
Fig. 2 Benchmark datasets used for object detection
3.5 Metrics for Evaluation

To be detected accurately, the object should have the right resolution: the range of detection, i.e., the maximum distance at which objects are effectively tracked from the cameras, should be such that the object spans at least the minimum recommended number of pixels. Across the annotated datasets used, the performance and accuracy of object detectors are determined by multiple measures, including precision, recall, Intersection over Union, and frames per second. Many metrics are available for measuring the accuracy of a detector, but the most common one is average precision (AP). Intersection over Union (IoU) measures the degree of accuracy of a detection as the overlap of the ground truth with the prediction; IoU values greater than 0.5 are generally considered good predictions. The possible outcomes can be categorized in the following manner:

• True Positive (TP): A detection is labeled a true positive if its ground truth bounding box detection is accurate. All detections whose IoU exceeds the threshold value of 0.5 are considered TPs (IoU > 0.5).
• False Positive (FP): Relates to incorrect detections made by the object detector. An FP detection is defined as one whose IoU falls below or equals the threshold value 0.5 (IoU