Lecture Notes in Networks and Systems 613
Sandeep Kumar · Harish Sharma · K. Balachandran · Joong Hoon Kim · Jagdish Chand Bansal Editors
Third Congress on Intelligent Systems Proceedings of CIS 2022, Volume 2
Lecture Notes in Networks and Systems Volume 613
Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Advisory Editors Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA Institute of Automation, Chinese Academy of Sciences, Beijing, China Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus Imre J. Rudas, Óbuda University, Budapest, Hungary Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
Editors Sandeep Kumar Department of Computer Science and Engineering CHRIST (Deemed to be University) Bengaluru, Karnataka, India
Harish Sharma Department of Computer Science and Engineering Rajasthan Technical University Kota, Rajasthan, India
K. Balachandran Department of Computer Science and Engineering CHRIST (Deemed to be University) Bengaluru, Karnataka, India
Joong Hoon Kim School of Civil, Environmental and Architectural Engineering Korea University Seoul, Korea (Republic of)
Jagdish Chand Bansal South Asian University New Delhi, Delhi, India
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-981-19-9378-7 ISBN 978-981-19-9379-4 (eBook) https://doi.org/10.1007/978-981-19-9379-4 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
This book contains outstanding research papers as the proceedings of the 3rd Congress on Intelligent Systems (CIS 2022), held on September 05–06, 2022, at CHRIST (Deemed to be University), Bangalore, India, under the technical sponsorship of the Soft Computing Research Society, India. The conference was conceived as a platform for disseminating and exchanging ideas, concepts, and results of researchers from academia and industry, with the aim of developing a comprehensive understanding of the challenges of advancing intelligence from computational viewpoints. This book will help in strengthening congenial networking between academia and industry. We have tried our best to enrich the quality of CIS 2022 through a stringent and careful peer-review process. CIS 2022 received many technical contributed articles from distinguished participants from home and abroad: 729 research submissions from 45 different countries, viz., Algeria, Australia, Bangladesh, Belgium, Brazil, Bulgaria, Colombia, Cote d'Ivoire, Czechia, Egypt, Ethiopia, Fiji, Finland, Germany, Greece, India, Indonesia, Iran, Iraq, Ireland, Italy, Japan, Kenya, Latvia, Malaysia, Mexico, Morocco, Nigeria, Oman, Peru, Philippines, Poland, Romania, Russia, Saudi Arabia, Serbia, Slovakia, South Africa, Spain, Turkmenistan, Ukraine, United Kingdom, United States, Uzbekistan, and Vietnam. After a very stringent peer-reviewing process, only 120 high-quality papers were finally accepted for presentation and the final proceedings. This second volume presents 60 of those papers, on data science and applications, and serves as reference material for advanced research in intelligent systems.
Bengaluru, India
Kota, India
Bengaluru, India
Seoul, Korea (Republic of)
New Delhi, India
Sandeep Kumar Harish Sharma K. Balachandran Joong Hoon Kim Jagdish Chand Bansal
Contents
Patch Extraction and Classifier for Abnormality Classification in Mammography Imaging . . . . . 1
Parita Oza, Paawan Sharma, and Samir Patel
Improving the Performance of Fuzzy Rule-Based Classification Systems Using Particle Swarm Optimization . . . . . 11
Shashi Kant, Devendra Agarwal, and Praveen Kumar Shukla
Tuning Extreme Learning Machine by Hybrid Planet Optimization Algorithm for Diabetes Classification . . . . . 23
Luka Jovanovic, Zlatko Hajdarevic, Dijana Jovanovic, Hothefa Shaker Jassim, Ivana Strumberger, Nebojsa Bacanin, Miodrag Zivkovic, and Milos Antonijevic
Towards Computation Offloading Approaches in IoT-Fog-Cloud Environment: Survey on Concepts, Architectures, Tools and Methodologies . . . . . 37
Priya Thomas and Deepa V. Jose
Prediction of COVID-19 Pandemic Spread Using Graph Neural Networks . . . . . 53
Radhakrishnan Gopalapillai and Shreekanth M. Prabhu
Event-Based Time-To-Contact Estimation with Depth Image Fusion . . . . . 65
Ankit Gupta, Paras Sharma, Dibyendu Ghosh, Vinayak Honkote, and Debasish Ghose
mCD and Clipped RBM-Based DBN for Optimal Classification of Breast Cancer . . . . . 79
Neha Ahlawat and D. Franklin Vinod
Digital Disruption in Major Ports with Special Reference to Chennai Port, Kamarajar Port, and Tuticorin Port . . . . . 89
S. Tarun Kumar, Sanjeet Kanungo, and M. Sekar
SmartTour: A Blockchain-Based Smart Tourism Platform Using Improvised SHA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 C. L. Pooja and B. N. Shankar Gowda Detection of Starch in Turmeric Using Machine Learning Methods . . . . . 117 Madhusudan G. Lanjewar, Rajesh K. Parate, Rupesh Wakodikar, and Jivan S. Parab A Study of Crypto-ransomware Using Detection Techniques for Defense Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 Vyom Kulshreshtha, Deepak Motwani, and Pankaj Sharma Internet of Things (IOT)-Based Smart Agriculture System Implementation and Current Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 Amritpal Kaur, Devershi Pallavi Bhatt, and Linesh Raja Physical Unclonable Function and Smart Contract-Based Authentication Protocol for Medical Sensor Network . . . . . . . . . . . . . . . . . 161 Aparna Singh and Geetanjali Rathee Developing Prediction Model for Hospital Appointment No-Shows Using Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173 Jeffin Joseph, S. Senith, A. Alfred Kirubaraj, and Jino S. R. Ramson Mixed-Language Sentiment Analysis on Malaysian Social Media Using Translated VADER and Normalisation Heuristics . . . . . . . . . . . . . . . 185 James Mountstephens and Mathieson Tan Zui Quen Impact of Feature Selection Techniques for EEG-Based Seizure Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197 Najmusseher and M. Umme Salma Adaptive Manta Ray Foraging Optimizer for Determining Optimal Thread Count on Many-core Architecture . . . . . . . . . . . . . . . . . . . 209 S. H. Malave and S. K. Shinde Iterated Local Search Heuristic for Integrated Single Machine Scheduling and Vehicle Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 Gabriel P. Félix, José E. C. Arroyo, and Matheus de Freitas Modeling Volatility of Cryptocurrencies: GARCH Approach . . . . . . . . . . 237 B. N. S. S. Kiranmai and Viswanathan Thangaraj Digital Boolean Logic Equivalent Reversible Quantum Gates Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253 Bikram Paul, Nupur Choudhury, Eeshankur Saikia, and Gaurav Trivedi
Adaptive Modulation Classification with Deep Learning for Various Number of Users and Performance Validation . . . . . . . . . . . . . 273 P. G. Varna Kumar Reddy and M. Meena Video Analysis to Recognize Unusual Crowd Behavior for Surveillance Systems: A Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285 P. Shreedevi and H. S. Mohana Prediction of Drug-Drug Interactions Using Support Vector Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305 W. Mohammed Abdul Razak, R. Rishabh, and Merin Meleet Dynamic Load Scheduling Using Clustering for Increasing Efficiency of Warehouse Order Fulfillment Done Through Pick and Place Bots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315 Cysil Tom Baby and Cyril Joe Baby Deploying Fact-Checking Tools to Alleviate Misinformation Promulgation in Twitter Using Machine Learning Techniques . . . . . . . . . 329 Monikka Reshmi Sethurajan and K. Natarajan Lane Sensing and Tracing Algorithms for Advanced Driver Assistance Systems with Object Detection and Traffic Sign Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347 P. C. Gagan Machaiah and G. Pavithra Exploring Open Innovation in the Workplace Through a Serious Game: The Case of Datak . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361 Eleni G. Makri Blockchain-Based Secure and Energy-Efficient Healthcare IoT Using Novel QIRWS-BWO and SAES Techniques . . . . . . . . . . . . . . . . . . . . 379 Y. Jani and P. Raajan Plant Pathology Using Deep Convolutional Neural Networks . . . . . . . . . . 393 Banushruti Haveri and K. Shashi Raj Performance Evaluation of Sustainable Development Goals Employing Unsupervised Machine Learning Approach . . . . . . . . . . . . . . . 407 Indranath Chatterjee and Jayaraman Valadi Performance Analysis of Logical Structures Using Ternary Quantum Dot Cellular Automata (TQCA)-Based Nanotechnology . . . . . 421 Suparba Tapna, Kisalaya Chakrabarti, and Debarka Mukhopadhyay An MLP Neural Network for Approximation of a Functional Dependence with Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443 Vladimir Hlavac
Evaluation of Sound Propagation, Absorption, and Transmission Loss of an Acoustic Channel Model in Shallow Water . . . . . . . . . . . . . . . . . 455 Ch. Venkateswara Rao, S. Swathi, P. S. R. Charan, Ch. V. V. Santhosh Kumar, A. M. V. Pathi, and V. Praveena A Competent LFR in Renewable Energy Micro-grid Cluster Utilizing BESO Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467 O. P. Roy, Sourabh Prakash Roy, Shubham, and A. K. Singh Deep Learning-Based Three Type Classifier Model for Non-small Cell Lung Cancer from Histopathological Images . . . . . . . . . . . . . . . . . . . . . 481 Rashmi Mothkur and B. N. Veerappa Cancer Classification from High-Dimensional Multi-omics Data Using Convolutional Neural Networks, Recurrence Plots, and Wavelet-Based Image Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495 Stefanos Tsimenidis and George A. Papakostas Predicting Users’ Eat-Out Preference from Big5 Personality Traits . . . . . 511 Md. Saddam Hossain Mukta, Akib Zaman, Md. Adnanul Islam, and Bayzid Ashik Hossain Smart Accident Fatality Reduction (SAFR) System . . . . . . . . . . . . . . . . . . . 525 Daniel Bennett Joseph, K. Sivasankaran, P. R. Venkat, Srirangan Kannan, V. A. Siddeshwar, D. Vinodha, and A. Balasubramanian Android Malware Detection Against String Encryption Based Obfuscation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543 Dip Bhakta, Mohammad Abu Yousuf, and Md. Sohel Rana Machine Learning Techniques for Resource-Constrained Devices in IoT Applications with CP-ABE Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . 557 P. R. Ancy and Addapalli V. N. Krishna Safely Sending School Grades Using Quick Response Code . . . . . . . . . . . . 567 Roxana Flores-Quispe and Yuber Velazco-Paredes Abstractive Text Summarization of Biomedical Documents . . . . . . . . . . . . 581 Tanya Mital, Sheba Selvam, V. Tanisha, Rajdeep Chauhan, and Dewang Goplani NLP-Based Sentiment Analysis with Machine Learning Model for Election Campaign—A Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 595 Shailesh S. Sangle and Raghavendra R. Sedamkar Heart Problem Detection from Electrocardiogram by One-Dimensional Convolutional Neural Network . . . . . . . . . . . . . . . . . . 613 Prince Kumar, Deepak Kumar, Poulami Singha, Rakesh Ranjan, and Dipankar Dutta
Deep Monarch Butterfly Optimization-Based Attack Detection for Securing Virtualized Infrastructures of Cloud . . . . . . . . . . . . . . . . . . . . 625 Bhavana Gupta and Nishchol Mishra Artificial Intelligence Technologies Applied to Asset Management: Methods, Opportunities and Risks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639 Saad Kabak and Ahmed Benjelloun Optimizing Reactive Power of IEEE-14 Bus System Using Artificial Electric Field Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 651 Indu Bala and Anupam Yadav IoT-Based Automotive Collision Avoidance and Safety System for Vehicles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 667 Dipali Ramdasi, Lokita Bhoge, Binita Jiby, Hrithika Pembarti, and Sakshi Phadatare Computer Vision-Based Electrical Equipment Condition Monitoring and Component Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . 683 R. Vidhya, P. Vanaja Ranjan, R. Prarthna Grace Jemima, J. Reena, R. Vignesh, and J. Snegha Deep CNN Model with Enhanced Inception Layers for Lung Cancer Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 699 Jaya Sharma and D. Franklin Vinod Impact of Dimensionality Reduction on Membership Privacy of CNN Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 711 Ashish Kumar Lal and S. Karthikeyan Computational Modelling of Complex Systems for Democratizing Higher Education: A Tutorial on SAR Simulation . . . . . . . . . . . . . . . . . . . . 723 P. Jai Govind and Naveen Kumar Efficient Segmentation of Tumor with Convolutional Neural Network in Brain MRI Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 735 Archana Ingle, Mani Roja, Manoj Sankhe, and Deepak Patkar Gradient-Based Physics-Informed Neural Network . . . . . . . . . . . . . . . . . . . 749 Kirti Beniwal and Vivek Kumar Automated Lesion Image Segmentation Based on Novel Histogram-Based K-Means Clustering Using COVID-19 Chest CT Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 763 S. Nivetha and H. Hannah Inbarani Real-Time Operated Medical Assistive Robot . . . . . . . . . . . . . . . . . . . . . . . . 777 Ann Mariya Lazar, Binet Rose Devassy, and Gnana King
Enhancing Graph Convolutional Networks with Variational Quantum Circuits for Drug Activity Prediction . . . . . . . . . . . . . . . . . . . . . . . 789 Pranshav Gajjar, Zhenyu Zuo, Yanghepu Li, and Liang Zhao Improving Pneumonia Detection Using Segmentation and Image Enhancement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 801 Ethiraj Thipakaran, R. Gandhiraj, and Manoj Kumar Panda Object Detection Application for a Forward Collision Early Warning System Using TensorFlow Lite on Android . . . . . . . . . . . . . . . . . . 821 Barka Satya, Hendry, and Daniel H. F. Manongga A LSTM Deep Learning Approach for Forecasting Global Air Quality Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 835 Ulises Manuel Ramirez-Alcocer, Edgar Tello-Leal, Jaciel David Hernandez-Resendiz, and Bárbara A. Macías-Hernández Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 851
Editors and Contributors
About the Editors
Dr. Sandeep Kumar is currently a professor at CHRIST (Deemed to be University), Bangalore. Before joining CHRIST, he worked with ACEIT Jaipur, Jagannath University, Jaipur, and Amity University, Rajasthan. He is an associate editor for Springer's Human-centric Computing and Information Sciences (HCIS) journal. He has published more than 80 research papers in various international journals/conferences and attended several national and international conferences and workshops. He has authored/edited six books in the area of computer science. He has also been serving as General Chair of the International Conference on Communication and Computational Technologies (ICCCT 2021, 2022, and 2023) and the Congress on Intelligent Systems (CIS 2022). His research interests include nature-inspired algorithms, swarm intelligence, soft computing, and computational intelligence.
Dr. Harish Sharma is an associate professor at Rajasthan Technical University, Kota, in the Computer Science and Engineering Department. He has worked at Vardhaman Mahaveer Open University, Kota, and Government Engineering College, Jhalawar. He received his B.Tech. and M.Tech. degrees in Computer Engineering from Government Engineering College, Kota, and Rajasthan Technical University, Kota, in 2003 and 2009, respectively. He obtained his Ph.D. from ABV-Indian Institute of Information Technology and Management Gwalior, India. He is a secretary and one of the founder members of the Soft Computing Research Society of India. He is a lifetime member of the Cryptology Research Society of India, ISI, Kolkata. He is an associate editor of The International Journal of Swarm Intelligence (IJSI) published by Inderscience. He has also edited special issues of many reputed journals such as Memetic Computing, Journal of Experimental and Theoretical Artificial Intelligence, and Evolutionary Intelligence. His primary area of interest is nature-inspired optimization techniques. He has contributed to more than 105 papers published in various international journals and conferences.
Dr. K. Balachandran is currently a professor and head of CSE at CHRIST (Deemed to be University), Bengaluru, India. He has 38 years of experience in research, academia, and industry. He served as a senior scientific officer in the Research and Development Unit of the Department of Atomic Energy for 20 years. His research interests include data mining, artificial neural networks, soft computing, and artificial intelligence. He has published more than 50 articles in well-known SCI-/SCOPUS-indexed international journals and conferences and attended several national and international conferences and workshops. He has authored/edited four books in the area of computer science.
Prof. Joong Hoon Kim, a faculty member of Korea University in the School of Civil, Environmental and Architectural Engineering, obtained his Ph.D. from the University of Texas at Austin in 1992 with the thesis "Optimal replacement/rehabilitation model for water distribution systems." His major areas of interest include optimal design and management of water distribution systems, application of optimization techniques to various engineering problems, and development and application of evolutionary algorithms. His publications include "A New Heuristic Optimization Algorithm: Harmony Search" (Simulation, February 2001, Vol. 76, pp. 60–68), which has been cited over 6,700 times by journals of diverse research areas. His keynote speeches include "Optimization Algorithms as Tools for Hydrological Science" at the Annual Meeting of the Asia Oceania Geosciences Society held in Brisbane, Australia, in June 2013; "Recent Advances in Harmony Search Algorithm" at the 4th Global Congress on Intelligent Systems (GCIS 2013) held in Hong Kong, China, in December 2013; and "Improving the Convergence of Harmony Search Algorithm and its Variants" at the 4th International Conference on Soft Computing for Problem Solving (SOCPROS 2014) held in Silchar, India, in December 2014. He hosted the 1st, 2nd, and 6th International Conference on Harmony Search Algorithm (ICHSA) in 2013, 2014, and 2022, as well as the 12th International Conference on Hydroinformatics (HIC 2016). He has also been serving as an Honorary Chair of the Congress on Intelligent Systems (CIS 2020, 2021, and 2022).
Dr. Jagdish Chand Bansal is an associate professor at South Asian University, New Delhi, and visiting faculty at Maths and Computer Science, Liverpool Hope University, UK. He obtained his Ph.D. in Mathematics from IIT Roorkee. Before joining SAU, New Delhi, he worked as an assistant professor at ABV-Indian Institute of Information Technology and Management Gwalior and BITS Pilani. His primary area of interest is swarm intelligence and nature-inspired optimization techniques. Recently, he proposed a fission-fusion social structure-based optimization algorithm, spider monkey optimization (SMO), which is being applied to various problems from the engineering domain. He has published more than 70 research papers in various international journals/conferences. He is the editor-in-chief of the journal MethodsX published by Elsevier. He is the series editor of the book series Algorithms for Intelligent Systems (AIS) and Studies in Autonomic, Data-Driven and Industrial Computing (SADIC) published by Springer. He is the editor-in-chief of the International Journal of Swarm Intelligence (IJSI) published by Inderscience. He is also
the associate editor of Engineering Applications of Artificial Intelligence (EAAI) and ARRAY published by Elsevier. He is the general secretary of the Soft Computing Research Society (SCRS). He has also received gold medals at UG and PG levels.
Contributors Devendra Agarwal Artificial Intelligence Research Center, Department of CSE, School of Engineering, Babu Banarasi Das University, Lucknow, India Neha Ahlawat Department of Computer Science and Engineering, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Modinagar, Ghaziabad, UP, India A. Alfred Kirubaraj Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India P. R. Ancy Computer Science and Engineering Department, School of Engineering and Technology, CHRIST (Deemed to be University), Bangalore, India Milos Antonijevic Singidunum University, Belgrade, Serbia José E. C. Arroyo Department of Computer Science, Universidade Federal de Viçosa, Viçosa, MG, Brazil Cyril Joe Baby Fupro Innovation Private Limited, Mohali, India Cysil Tom Baby CHRIST (Deemed to be University), Bangalore, India Nebojsa Bacanin Singidunum University, Belgrade, Serbia Indu Bala The University of Adelaide, Adelaide, SA, Australia A. Balasubramanian Department of Automobile Engineering, Sri Venkateswara College of Engineering, Sriperumbudur, India Kirti Beniwal Department of Applied Mathematics, Delhi Technological University, Delhi, India Ahmed Benjelloun National School of Business and Management, University Mohammed Ben Abdellah, Fez, Morocco Dip Bhakta Bangladesh University of Professionals (BUP), Dhaka, Bangladesh Devershi Pallavi Bhatt Manipal University Jaipur, Jaipur, Rajasthan, India Lokita Bhoge MKSSS’s Cummins College of Engineering for Women, Pune, India Kisalaya Chakrabarti Haldia Institute of Technology, Haldia, India P. S. R. Charan Department of ECE, Vishnu Institute of Technology, Bhimavaram, India
Indranath Chatterjee Department of Computing and Data Science, FLAME University, Pune, India Rajdeep Chauhan Department of CSE, BNMIT, Bengaluru, Karnataka, India Nupur Choudhury Guwahati University, Guwahati, Assam, India Matheus de Freitas Department of Computer Science, Universidade Federal de Viçosa, Viçosa, MG, Brazil Binet Rose Devassy Department of Electronics and Communication Engineering, Sahrdaya College of Engineering and Technology, Kodakara, India Dipankar Dutta University Institute of Technology, The University of Burdwan, Burdwan, West Bengal, India Roxana Flores-Quispe School of Computer Science, Universidad Nacional de San Agustín de Arequipa, Arequipa, Peru D. Franklin Vinod Department of Computer Science and Engineering, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Modinagar, Ghaziabad, UP, India Gabriel P. Félix Department of Computer Science, Universidade Federal de Viçosa, Viçosa, MG, Brazil P. C. Gagan Machaiah ECE Department, Dayananda Sagar College of Engineering, Bangalore, Karnataka, India Pranshav Gajjar Institute of Technology, Nirma University, Gujarat, India; Graduate School of Advanced Integrated Studies in Human Survivability, Kyoto University, Kyoto, Japan R. Gandhiraj Department of Electronics and Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India Debasish Ghose Indian Institute of Science, Bangalore, India Dibyendu Ghosh Indian Institute of Technology, Kharagpur, Kharagpur, India Radhakrishnan Gopalapillai Department of Computer Science and Engineering, CMR Institute of Technology, Bengaluru, India Dewang Goplani Department of CSE, BNMIT, Bengaluru, Karnataka, India P. Jai Govind CHRIST (Deemed to be University), Bangalore, India Ankit Gupta Intel Labs, Intel Technology, Bangalore, India Bhavana Gupta SOIT, RGPV Bhopal, Bhopal, India Zlatko Hajdarevic Singidunum University, Belgrade, Serbia H. Hannah Inbarani Department of Computer Science, Periyar University, Salem, India
Banushruti Haveri ECE Department, Dayananda Sagar College of Engineering, Bangalore, Karnataka, India Hendry Faculty of Information Technology, Satya Wacana Christian University, Salatiga, Central Java, Indonesia Jaciel David Hernandez-Resendiz Multidisciplinary Academic Unit ReynosaRodhe, Autonomous University of Tamaulipas, Reynosa, Mexico Vladimir Hlavac Faculty of Mechanical Engineering, Czech Technical University in Prague, Prague, Czech Republic Vinayak Honkote Intel Labs, Intel Technology, Bangalore, India Bayzid Ashik Hossain Charles Sturt University, Bathurst, Australia Archana Ingle TSEC, University of Mumbai, Mumbai, India Md. Adnanul Islam Monash University, Melbourne, Australia Y. Jani Department of Computer Science, Muslim Arts College (Affiliated to Manonmaniam Sundaranar University, Abishekapatti, Tirunelveli-627012), Thiruvithancode, Tamil Nadu, India Hothefa Shaker Jassim Modern College of Business and Science, Muscat, Oman R. Prarthna Grace Jemima Loyola-ICAM College of Engineering and Technology, Chennai, India Binita Jiby MKSSS’s Cummins College of Engineering for Women, Pune, India Deepa V. Jose CHRIST (Deemed to be University), Bangalore, India Daniel Bennett Joseph Department of Automobile Engineering, Sri Venkateswara College of Engineering, Sriperumbudur, India Jeffin Joseph Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India Dijana Jovanovic College of Academic Studies “Dositej”, Belgrade, Serbia Luka Jovanovic Singidunum University, Belgrade, Serbia Saad Kabak National School of Business and Management, University Mohammed Ben Abdellah, Fez, Morocco Srirangan Kannan Department of Computer Science and Engineering, Sri Venkateswara College of Engineering, Sriperumbudur, India Shashi Kant Artificial Intelligence Research Center, Department of CSE, School of Engineering, Babu Banarasi Das University, Lucknow, India Sanjeet Kanungo Tolani Maritime Institute, Induri, Maharashtra, India
S. Karthikeyan Department of Computer Science, Institute of Science, Banaras Hindu University, Varanasi, India Amritpal Kaur Manipal University Jaipur, Jaipur, Rajasthan, India Gnana King Department of Electronics and Communication Engineering, Sahrdaya College of Engineering and Technology, Kodakara, India B. N. S. S. Kiranmai Symbiosis Institute of Business Management, A Constituent of Symbiosis International (Deemed) University, Bengaluru, India Addapalli V. N. Krishna Computer Science and Engineering Department, School of Engineering and Technology, CHRIST (Deemed to be University), Bangalore, India Vyom Kulshreshtha Computer Science and Engineering, Amity University Madhya Pradesh, Gwalior, India Deepak Kumar University Institute of Technology, The University of Burdwan, Burdwan, West Bengal, India Naveen Kumar CHRIST (Deemed to be University), Bangalore, India Prince Kumar University Institute of Technology, The University of Burdwan, Burdwan, West Bengal, India Vivek Kumar Department of Applied Mathematics, Delhi Technological University, Delhi, India Ashish Kumar Lal Department of Computer Science, Institute of Science, Banaras Hindu University, Varanasi, India Madhusudan G. Lanjewar School of Physical and Applied Sciences, Goa University, Taleigao, Goa, India Ann Mariya Lazar Department of Electronics and Communication Engineering, Sahrdaya College of Engineering and Technology, Kodakara, India Yanghepu Li Graduate School of Advanced Integrated Studies in Human Survivability, Kyoto University, Kyoto, Japan Bárbara A. Macías-Hernández Faculty of Engineering and Science, Autonomous University of Tamaulipas, Victoria, Mexico Eleni G. Makri Unicaf, Larnaca, Cyprus S. H. Malave Lokmanya Tilak College of Engineering, Navi Mumbai, India Daniel H. F. Manongga Faculty of Information Technology, Satya Wacana Christian University, Salatiga, Central Java, Indonesia M. Meena Department of Electronics and Communication Engineering, Vels Institute of Science, Technology and Advanced Studies (VISTAS), Chennai, India
Merin Meleet R V College of Engineering, Bengaluru, Karnataka, India Nishchol Mishra SOIT, RGPV Bhopal, Bhopal, India Tanya Mital Department of CSE, BNMIT, Bengaluru, Karnataka, India W. Mohammed Abdul Razak R V College of Engineering, Bengaluru, Karnataka, India H. S. Mohana Navkis College of Engineering, Hassan, Karnataka, India Rashmi Mothkur Department of CSE, Dayananda Sagar University, Bangalore, India Deepak Motwani Computer Science and Engineering, Amity University Madhya Pradesh, Gwalior, India James Mountstephens Faculty of Computing and Informatics, Universiti Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia Debarka Mukhopadhyay Christ (Deemed to be University), Bengaluru, India Md. Saddam Hossain Mukta United International University (UIU), Dhaka, Bangladesh Najmusseher Department of Computer Science, CHRIST (Deemed to be University), Bengaluru, India K. Natarajan CHRIST (Deemed to Be University), Bangalore, India S. Nivetha Department of Computer Science, Periyar University, Salem, India Parita Oza Pandit Deendayal Energy University, Gandhinagar, India; Nirma University, Ahmedabad, India Manoj Kumar Panda Department of Electronics and Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India George A. Papakostas MLV Research Group, Department of Computer Science, International Hellenic University, Kavala, Greece Jivan S. Parab School of Physical and Applied Sciences, Goa University, Taleigao, Goa, India Rajesh K. Parate Departmentof Electronics, S. K. Porwal College, Kamptee, Maharashtra, India Samir Patel Pandit Deendayal Energy University, Gandhinagar, India A. M. V. Pathi Department of ECE, Vishnu Institute of Technology, Bhimavaram, India Deepak Patkar Nanavati Hospital, Mumbai, India
Bikram Paul Indian Institute of Technology Guwahati, Guwahati, Assam, India G. Pavithra ECE Department, Dayananda Sagar College of Engineering, Bangalore, Karnataka, India Hrithika Pembarti MKSSS’s Cummins College of Engineering for Women, Pune, India Sakshi Phadatare MKSSS’s Cummins College of Engineering for Women, Pune, India C. L. Pooja Bangalore Institute of Technology, Bengaluru, India Shreekanth M. Prabhu Department of Computer Science and Engineering, CMR Institute of Technology, Bengaluru, India V. Praveena Department of ECE, Vishnu Institute of Technology, Bhimavaram, India Mathieson Tan Zui Quen Faculty of Computing and Informatics, Universiti Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia P. Raajan Department of Computer Science, Muslim Arts College (Affiliated to Manonmaniam Sundaranar University, Abishekapatti, Tirunelveli-627012), Thiruvithancode, Tamil Nadu, India Linesh Raja Manipal University Jaipur, Jaipur, Rajasthan, India Dipali Ramdasi MKSSS’s Cummins College of Engineering for Women, Pune, India Ulises Manuel Ramirez-Alcocer Multidisciplinary Academic Unit ReynosaRodhe, Autonomous University of Tamaulipas, Reynosa, Mexico Jino S. R. Ramson Saveetha School of Engineering, Thandalam, Chennai, Tamil Nadu, India Md. Sohel Rana University of Alabama at Birmingham (UAB), Birmingham, USA P. Vanaja Ranjan Embedded System Technologies, Department of Electrical and Electronics Engineering, College of Engineering - Guindy, Chennai, India Rakesh Ranjan University Institute of Technology, The University of Burdwan, Burdwan, West Bengal, India Geetanjali Rathee CSE Department, NSUT, New Delhi, India J. Reena Loyola-ICAM College of Engineering and Technology, Chennai, India R. Rishabh R V College of Engineering, Bengaluru, Karnataka, India Mani Roja TSEC, University of Mumbai, Mumbai, India O. P. Roy Department of Electrical Engineering, NERIST, Nirjuli, Arunachal Pradesh, India
Sourabh Prakash Roy Department of Electrical Engineering, NERIST, Nirjuli, Arunachal Pradesh, India Eeshankur Saikia Guwahati University, Guwahati, Assam, India Shailesh S. Sangle Thadomal Shahani Engineering College, Mumbai, India Manoj Sankhe MPSTME, NMIMS Mumbai, Mumbai, India Ch. V. V. Santhosh Kumar Department of ECE, Vishnu Institute of Technology, Bhimavaram, India Barka Satya Faculty of Information Technology, Satya Wacana Christian University, Salatiga, Central Java, Indonesia; Faculty of Computer Science, Universitas Amikom, Yogyakarta, Indonesia Raghavendra R. Sedamkar Computer Engineering Department, Thakur College of Engineering and Technology, Mumbai, India M. Sekar Indian Maritime University, Chennai, Tamil Nadu, India Sheba Selvam Department of CSE, BNMIT, Bengaluru, Karnataka, India S. Senith Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India Monikka Reshmi Sethurajan CHRIST (Deemed to Be University), Bangalore, India B. N. Shankar Gowda Bangalore Institute of Technology, Bengaluru, India Jaya Sharma Department of Computer Science and Engineering, Faculty of Engineering and Technology, Delhi-NCR Campus, SRM Institute of Science and Technology, NCR Campus, Modinagar, Ghaziabad, UP, India Paawan Sharma Pandit Deendayal Energy University, Gandhinagar, India Pankaj Sharma Computer Science and Engineering, Eshan College of Engineering, Mathura, India Paras Sharma Indraprastha Institute of Information Technology, Delhi, India K. Shashi Raj ECE Department, Dayananda Sagar College of Engineering, Bangalore, Karnataka, India S. K. Shinde Lokmanya Tilak College of Engineering, Navi Mumbai, India P. Shreedevi Malnad College of Engineering, Hassan, Karnataka, India Shubham Department of Electrical Engineering, NERIST, Nirjuli, Arunachal Pradesh, India Praveen Kumar Shukla Artificial Intelligence Research Center, Department of CSE, School of Engineering, Babu Banarasi Das University, Lucknow, India
V. A. Siddeshwar Department of Computer Science and Engineering, Sri Venkateswara College of Engineering, Sriperumbudur, India Poulami Singha University Institute of Technology, The University of Burdwan, Burdwan, West Bengal, India A. K. Singh Department of Electrical Engineering, NERIST, Nirjuli, Arunachal Pradesh, India Aparna Singh CSE Department, NSUT, New Delhi, India K. Sivasankaran Department of Automobile Engineering, Sri Venkateswara College of Engineering, Sriperumbudur, India J. Snegha Loyola-ICAM College of Engineering and Technology, Chennai, India Ivana Strumberger Singidunum University, Belgrade, Serbia S. Swathi Department of ECE, Vishnu Institute of Technology, Bhimavaram, India; Department of ECE, SRKR Engineering College, Bhimavaram, India V. Tanisha Department of CSE, BNMIT, Bengaluru, Karnataka, India Suparba Tapna Durgapur Institute of Advanced Technology and Management, Durgapur, India S. Tarun Kumar Indian Maritime University, Chennai, Tamil Nadu, India Edgar Tello-Leal Faculty of Engineering and Science, Autonomous University of Tamaulipas, Victoria, Mexico Viswanathan Thangaraj Symbiosis Institute of Business Management, A Constituent of Symbiosis International (Deemed) University, Bengaluru, India Ethiraj Thipakaran Department of Electronics and Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India Priya Thomas CHRIST (Deemed to be University), Bangalore, India Gaurav Trivedi Indian Institute of Technology Guwahati, Guwahati, Assam, India Stefanos Tsimenidis MLV Research Group, Department of Computer Science, International Hellenic University, Kavala, Greece M. Umme Salma Department of Computer Science, CHRIST (Deemed to be University), Bengaluru, India Jayaraman Valadi Department of Computing and Data Science, FLAME University, Pune, India P. G. Varna Kumar Reddy Department of Electronics and Communication Engineering, Vels Institute of Science, Technology and Advanced Studies (VISTAS), Chennai, India
B. N. Veerappa Department of Studies in CSE, University BDT College of Engineering, Davanagere, India Yuber Velazco-Paredes School of Computer Science, Universidad Nacional de San Agustín de Arequipa, Arequipa, Peru P. R. Venkat Department of Computer Science and Engineering, Sri Venkateswara College of Engineering, Sriperumbudur, India Ch. Venkateswara Rao Department of ECE, Vishnu Institute of Technology, Bhimavaram, India R. Vidhya Department of Electronics and Communication Engineering, LoyolaICAM College of Engineering and Technology, Chennai, India R. Vignesh Loyola-ICAM College of Engineering and Technology, Chennai, India D. Franklin Vinod Department of Computer Science and Engineering, Faculty of Engineering and Technology, Delhi-NCR Campus, SRM Institute of Science and Technology, NCR Campus, Modinagar, Ghaziabad, UP, India D. Vinodha Department of Computer Science and Engineering, Sri Venkateswara College of Engineering, Sriperumbudur, India Rupesh Wakodikar Department of Electronics, Nevjabai Hitkarini College, Bramhapuri, Maharashtra, India Anupam Yadav Dr BR Ambedkar National Institute of Technology, Jalandhar, Punjab, India Mohammad Abu Yousuf Jahangirnagar University, Dhaka, Bangladesh Akib Zaman United International University (UIU), Dhaka, Bangladesh Liang Zhao Graduate School of Advanced Integrated Studies in Human Survivability, Kyoto University, Kyoto, Japan Miodrag Zivkovic Singidunum University, Belgrade, Serbia Zhenyu Zuo Graduate School of Advanced Integrated Studies in Human Survivability, Kyoto University, Kyoto, Japan
Patch Extraction and Classifier for Abnormality Classification in Mammography Imaging Parita Oza, Paawan Sharma, and Samir Patel
Abstract Breast cancer is a fatal disease that affects millions of women worldwide, and the number of cases continues to rise. Although the disease is difficult to prevent, the survival rate can be improved with early detection and proper treatment planning. Thanks to breakthroughs in deep learning, computer-assisted diagnosis (CAD) of breast cancer has improved considerably, and deep neural networks have advanced to the point where their diagnostic capabilities approach those of a human specialist. Mammograms are crucial radiological images used to detect breast cancer at an early stage. Training deep convolutional neural networks (DCNNs) directly on high-resolution mammograms requires image scaling at the input layer, which can result in the loss of information crucial for discovering medical abnormalities. Instead of developing a whole-image classifier, the idea is therefore to create a patch classifier. This paper proposes a technique for extracting abnormal patches from mammography images. Patches are extracted from the benchmark, publicly available MIAS dataset and are then used to train deep learning classifiers such as VGG-16, ResNet-50, and EfficientNet-B7. We compare the results on the MIAS patches with those on the patches already included in the CBIS-DDSM dataset. EfficientNet-B7 on CBIS-DDSM patches produced good results (92% accuracy) compared to the other classifiers, VGG-16 and ResNet-50. We also found ResNet-50 to be quite robust on both datasets.
Keywords Mammogram · Deep learning · Patch extraction · Classification
1 Background
Breast cancer is one of the most common neoplasms in the female population, and mammography is the referral examination for breast cancer screening [1]. Furthermore, it has been established that the breast cancer survival rate is highly influenced by the stage at which the disease is detected [2]. Computer-aided diagnostic (CAD) systems are being developed for automated breast cancer diagnosis. This approach improves detection accuracy and the capacity to recognize irregularities such as breast masses, calcifications, focal distortions, and asymmetries. Only professional doctors make final decisions; thus, CAD systems operate as a second reader designed to aid the radiologist. DCNNs are extensively utilized for cancer detection, classification, and segmentation in medical imaging. Unfortunately, training a network from scratch can take days or weeks and requires a significant amount of computing power [3]. In addition, to avoid overfitting the training dataset, the training procedure for supervised deep CNNs typically necessitates a large number of annotated samples. The most common method for dealing with this issue is transfer learning (TL), in which a model that has been pre-trained is fine-tuned for the task at hand [4]. In most cases, TL-based convolutional neural networks (CNNs) are utilized to categorize the entire image. In patch-based classification, by contrast, individual patches of an image are first assigned to a class, and the whole image is then categorized based on the patch-level decisions. Training DCNNs directly on high-resolution images requires image scaling at the input layer, which might lead to the loss of information vital for detecting medical anomalies. Patch-based classification seeks to answer the following question: "What are the prominent traits of a specific patch that indicate that it belongs to one of two classes?". Furthermore, image scarcity is a prevalent problem in medical imaging studies; creating several patches from a single image can therefore significantly expand the training set. In this paper, we present a method for extracting patches from mammograms, using the MIAS [5] dataset for patch extraction. We created a patch-based classifier that uses DCNNs to classify mammography patches into benign and malignant categories, employing transfer learning with pre-trained models such as VGG-16 [6], ResNet-50 [7], and EfficientNet-B7 [8]. We used CBIS-DDSM [9] mammography patches as a comparison. The rest of the paper is organized as follows: Sect. 2 presents the literature of the domain, Sect. 3 describes the proposed methodology, Sect. 4 discusses the results, and Sect. 5 concludes the paper.
2 Related Work in the Domain The fields of AI such as machine learning and deep learning have made potential advancements in biotechnology and medical research domain [10]. Despite the fact
that several authors have proposed using traditional machine learning and computer vision techniques to classify breast abnormalities [11], due to the scarcity of publicly available datasets, the use of deep learning approaches has been limited in the field of breast imaging [12–14]. Due to privacy and regulatory concerns, having significant and correctly annotated picture data is the most frequent problem in the medical research arena [15–17]. In the majority of situations, this issue also results in an overfitting scenario. The majority of the time, convolutional neural networks (CNNs) based on transfer learning are used to classify the entire image. To fit the model input size, we must resize the picture when utilizing the transfer learning approach for the whole mammography. The loss of certain significant picture information during image scaling is undesirable for medical image analysis. This issue may be resolved, and the model performance can be enhanced, by extracting accurate ROI patches similar to model’s input size from whole mammograms and utilizing them for training. While including additional patch samples in the training set, the accuracy of transfer learning technique may also be improved. A patch-based method is proposed by authors of [2] to detect and segment microcalcifications from mammograms. This model comprises two convolutional neural network-based blocks: the detector and the segmentation. The model might be particularly beneficial in the screening situation, where the high number of exams may cause the reader’s attention to be diverted to support diagnosis or restrict the differential diagnosis. Another study [1] proposes a patch-based CNN approach for breast mass detection in full-field digital mammograms (FFDM). Authors look into using transfer learning to adapt to a specific area. Transfer learning is an effective way of transferring knowledge from one visual domain to another. The majority of current research is now based on this concept. Issues like ignorance of semantic characteristics, analysis limitations to the present patch of pictures, missing patches in low-contrast mammography images, and segmentation ambiguity are addressed in [18]. An ensemble learning strategy is presented in the work employing machine learning techniques like random forest and boosting to improve the classification performance of breast mass systems while reducing variance and generalization error. To categorize breast cancer images, [19] proposed a CNN model which can do feature extraction, feature reduction, and classification. The model can classify breast cancer images into three categories: malignant, normal, and benign. A recent survey [20] provided the study of publicly available mammographic datasets and dataset-based quantitative comparison of the most modern approaches. Authors of [21] presented an end-to-end model to identify and classify masses inside the whole mammographic image. They expanded on that work by categorizing local image patches using a pre-trained model on a labeled dataset that offers ROI data. ResNet50 and VGG-16 are used as a pre-trained models in this work. For mammogram classification, a CNN-based patch classifier is proposed in [22]. A curriculum learning technique was applied in the study to attain high levels of accuracy in classifying mammograms. Authors of [23] examined several classification strategies for mammogram classification, divided into four categories based on function, probability, rule, and similarity. 
Furthermore, the research concentrated on various issues relating to mammography datasets and classification algorithms. Patch-based classifiers
are also used for other imaging modalities such as breast histopathology imaging. For the automatic categorization of histological breast images, the authors of [24] proposed a patch-based classifier using a convolutional neural network (CNN). The proposed classification system uses two operating modes: one patch in one decision and all patches in one decision. On the hidden test dataset of histopathology images, the authors achieved an accuracy of 87%.
3 Methodology
3.1 Dataset
MIAS: MIAS [5] is a long-established benchmark mammography dataset with 322 images divided into three categories: normal, benign, and malignant. All images in the dataset are 1024 × 1024 pixels in size. The (x, y) image coordinates of the anomaly's center and the estimated radius (in pixels) of a circle enclosing the abnormality are supplied as ground truth.
CBIS-DDSM: CBIS-DDSM [9] is a standardized and curated version of the DDSM mammography dataset [25]. It contains roughly 10,000 images of various abnormalities, covering normal, benign, and malignant conditions, and is already separated into training and testing sets so that different methodologies can be compared. Due to memory limitations during training, we used 6700 images for our work.
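To make the MIAS ground-truth format concrete, the short sketch below parses annotation lines into the fields used later for cropping. The file name (Info.txt) and exact column layout are assumptions based on the commonly distributed version of the dataset, not something specified in the paper.

```python
# Hedged sketch: parse MIAS-style annotations (reference number, background
# tissue, abnormality class, severity, centre X, centre Y, radius in pixels).
# Rows without an annotated abnormality have fewer columns and are skipped.
from dataclasses import dataclass

@dataclass
class Abnormality:
    ref: str       # e.g. "mdb001" (hypothetical example)
    severity: str  # "B" (benign) or "M" (malignant)
    x: int         # x coordinate of the abnormality centre
    y: int         # y coordinate of the abnormality centre
    radius: int    # approximate radius of the enclosing circle (pixels)

def load_annotations(path="Info.txt"):
    records = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 7:  # only rows that describe an abnormality
                ref, _bg, _cls, sev, x, y, r = parts[:7]
                records.append(Abnormality(ref, sev, int(x), int(y), int(r)))
    return records
```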
3.2 Patch Extraction
We adopted a patch-based technique to analyze the input mammograms, anticipating that local information would be adequate to categorize such small, limited areas. Furthermore, a patch-based method allowed us to significantly expand the training set and to perform proper data augmentation easily. The dataset we used supplies the abnormality's approximate center and radius. The region of interest was first extracted using these parameters, and several patches containing benign or malignant information were then extracted from it. Specifically, we extracted square patches of size N × N and took their labels from the annotations of the corresponding mammograms. We slid the patch mask over the mammogram with a uniform step size and kept only those regions of interest (ROIs) containing a certain number of suspicious pixels (see Fig. 1). A sliding window algorithm was then used to create multiple patches from each extracted region of interest. Since every patch that is ultimately selected captures a different visual detail, the models are fed with varied inputs. Algorithm 1 presents the entire process for extracting patches from mammograms, and a code sketch of the same procedure follows the algorithm.
Fig. 1 Mammogram with ROI and extracted patches
Algorithm 1 Patch extraction from mammograms
Require: Mammogram images, annotation file
Ensure: Extracted ROIs from abnormal mammograms
Data: MIAS mammogram dataset
Initialization: patch size; image width and height; (X, Y) coordinates of the abnormality; R (approximate radius); cropped ROI
for each image detail in the annotation file do
    Read the image from the Benign or Malignant folder
    Read the width and height of the image
    Find the new (x, y) coordinate pairs that define the cropped ROI:
        x1 = X + R − PatchSize; if x1 < 0 then x1 = 0
        y1 = Y + R − PatchSize; if y1 < 0 then y1 = 0
        x2 = X − R + PatchSize; if x2 > width then x2 = width
        y2 = Y − R + PatchSize; if y2 > height then y2 = height
    ROI = im.crop(x1, y1, x2, y2)
end for
Use a sliding window algorithm to create multiple patches from the extracted regions of interest
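The following is a minimal Python sketch of Algorithm 1 and the subsequent sliding-window step. The assumed patch size of 224 × 224, the 50% window overlap, and the file name in the usage comment are illustrative choices, not values specified in the paper.

```python
# Hedged sketch of Algorithm 1: crop a clamped ROI around the annotated
# abnormality, then tile it into fixed-size patches with a sliding window.
from PIL import Image

PATCH = 224  # assumed patch size matching the classifier input

def extract_roi(img, X, Y, R, patch=PATCH):
    """Compute the crop box as in Algorithm 1 and clamp it to the image."""
    width, height = img.size
    x1 = max(X + R - patch, 0)
    y1 = max(Y + R - patch, 0)
    x2 = min(X - R + patch, width)
    y2 = min(Y - R + patch, height)
    return img.crop((x1, y1, x2, y2))

def sliding_patches(roi, patch=PATCH, step=None):
    """Slide a patch-sized window over the ROI with a uniform step size."""
    step = step or patch // 2  # 50% overlap, an illustrative choice
    width, height = roi.size
    for top in range(0, max(height - patch, 0) + 1, step):
        for left in range(0, max(width - patch, 0) + 1, step):
            yield roi.crop((left, top, left + patch, top + patch))

# Example usage with hypothetical annotation values:
# roi = extract_roi(Image.open("mdb001.pgm").convert("L"), X=535, Y=425, R=197)
# patches = list(sliding_patches(roi))
```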
3.3 Transfer Learning (TL)
We adopt transfer learning to develop the patch classifier, using three pre-trained DCNN models: VGG-16, ResNet-50, and EfficientNet-B7. As shown in Fig. 2, TL lets us start a new model from the parameters of a network pre-trained for a related purpose and adapt it with task-specific modifications. We first built a base model and loaded it with pre-trained weights; the base model's layers were then frozen. The output of one (or more) layers of the base model was used to build a new model on top of it. Finally, the new model was trained on the MIAS and CBIS-DDSM patches, and the results on the two datasets were compared. A minimal code sketch of this setup is given after Fig. 2.
Fig. 2 Patch classifier using transfer learning
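A minimal Keras sketch of the setup in Fig. 2, assuming 224 × 224 RGB patch inputs, ImageNet weights, and a small binary (benign vs. malignant) head. The head architecture and optimizer settings are illustrative assumptions rather than the authors' exact configuration, and ResNet-50 stands in for any of the three backbones.

```python
# Hedged sketch: frozen pre-trained backbone + new classification head.
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.ResNet50(          # VGG16 / EfficientNetB7 analogous
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                           # lock the base model's layers

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),        # assumed head size
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),       # benign vs. malignant
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```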
4 Result Analysis and Discussion
We randomly split the dataset into training and test sets in a 70:30 ratio. The experiments were run for 100 epochs with a batch size of 128, and accuracy and loss were used as the performance measures for our models. Figures 3 and 4 depict the training and validation performance of all the models on the MIAS and CBIS-DDSM patches, respectively. The graphs show that, in comparison with the CBIS-DDSM patches, none of the three models performed well on the MIAS patches.
Fig. 3 Model performance on MIAS patches. a, d Accuracy and loss of VGG-16. b, e Accuracy and loss of ResNet-50. c, f Accuracy and loss of EfficientNet-B7
Fig. 4 Model performance on CBIS-DDSM patches. a, d Accuracy and loss of VGG-16. b, e Accuracy and loss of ResNet-50. c, f Accuracy and loss of EfficientNet-B7
patches is quite good compared with the other models used in this study. The models were configured to achieve the best accuracy and loss, and early stopping was used to keep track of the validation loss. As a result, EfficientNet-B7 reaches its best accuracy and loss almost at the final epoch on CBIS-DDSM patches (see Fig. 4c, f). Still, the model suffers from the classic problem of oscillation, which can be handled by controlling the value of the learning rate. Further performance measures for all the models on CBIS-DDSM and MIAS patches are given in Tables 1 and 2. The comparison reveals that all models performed better on CBIS-DDSM patches than on MIAS patches.

Table 1 Performance measures of various models on CBIS-DDSM patches

Model            Sensitivity  Specificity  Precision  Recall  Test accuracy  Test loss
VGG-16           0.86         0.89         0.88       0.86    0.88           0.44
ResNet-50        0.69         0.93         0.90       0.69    0.82           0.44
EfficientNet-B7  0.90         0.92         0.91       0.90    0.91           0.38

Table 2 Performance measures of various models on MIAS patches

Model            Sensitivity  Specificity  Precision  Recall  Test accuracy  Test loss
VGG-16           0.82         0.79         0.79       0.82    0.84           0.52
ResNet-50        0.61         0.84         0.77       0.61    0.76           0.69
EfficientNet-B7  0.84         0.77         0.78       0.84    0.83           0.50
5 Conclusion
This work uses a patch classifier based on transfer learning to categorize breast abnormalities. When the fine-tuning strategy is utilized, the deep neural model's classification accuracy increases. We observed that all of the models employed in this work performed better on CBIS-DDSM patches than on MIAS patches, owing to the restricted number of images in the MIAS dataset; the patches generated from it are not enough to train such massive DCNN models. On both datasets, ResNet-50 is also shown to be relatively stable.
Acknowledgements The authors express gratitude to the anonymous reviewers for their invaluable efforts in considerably improving the manuscript quality. The authors also thank the Department of Computer Science and Engineering, Nirma University, Ahmedabad, for providing computing facilities for this study.
References 1. Agarwal R et al (2019) Automatic mass detection in mammograms using deep convolutional neural networks. J Med Imaging 6(3):031409 2. Valvano G et al (2019) Convolutional neural networks for the segmentation of microcalcification in mammography imaging. J Healthcare Eng 2019 3. Oza P, Sharma P, Patel S, Kumar P (2022) Deep convolutional neural networks for computeraided breast cancer diagnostic: a survey. Neural Comput Appl 34:1815–1836. https://doi.org/ 10.1007/s00521-021-06804-y 4. Weiss K, Khoshgoftaar TM, Wang DD (2016) A survey of transfer learning. J Big Data 3(1):1– 40 5. Suckling J, Parker J, Dance D, Astley S, Hutt I, Boggis C, Ricketts I et al (2015) Mammographic image analysis society (MIAS) database v1.21 [Dataset]. https://www.repository.cam.ac.uk/ handle/1810/250394 6. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 7. He K et al (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition 8. Szegedy C et al (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition 9. Lee RS et al (2017) A curated mammography data set for use in computer-aided detection and diagnosis research. Sci Data 4(1):1–9 10. Oza P, Sharma P, Patel S (2021) Machine learning applications for computer-aided medical diagnostics. In: Proceedings of second international conference on computing, communications, and cyber-security. Springer, Singapore 11. Litjens G et al (2017) A survey on deep learning in medical image analysis. Med Image Anal 42:60–88 12. Oza P, Sharma P, Patel S, Adedoyin F, Bruno A (2022) Image augmentation techniques for mammogram analysis. J Imaging 8:141. https://doi.org/10.3390/jimaging8050141 13. Oza P, Sharma P, Patel S (2022) A drive through computer-aided diagnosis of breast cancer: a comprehensive study of clinical and technical aspects. In: Recent innovations in computing. Lecture notes in electrical engineering, vol 832. Springer, Singapore. https://doi.org/10.1007/ 978-981-16-8248-3_19
14. Oza P, Sharma P, Patel S, Bruno A (2021) A bottom-up review of image analysis methods for suspicious region detection in mammograms. J Imaging 7:190. https://doi.org/10.3390/ jimaging7090190 15. Oza et al (2022) Deep ensemble transfer learning-based framework for mammographic image classification. J Supercomput. https://doi.org/10.1007/s11227-022-04992-5 16. Oza et al (2022) Transfer learning assisted classification of artefacts removed and contrast improved digital mammograms. Scalable Comput Pract Experience 23(3):115–127. https:// doi.org/10.12694/scpe.v23i3.1992 17. Oza et al (2022) A transfer representation learning approach for breast cancer diagnosis from mammograms using efficientnet models. Scalable Comput Pract Experience 23(2):51–58. https://doi.org/10.12694/scpe.v23i2.1975 18. Malebary SJ, Hashmi A (2021) Automated breast mass classification system using deep learning and ensemble learning in digital mammogram. IEEE Access 9:55312–55328 19. Thilagaraj M, Arunkumar N, Govindan P (2022) Classification of breast cancer images by implementing improved DCNN with artificial fish school model. Comput Intell Neurosci 2022 20. Hassan NM, Hamad S, Mahar K (2022) Mammogram breast cancer CAD systems for mass detection and classification: a review. Multimed Tools Appl 1–33 21. Shen L, Margolies LR, Rothstein JH, Fluder E, McBride R, Sieh W (2019) Deep learning to improve breast cancer detection on screening mammography. Sci Rep 9(1):1–12 22. Lotter W, Sorensen G, Cox D (2017) A multi-scale CNN and curriculum learning strategy for mammogram classification. In: Deep learning in medical image analysis and multimodal learning for clinical decision support. DLMIA ML-CDS 2017 2017. Lecture notes in computer science, vol 10553. Springer, Cham. https://doi.org/10.1007/978-3-319-67558-9_20 23. Oza P, Shah Y, Vegda M (2022) A comprehensive study of mammogram classification techniques. In: Tracking and preventing diseases with artificial intelligence. Intelligent systems reference library, vol 206. Springer, Cham. https://doi.org/10.1007/978-3-030-76732-7_10 24. Roy K et al (2019) Patch-based system for classification of breast histology images using deep learning. Comput Med Imaging Graph 71:90–103 25. Heath M et al (1998) Current status of the digital database for screening mammography. In: Digital mammography. Springer, Dordrecht, pp 457–460
Improving the Performance of Fuzzy Rule-Based Classification Systems Using Particle Swarm Optimization Shashi Kant, Devendra Agarwal, and Praveen Kumar Shukla
Abstract A fuzzy system can perform well with uncertain data, but it is difficult to define appropriate parameter values for its membership functions. The challenges involved in selecting the fuzzy membership function parameters can eventually affect the performance of a fuzzy-based classification system. This paper proposes a particle swarm optimization-based fuzzy rule-based system for classification problems. The proposed model focuses on optimizing the membership function parameters using particle swarm optimization. The particle swarm optimization algorithm optimizes the parameters of the triangular membership function to improve the performance of fuzzy rule-based systems. Two datasets—iris and appendicitis—with various combinations of partition type and rule induction algorithm are used to evaluate the proposed model. Along with the proposed model, a simple fuzzy rule-based system without particle swarm optimization is developed, using the same combinations of partition type and rule induction algorithm, for comparison. In the case of the IRIS dataset, using the combination of hierarchical fuzzy partition, prototyping and the Wang–Mendel rule induction algorithm, the developed fuzzy rule-based system attained an accuracy of 96.86%, 96.86% and 97.77% for 10, 20 and 50 particles, respectively, while the fuzzy rule-based system using k-means fuzzy partition, prototyping and the fuzzy decision tree rule induction algorithm achieved an accuracy of 91.67%, 91.89% and 92.14% on the APPENDICITIS dataset with 10, 20 and 50 particles, respectively. The experimental results show that a particle swarm optimization-based fuzzy rule-based system can significantly improve the accuracy compared with a simple fuzzy rule-based system.
Keywords Fuzzy rule-based system · Particle swarm optimization · Tuning of membership function
S. Kant (B) · D. Agarwal · P. K. Shukla Artificial Intelligence Research Center, Department of CSE, School of Engineering, Babu Banarasi Das University, Lucknow, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_2
1 Introduction The development of expert systems having approximate reasoning and the subjective nature of human knowledge for decision-making is a complex task. Fuzzy logic [1] plays a significant role in the design of such expert systems. The fuzzy expert systems are able to adapt the unknown situations, reason, and deduce new knowledge through experience [2]. Most fuzzy systems can be seen as fuzzy rule-based systems (FRBS) that are applied successfully in many areas like classification [3], control [4], and modeling [5]. A robust and efficient FRBS can be designed by incorporating subjective knowledge into machines. This subjective knowledge is represented as fuzzy if–then rules. A knowledge base is an important component of FRBS that stores fuzzy rules and helps FRBS to conclude fuzzified input and fuzzy-based reasoning [6]. Usually, they have predefined fuzzy membership functions (MF) and fuzzy rules to transform the numeric data into linguistic labels to perform fuzzy reasoning. The linguistic labels are fuzzy sets with suitable MFs. Recently, several fuzzy systems are developed that are capable of automatically generating the fuzzy rules from numerical data. But still, the MFs need to be defined manually or by expert users to make fuzzy systems work properly. In this process, accurately setting the parameters of MFs is a very important task and can be seen as an optimization problem [7, 8]. Several evolutionary algorithms [9, 10] including genetic algorithm [11], particle swarm optimization [12], ant colony optimization [13], and artificial bee colony algorithm [14] were studied in the past and applied successfully to optimize the MFs. In this study, PSO is combined with FRBS to optimize the parameters of fuzzy MF. Various PSO-based fuzzy systems are developed, including finding the optimal hyperparameters for fuzzy time series forecasting of COVID-19 pandemic [15], tuning of fuzzy Tsukamoto MF for the prediction of diabetes mellitus [16], optimized interval type-2 fuzzy system for predictive problems [17], fuzzy optimization of energy management for power-split hybrid electric vehicle based on PSO [18], and fuzzy modeling with PSO for improving the production of biodiesel from microalga [19], etc. In this paper, we have combined the Mamdani [20]-based FRBS and PSO to improve the performance of the fuzzy rule-based classification system. The MF of Mamdani FRBS will be optimized using PSO. The developed model uses predefined fuzzy rules. The initial parameters of MFs are created and then optimized using PSO to improve the overall performance of FRBS. K-means and hierarchical fuzzy partitions (HFP) are used as fuzzy partitioning schemes. The fuzzy rules are induced using Wang–Mendel (WM), fuzzy decision tree (FDT), prototyping rules + WM, and prototyping rules + FDT. The different combination of fuzzy partitioning scheme and fuzzy rule induction algorithm is used to develop the optimized fuzzy model. The remaining part of the paper is structured as follows. Section 2 presents some basic background of FRBS and PSO. Section 3 introduces the methodology used for the proposed model. In Sect. 4, experimental results are shown, and finally, in Sect. 6, conclusions are made.
2 Background This section presents brief background support for the proposed methodology in terms of FRBS and PSO.
2.1 Fuzzy Rule-Based Systems
FRBS is an enhancement of the classical rule-based system which represents the knowledge about the problem in the form of fuzzy sets and fuzzy if–then rules. Fuzzy set theory was introduced by Zadeh in 1965 [1]. It deals with the meaning of ambiguous words present in natural language using MFs. Formally, the MF determines the degree of membership of each individual of a universal set within a specific range. The MF can be given by

µ_F : X → [0, 1]    (1)

where µ_F is the MF of fuzzy set F and X is a universal set. A larger MF value denotes a higher degree of membership. Fuzzy if–then rules are sets of antecedents and consequents that map inputs to outputs. A formal structure of a fuzzy if–then rule is

IF input_1 is A_1, input_2 is A_2, ..., and input_n is A_n THEN output is B    (2)

where the linguistic variables input_i and output have fuzzy sets A_i and B, respectively.
2.2 Particle Swarm Optimization
PSO is a population-based adaptive evolutionary algorithm for finding optimal solutions [8] and belongs to the family of swarm intelligence techniques. The algorithm was presented by Kennedy and Eberhart in 1995 and was inspired by the flocking behavior of birds. In PSO, swarm particles move through the search space of an optimization problem, where each particle denotes a candidate solution. Treating every particle as a single solution and the set of particles as a population, the search converges toward one or a few solutions in the search space. The movement of the particles through the search space, governed by their positions and velocities, helps in obtaining the optimum solution. The position (D) and velocity (V) of the kth particle (k = 1, 2, ..., M) are represented as given in Eqs. (3) and (4), respectively.

D_k = (d_{k1}, d_{k2}, ..., d_{kn})    (3)

V_k = (v_{k1}, v_{k2}, ..., v_{kn})    (4)
Initially, the positions of all particles are set to random values in the search space. Then, each particle moves from its current position through the search space toward an optimal solution. Each particle k maintains its personal best position in the search space as Pbest_k, and the position of the best particle in the swarm is maintained as the global best position Pg_best. The personal best of each particle and the global best of the swarm are updated until the maximum iteration is reached. The velocity and position of the kth particle in each iteration are updated using the following equations:

v_k^{j+1} = w v_k^j + c_1 r_1 (pbest_k^j − d_k^j) + c_2 r_2 (gbest^j − d_k^j)    (5)

d_k^{j+1} = d_k^j + v_k^{j+1}    (6)

where v_k^j and d_k^j are the velocity and position of particle k in iteration j, w is the inertia weight, c_1, c_2 are positive constants, and r_1, r_2 are random variables distributed uniformly between 0 and 1.
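A compact NumPy sketch of one PSO iteration, implementing the updates of Eqs. (5) and (6) together with the refresh of the personal and global bests, is given below; maximization of a generic fitness function is assumed, and the default parameter values follow Table 4 of this paper.

import numpy as np

def pso_step(positions, velocities, pbest_pos, pbest_fit, gbest_pos,
             fitness_fn, w=0.5, c1=1.0, c2=2.0):
    # One PSO iteration: velocity update (Eq. 5), position update (Eq. 6),
    # then refresh of the personal and global best positions.
    n, dim = positions.shape
    r1 = np.random.rand(n, dim)
    r2 = np.random.rand(n, dim)
    velocities = (w * velocities
                  + c1 * r1 * (pbest_pos - positions)
                  + c2 * r2 * (gbest_pos - positions))
    positions = positions + velocities
    fits = np.array([fitness_fn(p) for p in positions])
    improved = fits > pbest_fit          # fitness is maximized (e.g. accuracy)
    pbest_pos[improved] = positions[improved]
    pbest_fit[improved] = fits[improved]
    gbest_pos = pbest_pos[np.argmax(pbest_fit)]
    return positions, velocities, pbest_pos, pbest_fit, gbest_pos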
3 Proposed Method
This section presents a robust and optimal fuzzy classification system built by tuning the parameters of the MF using PSO. We start with an initial implementation of the FRBS, and PSO is then applied to optimize the parameters of the MF to improve the performance. The experimental dataset is read and normalized to the interval [0, 1] using the formula

x_{input_i}^{norm} = (x_{input_i} − min(input_i)) / (max(input_i) − min(input_i))    (7)

The triangular MF with three linguistic labels (small, medium, and long), as shown in Fig. 1, is used to develop the proposed model. The mapping of the normalized data x_{input_i}^{norm} to a specific MF label is formulated in Eqs. (8)–(10).
µ_small(x^norm_input) =
  0,                              x^norm_input < 0
  1 − x^norm_input / W,           0 ≤ x^norm_input ≤ W
  0,                              x^norm_input > W        (8)

µ_medium(x^norm_input) =
  0,                              x^norm_input < 0
  x^norm_input / W,               0 ≤ x^norm_input ≤ W
  (1 − x^norm_input) / (1 − W),   W < x^norm_input ≤ 1.0
  0,                              x^norm_input > 1.0      (9)

µ_long(x^norm_input) =
  0,                              x^norm_input ≤ W
  (x^norm_input − W) / (1 − W),   W ≤ x^norm_input ≤ 1.0
  0,                              x^norm_input > 1.0      (10)

Fig. 1 Triangular MF labels for small, medium, and long
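A short Python sketch of the min-max normalization of Eq. (7) and the three linguistic labels of Eqs. (8)–(10) is given below; the helper names are illustrative placeholders, and the shape of the medium label follows the triangular partition described above.

import numpy as np

def normalize(x, x_min, x_max):
    # Min-max normalization of Eq. (7), mapping the input to [0, 1]
    return (x - x_min) / (x_max - x_min)

def mu_small(x, W):
    return 1.0 - x / W if 0.0 <= x <= W else 0.0

def mu_medium(x, W):
    # Triangular label peaking at W over [0, 1] (Eq. 9)
    if 0.0 <= x <= W:
        return x / W
    if W < x <= 1.0:
        return (1.0 - x) / (1.0 - W)
    return 0.0

def mu_long(x, W):
    return (x - W) / (1.0 - W) if W <= x <= 1.0 else 0.0

def fuzzify(x, W):
    # Degree of membership of a normalized input in each linguistic label
    return {"small": mu_small(x, W), "medium": mu_medium(x, W), "long": mu_long(x, W)}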
The PSO algorithm is used to find the optimal parameters of the MF, where each iteration of PSO outputs a candidate parameter set for the proposed model. As Fig. 1 shows, the MF depends on the parameter W. In each iteration, the candidate value of W is updated through the velocity and position updates of the particles. Each particle's solution is tested against the complete data sample to determine its fitness, which measures how accurately the corresponding classifier labels the given data. This value ranges between 0 and 1, where 1 means that all samples are classified correctly and 0 means that none of them are. Figure 2 presents the framework of the proposed model for optimizing the MF parameter W of Fig. 1 using the linguistic labels defined in Eqs. (8)–(10). The PSO algorithm used for optimizing the MF parameters of the FRBS is as follows:
Fig. 2 Framework of proposed model
Step 1: Initialize the following: population, maximum iteration T_max, w, c_1, c_2, r_1, r_2
Step 2: Optimize
  1. Evaluate the fitness function fitt_i^k at each particle position x_i^k
  2. If fitt_i^k > fitt_i^best then set fitt_i^best = fitt_i^k and p_i^k = x_i^k
  3. If fitt_i^k > fitt_g^best then set fitt_g^best = fitt_i^k and p_g^k = x_i^k
  4. If the stopping condition is satisfied, go to Step 3
  5. Calculate the particle velocities and update the positions according to Eqs. (5) and (6)
  6. Update the iteration counter as k = k + 1
  7. Go to Step 2.1
Step 3: Terminate if k > T_max
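Building on the pso_step sketch given after Eq. (6), the loop that tunes the single MF parameter W could look as follows; classify_fn stands for the FRBS classifier built from the chosen partition and induced rules and, like the other names here, is a hypothetical placeholder.

import numpy as np

def frbs_accuracy(W, X_norm, y, classify_fn):
    # Fitness of a candidate W: fraction of samples classified correctly (0 to 1)
    preds = np.array([classify_fn(x, W) for x in X_norm])
    return np.mean(preds == y)

def tune_W(X_norm, y, classify_fn, n_particles=50, max_iter=30):
    # Particles are scalar values of W in (0, 1); velocities start at zero
    pos = np.random.rand(n_particles, 1)
    vel = np.zeros_like(pos)
    pbest_pos = pos.copy()
    pbest_fit = np.array([frbs_accuracy(p[0], X_norm, y, classify_fn) for p in pos])
    gbest_pos = pbest_pos[np.argmax(pbest_fit)].copy()
    for _ in range(max_iter):
        pos, vel, pbest_pos, pbest_fit, gbest_pos = pso_step(
            pos, vel, pbest_pos, pbest_fit, gbest_pos,
            lambda p: frbs_accuracy(p[0], X_norm, y, classify_fn))
        pos = np.clip(pos, 0.0, 1.0)   # keep W inside the normalized range
    return float(gbest_pos[0]), float(pbest_fit.max())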
4 Experimental Results
To implement the model, the Jupyter Notebook and Guaje [21, 22] open-source software are used. Two standard datasets are used to evaluate the proposed model, as described in Tables 1 and 2. The fuzzy rules are generated according to the schemes given in Table 3, and the PSO parameters used to optimize the MF parameters of the FRBS are given in Table 4.

Table 1 Description of IRIS dataset

Attributes            5
Instances             150
Output class          3
Instances of Class 1  50
Instances of Class 2  50
Instances of Class 3  50
Table 2 Description of APPENDICITIS dataset

Attributes            8
Instances             107
Output class          2
Instances of Class 1  86
Instances of Class 2  21

Table 3 Fuzzy partition and rule induction scheme from dataset

Partition type  Rule induction algorithm
K-means         Wang–Mendel
HFP             Prototyping + Wang–Mendel
K-means         Fuzzy decision tree
K-means         Prototyping + fuzzy decision tree

Table 4 PSO parameters

Parameters      Values
Particle size   (10, 20, 50)
C1              1
C2              2
Inertia weight  0.5
Max iteration   30
R1, R2          Random()
Two experiments were conducted, on the separate iris and appendicitis datasets, to assess the developed model. Initially, the FRBS-based classification model was developed without optimizing the MF using PSO. Then, the model was improved by using the PSO algorithm with different particle sizes, i.e., 10, 20, and 50. The accuracy of the proposed model obtained in these experiments is shown in Figs. 3 and 4. From these experimental results, it is apparent that the use of the PSO algorithm to optimize the MF parameters of the FRBS significantly increases the accuracy of the proposed model.
5 Result Analysis The fuzzy partition schemes and fuzzy rule induction algorithms were set as in Table 3 to optimize the FRBS MF using PSO parameters as given in Table 4. The combination of k-means and WM scheme on iris dataset attained an accuracy of 97.33%, whereas HFP and prototyping with WM scheme is 97.77% accurate. For the appendicitis
(a) Kmeans+WM
(b) HFP+Prototyping Rules+WM Fig. 3 Accuracy of proposed model on IRIS dataset
dataset, k-means with FDT achieved an accuracy of 91.41%, whereas k-means and prototyping with FDT achieved an accuracy of 92.14%. Table 5 compares the outcome of this paper with the optimization of fuzzy MF parameters using a genetic algorithm. From Table 5, it is clear that using PSO to optimize the MF parameters is quite promising for improving the performance of the FRBS.
(a) Kmeans+FDT
(b) Kmeans+Prototyping Rules+FDT Fig. 4 Accuracy of proposed model on APPENDICITIS dataset
6 Conclusion
In this paper, the Mamdani-based FRBS with PSO is proposed to improve the performance of the fuzzy rule-based classification system. PSO is used as an optimization technique to tune the MF parameters of the FRBS and enhance the accuracy of the model. The experimental results in this paper show that the proposed PSO-based FRBS achieved better results than the FRBS model without PSO. The performance of the proposed model improves as the number of PSO particles increases.
Table 5 Comparison of results

Dataset               Partition type  Rule induction algorithm  Accuracy with GA [8]  Accuracy with PSO (50 particles)
IRIS dataset          K-means         WM                        95.89                 97.33
IRIS dataset          HFP             Prototyping + WM          96.01                 97.77
APPENDICITIS dataset  K-means         FDT                       91.0                  91.41
APPENDICITIS dataset  K-means         Prototyping + FDT         91.5                  92.14
This paper forms the basis for the authors to experiment with many types of MF for each dataset. The comparison results show the effectiveness of the proposed model.
References 1. Zadeh LA (1965) Fuzzy sets. Inf Control 8(3):338–353 2. Karry FO, Silva CD (2009) Soft computing and intelligent systems design: theory, tools and applications. Pearson 3. Kuncheva LI (2000) Fuzzy classifier design. In: Studies in fuzziness and soft computing. Springer, Berlin 4. Palm R, Hellendoorn H, Driankov D (1997) Model based fuzzy control. Springer Berlin Heidelberg 5. Pedrycz W (1996) Fuzzy modelling: paradigms and practice. Kluwer Academic Publishers 6. Kandel A (1992) Fuzzy expert systems. CRC Press, Boca Raton 7. Kant S, Agarwal D, Shukla PK (2022) A survey on fuzzy systems optimization using evolutionary algorithms and swarm intelligence. In: Bansal JC, Engelbrecht A, Shukla PK (eds) Computer vision and robotics. Springer, Singapore, pp 421–444 8. Kant S, Agarwal D, Shukla P (2022) Improving the performance of FRBS classification systems using genetic algorithm. Webology 19(3):2724–2739 9. Shukla PK, Tripathi SP (2012) A review on the interpretability-accuracy trade-off in evolutionary multi-objective fuzzy systems (EMOFS). Information 3(3):256–277 10. Michalewicz Z (1996) Genetic algorithms + data structures = evolution programs. Springer Berlin Heidelberg 11. Shukla PK, Tripathi SP (2014) A new approach for tuning interval type-2 fuzzy knowledge bases using genetic algorithms. J Uncertain Anal Appl 2(1):1–15 12. Kennedy J, Eberhart R (2019) Particle swarm optimization. In: Proceedings of ICNN’95— international conference on neural networks, vol 4. IEEE, pp 1942–1948 13. Dorigo M (1992) Optimization, learning and natural algorithms. Ph.D. thesis, Politecnico di Milano 14. Karaboga D (2005) An idea based on honey bee swarm for numerical optimization. Technical report-Tr06, Erciyes University, Engineering Faculty, Computer Engineering Department, vol 200, pp 1–10 15. Kumar N, Susan S (2021) Particle swarm optimization of partitions and fuzzy order for fuzzy time series forecasting of COVID-19. Appl Soft Comput 110:107611 16. Pradini RS, Previana CN, Bachtiar FA (2020) Fuzzy Tsukamoto membership function optimization using PSO to predict diabetes mellitus risk level. In: Proceedings of the 5th international conference on sustainable information engineering and technology, pp 101–106
17. Mai DS, Dang TH, Ngo LT (2020) Optimization of interval type-2 fuzzy system using the PSO technique for predictive problems. J Inf Telecommun 5(2):1–17 18. Yin C, Wang S, Yu C, Li J, Zhang S (2019) Fuzzy optimization of energy management for power split hybrid electric vehicle based on particle swarm optimization algorithm. Adv Mech Eng 11(2):168781401983079 19. Nassef AM, Sayed ET, Rezk H, Abdelkareem MA, Rodriguez C, Olabi AG (2018) Fuzzymodeling with particle swarm optimization for enhancing the production of biodiesel from microalga. Energy Sources Part A Recover Util Environ Eff 41(17):2094–2103 20. Mamdani EH (1974) Application of fuzzy algorithms for control of simple dynamic plant. Proc Inst Electr Eng 121(12):1585–1588 21. Alonso JM, Magdalena L (2011) Generating understandable and accurate fuzzy rule-based systems in a java environment. Fuzzy Logic Appl 6857:212–219 22. Alonso JM, Magdalena L (2011) HILK++: an interpretability-guided fuzzy modeling methodology for learning readable and comprehensible fuzzy rule-based classifiers. Soft Comput 15(10):1959–1980
Tuning Extreme Learning Machine by Hybrid Planet Optimization Algorithm for Diabetes Classification Luka Jovanovic , Zlatko Hajdarevic , Dijana Jovanovic , Hothefa Shaker Jassim , Ivana Strumberger , Nebojsa Bacanin , Miodrag Zivkovic , and Milos Antonijevic
Abstract This paper explores hyperparameter optimization and training of extreme learning machines (ELM) applied to diabetes diagnostics. Early detection of diabetes is vital, as timely treatment significantly improves the quality of life of those affected. One of the toughest challenges facing artificial intelligence (AI) is the selection of control parameters suited to the problem being addressed. This work proposes a metaheuristics-based approach for adjusting the number of neurons in a single hidden layer of an artificial neural network in an ELM, as well as the selection of weights and biases (training) of every neuron in the hidden layer. Additionally, an exploration of the planet optimization algorithm’s (POA) potential for addressing NP difficult tasks is conducted. Through the process of hybridization with the firefly algorithm (FA), the POAs’ performance is further improved. The resulting algorithm is tasked with selecting optimal control parameter values for an ELM tackling diabetes diagnostics. A comparative analysis of the ELM tuned by the proposed PAO L. Jovanovic · Z. Hajdarevic · I. Strumberger · N. Bacanin (B) · M. Zivkovic · M. Antonijevic Singidunum University, Danijelova 32, 11000 Belgrade, Serbia e-mail: [email protected] L. Jovanovic e-mail: [email protected] Z. Hajdarevic e-mail: [email protected] I. Strumberger e-mail: [email protected] M. Zivkovic e-mail: [email protected] M. Antonijevic e-mail: [email protected] D. Jovanovic College of Academic Studies “Dositej”, 11000 Belgrade, Serbia e-mail: [email protected] H. S. Jassim Modern College of Business and Science, Muscat, Oman e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_3
firefly search (POA-FS) metaheuristics with other state-of-the-art algorithms tasked with the same challenge strongly indicates that the suggested ELM-POA-FS displays superior performance, clearly outperforming contemporary algorithms tackling the same task that it was tested against. Keywords Extreme learning machine · Planet optimization algorithm · Diabetes · Optimization · Metaheuristics
1 Introduction Personal and public health is essential for leading a productive life, for both the individual and society as a whole. Much effort has been put into improving health, preventing disease, and as a result drastically prolonging life expectancy in the last century. Health is not just the absence of disease but can be defined as a state of completeness of social, mental, and physical well-being. Diabetes is a complex and chronic metabolic condition closely linked to blood sugar levels. Elevated blood sugar levels stimulate the pancreas to release insulin, which regulates blood sugar levels. Insulin deficiency or decreased sensitivity of the insulin receptor, which regulates the entry of glucose from the blood into the cells, plays a major role in the development of various forms of diabetes. Despite many medical advancements, diabetes persists and is one of the most common endocrine diseases of modern times. Major risk factors include diseases, pregnancy, genetics, obesity, drugs, and chemical agents. Diabetes affects many systems in the body and if left untreated can lead to blindness, loss of limbs, and ultimately death. Early detection can greatly improve the quality of life of patients, as with appropriate treatment many severe complications may be avoided. Machine learning (ML) is a type of artificial intelligence (AI) capable of accurately predicting outcomes without explicit programming. Classification in ML is a process of categorizing a set of input data into classes. Many ML methods excel at classification through a supervised learning approach. Algorithms are capable of learning from available data to make more accurate decisions. Recent technological advances in the field of ML have proven to be invaluable to modern medicine, addressing complex tasks. However, despite many advantages, ML methods are not without their shortcomings. Overfitting, incomplete datasets, and the complex task of selecting control parameters suitable to the task at hand are tasks that require experience and expertise on the part of the researcher tackling a specific problem. Extreme learning machines (ELM), originally introduced for optimizing singlelayer feed-forward neural network performance, are a popular choice for researchers, due to yielding excellent results when tackling complex tasks. However, ELM needs to be properly adapted to the specifics of a problem being addressed through the selection of appropriate values for control parameters. This process, traditionally tackled through trial and error, is very complex and is a task considered an NPhard problem. Nevertheless, through the use of a novel meta-heuristic algorithm, the
planet optimization algorithm (POA), this process may be resolved within reasonable resource and time constraints. Furthermore, by introducing mechanisms from the notably well-performing firefly algorithm (FA) [30], the overall performance can be further improved. The contributions of this paper may be summarized as follows:
– A proposal for a novel hybrid of the POA and FA that builds on the originals, further improving overall performance.
– A proposal of a novel ELM-based approach for diagnosing diabetes in patients.
– The first application of the POA for optimizing ELM hyperparameters and training, together with testing of the attained performance.
The remainder of this paper is structured as follows. Section 2 presents an overview of preceding research, covering ELM in detail. In Sect. 3, the original POA is described. The proposed improvements to the original approach and the description of the novel proposed algorithm are covered in Sect. 3.1. The experiments, a brief discussion of the attained results, and a comparative analysis with other contemporary metaheuristics are shown in Sect. 4. Finally, Sect. 5 gives a conclusion to the work and presents plans for future research in the field.
2 Background and Related Work
2.1 Extreme Learning Machine
The extreme learning machine (ELM) is applied as a training algorithm for single (hidden) layer feed-forward neural networks (SLFNs). It consists of hidden neurons that are randomly initialized. For the analytical determination of the output weights, the Moore–Penrose (MP) generalized inverse is used. A typical problem of gradient-descent-based algorithms is slow convergence and the tendency to get stuck in local minima, with back-propagation (BP) being the best-known example. A further issue is that the number of hidden neurons needs, as suggested by ELM theory, to be large enough for satisfactory generalization performance. At the core of every hidden neuron are its weights and biases, and one of the greatest challenges is their optimization. As an example, for a training set N = {(x_i, t_i) | x_i ∈ R^d, t_i ∈ R^m, i = 1, ..., N} and a SLFN with L hidden neurons and activation function g(x), the output can be computed with Eq. (1)

Σ_{i=1}^{L} β_i g(w_i ∗ x_j + b_i) = y_j,  j = 1, ..., N    (1)

where w_i = [w_{i1}, ..., w_{id}] and b_i denote the input weight vector and bias of each hidden neuron, respectively, β_i = [β_{i1}, ..., β_{im}] is the output weight vector, and w_i ∗ x_j denotes the inner product of w_i and x_j.
The parameters β_i, i = 1, ..., L are estimated in accordance with Eq. (2)

Σ_{i=1}^{L} β_i g(w_i ∗ x_j + b_i) = t_j,  j = 1, ..., N    (2)

As suggested in [18], Eq. (2) can be written compactly as shown in Eq. (3)

Hβ = T    (3)

in which

H = [ g(w_1 ∗ x_1 + b_1)  ...  g(w_L ∗ x_1 + b_L)
      ...                      ...
      g(w_1 ∗ x_N + b_1)  ...  g(w_L ∗ x_N + b_L) ]    (4)

β = [β_1^T, ..., β_L^T]^T    (5)

T = [t_1^T, ..., t_N^T]^T    (6)

Matrix H is the hidden-layer output matrix. The output weights β can be computed as the minimum-norm least-squares solution according to Eq. (7)

β = H^† T    (7)

in which H^† represents the Moore–Penrose generalized inverse of H.
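As a concrete illustration, the ELM training procedure of Eqs. (1)–(7) reduces to a few lines of NumPy; the uniform weight initialization and the tanh activation are assumptions made for this sketch.

import numpy as np

def train_elm(X, T, n_hidden, activation=np.tanh, seed=0):
    # Randomly initialize input weights and biases, build the hidden-layer
    # output matrix H (Eq. 4), and solve for the output weights with the
    # Moore-Penrose pseudo-inverse (Eq. 7).
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = activation(X @ W + b)
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta, activation=np.tanh):
    # Network output: H(X) @ beta
    return activation(X @ W + b) @ beta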
2.2 Swarm Intelligence Swarm intelligence, a subfield of AI [23], uses nature as inspiration to develop approaches capable of solving optimization problems. Some notable nature-inspired algorithms include the whale optimization algorithm (WOA) [22], the bat algorithm (BA) [32], elephant herding optimization (EHO) [29], particle swarm optimization (PSO) [14], artificial bee colony (ABC) [20], and more [31]. Furthermore, some recent approaches that provide admirable performance include monarch butterfly optimization (MBO) [16], slime mold algorithm (SMA) [21], moth search algorithm (MSA) [28], colony predation algorithm (CPA) [27], and hunger games search (HGS) [34].
The swarm algorithms are capable of resolving even NP-hard problems (nondeterministic polynomial). Some recent notable examples include applications for general feature selection [4, 38], applications-related COVID-19 [35, 37], task scheduling on cloud-edge environment [1, 7], prediction of crypto-currency values [24], lifetime maximization and localization of wireless sensors networks [6, 26, 36], application with prediction of pollution and health care systems [5], assisted medical diagnosis [8], optimization of artificial neural networks [2, 3, 12, 17, 19], and global numerical optimization [9]. The potential of swarm intelligence for ELM optimization has not yet been fully explored. Particle swarm optimization has been processed in two papers. For flash flood prediction, hybrid PSO-ELM solution was used and it was published in [10]. The algorithm was compared to traditional ML models. Majorly the results were in favor of the proposed solution. For the derivation of hydropower reservoir operation rules, it was used PSO-ELM in research [16]. This work demonstrates the excellent generalization ability of the class-based evolutionary extreme learning machine (CEELM). Additionally, much work has gone into enhancing the accuracy of ELM [11, 15]. The results of the research verify that similar results may be achieved with less time.
3 Planet Optimization Algorithm
The laws of physics have served as inspiration for many optimization algorithms. One such algorithm is the POA, originally proposed in [25], which models the laws of motion in the universe. Specifically, the POA imitates Newton's law of universal gravitation, as shown in Eq. (8)

|F| = G ∗ m_1 ∗ m_2 / R^2    (8)

where F represents the gravitational force between two planets, G is the gravitational constant, R the planetary distance, and m_1, m_2 the planetary masses. As the use of the force F in the optimization process gives less efficient results, the moment (M) is used as a parameter in the search process, as shown in Eq. (9)

|M| = |F| ∗ R    (9)

The POA can be divided into four stages. In the initial stage, the goal is to find an effective solution; this best solution will help to increase convergence and accuracy. Stage two of the POA can be called the M factor. In this step, M is defined as

M = |F| ∗ R_ij = G ∗ (m_i ∗ m_j / R_ij^2) ∗ R_ij    (10)
The mass of the planets is defined as

m_i, m_j = 1 / a^(obj_{i,j} / α)    (11)

In this equation, a = 2 is a constant and α is defined as |max(obj) − obj_sun|. The interpretation is that the mass of a planet is larger if its objective function value is smaller. Here obj_i and obj_j are the objective function values of the ith and jth planets, max(obj) is the largest objective value in the population, and obj_sun is the objective value of the Sun (the best solution). The Cartesian distance between two objects i and j in a Dim-dimensional space is calculated by

R_ij = ||X_i^t − X_j^t|| = sqrt( Σ_{k=1}^{Dim} (X_{i,k}^t − X_{j,k}^t)^2 )    (12)
Parameter G is set equal to unity in the POA. The third stage is the global search, which is simulated by the equation

X_i^{t+1} = X_i^t + b ∗ β ∗ r_1 ∗ (X_sun^t − X_i^t)    (13)
On the left side of the equation is the current position of a planet i in the (t + 1) − → iteration, while the right side has elements: X it represents the position of a planet i t , and it is a coefficient that depends on M, r1 in the iteration t, β is equal to Mit /Mmax t is a random number in the range of 0–1, b represents constant parameter, and X sun represents the current position of the Sun in the iteration t. The fourth stage is local search. As an example, we use Jupiter and Mercury. Jupiter is a planet with more mass than Mercury, but Mercury is closer to the Sun (the best solution). That is why local search is very important. If we rejected the location of Mercury immediately because of its mass and did not do a local search, we would not be able to find the best solution (Sun). So the purpose of local search is to increase accuracy in a narrow area of search. In the POA, next equation is used for local search: −−t+1 → − → −−→ − → t (14) − X it X i = X it + c ∗ r1 r2 ∗ X sun In this equation, c is equal to c0 − t/T where t is iteration, the maximum number of iterations is T and c0 is constant and equal to 2, r2 represents the Gauss distribution function. Exploration creates many diverse solutions far from the current best solutions, and it can effectively explore the search space. The biggest advantage of global search is that the algorithm rarely gets stuck in a local search space. The biggest disadvantage of global search is slow convergence rates because it wastes time and effort on new solutions that can be far from the best global solution. A search of the global spaces and local spaces runs at the same time. This behavior guarantees the accuracy of the search, with the added benefit of not missing the potential locations. Rmin parameters
play a major role in this behavior. If R_min is too big, the algorithm focuses on local search from the first iterations, and it then becomes unlikely to find a potentially good solution far away from the present one. If R_min is small, the focus is on global search, and the best solution found may not satisfy expectations.

R_min = sqrt( Σ_{i=1}^{Dim} (up_i − low_i)^2 ) / R_0    (15)

R_min is obtained by dividing the span of the search space by R_0 = 1000, where up_i and low_i represent the upper and lower bounds of dimension i. With this structure, containing both local and global search, the POA promises to be effective and to save time in solving complex optimization problems.
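A compact sketch of the movement rule for a single planet (solution), switching between the global search of Eq. (13) and the local search of Eq. (14) according to its distance to the Sun, is given below; how the parameters b and c and the random generator are managed across iterations is an illustrative assumption.

import numpy as np

def poa_update(X_i, X_sun, M_i, M_max, R_i_sun, R_min, b, c, rng):
    # Global search (Eq. 13) when the planet is far from the Sun,
    # local search (Eq. 14) when it is within R_min of the Sun.
    r1 = rng.random(X_i.shape)
    if R_i_sun > R_min:
        beta = M_i / M_max
        return X_i + b * beta * r1 * (X_sun - X_i)        # Eq. (13)
    r2 = rng.normal(size=X_i.shape)                        # Gaussian factor
    return X_i + c * r1 * (r2 * X_sun - X_i)               # Eq. (14)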
3.1 Improved Planet Optimization Algorithm
Despite the admirable performance demonstrated by the base POA, as per the no free lunch theorem no approach is equally suited to every problem. Accordingly, further improvements to existing metaheuristics are always possible, and experimental research is required to push discovery forward. Extensive testing of the POA on unconstrained benchmark functions indicates that its exploitation power may be further improved. Algorithm hybridization is a popular, proven technique used by researchers to further enhance performance. The idea behind hybridization is to introduce certain behaviors of another algorithm into an existing one, in the hope that the resulting hybrid exhibits the best characteristics of both algorithms, with performance greater than the sum of its parts. To improve the performance of the original POA, the firefly algorithm (FA) has been used following extensive experimental testing. The FA metaheuristic demonstrates admirable performance, with a very powerful exploitation mechanism, despite being a very simple algorithm with few control parameters. This work introduces a pseudo-randomly generated parameter ψ in the range [0, 1]. When the parameter value is below 0.5, the search mechanism of the classic POA is used for the current search iteration; when the value exceeds 0.5, the search mechanism of the FA with the dynamic parameter α is used for the current iteration, as shown in Eq. (16).

x_i = x_i + β_0 e^{−γ r_{ij}^2} (x_j − x_i) + α (rand − 1/2)    (16)

where x_i is the current position of firefly i, the distance between two fireflies is represented by r, and β_0 is the attraction between fireflies when the distance between
them is r = 0. The absorption coefficient of the medium is represented by γ, and rand represents a random value in [0, 1]. Finally, α is a randomization factor. A dynamic parameter α that is gradually reduced over the iterations is used to slowly direct the search process from exploration to exploitation. Additional details of the mechanisms of the original FA can be found in the research that originally introduced it [33]. Finally, the pseudo-code of the proposed POA firefly search (POA-FS) algorithm is given in Algorithm 1.

Algorithm 1 Pseudo-code for the proposed planet optimization firefly search (POA-FS) algorithm
Initialize a set of search solutions
Define parameters c, b, R_min, G, n, α, β and γ
Calculate the fitness of each solution
while t < maximum number of iterations T do
  Select a random value in [0, 1] for ψ
  if ψ < 0.5 then
    Update parameter c
    Calculate the moment force of each planet (solution)
    Calculate the distance between each planet and the Sun
    Calculate the mass of the planets (solutions)
    Find M_max
    if R_sun−planet_i > R_min then
      Calculate r_1, β
      Perform global search
    else
      Calculate r_1, r_2
      Perform local search
    end if
    Calculate the best solution
    Update the best solution
  else
    Update each solution by using the firefly search according to Eq. (16)
  end if
end while
Return the best solution obtained so far as the global optimum
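The hybrid step of Algorithm 1 can be sketched as follows; poa_move_fn stands in for the POA update of Eqs. (13)–(14), and the attraction constants are illustrative defaults, so this is only a schematic of the ψ-switch and of the firefly move of Eq. (16).

import numpy as np

def firefly_move(x_i, x_j, beta0, gamma, alpha, rng):
    # Firefly attraction step of Eq. (16)
    r_sq = np.sum((x_i - x_j) ** 2)
    return (x_i + beta0 * np.exp(-gamma * r_sq) * (x_j - x_i)
            + alpha * (rng.random(x_i.shape) - 0.5))

def poa_fs_iteration(population, fitness_fn, poa_move_fn,
                     beta0=1.0, gamma=1.0, alpha=0.5, rng=None):
    # One hybrid iteration: psi < 0.5 selects the POA search for every
    # solution, otherwise each solution moves toward the fittest one
    # using the firefly search of Eq. (16).
    rng = np.random.default_rng() if rng is None else rng
    fits = np.array([fitness_fn(x) for x in population])
    if rng.random() < 0.5:
        population = np.array([poa_move_fn(x) for x in population])
    else:
        best = int(np.argmax(fits))
        for i in range(len(population)):
            if fits[best] > fits[i]:
                population[i] = firefly_move(population[i], population[best],
                                             beta0, gamma, alpha, rng)
    return population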
4 Experiments and Discussion 4.1 Dataset Description This research makes use of the PIMA Indian Diabetes (PID) dataset, provided by the National Institute of Diabetes and Kidney Diseases center, and is available in the UCI machine learning repository [13]. The dataset contains 768 records of healthy and diabetic female individuals older than twenty-one. The available features cover the number of pregnancies, blood glucose levels, measurements of blood pressure, bicep skin fold thickness, and insulin levels. Additionally, the patient’s body mass index (BMI), age, and diabetes pedigree function describing their hereditary tendency toward developing the condition are given. Finally, whether the patient is diagnosed
Fig. 1 Correlation heatmap
Fig. 2 Outcome distribution
as diabetic is shown as a binary value given as the outcome feature. Apart from diagnosing diabetic patients, this data is used to predict the chances of a patient developing the condition within the next four years. The correlation heatmap of the available features is given in Fig. 1. Furthermore, the dataset possesses an innate imbalance, being made up of two-thirds healthy individuals and one-third diabetic. This imbalance is shown in Fig. 2. Because the dataset is somewhat imbalanced, minority class oversampling is applied to compensate. The results are therefore shown both with the Synthetic Minority Over-sampling Technique (SMOTE) applied and without it.
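One common way to apply SMOTE here is via the imbalanced-learn library, oversampling only the training portion so that the test set keeps the natural class distribution; the split ratio and seed below are illustrative assumptions rather than the authors' exact setup.

from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

def balance_pid(X, y, test_size=0.3, seed=0):
    # X, y: the eight PID features and the binary outcome column
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    X_tr_bal, y_tr_bal = SMOTE(random_state=seed).fit_resample(X_tr, y_tr)
    return X_tr_bal, y_tr_bal, X_te, y_te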
4.2 Experimental Setup and Comparative Analysis
During testing for the comparative analysis, each approach used a fixed solution population size of 20, and the optimization was carried out over ten iterations. Furthermore, due to the innate randomness of these metaheuristics, the reported results are averaged over 50 independent runs to ensure a fair comparison. To demonstrate the improvements made, a comparative analysis is carried out between the original POA, the proposed POA-FS approach, and other state-of-the-art metaheuristics applied to the same challenge. Each algorithm has been independently implemented and tested with the control parameter values suggested in its original paper, and the results are given in Tables 1 and 2 for the original and synthetic datasets, respectively. Reported results include the best, worst, mean, and median accuracy values along with the standard deviation and the number of neurons (NN) in the best-generated ELM. The results clearly indicate that ELM-POA-FS exhibits on average better performance than the other methods. Additionally, improvements over the original method (ELM-POA) can be clearly observed.
Table 1 ELM diabetes diagnostics NO SMOTE—general metrics

Metaheuristic  ELM-POA-FS  ELM-POA   ELM-ABC   ELM-BA    ELM-SCA   ELM-WOA   ELM-EHO
Best (%)       84.85       83.98     83.98     83.55     84.42     83.55     83.98
Worst (%)      83.55       82.25     83.12     82.68     82.25     82.25     82.25
Mean (%)       84.09       83.12     83.55     83.12     83.23     83.01     83.01
Median (%)     83.98       83.12     83.55     83.12     83.12     83.12     82.9
Std            0.005447    0.007069  0.003535  0.004999  0.009599  0.005447  0.007393
NN             30          30        30        30        33        30        31
Table 2 ELM diabetes diagnostics SMOTE—general metrics

Metaheuristic  ELM-POA-FS  ELM-POA   ELM-ABC   ELM-BA    ELM-SCA   ELM-WOA   ELM-EHO
Best (%)       83.33       82.67     82        82.33     82        82        83
Worst (%)      82.33       81        81        81        81.33     81.33     81
Mean (%)       82.83       81.75     81.58     81.75     81.67     81.75     81.67
Median (%)     82.83       81.67     81.67     81.83     81.67     81.83     81.33
Std            0.004303    0.007391  0.004194  0.005693  0.003849  0.003191  0.009428
NN             30          40        54        40        38        56        30
Fig. 3 Convergence speed graph for original diabetes dataset
Fig. 4 Convergence speed graph for diabetes dataset expanded by SMOTE technique
The comparison of convergence speed rates between the proposed approach and other contemporary metaheuristics is shown in Figs. 3 and 4 for original and artificial datasets, respectively.
5 Conclusion This paper provides a proposal for a novel metaheuristics-based ELM-POA-FS model, as well as its application to the medical diagnostics of diabetes. Early detection and timely treatment of diabetes can greatly improve patients’ quality of life. Furthermore, the potential of the ELM and POA has not been fully explored for tackling such tasks. Additionally, an improved version of the original POA hybridized with the FA is proposed. The hyperparameters of the ELM have been optimized using
the proposed POA-FS algorithm. The proposed model has been tested on a medical diabetes diagnostic dataset, and the results have been compared with contemporary metaheuristics approaches applied to the same tasks of optimizing ELM hyperparameters. The results of the conducted experiments clearly demonstrate the superior performance of the proposed approach applied to the task of diabetes diagnostics, as the proposed approach outperformed all of the other five metaheuristics tested, as well as the original POA applied to the same task. Future work will focus on further exploring the optimization potential of the POA and the algorithms’ abilities to handle different real-world problems, as well as testing the performance when tackling NP-hard problems. Additionally, future work will explore the potential of further improving the original POA and exploring effective applications for the improved derivative algorithms.
References 1. Bacanin N, Bezdan T, Tuba E, Strumberger I, Tuba M, Zivkovic M (2019) Task scheduling in cloud computing environment by grey wolf optimizer. In: 2019 27th telecommunications forum (TELFOR). IEEE, pp 1–4 2. Bacanin N, Bezdan T, Venkatachalam K, Zivkovic M, Strumberger I, Abouhawwash M, Ahmed AB (2021) Artificial neural networks hidden unit and weight connection optimization by quasirefection-based learning artificial bee colony algorithm. IEEE Access 9:169135–169155 3. Bacanin N, Bezdan T, Zivkovic M, Chhabra A (2022) Weight optimization in artificial neural network training by improved monarch butterfly algorithm. In: Mobile computing and sustainable informatics. Springer, pp 397–409 4. Bacanin N, Petrovic A, Zivkovic M, Bezdan T, Antonijevic M (2021) Feature selection in machine learning by hybrid sine cosine metaheuristics. In: International conference on advances in computing and data sciences. Springer, pp 604–616 5. Bacanin N, Sarac M, Budimirovic N, Zivkovic M, AlZubi AA, Bashir AK (2022) Smart wireless health care system using graph LSTM pollution prediction and dragonfly node localization. Sustain Comput Inf Syst 35:100711 6. Bacanin N, Tuba E, Zivkovic M, Strumberger I, Tuba M (2019) Whale optimization algorithm with exploratory move for wireless sensor networks localization. In: International conference on hybrid intelligent systems. Springer, pp 328–338 7. Bacanin N, Zivkovic M, Bezdan T, Venkatachalam K, Abouhawwash M (2022) Modified firefly algorithm for workflow scheduling in cloud-edge environment. Neural Comput Appl 34(11):9043–9068 8. Basha J, Bacanin N, Vukobrat N, Zivkovic M, Venkatachalam K, Hubálovsk`y S, Trojovsk`y P (2021) Chaotic Harris Hawks optimization with quasi-reflection-based learning: an application to enhance CNN design. Sensors 21(19):6654 9. Bezdan T, Petrovic A, Zivkovic M, Strumberger I, Devi VK, Bacanin N (2021) Current best opposition-based learning salp swarm algorithm for global numerical optimization. In: 2021 zooming innovation in consumer technologies conference (ZINC). IEEE, pp 5–10 10. Bui DT, Ngo PTT, Pham TD, Jaafari A, Minh NQ, Hoa PV, Samui P (2019) A novel hybrid approach based on a swarm intelligence optimized extreme learning machine for flash flood susceptibility mapping. CATENA 179:184–196 11. Chen H, Zhang Q, Luo J, Xu Y, Zhang X (2020) An enhanced bacterial foraging optimization and its application for training kernel extreme learning machine. Appl Soft Comput 86:105884
12. Cuk A, Bezdan T, Bacanin N, Zivkovic M, Venkatachalam K, Rashid TA, Devi VK (2021) Feedforward multi-layer perceptron training by hybridized method between genetic algorithm and artificial bee colony. In: Data science and data analytics: opportunities and challenges, p 279 13. Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml 14. Eberhart R, Kennedy J (1942) Particle swarm optimization. In: Proceedings of the IEEE international conference on neural networks, Australia, vol 1948 15. Faris H, Mirjalili S, Aljarah I, Mafarja M, Heidari AA (2020) Salp swarm algorithm: theory, literature review, and application in extreme learning machines. In: Nature-inspired optimizers, pp 185–199 16. Feng ZK, Niu WJ, Zhang R, Wang S, Cheng CT (2019) Operation rule derivation of hydropower reservoir by k-means clustering method and extreme learning machine based on particle swarm optimization. J Hydrol 576:229–238 17. Gajic L, Cvetnic D, Zivkovic M, Bezdan T, Bacanin N, Milosevic S (2021) Multi-layer perceptron training using hybridized bat algorithm. In: Computational vision and bio-inspired computing. Springer, pp 689–705 18. Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1–3):489–501 19. Jnr EON, Ziggah YY, Relvas S (2021) Hybrid ensemble intelligent model based on wavelet transform, swarm intelligence and artificial neural network for electricity demand forecasting. Sustain Cities Soc 66:102679 20. Karaboga D, Akay B (2009) A comparative study of artificial bee colony algorithm. Appl Math Comput 214(1):108–132 21. Li S, Chen H, Wang M, Heidari AA, Mirjalili S (2020) Slime mould algorithm: a new method for stochastic optimization. Future Gener Comput Syst 111:300–323 22. Mirjalili S, Lewis A (2016) The whale optimization algorithm. Adv Eng Softw 95:51–67 23. Raslan AF, Ali AF, Darwish A (2020) Swarm intelligence algorithms and their applications in internet of things. In: Swarm intelligence for resource management in internet of things. Elsevier, pp 1–19 24. Salb M, Zivkovic M, Bacanin N, Chhabra A, Suresh M (2022) Support vector machine performance improvements for cryptocurrency value forecasting by enhanced sine cosine algorithm. In: Computer vision and robotics. Springer, pp 527–536 25. Sang-To T, Hoang-Le M, Wahab MA, Cuong-Le T (2022) An efficient planet optimization algorithm for solving engineering problems. Sci Rep 12(1):1–18 26. Strumberger I, Bezdan T, Ivanovic M, Jovanovic L (2021) Improving energy usage in wireless sensor networks by whale optimization algorithm. In: 2021 29th telecommunications forum (TELFOR). IEEE, pp 1–4 27. Tu J, Chen H, Wang M, Gandomi AH (2021) The colony predation algorithm. J Bionic Eng 18(3):674–710 28. Wang GG (2018) Moth search algorithm: a bio-inspired metaheuristic algorithm for global optimization problems. Memet Comput 10(2):151–164 29. Wang GG, Deb S, Coelho LDS (2015) Elephant herding optimization. In: 2015 3rd international symposium on computational and business intelligence (ISCBI). IEEE, pp 1–5 30. Yang XS (2009) Firefly algorithms for multimodal optimization. In: International symposium on stochastic algorithms. Springer, pp 169–178 31. Yang XS (2012) Flower pollination algorithm for global optimization. In: International conference on unconventional computing and natural computation. Springer, pp 240–249 32. Yang XS, Gandomi AH (2012) Bat algorithm: a novel approach for global engineering optimization. Eng Comput 33. Yang XS, Slowik A (2020) Firefly algorithm. 
In: Swarm intelligence algorithms. CRC Press, pp 163–174 34. Yang Y, Chen H, Heidari AA, Gandomi AH (2021) Hunger games search: visions, conception, implementation, deep analysis, perspectives, and towards performance shifts. Expert Syst Appl 177:114864
35. Zivkovic M, Bacanin N, Venkatachalam K, Nayyar A, Djordjevic A, Strumberger I, Al-Turjman F (2021) Covid-19 cases prediction by using hybrid machine learning and beetle antennae search approach. Sustain Cities Soc 66:102669 36. Zivkovic M, Bacanin N, Zivkovic T, Strumberger I, Tuba E, Tuba M (2020) Enhanced grey wolf algorithm for energy efficient wireless sensor networks. In: 2020 zooming innovation in consumer technologies conference (ZINC). IEEE, pp 87–92 37. Zivkovic M, Jovanovic L, Ivanovic M, Krdzic A, Bacanin N, Strumberger I (2022) Feature selection using modified sine cosine algorithm with COVID-19 dataset. In: Evolutionary computing and mobile sustainable networks. Springer, pp 15–31 38. Zivkovic M, Stoean C, Chhabra A, Budimirovic N, Petrovic A, Bacanin N (2022) Novel improved salp swarm algorithm: an application for feature selection. Sensors 22(5):1711
Towards Computation Offloading Approaches in IoT-Fog-Cloud Environment: Survey on Concepts, Architectures, Tools and Methodologies Priya Thomas and Deepa V. Jose
Abstract The Internet of Things (IoT) provides communication and processing power to different entities connected to it, thereby redefining the way objects interact with one another. IoT has evolved as a promising platform within short duration of time due to its less complexity and wide applicability. IoT applications generally rely on cloud for extended storage, processing and analytics. Cloud computing increased the acceptance of IoT applications due to enhanced storage and processing. However, the integration does not offer support for latency-sensitive IoT applications. The latency-sensitive IoT applications had greatly benefited with the introduction of fog/edge layer to the existing IoT-Cloud architecture. The fog layer lies close to the edge of the network making the response time better and reducing the delay considerably. The three-tier architecture is still in its earlier phase and needs to be researched further. This paper addresses the offloading issues in IoT-Fog-Cloud architecture which helps to evenly distribute the incoming workload to available fog nodes. Offloading algorithms have to be carefully chosen to improve the performance of application. The different algorithms available in literature, the methodologies and simulation environments used for the implementation, the benefits of each approach and future research trends for offloading are surveyed in this paper. The survey shows that the offloading algorithms are an active research area where more explorations have to be done. Keywords Internet of Things (IoT) · IoT-Fog-Cloud architecture · Computational offloading · Offloading algorithms
P. Thomas (B) · D. V. Jose CHRIST (Deemed to be University), Bangalore, India e-mail: [email protected] D. V. Jose e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_4
1 Introduction
The Internet of Things (IoT) is a novel paradigm which has gained wide popularity within a short period of time. A wide range of applications utilizes IoT technology for sensing raw data, analysing information and deriving conclusions. The IoT environment is built from groups of sensors, actuators, gateways and the communicating network. IoT devices are capable of sensing raw data from the connected environment using the sensing elements and identification entities attached to them. The sensed data is transmitted to a storage layer where it is stored and processed. The storage layer is built using different technologies such as cloud data centers, edge, micro clouds, fog, etc. The large volume of data arriving at this layer is processed and analysed using sophisticated tools, and conclusions and patterns are derived. The patterns generated decide the solutions for the problems under investigation; the communication and computing elements are responsible for this process. IoT applications face many challenges, including security issues, power-constrained nodes and dependency on remote storage media. Applications which rely on IoT should be capable of addressing these challenges to achieve better performance and applicability. This paper thoroughly examines the factors that IoT-based applications must consider when selecting a proper storage structure and data offloading mechanism. The selection of an appropriate storage structure greatly influences the response time, latency, communication cost and overall performance of the application. The paper highlights the need for incorporating edge/fog into the existing IoT-Cloud infrastructure to reduce latency and communication cost. IoT-Fog-Cloud integration challenges are discussed, and the different offloading schemes available in the literature are compared to help researchers choose the right environment. Open research areas and research directions for data offloading are also summarized. Section 2 discusses the three-tier architecture for integration. Section 3 surveys the existing literature and compares the existing methods based on different parameters. Section 4 discusses open research areas and research directions. The survey concludes that IoT-Fog-Cloud integration is an area where active research initiatives are needed to make the environment more robust and efficient.
2 IoT Storage Structures
2.1 IoT-Cloud Architecture
The applications connected to the Internet of Things are growing at an exponential rate, demanding huge amounts of data storage capacity and processing power. However, due to the different challenges faced by such applications, such as the lack of proper supporting resources for storage, analysis and data processing, IoT generally adopts other storage media for storage and computation.
Fig. 1 IoT-Cloud infrastructure
Cloud computing is widely used to support
IoT applications based on a pay-per-use policy. This feature attracts many IoT applications, and the dependency is increasing day by day. The cloud collects the raw data arriving from sensors, filters it and applies data analytics to derive solutions to different problems. This integration has greatly helped resource-constrained IoT nodes and has improved the performance of IoT applications. The IoT-Cloud integration model is depicted in Fig. 1. The lower-most IoT layer acts as the sensing layer, sensing the different parameters of the connected environment. The data moves to the middle network layer, where the cloud architecture provides support for data storage, analysis and processing. The response generated by the cloud layer is transmitted to the applications connected in the application layer. Cloud computing offers a wide range of services to connected applications. Even though the cloud looks like a promising platform, it suffers from a few serious drawbacks. For latency-sensitive IoT applications, the support offered by the cloud is not very beneficial because of the delay in generating a response, as the geographical distance between the cloud data center and the end-user is very high. The large volume of raw data generated by different sensors may crowd the data center and increase congestion and traffic, delaying the response further. The cloud data center also suffers from a single point of failure: IoT applications will not get a proper response if the cloud data center is down for any reason, which affects data reliability and availability. The data privacy policies also depend greatly on the cloud provider. The IoT-Cloud integration suffers from these drawbacks, and research in this field has resulted in the integration of an edge/fog layer between IoT and cloud. In edge/fog, computation is performed at the edge of the network in a distributed environment supported by multiple nodes including gateways, routers, mobile fog nodes and edge servers [1]. This reduces the long response times and latency issues. Edge computing
is also referred to as cloudlets, micro data centers and fog computing; the terms edge and fog computing are used interchangeably in the literature.
2.2 IoT-Fog-Cloud Architecture
The fog/edge computing model shifts the computing resources from the remotely located cloud data center to the edge of the network. The goal behind introducing this concept was to support the numerous applications with low latency requirements without compromising efficiency and performance. Fog computing is a term coined by Cisco in 2014 to describe the decentralization of the computing architecture by bringing the cloud to the ground. Fog computing helps to push processing power from the central cloud data center to the organization's premises, improving security and reliability. Fog computing also helps to reduce delay jitter in transmission and increases privacy, as the data stays within the boundary of the organization; the security policies and privacy principles can be defined by the organization, making it more reliable and trustworthy. The fog architecture is also distributed in nature, reducing the problem of a single point of failure. Time-sensitive IoT applications can rely on edge/fog computing to get responses quicker, without the delay associated with a remote cloud. The comparison between the different features of cloud, edge and fog is summarized in Table 1. The comparison shows that edge and fog computing can address the latency issues faced by cloud implementations, and that edge and fog differ only minimally. The edge/fog layer can be incorporated into the IoT-Cloud architecture to improve the performance of the application. The IoT layer senses data and transfers it to the fog layer, where data filtering and processing happen and a response is sent back to the IoT layer. The data can be further transmitted to the cloud layer for permanent storage. The three-tier architecture for integration is shown in Fig. 2. Theoretically, the fog computing framework can be defined as an N-tier architecture with distinct layers [2] as shown in Fig. 2.

Table 1 Comparison between cloud, edge and fog
Parameter | Cloud computing | Fog computing | Edge computing
Location of data server | Remote | Edge of the network | Edge of network/device itself
Response time | High | Low | Low
Architecture | Centralized | Distributed | Distributed
Level of security | Cannot be defined | Defined | Defined
Network latency | Very high | Low | Low
Bandwidth usage and congestion | High | Very low | Low

Fig. 2 IoT-Fog-Cloud architecture

The lower-most tier includes end
devices such as different smart devices and smart phones. The middle layer is the fog layer, which is distributed in nature with multiple fog nodes, and the cloud layer serves as the upper layer. The three-tier architecture is gaining popularity in IoT applications as it is capable of minimizing the drawbacks of a cloud-only architecture. While considering the three-tier architecture, different challenges have to be addressed. Due to its distributed nature, offloading the data efficiently among the available fog nodes is a challenging task: the large volume of data generated by various IoT nodes has to be distributed among the available fog nodes in a systematic way, and proper algorithms have to be used to offload data efficiently. The computational offloading technique is recommended to minimize latency, conserve energy and increase battery life without compromising performance and other user requirements [2]. Offloading can be performed in multiple ways depending on the type of platform used, the decision-making strategy followed and the partitioning used. Figure 3 shows the different offloading approaches. The offloading can be done on a cloud-based, edge/fog-based or hybrid architecture, it can be implemented fully or partially depending on the requirement, and the offloading strategy can be static or dynamic. The decision on how to offload can be set statically for the participating nodes or made dynamically depending on the available workload. Dynamic offloading requires a proper understanding of the environment and the workload distribution; reinforcement learning-based and meta-heuristic-based algorithms are mainly used for dynamic data offloading, and active research is going on in this area to find the optimum solution for task offloading. Sections 3 and 4 thoroughly examine the features of the different offloading algorithms available in the literature and compare the advantages and issues associated with the existing frameworks.
Fig. 3 Offloading approaches
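To make the taxonomy in Fig. 3 concrete, the sketch below shows one very simple form of dynamic, decision-based offloading: for each task the controller estimates the completion time on the local device, on the reachable fog nodes and on the cloud, and greedily forwards the task to the cheapest target. All node names, capacities and latency figures are illustrative assumptions rather than values taken from any surveyed system.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Node:
    name: str
    mips: float            # processing capacity (million instructions per second), assumed
    rtt_ms: float          # round-trip network latency to the device, assumed
    queued_mi: float = 0.0 # work already queued on the node (million instructions)

def completion_time_ms(task_mi: float, node: Node) -> float:
    """Estimated time to ship the task to a node and finish it there."""
    compute_ms = 1000.0 * (node.queued_mi + task_mi) / node.mips
    return node.rtt_ms + compute_ms

def offload(task_mi: float, local: Node, fog_nodes: List[Node], cloud: Node) -> Node:
    """Greedy dynamic offloading: pick the destination with the lowest estimate."""
    candidates = [local, *fog_nodes, cloud]
    best = min(candidates, key=lambda n: completion_time_ms(task_mi, n))
    best.queued_mi += task_mi   # book the work so later decisions see the new load
    return best

# Illustrative scenario (all numbers assumed)
local = Node("iot-device", mips=500, rtt_ms=0)
fogs = [Node("fog-1", mips=4000, rtt_ms=5), Node("fog-2", mips=6000, rtt_ms=8)]
cloud = Node("cloud", mips=50000, rtt_ms=120)
print(offload(task_mi=2000, local=local, fog_nodes=fogs, cloud=cloud).name)
```

A static policy would fix the destination in advance; the greedy rule above re-evaluates it per task, which is the essence of the dynamic schemes surveyed in the next section.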
3 Literature Review
The IoT-Edge/Fog-Cloud integration rapidly gained popularity among IoT applications because the middle layer, integrated at the edge of the network, easily handles time-sensitive requests and responds with minimum delay. Active research in this area mainly focuses on the algorithms used for offloading the generated data to the available fog nodes. Load balancing algorithms play a significant role in managing network traffic and congestion, and proper utilization of the available fog nodes is essential to improve the performance of the connected applications. The different techniques used for data offloading over the past few years are thoroughly examined in this section. The outcomes of each approach, the simulation environment used, and the pros and cons of each experiment are compared to generate future research directives. Maiti et al. designed an effective approach for deploying fog smart gateways for IoT services, in which the effect of the number of fog nodes on latency reduction was evaluated using different algorithms. Randomized, greedy, K-Median, K-Means and Simulated Annealing techniques were used to select a number of fog nodes [3]. The results show that an increased number of fog nodes decreases latency problems considerably, and the Simulated Annealing technique gave the minimum delay in fog node selection in experiments done in MATLAB R2017b. A publish/subscribe-based edge computational model was proposed by Veeramanikandan and Sankaranarayanan. The paper proposes an edge computing model for latency reduction by updating the existing
Message Queuing Telemetry Transport (MQTT) protocol. The model included a fog-based remote broker and a cloud-based main broker [4]. The experimental results show that the proposed system reduced latency compared to the traditional MQTT approach, although application use-cases still need to be tested in real time to evaluate real-time performance. Shahzadi et al. discuss enabling IoT/5G for latency-sensitive IoT applications. The model used the swap matching algorithm to give a self-organizing decentralized solution to the problem of resource allocation [5]. Here, each IoT device is initially allocated a random fog node, and the devices and fog nodes can update their preferences and utilities for the current matching. Experimental analysis shows that the proposed task allocation mechanism gains better throughput with lower latency, but the paper appears to be more theoretical, with few implementation details. Baek et al. simulate offloading in a fog environment using a reinforcement learning-based load balancing algorithm. The proposed method treats the offloading problem as a Markov decision process (MDP) and solves it using the Q-learning approach along with a greedy algorithm. The results show that Q-learning-based offloading decision-making helps to reduce processing time and overload probability [6]. The paper explains the simulation set-up for the experiment but needs to incorporate model-based learning for bias-free results. Alli and Alam proposed a model named SecOFF-FCIoT, a machine learning driven secure offloading technique in the Fog-Cloud of things. The paper uses a neuro-fuzzy model to secure data at the smart gateway and uses the particle swarm optimization (PSO) technique to enable IoT devices to select an optimal fog node and distribute their workload [7] to the selected fog node. Simulation results using the smart city application use-case show that the proposed scheme reduces latency and energy consumption. Chen et al. consider the industrial IoT (IIoT) application and design a dynamic energy-optimal offloading mechanism utilizing fog computing. The energy minimization computation offloading problem is formulated and solved using gradient algorithms. The simulation output shows that the method has better performance in terms of energy consumption, completion time and convergence rate [8]. However, due to partial offloading, the paper needs machine learning-based algorithms to get an idea about network conditions in advance. Liu et al. use an edge computing model for the distribution of resources in IoT networks using multi-agent reinforcement learning. The implementation first formulates a joint optimization problem considering multi-user computation offloading and resource distribution, and a stochastic game-based solution is designed to solve it. Simulation results show that the proposed Q-learning approach achieves reduced system cost and better computation ability [9]. Aburukba et al. use the Integer Linear Programming (ILP) technique for scheduling Internet of Things requests with the aim of reducing latency in the IoT-Fog-Cloud architecture. Here, the minimum service time for IoT requests is modelled using ILP [10], and a genetic algorithm (GA) is used to design a solution for the problem. The paper reports improved latency compared to existing algorithms; multiple objective functions need to be considered as future work to maximize resource utilization and reduce latency. Sun et al. formulate a resource allocation and task offloading
framework that is energy and time efficient, termed the ETCORA algorithm [11]. Here, an ETC minimization problem is formulated and solved using the proposed algorithm. The experimental analysis shows that the average energy consumption and completion time of the proposed method are lower than those of the existing ADMMD and DMP algorithms, but the paper lacks broader comparisons and a realistic implementation. Yu et al. experimented with offloading based on an intelligent game-based model. Here, a hierarchical game model integrated with fictitious play is used to solve for the Nash Equilibrium (NE) of the system, where NE is the concept that determines the optimal solution in a non-cooperative game [12]. The methodology has an implementation background, and the results show that the algorithm achieves a better performance gain; multi-hopping for reducing the task delay is left as future work. Abbasi et al. in their work use a multi-objective genetic algorithm for resource allocation in the IoT-Fog-Cloud architecture. The paper considers the trade-off between fog energy consumption and the transmission delay of the cloud and uses the NSGA-II algorithm to solve the offloading problem [13]. The numerical results show that the algorithm improves energy consumption and delay, but the paper considers only two parameters and needs to work on a few more parameters to improve performance. Aljanabi and Chalechale try to improve IoT services using a hybrid fog-cloud offloading (HFCO) approach. In the HFCO framework, based on the requirement, an IoT node can offload data to the fog or the cloud if it cannot handle the load itself. The problem is formulated as a Markov Decision Process (MDP) [14] and solved using Q-learning-based algorithms. Numerical analysis shows that the designed approach reduces delay and offers better load balancing. The paper lists methods for reducing the communication overhead and the time taken for offloading decisions as future work. Elgendy et al. experiment with reinforcement learning-based algorithms for computation offloading and task caching. Here, a nonlinear problem is formulated with the goal of reducing the total overhead of time and energy [15] and is solved using Q-learning and Deep-Q-Network-based algorithms. Experimental evaluations show that the designed framework can minimize the mobile devices' overhead efficiently and in a reasonable way. Shahidinejad et al. consider a context-aware offloading use-case and propose a federated learning-based approach. The paper states that offloading decisions are greatly impacted by context information, and a deep reinforcement learning method is used [16]. The simulation results indicate that context-aware offloading is superior to other offloading techniques in terms of energy consumption, cost of execution, network usage, delay and fairness. The paper has the novelty of including context information and also considers more than two parameters, but the federated learning approach is highly vulnerable to security attacks, which need to be addressed for better applicability. Meena et al. proposed computational offloading considering the healthcare application use-case in fog computing. The paper focuses on incorporating a security-based offloading scheme using a technique termed TEFLON which targets trustworthy applications [17]. The proposed system addresses the security and trust issues using the optimal service offloader and trust assessment algorithms. The experimental analysis shows that the TEFLON framework provides low latency for
trustworthy applications, but the paper needs experimentation with secure lightweight algorithms. Lin et al. use a deep reinforcement learning-based computation offloading strategy for connected and autonomous vehicles. This paper considers task dependencies, vehicle mobility, etc., and formulates the problem as a Markov Decision Process. A Deep Q-Network based on Simulated Annealing (SA-DQN) algorithm [18] is used to find the optimal solution. Experimental analysis shows that the proposed strategy minimizes the offloading failure rate and the total energy consumption considerably, although the method needs to be more flexible for application in real-world settings. Almutairi and Aldossary use a fuzzy logic algorithm-based approach for IoT task offloading in Edge-Cloud environments. This paper tries to minimize the service time for delay-sensitive applications, considering parameters such as CPU demand, network demand and delay sensitivity [19] in the evaluation. The simulation results show that the proposed methodology reduces the overall service time for delay-sensitive applications and efficiently utilizes the available Edge-Cloud resources. The authors aim to include more computational resources in the future to increase applicability. Shakarami et al. designed a computation offloading strategy using deep learning in Mobile Edge Computing. In this approach, an offloading framework is designed using a MAPE-K-based control loop, and the formulated computation offloading problem is solved using an autonomic computation offloading algorithm [20]. Experimental analysis shows that the proposed hybrid model aids better offloading decision-making and provides reduced latency and lower energy consumption, but the paper lacks support for mobility features. Tran-Dang and Kim propose a technique named FRATO: Fog Resource-Based Adaptive Task Offloading for Delay-Minimizing IoT Service Provisioning. The paper proposes different strategies for task offloading in the Fog-Cloud integration [21] and suggests two distributed resource allocation mechanisms, TPRA and MaxRU, to effectively handle resource usage conflicts. The simulation results show that the proposed method reduces the average delay significantly in a heterogeneous fog environment with high service requests; solving the data fragmentation issues and fog node distribution problems could further improve the applicability of the algorithms. Rezvani et al. experiment with meta-heuristic algorithms for delay-aware optimization of task offloading in fog environments. In this paper, the task offloading problem is formulated using parameters such as delay and total power consumption, and the resulting optimization problem is solved using two meta-heuristic methods, NSGA-II and the Bees algorithm [22]. The simulation results show that the hybrid approach significantly reduces latency and energy consumption. The work needs to consider a few more parameters to address node failure issues and needs to consider deadlines in task-based offloading for better results. Kishor and Chakarbarty experiment with data offloading in fog computing using the Smart Ant Colony Optimization algorithm. The paper uses a meta-heuristic scheduler, the Smart Ant Colony Optimization (SACO) algorithm, to offload tasks in a fog environment [23]. The simulation results are compared with other existing algorithms, and the results show that the proposed algorithm achieves much lower latency than the compared approaches. The results can be made more effective in the future by considering a few more parameters like network cost and power consumption.
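Several of the surveyed works (for example, Baek et al., Aljanabi and Chalechale, and Elgendy et al.) cast the offloading decision as a Markov decision process and solve it with Q-learning. The fragment below is a generic, simplified sketch of tabular Q-learning with an epsilon-greedy policy over a few candidate offloading targets; the state encoding, the reward, the hypothetical env_step simulator hook and the hyper-parameters are illustrative assumptions, not a reproduction of any specific paper.

```python
import random
from collections import defaultdict

ACTIONS = ["local", "fog", "cloud"]        # candidate offloading targets (assumed)
q_table = defaultdict(lambda: [0.0] * len(ACTIONS))
alpha, gamma, epsilon = 0.1, 0.9, 0.1      # learning rate, discount, exploration (assumed)

def choose_action(state):
    """Epsilon-greedy action selection over the Q-table."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    values = q_table[state]
    return values.index(max(values))

def update(state, action, reward, next_state):
    """Standard Q-learning update rule."""
    best_next = max(q_table[next_state])
    q_table[state][action] += alpha * (reward + gamma * best_next - q_table[state][action])

# Training loop sketch: env_step is a hypothetical simulator hook that returns
# the next (discretised) load state and a reward such as negative latency or energy.
def train(env_step, initial_state, episodes=1000, steps=50):
    for _ in range(episodes):
        state = initial_state
        for _ in range(steps):
            action = choose_action(state)
            next_state, reward = env_step(state, ACTIONS[action])
            update(state, action, reward, next_state)
            state = next_state
```

Deep Q-Network variants discussed above replace the table with a neural approximator but keep the same interaction loop.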
Table 2 analyses the features of each method available in the literature. The evaluation parameters used include the design methodology, advantages, challenges and the architecture used in the implementation for delay-sensitive IoT applications. The different methodologies and algorithms used to offload data in the three-tier architecture in order to reduce latency and response time are analysed and evaluated as given in Table 2. The table compares relevant papers found in the literature based on different parameters. It summarizes that the majority of the offloading approaches found in the literature can be classified into two types: RL-based methods and meta-heuristic-based approaches. RL-based methods implement Q-learning or deep learning to make offloading decisions, and their complexity is comparatively high. Meta-heuristic-based methods found in the literature work by formulating optimization problems based on different parameters and solving them using genetic algorithms like NSGA-II. Most of the papers use genetic algorithms by varying the chosen parameters, with a few exceptions which use particle swarm optimization techniques. A few papers present novel methods, usually hybrid approaches, which can improve the overall system performance; more research has to be done on hybrid approaches to understand their role in improving offloading performance. The future directions of each paper are purposefully incorporated into the table to guide researchers towards areas on which further research can focus. The table also shows that the implementation architectures chosen for the experiments include both Edge-Cloud and Fog-Cloud architectures in almost equal proportion. The choices of simulation tools used in the reviewed papers are modelled as a pie chart, as shown in Fig. 4, to create awareness among researchers about the tools and techniques available for offloading. The pie chart shows that MATLAB is used as the simulator in 44% of the works, CloudSim extensions (iFogSim, EdgeCloudSim) are employed in 28% of the reviewed papers, Python-based simulators are used in 17% of the papers, and 11% of the works use other simulators like NS3, Lingo software, etc. The chart shows that MATLAB and the CloudSim simulators are popular choices for implementation. The thorough analysis of the different papers clearly shows that there exists a visible research gap in the methods and simulators used in offloading techniques. Most of the RL-based methods discussed in the literature consider only one or two parameters for optimization; the output would be better if more parameters were taken into consideration. Extensive research has to be done on incorporating multiple parameters without increasing the complexity in order to overcome this research gap. The survey also implies that most of the papers rework existing algorithms by varying a few parameters, with few exceptions. Novelty in methodology is limited in terms of the offloading algorithms chosen; this is another research gap which has to be widely researched and explored. Novel algorithms and hybrid models should be experimented with to overcome the existing research gap. Optimization problems can be solved using various meta-heuristic algorithms, which are also least explored, and more work has to happen in experimenting with a wide range of meta-heuristic algorithms for solving the offloading problem. Offloading using AI and machine learning-related techniques is also under-explored.
Table 2 Comparison of offloading approaches

S. No. | Research authors | Methodology | Advantages | Challenges/future directions | Architecture
1. | Aljanabi and Chalechale [14] | Q-learning | Better load balancing; reduced delay | Reduce communication overhead; reduce decision-making time | Edge-Cloud
2. | Yu et al. [12] | Hierarchical game model | Better load balancing; less computation cost | Reduce the task delay | Edge-Cloud
3. | Aburukba et al. [10] | Integer Linear Programming (ILP) technique and genetic algorithm | Low latency; better achievement of request deadline | Incorporate critical request scheduling; consider pre-emption | Fog-Cloud
4. | Baek et al. [6] | Q-learning algorithms | Reduced complexity; better convergence speed | Incorporate model-based learning for bias-free results | Fog-Cloud
5. | Alli and Alam [7] | Neuro-fuzzy model and particle swarm optimization (PSO) | Reduced delay; less energy consumption | Incorporate blockchain; build cluster-based network | Fog-Cloud
6. | Chen et al. [8] | Accelerated gradient algorithm | Less energy consumption; better convergence speed | Integrate machine learning approach | Fog-Cloud
7. | Abbasi et al. [13] | NSGA-II genetic algorithm | Less delay; improved energy consumption | Need to experiment with other multi-objective optimization methods | Fog-Cloud
8. | Sun et al. [11] | ETCORA algorithm | Less energy consumption; less completion time | Need to consider cost, security and reliability factors | Fog-Cloud
9. | Liu et al. [9] | RL technique | Energy efficient; improved computation capacity | High complexity | Edge-Cloud
10. | Elgendy et al. [15] | Q-learning and Deep-Q-Network-based algorithms | Reduced overhead; better task caching | Need to work on mobility management and security | Fog-Cloud
11. | Shahidinejad et al. [16] | Federated learning | Less energy consumption; reduced delay and network cost | FL is vulnerable to communication security issues | Edge-Cloud
12. | Meena et al. [17] | Optimal service offloader and trust assessment algorithms | Low latency; more secure | Need to work on secure lightweight algorithms | Fog-Cloud
13. | Lin et al. [18] | Deep-Q-Network based on Simulated Annealing (SA-DQN) algorithm | Low failure rate; less energy consumption | Need to increase flexibility | Edge-Cloud
14. | Almutairi and Aldossary [19] | Fuzzy logic algorithms | Low service time and failure rate; better resource utilization | More computational resources need to be added | Edge-Cloud
15. | Shakarami et al. [20] | Autonomic computation offloading algorithm | Low latency; better decision-making | Need to work on mobility support | Edge-Cloud
16. | Tran-Dang and Kim [21] | FRATO algorithm | Less delay | Data fragmentation issues and distribution of fog nodes | Edge-Cloud
17. | Rezvani et al. [22] | NSGA-II and Bees algorithm | Better response time; less energy consumption | Incorporate machine learning techniques; check node failure probability | Edge-Cloud
18. | Kishor and Chakarbarty [23] | Smart Ant Colony Optimization | Improved latency | Reduce processing time | Fog-Cloud
Fig. 4 Usage chart of different simulators
The usage of simulators for experimentation shows that most of the researchers are interested in following the common trend. There are numerous open-source simulators available for IoT-Fog-Cloud integration. The analysis shows that there is a visible research gap in exploring the available simulators for performing the offloading. Research initiatives have to be taken to reduce the implementation complexity and cost by selecting appropriate simulators which are more flexible, less complex and cost effective.
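To make the meta-heuristic direction discussed above concrete, the following is a minimal, illustrative particle swarm optimization sketch for assigning tasks to fog nodes so that the busiest node finishes as early as possible. The cost model (makespan of the most loaded node), the swarm size and the PSO coefficients are assumptions chosen for illustration and are not taken from any of the surveyed papers.

```python
import random

def makespan(assignment, task_sizes, node_speeds):
    """Cost of an assignment: finish time of the busiest fog node (assumed model)."""
    loads = [0.0] * len(node_speeds)
    for task, node in enumerate(assignment):
        loads[node] += task_sizes[task] / node_speeds[node]
    return max(loads)

def pso_offload(task_sizes, node_speeds, particles=20, iters=100):
    n_tasks, n_nodes = len(task_sizes), len(node_speeds)
    # Continuous particle positions are rounded to node indices when evaluated.
    pos = [[random.uniform(0, n_nodes - 1) for _ in range(n_tasks)] for _ in range(particles)]
    vel = [[0.0] * n_tasks for _ in range(particles)]
    pbest = [p[:] for p in pos]
    decode = lambda p: [min(n_nodes - 1, max(0, round(x))) for x in p]
    cost = lambda p: makespan(decode(p), task_sizes, node_speeds)
    gbest = min(pbest, key=cost)[:]
    w, c1, c2 = 0.7, 1.5, 1.5                      # inertia and acceleration (assumed)
    for _ in range(iters):
        for i in range(particles):
            for d in range(n_tasks):
                vel[i][d] = (w * vel[i][d]
                             + c1 * random.random() * (pbest[i][d] - pos[i][d])
                             + c2 * random.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if cost(pos[i]) < cost(pbest[i]):
                pbest[i] = pos[i][:]
                if cost(pbest[i]) < cost(gbest):
                    gbest = pbest[i][:]
    return decode(gbest)

# Example: 8 tasks of assumed sizes offloaded to 3 fog nodes of assumed speeds
print(pso_offload([4, 7, 2, 9, 5, 1, 6, 3], [10.0, 6.0, 8.0]))
```

Genetic algorithms such as NSGA-II or swarm schedulers such as SACO follow the same pattern of encoding an assignment and iteratively improving it against a cost function, typically with multiple objectives (delay, energy, cost) instead of the single makespan used here.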
4 Open Research Areas and Research Directions
The IoT-Fog-Cloud integration is in its early implementation stage and opens a wide range of possibilities for researchers to explore. A few relevant open research areas are discussed below. Resource allocation and utilization decide the performance of the offloading approach. Offloading is performed on the available fog nodes using different algorithms, and since IoT-Fog-Cloud integration follows a distributed pattern in the middle fog layer, decision-making is crucial. The available fog nodes have to be continuously monitored for overload and underload to ensure proper resource utilization, and this information has to be used for load balancing. This is an open research area where continuous research has to be carried out and novel offloading approaches and algorithms have to be widely experimented with to find an optimal solution. Scalability is another open research area which needs attention. Scalability decides the extent to which the network can be expanded to incorporate
newer functionalities. Implementing scalability is more challenging in a fog network than in the cloud. Scalability has to be incorporated by considering the capacity of the network and should not affect task offloading. This issue remains an open area of research with wide possibilities to explore. The edge/fog layer lies close to the network, reducing the chances of data manipulation, but as the number of hops increases, the chance of security violations also increases. The three-tier IoT-Fog-Cloud model transfers data from the fog to the remote cloud for further analysis and long-term storage, which makes the data vulnerable to network attacks and can leave it inconsistent. This challenging issue has to be addressed effectively to gain the trust of users. Active research is going on in this area, but network security and data privacy are always challenging and provide endless research opportunities. The three-tier architecture of the IoT-Fog-Cloud integration may also result in many interoperability issues which need to be addressed. The fog layer is distributed in nature while the cloud layer remains centralized, and different service providers have different underlying mechanisms for synchronization and compression of data for storage. The integration models have to be carefully designed to meet the requirements of the different layers so that they work together interoperably. Network fault tolerance and reliability play an important role in deciding the level of trust in the underlying network. The fog layer is distributed in nature, making the system more reliable: failure of one node can be compensated for by other available fog nodes. The challenging task, however, is to properly monitor failures and reconfigure the system with minimum delay to prevent data loss. Efficient fault detection algorithms have to be used at different nodes to improve reliability and fault tolerance. Choosing a proper algorithm is challenging because the complexity of the system and the communication cost should both be kept low to make the system efficient. This issue still remains an open research area with little reported work. The open research areas listed above can be studied in detail, and novel algorithms and methodologies have to be designed to make the integration more powerful. Apart from the above-mentioned research areas, active research has to be initiated in generating trade-offs for energy consumption, designing methods for effective monitoring of network quality and performance, modelling techniques for reducing communication cost, improving bandwidth consumption, reducing complexity, etc. The IoT-Fog-Cloud integration thus opens numerous opportunities for researchers to explore in depth.
5 Conclusion
The integration of fog computing into the IoT-Cloud architecture helps delay-sensitive applications achieve better performance. The integration should implement proper offloading algorithms for distributing the work among multiple fog nodes. The different algorithms found in the literature are explored and compared in this paper. The literature
review shows that novel algorithms are under-explored for data offloading. Meta-heuristic algorithms are a large class of algorithms which include a wide variety of nature-inspired algorithms; further research should focus on implementing them, as they are highly efficient, less complex and easy to implement. The offloading problem can also be addressed using artificial intelligence and neuro-fuzzy logic, and more machine learning and blockchain-enabled approaches have to be experimented with in the design of secure offloading solutions. The open research areas discussed in this paper need to be explored and experimented with further to improve the quality of the integration. The research approach followed here mainly focuses on offloading techniques for latency reduction in IoT applications; apart from latency, aspects like security, cost, bandwidth utilization and power consumption can be explored. The research work shows that the IoT-Fog-Cloud integration is still in its infancy and needs strong research support to make the integration highly beneficial to a larger community.
References
1. Wang T, Zhang G, Liu A, Bhuiyan MZA, Jin Q (2019) A secure IoT service architecture with an efficient balance dynamic based on cloud and edge computing. IEEE Internet Things J 6(3):4831–4843. https://doi.org/10.1109/JIOT.2018.2870288
2. Alli AA, Alam MM (2020) The fog cloud of things: a survey on concepts, architecture, standards, tools, and applications. Internet Things 9:100177. https://doi.org/10.1016/j.iot.2020.100177
3. Maiti P, Apat HK, Sahoo B, Turuk AK (2019) An effective approach of latency-aware fog smart gateways deployment for IoT services. Internet Things 8:100091. https://doi.org/10.1016/j.iot.2019.100091
4. Veeramanikandan M, Sankaranarayanan S (2019) Publish/subscribe based multi-tier edge computational model in Internet of Things for latency reduction. J Parallel Distrib Comput 127:18–27. https://doi.org/10.1016/j.jpdc.2019.01.004
5. Shahzadi R et al (2019) Three tier fog networks: enabling IoT/5G for latency sensitive applications. China Commun 16(3):1–11. https://doi.org/10.12676/j.cc.2019.03.001
6. Baek JY, Kaddoum G, Garg S, Kaur K, Gravel V (2019) Managing fog networks using reinforcement learning based load balancing algorithm. In: IEEE wireless communications and networking conference, Apr 2019, vol 2019, pp 1–7. https://doi.org/10.1109/WCNC.2019.8885745
7. Alli AA, Alam MM (2019) SecOFF-FCIoT: machine learning based secure offloading in fog-cloud of things for smart city applications. Internet Things 7:100070. https://doi.org/10.1016/j.iot.2019.100070
8. Chen S, Zheng Y, Lu W, Varadarajan V, Wang K (2019) Energy-optimal dynamic computation offloading for industrial IoT in fog computing. IEEE Trans Green Commun Netw PP(c):1. https://doi.org/10.1109/TGCN.2019.2960767
9. Liu X, Yu J, Feng Z, Gao Y (2020) Multi-agent reinforcement learning for resource allocation in IoT networks with edge computing. China Commun 17(9):220–236. https://doi.org/10.23919/JCC.2020.09.017
10. Aburukba RO, AliKarrar M, Landolsi T, El-Fakih K (2020) Scheduling Internet of Things requests to minimize latency in hybrid fog-cloud computing. Future Gener Comput Syst 111:539–551. https://doi.org/10.1016/j.future.2019.09.039
11. Sun H, Yu H, Fan G, Chen L (2020) Energy and time efficient task offloading and resource allocation on the generic IoT-fog-cloud architecture. Peer-to-Peer Netw Appl 13(2):548–563. https://doi.org/10.1007/s12083-019-00783-7
12. Yu M, Liu A, Xiong NN, Wang T (2022) An intelligent game-based offloading scheme for maximizing benefits of IoT-edge-cloud ecosystems. IEEE Internet Things J 9(8):5600–5616. https://doi.org/10.1109/JIOT.2020.3039828
13. Abbasi M, Mohammadi Pasand E, Khosravi MR (2020) Workload allocation in IoT-fog-cloud architecture using a multi-objective genetic algorithm. J Grid Comput 18(1):43–56. https://doi.org/10.1007/s10723-020-09507-1
14. Aljanabi S, Chalechale A (2021) Improving IoT services using a hybrid fog-cloud offloading. IEEE Access 9:13775–13788. https://doi.org/10.1109/ACCESS.2021.3052458
15. Elgendy IA, Zhang WZ, He H, Gupta BB, Abd El-Latif AA (2021) Joint computation offloading and task caching for multi-user and multi-task MEC systems: reinforcement learning-based algorithms. Wireless Netw 27(3):2023–2038. https://doi.org/10.1007/s11276-021-02554-w
16. Shahidinejad A, Farahbakhsh F, Ghobaei-Arani M, Malik MH, Anwar T (2021) Context-aware multi-user offloading in mobile edge computing: a federated learning-based approach. J Grid Comput 19(2). https://doi.org/10.1007/s10723-021-09559-x
17. Meena V, Gorripatti M, Suriya Praba T (2021) Trust enforced computational offloading for health care applications in fog computing. Wireless Pers Commun 119(2):1369–1386. https://doi.org/10.1007/s11277-021-08285-7
18. Lin B, Lin K, Lin C, Lu Y, Huang Z, Chen X (2021) Computation offloading strategy based on deep reinforcement learning for connected and autonomous vehicle in vehicular edge computing. J Cloud Comput 10(1). https://doi.org/10.1186/s13677-021-00246-6
19. Almutairi J, Aldossary M (2021) A novel approach for IoT tasks offloading in edge-cloud environments. J Cloud Comput 10(1). https://doi.org/10.1186/s13677-021-00243-9
20. Shakarami A, Shahidinejad A, Ghobaei-Arani M (2021) An autonomous computation offloading strategy in mobile edge computing: a deep learning-based hybrid approach. J Netw Comput Appl 178:102974. https://doi.org/10.1016/j.jnca.2021.102974
21. Tran-Dang H, Kim DS (2021) FRATO: fog resource based adaptive task offloading for delay-minimizing IoT service provisioning. IEEE Trans Parallel Distrib Syst 32(10):2491–2508. https://doi.org/10.1109/TPDS.2021.3067654
22. Keshavarznejad M, Rezvani MH, Adabi S (2021) Delay-aware optimization of energy consumption for task offloading in fog environments using metaheuristic algorithms. Clust Comput 24(3):1825–1853. https://doi.org/10.1007/s10586-020-03230-y
23. Kishor A, Chakarbarty C (2021) Task offloading in fog computing for using smart ant colony optimization. Wireless Pers Commun. https://doi.org/10.1007/s11277-021-08714-7
Prediction of COVID-19 Pandemic Spread Using Graph Neural Networks
Radhakrishnan Gopalapillai and Shreekanth M. Prabhu
Abstract Predicting the spread of the COVID-19 pandemic has been a very daunting challenge. Ever since the novel coronavirus was declared a pandemic, many models have been created to predict the spread of the disease. Researchers have used traditional compartment-based epidemic models and time-series-based models and achieved only partial success. In this paper, we propose a model based on a graph neural network to capture the complex geo-spatial and temporal nature of virus transmission. A simplified computation graph model is created for a few southern Indian states, and the model is trained to predict the number of active COVID-19 cases in these states. The computation graph is implemented as a hybrid neural network, and the back propagation algorithm is used for training. Our model has obtained high R-squared values that place high confidence in the model. Keywords COVID-19 prediction · Graph neural network · Computation graph · Graph induction
1 Introduction
The COVID-19 pandemic has posed a major threat to humans worldwide over the last two years. The SARS-CoV-2 virus spreads from human to human, and the disease is highly infectious. The major modes of virus transmission are believed to be respiratory droplets and direct contact. In spite of the major measures taken by various governments, multiple waves of disease transmission have devastated the world. Governments can take better preventive measures if the spread of the disease can be predicted accurately.
R. Gopalapillai (B) · S. M. Prabhu Department of Computer Science and Engineering, CMR Institute of Technology, Bengaluru 560037, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_5
Predicting the spread of the COVID-19 pandemic has been a very daunting challenge. Researchers have used traditional compartment-based epidemic models and achieved only partial success, and the use of regression-based and time-series models has also had only limited success. There are also some geo-spatial–temporal studies that have looked at specific contexts. While there is a good degree of common-sense understanding of how the disease spreads from region to region, a model that enables robust data-based analysis of the geo-spatial–temporal spread is lacking. In this paper, we explore the use of a model based on a Graph Neural Network (GNN) where each region is modeled as a node. The regional pandemic spread dynamics are modeled as vectors associated with the nodes, and the spread of the pandemic is modeled as it flows across the graph. The overall idea is to predict the spread of the pandemic along geo-spatial–temporal dimensions. This analysis may need to be used in conjunction with social spread within communities and viral spread due to mutations. The rest of the paper is organized as follows. Section 2 discusses the prior work done in modeling COVID-19 spread. The proposed model based on graph neural networks and its simplified version using a computation graph are described in Sect. 3. Section 4 presents the implementation of this model using a hybrid feed-forward model. The experimental setup and results obtained are discussed in Sect. 5. Section 6 presents the conclusions from this study.
2 Related Work
Since the outbreak of COVID-19, the scientific community has been trying to predict the spread of the disease across geographical locations to help the authorities plan better to prevent or reduce its spread [1, 2]. This effort ranges from using existing statistical modeling tools to the latest machine learning and artificial intelligence-based approaches. Alzahrani et al. used the well-known Autoregressive Integrated Moving Average (ARIMA)-based statistical model to predict the number of daily cases [3]. Their work focused on predicting cases in the country of Saudi Arabia. They used a combination of models such as autoregressive, combined autoregressive, and moving average as well as ARIMA and concluded that ARIMA gives better results. Alabdulrazzaq et al. studied the effectiveness of the ARIMA model to predict COVID-19 cases using data collected in Kuwait [4]. Their effort focused on evaluating whether ARIMA can perform in complex and dynamic contexts. The results that they obtained confirmed that the ARIMA model can be used in the prediction of COVID-19 with reasonable accuracy. Researchers have also tried the Susceptible-Infected-Recovered-Deceased (SIRD) model for prediction [5]. Shringi et al. formulated their model based on the two possibilities when a person gets infected: the person either recovers or dies [6]. The susceptible, exposed, infected, and recovered (SEIR) model is popular for studying how a contagious disease spreads from a few people to a larger population [7]. Liu et al. extended the SEIR model to an adaptive model called the SEAIRD model [8]. Mahajan et al. came up with a new model based on the SEIR model [9]. Their SIPHERD model
divided the population into eight categories: Susceptible (S), Exposed (E), Symptomatic (I), Purely Asymptomatic (P), Hospitalized or Quarantined (H), Recovered (R), and Deceased (D). They used their model to predict the total number of new infections and active and death cases. Another model which is based on SIR is Immunity, Susceptible, Infected, and Recovered (MSIR) [10]. Hirschprung and Hajaj proposed a new concept called the Center of Infection Mass (CoIM) [11]. Prabhu et al. proposed a Social Infection Analysis Model (SIAM) that factors social network structure [12]. Wieczorek et al. used a deep neural network model for disease prediction with good accuracy [13]. A recent study that used artificial neural networks and back propagation algorithms to predict COVID-19 outbreaks in a few countries in Asia indicated that a single model may not be suitable for different countries [14]. Most of the work that we surveyed used statistical models for predicting the spread of COVID-19. The spread of COVID-19 depends not only on the active cases in neighboring places but also on the cases in places adjacent to the neighbors. None of the earlier research has focused on this complex mobility factor. We model mobility aspects using a graph neural network.
3 Graph Model for COVID-19 Transmission
As the novel coronavirus spreads from human to human, active COVID-19 patients are sources of transmission. The transmission can happen from a member of the same house or locality, and the probability of getting the infection is a function of proximity to other patients. While administrative measures have prevented the spread of transmission to a certain degree by isolating people, it is not easy to completely isolate people for an extended period when a pandemic lasts for months. When people move around, the spread of disease can happen even from a person who ordinarily resides in a distant place such as a different district, state, or even a different country. Any model for predicting the spread of COVID-19 should therefore consider the number of active cases in geographical entities that can contribute to the spread through contacts with mobile people. The geographical entity where an active COVID-19 person ordinarily resides can be defined at different levels, such as villages, districts or states. We have considered COVID-19 cases in India for our study and used the number of cases at the state level. The objective of our study is to predict the number of active cases in a state based on the cases reported in the previous week in the state under consideration and its neighboring states. The neighborhood information of the states can be represented as a graph G = (V, E), where V is the set of states in India. The edges in the set E are all pairs (u, v) where u and v, members of V, are adjacent states. A state s1 is considered adjacent to another state s2 if s1 and s2 share a common border. A subgraph of the adjacency graph of India is shown in Fig. 1. The figure has Tamil Nadu (TN) as the node of focus. The subset of nodes shown in the figure is the states within a path distance of two from TN. The nodes adjacent to TN are shown in dark gray shade, and the nodes at a path distance of two are shown in light gray shade.
Fig. 1 Subgraph of the COVID-19 transmission graph model of India
In order to model the disease transmission that happens due to active cases within a state, edges that are self-loops are added to each node.
3.1 Computation Graph for Graph Induction
The graph induction techniques to predict the impact of active cases in all other states on a particular state are complex, as spatial and temporal data have to be considered. We have used a simplified model based on computation graphs that takes into account a maximum of two hops. COVID-19 transmission is modeled using a computational graph where each node represents an Indian state. The computation graph G′ = (V′, E′) of a state S is a directed graph. The vertex set V′ is the union of two sets N1 and N2. The members of set N1 are all states that share a border with state S, and the set N2 contains all states which share a border with members of set N1. The computational graph for the state of Tamil Nadu (TN) is shown in Fig. 2. TN shares its borders with Andaman (AN), Andhra Pradesh (AP), Karnataka (KA), Kerala (KL), and Pondicherry (PY). Hence, the computation graph has a directed edge from each of these states to TN. Similarly, KL shares its borders
with Tamil Nadu, Karnataka, and Lakshadweep. A similar computation graph for Andhra Pradesh is shown in Fig. 3.
Fig. 2 Computation graph for the state of Tamil Nadu
Fig. 3 Computation graph for the state of Andhra Pradesh
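The two-hop computation graph described above can be derived mechanically from the state adjacency information. The sketch below builds the sets N1 and N2 and the directed edges (including self-loops) for a focus state; the adjacency dictionary is truncated and partly assumed, listing only a few states/UTs for illustration rather than all 36.

```python
# Assumed, truncated adjacency list (the full study uses all 36 states/UTs)
ADJACENT = {
    "TN": {"AN", "AP", "KA", "KL", "PY"},
    "KL": {"TN", "KA", "LD"},
    "KA": {"TN", "KL", "AP", "GA", "MH", "TS"},
    "AP": {"TN", "KA", "TS", "OD", "CT"},
    "PY": {"TN", "AP"},
    "AN": {"TN", "WB"},
}

def computation_graph(focus, adjacent):
    """Return (N1, N2, edges) of the two-hop computation graph for a focus state."""
    n1 = set(adjacent[focus])                       # states sharing a border with the focus
    n2 = set()
    for s in n1:                                    # neighbours of the neighbours
        n2 |= adjacent.get(s, set())
    edges = [(u, focus) for u in n1] + [(focus, focus)]   # N1 -> focus plus focus self-loop
    for s in n1:
        edges += [(u, s) for u in adjacent.get(s, set())] # N2 -> N1
        edges.append((s, s))                              # self-loop on each N1 state
    return n1, n2, edges

n1, n2, edges = computation_graph("TN", ADJACENT)
print(sorted(n1))   # direct neighbours of TN (hidden-layer states)
print(sorted(n2))   # states within two hops that feed the input layer
```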
We incorporated the Markov property into the model to simplify the computations. The prediction of active cases in a particular week is modeled as a function of the values recorded in the previous week. The simplified computation model can be implemented as a hybrid feed-forward network.
4 Feed-Forward Neural Network for Computation Graph
Each computation graph can be converted to a feed-forward neural network model with one hidden layer. The computation graph for the state of Tamil Nadu shown in Fig. 2 is modeled using the neural network shown in Fig. 4. Each neuron in the input layer corresponds to a state in set N2. Each hidden neuron corresponding to a state N1i in N1 has connections from its adjacent nodes in N2 as well as a connection to represent the self-loop. For example, the hidden layer neuron AN has connections from its neighboring nodes TN and WB. The connection coming from AN in the input layer to AN in the hidden layer represents the effect of disease transmission happening due to the cases within the same state. Every node in the hidden layer has a link coming from the node with the same name in the input layer to take care of the self-loop. The network also has a skip-level connection to model the self-loop of the output neuron; the dotted line in Fig. 4 corresponds to the self-loop of the output state TN. The neural network implements a two-hop induction model of the previous week's data. The COVID-19 cases in the output state are influenced by the COVID-19 cases in the neighboring states (set N1). However, the cases in the neighboring states themselves are influenced by their neighbors (set N2). The weights of the connections indicate the influence of one state on another. These weights are learned in a supervised learning setup using the back propagation algorithm.
5 Experimental Setup and Results
We have collected weekly COVID-19 data of 36 Indian states for 36 weeks. The data has been sourced from the website of the Ministry of Health and Family Welfare of the Government of India (https://www.mohfw.gov.in/). The weekly data collected include (i) the number of current active cases, (ii) the number of patients recovered from COVID-19 that week, and (iii) the number of deaths during the week. A snapshot of the data collected for two weeks is given in Table 1. The real names of the states and the specific dates are not given in the table, though the data is collected from publicly available sources. The dataset has been split into a training set and a test set using an 80–20 partition. Though the dataset included data on active cases, recovered cases, and deaths per week, only the number of active cases and recovered cases is used for predicting the number of active cases in the following week.
Fig. 4 Neural network model for the state of Tamil Nadu
We could have selected the number of new infections as the predicted output. However, the number of new infections can be computed from the number of active cases and the number of recovered cases and deaths. The prediction model used the number of active COVID-19 cases for a state in a particular week as the dependent variable. The independent variables are the number of active and recovered cases recorded by the adjacent states in the preceding week. The data relevant for the prediction of active cases for a particular state is extracted from the dataset and given as input to the neural network model. As shown in Fig. 4, there are 10 states which are at a two-hop distance from Tamil Nadu. The number of active cases and recovered cases in these states is the input for predicting the output, i.e., the number of active cases predicted for the following week in Tamil Nadu. Though the number of input neurons shown in Fig. 4 is only 10, the model has two input neurons per input state, one for the active cases and the second for the recovered cases. Hence, the total number of input neurons in the network model for Tamil Nadu is 20. Keras and TensorFlow have been used to create the neural network model. The Keras functional model allows the creation of hybrid models which are not fully connected and have skip-level connections. The model used Mean Squared Error (MSE) as the loss function, ReLU as the activation function and Adam as the optimizer.
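Based on the description above, such a hybrid network can be sketched with the Keras functional API roughly as follows. This is a minimal sketch under stated assumptions (20 inputs for Tamil Nadu, one hidden neuron per N1 state, a skip-level connection for the output self-loop, ReLU activations, MSE loss and the Adam optimizer); the selective wiring between individual input and hidden neurons is simplified to a Dense layer here, and the dummy arrays only stand in for the real weekly dataset, so this is not the authors' exact implementation.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

N2_STATES = 10   # states within two hops of the target (assumed count for TN)
N1_STATES = 5    # states sharing a border with the target (assumed count for TN)

# Two features (previous-week active and recovered cases) per two-hop state.
inputs = keras.Input(shape=(2 * N2_STATES,), name="previous_week_features")

# Hidden layer: one neuron per N1 state. A fully selective wiring would mask
# connections so each hidden neuron only sees its own neighbours; a Dense layer
# is used here as a simplification of that hybrid connectivity.
hidden = layers.Dense(N1_STATES, activation="relu", name="n1_states")(inputs)

# Skip-level connection modelling the output self-loop: the target state's own
# previous-week features (assumed to be the first two columns) bypass the hidden layer.
self_loop = layers.Lambda(lambda x: x[:, :2], name="target_self_loop")(inputs)
merged = layers.Concatenate()([hidden, self_loop])

output = layers.Dense(1, activation="relu", name="predicted_active_cases")(merged)
model = keras.Model(inputs, output)
model.compile(optimizer="adam", loss="mse")

# Dummy shapes only; real training would use the weekly state-level dataset.
x = np.random.rand(28, 2 * N2_STATES).astype("float32")
y = np.random.rand(28, 1).astype("float32")
model.fit(x, y, epochs=5, verbose=0)
```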
Table 1 Snapshot of the data collected for two weeks

State/UT | Week 1 Active | Week 1 Recovered (cumulative) | Week 1 Deaths (cumulative) | Week 2 Active | Week 2 Recovered (cumulative) | Week 2 Deaths (cumulative)
S1 | 8 | 7484 | 129 | 9 | 7593 | 129
S2 | 11,142 | 2,025,805 | 14,186 | 6453 | 2,038,960 | 14,295
S3 | 429 | 53,974 | 277 | 183 | 54,501 | 280
S4 | 4348 | 592,242 | 5876 | 3458 | 596,547 | 5939
S5 | 42 | 716,258 | 9661 | 46 | 716,314 | 9661
S6 | 36 | 64,377 | 819 | 30 | 64,447 | 820
S7 | 277 | 991,536 | 13,566 | 197 | 991,857 | 13,570
S8 | 0 | 10,666 | 4 | 3 | 10,668 | 4
S9 | 409 | 1,413,404 | 25,087 | 327 | 1,413,921 | 25,089
S10 | 838 | 172,360 | 3317 | 648 | 173,423 | 3339
S11 | 158 | 815,712 | 10,082 | 212 | 815,960 | 10,086
S12 | 278 | 760,738 | 9874 | 116 | 760,884 | 10,049
S13 | 1694 | 213,871 | 3679 | 1361 | 216,125 | 3717
S14 | 1339 | 323,801 | 4423 | 895 | 325,564 | 4426
S15 | 87 | 343,016 | 5135 | 130 | 343,150 | 5135
S16 | 12,498 | 2,926,284 | 37,807 | 9700 | 2,935,238 | 37,931
S17 | 143,081 | 4,526,429 | 25,182 | 95,349 | 4,716,728 | 26,734
S18 | 68 | 20,534 | 207 | 42 | 20,621 | 208
S19 | 0 | 10,310 | 51 | 1 | 10,313 | 51
S20 | 124 | 781,896 | 10,522 | 106 | 782,046 | 10,523
S21 | 39,952 | 6,374,892 | 139,117 | 33,379 | 6,415,316 | 139,734
S22 | 2332 | 116,653 | 1858 | 1444 | 119,208 | 1893
S23 | 1666 | 78,375 | 1404 | 893 | 80,490 | 1432
S24 | 16,361 | 79,781 | 314 | 13,316 | 97,955 | 380
S25 | 347 | 30,252 | 666 | 228 | 30,645 | 674
S26 | 4918 | 1,013,833 | 8202 | 4817 | 1,021,180 | 8279
S27 | 791 | 123,800 | 1840 | 611 | 124,836 | 1849
S28 | 279 | 584,865 | 16,518 | 228 | 585,224 | 16,540
S29 | 73 | 945,304 | 8954 | 40 | 945,389 | 8954
S30 | 618 | 30,477 | 387 | 189 | 31,159 | 391
S31 | 17,099 | 2,612,684 | 35,603 | 15,238 | 2,633,534 | 35,869
S32 | 4599 | 657,665 | 3919 | 4056 | 660,730 | 3936
S33 | 206 | 83,143 | 814 | 102 | 83,381 | 816
S34 | 162 | 335,993 | 7395 | 179 | 336,163 | 7397
S35 | 159 | 1,686,784 | 22,892 | 133 | 1,686,984 | 22,897
S36 | 7571 | 1,543,401 | 18,806 | 7513 | 1,552,997 | 18,953
Training was done to minimize the mean squared error. The trained model was then used to predict the active cases for the weeks in the test set. Table 2 lists the actual number of active cases reported in a week and the number predicted by the model from the active and recovered cases reported in the preceding week. The results in Table 2 are for the state of Karnataka, while Table 3 gives the results for Tamil Nadu. The predicted values are close to the actual values. The R-squared value computed on the absolute predicted active-case counts establishes a high confidence level in the proposed graph-based model. We also computed the R-squared value of the predicted week-over-week increase or decrease in active cases. The R-squared values are summarized in Table 4; the values obtained for the weekly changes also show that the model captures the relation between active cases in successive weeks well.

Table 2 Comparison of actual and predicted active COVID-19 cases for Karnataka state

Previous week | Actual for current week | Predicted for current week
23,080 | 23,217 | 24,045
23,904 | 24,336 | 105,248
100,070 | 102,895 | 6317
23,841 | 6250 | 6209
382,710 | 393,395 | 396,725
146,747 | 137,072 | 143,913
18,226 | 20,314 | 20,836
137,072 | 130,894 | 133,002
37,928 | 36,891 | 37,505
268,926 | 257,275 | 264,118
210,673 | 203,790 | 204,337
27,550 | 26,911 | 30,038
Table 3 Comparison of actual and predicted active COVID-19 cases for Tamil Nadu state

Previous week | Actual for current week | Predicted for current week
22,162 | 21,908 | 18,062
20,385 | 20,383 | 15,604
44,924 | 44,024 | 42,150
4676 | 4647 | 3023
115,128 | 122,228 | 123,042
100,523 | 89,009 | 92,420
10,487 | 11,634 | 10,136
89,009 | 78,780 | 81,046
33,224 | 32,629 | 30,034
257,463 | 243,703 | 245,451
188,664 | 174,802 | 176,220
27,281 | 26,550 | 25,724
Table 4 R-squared values for the predicted numbers

Predicted dependent variable | R-squared value
Number of weekly active cases for Tamil Nadu | 0.998
Number of weekly active cases for Karnataka | 0.999
Weekly change in the active cases for Tamil Nadu | 0.852
Weekly change in the active cases for Karnataka | 0.674
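For reference, R-squared values such as those in Table 4 can be computed directly from the actual and predicted weekly series. A minimal sketch using scikit-learn is given below; the few values used are taken from the first rows of Table 3 purely for illustration and are not claimed to reproduce the reported figures.

```python
import numpy as np
from sklearn.metrics import r2_score

prev_week = np.array([22162, 20385, 44924, 4676])   # active cases in the previous week
actual    = np.array([21908, 20383, 44024, 4647])   # actual active cases in the current week
predicted = np.array([18062, 15604, 42150, 3023])   # model prediction for the current week

r2_levels  = r2_score(actual, predicted)                          # R-squared on absolute counts
r2_changes = r2_score(actual - prev_week, predicted - prev_week)  # R-squared on weekly changes
print(r2_levels, r2_changes)
```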
6 Conclusion We have developed a model based on graph neural networks to capture the spatial and temporal nature of the spread of COVID-19 across communities. A simplified computation graph is used to predict the number of weekly active cases in a few Indian states, specifically states in southern India. The computation graph is implemented as a hybrid neural network, and the network is trained using the data collected over many weeks. The R-squared values computed for the predicted values establish that the model works well. A full-fledged graph neural network model that uses multi-hop induction can be explored to improve the prediction performance.
Event-Based Time-To-Contact Estimation with Depth Image Fusion Ankit Gupta, Paras Sharma, Dibyendu Ghosh, Vinayak Honkote, and Debasish Ghose
Abstract Reliable and fast sensing is key to implementing effective high-speed obstacle avoidance strategies on modern quadrotors. A class of bio-inspired sensors, commonly called event cameras, is suitable for this application: they generate events at microsecond resolution, and the output data rate depends on the texture and the relative speed between the object and the camera. Event cameras are good at capturing small changes, but their output lacks low-level scene detail. In this paper, we present an algorithm that fuses low temporal resolution data from a depth camera with an event camera to compute the time-to-contact (TTC) with an obstacle. In this approach, we use low-frequency information from the depth camera to identify a dynamic obstacle and use the event stream to perform the TTC computation. The algorithm is integrated into the AirSim simulator and tested with a dynamic obstacle in various collision scenarios. Keywords Event camera · Time-to-contact · Robotics · Computer vision
A. Gupta (B) · V. Honkote Intel Labs, Intel Technology, Bangalore, India e-mail: [email protected] V. Honkote e-mail: [email protected] P. Sharma Indraprastha Institute of Information Technology, Delhi, India e-mail: [email protected] D. Ghosh Indian Institute of Technology, Kharagpur, Kharagpur, India D. Ghose Indian Institute of Science, Bangalore, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_6
1 Introduction

Efficient navigation of autonomous quadrotors operating in unknown and cluttered environments is a key capability for executing complex missions [1] such as search and rescue, surveillance, and mapping of unknown areas. One of the key challenges in navigation is to detect and avoid obstacles and to reach a goal location safely. Obstacle avoidance becomes more challenging when the quadrotor moves fast, so fast and reliable perception of the environment is important for low-latency obstacle avoidance. Various sensors can be used for obstacle avoidance, such as time-of-flight range sensors (ultrasonic sensors, laser range finders, infrared sensors, etc.). Laser range scanners provide very accurate depth measurements and have been researched extensively [2, 3], but their bulk, slow data capture, and complex interaction with the environment affect their performance and impose restrictions in terms of responsiveness and computational load. Vision-based solutions overcome many of these restrictions. Techniques based on optical flow [4, 5] and on the scale of extracted features [6] have been shown to work for obstacle detection and avoidance. Forward collision warning with a single camera is addressed in [7], which uses the size and position of the object in the image to estimate time to collision. Light weight and low cost have made frame-camera-based solutions very popular, but these cameras continuously generate a significant amount of redundant data at relatively low temporal frequency, regardless of whether the scene changes, which requires significant compute and memory resources. These limitations present a trade-off between accuracy and efficiency, which forces quadrotors to move slowly. A different class of bio-inspired vision sensor, known as the event camera, does not send intensity information for the complete scene; it sends out only the pixel information where an intensity change occurs. This per-pixel change information is referred to as an event, and the sensor generates an asynchronous event data stream [8] at very high temporal resolution (µs). As the information sent by the sensor is sparse [9], it significantly reduces the amount of data that needs to be processed, and eliminating redundant data also significantly lowers the compute and memory requirements. Because the data generated by an event camera is inherently different from that of a frame-based camera, standard computer vision algorithms cannot be used directly. As the event camera is a relatively new sensor, only limited research on obstacle avoidance has been published. In [10], optical flow is calculated from the event stream and the focus of expansion is then used to calculate the time-to-contact; avoidance strategies can be designed once the TTC is known. Stereo DVS-based depth information is calculated in [11], but the major problem in that work is establishing correspondence with sparse data. A reactive dynamic obstacle detection scheme is presented in [12], which improves on [13], but it assumes a known obstacle size for the depth calculation. Recently, neural networks (NNs) have been used to estimate optical flow [14, 15] and to perform obstacle avoidance [15], but these solutions require a significantly more powerful compute platform to run the networks in real time.
To this end, this paper proposes a novel algorithm that uses a low-frequency stereo depth image to identify a dynamic obstacle and estimates the time-to-contact from the event stream without any prior knowledge of the obstacle's shape. A similarity-based score computation is used to minimize the computational requirement. The main contributions of this paper are:
1. Integration of low-frequency depth-image-based obstacle detection with the high-frequency event camera stream,
2. Event-based model parameter estimation for the tracked obstacle, and
3. Event-based time-to-contact estimation from the estimated model parameters.
The remainder of the paper is organized as follows: the proposed algorithm is presented in Sect. 2, the concept of TTC is explained in Sect. 2.3, the experimental setup and results are discussed in Sect. 3, and conclusions are presented in Sect. 4.
2 Methodology

The proposed algorithm derives inspiration from point cloud registration techniques [16] and assumes a three-parameter motion model for the dynamic obstacle to estimate the TTC. The three parameters are x-shift (nx), y-shift (ny) and scale (nscale), which relate the events corresponding to the dynamic object in two consecutive spatiotemporal windows. The algorithm searches for the event clusters belonging to the dynamic object and then solves for the three-parameter model to calculate the TTC and the motion direction. The overall flow of the algorithm is shown in Fig. 1. To demonstrate the algorithm, only one dynamic obstacle is assumed to be present in front of the quadrotor at any given time.
Fig. 1 Pipeline of the proposed dynamic obstacle avoidance algorithm (event data feeds obstacle events extraction, model parameter estimation (Algorithm 3) and TTC calculation; the depth image feeds depth segmentation and obstacle detection, whose output is tracked by an EKF and used for avoidance command generation sent to the quadrotor)
Fig. 2 Spatiotemporal window for event and depth fusion (the event stream is divided into windows T0, T1, ..., Tn along time t, with depth images arriving at a lower rate)
The data generated by the event sensor is a stream of tuples (xi, yi, ti, pi), where (xi, yi) is the projection of a 3D world point onto the image plane at which the event is generated, pi is the polarity (1 and 0 correspond to an increase and a decrease in perceived intensity, respectively), and ti is the timestamp of the event [9]. The proposed algorithm extracts the events falling in a time interval (t, t + δt); this is called a spatiotemporal window. The window length δt is adjusted according to the quadrotor speed: the higher the speed, the smaller the window. The stereo depth camera output is generated at a low rate (∼10 fps). The data streams used by the algorithm are shown in Fig. 2.
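As an illustration of the windowing described above, the following hypothetical helper selects the events of one spatiotemporal window from a recorded stream; the structured-array layout is an assumption, not a format prescribed by the paper.

```python
import numpy as np

def spatiotemporal_window(events, t_start, delta_t):
    """Select events (x, y, t, p) whose timestamps fall in [t_start, t_start + delta_t).

    `events` is assumed to be a NumPy structured array with fields "x", "y", "t", "p".
    """
    mask = (events["t"] >= t_start) & (events["t"] < t_start + delta_t)
    return events[mask]
```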
2.1 Dynamic Obstacle Detection

Due to the motion of the quadrotor, events are generated by both the static environment and the dynamic objects. To identify the events generated by dynamic objects, event segmentation is required. The work in [13] used event timestamp information in an optimization framework, and [12] fused events with IMU data to achieve this; the accuracy of these methods depends on the selected window size and the obstacle velocity. To improve the accuracy of dynamic obstacle detection, we use a stereo depth camera, which generates depth images at a low rate (∼10 fps). Depth-based segmentation removes the parts of the scene in the field of view that are far away from the quadrotor. The remaining region of interest (ROI) in the image domain corresponds to the pixels for which the TTC computation is required to perform reactive avoidance. The depth value d at which segmentation is performed depends on the quadrotor velocity and its dynamics. This operation can produce multiple small clusters that correspond to a single object or to noise in the depth output. A connected components algorithm [17] is used to form clusters and provide a bounding box for the detected object in the scene. The center of the bounding box, which represents an obstacle O, is tracked using a Kalman Filter (KF) as explained in Sect. 2.4.
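The paper does not provide code for this stage; the following sketch shows one way the depth thresholding and connected-components step could be realized with OpenCV. The function name, the threshold d_max and the blob-size limits are illustrative assumptions.

```python
import cv2
import numpy as np

def detect_obstacle(depth_img, d_max=7.0, min_area=50, max_area=5000):
    """Threshold the depth image at d_max metres and return the bounding box of the
    largest valid connected component (a hypothetical helper; parameters are illustrative)."""
    roi = ((depth_img > 0) & (depth_img < d_max)).astype(np.uint8)
    num, labels, stats, centroids = cv2.connectedComponentsWithStats(roi, connectivity=8)
    best = None
    for i in range(1, num):                       # label 0 is the background
        x, y, w, h, area = stats[i]
        if min_area <= area <= max_area:          # oversized blobs are rejected
            if best is None or area > best[-1]:
                best = (x, y, w, h, area)
    return best  # (x, y, w, h, area) of the detected obstacle, or None
```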
2.2 Event Image for Dynamic Obstacles

The data rate of the event camera stream is proportional to the relative velocity and the texture of the environment. To identify the events belonging to a dynamic object, the events whose pixel locations (x, y) lie within the bounding box of the obstacle O are selected from the spatiotemporal window. Let Ω denote the set of all events in the temporal window (t, t + δt) and C the events corresponding to the dynamic object O:
C = {(x, y, t) : (x, y) ∈ O, ∀ (x, y, t) ∈ Ω}
(1)
Algorithm 1 Event_Image(Ci, width, height, hxi, hyi)
1: Initialize Img ← Zeros(width, height)
2: Initialize shift
3: shift ← (width/2 − hxi, height/2 − hyi)
4: for (xk, yk) ∈ Ci do
5:   (x′, y′) ← (xk, yk) − shift
6:   Img(x′, y′) ← Img(x′, y′) + 1
7: return Img
As the dynamic obstacle can be present at any location in the image plane, the corresponding events are first brought to the center of the 2D image plane by a shift operation. All shifted events are then mapped to their respective pixel locations in the image, and the total number of events at a pixel location becomes the value of that pixel. This process is given in Algorithm 1. The camera resolution is represented by the parameters width and height. The tuple (hxi, hyi) is the centroid of the event set Ci, calculated using Algorithm 2.

Algorithm 2 Calculate_Centroid(Ci)
1: Initialize Cent ← [ ]
2: Initialize hx, hy
3: for (xk, yk) ∈ Ci do
4:   if (xk, yk) not in Cent then
5:     Cent.append(xk, yk)
6: (hx, hy) ← mean(Cent)
7: return (hx, hy)
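A compact NumPy rendering of Algorithms 1 and 2 is sketched below for illustration; events are assumed to be given as an (N, 2) array of pixel coordinates, and the sign of the centring shift is interpreted so that the centroid maps to the image centre.

```python
import numpy as np

def calculate_centroid(events):
    """Centroid of the unique pixel locations of an event set (Algorithm 2)."""
    pixels = np.unique(events[:, :2], axis=0)
    return pixels.mean(axis=0)

def event_image(events, width, height, hx, hy):
    """Centre the event set and accumulate per-pixel event counts (Algorithm 1)."""
    img = np.zeros((height, width), dtype=np.float32)
    shift = np.array([width / 2.0 - hx, height / 2.0 - hy])
    # Algorithm 1 writes (x', y') <- (x_k, y_k) - shift; adding the offset here moves the
    # centroid to the image centre, which appears to be the intended effect of the shift.
    shifted = np.round(events[:, :2] + shift).astype(int)
    valid = (shifted[:, 0] >= 0) & (shifted[:, 0] < width) & \
            (shifted[:, 1] >= 0) & (shifted[:, 1] < height)
    for x, y in shifted[valid]:
        img[y, x] += 1
    return img
```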
2.3 Time-To-Contact

Time-to-contact is an estimate of the time τ for the mobile robot to reach an obstacle if the current relative motion between the two were to continue without change [18]. TTC can be written as

τ = −Z / (dZ/dt)     (2)

where Z is the distance between the camera (mounted on the robot) and the obstacle, and v = dZ/dt is the relative speed. TTC can be computed using only the visual information captured by a single uncalibrated camera, without extracting depth or relative speed [10, 19]. The work in [20] derives TTC from optical flow. An alternative, scale-change-based representation can be derived from the pinhole camera model:

wt = f W / Zt     (3)

where wt is the width of the object in the image at instant t, f is the focal length of the camera, W is the actual width of the object, and Zt is the distance of the object from the camera. Let the scale change S be defined as the ratio of the image widths wt and wt−1 of the object in two consecutive time steps:

S = wt / wt−1 = (f W / Zt) / (f W / Zt−1) = Zt−1 / Zt     (4)

If the time step δt is small, we can write

Zt = Zt−1 + v δt     (5)

Substituting the value of Zt−1 from (5) into (4) and using the definition of TTC from (2) gives

τ = δt / (S − 1)     (6)

Equation (6) expresses the instantaneous TTC as a function of the scale change and the time interval. It assumes that the object is at the center of the image plane.
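A direct transcription of Eq. (6) as a small helper, for illustration:

```python
def time_to_contact(scale_change, delta_t):
    """Instantaneous TTC from Eq. (6): tau = delta_t / (S - 1).

    S > 1 corresponds to an approaching object (its image grows), S < 1 to a receding one.
    """
    return delta_t / (scale_change - 1.0)

# e.g. a 2% growth in apparent size over a 10 ms window:
# time_to_contact(1.02, 0.01) -> 0.5 s
```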
2.4 Dynamic Object Tracking

Dynamic obstacle detection is performed on the depth image dimg as described in Sect. 2.1. The depth input arrives asynchronously to the events and at a low rate. Once an obstacle is identified, the center of its bounding box is tracked on the event sets extracted in every spatiotemporal window. As the time step δt is very small, a simple constant-velocity Kalman Filter (KF) is used. The state-space model used for the KF is

X̂k = Fk−1 X̂k−1 + Wk−1
zk = H Xk + Rdimg     (7)

For the sake of brevity, we only present part of the matrix formulation here. The state vector X̂k and state transition matrix Fk are

X̂k = [x̂ok  ŷok  n̂xk−1  n̂yk−1]T

Fk = | 1 0 1 0 |
     | 0 1 0 1 |
     | 0 0 1 0 |
     | 0 0 0 1 |     (8)

where x̂ok and ŷok are the centroid of the detected object at step k, and (n̂xk−1, n̂yk−1) is the output of the model estimation obtained from Algorithm 3. The measurement vector zk and measurement matrix H are

zk = [xok,measured  yok,measured]T

H = | 1 0 0 0 |
    | 0 1 0 0 |     (9)

where xok,measured and yok,measured are the centroid of the detected object obtained from the depth camera, as described in Sect. 2.1, when an input image is available. In the implementation, the update step runs only when a new depth image dimg is available.
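The constant-velocity filter of Eqs. (7)-(9) can be sketched as follows; the process and measurement noise covariances Q and R are not specified in the paper, so the values here are placeholders.

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal constant-velocity Kalman filter over the state [x_o, y_o, nx, ny]
    (a sketch of Eqs. (7)-(9); noise covariances are illustrative assumptions)."""

    def __init__(self, q=1e-2, r=1.0):
        self.F = np.array([[1, 0, 1, 0],
                           [0, 1, 0, 1],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = q * np.eye(4)
        self.R = r * np.eye(2)
        self.x = np.zeros(4)
        self.P = np.eye(4)

    def predict(self, nx=None, ny=None):
        # optionally inject the (nx, ny) estimated by Algorithm 3 as the current velocity
        if nx is not None:
            self.x[2], self.x[3] = nx, ny
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                       # predicted bounding-box centre

    def update(self, z):
        # z = centroid measured from the depth image; called only when a new frame arrives
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```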
2.5 Event-Based Time-To-Contact Let Cn and Cn−1 be the two event sets corresponding to an obstacle O, extracted at two consecutive time steps. The proposed algorithm relates these two event sets with a three-parameter model (nx, ny, nscale) given by the following equation
x′ = (xn−1 − hx(Cn−1)) · nscale + nx + hx(Cn−1)
y′ = (yn−1 − hy(Cn−1)) · nscale + ny + hy(Cn−1)
(10)
where (xn−1, yn−1) is the location of an event in the set Cn−1, and (x′, y′) is the event location estimated by the model for the next time step. The tuple (hx(Cn−1), hy(Cn−1)) is the centroid of Cn−1 calculated using Algorithm 2. The registration of the two event sets is based on the idea that Cn−1 and Cn should have a similar distribution of events across the pixels, since the time step δt is small. Depending on the relative motion of the object, there is a translation (nx, ny) between the two sets in the image plane. Once the translation is removed, a similarity score metric is used to estimate the correct scale value. The complete flow is presented in Algorithm 3. The input to the registration block is Cn, Cn−1 and the camera resolution (width × height). The event set Cn is taken as the target set, and the algorithm estimates the model parameters that bring Cn−1 as close as possible to the target. One iteration of the model estimation starts by calculating the centroids of the two sets Cn and Cn−1 as explained in Algorithm 2. The function Event_Image removes the centroid from the input event set and creates an event count image I as explained in Algorithm 1. The model parameters are varied at every iteration so as to maximize the similarity score. The ModelUpdate function generates a new estimated event set C′ from the Cn−1 events using the current iteration's nscale value as per (10); it uses linear interpolation to distribute the real-valued pixel coordinates over the 2D image locations. The event count image I′ is then generated from the estimated event set C′.

Algorithm 3 Model_Estimation(Cn, Cn−1, height, width)
1: Initialize hx(Cn), hy(Cn), hx(Cn−1), hy(Cn−1)
2: Initialize Score ← [ ]
3: (hx(Cn), hy(Cn)) ← Calculate_Centroid(Cn)
4: (hx(Cn−1), hy(Cn−1)) ← Calculate_Centroid(Cn−1)
5: In ← Event_Image(Cn, width, height, hx(Cn), hy(Cn))
6: for nscale ∈ (nscalemin, nscalemax) do
7:   C′ ← ModelUpdate(Cn−1)
8:   (h′x, h′y) ← Calculate_Centroid(C′)
9:   I′ ← Event_Image(C′, width, height, h′x, h′y)
10:  iter_score ← CalculateScore(In, I′)
11:  Score.append(iter_score, nscale)
12:  nscale ← nscale + p
13: nscore, nscale ← FindMaxScore(Score)
14: nx = hx(Cn) − hx(Cn−1)
15: ny = hy(Cn) − hy(Cn−1)
16: return nx, ny, nscale
To calculate the similarity score, we use normalized cross-correlation (NCC), a simple yet effective metric for template matching [17]. This score function is robust because In and the scaled I′ have intensity values only at the pixels where events
are present. The similarity score R over the complete event count image pair In and I′ is defined as

R(In, I′) = Σi,j (In(i, j) · I′(i, j)) / sqrt( Σi,j In(i, j)² · Σi,j I′(i, j)² )     (11)

where the sums run over the full width × height image.
(12)
where hx(Ci ) and hy(Ci ) are the centroid pixel location for the Ci spatiotemporal object event set. Model parameters (nx, ny) will indicate object motion direction in image plane and also input to the EFK prediction stage to estimate the bounding box location in the next spatiotemporal window at next time step. nscale value output from Algorithm 3 is substituted in (6) to estimate the TTC value. TTC will continuously decrease indicating that the object is approaching, while the value will increase for a receding object.
3 Experimental Setup and Results

The algorithm is implemented in C/C++ and integrated into the AirSim [22] simulator for testing and performance evaluation. AirSim is an open-source simulator built on Unreal Engine that can operate at high frequency for real-time hardware-in-the-loop (HITL) simulations [22]. Models for different quadrotors, cars and sensors (including an event camera simulator) are available in the simulator, which makes it feasible to evaluate the algorithm in different test environments. The event and depth cameras are configured to be mounted at the front of the quadrotor with no offset; on a physical system a fixed Tf transformation will be present, and the corresponding pixel data transformation needs to be handled in the implementation when passing the bounding box information. All tests were carried out on an Intel Core i7-8809G quad-core processor with 512 GB storage and 8 GB RAM. The resolution of both cameras is taken as 240 × 180, which is standard for the event camera (DVS-240c). The data streams from both sensors are sampled as described in Sect. 2. Figure 3a depicts the simulation environment with the two quadrotors Tq and Oq, a static wall structure in the background, and their relative placement. The depth camera has an active range of 20 m. Figure 3b shows the depth output in terms of pixel brightness.
Fig. 3 a Environment setup with target quadrotor Tq and obstacle Oq; Tq is the one closer to the viewer in the image. b Output of the depth sensor; the wall is 20 m away from Tq and Oq is 7 m away. c Output of the depth segmentation and connected components; here only the obstacle drone Oq and the ground are within the depth image range. d Visualization of all events generated in a spatiotemporal window; red represents events with negative polarity, while blue represents events with positive polarity
Darker pixels indicate that the object is closer to the camera, while white pixels indicate that it is far away. A bounding box is identified on the extracted clusters as explained in Sect. 2.1, and blobs larger than a certain threshold are rejected. The event image for all captured events in a time step is shown in Fig. 3d; the events corresponding to the obstacle are parsed using the bounding box information. To test the performance and robustness of the algorithm, various configurations of the target quadrotor (Tq) and the moving obstacle quadrotor (Oq) are considered. It is assumed that Tq has a map of the environment and is moving on a planned path. We varied the distance, relative velocities and motion directions between the two quadrotors. The simulator provides the current positions of the quadrotors, which serve as the ground truth. The TTC plot for a case where the relative speed between the target and the obstacle is 3 m/s and the obstacle is approaching is shown in Fig. 4; in this scenario, the obstacle stops 0.5 m before the target, as the scaling computation becomes erroneous due to the very large event pixel motion. Figure 5 shows the event set output for different obstacle approach scenarios. Average TTC estimation errors for different configurations of obstacle distance and relative speed are given in Table 1. The average TTC error is larger when the speed is low, because fewer events are generated at lower speeds; the small resolution of the camera also degrades the results. This trend is observed for both the 5 m and 7 m depth scenarios. The target quadrotor Tq was able to successfully identify and avoid the obstacles. We also compare our algorithm with a depth-based obstacle avoidance solution implemented using the concepts discussed in Sect. 2.1 and Eq. (2). The pure depth-based algorithm successfully avoided obstacles approaching straight at lower speeds (< 1.5 m/s), but failed at higher speeds and in sideways scenarios. To perform the avoidance maneuver, we used a simple strategy available as an API in the simulator.
Fig. 4 TTC plot when the relative speed between the target and the obstacle is 3 m/s and the obstacle is approaching; the x-axis is the iteration number and the y-axis is the TTC
Fig. 5 a Visualization of a set Cn with all static obstacles removed, showing the obstacle Oq approaching straight toward the target quadrotor Tq; red represents events with negative polarity, blue events with positive polarity. b Event image for the set Cn−1 with the centroid removed. c Event image for the set Cn with the centroid removed. d Overlay of the two sets Cn and Cn−1 with the default model parameter nscale = 1 and centroids removed; Cn events are shown in green and Cn−1 events in red. Images e-h show a diagonal approach scenario of the obstacle Oq, with the same interpretation as a-d
Table 1 Average TTC error estimation over various relative speeds and distances to an approaching obstacle

Relative speed (m/s) | Obstacle distance (m) | Avg. TTC error (s)
1.5 | 5 | 0.399
3 | 5 | 0.327
1.5 | 7 | 0.438
3 | 7 | 0.387
4 Conclusion and Future Work

In this paper, we presented an algorithm that fuses depth and event camera data to detect a dynamic obstacle and calculate its TTC. The algorithm has been tested in different scenarios of varied complexity and is able to estimate TTC with good accuracy in all of them. We compared qualitative results with respect to the ground truth. The proposed algorithm is designed to work on a single object; we plan to extend this work to a robust obstacle avoidance strategy for scenarios with multiple dynamic obstacles.
References 1. Tony LA, Jana S, Varun VP, Shorewala S, Vidyadhara BV, Gadde MS, Kashyap A, Ravichandran R, Krishnapuram R, Ghose D (2022) UAV collaboration for autonomous target capture. In: Congress on intelligent systems. Springer, pp 847–862 2. Lewis L, Ge S (2010) Autonomous mobile robots: sensing, control, decision making and applications 3. Scherer S, Singh S, Chamberlain L, Elgersma M (2008) Flying fast and low among obstacles: methodology and experiments. Int J Robot Res 27:549–574 4. Enkelmann W (1991) Obstacle detection by evaluation of optical flow fields from image sequences. Image Vis Comput 9:160–168 5. Serres J, Ruffier F (2017) Optic flow-based collision-free strategies: from insects to robots. Arthropod Struct Dev 46 6. Mori T, Scherer S (2013) First results in detecting and avoiding frontal obstacles from a monocular camera for micro unmanned aerial vehicles. In: IEEE international conference on robotics and automation (ICRA), pp 1750–1757 7. Dagan E, Mano O, Stein GP, Shashua A (2004) Forward collision warning with a single camera. In: IEEE intelligent vehicles symposium, 2004, pp 37–42 8. Lichtsteiner P, Posch C, Delbruck T (2008) A 128 × 128 120 db 15 µs latency asynchronous temporal contrast vision sensor. IEEE J Solid-State Circuits 43:566–576 9. Gallego G, Delbrück T, Orchard G, Bartolozzi C, Taba B, Censi A, Leutenegger S, Davison AJ, Conradt J, Daniilidis K, Scaramuzza D (2019) Event-based vision: a survey. CoRR. arXiv: 1904.08405 10. Clady X, Clercq C, Ieng S-H, Houseini F, Randazzo M, Natale L, Bartolozzi C, Benosman R (2014) Asynchronous visual event-based time-to-contact. Front Neurosci 8:9 11. Xie Z, Chen S, Orchard G (2017) Event-based stereo depth estimation using belief propagation. Front Neurosci 11
12. Falanga D, Kleber K, Scaramuzza D (2020) Dynamic obstacle avoidance for quadrotors with event cameras. Sci Robot 5(40) 13. Mitrokhin A, Fermüller C, Parameshwara C, Aloimonos Y (2018) Event-based moving object detection and tracking. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 1–9 14. Zhu A, Yuan L, Chaney K, Daniilidis K (2018) EV-FlowNet: self-supervised optical flow estimation for event-based cameras. In: Proceedings of robotics: science and systems, Pittsburgh, PA, June 2018 15. Sanket NJ, Parameshwara CM, Singh CD, Kuruttukulam AV, Fermuller C, Scaramuzza D, Aloimonos Y (2019) EVDodgeNet: deep dynamic obstacle dodging with event cameras 16. Holz D, Ichim AE, Tombari F, Rusu RB, Behnke S (2015) Registration with the point cloud library: a modular framework for aligning in 3-D. IEEE Robot Autom Mag 22(4):110–124 17. Gonzalez RC, Woods RE (2008) Digital image processing. Prentice Hall, Upper Saddle River, NJ 18. Lee DN (1976) A theory of visual control of braking based on information about time-tocollision. Perception 5:437–459 19. Horn BKP, Fang Y, Masaki I (2007) Time to contact relative to a planar surface. In: 2007 IEEE intelligent vehicles symposium, pp 68–74 20. Camus T (1995) Calculating time-to-contact using real-time quantized optical flow. In: National institute of standards and technology NISTIR 5609 21. Soille P (2003) Morphological image analysis: principles and applications, 2nd edn. SpringerVerlag, Berlin, Heidelberg 22. Shah S, Dey D, Lovett C, Kapoor A (2018) AirSim: high-fidelity visual and physical simulation for autonomous vehicles. In: Field and service robotics. Springer International Publishing, pp 621–635
mCD and Clipped RBM-Based DBN for Optimal Classification of Breast Cancer Neha Ahlawat and D. Franklin Vinod
Abstract Breast cancer is among the deadliest forms of cancer, posing a serious danger to the health of a large percentage of women throughout the world. Although researchers across the world have proposed a variety of approaches for screening this disease, the existing approaches still require further refinement to ensure proper and adequate monitoring of the disease. Deep models have recently been shown to be extremely powerful generative models capable of extracting features automatically and achieving high prediction performance. The main intention of this work is to introduce an approach for classifying breast cancer histopathology images using a deep learning (DL) algorithm. We propose a new modified contrastive divergence (CD) algorithm with a clipped restricted Boltzmann machine (RBM) that overcomes problems of the existing methodologies. Compared with conventional (standard) CD, it yields a more consistent estimate with lower bias, which is well suited to building a good model. In comparison with existing methods, the proposed method achieves 97.68% accuracy. Keywords Contrastive divergence (CD) · Breast cancer (BC) · Magnetic resonance imaging (MRI) · Persistent contrastive divergence (PCD) · Deep belief network (DBN) · Digital breast tomosynthesis (DBT)
1 Introduction

In the contemporary world, cancer is among the most hazardous threats to public health. Breast cancer (BC) is one of the most frequently diagnosed malignancies in women. It is a type of cancer that starts in the breast itself, where, over time, cells grow at an abnormal and uncontrollable rate. It has a high mortality rate compared with other cancers: one out of three affected women will die of the disease.
N. Ahlawat (B) · D. Franklin Vinod Department of Computer Science and Engineering, Faculty of Engineering and Technology, SRM Institute of Science and Technology, NCR Campus, Delhi-NCR Campus, Delhi-Meerut Road, Modinagar, Ghaziabad, UP, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_7
The breast cancer mortality rate can be gradually reduced by recognizing high-risk patients at an early stage of the disease and treating them appropriately [1]. Early detection and precise assessment are therefore very important factors in increasing survival rates. Obtaining a diagnosis report from a pathologist is often exhausting and time-consuming during the clinical breast examination process [2]. BC growth can begin in different parts of the breast: most cases start in the ducts that carry milk to the nipple (ductal cancers), and some start in the glands that produce milk (lobular cancers). It is essential to understand that most breast abnormalities are not cancerous; these are called benign tumours. Benign tumours are unusual growths, but they are usually not life-threatening, as they do not spread outside the breast [3]. Cancerous (malignant) breast tumours, on the other hand, are very harmful and can attack and damage the surrounding tissue. A malignant tumour turns into metastatic cancer when its cells begin to spread and infect other important organs; it generally circulates through the blood and lymph system and then forms secondary tumours. Film mammography, magnetic resonance imaging (MRI), sonography, digital breast tomosynthesis (DBT), and other pathological examinations are current procedures often used for identifying BC. Among these, histopathology images are considered the gold standard for improving the accuracy of the examination for patients who have already undergone other tests such as mammograms. Histopathological evaluation can provide more accurate and comprehensive information for analysing cancer and evaluating its effects on surrounding tissues. In the laboratory, cell nuclei are stained (blue) and the cytoplasm and non-nuclear components are counterstained in various colours to generate histopathology slides from the patients' breast malignancy tissue, revealing numerous tissue and cellular details [4]. Even though the slides are examined comprehensively by human pathologists, there is a chance of error when diagnosis becomes extensively tedious owing to the huge size of the slides. To address this challenge, more studies are focusing on deep learning (DL) approaches for inspecting histopathological images to improve the malignancy recognition rate. In most cases, the survival rate depends on early diagnosis; nonetheless, it is a very tedious task for pathologists to distinguish the malignant region from vast patterns of harmless areas. In supervised learning, the performance of a model depends on the amount of labelled data: the greater the volume of the training set available to the classifier, the more it can learn. However, gathering sufficient labelled information, especially clinical data, is extremely tedious, costly, and difficult. Deep learning combined with large datasets has demonstrated promise in many artificial intelligence applications and is now being used in biomedical imaging [5].
With the assistance of DL, it is possible to find comprehensible, complex patterns in an image database. Based on the images in the database, the proposed model can be trained to detect the presence of cancer cells and to further categorize them as malignant or benign, so that the doctor can take appropriate measures and treat the patient effectively, which in turn reduces deaths due to BC. Image processing based on deep learning is used to identify possible medical issues in the patient; on a similar premise, deep learning can be used to detect malignant cells [6]. The paper is organized as follows: Sect. 1 gives the introduction and theoretical background, Sect. 2 reviews existing investigations on breast tumour detection, Sect. 3 is divided into two subsections, with Sect. 3.1 describing the existing methodology and Sect. 3.2 the proposed system, Sect. 4 is devoted to results and discussion, and Sect. 5 concludes the paper.
2 Related Work

The primary intention of this work is to introduce an approach for classifying BC histopathology images using a deep learning algorithm. Some prevalent works related to classification and training are discussed briefly here. In [7], Xu et al. proposed an attention-based approach for BC classification. It first robustly selects a sequence of coarse regions from the raw image using a hard visual attention mechanism and then, within each region, explores the abnormal parts using a soft attention mechanism. A recurrent network then decides how to describe the image segment and predicts the region of the image to be investigated in the next time step. Since the region selection is non-differentiable, the whole network is trained with a reinforcement technique, which helps learn an optimal policy for classifying the regions. The results are not stable at magnifications other than the four studied. Liang et al. [8] presented a CNN-based approach with an attention mechanism for classifying breast cancer images. The PCam images are artificially stained with hematoxylin (H) and eosin (E); although the colouring carries no diagnostic meaning, the slides are hand-stained by operators, so different images have varied colours. Each image was therefore normalized independently by subtracting the mean of all pixels and dividing by the standard deviation. The images were upsampled by inserting two pixels between adjacent pixels, resizing them from 96 × 96 to 288 × 288 px to mitigate the very small number of tumour pixels. As the number of CBAM blocks increases, the likelihood that the model will overfit also increases.
Shahidi et al. [9] pursued a two-fold objective. The first is to explore different learning frameworks for characterizing breast cancer histopathology images; the review identifies the most accurate models for two-, four-, and eight-class classification of breast cancer histopathology image databases. The accuracy scores obtained for the DL models clearly showed that parameters such as data pre-processing, transfer learning (TL), and data augmentation can affect a model's ability to attain better accuracy. A further motivation was to examine recent models that had been little studied in previous work, such as Dual-Path-Net, ResNeXt, SE-Net, and NASNet, which are associated with the most recent ImageNet results; these models were examined on BreakHis for the two-class and eight-class configurations, although they were unable to produce adequate results for four classes. In [10], Alhussan et al. compared breast disease recognition using two existing DL network models. The methodology involves image pre-processing, classification, and performance evaluation. To assess the DL models, VGG16 and ResNet50 were used to distinguish normal from abnormal tumours on the IRMA dataset. No regularization approach was used, since the Adam optimizer was employed and there was no fully connected layer. Moreover, data augmentation methods, including rotation and horizontal and vertical flipping, improved the outcomes for almost every model except Inception-ResNeXt and ResNet-V2. In terms of precision, VGG16 delivered the better result with 94%, while ResNet50 achieved only 91.7%; the statistical classifier results are based on a single evaluation metric. In [11], Sultan et al. proposed a convolutional neural network model that classifies different brain tumours using two publicly available datasets, one from Nan Fang and the other from the General Hospital, TMU, China. The first dataset contains growths of different tumour types, while the second separates three glioma grades into different categories. Their architecture achieved accuracies of 96.13% and 98.7% on these two datasets. The model consists of many layers, starting from an input layer holding the pre-processed images, which pass through convolution, ReLU, normalization, and max-pooling layers. Despite the small size of the dataset (caused by the many different imaging perspectives), the issue was addressed with data augmentation, which yielded more accurate results.
3 Methodology

3.1 Theoretical Background

A restricted Boltzmann machine (RBM) is a shallow, energy-based neural network with two layers: a visible layer and a hidden (latent) layer. RBMs can be used as feature extractors that learn characteristics from raw data. There are no visible-visible or hidden-hidden connections, i.e., no intralayer connections between neurons. Learning an RBM means adjusting its parameters so that its probability distribution fits the training data well. Training an RBM mainly involves two concerns: the type and characteristics of the data, and the tuning of the RBM's hyperparameters. All standard RBM training techniques approximate the log-probability gradient from some statistics and perform gradient descent on these approximations. Much recent research has shown that RBMs can be trained successfully using the contrastive divergence (CD) and persistent contrastive divergence (PCD) algorithms, which are able to detect and contrast a wide diversity of patterns [12]. When k > 1, the fundamental computation of the CD-k algorithm is k steps of Gibbs sampling, so improving this part can reduce the overall execution time. One issue with CD is that the mixing rate of the Markov chain slows down as learning progresses; increasing the number of Gibbs steps in each update can alleviate this to some extent, but it slows down learning. Alternatively, persistent contrastive divergence (PCD) obtains samples from the model distribution in a slightly different way: instead of restarting the chains, it maintains them, and PCD turns out to be a better approximation to the likelihood gradient [13]. Classification is an interesting benchmark because it indicates how well the model extracts relevant features from the input. RBMs are often used as feature detectors, and it has been found that PCD produces feature detectors that are superior to CD-1 in terms of classification. The main limitation of PCD is that it appears to require a slow learning rate so that the "fantasy" particles are sampled from a distribution close to the stationary distribution for the current weights [14].
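For concreteness, a single CD-1 parameter update for a Bernoulli RBM can be sketched as follows (a generic textbook-style update, not code from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.01, rng=np.random):
    """One CD-1 update for a Bernoulli RBM; v0 is a batch of visible vectors."""
    # positive phase
    ph0 = sigmoid(v0 @ W + c)                     # P(h = 1 | v0)
    h0 = (rng.random_sample(ph0.shape) < ph0).astype(float)
    # one Gibbs step: reconstruct the visibles, then recompute hidden probabilities
    pv1 = sigmoid(h0 @ W.T + b)
    ph1 = sigmoid(pv1 @ W + c)
    # gradient approximation and parameter update
    dW = v0.T @ ph0 - pv1.T @ ph1
    W += lr * dW / v0.shape[0]
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c
```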
3.2 Proposed DBN Using Modified Contrastive Divergence (mCD)

The CD algorithm is a well-known approach for training energy-based latent variable models and has been widely used in deep learning models such as the RBM and deep belief networks. The traditional algorithm suffers from certain problems. First, it is understood that increasing the number of hidden (latent) units increases the chance of modelling the training data successfully.
In practice, however, we must contend with the problem of overcomplete representations, in which many learnt features are highly correlated; using too many hidden units therefore increases the learning time and the risk of overfitting. Second, learning with generative stochastic models such as the RBM yields good features for reconstructing samples but is more fragile for prediction. During RBM training, the derivative of the sigmoid activation function suffers from a vanishing gradient problem, which leads to unstable behaviour. To address the gradient issue, we use a clipping concept in the RBM together with ReLU activation for firing the neurons in the hidden units. In this paper, we propose a new modified contrastive divergence (mCD) algorithm with a clipped methodology that overcomes the problems of the existing approaches. Training the RBM hidden layers with clipping helps reduce the problem of exploding gradients. Compared with conventional (standard) CD, it yields a more consistent estimate with lower bias, which is well suited to building a good model. However, when the RBM has many hidden neurons, the consistent estimate of mCD can have a larger bias, and the variance of the gradient requires a smaller learning rate. The modified CD has very little overhead compared with traditional CD variants. According to the preceding theoretical analysis, the sample mixing rate, a property of the Gibbs sampling chain, is a significant factor in any RBM training technique based on Gibbs sampling, and changes in the network weights have a significant impact on the mixing rate. Consistent with the theoretical conclusions of prior work, tuning the weights according to the convergence theorem of Gibbs sampling significantly improved the classification performance of joint-density RBMs. Our convergence criteria are not the same as those obtained from current convergence theory; in fact, they are more useful in practice. During training of the deep belief network (DBN), each layer applies a nonlinear transformation to the vectors fed into it, and the transformed vectors are then fed into the next layer. After completing the training process with the mCD-RBM model to create the DBN, we can use the mCD-RBM reconstruction to see how well it reproduces the data.
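The paper does not give implementation details for the clipped RBM with ReLU hidden units; the sketch below shows one plausible way a clipped, ReLU-based CD-style update could look. The clipping threshold, the noisy-ReLU sampling and all other choices here are assumptions made for illustration only.

```python
import numpy as np

def clipped_cd_update(v0, W, b, c, lr=0.01, clip=1.0, rng=np.random):
    """Hypothetical CD-style update with ReLU hidden activations and gradient clipping."""
    relu = lambda x: np.maximum(0.0, x)
    # positive phase with rectified hidden units (noisy-ReLU approximation; an assumption)
    h0 = relu(v0 @ W + c + rng.normal(0.0, 1.0, size=(v0.shape[0], W.shape[1])))
    # single reconstruction step
    v1 = 1.0 / (1.0 + np.exp(-(h0 @ W.T + b)))
    h1 = relu(v1 @ W + c)
    # CD-style gradient, then element-wise clipping to curb exploding updates
    dW = (v0.T @ h0 - v1.T @ h1) / v0.shape[0]
    dW = np.clip(dW, -clip, clip)
    db = np.clip((v0 - v1).mean(axis=0), -clip, clip)
    dc = np.clip((h0 - h1).mean(axis=0), -clip, clip)
    W += lr * dW
    b += lr * db
    c += lr * dc
    return W, b, c
```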
4 Experimental Result and Analysis

To assess the performance of the suggested technique, we applied the proposed framework to two datasets: the WBC dataset created by Wolberg et al. at the University of Wisconsin and the WDBC dataset (UCI Machine Learning Repository). A machine with an Intel(R) Core(TM) i3-2400 CPU clocked at 3.10 GHz and running Windows 10 was used for the experiments, and Python with a number of machine learning packages was used for the simulation setup.
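The WDBC data referred to above is also distributed with scikit-learn, so a typical data-preparation step might look as follows; the split ratio and random seed are illustrative, as the paper does not specify them.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# load_breast_cancer() returns the Wisconsin Diagnostic Breast Cancer (WDBC) data
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=0)

# scale features to [0, 1] so they can be treated as (approximately) Bernoulli visibles
scaler = MinMaxScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
```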
After training the mCD-RBM-based DBN model, we can use the mCD-RBM reconstruction to see how well it reproduces the data. Table 1 shows the precision obtained on the WBC dataset with different configurations of hidden (latent) layers using the DBN approach. We use hidden layers with 100, 200, 300, 400, 500, 600, 700, and 800 nodes, in both two-layer and four-layer configurations. Based on the results in Table 1, precision increases as the number of nodes in the hidden layers grows. The assessment of this experiment is based on the calculation of precision. In Table 2, the suggested model's evaluation metrics are compared with existing baseline techniques, and the comparison is visualized in Fig. 1 for better comprehension. Compared with the existing methods, the proposed method attains 97.68% accuracy, with the metrics S1, S2, and FS as shown in Table 2 and Fig. 1. Accuracy (precision) indicates how often the model classifies correctly, as computed from the confusion matrix [19, 20].

Table 1 Classification precision using the mCD-RBM tool with different configurations of hidden layers

Number | Iterations | No. of nodes in hidden (latent) layers | Layers | Precision | Layers | Precision
1 | 800 | 100 | 2 | 0.8641 | 4 | 0.8750
2 | 800 | 200 | 2 | 0.8723 | 4 | 0.8896
3 | 800 | 300 | 2 | 0.9108 | 4 | 0.91179
4 | 800 | 400 | 2 | 0.9258 | 4 | 0.9279
5 | 800 | 500 | 2 | 0.9245 | 4 | 0.9321
6 | 800 | 600 | 2 | 0.9416 | 4 | 0.9450
7 | 800 | 700 | 2 | 0.9508 | 4 | 0.9591
8 | 800 | 800 | 2 | 0.96881 | 4 | 0.97680
Table 2 Performance evaluation of the suggested approach against current baseline methods

Method | Datasets | Evaluation metrics
k-nearest neighbours classifier (k-NN) [15] | Wisconsin (WDBC) | ROI, standard deviation
Pa-DBN-BC [16] | CINJ, CWRU, TCGA | ROI
WRBM-based DBN [17] | PolSAR | Accuracy
SVM [18] | WDBC | CV
CDBN [19] | MIAS | AUC
Proposed model | Wisconsin (WDBC) | Sensitivity (S1), specificity (S2) and F1-score (FS)
Fig. 1 Classification precision rate of the proposed method with existing methods
To assess the results, four criteria are used: accuracy (A), specificity (S2), sensitivity (S1), and F1-score (FS). These performance criteria are determined by Eqs. 1-4:

Accuracy (A) = (T.N. + T.P.) / (F.N. + T.P. + T.N. + F.P.)     (1)

S1 = T.P. / (T.P. + F.N.)     (2)

S2 = T.N. / (F.P. + T.N.)     (3)

FS = 2 · T.P. / (2 · T.P. + F.N. + F.P.)     (4)
In Eqs. 1-4, TP denotes true positives, TN true negatives, FP false positives, and FN false negatives [21, 22]. The proposed method uses these performance metrics, given in Table 2, to analyse performance in terms of S1, S2, A, and FS. The precision compared in Fig. 1 reflects how often the model classifies correctly, as computed from the confusion matrix and reported in Tables 2 and 3.
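Equations (1)-(4) translate directly into a small helper, shown here for illustration:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and F1-score from confusion-matrix counts (Eqs. 1-4)."""
    accuracy = (tn + tp) / (fn + tp + tn + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    f1 = 2 * tp / (2 * tp + fn + fp)
    return accuracy, sensitivity, specificity, f1
```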
Table 3 Performance evaluation parameters

Accuracy (A) | 0.976
Sensitivity (S1) | 0.941
Specificity (S2) | 0.932
F1-score (FS) | 0.966
5 Conclusion

This paper has reviewed, with a deliberate focus, several techniques used to identify breast cancer from histopathology images. We proposed a new modified contrastive divergence (mCD) algorithm that improves the performance of the RBM by overcoming problems of the existing methodologies. Compared with conventional CD, it yields a more consistent estimate with lower bias, which is well suited to building a good model. In comparison with other existing methods, the mCD-DBN with clipped RBM shows a better precision level with two and four layers and 800 iterations. In future studies, we intend to apply the suggested strategy to a larger number of databases and applications, as well as to other models, for the adequate monitoring of other diseases.
References 1. Avaznia C, Naghavi SH, Menhaj MB, Talebi H (2017) Breast cancer classification using covariance description in Riemannian geometry. In: 10th Iranian conference on machine vision and image processing (MVIP). IEEE, Iran, pp 110–113 2. Spanhol FA, Oliveira LS, Petitjean C et al (2016) A dataset for breast cancer histopathological image classification. IEEE Trans Biomed Eng 63(7):1455–1462 3. Whitney HM, Li H, Ji Y, Liu P, Giger ML (2020) Comparison of breast MRI tumor classification using human-engineered radiomics, transfer learning from deep convolutional neural networks, and fusion methods. Proc IEEE Inst Electr Electron Eng 108(1):163–177. https://doi.org/10. 1109/jproc.2019.2950187 4. Aboutalib SS, Mohamed AA, Berg WA et al (2018) Deep learning to distinguish recalled but benign mammography images in breast cancer screening. Clin Cancer Res 24(23):5902–5909 5. Wang D, Khosla A, Gargeya R et al (2016) Deep learning for identifying metastatic breast cancer. Cornell University, Ithaca, NY. arXiv:1606.05718 [q-bio.QM] 6. Litjens G, Kooi T, Bejnordi BE et al (2017) A survey on deep learning in medical image analysis. Med Image Anal 42:60–88. https://doi.org/10.1016/j.media.2017.07.005 7. Xu B, Liu J, Hou X et al (2013) Attention by selection: a deep selective attention approach to breast cancer classification. IEEE Trans Med Imaging 1. https://doi.org/10.1109/TMI.2019. 2962013 8. Liang Y, Yang J, Quan X, Zhang H (2019) Metastatic breast cancer recognition in histopathology images using convolutional neural network with attention mechanism. In: Chinese automation congress (CAC), pp 2922–2926. https://doi.org/10.1109/CAC48633.2019. 8997460 9. Shahidi F, Mohd Daud S, Abas H et al (2020) Breast cancer classification using deep learning approaches and histopathology image: a comparison study. IEEE Access 8:187531–187552. https://doi.org/10.1109/ACCESS.2020.3029881
10. Alhussan A, Samee N, Ghoneim V, Kadah Y (2021) Evaluating deep and statistical machine learning models in the classification of breast cancer from digital mammograms. Int J Adv Comput Sci Appl 12(10). https://doi.org/10.14569/IJACSA.2021.0121033
11. Sultan HH, Salem NM, Al-Atabany W (2019) Multi-classification of brain tumor images using deep neural network. IEEE Access 7:69215–69225
12. Pang T, Wong JHD, Ng WL et al (2020) Deep learning radiomics in breast cancer with different modalities: overview and future. Expert Syst Appl 158. ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2020.113501
13. Tlusty T, Amit G, Ben-Ari R (2018) Unsupervised clustering of mammograms for outlier detection and breast density estimation. In: 24th international conference on pattern recognition (ICPR), pp 3808–3813. https://doi.org/10.1109/ICPR.2018.8545588
14. Larochelle H, Bengio Y (2008) Classification using discriminative restricted Boltzmann machines. In: Proceedings of the 25th international conference on machine learning. ACM, pp 536–543. https://doi.org/10.1145/1390156.1390224
15. Mejía TM, Pérez MG, Andaluz VH, Conci A (2015) Automatic segmentation and analysis of thermograms using texture descriptors for breast cancer detection. In: Asia-Pacific conference on computer aided system engineering, pp 24–29. https://doi.org/10.1109/APCASE.2015.12
16. Hirra I et al (2021) Breast cancer classification from histopathological images using patch-based deep learning modeling. IEEE Access 9:24273–24287. https://doi.org/10.1109/ACCESS.2021.3056516
17. Guo Y, Wang S, Gao C, Shi D, Zhang D et al (2015) Wishart RBM based DBN for polarimetric synthetic radar data classification. In: IEEE international geoscience and remote sensing symposium (IGARSS), pp 1841–1844. https://doi.org/10.1109/IGARSS.2015.7326150
18. Zheng B, Yoon SW, Lam SS (2014) Breast cancer diagnosis based on feature extraction using a hybrid of K-means and support vector machine algorithms. Expert Syst Appl 41(4):1476–1482
19. Yang Y (2020) Medical multimedia big data analysis modeling based on DBN algorithm. IEEE Access 8:16350–16361. https://doi.org/10.1109/ACCESS.2020.2967075
20. Zararsiz G, Akyildiz HY, Goksuluk D et al (2016) Statistical learning approaches in diagnosing patients with nontraumatic acute abdomen. Turk J Electr Eng Comput Sci 24:3685–3697
21. Ali SA, Raza B, Malik AK, Shahid AR et al (2020) An optimally configured and improved deep belief network (OCI-DBN) approach for heart disease prediction based on Ruzzo–Tompa and stacked genetic algorithm. IEEE Access 8:65947–65958
22. Al-Antari MA, Han SM, Kim TS (2020) Evaluation of deep learning detection and classification towards computer-aided diagnosis of breast lesions in digital X-ray mammograms. Comput Methods Programs Biomed 196:105584
Digital Disruption in Major Ports with Special Reference to Chennai Port, Kamarajar Port, and Tuticorin Port S. Tarun Kumar, Sanjeet Kanungo, and M. Sekar
Abstract Digitalization is the new buzzword in port environments and has become the top priority for all major ports in India. The main motivators for this ambitious strategy are the financial profits that come along, increased trade volumes, and the attraction of new customers. In this context, many ports are endeavoring to employ cutting-edge technologies and improve port efficiency. The article aims to evaluate port performance before and after the implementation of EDI and partial automation. The researcher has used secondary data for the period 2010 to 2021. This data was used to develop efficiency indices such as operating surplus per employee, operating surplus per vessel, and operating surplus per 000' tons, which were then used to gauge and compare the ports' performances. It was found that adopting digital technologies positively impacts a port's economics and improves its efficiency. The author used secondary data due to paucity of time. Likewise, data for the year 2021–2022 was excluded from the study as it was only partly available.

Keywords Digitalization · Port · Automation · EDI
1 Introduction and Background

Currently, there is a dynamic shift in the way ports and their ancillary industries plan, execute, and manage operations all over the globe, thanks to digitalization and its leading technologies, which have been acting as a catalyst of change and pushing the boundaries of human imagination.
S. Tarun Kumar (B) · M. Sekar Indian Maritime University, Chennai, Tamil Nadu 600119, India e-mail: [email protected] S. Kanungo Tolani Maritime Institute, Induri, Maharashtra 410507, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_8
This rapid surge to attract more business and larger vessels and to remain competitive on the global stage is pushing administrations and port stakeholders to experiment with and adopt technology faster than ever. Early adopters of digitalization are already reaping the benefits in terms of efficiency in handling cargo, cost saving, and sustainability, and are now venturing further into the uncharted territories of artificial intelligence (AI), Internet of Things (IoT), and big data. With the world's top ports embracing newer technologies and developing at such a rapid pace, the article aims to check the ground reality of digitalization in three Indian major ports situated in the southern part of the country.

India's digitalization journey began only a few years back. It started to implement electronic data interchange (EDI) at all major ports from Dec 2018. This technology was first developed and introduced in the 1960s for the United States army, where it was used specifically in its supply chain network. From the 1990s, countries started to use it in their seaports. In brief, EDI is a concept of sharing documents electronically with all stakeholders, thereby doing away with manual paperwork that took a lot of time, effort, and money. More recently, India has been implementing the enterprise business system (EBS) in five major ports from Dec 2020 onwards. This offers a digital port ecosystem that embraces best practices accepted worldwide, without abandoning existing configurations that meet local needs. Simply put, these systems are comprehensive software packages that can integrate, monitor, and control all operations of a business. They function as command centers that assist in automation, effective decision-making, and reporting in businesses. Adding to this, a unified web-based port community system (PCS) has been made operational throughout all major ports. This allows smooth data flow between various stakeholders through a common interface.

This paper envisages two objectives: to evaluate the performance of the selected ports before and after EDI and partial implementation of automation, and to present findings from the above study. The researcher has used secondary data available from Basic Port Statistics, Ministry of Ports, Shipping and Waterways (Government of India). For the purpose of the study, data from 2010–11 to 2020–21 have been considered. The study may be useful to gauge the performance of ports pre- and post-EDI and partial implementation of automation and could be valuable information for stakeholders, creditors, debtors, and investors.
2 Literature Review

A need for common ground (an index) is felt to compare various ports based on their smartness. Since there is no established way to compare these ports, an attempt to create a framework using quantitative methods is made. There are two ways to evaluate the smartness of a port: one is an ideology that depends on government policies and on using resources smartly; the other is incorporating the latest technologies to boost port performance. Subsequently, the paper discusses the Smart Port Index (SPI) that was
developed. Four broad areas of a port are assessed: operations, environment, energy, and safety and security. The index value allowed users to (a) forecast defects and take corrective actions in advance, (b) self-evaluate and benchmark a port against other ports, and (c) build strategic decisions to remain competitive globally [1].

Digitalization in the maritime sector requires continued investments in technology, information sharing, coordination, and collaboration among stakeholders. These quite often get overlooked and create hurdles that need identification. The research paper tries to identify existing possibilities and obstacles in modern-day seaports and figure out how intra- and inter-organizational activities may lead to competitive advantages. Digital transformation cannot succeed merely by adopting modern technologies; an adjustment of organizational structures is also required. Digital initiatives can go wrong if the unique requirements of individual stakeholders are not considered. Therefore, there is a need for setting up strategies and collaboration among stakeholders with common goals, which is critical to port operations. Intra-, inter-, and meta-organizational perspectives are to be considered after analyzing the resulting costs and benefits [2].

Port actors keep a tight grip on sharing data openly. Take, for example, Finnish ports, which currently do not have an attitude of fully open data. This is attributed mainly to a lack of understanding of legal requirements, as a result of which data is kept closed as a security measure. Another hurdle in the implementation of open data is the lack of expertise and vision. Three main benefits of open data can be listed: it enables (a) development of applications required in daily work, (b) enhanced overall coordination of port operations, and (c) better handling of unexpected events and logistic delays [3].

One study on the inter-relation between Industry 4.0 technologies and business models in a seaport (especially a smart port) found four traits for the level of smartness. First, a conscious effort must be made to understand the needs of the port's clients, i.e., a customer-centric focus. Second, the importance of generating value for the hinterland and the city (the port integrating with the smart city). Third, supporting innovation by promoting a startup hub inside the port's ecosystem. Lastly, implementation of certain data-driven functionalities like virtual gates (real-time data interchange, digitally enabled port synchronization). For the port of Barcelona, Industry 4.0 influences business models principally via market-pull mechanisms, because the port continuously tries to keep up with advances in the industry, other ports, and their stakeholders [4].

Key drivers for the impending shift in ports include standardization, logistics supply chain management, the societal importance of ports in their cities, the environmental efficiency (carbon neutrality) of ports, and technological trajectories. A proactive approach and constant planning are a must from port managers. There is also a growing need for highly skilled professionals for the proposed protocols. Security and reliability of data in port digitalization will continue to be crucial for the near future [5].
Spanish authorities have launched the Ports 4.0 initiative under the ministry of transport. With the largest open innovation fund for the logistics-port ecosystem, the government wants to take stock of the situation and check the current state of digitalization in all Spanish ports. Findings suggest that very little attention has been paid to information systems and technologies in maritime logistics until now. Successful digital transformation has two parts: (a) adopting new technologies and (b) having a structured organization. In the case of Spain, the second part is highly segmented into port authorities that operate independently, with management autonomy. Likewise, there is a general lack of training of the port authorities in handling digital technologies [6].

The focus of Industry 4.0 has been to imbibe newer technologies to improve efficiency, which has neglected the human element completely. There are implications of not focusing on the human element, and hence more emphasis is given to it in Industry 5.0. Comparing the two (Industry 4.0 and 5.0), the latter additionally focuses on the communication between humans and technology. There is growing approval for collaborative technologies, e.g., human–machine systems, collaborative robots, etc. A detailed analysis shows that IoT, cobots (collaborative robots), and AI are the most examined Industry 5.0 technologies in smart logistics [7].

With smart ports becoming the new normal for industrialized ports in the world, their performance revolves around four key domains: operations, environment, energy, and safety and security. If more energy can be conserved by incorporating microgrids into ports' energy networks, then ports can turn out smarter and more competitive. A preliminary assessment of the research proposes means to transform a long-established port into a contributing segment of a sustainable ecosystem by using microgrids. Simulation results point out that the port microgrid can contribute to various aspects of port operation and management: avoiding critical facility downtime, energy savings, reduced energy dependency, and emission reduction [8].

These days, the Internet of Things (IoT) can be considered an important technological revolution related to smart cities, smart homes, and smart ports. In smart ports, IoT has made it possible to monitor almost everything on a real-time basis. This is made possible by the various sensing technologies available and their interaction with each other. Smart ports are deploying smart sensing systems and improving the execution of different terminal tasks. These also contribute to reduced handling time compared to classical container terminals. On the safety front, several solutions are used for structural health monitoring (SHM) of quayside cranes [9].

Key advances have taken place in telecommunications and the Internet of Things, which have given rise to numerous smart city scenarios in which smart services are provided. However, there remains a pressing need to provide smart services quickly, efficiently, in an interoperable manner, and in real time. A benefit of using a microservice architecture is that the functionality can be rapidly increased by deploying and interconnecting a new service. The provision of a microservice that integrates a complex event processing (CEP) engine provides a large real-time data processing capability. This is proven by a case study wherein the system is seamlessly
scalable and maintainable in terms of its functionality and evolution. None of the existing proposals for smart ports provides a fully interoperable approach (as done here with a web of things (WoT) microservice architecture), nor do they benefit from an integrated CEP engine to facilitate real-time detection and notification of situations of interest [10].

Existing port logistics handling systems are highly centralized, with limited prospects for collaboration among stakeholders. These systems lack the traceability, transparency, and information security that adversely affects the productivity of port terminals. Blockchain is a promising technology that uses decentralization (no centralized power) and hashing functions (encryption) to overcome these shortcomings. Frauds related to documentation can be eliminated by the use of this technology, thereby increasing trust among stakeholders. Hyperledger Fabric and Besu are two potential private blockchain platforms that are well suited for use in port logistics; Hyperledger Fabric offers more features and is relatively more mature than Besu. There are also the advantages of reduced turnaround time of containers due to the elimination of manual paperwork. Additionally, blockchain technology brings down carbon emissions and the total downtime of ships because of real-time, trusted, and transparent sharing of logistics data among members [11].

Concepts of sustainability and smart buildings are gaining popularity all over the world. Singapore, one of the early adopters of this technology, is already reaping the benefits. The smart city 5G technology has the backing of the government of Singapore. Because of this, the public's expectations regarding comfortable buildings, environmental protection, energy saving, and efficiency are being met. Realization of the most cost-effective system is possible due to the application of 5G technology to intelligent buildings, as about one-third of Singapore's electricity consumption comes from buildings. The technology has a major role in reducing the carbon footprint and meeting Singapore's commitment of reducing emissions by 36% by 2030 (from 2005 levels) [12].

To provide a wide array of smart applications, smart ports use information and communications technology (ICT) extensively. This results in vastly improved vessel and container management, which subsequently improves the competitiveness and sustainability of the national economy. There are many ingenious solutions, like information systems (IS) and locating systems, that can be planned for implementation in a smart port, but these come down the ladder of priority. Key issues like greenhouse gas (GHG) emissions must be tackled first on an urgent basis. Currently, GHG emissions have accelerated to alarming levels, and there is a need to urgently address this unresolved issue [13].

For a port to remain competitive, it needs to be part of digital networks. The existing technology should be able to integrate with modern-day technology platforms and architectures. This will ensure focused technology management and data transfer interoperability. To achieve digital transformation (DT), all stakeholders, including smaller players, should be able to implement new systems. The way the three categories ((a) digitalization of information, (b) exchange of digital information, and (c) automation of information exchange and operations) are managed will indicate the level of port digitalization. A clear vision is a prerequisite before starting a large-scale digitization project.
Wrong solutions or decisions made here may lead to
inflated costs and losses. Some major challenges are incompatible systems, lacking resources, cyber security threats, and resistance toward the digitalization of ports. The correct reasons to pursue digitalization measures should be enhanced operations, reduced costs, and environmental protection, and not the fact that competitors or other ports are doing it [14].

Smart ports are the next generation of industrialized ports which use state-of-the-art technologies. Integrating all the digital technologies in the port and analyzing the data generated can create innovative opportunities for port authorities (stakeholders). Digital twin (DT)-driven management can help with the above. Efficient operation of cranes and other cargo handling equipment is possible through DT-based models. Machinery and equipment are also properly maintained, as health checks are done regularly. For making intelligent decisions, a DT-based system can collect a huge amount of data from the automated container terminal process, which acts as a data bank of knowledge. DT technology also makes the communication and sharing of data more restricted. Predictive optimization is an important application of DT-driven management in smart ports: failures are predicted in advance, thereby minimizing the downtime of terminal operations. Real-time monitoring, synchronous description, and dynamic prediction (of various entities) can be performed by DT-driven systems for port environmental management [15].
3 Model Construction

This paper attempts to assess the performance of the ports after EDI and partial implementation of automation, which were carried out in Dec 2018. To gauge the performance of the ports pre- and post-EDI and partial implementation of automation, indices such as operating surplus per employee, operating surplus per vessel, operating surplus per 000' tons, and the compounded annual growth rate (CAGR) were identified. Normally, the performance of a port is gauged by net surplus, throughput, and the number of vessels handled at the port at a given point of time. The author felt that these measures may not give a true picture on their own, as they should be aligned with the number of employees, the vessels handled, and the throughput. Operating surplus is calculated by deducting operating expenses from operating income. Operating surplus per employee is calculated by dividing operating surplus by the number of employees in the port (operating surplus/no. of employees). Operating surplus per vessel is calculated by dividing operating surplus by the number of vessels handled (operating surplus/no. of vessels). Operating surplus per 000' tons of throughput is calculated by dividing operating surplus by total traffic (operating surplus/total traffic). The CAGR is calculated with an online tool.
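A minimal Java sketch of how these indices are derived is given below. The operating surplus, vessels handled, and total traffic are the 2020–21 VOC port figures from Table 1; the employee head count is only an illustrative assumption, since employee numbers themselves are not tabulated in this paper.

```java
// Sketch: deriving the efficiency indices defined in Sect. 3 from raw
// port figures. Operating surplus, vessels handled, and total traffic
// are the 2020-21 VOC port values from Table 1; the employee count is
// an illustrative assumption (not tabulated in the paper).
public class EfficiencyIndices {
    public static void main(String[] args) {
        double operatingSurplusLakhs = 32_162;      // Rs in Lakhs (Table 1, 2020-21)
        int vesselsHandled = 1203;                  // Table 1, 2020-21
        double totalTrafficThousandTons = 31_790;   // '000 tons (Table 1, 2020-21)
        int employees = 617;                        // illustrative head count

        double perEmployee = operatingSurplusLakhs / employees;
        double perVessel = operatingSurplusLakhs / vesselsHandled;
        double perThousandTons = operatingSurplusLakhs / totalTrafficThousandTons;

        System.out.printf("Per employee: %.2f, per vessel: %.2f, per 000' tons: %.3f%n",
                perEmployee, perVessel, perThousandTons);
    }
}
```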
4 Data Analysis and Interpretation

4.1 Efficiency Index of VOC Port

The operating surplus of VOC port has gone up from Rs 25,234 Lakhs in 2018–19 to Rs 32,162 Lakhs in 2020–21. This indicates that, post-implementation of EDI and partial implementation of automation at VOC port, the operating expenses have come down, resulting in an increase in operating surplus. This is despite total traffic coming down in 2020–21 as compared to 2018–19 and the number of vessels handled falling to 1203 in 2020–21 as against 1370 in 2018–19 (ref Table 1). Although the throughput has fallen, there is a steady increase in operating surplus and the other efficiency indices (ref Table 1), i.e., operating surplus per employee, operating surplus per vessel, and operating surplus per thousand tons handled. This clearly indicates that the implementation of EDI and partial implementation of automation has reduced the cost of operation. In the pre-automation era, the operating surplus from 2010–11 to 2017–18 shows a mixed trend of up-and-down performance, and even the efficiency indices are not that attractive (ref Table 1), whereas in the post-automation era the efficiency indices show bright signs of improvement year on year.

Table 1 VOC efficiency indices from 2010 to 2021
Year    | Operating surplus (Rs in Lakhs) | Operating surplus per employee (Rs in Lakhs) | Operating surplus per vessel (Rs in Lakhs) | Operating surplus per 000' tons (Rs in Lakhs) | No. of vessels handled | Total traffic ('000 tons)
2010–11 | 14,137 | 6.80  | 10.48 | 0.550 | 1349 | 25,727
2011–12 | 11,298 | 5.78  | 8.21  | 0.402 | 1376 | 28,105
2012–13 | 15,300 | 8.44  | 12.39 | 0.541 | 1235 | 28,260
2013–14 | 14,567 | 8.85  | 13.33 | 0.509 | 1093 | 28,642
2014–15 | 23,771 | 15.89 | 17.23 | 0.733 | 1380 | 32,414
2015–16 | 16,100 | 11.79 | 10.63 | 0.437 | 1515 | 36,849
2016–17 | 35,002 | 35.00 | 21.06 | 0.910 | 1662 | 38,463
2017–18 | 37,331 | 42.52 | 25.19 | 1.020 | 1482 | 36,583
2018–19 | 25,234 | 32.60 | 18.42 | 0.735 | 1370 | 34,342
2019–20 | 32,880 | 47.58 | 22.72 | 0.911 | 1447 | 36,076
2020–21 | 32,162 | 52.13 | 26.73 | 1.012 | 1203 | 31,790
Source Basic Port Statistics, Ministry of Ports Shipping and Waterways
Table 2 Chennai port efficiency indices from 2010 to 2021

Year    | Operating surplus (Rs in Lakhs) | Operating surplus per employee (Rs in Lakhs) | Operating surplus per vessel (Rs in Lakhs) | Operating surplus per 000' tons (Rs in Lakhs) | No. of vessels handled | Total traffic ('000 tons)
2010–11 | 10,418 | 1.34   | 4.91   | 0.170   | 2123 | 61,460
2011–12 | 6231   | 0.83   | 3.19   | 0.112   | 1956 | 55,707
2012–13 | 4409   | 0.67   | 2.35   | 0.083   | 1880 | 53,404
2013–14 | 616    | 0.10   | 0.35   | 0.012   | 1756 | 51,105
2014–15 | 13,394 | 2.34   | 7.48   | 0.255   | 1790 | 52,541
2015–16 | − 2032 | − 0.37 | − 1.20 | − 0.041 | 1691 | 50,058
2016–17 | 21,720 | 4.72   | 13.58  | 0.433   | 1600 | 50,214
2017–18 | 22,992 | 5.30   | 14.37  | 0.443   | 1600 | 51,881
2018–19 | 24,575 | 5.96   | 15.30  | 0.464   | 1606 | 53,012
2019–20 | 20,934 | 5.30   | 14.30  | 0.448   | 1464 | 46,758
2020–21 | 23,700 | 6.40   | 18.02  | 0.544   | 1315 | 43,553
Source Basic Port Statistics, Ministry of Ports Shipping and Waterways
4.2 Efficiency Index of Chennai Port

For Chennai port, the operating surplus has gone down from Rs 24,575 Lakhs in 2018–19 to Rs 23,700 Lakhs in 2020–21 (ref Table 2). This decrease can be attributed to the impact of COVID, as it severely affected the metropolitan city. Operating expenses have come down slightly, but the total traffic has come down drastically to 43,553 ('000 tons) in 2020–21 as compared to 53,012 ('000 tons) in 2018–19. Additionally, the number of vessels calling at the port has reduced from 1606 in 2018–19 to 1315 in 2020–21 (ref Table 2). Even though throughput and operating surplus have fallen, there is a quantum jump in the efficiency indices (ref Table 2). It is evident from this that EDI implementation and partial automation have reduced the cost of operation. In the pre-automation era (before 2018–19), the operating surplus shows a mixed trend of up-and-down performance.
4.3 Efficiency Index of Kamarajar Port

Kamarajar port had an operating surplus of Rs 35,452 Lakhs in 2020–21, down from its 2018–19 figure of Rs 52,621 Lakhs. Here too the impact of COVID can be seen, as the port is in close proximity to Chennai, where the effects were harsh.
Table 3 Kamarajar port efficiency indices from 2010 to 2021

Year    | Operating surplus (Rs in Lakhs) | Operating surplus per employee (Rs in Lakhs) | Operating surplus per vessel (Rs in Lakhs) | Operating surplus per 000' tons (Rs in Lakhs) | No. of vessels handled | Total traffic ('000 tons)
2010–11 | 13,468 | 153.05 | 45.97 | 1.223 | 293 | 11,009
2011–12 | 20,442 | 217.47 | 53.10 | 1.367 | 385 | 14,956
2012–13 | 28,075 | 280.75 | 59.11 | 1.570 | 475 | 17,885
2013–14 | 46,356 | 454.47 | 81.47 | 1.696 | 569 | 27,337
2014–15 | 50,766 | 497.71 | 65.17 | 1.678 | 779 | 30,251
2015–16 | 49,762 | 487.86 | 61.21 | 1.545 | 813 | 32,206
2016–17 | 46,935 | 455.68 | 58.60 | 1.563 | 801 | 30,020
2017–18 | 47,541 | 485.11 | 59.88 | 1.561 | 794 | 30,446
2018–19 | 52,621 | 487.23 | 60.00 | 1.525 | 877 | 34,498
2019–20 | 51,325 | 503.19 | 62.06 | 1.617 | 827 | 31,746
2020–21 | 35,452 | 358.10 | 50.22 | 1.369 | 706 | 25,889
Source Basic Port Statistics, Ministry of Ports Shipping and Waterways
Both operating surplus and total traffic have come down in 2020–21 as compared to the levels in 2018–19 (ref Table 3). The number of vessels calling at the port has dropped from 877 in 2018–19 to 706 in 2020–21 (ref Table 3). Here too, throughput and operating surplus have fallen from the 2018–19 levels, but there is a minor jump in the efficiency indices for 2019–20 followed by a fall in 2020–21 (ref Table 3). In this case, EDI implementation and partial automation reduced the cost of operation to a certain level, but the efficiency indices dipped after a year, probably as an impact of COVID. Still, there is a stark difference in the efficiency indices between the pre- and post-automation eras (base year 2018–19), and the trend is generally upwards. Operating surplus has fluctuated since 2010–11 (ref Table 3).
4.4 CAGR—Operating Surplus Comparison Between Three Ports

The operating surplus is one of the important performance indices for measuring port performance. The author calculated the compounded annual growth rate year on year (YoY); the results are depicted in Table 4. The CAGR YoY for all three ports since 2010–11 has been a roller-coaster ride. The question here is whether there is a growth rate after 2018, as automation at these three ports has been in progress since 2018 and EDI was implemented around the same period. Theoretically, the operating surplus should grow in line with operating
Table 4 Operating surplus and CAGR (YoY) of VOC, Chennai, and Kamarajar ports from 2010 to 2021

Year    | VOC operating surplus (Rs in Lakhs) | VOC CAGR (%) | Chennai operating surplus (Rs in Lakhs) | Chennai CAGR (%) | Kamarajar operating surplus (Rs in Lakhs) | Kamarajar CAGR (%)
2010–11 | 14,137 | 0       | 10,418 | 0       | 13,468 | 0
2011–12 | 11,298 | − 20.08 | 6231   | − 40.19 | 20,442 | 51.78
2012–13 | 15,300 | 35.42   | 4409   | − 29.24 | 28,075 | 37.34
2013–14 | 14,567 | − 4.79  | 616    | − 86.03 | 46,356 | 65.11
2014–15 | 23,771 | 63.18   | 13,394 | 2074.35 | 50,766 | 9.51
2015–16 | 16,100 | − 32.27 | − 2032 | NA      | 49,762 | − 1.98
2016–17 | 35,002 | 117.4   | 21,720 | NA      | 46,935 | − 5.68
2017–18 | 37,331 | 6.65    | 22,992 | 5.86    | 47,541 | 1.29
2018–19 | 25,234 | − 32.4  | 24,575 | 6.89    | 52,621 | 10.69
2019–20 | 32,880 | 30.3    | 20,934 | − 14.82 | 51,325 | − 2.46
2020–21 | 32,162 | − 2.18  | 23,700 | 13.21   | 35,452 | − 30.93
Source Basic Port Statistics, Ministry of Ports Shipping and Waterways
income and operating expenses, but COVID-19 played havoc with the growth rate, and hence one can witness a mixed trend in the above index.
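Assuming that the "CAGR (%)" column of Table 4 is the plain year-on-year percentage change in operating surplus (which is consistent with the tabulated values), the figures can be reproduced with a short Java sketch such as the following; the VOC operating-surplus series is taken from Table 4.

```java
// Sketch: reproducing the year-on-year (YoY) growth column of Table 4
// as the percentage change in operating surplus between consecutive
// years. The VOC operating-surplus series is taken from Table 4.
public class YoYGrowth {
    public static void main(String[] args) {
        String[] years = {"2010-11", "2011-12", "2012-13", "2013-14", "2014-15",
                "2015-16", "2016-17", "2017-18", "2018-19", "2019-20", "2020-21"};
        double[] vocSurplus = {14_137, 11_298, 15_300, 14_567, 23_771,
                16_100, 35_002, 37_331, 25_234, 32_880, 32_162};

        for (int i = 1; i < vocSurplus.length; i++) {
            double growth = (vocSurplus[i] - vocSurplus[i - 1]) / vocSurplus[i - 1] * 100.0;
            System.out.printf("%s: %.2f%%%n", years[i], growth);
        }
    }
}
```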
5 Findings

After analysing the above tables, one can see a stark difference in the performance of the ports before and after the implementation of EDI and partial execution of automation. The exercise to modernize and automate the ports started from Dec 2018 onwards. A look at the operating surplus of all three ports from 2010 to 2018 clearly indicates a slow growth rate. VOC port and Chennai port have just about doubled their figures (VOC port: from Rs 14,137 Lakhs to Rs 25,234 Lakhs; Chennai port: from Rs 10,418 Lakhs to Rs 24,575 Lakhs; ref Tables 1 and 2). In the case of Kamarajar port, these figures have quadrupled (i.e., from Rs 13,468 Lakhs in 2010–11 to Rs 52,621 Lakhs in 2018–19; ref Table 3).

With the onset of COVID-19 around 2019–20, the world saw its ripple effects across all sectors, including ports. Despite this setback, we see that the operating surplus for all three ports generally remains the same as or competitive with pre-COVID-19 levels. Especially for VOC port there is an increase in operating surplus, from Rs 25,234 Lakhs in 2018–19 to Rs 32,162 Lakhs in 2020–21. Furthermore, operating surplus per employee has in general seen upward growth for VOC port and Chennai port (the highest growth), ref Tables 1 and 2. Similarly, the operating surplus per vessel shows considerable growth for both these ports. The same is the case with operating surplus per 000' tons, where growth is evident,
particularly for VOC port, where it rose from Rs 0.735 Lakhs in 2018–19 to Rs 1.012 Lakhs in 2020–21 during the pandemic period. For all the indices discussed, the performance of Kamarajar port has dipped in comparison with VOC port and Chennai port. This can be attributed to the lower number of vessels calling at the port and the higher impact of COVID-19. To give some teeth to this argument, we can see that Kamarajar port's efficiency indices for the period 2019–21 are quite competitive when compared with the years 2010–18, meaning that EDI and partial implementation of automation actually helped keep the situation under control rather than letting it go into free fall.

Evaluation of total traffic and vessels handled (years 2019–2021) for all three ports clearly shows a downward trend, suggesting that the ports were doing less business than usual (ref Tables 1, 2 and 3). Yet the operating surplus has generally gone up or remained the same. The reason is that the number of processes has come down, which has led to less paperwork. This in turn has improved the efficiency of the workforce. Port efficiency has also improved tremendously, especially with respect to documentation, which used to take a lot of time. So essentially, we can give credit to EDI and the partial implementation of automation in ports. Even the results of the CAGR analysis for all three ports are on expected lines, i.e., there was generally slow growth from 2010 to 2018 (ref Table 4). Later there is degrowth from 2018–19 onwards (except for Kamarajar port, for the reasons explained earlier), but it is not as pronounced as anticipated, due to EDI and partial automation being in place. Overall, we see that the efficiency indices have generally remained strong, steady, or competitive with their previous performance (years 2010–2018). This is a clear indication that, post-application of EDI and partial implementation of automation, the impact of COVID-19 was restricted or minimized and the efficiency of the ports significantly improved, which led to the indices stabilizing and sometimes bettering.
6 Limitations

Due to paucity of time, secondary data has been used. Data for 2021–2022 was partially available, and so we have excluded it from our study.
7 Conclusion

The scope of the study includes three Indian ports, VOC port, Chennai port, and Kamarajar port, situated in the Indian peninsula. The researcher conducted unstructured interviews with the stakeholders and key port officials. It is also found that the
processing time for documentation and other executive functions was drastically reduced after the application of EDI and partial implementation of automation. The government and policy makers are continuously exploring innovative ideas and technologies to boost the automation and digitalization of ports. One such emerging technology is the enterprise business system (EBS), wherein a software package tracks and controls all the intricate operations related to the business. These systems function as command centers assisting in automation, decision-making, and reporting in businesses, and they are currently being put into service in a phased manner. As of today, five of the twelve major ports have partially employed this technology. Because this system is not yet completely integrated across all mainstream ports, we are not commenting on it for now. Future research can be directed at the performance of ports post-application of EBS. Statistically, COVID-19 has dented the performance of the three ports, but they were able to sustain operations and efficiency owing to the application of EDI and partial implementation of automation. After analyzing the data, one realizes that we are still at an embryonic stage on the path toward complete automation of ports. There is still a long way to go before we see futuristic ports sprouting up along the Indian coastline.
References

1. Molavi A, Lim GJ, Race B (2020) A framework for building a smart port and smart port index. Int J Sustain Transp 14(9):686–700. https://doi.org/10.1080/15568318.2019.1610919
2. Heilig L, Lalla-Ruiz E, Voß S (2017) Digital transformation in maritime ports: analysis and a game theoretic framework. NETNOMICS Econ Res Electron Netw. https://doi.org/10.1007/s11066-017-9122-x
3. Inkinen T, Helminen R, Saarikoski J (2019) Port digitalization with open data: challenges, opportunities, and integrations. J Open Innov Technol Mark Complex 5(2):1–16. https://doi.org/10.3390/joitmc5020030
4. Henríquez R, Martínez de Osés FX, Martínez Marín JE (2022) Technological drivers of seaports' business model innovation: an exploratory case study on the port of Barcelona. Res Transp Bus Manag 100803. https://doi.org/10.1016/j.rtbm.2022.100803
5. Inkinen T, Helminen R, Saarikoski J (2021) Technological trajectories and scenarios in seaport digitalization. Res Transp Bus Manag 41. https://doi.org/10.1016/j.rtbm.2021.100633
6. González-Cancelas N, Molina Serrano B, Soler-Flores F, Camarero-Orive A (2020) Using the SWOT methodology to know the scope of the digitalization of the Spanish ports. Logistics 4(3):20. https://doi.org/10.3390/logistics4030020
7. Jafari N, Azarian M, Yu H (2022) Moving from Industry 4.0 to Industry 5.0: what are the implications for smart logistics? Logistics 6(2):26. https://doi.org/10.3390/logistics6020026
8. Molavi A, Shi J, Wu Y, Lim GJ (2020) Enabling smart ports through the integration of microgrids: a two-stage stochastic programming approach. Appl Energy 258. https://doi.org/10.1016/j.apenergy.2019.114022
9. Yang Y, Zhong M, Yao H, Yu F, Fu X, Postolache O (2018) Internet of things for smart ports: technologies and challenges. IEEE Instrum Meas Mag. https://doi.org/10.1109/MIM.2018.8278808
10. Ortiz G et al (2022) A microservice architecture for real-time IoT data processing: a reusable web of things approach for smart ports. Comput Stand Interfaces 81. https://doi.org/10.1016/j.csi.2021.103604
11. Ahmad RW, Hasan H, Jayaraman R, Salah K, Omar M (2021) Blockchain applications and architectures for port operations and logistics management. Res Transp Bus Manag 41. https://doi.org/10.1016/j.rtbm.2021.100620
12. Huseien GF, Shah KW (2022) A review on 5G technology for smart energy management and smart buildings in Singapore. Energy AI 7. https://doi.org/10.1016/j.egyai.2021.100116
13. Yau KLA, Peng S, Qadir J, Low YC, Ling MH (2020) Towards smart port infrastructures: enhancing port activities using information and communications technology. IEEE Access 8:83387–83404. https://doi.org/10.1109/ACCESS.2020.2990961
14. Brunila OP, Kunnaala-Hyrkki V, Inkinen T (2021) Hindrances in port digitalization? Identifying problems in adoption and implementation. Eur Transp Res Rev 13(1). https://doi.org/10.1186/s12544-021-00523-0
15. Wang K, Hu Q, Zhou M, Zun Z, Qian X (2021) Multi-aspect applications and development challenges of digital twin-driven management in global smart ports. Case Stud Transp Policy 9(3):1298–1312. https://doi.org/10.1016/j.cstp.2021.06.014
SmartTour: A Blockchain-Based Smart Tourism Platform Using Improvised SHA C. L. Pooja and B. N. Shankar Gowda
Abstract Technological solutions are immediately needed in the conventional tourism sector to reduce costs and increase productivity. Smart tourism offers a reliable framework to connect tourism firms and visitors using blockchain technology, an innovation that has the potential to revolutionize the travel and tourism market. Earlier blockchain-based tourism systems, unfortunately, are either hypothetical or incapable of addressing the core issues facing the travel industry. In this paper, we propose SmartTour, a smart tourism application built on blockchain technology with an improvised SHA, together with a working concept, to deal with these issues. SmartTour's general system structure is specifically created to establish a reliable connection between visitors and their destinations. Additionally, the visitor can give feedback only once, after checking in at the attraction using a one-time password (OTP).

Keywords Blockchain · Tourism · SmartTour
1 Introduction

The travel and tourism sector contributes significantly to both global GDP and citizens' everyday lives. The World Travel and Tourism Council's economic impact study claims that the direct and indirect contribution of the travel and tourism sector accounted for 8.9 trillion USD (10.3%) of global GDP during 2019. The travel and tourism sector makes up more than half of GDP in certain nations and areas, including Macau, the Seychelles, and the Maldives. In addition, this sector accounts for one out of every ten jobs worldwide (about 330 million at the time). These numbers provide compelling confirmation of the sector's enormous significance. Despite this enormous relevance, the conventional tourism sector has been confronted with significant difficulties in improving itself.
C. L. Pooja (B) · B. N. Shankar Gowda Bangalore Institute of Technology, Bengaluru, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_9
Despite its significance, the everyday tourism industry has been facing serious challenges in its development. First, tourist routes are chosen from existing practices and are not improvised, which results in incomplete discovery of the attractions, and it is difficult to motivate tourists to participate in newly established events. Second, there are no rewards or encouraging factors for tourists to keep visiting the attractions. Finally, new events are difficult to promote: newly established events are less attractive to tourists compared with long-standing ones, even when they are well designed. During the special period of the COVID-19 epidemic, the challenges within the tourism industry became quite severe. The conventional tourism industry needs to be reformed urgently.
2 Related Work

The use of modern BC innovations has clearly demonstrated the potential to spark an uprising throughout the tourism field. In order to increase trade-dealing ability and provide various traveler services, this paper recommends a multi-chain BC technology. With regard to knowledge and security, the suggested multi-chain structure combines an open chain and a private chain for diverse content forms. The private network retains sensitive material, including customer info and orders, whereas the open network retains public info such as asset info. A situational analysis of hotel bookings done online was performed to prove the utility and comfort of the setup [1].

This paper examines a currently well-known topic, namely BC innovation. It focuses on ways BC might eventually be used in the ongoing Russian travel sector initiative TravelChain, which likewise aspires to create a new tourist ecological network. Using the data and details that are readily available, the paper analyzes the causes of expenditures and the options for reducing costs in the tourist economy using BC [2].

The next work uses the example of the Rijeka company phase to demonstrate how the BC technique is used in the Croatian travel sector. The limitations are that Croatia had once more been at the forefront of the worldwide BC partnership until Oct 2019, so, without a doubt, transport companies will make use of the organization's capabilities during the upcoming vacation period. There is currently a lack of evidence and usage of the latest Internet 3.0 for BC innovation in Macedonia's travel sector [3].

Work on BC innovation in the tourism business is still in its early stages. Regarding BC and its prospective impacts on the transport business, namely on the development of the tourist company's economy, this useful paper hopes to stimulate academic discussion. The implementation of distributed ledgers was also detailed in the study. Smart contracts, DAPPs, and electronic cash were proposed as possible means of implementing BC development [4].
ongoing info analysis. Additionally, the material may be very difficult to understand. When these pertinent facts are approved under BC key and time-stamped, rather than eliminating data sources, they could be diffused. The principle of secure, lawful, and distributed info is likely to be closer to the goal of interconnectivity, especially if it leans towards the platforms management contract principle which requires logical compatibility [5]. BIDaaS, or BC identification as a Service, has been developed for identification monitoring. In this system, a 3rd-party verifies an individual’s identity just once, and the organizations using the facility can utilize the results. Not all solutions require the client to submit legal documentation. Single verified outcome could be applied to several locations, speeding up the procedure. If enough companies adopt this offering, it will be beneficial [6]. The BC technology can be employed as a reward/loyalty system. This concept proposes a universal reward/loyalty point which could be used for all the related businesses instead of having one for each brand/company. It is beneficial as the use of the accumulated points can be used across the brand/company making the word reward more meaningful [7, 8]. The distributed technology used in BC can be employed for classification of hotels based on the images uploaded in the official website and verifying and classifying them based on the user reviews [9]. Similarly, this concept can be applied for the rating of the articles or objects [10]. The BC technology can be used to verify the documents issued by the government to protect privacy while the authorities access the documents for verification from the database [11]. This technology can also be used to establish service level agreements which are scalable and ensure the commitment of the service providers by establishing the SLA prior to the tourists to see [12].
3 Problem Statement

There are several websites for making attraction/place bookings online which offer discounts and reward points, but none of them guarantees the trustworthiness or reliability of its sources. The reviews provided on these websites could be paid or fake, an issue which goes unnoticed. These issues can be addressed by developing a tourism platform that provides a secure and trustworthy way of booking a ticket to attractions/places, verifies the blocks in a blockchain, and collects reliable reviews of the attractions.
4 Experimental Setup

The implementation of this proposal is actualized using a user-defined blockchain in Java. The blockchain is integrated with a tourism web application to provide trust. The setup is depicted in Fig. 1. The admin sets up a user-defined permissioned blockchain
on which he has complete control to authorize. The requirements for this system are a computer, data storage, and a browser on the Windows platform. Each entity's interaction with the system is depicted with the help of the directional arrows.

• The admin is in charge of updating attraction details in the application and of blockchain management.
• The tourist browses the attractions and makes a booking, which is confirmed via email.
• The tourist has to check in at the attraction by providing the credentials and OTP received via email.
• The check-in details are added to a block and a new hash is generated to merge it into the blockchain.
• A block is created for each check-in by a tourist; the block contains the tourist details such as the date and booking details.
• The tourist can leave feedback about the attraction only after he has checked in.
Fig. 1 Experimental setup of the SmartTour system
Fig. 2 Blockchain structure
• The feedback is updated in the web application for the other tourists to see.
• The admin can add an auditor to verify the authenticity of a block in case of need.
• The admin sends a request to the auditor to verify the block, and the auditor verifies it as specified in the algorithm of Sect. 7.4.
5 Blockchain

Blockchain (BC) is a decentralized, distributed, and open electronic register which is used to log transactions across numerous systems in a manner that prevents changes from being made retrospectively without affecting every subsequent block and the network's consensus. This makes it possible for users to audit and certify activities at low cost. The term "blockchain" refers to a continuous series of records, or "blocks," that are connected cryptographically. Every block includes a timestamp, a digital hash of the preceding block, plus the transactional details (as a Merkle tree root hash). A block is made up of a body and a header. The hashed value, produced by the SHA technique, reflects the transactional details and makes up the body of the block. The prior SHA code, the current SHA code, the time, and a random number (nonce) make up the header of the block, as seen in Fig. 2.
6 Architecture

A BC-enabled smart tourism system called SmartTour is designed for this scheme. The SmartTour design specification is specifically intended to connect visitors and sites in a reliable manner, and visitors can give feedback using an OTP only once they have visited the booked place. Tourist activities and privacy have to be safeguarded by the SmartTour system. When a tourist enters a tourist place, the check-in details are important and have to be stored in a secure way for future verification without being altered; these details are stored in the blockchain storage, for which the block creation algorithm is used.
The proposed system architecture is shown in Fig. 3. There are three actors in this system: admin, auditor, and tourist. The system manager is the super user or admin user who maintains the blockchain storage settings and creates data owners, each of whom has their own encryption and decryption keys. The admin can add places and view the check-in details. The tourist first registers and then logs in; tourists can see the places with their ratings. The tourist can book a place, and the ticket details are e-mailed. The tourist can check in by entering valid credentials and the OTP, and this transaction information is stored in the blockchain.

All information is encrypted using the AES method prior to being transferred to a cloud service provider (CSP) and stored in a ledger. The body of the block, which comprises the transactional info, is combined with the block header during the BC mining phase to form a block with a specific number. Each of these records is kept in a directory and subsequently transferred to storage blocks in a BC. After the tourist checks in, the tourist can give feedback by logging in on the user side and entering the correct OTP. The check-in details and the feedback can be seen by the admin. The admin sends a verification request for a particular record to the auditor. The auditor performs the verification, in which the hash code is examined: comparing the previous and present hash codes, the auditor reports whether the verification is successful or not. In this way, the verification of check-in details is achieved using the blockchain process.
Fig. 3 System architecture
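The architecture above states that all information is encrypted with AES before being sent to cloud storage. As a rough sketch of that step, the following Java snippet encrypts a check-in record with AES; the GCM mode, key size, and the field layout of the check-in string are illustrative assumptions, since the paper does not specify them.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

// Sketch: AES encryption of check-in details before they are written to
// block storage, as described in the architecture. AES-GCM and the key
// handling shown here are illustrative choices; the paper does not
// specify the AES mode or the key-management scheme.
public class CheckInEncryption {
    public static void main(String[] args) throws Exception {
        String checkInDetails = "tourist=T101;attraction=A7;date=2022-08-15"; // hypothetical record

        // Generate a 128-bit AES key (in SmartTour this would belong to a data owner).
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey key = keyGen.generateKey();

        // Random 12-byte IV, required for AES-GCM.
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal(checkInDetails.getBytes(StandardCharsets.UTF_8));

        System.out.println("Encrypted block body: " + Base64.getEncoder().encodeToString(ciphertext));
    }
}
```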
7 Algorithms

Three algorithms have been proposed in SmartTour. First is the SHA algorithm, next is the block creation algorithm, and last is the block verification algorithm. The description and steps for each algorithm are given below.
7.1 Improvised Secure Hash Algorithm (SHA)

The secure hash algorithm, often known as SHA, is a cryptographic hashing mechanism that processes an input and, in its original form (SHA-1), generates a hash that is 160 bits long. The phrase "message digest" refers to this hashed code, which is typically rendered as a 40-digit hexadecimal number. It was created by the US National Security Agency (NSA) and is a U.S. Federal Information Processing Standard. Figure 4 shows a simplified working of the SHA-256 algorithm. SHA-256 is used here; it is one of the strongest hashing algorithms, a successor of SHA-1 belonging to the SHA-2 family made available in 2001 by the NSA. SHA-256 is not much more difficult to implement than SHA-1, and it has not yet been broken in any way. It is an excellent companion function for AES because of its 256-bit output. These hashes are built using the Davies–Meyer construction from a sophisticated block cipher and the Merkle–Damgard construction from a one-way compression function. SHA-256 generates an almost-unique 256-bit (32-byte) signature for a text. The algorithm has two important steps: pre-processing and hash computation. The improvised version checks for an empty message and discards it if it is null, and the inclusion of a hash of a random prime number increases the complexity and makes the digest non-traceable, which makes the algorithm more secure (a minimal Java sketch of this improvisation follows the steps in Sect. 7.2).

The block creation process is explained by the algorithm in Sect. 7.3. The input taken is the check-in details entered by the tourist at the attraction. The hash is generated by the SHA-256 algorithm from the header containing the previous block hash value, the hash of the current block, the timestamp, and the nonce. Once the data is ready, it is encrypted, and the block is generated. The header and body are merged to form a block.
Fig. 4 SHA working
7.2 Improvised Secure Hash Algorithm
Step 1: Pre-processing
• Convert the message to binary bits if it is not null.
• Pad the message so that its length is 64 bits less than a multiple of 512 bits.
• Padding starts with a 1 bit followed by 0's appended at the end of the message.
• Fill the remaining 64 bits with the length of the original unpadded message, taken modulo 2^64.
• Initialise 8 buffers (working variables) with fixed initialization values.
• An array K of 64 constants (K[0..63]) is initialised with fixed hexadecimal values.
• Divide the padded message into equal blocks W[i] of 512 bits each.

Step 2: Hash computation
• Each 512-bit block goes through 64 rounds of XOR-based mixing operations.
• The value (W[i] + K[i]) is computed and merged with R[i] by an XOR operation, where R[i] is the hash of a random prime number.
• The output of each block is provided as the input for the next block.
• The process is repeated until every 512-bit block has been processed.
• The final digest obtained is the 256-bit hash.
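A minimal Java sketch of the improvisation described above (the empty-message check and the mixing-in of a hash of a random prime) is given below. It is built on the JDK's standard SHA-256 implementation rather than a from-scratch version of the padding and 64-round compression, and in a real deployment the random prime (or its hash) would have to be stored with the block so that the digest can be recomputed during verification.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;

// Sketch: the two "improvisations" described above (discard empty
// messages, mix a hash of a random prime into the digest), layered on
// the JDK's standard SHA-256 rather than a from-scratch implementation
// of the padding and 64-round compression.
public class ImprovisedSha {
    public static byte[] digest(String message) throws Exception {
        if (message == null || message.isEmpty()) {
            throw new IllegalArgumentException("Empty message discarded");
        }
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        byte[] messageHash = sha256.digest(message.getBytes(StandardCharsets.UTF_8));

        // Hash of a random 64-bit prime, R[i] in the algorithm description.
        BigInteger prime = BigInteger.probablePrime(64, new SecureRandom());
        byte[] primeHash = sha256.digest(prime.toByteArray());

        // XOR the two digests to form the final 256-bit value.
        byte[] out = new byte[messageHash.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (messageHash[i] ^ primeHash[i]);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        byte[] d = digest("check-in:T101:A7:2022-08-15");   // hypothetical check-in record
        StringBuilder hex = new StringBuilder();
        for (byte b : d) hex.append(String.format("%02x", b));
        System.out.println(hex);
    }
}
```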
7.3 Block Creation Algorithm

Once the tourist performs the check-in, the details are taken as the block message. The block creation starts by checking the transactions in the upload queue.

Block Creation Process Algorithm
Input: Tourist Check-In Details
Output: Block
User: Tourist
1 if N transactions are added in the upload queue
2   Generate the Root-Hash code from the data
3   Fetch Previous Block Hash (PBV)
4   Form the header content as PBV-RH-Timestamp-Nonce
5   Encrypt the data and create the Block Body
6   Merge the Header and Body to create the block
7 else
8   Wait and check again
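One possible Java rendering of these block-creation steps is sketched below. The header layout (previous hash, root hash, timestamp, nonce) follows Sect. 5; the flat hash used as the root hash, the placeholder previous-hash value, and the fixed nonce are simplifications, and the queue handling, AES encryption of the body, and upload to storage are omitted.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.time.Instant;
import java.util.List;

// Sketch: one way the block-creation steps above could look in Java.
// A flat SHA-256 hash stands in for the Merkle root, and the previous
// hash and nonce are placeholders; encryption and storage are omitted.
public class BlockCreation {
    static String sha256Hex(String input) throws Exception {
        byte[] d = MessageDigest.getInstance("SHA-256")
                .digest(input.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : d) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    // Root hash over the queued check-in transactions (a flat hash is used
    // here instead of a full Merkle tree, purely for brevity).
    static String rootHash(List<String> transactions) throws Exception {
        return sha256Hex(String.join("|", transactions));
    }

    public static void main(String[] args) throws Exception {
        List<String> checkIns = List.of("T101;A7;2022-08-15", "T102;A7;2022-08-15");
        String previousBlockHash = "0000000000000000";   // hypothetical PBV
        long timestamp = Instant.now().getEpochSecond();
        long nonce = 42;                                  // placeholder nonce

        String header = previousBlockHash + "-" + rootHash(checkIns)
                + "-" + timestamp + "-" + nonce;
        String blockHash = sha256Hex(header);

        System.out.println("New block hash: " + blockHash);
    }
}
```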
7.4 Block Verification Process Algorithm

The auditor verifies whether all the blocks in the blockchain are intact or not. For verification, the following algorithm is used. On request from the admin for verification of a block, the Block ID is sent, the block details are fetched, and the data is extracted.

Block Verification Process Algorithm
Input: Block ID
Output: Status of the Block
User: Auditor
1 If number of Blocks (N) > 0 then
2   List the Block details
3   Fetch the block to be tested and extract its data
4   From the transaction data generate Hash (1)
5   Fetch the Previous Block Hash
6   Compare both hashes and show the result
7 else
8   Display "no block to verify"
In the algorithm, the current hash is recomputed and the previous block hash is fetched; the two values are compared and the result is displayed.
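A simplified Java view of this auditor-side check is sketched below: a block is reported as intact only if the hash recomputed from its stored content matches the recorded hash and its previous-hash field matches the hash of the preceding block. The Block record here is a stand-in for the stored ledger entry, not the actual SmartTour data model.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Sketch: the auditor-side check from the algorithm above. A block is
// taken as tampered if the hash recomputed from its stored content no
// longer equals the recorded hash, or if its "previous hash" field does
// not match the hash of the preceding block.
public class BlockVerification {
    record Block(String previousHash, String content, String storedHash) {}

    static String sha256Hex(String input) throws Exception {
        byte[] d = MessageDigest.getInstance("SHA-256")
                .digest(input.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : d) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    static boolean verify(Block previous, Block current) throws Exception {
        boolean contentIntact = sha256Hex(current.content()).equals(current.storedHash());
        boolean linkIntact = current.previousHash().equals(previous.storedHash());
        return contentIntact && linkIntact;
    }

    public static void main(String[] args) throws Exception {
        Block b1 = new Block("0", "genesis", sha256Hex("genesis"));
        Block b2 = new Block(b1.storedHash(), "T101;A7;2022-08-15",
                sha256Hex("T101;A7;2022-08-15"));
        System.out.println("Block verified: " + verify(b1, b2));
    }
}
```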
8 Result and Analysis

The system is deployed and tested for the blockchain formation as well as the proper working of the web application. The mandatory requirements for the working of the web application are tested using various testing methodologies by providing various sets of inputs for each of the modules. Expected functionality is achieved as it prompts appropriate messages for the set of inputs provided.
8.1 Reliability Analysis

The SmartTour system is monitored by the system manager, who controls access to all data in the system. The auditor can perform verification only if he is authorized by the admin. The conditions listed in Table 2 check for a genuine tourist who has actually visited the place; hence, only such a tourist can provide feedback, making the reviews trustworthy and relatable. The admin has to create the auditor account for the auditor to log in; this is explained in Table 3, and the authorization involves several steps. The results of the system tests are summarized in the tables below. Table 1 shows the cases where the authenticity of the tourist is ensured, so that the visiting person is verified (see also Tables 2 and 3).

Table 1 Check-in by tourist to add data to the block
Tourist action                                               | Check-in | Remark
Receives booking details with check-in credentials with OTP | Yes      | Can check-in successfully
Does not receive check-in credential                         | No       | Cannot visit the attraction
Tries to check-in on a different date                        | No       | Check-in window is not active
Table 2 Tourist providing the feedback for the visited attraction

Tourist action                               | Feedback | Remark
Provide feedback after check-in with OTP     | Yes      | Feedback submitted successfully
Provide feedback without check-in            | No       | Feedback is not accepted as check-in is not performed at the attraction
Provide feedback after check-in with no OTP  | No       | Feedback is not accepted as OTP is not provided
Table 3 Auditor login feasibility in various conditions

Admin action                                      | Auditor log-in | Remark
Adds auditor profile and sends email with details | Yes            | Auditor can successfully login
Does not add the auditor profile                  | No             | Fails to login due to being not authorized
Deletes the auditor profile                       | No             | Unable to login due to denial of permission from admin
Table 4 Conditions for verifying the block in blockchain

Admin | Auditor verify | Remark
Sends the block verification request with Block ID | Yes | Auditor fetches the block details and verifies
Does not send the request | No | Cannot fetch the block details
One of the main functionalities of this system is the verification of blocks to check whether they have been tampered with. This action is performed by the auditor on request from the admin. Since the admin controls this action, the conditions in Table 4 check whether the control flow can be compromised.
8.2 Performance Analysis

The blockchain formation is tested based on the uploading and downloading times required for the blockchain. The results are summarized in Fig. 5. As can be observed, block uploading consumes more time than block downloading.
Fig. 5 Vertical bar graph for block uploading and downloading
Figure 6 depicts the time taken to verify a block, which involves comparing the hash values of the current and previous blocks. The encryption time required by the various algorithms (RSA, AES, SHA-256, and IM-SHA) is depicted in Fig. 7.
Fig. 6 Graph for block verification process
Fig. 7 Encryption time required for RSA, AES, SHA-256 and IM-SHA for different file size
8.3 Security Analysis

The system uses blockchain, which is a secure and trustworthy technology. Security is further enhanced by the use of an improved SHA algorithm, which discards transactions with empty messages and prevents the generation of empty blocks, protecting the system against unnecessary block creation. The addition of the hash of a random prime makes the generated hash even harder to decipher, giving a further, if modest, increase in security. The data stored in the blocks is encrypted using the AES algorithm, which ensures that the stored data is not easily accessible. Together, the encryption of block data and the improvised SHA add two layers of security, making the system more secure than existing ones.
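For illustration, a hedged sketch of encrypting a block body with AES using the pycryptodome package; the paper does not specify the mode of operation or key management, so the AES-GCM mode and the randomly generated key below are assumptions.

```python
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes

key = get_random_bytes(16)                      # 128-bit AES key (key management is out of scope here)
block_body = b'{"tourist_id": "T001", "feedback": "Great visit"}'

# Encrypt the block body before it is stored in the block.
cipher = AES.new(key, AES.MODE_GCM)
ciphertext, tag = cipher.encrypt_and_digest(block_body)
stored = {"nonce": cipher.nonce, "ciphertext": ciphertext, "tag": tag}

# Decrypt and verify integrity when the data is read back.
plain = AES.new(key, AES.MODE_GCM, nonce=stored["nonce"]).decrypt_and_verify(
    stored["ciphertext"], stored["tag"])
assert plain == block_body
```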
9 Conclusion

In this work, we propose SmartTour, a blockchain-enabled system for smart tourism that connects travel agencies with travelers to boost the effectiveness of the travel sector. To lessen the load from the visitors' standpoint, SmartTour adopts a tiered network design that includes the construction of stable ledger units as well as lighter variants. Additionally, an agreement mechanism, a "certificate of involvement," is created to validate the exchange of place details with star ratings, booking details, check-in data, and feedback given by visitors. Finally, a SmartTour prototype is implemented and numerous tests are run. DriveHQ, a cloud storage service, is employed in this work to store the transaction details in the form of a digital ledger.
Detection of Starch in Turmeric Using Machine Learning Methods Madhusudan G. Lanjewar , Rajesh K. Parate, Rupesh Wakodikar, and Jivan S. Parab
Abstract Detecting adulterants in turmeric is necessary because turmeric is a vital food constituent that adds color and flavor. In this work, pure turmeric powders were mixed with starch to produce distinct concentrations (0, 10, 20, and 30%) (w/w). Reflectance spectra of the pure turmeric and starch-contaminated samples were recorded by visible-NIR spectroscopy in the wavelength range 400–1700 nm. The recorded spectra were preprocessed using a Savitzky–Golay filter and a second derivative with a polynomial order of 2. The preprocessed spectra were then standardized and used to train and validate ML models. Three ML models were employed for classification: logistic regression (LR), K-nearest neighbor (KNN), and support vector machines (SVM). The LR and KNN obtained 100% accuracy, precision, recall, and F1-score, while SVM obtained 90% accuracy, 92% precision, 94% recall, and 91% F1-score. The performance of these models was further tested with the stratified five-fold method, in which the KNN model obtained the highest average accuracy of 92%, which is excellent compared to the other models. Keywords Starch · Turmeric · LR · KNN · SVM
M. G. Lanjewar · J. S. Parab (B) School of Physical and Applied Sciences, Goa University, Taleigao, Goa 403206, India e-mail: [email protected] M. G. Lanjewar e-mail: [email protected] R. K. Parate Department of Electronics, S. K. Porwal College, Kamptee, Maharashtra 441002, India R. Wakodikar Department of Electronics, Nevjabai Hitkarini College, Bramhapuri, Maharashtra, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_10
1 Introduction

Curcuma longa is another name for turmeric. Its tuberous, subterranean rhizomes have been employed since antiquity as a spice, a dye, and an aromatic enhancer in a variety of remedies [1]. Turmeric is a popular spice manufactured in India that accounts for about 80% of global usage [2]. Turmeric and curcumin have long been utilized to treat many diseases [3]. As a result of its prominence, turmeric is in high demand in the global market. Unfortunately, the purity of turmeric powder has suffered substantially because adulterants are regularly added for illegitimate profit. Several traditional approaches have been published for the identification of contaminants in turmeric [4–6], and several analytical techniques have been used to identify chemicals and chemical pollutants in turmeric [7–9]. To attain a greater economic advantage, cheaper adulterants such as starch are added to turmeric, but the purity suffers as a result, and a decreased curcumin content deprives the consumer of turmeric's intended nutritional value. Currently, marketed spices are often purposely blended with starch to increase the quantity and save cost, particularly in the unorganized sector, and such adulteration cannot be spotted visually. Furthermore, the presence of starch reduces the bioactivity of turmeric. To detect starch in turmeric, analytical techniques such as high-pressure liquid chromatography, microscopic studies, and thin-layer chromatography are used; these are time-consuming, expensive to operate, and require specialized staff.

ML algorithms learn relationships in data and make decisions without explicit programming [10]. ML-based classification technologies are powerful tools for creating accurate and dependable classifiers. Algorithms such as decision trees (DT), random forest (RF), artificial neural networks (ANN), genetic algorithms, SVM, and fuzzy logic are examples of such approaches. There are also sophisticated algorithms for training different ML models, adjusting complex input–output mappings, and selecting and eliminating valuable characteristics [11].
2 Literature Review

To validate the need for feature selection, de Macêdo et al. [12] estimated the starch content of turmeric samples from DRIFT spectra using a partial least squares (PLS) model with the complete spectral data and a model with variables selected by a genetic algorithm (GA-PLS). In GA-PLS, the subset with the best RMSEP values, containing 15 variables, was chosen, and the predictions made with GA-PLS for the test set demonstrated linearity (R2 = 0.996). Starch contamination in Konjac Glucomannan was detected in [13], where a root mean square error (RMSE) of prediction of 4.890 was reported by the PLSR regression approach employing Fourier transform near-infrared (FT-NIR) spectroscopy. In addition, starch was identified by FT-NIR in onion powder, with an RMSE of prediction of 1.18% [14] using the PLSR approach. Another method used to identify various starch
malpractices was one-class partial least squares classification; however, no precise statistical analysis was performed [15]. Kar et al. [16] detected starch in turmeric powder using FT-NIR. Pure turmeric powders were mixed with starch to create starch-adulterated turmeric samples in various percentages (1–30% w/w), and FT-NIR spectroscopy was used to capture the reflectance spectra of 224 samples. Principal component analysis (PCA) was used for exploratory data analysis. The variable importance in projection (VIP) approach was used to pick the starch-related peaks, which were then investigated using the original reflectance spectra, first-derivative spectra, PCA, and the coefficients plot of the PLSR model. The R2 and RMSE of the PLSR models were determined to be 0.91–0.99 and 0.23–1.3, respectively. Ranjan et al. [17] proposed a NIR spectroscopic strategy with a chemometric technique to detect Metanil yellow, Sudan dye-IV, and cornstarch powder with 99% accuracy. Thangavel and Dhivya [18] proposed FT-NIR with PLS regression for the rapid recognition of starch, curcumin, and moisture contents in turmeric. The starch was recognized with an RMSECV of 0.076% and an R2 of 0.968.
3 Hypothesis

In this study, unadulterated turmeric powder was mixed with starch to obtain varying concentrations. Reflectance spectra of pure turmeric and starch-mixed specimens were recorded using visible-NIR spectroscopy (400–1700 nm). The spectra were preprocessed with a Savitzky–Golay filter and a second derivative with a polynomial order of two, and then standardized to improve the performance of the models. These preprocessed, standardized data were used to train and validate ML models (LR, KNN, and SVM). These models perform very well, with the LR and KNN models achieving 100% accuracy. To test the robustness of these models, the stratified K-fold approach was used to assess their performance; after this evaluation, the KNN model performed best compared to the other models.
4 Methodology

Figure 1 shows the ML-based framework used to identify the starch concentration in turmeric powder. The turmeric was mixed with starch at 0, 10, 20, and 30% concentrations. The spectra of the mixed samples were recorded using a spectrophotometer in the range of 400–1700 nm. These spectra were then preprocessed with a Savitzky–Golay filter and a second derivative with a polynomial order of 2. The preprocessed spectra were then standardized and used to train and validate ML models, namely LR, KNN, and SVM, for classification. Finally, the outcome of these models was tested using a confusion matrix and K-fold cross-validation.
Fig. 1 Proposed framework
4.1 Sample Preparation, Data Acquisition, and Pre-processing

The turmeric roots were collected, cleaned, dried at 65 °C, and crushed. To generate the dataset, the pure sample was mixed with starch in variable concentrations. A vendor-provided analytical-grade starch (Thermo Fisher Scientific India Pvt Ltd.) was used. The sample and the adulterant were weighed using an electronic balance (model MAB 250, WENSAR). Because spectroscopic measurements are affected by moisture content, the samples were pre-heated at 60 °C for 1 h to reduce the moisture content. The turmeric powder was adulterated with four different concentrations of starch: 0, 10, 20, and 30% (w/w). The adulterated sample was separated into four portions, and the spectra of these four samples were acquired. The dataset included three repeats of the four different sample concentrations mixed with starch. As a result, 48 sample sets were generated: 4 (concentrations) × 4 (replicas) × 3 (repetitions). Eighty percent of the 48 spectra were used for model training, while the remaining 20% were used for validation. Figure 2a depicts the recorded spectra. A Savitzky–Golay filter is used to denoise the acquired spectra (400–1700 nm) of the starch-adulterated and pure turmeric samples. A Savitzky–Golay second-derivative estimate with a 99-point window and a second-order polynomial was applied to the original spectra; the resulting spectra are shown in Fig. 2b. Figure 2c shows the complete flow diagram.
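A minimal sketch of the preprocessing described above, assuming the spectra are stored as rows of a NumPy array; the window length (99), polynomial order (2), and second derivative follow the text, while the file and array names are placeholders.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# spectra: (n_samples, n_wavelengths) reflectance matrix; labels: starch class (0, 10, 20, 30 %).
spectra = np.load("turmeric_spectra.npy")      # placeholder file name
labels = np.load("turmeric_labels.npy")

# Savitzky-Golay smoothing: 99-point window, 2nd-order polynomial, 2nd derivative, along each spectrum.
deriv2 = savgol_filter(spectra, window_length=99, polyorder=2, deriv=2, axis=1)

# 80:20 split, then standardization fitted on the training set only.
X_train, X_val, y_train, y_val = train_test_split(
    deriv2, labels, test_size=0.2, stratify=labels, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_val = scaler.transform(X_train), scaler.transform(X_val)
```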
4.2 Machine Learning Algorithms

Three ML techniques (LR, KNN, and SVM) were explored to determine the starch concentration in turmeric. LR is built on standard statistical approaches for identifying the determinants of a binary or multiclass outcome. KNN is a non-parametric pattern recognition technique that can be used for classification and prediction [19–21]; the number of neighbors K and the Euclidean distance (d) from a given point are used to classify objects. SVM, in turn, organizes data by building a hyperplane that efficiently distinguishes between groups by maximizing the margin, eventually resulting in a hyperplane-bounded region with the largest possible margin [20]. The purpose of SVM is to maximize the difference (margin) among sets of data; it may also be applied to nonlinear data as a linear approach by transforming the
Fig. 2 a Recorded spectra (400–1700 nm) of pure turmeric and adulterated turmeric with starch, b spectra after applying Savitzky–Golay filter, and c complete flow diagram
relevant data into a different dimension. Rashidi et al. [22] explained these models in detail with graphical representation.
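A minimal scikit-learn sketch of fitting the three classifiers discussed above on the standardized spectra; the hyperparameters are illustrative defaults, not values reported by the authors, and X_train, y_train, X_val, y_val come from the preprocessing sketch earlier.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

models = {
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
    "SVM": SVC(kernel="rbf"),
}

for name, model in models.items():
    model.fit(X_train, y_train)                               # train on the 80% split
    print(name, "validation accuracy:", model.score(X_val, y_val))
```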
5 Results and Discussion

This part involves testing the starch classification system using the three ML classifiers. The preprocessed spectra were divided into training and validation datasets in an 80:20 ratio, and the StandardScaler method was employed on these preprocessed spectra for data normalization. The models were trained using these values, and their performance was assessed. Finally, the results obtained from the three ML models were examined. A confusion matrix and K-fold cross-validation were used to evaluate the models' efficiency. All models share the same dataset and proportions. The models were tested on the training dataset before being validated on the validation dataset.
5.1 ML Cross-Validation Using Confusion Matrix

The confusion matrix was utilized to analyze the quality of every ML algorithm. The matrix compares the actual values to the values forecasted by the model [1, 23]:

$\mathrm{Accuracy}_{st} = \dfrac{TP_{st} + TN_{st}}{TP_{st} + FP_{st} + TN_{st} + FN_{st}}$   (1)

$\mathrm{Precision}_{st} = \dfrac{TP_{st}}{TP_{st} + FP_{st}}$   (2)

$\mathrm{Recall}_{st} = \dfrac{TP_{st}}{TP_{st} + FN_{st}}$   (3)

$\mathrm{F1\text{-}score}_{st} = \dfrac{2}{\frac{1}{\mathrm{Recall}_{st}} + \frac{1}{\mathrm{Precision}_{st}}}$   (4)
Figure 3 depicts the confusion matrix of all models. In the confusion matrix, "0" indicates pure turmeric, "1" means 10% adulterated turmeric, "2" shows 20% adulterated turmeric, and "3" indicates 30% adulterated turmeric. The LR, KNN, and SVM models successfully predicted 10, 10, and 9 cases out of a possible 10. The LR and KNN models correctly predict all cases, whereas the SVM model predicts nine cases correctly and only one incorrectly. This demonstrates that the LR and KNN models outperform the SVM model.
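A short sketch of computing the confusion matrix and the metrics in Eqs. (1)-(4) with scikit-learn; y_val, X_val, and the fitted models dictionary are assumed to come from the earlier sketches.

```python
from sklearn.metrics import classification_report, confusion_matrix

for name, model in models.items():
    y_pred = model.predict(X_val)
    print(name)
    print(confusion_matrix(y_val, y_pred))                 # rows: actual class, columns: predicted class
    print(classification_report(y_val, y_pred, digits=3))  # accuracy, precision, recall, F1-score per class
```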
Fig. 3 Confusion matrix of a LR, b KNN, and c SVM
Figure 4 depicts bar charts of four key metrics for all classifiers: accuracy, precision, recall, and F1-score. The LR and KNN models scored the maximum accuracy, precision, recall, and F1-score of 100%, while SVM achieved 90% accuracy, 92% precision, 94% recall, and a 91% F1-score. Compared to the SVM, the LR and KNN models perform admirably, which demonstrates that they can accurately classify the starch concentration in turmeric powder. The performance of the KNN and LR models is excellent, but the robustness of these models should be checked with cross-validation techniques such as K-fold cross-validation, a data partitioning approach that allows datasets to be used efficiently to develop a more generalized model. Any type of ML aims to generate a more generalized model that performs well on unknown data; a model built on training data with 100% accuracy or zero error may still fail to generalize to unseen data [24]. In this work, the robustness of these models was checked with a stratified K-fold method with K equal to 5.
Fig. 4 ML model performance
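A hedged sketch of the stratified five-fold evaluation described above, using scikit-learn's StratifiedKFold on the full preprocessed dataset (deriv2 and labels from the earlier sketch); scaling is refit inside each fold via a pipeline to avoid leakage, a detail the paper does not spell out.

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)          # scaler refit on each training fold
    scores = cross_val_score(pipe, deriv2, labels, cv=cv, scoring="accuracy")
    print(name, "fold accuracies:", scores.round(2), "average:", scores.mean().round(2))
```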
Table 1 tabulates the results of the ML models after applying the stratified K-fold method.

Table 1 ML models performance

Metric | K-fold | LR (%) | KNN (%) | SVM (%)
Accuracy | Fold = 1 | 60 | 70 | 50
Accuracy | Fold = 2 | 80 | 90 | 60
Accuracy | Fold = 3 | 100 | 100 | 100
Accuracy | Fold = 4 | 100 | 100 | 89
Accuracy | Fold = 5 | 89 | 100 | 100
Accuracy | Average | 86 | 92 | 80
Precision | Fold = 1 | 50 | 68 | 43
Precision | Fold = 2 | 75 | 93 | 58
Precision | Fold = 3 | 100 | 100 | 100
Precision | Fold = 4 | 100 | 100 | 88
Precision | Fold = 5 | 88 | 100 | 100
Precision | Average | 83 | 92 | 78
Recall | Fold = 1 | 30 | 56 | 25
Recall | Fold = 2 | 65 | 93 | 53
Recall | Fold = 3 | 100 | 100 | 100
Recall | Fold = 4 | 100 | 100 | 94
Recall | Fold = 5 | 67 | 100 | 100
Recall | Average | 77 | 90 | 74
F1-score | Fold = 1 | 38 | 63 | 31
F1-score | Fold = 2 | 69 | 90 | 48
F1-score | Fold = 3 | 100 | 100 | 100
F1-score | Fold = 4 | 100 | 100 | 88
F1-score | Fold = 5 | 88 | 100 | 100
F1-score | Average | 79 | 90 | 73

From Table 1, it is observed that the average accuracy of the KNN model (92%) is good compared to the LR (86%) and SVM (80%). Likewise, the average precision (92%), recall (90%), and F1-score (90%) of the KNN model are higher than those of the LR and SVM, indicating that the KNN model performs better than the LR and SVM models. Detecting adulterants in food is critical for human health; traditional approaches aid in detecting these adulterants, but they are time-consuming. In this paper, an ML-based method for classifying starch concentration was reported. The review of studies highlighted several approaches for recognizing and classifying starch concentration. Compared to previously reported work, the proposed LR and
KNN models perform better. The LR and KNN models' accuracy (100%) and F1-score (100%) are excellent among the methodologies available in the literature. The proposed ML technique will help to identify the starch concentration in turmeric.
6 Conclusion

This research effectively demonstrated an ML-based strategy for classifying starch concentration in turmeric. Python was used to implement three classifiers: LR, KNN, and SVM. The 48 spectra were recorded, preprocessed, and then standardized to improve the results. The LR and KNN show excellent performance with 100% accuracy. After applying the K-fold method, the performance of KNN is good, with a 92% average accuracy and precision, 90% recall, and a 90% F1-score. The KNN-based approach will aid in correctly identifying the starch concentration in turmeric. The authors plan to expand this research to classify more starch concentrations in turmeric using deep learning techniques [25] and to deploy it on an embedded system [10, 26].
References 1. Lanjewar MG, Morajkar PP, Parab J (2022) Detection of tartrazine colored rice flour adulteration in turmeric from multi-spectral images on smartphone using convolutional neural network deployed on PaaS cloud. Multimed Tools Appl 81(12):16537–16562 2. Akbar A, Kuanar A, Patnaik J, Mishra A, Nayak S (2018) Application of artificial neural network modeling for optimization and prediction of essential oil yield in turmeric (Curcuma longa L.). Comput Electron Agric 148:160–178 3. Amani M, Kakooei M, Moghimi A, Ghorbanian A, Ranjgar B, Mahdavi S, Davidson A, Fisette T, Rollin P, Brisco B, Mohammadzadeh A (2020) Application of Google earth engine cloud computing platform, sentinel imagery, and neural networks for crop mapping in Canada. Remote Sens 12(21):3561 4. Ashok V, Agrawal N, Durgbanshi A, Esteve-Romero J, Bose D (2015) A novel micellar chromatographic procedure for the determination of metanil yellow in foodstuffs. Anal Methods 7(21):9324–9330 5. Fuh M (2002) Determination of sulphonated azo dyes in food by ion-pair liquid chromatography with photodiode array and electrospray mass spectrometry detection. Talanta 56(4):663–671 6. Shah R (2017) Identification and estimation of non-permitted food colours (metanil yellow and aniline dyes) in turmeric powder by rapid color test and thin layer chromatography. WJPPS 2034–2045 7. Chen L, Hu J, Zhang W, Zhang J, Guo P, Sun C (2015) Simultaneous determination of nine banned azo dyes in foodstuffs and beverages by high-performance capillary electrophoresis. Food Anal Methods 8(8):1903–1910 8. Tateo F, Bononi M (2004) Fast determination of Sudan I by HPLC/APCI-MS in hot chilli, spices, and oven-baked foods. J Agric Food Chem 52(4):655–658 9. Zhao S, Yin J, Zhang J, Ding X, Wu Y, Shao B (2012) Determination of 23 dyes in chili powder and paste by high-performance liquid chromatography-electrospray ionization tandem mass spectrometry. Food Anal Methods 5(5):1018–1026
10. Parab J, Sequeira M, Lanjewar M, Pinto C, Naik G (2021) Backpropagation neural network-based machine learning model for prediction of blood urea and glucose in CKD patients. IEEE J Transl Eng Health Med 9:1–8 11. Çetin N, Karaman K, Beyzi E, Sağlam C, Demirel B (2021) Comparative evaluation of some quality characteristics of sunflower oilseeds (Helianthus annuus L.) through machine learning classifiers. Food Anal Methods 14(8):1666–1681 12. de Macêdo IYL, Machado FB, Ramos GS, Costa AGDC, Batista RD, Filho ARG, Asquieri ER, de Souza AR, de Oliveira AE, Gil EDS (2021) Starch adulteration in turmeric samples through multivariate analysis with infrared spectroscopy. Food Chem 340:127899 13. Zhong J, Qin X (2016) Rapid quantitative analysis of corn starch adulteration in Konjac Glucomannan by chemometrics-assisted FT-NIR spectroscopy. Food Anal Methods 9(1):61–67 14. Lohumi S, Lee S, Lee W-H, Kim MS, Mo C, Bae H, Cho B-K (2014) Detection of starch adulteration in onion powder by FT-NIR and FT-IR spectroscopy. J Agric Food Chem 62(38):9246–9251 15. Xu L, Shi W, Cai C-B, Zhong W, Tu K (2015) Rapid and nondestructive detection of multiple adulterants in kudzu starch by near infrared (NIR) spectroscopy and chemometrics. LWT Food Sci Technol 61(2):590–595 16. Kar S, Tudu B, Jana A, Bandyopadhyay R (2019) FT-NIR spectroscopy coupled with multivariate analysis for detection of starch adulteration in turmeric powder. Food Addit Contam Part A 36(6):863–875 17. Ranjan R, Kumar N, Kiranmayee AH, Panchariya PC (2021) Application of handheld NIR spectroscopy for detection of adulteration in turmeric powder. In: 2021 7th international conference on advanced computing and communication systems (ICACCS), pp 1238–1241. https://doi.org/10.1109/ICACCS51430.2021.9441790 18. Thangavel K, Dhivya K (2019) Determination of curcumin, starch and moisture content in turmeric by Fourier transform near infrared spectroscopy (FT-NIR). Eng Agric Environ Food 12(2). https://doi.org/10.1016/j.eaef.2019.02.003 19. Zhang Z (2016) Introduction to machine learning: k-nearest neighbors. Ann Transl Med 4(11):218–218 20. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297 21. Lanjewar MG, Parate RK, Parab JS (2022) Machine learning approach with data normalization technique for early stage detection of hypothyroidism. In: Artificial intelligence applications for health care 22. Rashidi HH, Sen S, Palmieri TL, Blackmon T, Wajda J, Tran NK (2020) Early recognition of burn- and trauma-related acute kidney injury: a pilot comparison of machine learning techniques. Sci Rep 10(1):205 23. Lanjewar MG, Gurav OL (2022) Convolutional neural networks based classifications of soil images. Multimed Tools Appl 81(7):10313–10336 24. Pramoditha R (2020) k-fold cross-validation explained in plain English. Medium, 20 Dec. https://towardsdatascience.com/k-fold-cross-validation-explained-in-plain-english659e33c0bc0. Accessed 6 July 2022 25. Lanjewar MG, Panchbhai KG (2022) Convolutional neural network-based tea leaf disease prediction system on smart phone using PaaS cloud. Neural Comput Appl. https://doi.org/10.1007/s00521-022-07743-y 26. Parab J, Sequeira M, Lanjewar M, Pinto C, Naik GM (2022) Blood glucose prediction using machine learning on Jetson nano platform. In: Handbook of intelligent computing and optimization for sustainable development, pp 835–848
A Study of Crypto-ransomware Using Detection Techniques for Defense Research Vyom Kulshreshtha , Deepak Motwani , and Pankaj Sharma
Abstract Digital assets all over the world are at risk from ransomware attacks. Recovery of devices from a crypto-ransomware infection is virtually impossible unless a mistake has been made in the malicious cryptographic implementation, as robust encryption is irreversible. The purpose of this study is to demonstrate why developing and deploying a detection solution that is successful and efficient against this specific malware family involves a significant technological undertaking. The study first describes the several varieties of ransomware and their sources, after which a ransomware taxonomy is offered. The invariant anatomy of an intrusion created by the malware is illustrated, and the study explores and evaluates the numerous strategies that ransomware can deploy. In each of the situations described, the malware poses a significant technological challenge. A security researcher planning to develop a preventive or predictive solution against crypto-ransomware must have a clear understanding of the technological obstacles that will arise before beginning the planned research project, in order to be prepared to deal with them. Finally, this article discusses the latest developments in ransomware defense research, as well as the research gaps that remain. Keywords Malware · Crypto-ransomware · Ransomware · Crypto-miners · Locker-ransomware
V. Kulshreshtha (B) · D. Motwani Computer Science and Engineering, Amity University Madhya Pradesh, Gwalior, India e-mail: [email protected] D. Motwani e-mail: [email protected] P. Sharma Computer Science and Engineering, Eshan College of Engineering, Mathura, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_11
1 Introduction

The frequency of incidents involving a certain malware strain, namely ransomware, has increased dramatically in recent years. Governments and corporations in virtually every industry have fallen victim to this infamous strain of malware. For example, there has been a slew of events involving Fortune 500 firms, banks [1], cloud providers [2], chip makers [3], cruise line operators [4], government agencies [5], medical institutions [6], schools [7, 8], a migration capture system [9], hospitals [10], universities [11], as well as police forces [12]. In 2021, ransomware was expected to cost corporations $20 billion in total losses, with a new organization being targeted every 11 s [13]. Even worse, in 2020, the first ransomware-related death was reported in Germany [14]. In light of the aforementioned occurrences, ransomware has already emerged as the most problematic threat in the arms race between attackers and defenders.

Ransomware is a subset of malware that encrypts data or prevents users from accessing their systems unless the attacker is paid a ransom. Crypto-ransomware encrypts the victim's files, while locker-ransomware prevents victims from accessing their systems [15]; the two varieties are distinguished by the mechanism they employ. Whichever mechanism is used, both types of ransomware demand money to decrypt the files or to restore access to the machine.

Ransomware has been around since 1989 but only recently became one of the most well-known security threats [16]. As a result of past mistakes and technological advancements, cybercriminals have honed the components of their ransomware attacks (for example, more powerful encryption techniques and worm-like capabilities) and have even started to provide ransomware as a service (RaaS) [17]. To combat ransomware and keep its victims safe, researchers and industry have made numerous attempts; however, the number of incidents increases every day. The main reason for this is that ransomware is difficult to eradicate [18]. Since various open-source implementations of encryption are readily available, ransomware often depends on strong encryption as a starting point. In addition, practically every infection strategy used by recent malware families is utilized, and the evasion strategies used by modern malware (such as code obfuscation, encrypted communications, and DGA or fast fluxing to rapidly shift or construct domain names) benefit ransomware as well. Because it exploits APIs provided by the victims' platforms to carry out its malicious actions, it is difficult to tell ransomware apart from other types of software. Pseudo-anonymous networks, such as The Onion Router (TOR), and unregulated payment systems, such as cryptocurrencies, are also used to help attackers get paid without easily revealing their identity [19]. Several surveys have been proposed, each focusing on a different area of ransomware analysis. In this work, we focus mainly on detection techniques as well as defense research for ransomware. The next section describes the sources of ransomware.
Fig. 1 Operation steps of ransomware attack
2 Source of Ransomware

There are numerous ways to deliver ransomware to a victim's computer. Attackers cast a wide net to catch as many victims as possible before setting traps for them. The most frequently utilized delivery methods are listed below:

• Spam email containing a link or an executable file that encourages the recipient to open it. Malicious attachments, such as Trojans, may be included in these emails.
• Infected web pages, which attackers use to download ransomware onto the victim's system. Users can download and run ransomware when they access legitimate websites whose files have been corrupted by the attacker.
• Exploitation of known flaws in an organization's digital infrastructure to gain access to its resources. Attackers can reach a target host in the internal network by exploiting flaws in operating systems or browsers; any unprotected opening is ripe for attack at some point.
• Portable disks, phishing, malicious advertising, and SMS, which can all infect a computer without the user's knowledge. Social engineering is the term used to describe such attacks, and attackers are always looking for new ways to entice victims to do what they want.

Figure 1 describes the common actions performed during a ransomware attack.
3 Family of Ransomware and Its Types

In terms of type, there are two main ransomware classes: locker and crypto. One prevents the user from using the device, while the other encrypts personal data and demands a ransom. Both are equally harmful to users.
3.1 Locker Ransomware

Locker ransomware locks the user out of the infected computer. It spreads to the user's machine, gains administrative rights, and prevents the user from logging in. Rather than encrypting the files, a non-encrypting technique makes them inaccessible: the attacker displays a pop-up window that cannot be closed, preventing the user from gaining access to the machine. Because no encryption is used, the system can be restarted in safe mode.
3.2 Crypto-ransomware

This type of ransomware encrypts the data on a computer and holds it hostage until a ransom is paid. Some or all of the system files may be encrypted, depending on their importance; alternatively, the ransomware may encode only a portion of each file, making it inaccessible. Although the attackers claim to be able to safely decrypt all of the files after payment, there is no way to know for sure whether or not data has been lost.

Below we cover a few of the ransomware families that have been most active during the last three years, with a focus on the distinctive behavioral characteristics of each family. The goal is to aid the development of detection methods that keep pace with today's cyber-attacks. Encryption is used in the majority of current ransomware attacks. WannaCry is the most frequently seen crypto-ransomware and, according to Kao and Hsiao [20], it was the root of most of 2019's infections; in 2020, WannaCry was still alive and attempting to take advantage of flaws in the Windows system. Once a system has been infected, it uses the server message block vulnerability to spread to other PCs on the same network as well as to unrelated networks. Even though a fix for this vulnerability has been made available, many machines are still running with WannaCry active. WannaCry accepts bitcoins as payment.

Using phishing emails, malicious advertising, and remote admin access, GandCrab ransomware penetrates systems. This malware is available as RaaS and is promoted by a number of affiliates while the authors continue to develop it. GandCrab asks for ransom in the form of the digital currency DASH and also offers customer service to help victims pay. To prove that retrieval is possible, the attackers let the victim decode one file of their choice; after payment has been made, the victim is provided with a decryptor tool to recover the data. Some of this ransomware's variants can be decoded using free and open-source decryption tools.

STOP ransomware, also active in 2020, uses a combination of the Rivest–Shamir–Adleman (RSA) algorithm and the Advanced Encryption Standard (AES) to encrypt user files. The document extension was initially modified to ".stop", but additional extensions are now used as well. Emails, hacked websites, and brute-force attacks are the most common infection vectors. The attacker provides a contact email
address where the victim can arrange to have the files retrieved. Some decryption tools are available for this ransomware.

With its polymorphic nature and shape-shifting ability, VirLock is a ransomware that has changed the way ransomware attacks are conducted. VirLock can advance across networks by using shared inboxes and online storage. Each encrypted file is turned into a virus that may be used to infect more computers; users' data becomes encrypted and polymorphic once an infected file is opened. VirLock can also change its shape to prevent detection, making it the first of its kind.

Using phishing emails and remote access protocols, Dharma ransomware infects computers and holds them hostage until the ransom is paid. This ransomware is covered here because of its exponentially increasing ransom demands and its longer attack period compared with other ransomware families. The Advanced Encryption Standard (AES) is used to encrypt the files, and a different file extension is used each time the encrypted data is saved. It removes all copies of the data, including backups and snapshots, to make restoration impossible.

Fig. 2 Ransomware attack phases

Figure 2 depicts the overall progression of a ransomware attack. Briefly, the attack phases are as follows:

• Infection: ransomware infects a computer, mobile device, IoT/CPS device, or any other device that can be infected; malicious actors deliver ransomware through many infection channels.
• Communication: after it has infected the system, ransomware communicates with the attacker via the C&C server to exchange important information such as encryption keys or target system information. Although many ransomware families connect with C&C servers, there are a few that do not.
• Destruction: at the end of this phase, the victim's files and system have been compromised, as the ransomware has done its malicious work, such as encrypting or locking them.
• Extortion: finally, the ransomware displays a ransom note to inform the victim of the attack. Details about the attack and how to pay are given in the ransom note.

The position of the encryption phase in the attack cycle varies among the selected ransomware families, as presented in Table 1.
Table 1 Order of encryption phase

Ransomware family | Encryption operation sequence
Locky | 2nd phase
WannaCry | 3rd phase
CryptoWall | 4th phase
TeslaCrypt | 5th phase
Cerber | 5th phase
4 Ransomware Detection Techniques and Types Analysis

Understanding how ransomware works can be used to detect it, thanks to the insights gained via malware analysis. Static and dynamic analysis are the two main methodologies, and each has pros and cons. When performing static analysis, the code generated during the compilation process is examined rather than running the malware itself to observe the results. The code is inspected to find elements that ransomware shares with non-malicious programs and elements that distinguish the two. With this approach, we can learn more about ransomware without risking a system or environment; executing the ransomware code is avoided in this scenario because unexpected repercussions can arise. However, the ransomware code may go undetected if it is encrypted or mixed with other code, so code obfuscation is a threat to this method. Running the ransomware in a sandbox or other safe environment to observe its behavior is called dynamic analysis; the ransomware's attributes are recorded and examined while it is running. This approach has the advantage of being able to cut through code obfuscation, but dynamic analysis takes a long time because the ransomware's execution environment must closely match the genuine one. Environment mapping is a technique used by ransomware to detect a virtual environment, and the cost of setting up such an environment is high. Hybrid methodologies combine static and dynamic analysis to get the best of both worlds while also overcoming the constraints of each. Detection methods make use of information gleaned from the examination of ransomware's activities. Berrueta et al. [21] classify detection algorithm input parameters as local static, local dynamic, or network-based (or a combination of these). Detection strategies for ransomware are discussed in the literature in this area; the examined works are divided into static, dynamic, and hybrid approaches based on the ransomware analysis techniques they use.

Using term frequency–inverse document frequency (TF-IDF), Zhang et al. [22] propose a static analysis method in which N-grams of opcodes are assessed to obtain feature vectors that serve as the training and test sets for multi-class categorization of ransomware families and for the categorization of ransomware and benign samples. A recall rate of 99.8% is obtained for binary classification using the random forest algorithm.
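As a rough illustration of the opcode N-gram plus TF-IDF pipeline discussed above (not the exact setup of [22]), a scikit-learn sketch; the opcode sequences, labels, and hyperparameters are placeholders.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Each sample is the opcode sequence of one binary, rendered as a space-separated string.
opcode_docs = ["push mov call xor jmp", "mov mov add ret", "xor xor loop call jmp"]   # placeholders
labels = [1, 0, 1]                                                                    # 1 = ransomware

pipeline = make_pipeline(
    TfidfVectorizer(analyzer="word", ngram_range=(1, 3)),   # opcode 1- to 3-grams weighted by TF-IDF
    RandomForestClassifier(n_estimators=200, random_state=0),
)

X_train, X_test, y_train, y_test = train_test_split(
    opcode_docs, labels, test_size=0.33, random_state=0)
pipeline.fit(X_train, y_train)
print(pipeline.predict(X_test))
```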
The multi-class classifier of [22], however, is unable to distinguish well between the CryptoWall, Cryrar, and Reveton families. Poudyal et al. [23] suggested a machine-learning-based ransomware detection method based on static analysis of the malware's DLL and assembly-level properties. Reverse engineering of ransomware binaries is used to collect code at various levels, and eight machine learning models are trained using the results of the static analysis. The average accuracy is 92.11%, with the best accuracy of 97.95% attained by random forest; according to the researchers, future studies would benefit from a larger sample size. The study by Lee et al. [24] suggests a real-time detection method for ransomware-infected backup data. Entropy analysis combined with machine learning is used to classify contaminated files, employing a variety of techniques including logistic regression, k-nearest neighbors, random forest, decision trees, support vector machines (SVM), and multilayer perceptrons (MLP). The study's findings could be applied to user-based backup solutions as well as a wider range of file types. Using bigrams and recursive feature addition (RFA) to encode payload strings, Hamed et al. [25] offer a feature-selection-based network intrusion detection approach that uses SVM for classification; the use of tri-grams and ensemble classifiers could be investigated in future work. An NLP and machine-learning-based system for detecting ransomware has been developed by Poudyal et al. [26]. In an Apache Spark big-data context, the ransomware activity is examined at three different levels: DLL, assembly, and function call, and the use of Apache Spark allows a larger number of ransomware binaries to be analyzed. Code randomization, garbage code insertion, and polymorphic code can all be used to elude detection and should therefore be taken into account. Convolutional neural networks and recurrent neural networks can be trained with a static analysis framework that uses N-gram opcodes, according to Zhang et al. [27]. The classification yields an average detection rate of 87.5%, and the suggested framework achieves precision, recall, and F1-score of 100% for the Petya family. DNAact-Ran, a digital DNA sequencing engine for machine-learning-based ransomware detection, is proposed by Khan et al. [28]. The ransomware dataset is preprocessed to classify ransomware sample DNA sequences using this method; DNA sequences are synthesized and used as training data for the classifier, with MOGWO and BCS used for feature selection. The detection rate of the active learning classifier is 87%.

Table 2 presents a literature review of various existing ransomware handling methods, discussing the model architecture along with the advantages and limitations of each method. Preda et al. [29] have proposed a semantic-based method to handle ransomware; the results show that the method performs well against code obfuscation, but it suffers from a high false-positive rate. The Deep Ransomware Threat Hunting and Intelligence System (DRTHIS) [36] classifies ransomware families using two deep learning approaches with a softmax output layer: long short-term memory (LSTM) and a convolutional neural network (CNN).
Table 2 Various existing models for handling ransomware

Preda et al. [29]
Model architecture: Semantic-based framework to trace the behavior of malware
Methods: Trace semantics
Advantages: Holds completeness; effective on malware obfuscation
Limitations: Malware was not characterized by its actions; high false-positive rate

Zabidi et al. [30]
Model architecture: Pingaji, an analysis tool written in Python for PE (Portable Executable) binaries; contains modules such as an API-call extractor, an anti-virtual-machine detector, and an anti-debugger detector
Methods: Interactive Disassembler (IDA) tool
Advantages: Able to detect common crypto-ransomware with a high detection rate
Limitations: Too many API attributes are considered, which makes the analysis slow; analysis cannot proceed if the binary was compiled or written without standard PE compliance

Sgendurra et al. [31]
Model architecture: EldeRan analyzes a set of actions applied in the first phase of installation to check for crypto-ransomware, without needing a complete family of crypto-ransomware to be available in advance
Methods: Regularized logistic regression classifier, mutual information criterion
Advantages: Effective against variants of ransomware that were not found previously
Limitations: The model does not properly extract the features of samples that remain silent for some time

Gomez-Hernandez et al. [32]
Model architecture: R-Locker places honey files to catch the ransomware and thwart its actions
Methods: Honey files and FIFO
Advantages: Handles zero-day malware
Limitations: Poor distribution of the traps at the entry of the ransomware may reduce the effectiveness

Azmoodeh et al. [33]
Model architecture: Tracks the energy-consumption pattern of distinct processes to separate crypto-ransomware from non-malicious applications
Advantages: Outperforms other models such as k-nearest neighbors, neural network, support vector machine, and random forest
Limitations: Significant false positives due to certain characteristics; weak against partially encrypted files

Shaukat and Ribeiro [34]
Model architecture: Each layer tags the process of the Portable Executable file for behaviors such as read, write, rename, and delete operations
Methods: Sample grinding algorithm
Advantages: Able to classify common variants of ransomware families
Limitations: Resource intensive, as the file entropy needs to be calculated for every single write operation, which also deteriorates disk read and write performance; vulnerable to intelligently written crypto-ransomware code

Lokuketagoda et al. [35]
Model architecture: R-Killer provides a Proactive Monitoring System (PMS) to monitor processes downloaded from email attachments
Methods: LSTM, RNN
Advantages: Capable of gathering threat knowledge without compromising the confidentiality of user data; supports any IMAP email server
Limitations: Separation of the URL from the e-mail body is not automated

Homayoun et al. [36]
Model architecture: The Deep Ransomware Threat Hunting and Intelligence System (DRTHIS) approach is used to distinguish ransomware from goodware
Methods: Long short-term memory, convolutional neural network, and softmax algorithm
Advantages: Capable of detecting previously unseen ransomware from new ransomware families in a timely and accurate manner
Limitations: Not capable of classifying some new threats, such as the TorrentLocker attack

Zhang et al. [37]
Model architecture: Performs static analysis based upon N-grams
Methods: Feature N-grams, TF-IDF
Advantages: Works effectively on ransomware that can fingerprint the environment
Limitations: Static analysis is platform dependent; three families, namely CryptoWall, Locky, and Reveton, cannot be distinguished well according to binary classification accuracy

Yuan et al. [38]
Model architecture: MDMC converts malware binaries into Markov images; a deep neural network is used for classification of the Markov images
Methods: Deep convolutional neural network, Markov images, and VGG16
Advantages: The DCNN has fewer fully connected layers and lower output dimensions than VGG16, leading to lower time and space requirements during training; it does not need pre-trained models as required in transfer-learning-based methods
Limitations: Ineffective against code obfuscation, high-variant output, and targeted attacks

Bakour and Unver [39]
Model architecture: VisDroid classifies local features such as Scale-Invariant Feature Transform, Speeded Up Robust Features, Oriented FAST and Rotated BRIEF (ORB), and KAZE, as well as global features such as Color Histogram, Hu Moments, and Haralick Texture
Methods: Residual neural network, Inception-v3
Advantages: Low execution time; high classification accuracy
Limitations: Suffers from code obfuscation

Kakavand et al. [40]
Model architecture: Provides transfiguration of data to an image without compromising integrity; uses horizontal feature simplification (HFS) to monitor the hexadecimal coding patterns of programming languages
Methods: HFS, DJB2 algorithm
Advantages: Effective on ransomware that is polymorphic, performs environment mapping, partially encrypts files, or saturates the system with low-frequency file write operations; platform independent
Limitations: Accuracy in crypto-ransomware detection is only 63%

Roy and Chen [41]
Model architecture: DeepRan looks for abnormal hosts in an operational enterprise system from host logging data; it uses an attention-based BiLSTM with a fully connected layer, TF-IDF to extract information from the logging data, and a CRF model to classify ransomware
Methods: Attention-based BiLSTM, TF-IDF, CRF
Advantages: The incremental learning technique strengthens the model over time; high ransomware detection accuracy of 99.87%; high classification accuracy of 96.5%
Limitations: Early detection time was not analyzed for a real-world, large-scale enterprise

Faghihi and Zulkernine [42]
Model architecture: Focuses on the entropy of the file and its structure before and after manipulation
Methods: Malicious score calculator, ADL, Java API
Advantages: Provides early detection of ransomware on smartphones; offers security with zero data loss; effective against zero-day ransomware
Limitations: No incremental approach is offered to detect new malware
The DRTHIS model can classify samples from the CryptoWall, TorrentLocker, and Sage families that were never seen during training. Shallow and deep learning networks for ransomware recognition and classification are evaluated by Vinayakumar et al. [43]. A sandbox is utilized to execute the ransomware and collect API sequences for binary and multi-class categorization of ransomware families. A multilayer perceptron attains the best detection rate in this study; in the multi-class setting, eight different ransomware families are classified with a high true positive rate, with the exception of the CryptoWall and CryptoLocker families. To identify ransomware, Cusack et al. [44] use programmable forwarding engines (PFEs), which enable high-throughput per-packet network monitoring. High-level flow features are extracted from the network traffic between infected computers and the C&C server, and this data is used to categorize ransomware with random forest; the classifier distinguishes crypto-based Cerber ransomware with an impressive average CV score of 0.905. According to Rhode et al. [45], a malicious executable can be discovered within 5 s of being executed using a dynamic detection technique; recurrent neural networks and a 5-s snapshot of malware behavior are used together, and the model has a detection time of 5 s and a detection accuracy of 94% for unseen malware executed within 1 s. A sliding-window technique and a short snapshot of activity are also used to construct a portable anti-malware tool employing non-Windows executable samples. Netconverse, an assessment of machine learning (ML) classifiers for ransomware detection, is presented by Alhawi et al. [46]; network traffic conversation data collected during ransomware attacks is evaluated and yields a high accuracy rate. The model compares the performance of six ML classifiers, namely Bayes network, multilayer perceptron, random forest, J48, k-nearest neighbors, and logistic model tree, with J48 standing out from the rest with a 97.1% detection rate. Data preparation is an important part of the Netconverse model, since it removes duplicate and irrelevant attributes while improving the sample, and the feature reduction drastically reduces the processing time; by using a cloud-based architecture, the model may be run in real time. System API elements such as API packages, classes, and methods can be used to detect ransomware, according to Scalas et al. [47]. Machine learning classifiers are used to distinguish generic malware, ransomware, and innocuous code, and the authors also produced an app based on this approach, R-Packdroid, which is available on Google Play. The report states that only features based on the system API are used; no other features are included. Additionally, the approach is simple to port to and implement on a mobile device and achieves good performance. A drawback of API-based techniques is that system APIs can be replaced with nearly identical user-implemented routines, and fine-grained manipulations of the features can go undetected; the authors hope to develop a small set of difficult-to-manipulate features in the future.
Stiborek et al. [48] argue that malware detection should depend on how the malware interacts with system resources and error messages. Each instance of the training set is a collection of OS resource names and types, and the impact of randomness is reduced by grouping resource names; however, the strategy does not work if the malware does not emit error signals. Agrawal et al. [49] propose LAMP, a dynamic analysis technique that obtains emulation sequences and feeds them into an LSTM with an attention mechanism to detect ransomware; the authors note a pattern of small API sequences being repeated many times, which the attention mechanism captures. Chen et al. [50] provide a technique to extract the sequence of events caused by ransomware families and determine the most distinctive features using TF-IDF, Fisher's LDA, and extremely randomized trees (ET). The model captures distinctive patterns and behaviors that static analysis misses, and it can be extended with a cloud-based machine learning library. The two-stage mixed ransomware detection approach proposed by Jinsoo et al. [51] uses a Markov chain detector in the first stage and a random forest classifier in the second. The study compares the performance of random forest, logistic regression, and Markov chain models for ransomware detection; the random forest employed in the second stage has a detection rate of 97.3%. A deep neural network (DNN) with batch normalization is proposed by Al-Hawawreh and Sitnikova [52] to categorize ransomware. Convolutional autoencoders (CAE) and variational autoencoders (VAE) are used in hybrid feature engineering to produce the feature vector fed into the classifier, and the model reaches an accuracy of 92.53% when predicting with the hybrid-engineered feature set. According to Arabo et al. [53], ransomware API properties can be obtained through a dynamic analysis technique and then used to evaluate a neural network as well as nine ML classifiers; the research seeks to better understand the connection between a process's behavior and its nature for the purpose of ransomware detection. With a detection rate of 75.01%, random forest surpasses all the other classifiers. The authors state that, because no signature database is required, the technique has the advantage of being applicable to ransomware and non-ransomware material alike, and the detection rates of the classifiers can be improved by expanding the dataset. Static and dynamic analysis features are used by Egunjobi et al. [54] to detect ransomware, and four machine learning classifiers are used to assess their approach; a detection rate of 99.5% is achieved with random forest and SVM. The study's other goal was to reduce the number of false positives, and it succeeded, with an FPR of just 1%.
4.1 Ransomware Defense Research This section provides a comprehensive summary of the latest developments in ransomware defense technology. We first present an introduction to various ransomware analysis methods and then classify and explain ransomware detection schemes in this survey.
4.2 Ransomware Analysis Research Understanding the behavior and/or characteristics of ransomware is part of ransomware analysis. Static and dynamic ransomware analysis approaches are similar to those used in classical malware analysis. By extracting structural information from a sample without running it, static analysis attempts to determine whether or not the sample is ransomware. Researchers disassemble sample binaries and extract information about the content of the sample, studying it without execution while still acquiring important information. Because no sample is run, static analysis is both quick and safe. Malware authors, on the other hand, use anti-disassembly and concealment (i.e., obfuscation, polymorphism, encryption) techniques to thwart static analysis efforts and evade defense schemes that rely on structural information gleaned through static analysis. Dynamic analysis determines whether a sample is ransomware by running it and observing the behavior it exhibits. To avoid damage from the studied sample, the samples are run in a sandbox environment, whose hooking techniques and features allow researchers to keep tabs on the sample's activity. This method is more time consuming and resource intensive than static analysis because it requires an isolated environment and the actual execution of the ransomware. Anti-static-analysis techniques such as concealment and anti-disassembly are ineffective against dynamic analysis, since they cannot hide the ransomware's activities. Instead, anti-debugging measures, sandbox fingerprinting techniques, and logic bomb schemes (such as triggering harmful functionality only at a specific time or event) are used by ransomware developers to thwart dynamic analysis efforts. Researchers combine static and dynamic analysis in hybrid analysis, since each technique has merits and downsides.
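To make the static-analysis step concrete, the following minimal C++ sketch (an illustration written for this survey's context, not code from any of the cited works) extracts byte-level bigram counts from a sample file without executing it; such n-gram histograms are the kind of structural feature vector that the opcode/byte n-gram classifiers in the literature feed into ML models. The command-line handling and the choice of bigrams rather than longer n-grams are arbitrary assumptions for the example.

```cpp
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Count byte-level bigrams (2-grams) of a binary file. The resulting
// 65,536-dimensional histogram is a simple structural feature vector that
// can be fed to an ML classifier without ever executing the sample.
std::vector<uint32_t> byteBigramHistogram(const std::string& path) {
    std::vector<uint32_t> hist(256 * 256, 0);
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        std::cerr << "Cannot open " << path << '\n';
        return hist;
    }
    int prev = in.get();
    for (int cur = in.get(); cur != EOF; prev = cur, cur = in.get()) {
        hist[static_cast<size_t>(prev) * 256 + static_cast<size_t>(cur)]++;
    }
    return hist;
}

int main(int argc, char** argv) {
    if (argc < 2) {
        std::cerr << "Usage: ngram <sample.bin>\n";
        return 1;
    }
    auto hist = byteBigramHistogram(argv[1]);
    // Number of distinct bigrams as a quick sanity check on the feature vector.
    size_t distinct = 0;
    for (uint32_t c : hist) distinct += (c > 0);
    std::cout << "Distinct byte bigrams: " << distinct << '\n';
    return 0;
}
```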
4.3 Ransomware Detection Research This section categorizes and summarizes existing ransomware detection methods according to the platforms they target. Eight categories of detection systems are used in this study:
• Blacklist-based: The system detects ransomware using a list of malicious domain names or IP addresses known to be used by ransomware families.
• Rule-based: The system uses analysis characteristics to build rules to detect ransomware. A rule may rely on malware detection engine compatibility rules, maliciousness scores, or threshold values (e.g., YARA).
• Statistics-based: The system uses statistics computed from runtime data on sample characteristics to decide whether the sample contains ransomware and alerts the user accordingly.
• Formal Methods-based: A formal model used to identify ransomware can distinguish between harmful and benign patterns.
• Information Theory-based: The system uses information-theoretic measures (e.g., entropy) to detect ransomware. Files that have been encrypted by cryptographic ransomware strains have had their data altered, so numerous researchers treat large variations in entropy as signs of ransomware activity (a minimal sketch of this idea is given after this list).
• Machine Learning-based: The system detects ransomware by utilizing machine learning (ML) models that have been constructed using a variety of analysis tools. Systems that identify ransomware using machine learning rely on either structural or behavioral data. Structural features are obtained by examining the structural properties of ransomware binaries, and detection systems discover patterns in ransomware binary structures by training ML classifiers with this structural information. Behavioral characteristics, on the other hand, are obtained through dynamic analysis of ransomware binaries, and detection systems identify patterns in ransomware behavior by exploiting these behavioral features during ML classifier training.
• Hybrid: A combination of detection mechanisms helps the system catch ransomware.
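As a minimal illustration of the information-theoretic idea above (a hedged sketch, not a production detector and not taken from any of the surveyed systems), the routine below computes the Shannon entropy of a byte buffer; a sudden shift of many files toward values near 8 bits/byte is the kind of signal entropy-based detectors look for. The 7.5 threshold and the toy pseudo-random generator are example assumptions only.

```cpp
#include <array>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <vector>

// Shannon entropy in bits per byte: values close to 8 indicate data that is
// statistically indistinguishable from random, as produced by encryption.
double shannonEntropy(const std::vector<uint8_t>& data) {
    if (data.empty()) return 0.0;
    std::array<size_t, 256> freq{};
    for (uint8_t b : data) freq[b]++;
    double entropy = 0.0;
    for (size_t count : freq) {
        if (count == 0) continue;
        double p = static_cast<double>(count) / data.size();
        entropy -= p * std::log2(p);
    }
    return entropy;
}

int main() {
    // Toy buffers: repetitive "plaintext" vs. pseudo-random "ciphertext".
    std::vector<uint8_t> plain(4096, 'A');
    std::vector<uint8_t> noisy(4096);
    uint32_t state = 0x12345678;                  // tiny LCG, illustration only
    for (auto& b : noisy) b = (state = state * 1664525u + 1013904223u) >> 24;

    const double threshold = 7.5;                 // example cut-off, not tuned
    for (const auto* buf : {&plain, &noisy}) {
        double h = shannonEntropy(*buf);
        std::cout << "entropy = " << h
                  << (h > threshold ? "  -> possible encryption\n"
                                    : "  -> benign-looking\n");
    }
    return 0;
}
```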
5 Open Issues The unresolved issues in ransomware research must be highlighted in light of the evolution and classification of malware. The Constant Evolution of Ransomware: Since the first ransomware appeared in 1989, the threat landscape has changed dramatically. Ransomware has shifted its focus to new platforms and users, as well as to new infection strategies, encryption techniques, communication systems, and destruction tactics. With today's technology, ransomware can infect a wide range of systems via a large number of different attack vectors. It can also make use of dynamically generated domains, as well as the Tor network and Bitcoin, for its communications and payments. But the story does not end here: there is a never-ending arms race between ransomware and anti-virus software. Future ransomware research should address the specific and current malicious strategies used by evolving ransomware families, as outlined in this article.
Human-Operated Ransomware Attacks: Trained cybercriminals have begun running human-operated ransomware operations against businesses, in contrast to auto-spreading ransomware like WannaCry or NotPetya. In such operations, a human operator performs the steps of the ransomware attack, unlike regular ransomware, which uses automated infection and malevolent behaviors. As a result, defenders must engage in real-time warfare with attackers rather than rely on anti-ransomware binaries that operate in the background. These attacks also involve other malicious payloads, data theft, and ransomware distribution by human operators [55]. Accounting for human-operated ransomware could expand the scope of current ransomware defense studies. Rootkit Fashion: Certain ransomware families (like Thanos [56]) have begun using rootkit tactics to hide their tracks. To prevent detection, such ransomware can try to disguise itself on the target platform or delay the execution of its code until a later date. Such conduct may reduce the detection accuracy of current systems [57]. Ransomware Living off the Land: For some time now, families like Netwalker [58] have been encrypting files and data using legitimate software (like PowerShell). Ransomware living off the land (also known as fileless ransomware) is a type of malware assault in which the ransomware uses the target platform's innocuous tools to carry out its nefarious acts; it therefore leaves no traces on the system, making its detection extremely difficult. Changing Encryption Tradition: Once a system has been attacked, the goal of most ransomware strains has been to encrypt as many files as possible. This activity creates a distinct I/O pattern at the low level, which aids in distinguishing ransomware from benign apps. Cybercriminals, however, can alter their encryption methods so that they no longer encrypt the victim's files aggressively, limiting the procedure in order to remain invisible. A pressing issue is therefore the effectiveness of current defense mechanisms in dealing with ransomware developers' evasive tactics. More Exfiltration Attacks: The basic destructive strategy of ransomware was to encrypt or lock the system until the demanded ransom sum was paid and only then release the victim's data. As a result, the vast majority of defense strategies were created to counter such attacks. However, ransomware gangs have recently begun stealing victim data in order to make it public [13]. Because stolen data may include sensitive information about the user or the firm, making it public could have a negative impact on the company or the victim. Leveraging Internal Threats: Traditional malware infection tactics like exploit kits, drive-by downloads, brute force attempts, and spam emails have been used to spread ransomware until now. Many of these typical approaches may be ineffectual against well-protected systems in major corporations. Such safeguards are being circumvented by cybercriminals who bribe company personnel to install ransomware as
a means of evading detection. Tesla has lately been linked to one of these incidents [59]. Internal dangers must therefore be considered, as they can facilitate ransomware infiltration: a potential insider attacker may try to deactivate the company's existing defenses or infect the network with ransomware. Delayed Upgrades or Critical Software Patches: While spam e-mails are the most typical way that ransomware infects its victims, it can also take advantage of flaws in the operating system or other software. Security-related software patches and upgrades try to fix these flaws and should therefore not be delayed if infection is to be avoided. Even though the SamSam and WannaCry ransomware strains made headlines in the past, administrators have still failed to immediately implement upgrades or vital software patches in such situations. Security (Un)aware End-Users: The success of ransomware can also be attributed to end-users. There is some dispute over whether or not we should expect end-users to be security aware [60], but we believe that end-user security awareness can play a critical role in strengthening current protection solutions. End-user security training on ransomware infection vectors is critical. Adversarial Machine Learning Attacks: The vast majority of defense solutions make use of machine learning, as we saw in the section above. However, recent research has shown that ML-based classifiers are vulnerable to attacks that modify either the training data or the test data to escape detection, even though the use of ML techniques boosts accuracy and enables the detection of never-before-seen ransomware variants [61]. These are known as adversarial ML attacks and have been used in a variety of sectors, including malware as well as computer vision. In the malware area, adversarial ML attacks mostly target structural feature-based ML classifiers, and such attacks can target the structural feature classifiers used in ransomware detection on PCs/workstations and mobile devices. There has been research on such attacks and defenses for general malware, but it remains an open question whether these attacks can be applied directly to ransomware and whether the proposed security measures can be employed for ransomware. In order to solve the open issues, researchers need to develop effective techniques based on the eight categories described in the ransomware detection research section. Among those, ML and hybrid techniques may yield better performance than statistics-based, rule-based, information theory-based, and formal methods-based techniques.
6 Conclusion In today's digital world, network security has become increasingly important to organizations and government offices. To understand and safeguard against digital assaults, it is important to comprehend the classes of assault. Ransomware attacks are very prevalent nowadays, and attackers keep implementing new tactics for the successful working of such
attacks. Ransomware attacks have become one of the most powerful threats to individuals and organizations, as they halt the working of systems by attacking and encrypting files or entire systems. It is therefore very important for individuals and organizations to stop such attacks, and detecting them is a key step in the countermeasures against ransomware that protect these systems. In this paper, we focused on ransomware network attacks and surveyed detection techniques for ransomware attacks. Various detection techniques and approaches are available for the detection of ransomware attacks. As a result of the survey, we conclude that detecting ransomware attacks is a crucial task and a necessity of network security for users as well as organizations.
References 1. Symantec Threat Hunter Team (2020) WastedLocker: symantec identifies wave of attacks against U.S. organizations. https://symantec-enterprise-blogs.security.com/blogs/threat-intell igence/wastedlocker-ransomware-us. Last accessed 13 Oct 2020 2. Cimpanu C (2020) Chilean bank shuts down all branches following ransomware attack. https://www.zdnet.com/article/chilean-bank-shuts-down-all-branches-following-ransom ware-attack/. Last accessed 13 Oct 2020 3. Cimpanu C (2020) Cloud provider stopped ransomware attack but had to pay ransom demand anyway. https://www.zdnet.com/article/cloud-provider-stopped-ransomware-attackbut-had-to-pay-ransom-demandanyway/. Last accessed 13 Oct 2020 4. CIS Security (2020) Fall 2019 threat of the quarter: Ryuk ransomware. https://www.cisecu rity.org/white-papers/fall-2019-threat-of-the-quarter-ryuk-ransomware/. Last accessed 13 Oct 2020 5. Reuters Staff (2020) Carnival hit by ransomware attack. https://www.reuters.com/article/ us-carnival-cyber/carnivalhit-by-ransomware-attack-guest-and-employee-data-accessed-idU SKCN25D2GR. Last accessed 13 Oct 2020 6. O’Ryan J (2020) ConnectWise partners hit by ransomware via automate flaw. https://www. crn.com/news/channelprograms/connectwise-partners-hit-by-ransomware-via-automate-flaw. Last accessed 13 Oct 2020 7. WIRED (2018) Atlanta spent 2.6 ransomware scare. https://www.wired.com/story/atlantaspent-26m-recover-from-ransomware-scare/ 8. Abrams L (2020) SunCrypt ransomware shuts down North Carolina school district. https:// www.bleepingcomputer.com/news/security/suncrypt-ransomware-shuts-down-north-car olina-schooldistrict/. Last accessed 13 Oct 2020 9. Abrams L (2020) Netwalker ransomware hits Argentinian government, demands $4 million. https://www.bleepingcomputer.com/news/security/netwalker-ransomware-hits-argent inian-governmentdemands-4-million/. Last accessed 13 Oct 2020 10. Collier K (2020) Major hospital system hit with cyberattack. https://www.nbcnews.com/tech/ security/cyberattackhits-major-u-s-hospital-system-n1241254. Last accessed 13 Oct 2020 11. BBC News. Northumbria University hit by cyber attack. https://www.bbc.com/news/uk-eng land-tyne-53989404. Last accessed 13 Oct 2020 12. Fraga B (2020) Swansea police pay $750 “ransom” after computer virus strikes. Last accessed 13 Oct 2020 13. Freedman L (2020) Ransomware attacks predicted to occur every 11 seconds in 2021 with a cost of $20 billion. https://www.dataprivacyandsecurityinsider.com/2020/02/ransomware-attackspredicted-to-occur-every-11-seconds-in-2021-with-a-cost-of-20-billion/. Last accessed 13 Oct 2020
14. Security Magazine (2020) First ransomware-related death reported in Germany. https://www. securitymagazine.com/articles/93409-first-ransomware-related-death-reported-in-germany. Last accessed 13 Oct 2020 15. Savage K, Coogan P, Lau H (2015) The evolution of ransomware. https://its.fsu.edu/sites/ g/files/imported/storage/images/information-security-and-privacy-office/the-evolution-of-ran somware.pdf 16. Kharraz A, Robertson W, Balzarotti D, Bilge L, Kirda E (2015) Cutting the Gordian Knot: a look under the hood of ransomware attacks. In: Detection of intrusions and malware, and vulnerability assessment. LNCS. Springer, vol 9148, pp 3–24 17. Segun I, Ujioghosa BI, Ojewande SO, Sweetwilliams FO, John SN, Atayero AA (2017) Ransomware: current trend, challenges, and research directions. In: Proceedings of the world congress on engineering and computer science. San Fransisco, USA 18. Kharaz A, Arshad S, Mulliner C, Robertson W, Kirda E (2016) UNVEIL: a large-scale, automated approach to detecting ransomware. In: 25th USENIX security symposium (USENIX security 16). Austin, TX, pp 757–772 19. Huang DY, Aliapoulios MM, Li VG, Invernizzi L, Bursztein E, McRoberts K, Levin J, Levchenko K, Snoeren AC, McCoy D (2018) Tracking Ransomware end-to-end. In: 2018 IEEE symposium on security and privacy. California, USA, pp 618–631 20. Kao D, Hsiao S (2018) The dynamic analysis of Wannacry ransomware. In: 20th international conference on advanced communication technology (ICACT). Chuncheon, South Korea, pp 159–166 21. Berrueta E, Morato D, Magana E, Izal M (2019) A survey on detection techniques for cryptographic ransomware. IEEE Access 7:144925–144944 22. Zhang H, Xiao X, Mercaldo F, Ni S, Martinelli F, Sangaiah AK (2019) Classification of ransomware families with machine learning based on n-gram of opcodes. Futur Gener Comput Syst 90:211–221 23. Poudyal S, Subedi KP, Dasgupta D (2018) A framework for analyzing ransomware using machine learning. In: IEEE symposium series on computational intelligence (SSCI), pp 1692– 1699 24. Lee K, Lee S, Yim K (2019) Machine learning based file entropy analysis for ransomware detection in backup systems. IEEE Access 7:110205–110215 25. Hamed T, Dara R, Kremer SC (2018) Network intrusion detection system based on recursive feature addition and bigram technique. Comput Secur 73:137–155 26. Poudyal S, Dasgupta D, Akhtar Z, Gupta KD (2019) A multi-level ransomware detection framework using natural language processing and machine learning 10 27. Zhang B, Xiao W, Xiao X, Sangaiah AK, Zhang W, Zhang J (2020) Ransomware classification using patch-based CNN and self-attention network on embedded n-grams of opcodes. Futur Gener Comput Syst 110:708–720 28. Khan F, Ncube C, Ramasamy LK, Kadry S, Nam Y (2020) A digital DNA sequencing engine for ransomware detection using machine learning, IEEE Access 8:119710–119719 29. Preda MD, Christodorescu M, Jha S, Debray S (2008) A semantics-based approach to malware detection. ACM Trans Program Lang Syst 30(5) 30. Zabidi MNA, Maarof MA, Zainal A (2012) Malware analysis with multiple features. In: Proceedings—14th international conference on modelling and simulation, UKSim. IEEE, Cambridge, United Kingdom, pp 231–235 31. Sgandurra D, Muñoz-González L, Mohsen R, Lupu EC (2016) Automated dynamic analysis of ransomware: benefits, limitations and use for detection. J Ambient Intell Human Comput 9:1141–1152 32. Gómez-Hernández JA, Álvarez-González L, García-Teodoro P (2018) R-Locker: Thwarting ransomware action through a honeyfile-based approach. 
Comput Secur 73:389–398 33. Azmoodeh A, Dehghantanha A, Conti M, Choo KKR (2018) Detecting crypto-ransomware in IoT networks based on energy consumption footprint. J Ambient Intell Humaniz Comput 9(4):1141–1152
34. Shaukat SK, Ribeiro VJ (2018) RansomWall: a layered defense system against cryptographic ransomware attacks using machine learning. In: 10th international conference on communication systems and networks. Bengaluru, India, pp 356–363 35. Lokuketagoda B, Weerakoon MP, Kuruppu UM, Senarathne AN, Yapa Abeywardena KR (2018) Killer: an email based ransomware protection tool. In: 13th international conference on computer science and education. ICCSE 2018. Kolombo, Sri Lanka, pp 735–741 36. Homayoun S, Dehghantanha A, Ahmadzadeh M, Hashemi S, Khayami R, Choo KKR, Newton DE (2019) Drthis: deep ransomware threat hunting and intelligence system at the fog layer. Futur Gener Comput Syst 90:94–104 37. Zhang H et al (2019) Classification of ransomware families with machine learning based on N-gram of opcodes. Futur Gener Comput Syst 90:211–221 38. Yuan B, Wang J, Liu D, Guo W, Wu P, Bao X (2020) Byte-level malware classification based on markov images and deep learning. Comput Secur 92 39. Bakour K, Ünver HM (2021) VisDroid: Android malware classification based on local and global image features, bag of visual words and machine learning techniques. Neural Comput Appl 33(8):3133–3153 40. Kakavand M, Arulsamy L, Mustapha A, Dabbagh M (2021) A novel crypto-ransomware family classification based on horizontal feature simplification. Adv Intell Syst Comput 1158:3–14 41. Roy KC, Chen Q (2021) DeepRan: attention-based BiLSTM and CRF for Ransomware early detection and classification. Inf Syst Front 23(2):299–315 42. Faghihi F, Zulkernine M, RansomCare: data-centric detection and mitigation against smartphone crypto-ransomware. Comput Netw 191 43. Vinayakumar R, Soman KP, Senthil Velan KK, Ganorkar S (2017) Evaluating shallow and deep networks for ransomware detection and classification. In: 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 259–265, IEEE, Mangalore, India (2017). 44. Cusack G, Michel O, Keller E (2018) Machine learning based detection of ransomware using SDN. In: Proceedings of the 2018 ACM international workshop on security in software defined networks and network function virtualization, Ser. SDN-NFV Sec’18. Association for Computing Machinery, New York, NY, USA, pp 1–6 45. Rhode M, Burnap P, Jones K (2018) Early-stage malware prediction using recurrent neural networks. Comput Secur 77:578–594 46. Alhawi OMK, Baldwin J, Dehghantanha A (2018) Leveraging machine learning techniques for windows Ransomware network traffic detection. Cyber threat intelligence, Springer International Publishing, Cham, pp 93–106 47. Scalas M, Maiorca D, Mercaldo F, Visaggio CA, Martinelli F, Giacinto G (2019) On the effectiveness of system API-related information for android ransomware detection. Comput Secur 86:168–182 48. Stiborek J, Pevny T, Rehák M (2018) Multiple instance learning for malware classification. Exp Syst Appl 93:346–357 49. Agrawal R, Stokes JW, Selvaraj K, Marinescu M (2019) Attention in recurrent neural networks for ransomware detection. In: International conference on acoustics, speech and signal processing (ICASSP). IEEE, Brighton, United Kingdom, pp 3222–3226 50. Chen Q, Islam SR, Haswell H, Bridges RA (2019) Automated ransomware behavior analysis: pattern extraction and early detection. In: Science of cyber security. Springer International Publishing, Nanjing, China, pp 199–214 51. Jinsoo H, Jeankyung K, Lee S, Kim K (2020) Two-stage ransomware detection using dynamic analysis and machine learning techniques. 
Wireless Pers Commun 112:2597–2609 52. Al-Hawawreh M, Sitnikova E (2019) Leveraging deep learning models for ransomware detection in the industrial internet of things environment. In: 2019 military communications and information systems conference (MilCIS). IEEE, Canberra, Australia, pp 1–6 53. Arabo A, Dijoux R, Poulain T, Chevalier G (2019) Detecting ransomware using process behavior analysis. In: Complex adaptive systems. Procedia computer science, vol 168. Elsevier, Malvern, Pennsylvania, pp 289–296
54. Egunjobi S, Parkinson S, Crampton A (2019) Classifying ransomware using machine learning algorithms. In: Intelligent data engineering and automated learning—IDEAL 2019. Springer International Publishing, pp 45–52 55. Microsoft Security (2020) Human operated ransomware attacks a preventable disaster. https:// www.microsoft.com/security/blog/2020/03/05/human-operated-ransomware-attacks-a-pre ventabledisaster/. Last accessed 13 Oct 2020 56. Falcone R (2020) Thanos ransomware: destructive variant targeting state-run organizations in the Middle East and North Africa. https://unit42.paloaltonetworks.com/thanos-ransomware 57. Veracode (2014) Rootkit. https://www.veracode.com/security/rootkit. Last accessed 13 Oct 2020 58. Petcu A (2020) Netwalker ransomware explained. https://heimdalsecurity.com/blog/netwalkerransomware-explained/ 59. Hamilton IA (2020). Elon musk: tesla was target of a failed ransomware attack—business insider. https://www.businessinsider.com/elon-musk-confirms-tesla-was-target-of-failedransomware-attack-2020-8 60. Schneier B (2016) Stop trying to fix the user. IEEE Secur Priv 14:05 61. Suciu O, Coull S, Johns J (2018) Exploring adversarial examples in malware detection. CoRR abs/1810.08280 (2018). arXiv:1810.08280. http://arxiv.org/abs/1810.0828
Internet of Things (IOT)-Based Smart Agriculture System Implementation and Current Challenges Amritpal Kaur, Devershi Pallavi Bhatt, and Linesh Raja
Abstract The Internet of Things (IoT), an advanced breakthrough in the era of digitalization, allows devices to connect with one another while automating and controlling the process online. Using IoT in agriculture has many advantages for managing and monitoring crops. In this study, an architectural framework that integrates the Internet of Things (IoT) with various measures and techniques is built. These methods and measures are used to monitor temperature, humidity, and soil moisture data. The method offers in-crop sensor data analysis in real time and gives farmers the information they need to monitor soil moisture, temperature, and automated irrigation systems while using less of their time and energy. The experiment’s findings include information on temperature, soil moisture, and humidity as well as a decision-making analysis that involves the farmer. An autonomous irrigation system was constructed using Arduino and multisensory functionality (temperature and moisture sensors). This project seeks to measure temperature and moisture levels using an algorithm based on soil parameters. The actuator (water pump) is turned ON and OFF by the automated pumping system based on information from soil moisture sensors. Keywords Internet of Things · Smart agriculture · IoT applications · Soil moisture sensor · Temperature sensor
1 Introduction The term "Internet of Things" describes how we might use technology to collaborate with one another, interact, and transmit real-time data from sensors wirelessly for processing, in order to deliver more useful information for effective decision-making in the relevant study field. IoT is a technology that is rapidly evolving in fields such as health care, defence, industry, agriculture, and others [1]. Its features are limitless and can be used to advance civilization and help people live better lives.
A. Kaur · D. P. Bhatt (B) · L. Raja Manipal University Jaipur, Jaipur, Rajasthan, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_12
To adopt IoT, one needs to be knowledgeable about the relevant research fields, hardware, and connectivity options for accessing the devices. An essential factor in water conservation is the use of the proper irrigation systems [2]. The irrigation system improves the development of plants, flowers, and lawns while saving money and time. Smart irrigation systems are designed to ensure that fields and plants receive an adequate amount of water while using the minimum amount of water possible. As a result, no water is wasted. Irrigation system installation is typically customized to meet customer needs [3]. This is due to the fact that different customers have different needs based on the type of land that needs irrigation. Systems for irrigation are essential to the modern economy. Increasing acres of soil (land) is now possible, even in areas with low rainfall [4]. All you require is a properly set-up watering system. Modern irrigation systems have been used all over the world thanks to rapid technological advancement. Systems have become essential in order to address the unique requirements of agriculture and the landscape. Irrigation system installation must be done correctly for construction projects like greenhouses, gardens, agriculture fields, and parks [5]. The relevant industries should implement the current irrigation methods to get a competitive advantage in the market. According to studies, utilizing a drip irrigation system in place of antiquated hosepipe irrigation results in a watering decrease of up to 90%. They are automatic and designed to significantly cut down on the amount of time needed to irrigate the crops [6]. These irrigation systems are quite efficient as well. To enhance agricultural alternatives, experts have developed a wide range of irrigation technologies over time. Systems for irrigating crops are designed to do so efficiently and with the least amount of water loss from evapotranspiration. Systems for irrigation in agriculture can save costs, especially over time [7]. They may be totally buried, totally submerged, or even totally exposed to the environment. The Arduino allows for a 50% water saving in the automated irrigation system concept. The humidity sensor, or soil moisture sensor, will determine the amount of moisture in the soil. This moisture sensor can be used to determine whether the soil or plant is receiving the necessary amount of water [8]. Crop irrigation is a labourintensive task that takes a lot of time to complete and requires additional manpower [9, 10]. Various technologies, such as automated irrigation system technology, are currently reducing the number of humans or the length of time needed to water the crops. The manual irrigation method used by a gardener to water the plant is insufficient and results in over-irrigation. It brings up the problem of wasting water and using water inefficiently [10, 11]. Using RC4, ECC, and SHA-256 algorithms to secure an irrigation system based on the Internet of Things. The RC4 scheme’s key is encrypted using the ECC technique to increase security. The encrypted data is then hashed using the SHA-256 technique [12]. Improving water consumption efficiency in open field, smart irrigation systems by real-time scheduling and monitoring of agriculture fields are used [13]. 
IoT technologies are used to develop intelligent agriculture management systems to increase crop productivity and agricultural profitability to regulate the irrigation system, predict the weather, maintain the air pressure, and lessen water wetness [14]. It is a cloud-based software programme that works in conjunction with Internet of Things (IoT) gadgets to automate the irrigation plan, utilizing knowledge
from agricultural specialists and field-gathered environmental data from sensors. The programme is easily expandable to automate fertilizer and offer suggestions for weed and insect management [15]. The Internet of Things (IOT)-based smart garden system was designed using NODE MCU ESP8266 and other various hardware and software components which will operate with Blynk applications [16]. Promoting soil moisture forecasts for smart irrigation systems using IOT and machine learning algorithms [17]. The technique was created for resource- and demand-driven watering that safeguards urban trees without unduly escalating problems with water constraint [18]. Automated irrigation systems are designed based on secondary data collection methods, which are used for irrigation scheduling and water management systems [19]. Designing and implementing a distributed system for supporting smart irrigation using Internet of Things technology for automating the irrigation process using Raspberry Pi 3, soil moisture sensor, and air temperature and humidity sensors [20].
2 Smart Agriculture Applications By automating and optimizing all useful agricultural aspects to improve agricultural cultivation and productivity, smart agriculture improves the techniques of agriculture and makes it easier for farmers [11, 21]. IoT sensors are used to measure the moisture content, weather, and insect detection (see Fig. 1). All are discussed below.
2.1 Weather Monitoring Temperature, humidity, wind, air pressure, and other important meteorological variables have an effect on how well agriculture grows. Wired or wireless sensors are used to gather this data, which is then sent to cloud servers. The gathered information will be plotted against the environment, and the next steps to boost agricultural growth will be decided using analytical methods [22, 23].
2.2 Soil Monitoring One of the most difficult tasks in agriculture today is soil monitoring. Temperature, pH, wetness, and soil humidity are all important soil patterns for agricultural development [24, 25].
Fig. 1 Smart agriculture applications of IoT
2.3 Diseases Monitoring The digitization of some IoT agricultural applications, such as disease monitoring and detection, enables farmers to make well-informed decisions much more quickly [18]. To assess the health of the plants, machine learning and image processing methods are applied. The development of an IoT-based system for monitoring wheat illnesses and pests.
2.4 Irrigation Monitoring IoT helps us to innovate the conventional irrigation system by taking into consideration the current (in real time) weather and soil conditions. Based on the aforementioned criteria, irrigation is only carried out when it is essential. Farmers will benefit from this since it will save on irrigation costs and maximize water supplies [9].
3 Methodology The primary sensor for determining the moisture content of the soil in the proposed irrigation system is the soil moisture sensor. A DHT11 sensor is used to measure the temperature and humidity of the air, because temperature and humidity play an important role in crop growth. When the sensors detect that the soil moisture level around the plants is low, they send a signal to the microprocessor, and the microprocessor in turn signals the appropriate machinery to turn on the pump. Whenever the soil moisture reaches the desired level, the microcontroller instructs the interface devices to shut off the pump, based on the sensor readings. The sensors continuously gather data, which is sent to the microprocessor; the microprocessor repeatedly evaluates and compares these values against the trigger parameters. The flow chart of the approach is shown in (see Fig. 2). In this way, real-time scheduling and monitoring are used to improve water consumption efficiency in open-field smart irrigation.
Fig. 2 Flow chart of the smart irrigation project
Flow Chart Functioning
Step 1: Start the device.
Step 2: Measure the soil moisture value and the temperature and humidity of the air.
Step 3: Compare the measured values to the threshold values.
Step 4: If the moisture level is low, turn the motor ON; else turn the motor OFF and go to Step 1.
Step 5: Stop, or go back to Step 1 for continuous monitoring.
Temperature and soil moisture are the deciding factors in this autonomous irrigation system. The data from all of these sensors, together with other characteristics, is sent to the microprocessor, which controls how the entire irrigation system operates. Both the temperature sensor and the soil moisture sensor operate in accordance with their intended purpose, and switching the water pump motor ON or OFF controls the amount of moisture in the soil. The microprocessor is the central component of the device and is supported by all of these sensors, as shown in the block diagram in (see Fig. 3); a hedged Arduino-style sketch of this control loop is given after Fig. 3.
Fig. 3 Block diagram of the project
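To make the control loop of the flow chart concrete, the following Arduino-style sketch is a hedged illustration of the described logic, not the authors' actual programme. The pin assignments, relay polarity, raw-to-percentage calibration constants, the 30% threshold (taken from the results section), and the use of the Adafruit DHT library are assumptions that would need to be adapted to the real wiring and sensors.

```cpp
#include <DHT.h>                 // Adafruit DHT sensor library (assumed)

const int SOIL_PIN  = A0;        // capacitive soil moisture sensor (analogue out)
const int RELAY_PIN = 7;         // relay driving the water pump (assumed active-HIGH)
const int DHT_PIN   = 2;         // DHT11 data pin

// Raw ADC readings for fully dry and fully wet soil; these calibration values
// are placeholders and must be measured for the actual sensor.
const int RAW_DRY = 700;
const int RAW_WET = 300;
const int MOISTURE_THRESHOLD = 30;   // per cent, as used in the results section

DHT dht(DHT_PIN, DHT11);

void setup() {
  Serial.begin(9600);
  pinMode(RELAY_PIN, OUTPUT);
  digitalWrite(RELAY_PIN, LOW);      // pump off at start
  dht.begin();
}

void loop() {
  // Step 2: measure soil moisture, air temperature and humidity.
  int raw = analogRead(SOIL_PIN);
  int moisture = map(raw, RAW_DRY, RAW_WET, 0, 100);  // convert to per cent
  moisture = constrain(moisture, 0, 100);
  float temperature = dht.readTemperature();
  float humidity    = dht.readHumidity();

  Serial.print("Moisture %: "); Serial.print(moisture);
  Serial.print("  Temp C: ");   Serial.print(temperature);
  Serial.print("  Hum %: ");    Serial.println(humidity);

  // Steps 3-4: compare with the threshold and switch the pump accordingly.
  if (moisture < MOISTURE_THRESHOLD) {
    digitalWrite(RELAY_PIN, HIGH);   // turn the water pump ON
  } else {
    digitalWrite(RELAY_PIN, LOW);    // turn the water pump OFF
  }

  delay(2000);                       // DHT11 needs ~2 s between readings
}
```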
4 Hardware and Software Requirements 4.1 Hardware Arduino UNO The Arduino UNO is a detachable dual in-line package (DIP) AT-mega328 AVR microcontroller board. There are 20 digital input/output pins total, of which 6 are PWM outputs and 6 are analogue inputs. The user-friendly Arduino computer software may be used to programme it. The Arduino is a relatively simple approach to begin working with embedded electronics thanks to its large support network. The third and most recent generation of the Arduino UNO is called R3 [12]. Soil Moisture Sensor In order to calculate the water content of the soil, the capacitive soil moisture sensor module measures variations in capacitance. When a plant requires watering, this may be used to trigger an alarm of some sort or to automatically water plants [9, 21, 22]. • Analogue moisture content output. • Greater corrosion resistance than a resistive kind of sensor. • Operated at 3.3 or 5 V; low power means, it may be driven from an MCU digital pin. Temperature/Humidity Sensor (DHT11) The DHT11 is a popular sensor for detecting temperature and humidity. A common component is the DHT11 temperature and humidity sensors. The sensor has an 8bit microcontroller for serial data output and a customized NTC for temperature monitoring. The sensor is also factory tuned, which facilitates connectivity with other chipsets. When monitoring temperature between 0 and 50 °C and humidity between 20 and 90%, the sensor has an accuracy of 1° and 1% [2, 21]. Consequently, this equipment may be the ideal option if you want readings that fall within this range [22].
4.2 Software Arduino software IDE 1.8.5: The open-source Arduino IDE programme is used to write code and upload it to Arduino boards. The IDE is available for different operating systems, including Windows, Mac OS X, and Linux, and supports the C and C++ programming languages. IDE stands for Integrated Development Environment. A programme or piece of code written in the Arduino IDE is commonly called a sketch. The Arduino board must be connected to the IDE in order to upload the sketch developed in the Arduino IDE programme. Sketch files use the ".ino" extension.
5 Result and Discussion The soil temperature, moisture content, and volume of water delivered were all tested in relation to this irrigation system. The outputs of the irrigation system, depending on the soil moisture, temperature, and humidity of the air, are monitored for particular time periods and are depicted in (see Fig. 4) and (see Fig. 5) accordingly. When the soil moisture percentage drops below 30%, the water pump motor begins pumping water; these readings represent the system's threshold values. When the water percentage decreases, the motor starts, and when the soil moisture reaches the required value, the motor automatically turns off, as shown in (see Fig. 6) and (see Fig. 7). The partial goal was accomplished after the project's design and component selection were finished. This irrigation system was entirely created and finished, and all requirements were followed in order to come to a conclusion. The result was verified and found to be accurate. The system operates once the temperature and moisture sensors on either line inform the Arduino that the soil around the plants is dry; when the Arduino board signal is received, the switch instructs the field pump relay to irrigate these plants immediately. A capacitive soil moisture sensor and a DHT11 sensor were used to operate the pump system, which required relatively complicated programming to be uploaded onto the Arduino board. However, the system was completed, and with the aid of the Arduino library, results were delivered.
Fig. 4 Soil moisture parameters
Fig. 5 Air temperature and humidity parameters
Fig. 6 Sensors output parameters screen (low water level condition)
Connecting the cables from multiple sensors and components to the Arduino is challenging and complex; however, utilizing a plastic breadboard made attaching the wires much easier. The system was designed with special flexible tubes to make connections simpler, but there were issues connecting the pipes from the water tank to the plant project prototype (see Fig. 8) and (see Fig. 9). The soil moisture and DHT11 sensors are implemented in the field for monitoring the real-time soil moisture, air temperature, and air humidity of the crop field. When soil moisture falls below 30%, the IoT-based irrigation system sends a message to the system indicating that the crop field soil moisture is below the threshold value and sends a signal to the relay to turn on the water motor. The automated irrigation system continuously monitors the soil moisture, air temperature, and humidity conditions.
Fig. 7 Sensors output parameters screen (required water level condition)
Fig. 8 Sensors, Arduino UNO, and breadboard connectivity
When soil moisture increases to the required water moisture level, the system indicates a message that water is equal to the required threshold value, the water motor is turned off, and the system continuously reads real-time values from the field. This paper helps farmers and researchers to improve irrigation scheduling in open field agriculture farms by choosing the best irrigation monitoring and controlling strategies.
Fig. 9 Smart agriculture irrigation system with IOT
6 Current Challenges and Future Recommendations 6.1 Challenges Smart irrigation systems face a variety of problems and difficulties depending on the scenario, including developing the smart system, communicating and transforming data, integrating hardware, making decisions, and analysing data, among others. Some of these problems, in various contexts, are discussed below [26].
• Because different sensors are used for various purposes, integrating them is a highly difficult operation; data gathered from the nodes must also be integrated, which is a challenging task [2].
• The automatic smart watering system still needs improvements in irrigation duration, water waste management, estimation of soil moisture content, and identification of water and nutrient sources.
• IoT-based irrigation systems need an automated, smart microcontroller, which calls for automated switches, automated pumps, and smart irrigation infrastructure [27].
• When implementing smart irrigation, it is vital to take into account several factors, such as climatic characteristics (soil parameters, moisture, humidity, hue, rainfall timing, and forecasted times of arrival).
• It is crucial to take into account decision-making based on predictions of the future and historical data when putting plans into practice [28].
6.2 Future Direction The current study on smart irrigation systems has to be enhanced due to the emergence of new autonomous decision-making systems employing IoT, Big Data, artificial intelligence, and machine learning, among other technologies [28]. • Future data prediction is a major issue with smart irrigation. However, a large majority of the earlier studies compiled in the study do not discuss future data management in smart irrigation. • Sustainable smart irrigation systems must be implemented. • Frequency and a new data acquisition method. • It is necessary to have a standard architecture when designing and implementing IoT irrigation systems for various crops. • The most effective communication systems employ deep learning and machine learning techniques.
7 Conclusion This paper gives a general overview of how smart irrigation and agriculture are used, along with critical challenges and potential future prospects. Sensors are used to construct an automated irrigation system with an Arduino UNO that works more effectively. The sensors implemented in the fields make farmers aware of the changing moisture, temperature, and humidity levels of their farms, allowing them to plan the timing of watering field crops, fertilization, and the controlling and monitoring of crops. The soil moisture sensors read the data from the field, and if the soil moisture level is low, they send indications to the motor relay to switch on the motor, and vice versa. The DHT11 sensor monitors the air temperature and humidity level of the field. In future, there is a need to implement sustainable smart irrigation systems and to design and implement a stable, reliable, and standard IoT-based irrigation system architecture for various crops using deep learning and machine learning techniques.
References 1. Sinha BB, Dhanalakshmi R (2022) Recent advancements and challenges of internet of things in smart agriculture: a survey. Future Gen Comput Syst 126:169–184 2. Loucks DP, Van Beek E (2017) Water resource systems planning and management: an introduction to methods, models, and applications. Springer 3. Cuevas J et al A review of soil-improving cropping systems for soil salinization. Agronomy 9(6):295 4. Kumar KN, Pillai AV, Narayanan MB (2021) Smart agriculture using IoT. Mater Today Proc 5. Boobalan J et al (2018) An IoT based agriculture monitoring system. In: 2018 international conference on communication and signal processing (ICCSP). IEEE
6. Munusamy S, Al-Humairi SNS, Abdullah MI (2021) Automatic irrigation system: design and implementation. In: 2021 IEEE 11th IEEE Symposium on computer applications and industrial electronics (ISCAIE). IEEE 7. Haider W et al (2022) Towards hybrid smart irrigation for mixed-cropping. In: 2022 global conference on wireless and optical technologies (GCWOT). IEEE 8. Di Matteo L et al (2019) The vista project: a test site to investigate the impact of traditional and precision irrigation on groundwater (San Gemini Basin, Central Italy). In: Flow-path 2019, national meeting on hydrogeology. Ledizioni Ledipublishing Via Alamanni, Milan, Italy, 12–14 June 2019, p 3. ISBN: 978-88-5526-012-1. https://doi.org/10.14672/55260121 9. Farooq MS et al (2020) Role of IoT technology in agriculture: a systematic literature review. Electronics 9(2):319 10. Chawla H, Kumar P (2019) Arduino based automatic water planting system using soil moisture sensor. In: International conference on advances in engineering science management and technology (ICAESMT). Uttaranchal University, Dehradun, India 11. García L et al (2020) IoT-based smart irrigation systems: an overview on the recent trends on sensors and IoT systems for irrigation in precision agriculture. Sensors 20(4):1042 12. Mousavi SK, Ghaffari A, Besharat S, Afshari H (2021) Improving the security of internet of things using cryptographic algorithms: a case of smart irrigation systems. J Ambient Intell Human Comput 12(2):2033–2051 13. Bwambale E, Abagale FK, Anornu GK (2022) Smart irrigation monitoring and control strategies for improving water use efficiency in precision agriculture: a review. Agric Water Manag 260:107324 14. Parvathi SB, Kumar N, Ambalgi AP, Haleem SLA, Thilagam K, Vijayakumar P (2022) IOT based smart irrigation management system for environmental sustainability in India. Sustain Energy Technol Assess 52:101973 15. Younes M, Salman A (2021) A cloud-based application for smart irrigation management system. In: 2021 8th international conference on electrical and electronics engineering (ICEEE). IEEE, pp 85–92 16. Talekar PS, Kumar A, Kumar A, Kumar M, Hashmi MI (2021) Smart irrigation monitoring system using Blynk app. Int J Innov Sci Res Technol 6:1353–1355 17. Togneri R, dos Santos DF, Camponogara G, Nagano H, Custódio G, Prati R, Fernandes S, Kamienski C (2022) Soil moisture forecast for smart irrigation: the primetime for machine learning. Exp Syst Appl 117653 18. Gimpel H, Graf-Drasch V, Hawlitschek F, Neumeier K (2021) Designing smart and sustainable irrigation: a case study. J Clean Prod 315:128048 19. Obaideen K, Yousef BAA, AlMallahi MN, Tan YC, Mahmoud M, Jaber H, Ramadan M (2022) An overview of smart irrigation systems using IoT. Energy Nexus 100124 20. Abdelmoamen Ahmed A, Al Omari S, Awal R, Fares A, Chouikha M.: A distributed system for supporting smart irrigation using Internet of things technology. Eng Rep 3(7):e12352 21. Mat I et al (2016) IoT in precision agriculture applications using wireless moisture sensor network. In: 2016 IEEE conference on open systems (ICOS). IEEE 22. Patil GL, Gawande PS, Bag RV (2017) Smart agriculture system based on IoT and its social impact. Int J Comput Appl 176(1):0975–8887 23. Tao W et al (2021) Review of the internet of things communication technologies in smart agriculture and challenges. Comput Electron Agric 189:106352 24. Pathan M et al (2020) Artificial cognition for applications in smart agriculture: a comprehensive review. Artif Intell Agric 4:81–95 25. 
Divija M (2022) IoT based smart irrigation module for smart cultivation. In: 2022 international conference on wireless communications signal processing and networking (WiSPNET). IEEE, pp 189–192 26. Kumar R et al (2021) Smart sensing for agriculture: applications, advancements, and challenges. IEEE Consum Electron Mag 10(4):51–56
27. Talavera JM et al (2017) Review of IoT applications in agro-industrial and environmental fields. Comput Electron Agric 142:283–297 28. Blessy JA (2021) Smart irrigation system techniques using artificial intelligence and IoT. In: 2021 third international conference on intelligent communication technologies and virtual mobile networks (ICICV). IEEE
Physical Unclonable Function and Smart Contract-Based Authentication Protocol for Medical Sensor Network Aparna Singh
and Geetanjali Rathee
Abstract In wireless medical sensor networks (WMSN), the use of a group of nodes to collect a patient's bio-data has revolutionized the existing medical paradigm, allowing medical professionals to access the patient's records remotely and in a timely manner. However, the use of public communication channels and of a centralized server to store the vital information of all the patients incurs a greater risk to the patient's privacy and also weakens the framework due to a single centralized point of failure. Numerous researchers have proposed secure key agreement and authentication protocols, but most of these protocols fail to withstand physical-level attacks, or they tend to incur added computation costs and an increased number of bits being transmitted over the communication channel. IoT devices, being resource-constrained, need a lightweight and robust security mechanism that does not drain their already limited resources. To overcome these issues, the scheme proposed in this paper integrates blockchain and a PUF-assisted approach for WMSN. The proposed approach makes use of a PUF and the challenge-response pairs generated via it to produce a secure and unclonable session key without the need to store any public keys. The comparative study presented later in the paper compares the existing protocols and the proposed one on the basis of the number of bits transmitted and the number of hash functions used. In the end, the formal security analysis done using the Scyther tool indicates the efficiency and reliability of the proposed scheme. Keywords Internet of Things · Blockchain · Physical unclonable functions
A. Singh (B) · G. Rathee CSE Department, NSUT, New Delhi, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_13
1 Introduction With the rapid advancements in the field of the Internet of Things (IoT), numerous associated applications came into existence, like smart grids, smart cities, smart homes and many more. Integration of the existing IoT technologies with the healthcare sector resulted in the formation of the Internet of Medical Things (IoMT), which is a crucial game-changer in the field of medical sciences. Wireless medical sensor networks, lying at the crux of IoMT, provide crucial services to medical professionals and patients around the globe. These services assist medical professionals in keeping a regular check on their patients, monitoring them regularly and accessing records from a remote location, and can help in providing immediate assistance to the patients if and when needed. A very important part of this environment is the sensor device that is used to collect the patient's data on a timely basis. These sensor devices can be wearable devices which collect real-time health-related data like heart rate, sleep duration, blood pressure, stress levels, glucose levels and much more [1]. This collected data is later transmitted to a gateway node through which a medical professional can access it from any remote location. However, since public communication channels are being used to transmit the patient's data, there is a need for secure and private access to ensure the privacy of the patient's sensitive information. Moreover, the use of smart devices in IoT also poses several threats like privileged-insider attack, loss of physical device, impersonation attack, man-in-the-middle attack and many other attacks which can result in a security breach of the entire architecture. Thus, WMSN applications need to be secure and resistant to such attacks as mentioned above and must also provide only authorized access. Mutual authentication is an essential step before any data is communicated between a medical device and a gateway or among two or more sensor nodes. In this paper, an authentication approach using PUF and blockchain is proposed. Blockchain, being intrinsically decentralized and immutable, can help in providing transparency and trust in the proposed framework. A PUF can be considered as a unique, unclonable and unpredictable fingerprint of an integrated circuit (IC). Instead of using public-key encryption, the PUF can be relied on to generate a robust and unique session key, thereby providing resistance to physical attacks and cloning attacks [2]. Considering that most IoT devices are deployed in the outside world where any authorized person or an intruder can gain access to them, the use of PUF and blockchain also eliminates the need to store any keys and other vital information in the nodes themselves, thereby providing resistance to a single point of failure, cloning and physical attacks, and false identity attacks [3].
1.1 Research Motivation Most of the existing protocols on mutual authentication in WMSN fail to provide security at the physical level. Those who did provide mutual authentication increased the communication cost by increasing the number of bits being communicated and thereby increasing the traffic in the network or ended up using complex operations for IoT nodes with limited resources. Hence, in the proposed scheme, the aim is to overcome all these issues by using blockchain and PUF, to provide mutual authentication in IoMT environment with comparatively lesser cost and operations.
1.2 Contribution (1) This paper highlights the existing issues of trust, authentication and security on the Internet of Medical Things, IoMT. (2) This paper presents a secure and robust authentication approach between a sensor device and a gateway node or between a sensor device and a medical professional, using smart contracts and PUF. (3) The proposed framework is implemented using the Scyther tool [4]. The security analysis done of the proposed approach shows that it is secure against physical layer attacks, impersonation attacks, man-in-the-middle attacks and many more prevalent attacks in IoT. In addition, the organization of the paper is as follows. The relevant work related to the authentication of sensor nodes in WMSNs in the IoMT environment followed by the preliminary study is discussed in Sects. 2 and 3. The registration and authentication steps related to the proposed framework are described in Sect. 4. The formal and informal security analysis of the proposed scheme is presented in Sect. 5. A precise comparative analysis of the existing models with the proposed framework based on computation cost and communication cost is covered in Sect. 6. Finally, Sect. 7 concludes the work along with the future scope.
2 Related Work Recently, a lot of researchers have focused on providing mutual authentication in an IoMT environment. Wu et al. [5] proposed a two-factor-based authentication scheme relying on the security of the session key which is generated with the help of random numbers produced by the communicating parties. However, the proposed scheme fails to provide enough protection against sensor node capture attack. Authors of [6], presented another two-stage authentication protocol using PUF for the added physical layer security. His proposed protocol is well suited for healthcare IoT devices with limited processing capability, memory and limited battery life. However, the protocol is vulnerable to a single point of failure attack. Similarly, Zerrouki [7] in his paper explored the randomness and uniqueness of PUF functions and proposed a key exchange scheme using elliptic curve multiplication and symmetric key. However, the framework fails to provide user anonymity and also suffers from a single point of failure as the entire architecture depends on the trusted server. Saqib [8] in 2020 suggested adding an extra layer of security between a gateway node and sensor node by using three-factor authentication. Along with using a digital signature scheme, he employs elliptical curve cryptography to build a reliable key exchange protocol.
With the advent of blockchain, its properties of immutability, security and decentralization can be used to enhance security in an IoT architecture. Over the last decade, many researchers [9, 10] have presented overviews of authentication schemes built using blockchain, along with the advancements made in e-healthcare. They provide a thorough comparison of the different types of consensus mechanisms that can be used in an IoT environment, along with the security aspects of integrating IoT with blockchain, such as scalability, security and efficiency. Subsequently, in 2020, Zhang [11] proposed an anonymous authentication scheme using PUF with blockchain as the distributed and decentralized database. The blockchain is used as a distributed ledger for storing all the key materials of the communicating parties, providing a tamper-proof database that is resistant to a single point of failure. Panda et al. [12] in 2021 proposed a one-way hash chain mechanism to distribute pairs of public and private keys to IoT devices. Wang et al. [13] proposed a way for a medical professional and a sensor device to mutually authenticate so that the professional can fetch vital information about their patients. They used PUF along with a fuzzy extractor and hashing to provide reliable and efficient physical layer security. However, Yu [14] found that the scheme of Wang et al. [13] does not protect against man-in-the-middle attacks and also fails to mutually authenticate a medical professional and a sensor device. To solve these issues, they proposed an improved version of the authentication protocol using the same parameters. Gadekallu [15] proposed the concept of blockchain for edge of things (BEoT), where blockchain and PUF can be used to provide user privacy and an efficient storage and sharing system in an IoT environment. Similarly, Xia [3] used PUF, a fuzzy extractor and the Chinese remainder theorem to design an authentication mechanism for a smart home environment. Most of the above-mentioned schemes, however, incur extra overhead by increasing the number of bits transmitted among the communicating parties, making them expensive for an IoT environment. To overcome this issue, the proposed scheme aims to decrease both the communication and computation costs in an IoT environment.
3 Preliminary Details 3.1 Smart Contract and PUF The paper [16], published in 2008, introduced a new data structure called a blockchain. A blockchain can be considered a chain of valid blocks, with each block storing the hash value of the previous valid block, making it immutable and tamper-proof. Blockchain can be used in any application where transparency, autonomy, security and anonymity are of utmost relevance [9]. One of the major applications of blockchain is smart contracts. A smart contract can be viewed as a piece of code that automatically initiates a set of operations once a certain condition is met. The intrinsic immutability of smart contracts makes them the ideal
choice for authentication purposes. Once a smart contract is deployed on the blockchain, it cannot be modified. A PUF is a function realized at the time of manufacture of an integrated circuit (IC) by recording the minuscule variations in the properties of the IC and converting them into digital form. It maps a challenge bit stream to a response bit stream:

F(Challenge) → Response

A PUF takes a challenge as an argument and produces a response in return. No two challenges produce the same response, making PUF an ideal choice for this authentication framework [17]. PUFs are desirable in place of asymmetric key encryption because of their unpredictability, unclonability and tamper-proof nature.
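To make the challenge-response idea concrete, the following is a minimal sketch of challenge-response pair (CRP) enrolment and verification. The PUF itself is simulated with a keyed hash purely for illustration, since a real PUF derives its response from physical silicon variations and cannot be reproduced in software.

```python
import hashlib
import os

# Hypothetical stand-in: a real PUF response comes from silicon variations and
# cannot be expressed as software; a keyed hash is used here only so that the
# enrolment/verification flow can be demonstrated end to end.
class SimulatedPUF:
    def __init__(self):
        self._device_secret = os.urandom(32)   # models the unclonable variation

    def response(self, challenge: bytes) -> bytes:
        return hashlib.sha256(self._device_secret + challenge).digest()

# Enrolment: the verifier stores one or more challenge-response pairs (CRPs).
device_puf = SimulatedPUF()
challenge = os.urandom(4)                      # 32-bit challenge, as in Sect. 6
stored_crp = (challenge, device_puf.response(challenge))

# Verification: the verifier replays the stored challenge and compares responses.
claimed_response = device_puf.response(stored_crp[0])
print("device authenticated:", claimed_response == stored_crp[1])
```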
3.2 System Model Figure 1 illustrates the system model and the entities on which this paper is based. The entities in the proposed model are: (1) Medical Professional (MP): This can be a doctor, hospital staff or any other professional entitled to access the patient's record. Ideally, only a registered medical professional is allowed to access the patient's record. (2) Sensor Device (SD): The key job of a sensor device is to collect information from the patient's body.
Fig. 1 System model for the proposed approach in the WMSNs
The PUF in a sensor device is responsible for registering the sensor device with the gateway node, and the challenge-response pairs it generates also assist in forming a session key. (3) Gateway Node (GWN): A cluster of gateway nodes connected to the blockchain supports communication between a medical professional and a sensor device. The key responsibility of a gateway node is the registration of both the medical professional and the sensor device.
3.3 Adversary Model Following the well-known Dolev-Yao adversary model [18], the capabilities of an adversary are as follows: (1) The adversary can gain complete access to the public communication channel. (2) The adversary can eavesdrop on, reorder, resend and drop any messages transmitted through the public communication channel.
4 Proposed Work This section describes the steps involved in the proposed model, which uses smart contracts, PUFs and a fuzzy extractor. The scheme consists of three stages: the initial stage, the registration stage and the authentication stage. Table 1 lists all the notations used in the proposed work.
4.1 Initial Stage This stage involves organizing the gateway nodes into a cluster and deciding the consensus protocol the blockchain will run on. These gateway nodes communicate with the smart contract in the blockchain to invoke the registration functions of both the MP and the SD. This phase is similar to the first phase of [13].
4.2 Registration Stage In this stage, the MP and SD register themselves in the blockchain network via a GWN. Any new MP or SD must register before the authentication process can begin. Figure 2 depicts the flowchart of the proposed scheme. SD registration: The SD computes its challenge-response pair (Cs, Rs), a random nonce and a timestamp T. The generated response is used as input to the probabilistic generation function Gen() of the fuzzy extractor to produce (k, h).
Table 1 Notations used in the proposed work

Notation | Meaning
MP, SD, GWN | Medical professional, sensor device and gateway node
IDm, IDs, IDg | Identities of MP, SD and GWN
RPW, BIO | Real password and biometric of a medical professional
(Cs, Rs), (Cm, Rm) | Challenge-response pairs of SD and MP, respectively
SC | Smart contract
RN, RNs, RNm, RNg | Random nonces generated by SC, SD, MP and GWN
SKms | Session key
Regfun_m(), Regfun_s() | Registration functions of MP and SD
PWm, key | Password of MP and private key of the GWN
H(), Gen(), Rep() | Hash function and fuzzy extractor generation/reproduction functions
T_i, ||, ⊕ | Timestamps, concatenation and XOR operations
The generated key is unique to each SD due to its dependence on the PUF. The SD sends {IDs, Cs, Rs, h, T}, along with further secret parameters masked with the help of hash functions, to the GWN. The GWN invokes Regfun_s() in the smart contract, which verifies the authenticity of the secret parameters and either successfully registers the SD or aborts the request. MP registration: Similar to SD registration, the MP first computes its real hidden password RPW as H(km||PWm). This RPW is then sent to the GWN along with {IDm, (Cm, Rm), Regfun_m}. The GWN invokes Regfun_m() in the smart contract to complete the process. The SC replies with {IDm, RN, Rm} masked with its private key. If the MP is already registered, the duplicate request is aborted.
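A rough sketch of the SD-side registration message described above is given below, assuming SHA-256 for H() and a simplified stand-in for the fuzzy extractor's Gen(); the exact masking structure, parameter sizes and helper-data construction are assumptions for illustration, not the paper's protocol.

```python
import hashlib
import os
import time

def H(*parts: bytes) -> bytes:
    """Concatenate-and-hash helper standing in for H() in Table 1."""
    return hashlib.sha256(b"".join(parts)).digest()

def gen(response: bytes):
    """Illustrative stand-in for the fuzzy extractor Gen(): derives a key k and
    public helper data h from a (noisy) PUF response."""
    h = os.urandom(16)                     # helper data (placeholder)
    k = H(response, h)                     # derived key
    return k, h

# SD-side registration message (illustrative values only)
IDs = b"SD-001"
Cs = os.urandom(4)                         # 32-bit challenge (sizes from Sect. 6)
Rs = H(b"device-secret", Cs)               # PUF response stand-in
RNs = os.urandom(20)                       # random nonce
T = str(int(time.time())).encode()         # timestamp

k, h = gen(Rs)
masked = H(IDs, RNs, k, T)                 # secret parameters masked by hashing
registration_msg = {"IDs": IDs, "Cs": Cs, "Rs": Rs, "h": h, "T": T, "mask": masked}
```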
4.3 Authentication Stage In the authentication stage, the GWN mutually authenticates both the MP and the SD based on the secret parameters shared between the nodes during the registration phase. Once authenticated, the SD creates a session key, which is communicated to the GWN masked with a hash function and protected with RNsd and RNg, the random nonces created by the SD and the GWN. Once the GWN has verified the validity of this key and the identity of the SD, the key is forwarded to the MP for further communication.
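The secrecy claims in Table 2 suggest that the shared secret has the form H(IDm, IDs, RNm, RNsd). The sketch below derives a session key of that assumed form and adds a simple timestamp freshness check; the actual key construction in the protocol may differ.

```python
import hashlib
import time

def H(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def derive_session_key(IDm: bytes, IDs: bytes, RNm: bytes, RNsd: bytes) -> bytes:
    # Assumed form SKms = H(IDm || IDs || RNm || RNsd), suggested by the
    # secrecy claims in Table 2; not necessarily the exact construction used.
    return H(IDm, IDs, RNm, RNsd)

def fresh(timestamp: float, max_skew: float = 2.0) -> bool:
    # Reject replayed messages whose timestamp falls outside the allowed skew.
    return abs(time.time() - timestamp) <= max_skew

# Both ends compute the same key from the exchanged (hash-masked) nonces.
sk_sd = derive_session_key(b"MP-01", b"SD-001", b"nonce-m", b"nonce-sd")
sk_gwn = derive_session_key(b"MP-01", b"SD-001", b"nonce-m", b"nonce-sd")
assert sk_sd == sk_gwn
assert fresh(time.time())
```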
Fig. 2 Flowchart for the proposed approach for mutually authenticating an MP with a SD via GWN/SC
5 Security Analysis 5.1 Formal Analysis Scyther is used as a validation tool to check whether the proposed protocol is secure against the attacks discussed in Sect. 5.2. Scyther is an automatic security analysis tool based on Python. It uses the security protocol description language (SPDL) to model the protocol. Table 2 lists the successful claims made for the proposed protocol.
Table 2 Scyther proof of claims for the proposed protocol

Entity | Claim | Status | Comments
MAuth, MP1 | Secret IDm | Ok | No attacks within bounds
MAuth, MP2 | Secret RNm | Ok | No attacks within bounds
MAuth, MP3 | Secret T1 | Ok | No attacks within bounds
MAuth, MP4 | Secret H(FEGenKey(BIO), PWm) | Ok | No attacks within bounds
MAuth, MP5 | Secret H(IDm, IDs, RNm, RNsd) | Ok | No attacks within bounds
MAuth, MP6 | Secret PUF(Cm) | Ok | No attacks within bounds
MAuth, MP7 | Alive | Ok | No attacks within bounds
MAuth, GWN1 | Secret IDg | Ok | No attacks within bounds
MAuth, GWN2 | Secret RNg | Ok | No attacks within bounds
MAuth, GWN3 | Secret T2, T3, T5 | Ok | No attacks within bounds
MAuth, GWN5 | Secret H(IDm, IDs, RNm, RNsd) | Ok | No attacks within bounds
MAuth, GWN7 | Alive | Ok | No attacks within bounds
MAuth, SD1 | Secret IDs | Ok | No attacks within bounds
MAuth, SD2 | Secret RNsd | Ok | No attacks within bounds
MAuth, SD3 | Secret T4 | Ok | No attacks within bounds
MAuth, SD4 | Secret H(IDm, IDs, RNm, RNsd) | Ok | No attacks within bounds
MAuth, SD5 | Secret PUF(Csone) | Ok | No attacks within bounds
MAuth, SD6 | Nisynch | Ok | No attacks within bounds
MAuth, SD7 | Alive | Ok | No attacks within bounds
5.2 Informal Analysis Physical Capture Attack: Even if an adversary manages to physically capture a sensor device and obtain the secret parameters stored in its local memory, they are highly unlikely to predict the correct (Cs, Rs) pairs of the sensor device due to their unpredictable nature. Moreover, the adversary cannot know the secret random number RNs used by the smart contract. Impersonation Attack: For an adversary to impersonate a sensor device, medical professional or gateway node, they need to know the (C, R) pair of that device in order to forge a request for data or authentication. The use of a random nonce, RN, in each device authentication and registration phase again makes it infeasible for the adversary to create a false identity. Man-in-the-Middle Attack: Even if an eavesdropping adversary manages to access all the messages transferred between the communicating parties, they cannot impersonate a party to request data from a sensor device or send an authentication request, as all messages are protected by a random nonce, secret parameters such as R1 and S3, and the identity of the respective device.
Fig. 3 Comparison of the computational cost (number of hash operations used at the medical professional, sensor device and gateway for Wu et al. [5], Amin et al. [1], Yu et al. [14] and the proposed scheme)
6 Performance Analysis In this section, the proposed model is compared with three existing schemes on the basis of computational cost and communication cost. Computational cost is estimated by counting the number of hash functions used in the authentication phase of [1, 5, 14] and of the proposed scheme. As noted in [13], the time consumed by one hash function is 0.0005 s, and the parameters of the device on which this time was recorded are available in [13]. The comparison given in Fig. 3 shows that the proposed model incurs the least computational cost. Figure 4 compares the proposed model on the basis of the total number of bits transmitted between the communicating parties. Following the bit lengths used in [14], the identity of the MP is 160 bits, a challenge is 32 bits, random nonces are 160 bits, the identities of the SD and GWN are 32 bits, the private key is 512 bits, and the timestamp and function identifiers are 32 and 64 bits, respectively. In the proposed protocol described in Sect. 4, five messages are transmitted in the authentication stage, of 448, 352, 224, 416 and 384 bits, respectively. Figure 4 clearly shows that the proposed model transmits the fewest bits in total, thereby making both the possibility of collisions and the communication cost lower than in the compared models.
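As a quick check of the figures above, the snippet below totals the five authentication messages and expresses the hash-based computational cost; the per-hash time and message sizes come from the text, while the per-entity hash counts are only reported graphically in Fig. 3 and are therefore left as a parameter.

```python
# Communication cost of the proposed scheme: five authentication messages,
# sizes taken from the text above.
message_bits = [448, 352, 224, 416, 384]
total_bits = sum(message_bits)
print(total_bits)            # 1824 bits transmitted in total

# Computational cost model: time = number of hash invocations x 0.0005 s.
# The per-entity hash counts are shown only graphically in Fig. 3, so they are
# parameters here rather than values taken from the paper.
T_HASH = 0.0005              # seconds per hash, as reported in [13]

def computation_time(num_hashes: int) -> float:
    return num_hashes * T_HASH

print(computation_time(10))  # e.g. 10 hash operations -> 0.005 s
```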
Fig. 4 Comparison of the communication cost (bits transmitted at the medical professional, sensor device and gateway for Wu et al. [5], Amin et al. [1], Yu et al. [14] and the proposed scheme)

7 Conclusion and Future Work This paper makes use of blockchain and PUF to achieve physical layer security and offers a robust and secure authentication scheme for WMSNs. It uses a PUF and a fuzzy extractor to derive the real password from the biometric information. The scheme is also resistant to some of the most prevalent attacks in IoMT. The Scyther tool is used to verify the security of the proposed framework. Based on the comparison study, the proposed approach is more efficient and less time-consuming than existing approaches, and it also produces less overhead compared with some similar existing algorithms. In future work, the aim is to apply the proposed approach to digital forensics to provide a secure chain of evidence.
References 1. Amin R, Islam SH, Biswas GP, Khan MK, Kumar N (2018) A robust and anonymous patient monitoring system using wireless medical sensor networks. Futur Gener Comput Syst 80:483– 495. https://doi.org/10.1016/j.future.2016.05.032 2. Kwon D, Park Y, Park Y (2021) Provably secure three-factor-based mutual authentication scheme with PUF for wireless medical sensor networks. Sensors 21(18):6039. https://doi.org/ 10.3390/s21186039 3. Xia Y, Qi R, Ji S, Shen J, Miao T, Wang H (2022) PUF-assisted lightweight group authentication and key agreement protocol in smart home. Wirel Commun Mob Comput 2022:1–15. https:// doi.org/10.1155/2022/8865158 4. Cremers C, The scyther tool: verification, falsification, and analysis of security protocols (tool paper) [Online]. Available: http://people.inf.ethz.ch/cremersc/scyther/ 5. Wu F et al (2018) A lightweight and robust two-factor authentication scheme for personalized healthcare systems using wireless medical sensor networks. Futur Gen Comput Syst 82:727– 737. https://doi.org/10.1016/j.future.2017.08.042 6. Alladi T, Chamola V, Naren, HARCI: a two-way authentication protocol for three entity healthcare IoT networks. IEEE J Sel Areas Commun 39(2):361–369. https://doi.org/10.1109/JSAC. 2020.3020605 7. Zerrouki F, Ouchani S, Bouarfa H (2021) Towards a foundation of a mutual authentication protocol for a robust and resilient PUF-based communication network. Proc Comput Sci 191:215–222. https://doi.org/10.1016/j.procs.2021.07.027 8. Saqib M, Jasra B, Moon AH (2021) A lightweight three factor authentication framework for IoT based critical applications. J King Saud Univ Comput Inf Sci. https://doi.org/10.1016/j.jks uci.2021.07.023
9. Shammar EA, Zahary AT, Al-Shargabi AA (2021) A survey of IoT and blockchain integration: security perspective. IEEE Access 9:156114–156150. https://doi.org/10.1109/ACCESS.2021. 3129697 10. Vandana AryaSomayajula SV, Goyal A (2022) An overview of blockchain and IoT in e-healthcare system, pp 123–137. https://doi.org/10.1007/978-981-16-9416-5_10 11. Zhang Y, Li B, Wang Y, Wu J, Yuan P (2020) A blockchain-based user remote autentication scheme in IoT systems using physical unclonable functions. In: 2020 IEEE 5th international conference on signal and image processing (ICSIP), Oct 2020, pp 1100–1105. https://doi.org/ 10.1109/ICSIP49896.2020.9339402 12. Panda SS, Jena D, Mohanta BK, Ramasubbareddy S, Daneshmand M, Gandomi AH (2021) Authentication and key management in distributed IoT using blockchain technology. IEEE Internet Things J 8(16):12947–12954. https://doi.org/10.1109/JIOT.2021.3063806 13. Wang W et al (2022) Blockchain and PUF-based lightweight authentication protocol for wireless medical sensor networks. IEEE Internet Things J 9(11):8883–8891. https://doi.org/10. 1109/JIOT.2021.3117762 14. Yu S, Park Y (2022) A robust authentication protocol for wireless medical sensor networks using blockchain and physically unclonable functions. IEEE Internet Things J 1. https://doi. org/10.1109/JIOT.2022.3171791 15. Gadekallu TR et al (2022) Blockchain for edge of things: applications, opportunities, and challenges. IEEE Internet Things J 9(2):964–988. https://doi.org/10.1109/JIOT.2021.3119639 16. Nakamoto S, Bitcoin: a peer-to-peer electronic cash system [Online]. Available: www.bitcoi n.org 17. Aman MN, Chua KC, Sikdar B (2017) Mutual authentication in IoT systems using physical unclonable functions. IEEE Internet Things J 4(5):1327–1340. https://doi.org/10.1109/JIOT. 2017.2703088 18. Herzog J (2005) A computational interpretation of Dolev-Yao adversaries. Theoret Comput Sci 340(1):57–81. https://doi.org/10.1016/j.tcs.2005.03.003
Developing Prediction Model for Hospital Appointment No-Shows Using Logistic Regression Jeffin Joseph, S. Senith, A. Alfred Kirubaraj, and Jino S. R. Ramson
Abstract Patient no-shows are a significant problem in health care which leads to increased cost, inefficient utilization of capacity, and discontinuity in care. Using the available patient appointment history, the research aims to predict the appointment no-shows of patients in a public hospital using logistic regression. Based on the characteristics of the appointment history data, the features are divided into demographic variables, appointment characteristics, and clinical characteristics. Considering these features and their combinations, a fivefold cross-validation technique is used to choose the best feature combination for the prediction model. From the analysis, it is found that the appointment characteristics give the best predictions. The study tested the model with appointment characteristics, and the performance is evaluated using accuracy, specificity, precision, recall, and F1-score. The model is also evaluated using the receiver operating characteristic curve and the precision-recall curve. Hospitals can employ the model to predict appointment no-shows and implement mitigation strategies based on the outcome of the prediction. Keywords Hospital management · Logistic regression · Appointment no-shows · Predictive analytics · Machine learning
1 Introduction In recent years, considerable attention has been given by researchers to predicting patient no-shows in hospitals. Patient no-shows or missed appointments are scheduled appointments that patients fail to keep or do not cancel in time. Beyond the consequences of discontinuity in care, patient no-shows result in inefficient utilization of capacity and increased cost of operations [1]. A study found that missed appointments cost US healthcare systems 150 billion dollars annually and an average of 200 dollars per unused time slot for a physician [2].
J. Joseph (B) · S. Senith · A. Alfred Kirubaraj Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India J. S. R. Ramson Saveetha School of Engineering, Thandalam, Chennai, Tamil Nadu, India
Managing missed appointments is critical to hospitals and clinics all over the world. A common assumption is that missed appointments are caused by patients forgetting their appointments, and thus reminder calls and texts are sent. Another mitigation strategy adopted to reduce patient no-shows is overbooking. Many professional healthcare providers have adopted these methods, but the missed appointment problem still prevails. An overbooking strategy without proper prediction of missed appointments will not solve the problem but will increase patient waiting time and doctor overtime [3]. The likelihood of missed appointments can be predicted using the contributing variables, which range from sociodemographic variables to clinical specialty and insurance status [4]. Many prediction algorithms are used to predict the likelihood of a missed appointment, and the logistic regression model is one of the most basic and frequently used approaches [4–6]. This research develops and evaluates a logistic regression-based prediction model for hospital appointment no-shows. The study helps to understand the important variables used for predicting hospital appointment no-shows and will help healthcare providers develop no-show mitigation strategies effectively. The article is arranged as follows. The introduction is given in Sect. 1. Related studies are presented in Sect. 2. The model is developed in Sect. 3. Performance measures are included in Sect. 4, and the conclusion is presented in Sect. 5.
2 Related Studies Prediction approaches have been widely used in the healthcare sector for the detection of diseases and the management of healthcare resources [7–10]. Numerous studies forecast appointment no-shows in hospitals using various prediction methodologies, and one of the most important and common methods is the logistic regression approach. A 2017 study by Goffman on predicting patient no-shows in clinics measured model performance in terms of the receiver operating characteristic curve and obtained an area under the curve (AUC) of 0.71 [11]. In a different study, Hong and Alaeddini used L1/L2 regularized multinomial logistic regression to predict four different types of events (tardiness, cancellations, no-shows, and shows) by considering variables such as weekday, appointment time, location, gender, age, marital status, and insurance [12]. Another study for predicting no-shows used regularized logistic regression with the least absolute shrinkage and selection operator (LASSO), which shrinks beta coefficients towards zero to generate a more stable estimate and avoid overfitting, and considered variables such as age, sex, race, previous appointments, prior no-shows, day of the week, appointment time, appointment length, insurance, and follow-up [13].
A study conducted in 2018 used factors such as visit type, language, ethnicity, race, gender, marital status, employment, insurance, drugs, age, annual income, and prior no-shows, together with appointment data such as appointment length, lead time, visit status, appointment time, season, specialty, and weekday, to develop a logistic regression model for predicting appointment no-shows, obtaining an AUC of 0.81, sensitivity of 0.72, and accuracy of 0.73 [14]. Another study in 2018 used multiple logistic regression to develop a predictive model for risk stratification of no-shows and obtained an AUC of 0.718 [15]. In 2019, Ahmad estimated the sensitivity, specificity, and AUC at 0.47, 0.79, and 0.70, respectively, for a probit regression model predicting clinical no-shows [16]. Also in 2019, Li and Tang used a Bayesian nested logit model for predicting appointment no-shows and obtained an AUC of 0.886 [17]. A logistic regression model developed by Harvey predicted no-shows in a radiology department using the variables distance, appointment time, day of the week, lead time, risk, age, race, language, income, insurance, employment, education, marital status, and sex, and obtained an AUC of 0.753 [18]. In 2020, Gromisch adopted a logistic regression method based on bivariate analysis with dichotomized no-shows as the output, and the c-statistic was obtained as 0.78 [19]. In a different study, Alloghani used decision trees and multiple logistic regression models and obtained sensitivity and specificity values of 0.26 and 0.98, respectively [20]. Random forests, decision trees, gradient boosting, and neural networks have also been used for the prediction of appointment no-shows [21–23]. Logistic regression provides better prediction performance when compared with other methods [24, 25]. Thus, the related studies helped to evaluate the performance of models employed for predicting hospital appointment no-shows and also contributed to the selection of variables used for the development of the logistic regression prediction model employed in the current study.
3 Model Development The source dataset for the study comprises records of 62,285 patients and their 110,478 appointments from a public hospital in Brazil. The total no-show count in the data is 20% of the total appointments. Before analysis, the data is first cleansed and incomplete records are deleted. The original data consists of the variables Patient_Id, appointment ID, high blood pressure, diabetic, alcohol addiction, handicapped, gender, scheduled date, appointment date, age, SMS received, and appointment no-show. New variables are computed and extracted from the existing variables; this process is called feature engineering. The extracted new variables are waiting duration, appointment weekday, missed appointment before, number of missed appointments, and appointment type. The variable waiting duration is derived from the appointment date and the scheduled date.
The variable waiting duration consists of the categories same day, within two days, within a week, within a month, and more than a month. The variables missed appointment before and number of missed appointments are extracted from the appointment no-show data. The variable number of missed appointments consists of the categories not missed any appointments, less than two missed appointments, between three and ten missed appointments, and more than ten missed appointments. The appointment type describes whether the appointment is a new appointment or a repeated appointment. The variable age is grouped and converted to a new variable, age group, for better prediction results. The variable age group consists of the categories old age adult, mid-age adult, child, young age adult, and adolescent. The variables in the new dataset are then classified into demographic variables, appointment characteristics, and clinical characteristics. The data is divided into two sections: a training set with 80% of the data and a test set with 20% [26]. The training set is encoded, and the logistic regression model is fivefold cross-validated using the training set with different combinations of the classified variables. Seven cross-validations are conducted with (1) demographic variables, (2) appointment characteristics, (3) clinical characteristics, (4) demographic variables and appointment characteristics, (5) demographic variables and clinical characteristics, (6) appointment characteristics and clinical characteristics, and (7) demographic variables, appointment characteristics, and clinical characteristics. The best-performing set of characteristics is used to develop the prediction model on the entire training set. The test set is encoded and used to test the model. The prediction performance is analysed using different performance measures including accuracy, precision, recall, and F1-score. The model is also evaluated using the receiver operating characteristic curve and the precision-recall curve. The model is implemented in Python with the Scikit-learn library. The prediction output can be used by hospitals in a real-time environment to adopt mitigation strategies such as overbooking. For example, if the model predicts 20% no-shows, the appointment system can take 20% more appointments than the allocated capacity. Figure 1 describes the overall framework of the study.
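The following is a minimal sketch of the feature-engineering and fivefold cross-validation steps described above, using pandas and scikit-learn as mentioned in the text. The column names, synthetic data, and bin edges are illustrative assumptions and are not taken from the study's actual dataset or code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the appointment history; real column names and values
# in the Brazilian dataset may differ from these assumptions.
rng = np.random.default_rng(0)
n = 500
scheduled = pd.Timestamp("2016-04-01") + pd.to_timedelta(rng.integers(0, 30, n), "D")
appointment = scheduled + pd.to_timedelta(rng.integers(0, 60, n), "D")
df = pd.DataFrame({
    "scheduled_date": scheduled,
    "appointment_date": appointment,
    "age": rng.integers(0, 95, n),
    "sms_received": rng.integers(0, 2, n),
    "no_show": rng.integers(0, 2, n),
})

# Feature engineering: waiting duration and age group, binned into the
# categories named in the text (exact bin edges are assumptions).
wait_days = (df["appointment_date"] - df["scheduled_date"]).dt.days
df["waiting_duration"] = pd.cut(
    wait_days, bins=[-1, 0, 2, 7, 30, 10_000],
    labels=["same day", "within two days", "within a week",
            "within a month", "more than a month"])
df["age_group"] = pd.cut(
    df["age"], bins=[-1, 12, 19, 35, 60, 200],
    labels=["child", "adolescent", "young age adult",
            "mid-age adult", "old age adult"])

# One-hot encode a feature set and run fivefold cross-validation of the
# logistic regression model, scoring the measures used in the paper.
X = pd.get_dummies(df[["waiting_duration", "age_group", "sms_received"]])
y = df["no_show"]
scores = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=5,
                        scoring=["accuracy", "precision", "recall", "f1"])
print({k: v.mean() for k, v in scores.items() if k.startswith("test_")})
```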
3.1 Demographic Variables A descriptive analysis of the demographic variables is given in Table 1. The variable age group consists of categories: old age adult, mid-age adult, child, young age adult, and adolescent. Another variable is gender which consists of two categories: male and female. The output variable missed appointment consists of two outcomes: show and no-show.
Fig. 1 Framework of the study
Table 1 Descriptive analysis of demographic variables

Variable | Category | Total | Show | No-show
Gender | Female | 71,803 | 57,214 | 14,589
Gender | Male | 38,675 | 30,952 | 7723
Age group | Old age adult | 25,288 | 21,030 | 4258
Age group | Mid-age adult | 52,057 | 41,453 | 10,604
Age group | Child | 19,941 | 15,906 | 4035
Age group | Young age adult | 2237 | 512 | 1725
Age group | Adolescent | 6342 | 4652 | 1690
3.2 Appointment Characteristics The appointment characteristics consist of six variables: waiting duration, appointment weekday, number of appointments missed, missed appointment before, type of appointment, and message received. A descriptive analysis of the appointment characteristics with the show and no-show is given in Table 2.
Table 2 Descriptive analysis of appointment characteristics

Characteristic | Category | Total | Show | No-show
Waiting duration | Same day | 38,534 | 36,743 | 1791
Waiting duration | Within two days | 11,934 | 9222 | 2712
Waiting duration | Within a week | 20,245 | 15,189 | 5056
Waiting duration | Within a month | 30,065 | 20,520 | 9545
Waiting duration | More than a month | 9700 | 6492 | 3208
Appointment weekday | Monday | 22,708 | 18,020 | 4688
Appointment weekday | Tuesday | 25,630 | 20,479 | 5151
Appointment weekday | Wednesday | 25,836 | 20,747 | 5089
Appointment weekday | Thursday | 17,247 | 13,909 | 3338
Appointment weekday | Friday | 19,018 | 14,981 | 4037
Appointment weekday | Saturday | 39 | 30 | 9
Number of appointments missed | Not missed any appointments | 76,800 | 76,800 | 0
Number of appointments missed | Less than two missed appointments | 31,390 | 10,507 | 20,883
Number of appointments missed | Between three and ten missed appointments | 2244 | 847 | 1397
Number of appointments missed | More than ten missed appointments | 44 | 12 | 32
Missed appointments before | Yes | 33,678 | 11,366 | 22,312
Missed appointments before | No | 76,800 | 76,800 | 0
Type of appointment | Fresh | 51,385 | 50,094 | 12,191
Type of appointment | Repeat | 48,193 | 38,072 | 10,121
Message received | Yes | 35,468 | 25,686 | 9782
Message received | No | 75,010 | 62,480 | 12,530
3.3 Clinical Characteristics The clinical characteristics are high blood pressure, diabetic, alcohol addiction, and handicapped. A descriptive analysis of clinical characteristics with show and no-show is given in Table 3.
3.4 Cross-Validation The logistic regression model is fivefold cross-validated with the three sets of variables and their combinations.
Table 3 Descriptive analysis of clinical characteristics

Characteristic | Category | Total | Show | No-show
High blood pressure | Yes | 21,796 | 18,027 | 3769
High blood pressure | No | 88,682 | 70,139 | 18,543
Diabetic | Yes | 7941 | 6511 | 1430
Diabetic | No | 102,537 | 81,655 | 20,882
Alcohol addiction | Yes | 3360 | 2683 | 677
Alcohol addiction | No | 107,118 | 85,483 | 21,635
Handicapped | Nil | 108,237 | 86,332 | 21,905
Handicapped | 1 disability | 2042 | 1676 | 366
Handicapped | 2 disabilities | 183 | 146 | 37
Handicapped | 3 disabilities | 13 | 10 | 3
Handicapped | 4 disabilities | 3 | 2 | 1
Seven cross-validations are conducted with (1) demographic variables, (2) appointment characteristics, (3) clinical characteristics, (4) demographic variables and appointment characteristics, (5) demographic variables and clinical characteristics, (6) appointment characteristics and clinical characteristics, and (7) demographic variables, appointment characteristics, and clinical characteristics. The performance measures accuracy, precision, recall, and F1-score are determined for each cross-validation and are given in Table 4.
For cross-validation 1, the average values of accuracy, precision, recall, and F1-score are 0.52, 0.23, 0.57, and 0.32, respectively. These values are very low, and hence it can be concluded that the demographic variables have little significance in predicting missed appointments. Secondly, the researchers validated the logistic regression model using the appointment characteristics. For cross-validation 2, the average values of accuracy, precision, recall, and F1-score are 0.93, 0.74, 0.98, and 0.85, respectively. These values are high, and hence it can be concluded that the model with the appointment characteristics has greater significance in predicting missed appointments.

Table 4 Comparison of performance measures of cross-validations (1–7)

Measure | CV1 | CV2 | CV3 | CV4 | CV5 | CV6 | CV7
Average accuracy | 0.52 | 0.93 | 0.34 | 0.93 | 0.52 | 0.93 | 0.93
Average precision | 0.23 | 0.74 | 0.20 | 0.74 | 0.22 | 0.74 | 0.74
Average recall | 0.57 | 0.98 | 0.82 | 0.97 | 0.56 | 0.98 | 0.98
Average F1-score | 0.32 | 0.85 | 0.34 | 0.84 | 0.32 | 0.84 | 0.84
Next, the logistic regression model is fivefold cross-validated using the clinical characteristics. For cross-validation 3, the average values of accuracy, precision, recall, and F1-score are 0.34, 0.20, 0.82, and 0.34, respectively. These values are very low, and hence it can be concluded that the clinical characteristics have little significance in predicting missed appointments. The model is then fivefold cross-validated using a training set with the demographic variables and the appointment characteristics. For cross-validation 4, the average values of accuracy, precision, recall, and F1-score are 0.93, 0.74, 0.97, and 0.84, respectively. These values are high, and hence it can be concluded that the combination of demographic variables and appointment characteristics has high significance in predicting missed appointments. The model is again fivefold cross-validated using the demographic variables and the clinical characteristics. For cross-validation 5, the average values of accuracy, precision, recall, and F1-score are 0.52, 0.22, 0.56, and 0.32, respectively. These values are very low, and hence the combination of demographic variables and clinical characteristics has little significance in predicting missed appointments. The model is then fivefold cross-validated using a training set with the appointment characteristics and the clinical characteristics. For cross-validation 6, the average values of accuracy, precision, recall, and F1-score are 0.93, 0.74, 0.98, and 0.84, respectively. These values are high, and hence the combination of appointment characteristics and clinical characteristics has high significance in predicting missed appointments. Finally, the model is fivefold cross-validated using a training set with the demographic variables, the appointment characteristics, and the clinical characteristics. For cross-validation 7, the average values of accuracy, precision, recall, and F1-score are 0.93, 0.74, 0.98, and 0.84, respectively. These values are high, and hence the combination of demographic variables, clinical characteristics, and appointment characteristics has high significance in predicting missed appointments. When the cross-validation results are compared, it is found that the models which used the appointment characteristics have the highest performance measures, and the model which used only the appointment characteristics gives high values for all performance indicators. Hence, the researchers concluded that the test set should be applied to the model trained with the appointment characteristics rather than to models with other variables or their combinations. The performance measures on the test set are analysed in the next section.
4 Performance Measures The model is tested with the test set, and the performance measures accuracy, precision, recall, and F1-score are analysed. Table 5 shows the values of the performance measures of the prediction: accuracy, precision, recall, and F1-score are 0.93, 0.75, 0.98, and 0.85, respectively. These high values show the high performance of the model.
Table 5 Performance measures of the prediction model

Performance measure | Value
Accuracy | 0.93
Precision | 0.75
Recall | 0.98
F1-score | 0.85

4.1 Receiver Operating Characteristic Curve and Precision-Recall Curve The receiver operating characteristic (ROC) curve and the precision-recall curve for the model are shown in Fig. 2. The ROC curve and the area under it show how successfully the model can distinguish between shows and no-shows [27]. The area under the curve for the model is 0.99, which is remarkable. The precision-recall curve helps to identify the points at which both precision and recall are high. It reflects the model's capability to predict no-shows accurately when the number of individuals who show up is significantly greater than the number of no-shows; precision-recall is a better performance measure when this kind of imbalance problem occurs [28]. The average precision (AP) summarises the precision-recall curve, and its value is obtained as 0.95.
Fig. 2 a Receiver operating characteristics curve. b Precision-recall curve
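As a reference for how such measures can be obtained, the snippet below computes the reported metrics with scikit-learn from placeholder test-set outputs; in the actual study these would come from the fitted logistic regression model.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, average_precision_score)

# Placeholder test-set labels and model outputs; in practice these come from
# model.predict(X_test) and model.predict_proba(X_test)[:, 1].
y_test = [0, 1, 0, 1, 1, 0, 0, 1]
y_pred = [0, 1, 0, 1, 0, 0, 0, 1]
y_prob = [0.1, 0.9, 0.2, 0.8, 0.4, 0.3, 0.2, 0.7]

print("accuracy ", accuracy_score(y_test, y_pred))
print("precision", precision_score(y_test, y_pred))
print("recall   ", recall_score(y_test, y_pred))
print("F1-score ", f1_score(y_test, y_pred))
print("ROC AUC  ", roc_auc_score(y_test, y_prob))            # area under the ROC curve
print("AP       ", average_precision_score(y_test, y_prob))  # precision-recall summary
```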
5 Discussion and Conclusion The study cross-validated a logistic regression model using different sets of variables to find the best set of variables for prediction. It is found from the study that the appointment characteristics are the most significant predictors of hospital appointment no-shows. The performance measures on the test set, especially the value of the AUC, show the high performance of the model. The AUC for the model is obtained as 0.99, and another important performance measure, the average precision, is obtained as 0.95. The majority of comparable research has not used the precision-recall curve to analyse model performance. Accuracy, precision, recall, and F1-score are the other performance metrics used for evaluation, with respective values of 0.93, 0.75, 0.98, and 0.85. The performance of the logistic regression model used in the current research is high when compared with the results of some of the machine learning approaches used for prediction in similar studies [21–23]. In a comparable study by Dashtban in 2021, the best model for predicting appointment no-shows in that study, logistic regression, has an AUC value of only 0.704. The current research can be enhanced with a few suggestions, listed as follows. The study used the demographic variables, appointment characteristics, and clinical characteristics available in the database of the hospital selected for the study; predictions could be made more accurately if additional variables were available. Demographic factors such as place, marital status, race, ethnicity, employment, and annual income might have improved prediction outcomes if included [4, 14, 18]. Appointment variables which could additionally have been considered but were not available are length of appointment, appointment channel, prior cancellation, preceding-day holiday, and school holiday [4]. The study used only very limited clinical variables, namely high blood pressure, alcohol addiction, handicapped, and diabetic; important clinical variables that could have been included are disease, medicine, laboratory result, specialty, risk group, and psychiatric condition [4]. The current study has considered missed appointments and cancellations together; it could be enhanced by considering them separately, and tardiness could also be included to develop a better model [12]. The current study uses data from a single hospital in Brazil; it could be expanded with data from more hospitals. Another recommendation is to address the imbalance problem [29]. The dataset used in the study is imbalanced: shows make up 80% of the appointments, whereas no-shows are only 20%. The current study has not adopted any strategy to fix this imbalance; future studies can adopt methods such as SMOTE [30]. The outcome of the study helps researchers to better understand the predictors of hospital missed appointments. The study will also help healthcare facility providers in predicting missed appointments and thereby adopting mitigation strategies to increase the efficiency of the outpatient department of the hospital.
As the predictions made by the proposed model are more accurate and precise, a scheduling system with an overbooking strategy can be developed around them. The next goal of the researchers is to develop a hospital appointment system based on the no-show prediction.
References 1. Gupta D, Denton B (2008) Appointment scheduling in health care: challenges and opportunities. IIE Trans 40:800–819 2. Gier J, Missed appointments cost the U.S. healthcare system $150B each year …, https:// www.hcinnovationgroup.com/clinical-it/article/13008175/missed-appointments-cost-the-ushealthcare-system-150b-each-year 3. Zacharias C, Pinedo M (2013) Appointment scheduling with no-shows and overbooking. Prod Oper Manag 23:788–801 4. Dantas LF, Fleck JL, Cyrino Oliveira FL, Hamacher S (2018) No-shows in appointment scheduling—a systematic literature review. Health Policy 122:412–421 5. Alaeddini A, Yang K, Reddy C, Yu S (2011) A probabilistic model for predicting the probability of no-show in hospital appointments. Health Care Manag Sci 14:146–157 6. Samuels RC, Ward VL, Melvin P, Macht-Greenberg M, Wenren LM, Yi J, Massey G, Cox JE (2015) Missed appointments. Clin Pediatr 54:976–982 7. Kute SS, Tyagi AK, Malik S, Deshmukh A (2022) Internet-based healthcare things driven deep learning algorithm for detection and classification of cervical cells. In: Lecture notes on data engineering and communications technologies, pp 263–278 8. Das A, Das HS, Choudhury A, Neog A, Mazumdar S (2021) Detection of Parkinson’s disease from hand-drawn images using deep transfer learning. In: Intelligent learning for computer vision, pp 67–84 9. Oza A, Bokhare A (2022) Diabetes prediction using logistic regression and K-nearest neighbor. In: Lecture notes on data engineering and communications technologies, pp 407–418 10. Huang D, Wang S, Liu Z (2021) A systematic review of prediction methods for emergency management. Int J Dis Risk Reduct 62:102412 11. Goffman RM, Harris SL, May JH, Milicevic AS, Monte RJ, Myaskovsky L, Rodriguez KL, Tjader YC, Vargas DL (2017) Modeling patient no-show history and predicting future outpatient appointment behaviour in the Veterans Health Administration. Mil Med 182 12. Hong SH, Alaeddini A (2017) A multi-way multi-task learning approach for multinomial logistic regression. Methods Inf Med 56:294–307 13. Ding X, Gellad ZF, Mather C, Barth P, Poon EG, Newman M, Goldstein BA (2018) Designing risk prediction models for ambulatory no-shows across different specialties and clinics. J Am Med Inform Assoc 25:924–930 14. Mohammadi I, Wu H, Turkcan A, Toscos T, Doebbeling BN (2018) Data analytics and modeling for appointment no-show in community health centers. J Prim Care Commun Health 9:215013271881169 15. Chua SL, Chow WL (2018) Development of predictive scoring model for risk stratification of no-show at a public hospital specialist outpatient clinic. Proc Singap Healthc 28:96–104 16. Ahmad MU, Zhang A, Mhaskar R (2019) A predictive model for decreasing clinical no-show rates in a primary care setting. Int J Healthc Manag 14:829–836 17. Li Y, Tang SY, Johnson J, Lubarsky DA (2019) Individualized no-show predictions: effect on clinic overbooking and appointment reminders. Prod Oper Manag 28:2068–2086 18. Harvey HB, Liu C, Ai J, Jaworsky C, Guerrier CE, Flores E, Pianykh O (2017) Predicting no-shows in radiology using regression modeling of data available in the electronic medical record. J Am Coll Radiol 14:1303–1309
19. Gromisch ES, Turner AP, Leipertz SL, Beauvais J, Haselkorn JK (2020) Who is not coming to clinic? A predictive model of excessive missed appointments in persons with multiple sclerosis. Mult Scler Relat Disord 38:101513 20. Alloghani M, Aljaaf AJ, Al-Jumeily D, Hussain A, Mallucci C, Mustafina J (2018) Data science to improve patient management system. In: 2018 11th international conference on developments in eSystems engineering (DeSE) 21. Daghistani T, AlGhamdi H, Alshammari R, AlHazme RH (2020) Predictors of outpatients’ no-show: big data analytics using apache spark. J Big Data 7 22. Fan G, Deng Z, Ye Q, Wang B (2021) Machine learning-based prediction models for patients no-show in online outpatient appointments. Data Sci Manag 2:45–52 23. Dashtban M, Li W (2021) Predicting non-attendance in hospital outpatient appointments using deep learning approach. Health Syst 1–22 24. Song X, Liu X, Liu F, Wang C (2021) Comparison of machine learning and logistic regression models in predicting acute kidney injury: a systematic review and meta-analysis. Int J Med Inf 151:104484 25. Lynam AL, Dennis JM, Owen KR, Oram RA, Jones AG, Shields BM, Ferrat LA (2020) Logistic regression has similar performance to optimised machine learning algorithms in a clinical setting: application to the discrimination between type 1 and type 2 diabetes in young adults. Diagn Progn Res 4 26. Xu Y, Goodacre R (2018) On splitting training and validation set: a comparative study of crossvalidation, bootstrap and systematic sampling for estimating the generalization performance of supervised learning. J Anal Test 2:249–262 27. Bradley AP (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn 30:1145–1159 28. Saito T, Rehmsmeier M (2015) The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLOS ONE 10 29. Kaur P, Gosain A (2018) Issues and challenges of class imbalance problem in classification. Int J Inf Technol 14:539–545 30. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) Smote: synthetic minority oversampling technique. J Artif Intell Res 16:321–357
Mixed-Language Sentiment Analysis on Malaysian Social Media Using Translated VADER and Normalisation Heuristics James Mountstephens and Mathieson Tan Zui Quen
Abstract Most work in Sentiment Analysis has so far been in a single-language context, primarily English. This work addresses the neglected issue of Sentiment Analysis in a mixed-language environment: Malaysian Social Media, which freely combines both Malay and English. The highly-cited and effective English Sentiment Analysis system VADER was converted to Malay for the first time and used in combination with English VADER to create a multi-language Sentiment Analysis system. Significant patterns in noisy Malaysian Social Media text were identified and heuristics for normalising them were devised. Mixed-language VADER with normalisation heuristics was able to achieve a 12% improvement in accuracy as compared to Malay VADER alone. In absolute terms, performance must be improved, but the results obtained here are encouraging for the future continuation of this approach. Keywords Sentiment analysis · Mixed language · VADER · Normalisation · Malaysian social media
J. Mountstephens (B) · M. T. Z. Quen Faculty of Computing and Informatics, Universiti Malaysia Sabah, Jalan UMS, 88400 Kota Kinabalu, Sabah, Malaysia e-mail: [email protected]

1 Introduction Sentiment Analysis is an important and highly active field of research combining branches of AI that include Natural Language Processing and machine learning [1, 2]. The essential goal of Sentiment Analysis is to determine the valence of input text, that is, whether the speaker's feeling towards the subject is positive, negative, or neutral [2]. This information can have great utility in many applications, for example in business, where determining customer sentiment towards products can be essential in the processes of development and marketing [3, 4]. The problem remains a challenging one, but progress has been made using both machine learning and lexicon-based methods. Machine learning approaches [5–7] are based on a variety
of algorithms such as standard [8] and recurrent neural networks [9], long short-term memory (LSTM) networks [10, 11], bidirectional recurrent CNNs [12, 13], multi-layer attention-based CNNs [14, 15], context-based ensemble vectors [16], and capsule and category attention networks [17, 18], all of which require extensive training data but do not require the manual construction of a fixed lexicon with valence labels. Lexicon-based approaches usually work based on the construction of valence-labelled lexicons and a number of rules [19–21]. To date, most work in Sentiment Analysis has been in a single language, namely English. However, mixed-language environments are common. For example, in Malaysia the use of Bahasa Rojak (literally, "mixed language"), which combines native Malay with English, is very common in informal situations [21]. Take, for example, the following sentences found on Malaysian Twitter: "aku kira good mood" ("I think I'm in a good mood") and "woww. after kau complain about your skin. now yg manis manis pulak." ("wow. after you complain about your skin. now it's so sweet."). It can be seen that both English and Malay supply sentiment content as well as neutral and functional words. The mixing of languages is extensive enough to motivate government initiatives to preserve the pure Malay language [22], and within the specific domain of Sentiment Analysis in Malaysia it has been acknowledged that local social media text contains both Malay and English [23–25]. However, so far, this issue and its implications have not been widely considered. Mixed language can complicate the process of Sentiment Analysis. Clearly, approaches based on sentiment lexicons have to deal with multiple lexicons rather than a single one, but even machine learning-based methods could still be affected tacitly by the greater complexity of a multilingual search space. This work explores the use of lexicon-based Sentiment Analysis in a mixed-language environment consisting of two languages: Malay and English. The specific domain is Malaysian social media, and the Sentiment Analysis method used is the lexicon-based algorithm Valence Aware Dictionary and Sentiment Reasoner (VADER), which has currently been cited more than 3400 times and is known to perform successfully on English social media [26]. VADER can achieve an F1 score of 0.96 on English tweets, and this success has led to work on translating it into the German, Swedish, Bengali, and Assamese languages [27–30]. Here, we explore its translation to Malay using both machine translation and manual translation. VADER in two languages supplies the lexicons and the basic rules for processing the social media text used in this research. The two lexicons are combined into a mixed-language Sentiment Analysis system intended to provide greater coverage and better performance than a single-language system. The choice of social media text as a problem domain is also challenging. More structured domains such as online reviews express sentiment explicitly and have been the subject of considerable research effort. However, sentiment expressed in less explicit and semi-structured data is now becoming more important; in particular, social media as a source of Sentiment Analysis data has been increasingly studied [4]. Social media text often contains important personal opinions but also presents difficulties in terms of processing. Slang, short forms, misspellings, and typos are
frequent features of social media text that can be problematic for Sentiment Analysis algorithms [3, 4]. These issues make the process of text normalisation before performing Sentiment Analysis important [22, 23]. In the work presented here, the analysis of patterns of abnormal text in Malaysian tweets will be used to develop suitable normalisation techniques. It should be noted that this is exploratory work on a neglected issue and therefore it is not supposed to yield a perfect Sentiment Analysis system that outperforms all others. Our intention here was to see the relative effects on performance for each lexicon individually, and especially in combination. The results are encouraging for relative improvement but it was not the intention to compare performance to existing Malaysian Sentiment Analysis systems. A key reason for this is that unlike Sentiment Analysis in English, Malaysian Sentiment Analysis researchers use only private datasets and do not make the code for their algorithms public. This means that direct comparison using the same dataset and algorithms is not possible as it is in English Sentiment Analysis. However, we will comment on comparative performance in the conclusion.
2 Literature Review 2.1 Valence Aware Dictionary and Sentiment Reasoner (VADER) The main component of VADER [26] is its lexicon of sentiment words and emoticons which were obtained from established sentiment word lists and labelled for sentiment by ten native English speakers. VADER has 7517 words in its lexicon, including 450 common emoticons and short forms. As well as its lexicon, VADER also handles a number of features of particular salience in a social media environment. These include: features that can modify sentiment intensity (exclamation and interrogation marks), capitalisation and boosters, which also intensify sentiment-laden words, negators, which invert a word’s valence, and several other rules. For an input sentence consisting of n tokens, VADER attempts to find a match for each word in its lexicon and then calculates their combined sentiment level. This raw valence can be modified if exclamation marks and capitalisation are present. Words without a match may still be identified as either negators or boosters which will then modify the valence of other words in the sentence. VADER’s final outputs are three separate sentiment ratings (positive, negative, neutral) for the input sentence, and also their weighted sum in a single (compound) rating. When tested on English social media input text, VADER was able to achieve an F1 score of 0.96, which was superior to SVM and Naïve Bayes classifier performance on the same dataset. VADER’s impressive performance has motivated several attempts to convert it from English to other languages. VADER in Swedish employed automatic translation for this purpose [27]. VADER in German [28], Bengali [29], and Assamese [30]
carried out the same construction method as the English original by using the ratings of human subjects. Although all of these conversions failed to achieve the same level of performance as the original, they do provide some degree of encouragement that a Malay language version of VADER might be feasible.
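For reference, the snippet below shows how the English VADER scores a sentence via the nltk implementation used later in this work; the example sentence is illustrative.

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")          # the English VADER lexicon shipped with nltk

sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores("The movie was GREAT!!! :)"))
# -> {'neg': 0.0, 'neu': ..., 'pos': ..., 'compound': ...}, with the rules for
#    capitalisation and exclamation marks described above applied
```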
2.2 Text Normalisation Social media data is known to be highly noisy. Short forms, errors in grammar and punctuation, typos, and idiomatic patterns all combine to necessitate some form of preprocessing before the main task of Sentiment Analysis is carried out [4]. For lexicon-based systems, the goal of normalisation is to make aberrant text correspond to terms in the lexicon so that it can be recognised. For example, consider the positive-valence sentence "He was already smling", which contains a typo. If the token "smling" is not recognised as the positive-valence word "smiling" in a sentiment lexicon, then the sentence would be classed as neutral overall. There are generic NLP techniques for normalisation, which include stemming, stopword removal, and lemmatisation [2], and there are techniques tailored to the particular characteristics of a chosen domain or context. The Malaysian context has been noted as particularly challenging in this respect [23]. Malaysian researchers have observed a number of characteristic abnormalities in local social media. Samsudin et al. [23] identified 10 characteristic distortions that Malaysians commonly carry out. For example, Malaysians frequently remove all vowels from a word (e.g. the Malay word for "school", sekolah, becomes sklh) or use only the first and last characters in a word (e.g. the Malay word for "village", kampong, becomes kg). Bakar et al. [24] and Handayani et al. [26] have also conducted similar work on identifying characteristic distortions. These identified patterns have led to normalisation heuristics that improve recognition and subsequent performance in NLP tasks and Sentiment Analysis. However, the online world of Malaysian social media is varied and constantly evolving, which means that there may still be some abnormal patterns yet to be identified. We will identify two such patterns later.
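As an illustration of how such patterns can be handled, the sketch below expands vowel-stripped short forms by matching them against a word list. The word list and the single heuristic shown are illustrative assumptions and not the normalisation heuristics developed in this work.

```python
import re

# Tiny illustrative Malay word list; a real normaliser would use a full lexicon.
MALAY_WORDS = ["sekolah", "kampung", "makan", "minum", "sakit"]

def strip_vowels(word: str) -> str:
    return re.sub(r"[aeiou]", "", word)

# Pre-compute a lookup from vowel-stripped forms back to full words.
EXPANSIONS = {strip_vowels(w): w for w in MALAY_WORDS}

def normalise_token(token: str) -> str:
    """Expand a vowel-stripped short form (e.g. 'sklh' -> 'sekolah') if known."""
    return EXPANSIONS.get(token.lower(), token)

print([normalise_token(t) for t in "sy pergi sklh".split()])
# ['sy', 'pergi', 'sekolah']
```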
3 Method This research was broken into a number of individual tasks, as shown in Fig. 1. Initially, the English VADER lexicon was translated to Malay by both machine methods and manual translation. Malaysian tweet data was then collected and labelled with three valence levels (positive, neutral, negative). This social media data was then analysed for patterns of aberrant text that may potentially be corrected by heuristics that will normalise them. English and Malay VADER were combined into a multi-language Sentiment Analysis system in two ways (described next) and the normalisation heuristics were tested for their effectiveness. Lastly, the performance of the various combinations was evaluated for accuracy and F1 score, as compared to single-language performance. The freely-available nltk implementation of VADER was used here [31] and a significant amount of custom Python code was created to integrate the different components. The Google Translate API [32] was used for machine translation purposes.

Fig. 1 Mixed-language sentiment analysis system

The purpose of the multi-language Sentiment Analysis system is to combine VADER in English and Malay to cover a greater amount of sentiment content in Malaysian social media. In this early exploration, we tried two methods of combination. In “English Override” mode, tweets are processed in Malay by default and only if English sentiment words are detected in the text will English VADER supersede the Malay prediction. In “Winner Takes All” mode, each input tweet was processed by VADER in both languages and the predicted valence with the highest absolute value is taken to be the winner. These two modes are illustrated in Fig. 2. Finally, the normalisation heuristics identified earlier were combined with these multi-language modes and the results were compared to the performance of Malay VADER alone in order to demonstrate the importance of English sentiment content in Malaysian social media.
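A compact sketch of the two combination modes just described is given below. Here `english_vader`, `malay_vader` and `contains_english_sentiment_word` are placeholders standing in for the actual system components (the Malay analyser would be built from the translated lexicon); none of these names come from the paper itself.

```python
# Sketch of the two mixed-language combination modes described above.
# english_vader, malay_vader and contains_english_sentiment_word are
# placeholders for the actual system components.

def english_override(tweet, english_vader, malay_vader, contains_english_sentiment_word):
    """Malay by default; English VADER supersedes it when English sentiment words appear."""
    if contains_english_sentiment_word(tweet):
        return english_vader.polarity_scores(tweet)["compound"]
    return malay_vader.polarity_scores(tweet)["compound"]

def winner_takes_all(tweet, english_vader, malay_vader):
    """Score in both languages and keep the valence with the highest absolute value."""
    en = english_vader.polarity_scores(tweet)["compound"]
    ms = malay_vader.polarity_scores(tweet)["compound"]
    return en if abs(en) >= abs(ms) else ms
```

Both functions return a compound valence in [−1, 1], which can then be thresholded into the positive/neutral/negative classes used for evaluation.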
Fig. 2 Mixed-language sentiment analysis modes

4 Results 4.1 VADER Translation to Malay Initially, the Python Google Translate API [32] was used to translate the 7517 items in the VADER lexicon. However, English and Malay are significantly different languages and so this process was not without issue. Three hundred nine English words were found to lack any Malay equivalent. Inspection revealed that most of them were idiomatic and infrequently used (for example, “wisenheimer” and “woebegone”) and therefore were not considered fatal for this pilot study. However, a more serious issue was the finding that many different English words translated to a single Malay word. For example: “ache”, “ached”, “aches”, “aching”, “hurters”, “hurts”, “ill”, “pain”, “pains”, “sick”, “sickened”, “sickening”, “sickens”, and “sore” all were translated as the Malay word “sakit” (sick). Some of this is a result of the smaller Malay lexicon where distinctions in English do not exist in Malay and another factor is that Google Translate failed to properly conjugate verbs in Malay. In the end, only 4719 different Malay words were found as equivalents to the 7517 English words, which is a significant loss of coverage. For these reasons, time-consuming manual translation was carried out instead. A native Malay speaker with university-level proficiency in English did the main work, followed by three other native Malay speakers who validated the translation. Although this was effort-intensive, the results were more natural and accurate than those achieved by Google Translate, as can be seen in the examples given in Table 1. This manual translation was able to achieve subtler distinctions than Google and yielded 5195 different terms for the original 7517 English words.
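A small sketch of how such many-to-one conflations can be surfaced from a translation table follows; `translations` is a hypothetical dictionary mapping each English lexicon entry to its machine-translated Malay equivalent, and the sample entries simply echo the examples above.

```python
# Sketch: group English lexicon entries by their machine-translated Malay form
# to surface many-to-one conflations such as the "sakit" example above.
from collections import defaultdict

def find_conflations(translations):
    """translations: hypothetical dict of {english_word: malay_word}."""
    groups = defaultdict(list)
    for en, ms in translations.items():
        groups[ms].append(en)
    # Keep only Malay words that absorbed more than one English word.
    return {ms: sorted(ens) for ms, ens in groups.items() if len(ens) > 1}

sample = {"ache": "sakit", "pain": "sakit", "sick": "sakit", "avoid": "mengelak"}
print(find_conflations(sample))  # {'sakit': ['ache', 'pain', 'sick']}
```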
Table 1 Examples of Google conflations and manual corrections

Valence   English        Google Malay        Manual Malay
−1.2      Avoid          Berkabung           Mengelak
−1.7      Avoidance      Berkabung           Elakkan
−1.1      Avoidances     Berkabung           Elakkanelakkan
−1.4      Avoided        Berkabung           Dihindari
−2        Irritating     Orang yang kalah    Menjengkelkan
−2        Irritatingly   Orang yang kalah    Secara menjengkelkan
Table 2 Malaysian food tweet dataset

Food               Negative tweets   Neutral tweets   Positive tweets   Total
Kek Batik          125               363              970               1458
Kek Lapis          35                90               145               270
Mee Goreng         72                182              133               387
Nasi Ayam Penyet   45                110              475               630
Roti Canai         145               685              694               1524
Total              421               1430             2418              4269
4.2 Social Media Data Collection In order to test the translated lexicon, 4269 tweets concerning 5 types of popular Malaysian foods were collected using the Tweepy API and then labelled for valence manually. This dataset is comparable in size with much existing work. Foods were chosen as a topic because Malaysians are known to hold strong feelings towards food and it was hoped that clear sentiment would be displayed in the data. The distribution of tweets by food type and valence is shown in Table 2. It can be seen that the distribution of valence is uneven; the majority of tweets have positive valence, followed by neutral, and only about 10% are negative tweets. This imbalance could potentially be a problem for the proper training of a machine learning-based system but it is not a problem for VADER since it is lexicon-based and does not require training. These skewed proportions seem to reflect the informal understanding that Malaysians feel positively about these popular foods.
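A short sketch of how this class imbalance can be checked once the tweets are labelled is shown below; the `labels` list is synthesised here directly from the totals in Table 2 and stands in for the real manual annotations.

```python
# Sketch: check the valence distribution of the labelled tweets.
from collections import Counter

labels = ["positive"] * 2418 + ["neutral"] * 1430 + ["negative"] * 421  # totals from Table 2

counts = Counter(labels)
total = sum(counts.values())
for valence, n in counts.most_common():
    print(f"{valence:>8}: {n:5d}  ({100 * n / total:.1f}%)")
# negative tweets come out at roughly 10% of the data, as noted above
```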
4.3 Normalisation Heuristic Identification With this tweet data, it was possible to analyse and identify any frequently occurring patterns of abnormal text. Two hitherto undetected patterns were found, which we name the “repetitive letter” and “joint words” patterns. With the patterns identified it was then possible to devise heuristics intended to normalise them.
Repetitive Letter Pattern. It appears common in Malay social media for letters in a word to be repeated to emphasise emotion. For example, the word “sedap” (meaning delicious) might be written as sedapppppppp! This repetition is often found as a suffix but can also occur within a word, for example, “heeeeeeeeey!”. It is clear that these aberrant word forms would be absent from any standard lexicon and would therefore be classed as words with neutral valence despite their strong sentiment. The Repetitive Letter heuristic tries to detect this situation and to handle it. If repeated letters are found within a word, the repeated subsection is trimmed one character at a time until either (i) it is the empty string or (ii) the now trimmed word can be found in either the English or the Malay lexicon. It is only necessary to normalise sentimentcharged words (i.e. only those words found in the VADER lexicons) because any others would have neutral valence by default. Joint Words Pattern. The Malay language has numerous multi-word entities which are joined by hyphens. For example, “peminat-peminat” (meaning “fans” of a football team) and “sikap-jalang” (meaning “bad attitude”). Unfortunately, users of social media frequently omit these important hyphens and when the component words are taken on their own, they often have very divergent meanings and sentiment to that of the multi-word entity. The “Joint Words” heuristic tries to insert omitted hyphens in order to make it more likely that VADER will recognise these multi-word entities. Essentially, all multi-word units in the translated Malay VADER lexicon are inserted into a search tree with a maximum depth of four. Each level of the search tree corresponds to its location in the multi-word unit. To process a tweet containing X words, every possible window of four consecutive tokens is input to the search tree and if a match of n tokens in length is found, those individual tokens are replaced by one hyphenated token. Pattern Matching. The “repetitive letter” and “joint words” heuristics were based on the unique properties of social media in Malaysia. However, it was also thought that a more generic normalisation method might also be effective. Therefore, we also used the Gestalt Pattern Matching algorithm (in its Python difflib library implementation [33]) to find the closest matches to any tokens unrecognised by VADER in either language. A threshold of 85% matching confidence was used to replace any unrecognised words.
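The sketches below illustrate the repetitive-letter trimming and the generic pattern-matching fallback just described. They are simplified reconstructions, not the authors' code: `lexicon` stands in for the combined English and Malay VADER vocabularies, and the 0.85 cutoff mirrors the threshold quoted in the text.

```python
# Sketch of two of the normalisation steps described above.
import difflib
import re

def trim_repeated_letters(word, lexicon):
    """Shrink runs of repeated letters one character at a time until the
    word is found in the lexicon (or no repetition is left)."""
    candidate = word.lower()
    while candidate not in lexicon:
        shorter = re.sub(r"(.)\1+", lambda m: m.group(0)[:-1], candidate, count=1)
        if shorter == candidate:        # nothing left to trim
            return word
        candidate = shorter
    return candidate

def closest_lexicon_match(word, lexicon, cutoff=0.85):
    """Gestalt pattern matching via difflib [33]; replace an unrecognised
    word only when the match confidence reaches the cutoff."""
    matches = difflib.get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word

lexicon = {"sedap", "hey", "smiling"}
print(trim_repeated_letters("sedapppppppp", lexicon))  # -> 'sedap'
print(closest_lexicon_match("smling", lexicon))        # -> 'smiling' (if above cutoff)
```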
4.4 Mixed-Language Sentiment Analysis Results Single-Language Baseline. In order to make an assessment of mixed-language performance, the tweet data was first processed by VADER in English and Malay alone. As can be seen in Table 3, English VADER performed at about random (0.30) for a three class problem. This is as might be expected since the tweets contain more Malay content than English, making the unrecognised Malay words appear as neutral, as seen in the confusion matrix. Malay VADER did better than English VADER (0.39) and was able to make some genuine distinctions between positive and negative tweets. Although this performance is disappointing overall, the 0.39 accuracy gives us a reference point to judge any improvements made by a mixed-language method.
Table 3 Sentiment analysis performance by English and Malay VADER alone

                           English VADER                 Malay VADER
                           −      0      +               −      0      +
Confusion matrix    −      50     331    40              151    169    101
                    0      112    1150   168             264    839    327
                    +      103    1968   347             315    1428   675
Accuracy                   0.30                          0.39
F1 score                   0.30                          0.39
Mixed-Language Approach. As stated earlier, the English and Malay VADERs were combined in two possible modes. In “English Override” mode, text is processed using Malay VADER by default but if sentiment words in English are found then English VADER will be used instead. In contrast, “Winner Takes All” mode simply takes the highest absolute predicted valence in either language. As shown in Table 4, “English Override” mode was found to perform better than English VADER alone but worse than Malay VADER. However, “Winner Takes All” mode gave an 8% increase in accuracy over Malay VADER, showing that the greater coverage provided by the two lexicons was useful and effective.

Mixed-Language Approach with Normalisation. The “Winner Takes All” mode was then combined with the normalisation heuristics identified earlier and the following results were obtained, as shown in Table 5. The repetitive letters heuristic was able to normalise a number of aberrant words found in the tweets. For example, “Heeeey” was normalised to “hey” and “sedapppppppp…..” was normalised to “sedap” (meaning delicious). However, neither it nor the joint words heuristic were able to yield any overall improvement in accuracy or F1 score.

Table 4 Mixed-language sentiment analysis performance

                           Mixed language: English override    Mixed language: Winner takes all
                           −      0      +                     −      0      +
Confusion matrix    −      109    222    91                    91     134    96
                    0      138    1012   280                   331    706    393
                    +      165    1733   520                   353    954    1111
Accuracy                   0.38                                0.47
F1 score                   0.36                                0.49
Table 5 Mixed-language sentiment analysis performance with normalisation

                           Repetitive letters            Joint words
                           −      0      +               −      0      +
Confusion matrix    −      96     129    96              191    134    96
                    0      326    681    423             331    696    403
                    +      358    924    1136            353    954    1111
Accuracy                   0.47                          0.47
F1 score                   0.49                          0.49
Table 6 Mixed-language sentiment analysis performance with pattern matching

                           Pattern matching
                           −      0      +
Confusion matrix    −      206    99     116
                    0      313    666    451
                    +      348    780    1290
Accuracy                   0.51
F1 score                   0.52
In contrast, the pattern matching method of normalisation was able to give an improvement of 2% accuracy over the basic mixed-language Sentiment Analysis, as shown in Table 6. This can be seen to arise from greater numbers of correct positive and negative valence predictions.
5 Conclusion This work is an early exploration of a neglected issue in Sentiment Analysis, namely that of mixed-language environments. We have conducted time-consuming manual translation of the highly-cited VADER system to the Malay language and identified two patterns of aberrant text common in Malaysian social media. Our combination of both English and Malay VADER and pattern matching normalisation was able to achieve greater coverage and an improvement of 12% accuracy over Malay VADER alone. As stated earlier, achieving absolute performance was not the purpose of this exploratory paper and that direct comparison with existing Malaysian Sentiment Analysis systems is very difficult. However, we still estimate that in absolute terms, our method currently underperforms other comparable Malaysian SA methods and that more work is necessary. In absolute terms, performance must be improved but the results obtained here are encouraging for the future continuation of this approach.
Acknowledgements The authors gratefully acknowledge the financial support for this work by Universiti Malaysia Sabah under Grant No. DN20090.
References 1. Wankhade M, Rao ACS, Kulkarni C (2022) A survey on sentiment analysis methods, applications, and challenges. Artif Intell Rev 55(7):5731–5780 2. Liu B (2012) Sentiment analysis and opinion mining. Claypool Publishers, Williston VT 3. Qazi A, Raj RG, Hardaker G, Standing C (2017) A systematic literature review on opinion types and sentiment analysis techniques: tasks and challenges. Internet Res 27(3):608–630 4. Ligthart A, Catal C, Tekinerdogan B (2022) Systematic reviews in sentiment analysis: a tertiary study. Artif Intell Rev 54(2):4997–5053 5. Khairnar J, Kinikar M (2013) Machine learning algorithms for opinion mining and sentiment classification. Int J Sci Res Publ 3(6):1–6 6. Nagaraj P, Deepalakshmi P, Muneeswaran V, Muthamil Sudar K (2022) Sentiment analysis on diabetes diagnosis health care using machine learning technique. In: Saraswat M, Sharma H, Balachandran K, Kim JH, Bansal JC (eds) Congress on intelligent systems, vol 114. Lecture notes on data engineering and communications technologies. Springer, Singapore, pp 491–502 7. Rajalakshmi R, Reddy P, Khare S, Ganganwar V (2022) Sentimental analysis of code-mixed hindi language. In: Saraswat M, Sharma H, Balachandran K, Kim JH, Bansal JC (eds) Congress on intelligent systems, vol 111. Lecture notes on data engineering and communications technologies. Springer, Singapore, pp 739–751 8. Singal A, Thiruthuvanathan MM (2022) Twitter sentiment analysis based on neural network techniques. In: Saraswat M, Sharma H, Balachandran K, Kim JH, Bansal JC (eds) Congress on intelligent systems, vol 114. Lecture notes on data engineering and communications technologies. Springer, Singapore, pp 33–48 9. Cai Y, Huang Q, Lin Z, Xu J, Chen Z, Li Q (2020) Recurrent neural network with pooling operation and attention mechanism for sentiment analysis: a multi-task learning approach. Knowl-Based Syst 203(1):1–12 10. Alarifi A, Tolba A, Al-Makhadmeh Z, Said W (2020) A big data approach to sentiment analysis using greedy feature selection with cat swarm optimization-based long short-term memory neural networks. J Supercomput 76(6):4414–4429 11. Wang S, Zhu Y, Gao W, Cao M, Li M (2020) Emotion-semantic-enhanced bidirectional LSTM with multi-head attention mechanism for microblog sentiment analysis. Information 11(5):280– 290 12. Abid F, Li C, Alam M (2020) Multi-source social media data sentiment analysis using bidirectional recurrent convolutional neural networks. Comput Commun 157:102–115 13. Dang NC, Moreno-García MN, De la Prieta F (2020) Sentiment analysis based on deep learning: a comparative study. Electronics 9(3):483–490 14. Hajek P, Barushka A, Munk M (2020) Fake consumer review detection using deep neural networks integrating word embeddings and emotion mining. Neural Comput Appl 32(23):17259–17274 15. Zhang S, Xu X, Pang Y, Han J (2020) Multi-layer attention based CNN for target-dependent sentiment classification. Neural Process Lett 51(3):2089–2103 16. Wankhade M, Annavarapu CSR, Verma MK (2021) CBVoSD: context based vectors over sentiment domain ensemble model for review classification. J Supercomput 78(1):1–37 17. Zhang B, Li X, Xu X, Leung KC, Chen Z, Ye Y (2020) Knowledge guided capsule attention network for aspect-based sentiment analysis. IEEE/ACM Trans Audio Speech Lang Process 28(3):2538–2551
18. Xi D, Zhuang F, Zhou G, Cheng X, Lin F, He Q (2020) Domain adaptation with category attention network for deep sentiment analysis. In: Proceedings of the web conference 2020. ACM, New York, pp 3133–3139 19. Yang L, Li Y, Wang J, Sherratt RS (2020) Sentiment analysis for E-commerce product reviews in Chinese based on sentiment lexicon and deep learning. IEEE Access 8(4):23522–23530 20. Medhat W, Hassan A, Korashy H (2014) Sentiment analysis algorithms and applications: a survey. Ain Shams Eng J 5(4):1093–1113 21. Chekima K, Alfred R (2018) Sentiment analysis of Malay social media text. In: Alfred R, Iida H, Ibrahim AA, Lim Y (eds) Computational science and technology. ICCST 2017. Lecture notes in electrical engineering, vol 488. Springer, Singapore, pp 56–62 22. Wikipedia. https://en.wikipedia.org/wiki/Bahasa_Rojak. Last accessed 1 June 2022 23. Samsudin N, Puteh M, Hamdan AR, Ahmad Nazri MZ (2013) Normalization of noisy texts in Malaysian online reviews. J Inf Commun Technol 12(2):147–159 24. Bakar M, Idris N, Shuib L, Khamis N (2020) Sentiment analysis of noisy Malay text: state of art, challenges and future work. IEEE Access 8(1):24687–24696 25. Handayani D, Awang Abu Bakar NS, Yaacob H, Abuzaraida MA (2018) Sentiment analysis for Malay language: systematic literature review. In: Proceedings 2018 international conference on information and communication technology for the Muslim world (ICT4M). IIUM, Malaysia, pp 305–310 26. Hutto C, Gilbert E (2014) VADER: a parsimonious rule-based model for sentiment analysis of social media text. In: Proceedings of the international AAAI conference on web and social media. AAAI, Oxford, pp 216–225 27. Gustafsson M (2022) Sentiment analysis for tweets in Swedish—using a sentiment Lexicon with syntactic rules. Bachelor’s thesis. http://www.diva-portal.org/smash/get/diva2:1391359/ FULLTEXT01.pdf. Last accessed 1 June 2022 28. Tymann K, Lutz M, Palsbroker P, Gips C (2019) GerVADER—a German adaptation of the VADER sentiment analysis tool for social media texts. In: Proceedings conference at HumboldtUniversity zu Berlin. University of Berlin, Berlin, pp 1–12 29. Amin A, Hossain I, Akther A, Alam KM (2019) Bengali VADER: a sentiment analysis approach using modified VADER. In: Proceedings 2019 international conference on electrical, computer and communication engineering (ECCE). IEEE, Bangladesh, pp 1–6 30. Dev C, Ganguly A, Borkakoty H (2021) Assamese VADER: a sentiment analysis approach using modified VADER. In: Proceedings 2021 international conference on intelligent technologies (CONIT). IEEE, India, pp 1–5 31. NLTK. https://www.nltk.org/_modules/nltk/sentiment/vader.html. Last accessed 1 June 2022 32. PyTrans. https://pypi.org/project/pytrans/. Last accessed 1 June 2022 33. Difflib. https://docs.python.org/3/library/difflib.html. Last accessed 1 June 2022
Impact of Feature Selection Techniques for EEG-Based Seizure Classification
Najmusseher and M. Umme Salma
Abstract A neurological condition called epilepsy can result in a variety of seizures. Seizures differ from person to person. It is frequently diagnosed with fMRI, magnetic resonance imaging and electroencephalography (EEG). Visually evaluating the EEG activity requires a lot of time and effort, which is the usual way of analysis. As a result, an automated diagnosis approach based on machine learning was created. To effectively categorize epileptic seizure episodes using binary classification from brain-based EEG recordings, this study develops feature selection techniques using a machine learning (ML)-based random forest classification model. Ten (10) feature selection algorithms were utilized in this proposed work. The suggested method reduces the number of features by selecting only the relevant features needed to classify seizures. So to evaluate the effectiveness of the proposed model, random forest classifier is utilized. The Bonn Epilepsy dataset derived from UCI repository of Bonn University, Germany, the CHB-MIT dataset collected from the Children’s Hospital Boston and a real-time EEG dataset collected from EEG clinic Bangalore is accustomed to the proposed approach in order to determine the best feature selection method. In this case, the relief feature selection approach outperforms others, achieving the most remarkable accuracy of 90% for UCI data and 100% for both the CHB-MIT and real-time EEG datasets with a fast computing rate. According to the results, the reduction in the number of feature characteristics significantly impacts the classifier’s performance metrics, which helps to effectively categorize epileptic seizures from the brain-based EEG signals into binary classification. Keywords EEG · Epilepsy · Seizure · Machine learning · Classification · Feature selection
1 Introduction Over 70 million people worldwide have epilepsy, the second most prevalent and severe brain condition [1]. It may be treated surgically or with an anti-epileptic drugs in some cases. However, following the first diagnosis, 20–30% of them would probably develop worse and some would even stop responding to the current treatment [2]. If patients and caregivers recognize seizures early enough, they may be able to take the necessary precautions to lessen the risk of harm [3]. A seizure is an uncontrollable electrical discharge that causes abnormal neural activity in the cortical brain areas. A group of nerve cells begins firing excessively and simultaneously as a response. Epileptic individuals are typically those who have frequent, unprovoked seizures [4]. Seizures come in various ways; however, the International League Against Epilepsy (ILAE) categorized epileptic seizures into two main categories such as: focal and generalized seizures [5]. Focal seizures start in a specific brain area and can spread to other areas. Generalized seizures, on the other hand, start in bilateral hemisphere regions and spread fast to all cortical regions. EEG is used to examine and observe brain signals and track electrical activity at brain synapses. EEG is, therefore, less noisy when comparing and evaluating the brain signals using scalp electrodes, and seizures may often be detected earlier using intracranial electrodes [6]. Neurologists frequently use EEG signals to identify brain diseases or activities. By visually examining the EEG signal, the doctor may be able to identify the brain regions that are infected. It takes time and is prone to human mistakes to visually analyze numerous channels to find the EEG signal produced by the brain. Therefore, a computer diagnostic system is much needed to reliably identify EEG-based epileptic seizure signals. This research contributes to creating a machine learning method for identifying epileptic seizures from an EEG dataset. Although adding more features increases the model’s complexity, this approach makes the model capable of providing a better detection rate by increasing the effectiveness and real-time performance of prediction model. The dimensionality of the EEG dataset must be minimized in the proposed ML strategy to categorize an epileptic episode in binary form. This is done using feature selection algorithms. The suggested approach is tested using random forest (RF) classifier, being the most notable and efficient supervised machine learning models. Performance parameters are calculated to evaluate the effectiveness of the suggested ML approach, such as the accuracies of each strategy. The EEG dataset from the UCI and CHB-MIT data repositories is utilized to validate the suggested methodology. On real-time EEG data, the proposed work’s effectiveness has been evaluated. The findings indicate a positive response for the most accurate and lowdimensional classification of epileptic seizure and healthy subjects. The paper is organized in the following manner: Sect. 2 examines literature review. The experimental datasets are described in Sect. 3, and the methodology which includes a flowchart of the suggested technique, methods for feature selection, a classification algorithm and performance analysis explained in Sect. 4. Results and discussion is presented in Sects. 5 and 6 concludes the proposed work.
2 Literature Review By removing superfluous and unnecessary data, feature selection provides a straightforward yet efficient solution to the curse of dimensionality, especially in medical data, it enhances the cognition for the ML model and improves prediction accuracy. Many researchers developed diverse methods to feature selection including, Harpale V et al. suggested a model to categorize the seizure states based on EEG data using time and frequency parameters. Hypothetical testing and a fuzzy classifier have been used to improve feature selection and classify EEG data. The model was able to deliver an accuracy of 96.48% [7]. Shufang Li et al. offer a novel approach based on support vector machines and empirical mode decomposition (EMD) for feature extraction and pattern detection of ictal EEG. The model produces encouraging results [8]. Using multi-channel EEG signals, Hadi Ratham Al Ghayab et al. introduce a new feature extraction and selection technique. The sequential feature selection (SFS) approach is used in this model. The least-square support vector machine (LS SVM) model is used to classify the selected features. According to the experimental findings, the approach has a 99% accuracy rate [9]. To choose standout characteristics from the high-dimensional breast cancer dataset, Salma MU et al. presented a combination strategy utilizing clustering and stochastic approaches. The particle swarm optimization (PSO) algorithm incorporates the quick K-means algorithm to provide results with an accuracy of 99.39% [10]. Mukesh Saraswat et al. developed a feature selection technique based on an enhanced biogeography-based optimization algorithm to select the most significant characteristics. A comparison analysis utilizing several classifiers has been examined on the selected characteristics [11]. Salma MU et al. proposed a feature selection approach for the Breast Cancer Surveillance Consortium (BCSC) dataset, which produced a small feature subset with a classification accuracy of 98% after subjecting the dataset to constraintgoverned association rule mining [12]. M. D’Alessandro et al. proposed a comprehensive genetic search strategy for EEG data collected from several intracranial electrodes. The model attained an average prediction probability with 62.5% sensitivity [13]. A Aarabi et al. presented an automated feature-based seizure detection system for infants. The acquired EEG data were parameterized in this model using the ReliefF technique. The findings revealed a 91% average seizure detection rate [14].
3 Experimental Datasets The EEG datasets for the proposed work are collected from UCI and CHB-MIT online repositories. The summary of the datasets is provided in further sections.
Table 1 Class labels in UCI dataset

Label (y)   Description
1           Recordings made from seizure activity
2           Recordings made from the tumor's location
3           Recordings made from a healthy brain
4           Recordings made when eyes closed
5           Recordings made when eyes opened
3.1 UCI Epilepsy Dataset The data collection is split into five subsets, each including 100 single-channel recordings of 500 people lasting 23.6 s, gathered using the international 10–20 electrode placement technique. The same electrode system and channel are used to record all signals [15, 16]. Table 1 provides a thorough summary of the dataset.
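As a hedged sketch of how the five labels in Table 1 are commonly collapsed into the binary seizure/non-seizure target used in this study, the snippet below loads the public UCI CSV release; the file name and the `y` column name follow that release and are assumptions rather than details taken from the paper.

```python
# Sketch: load the UCI epileptic seizure CSV and binarise the label column
# (class 1 = seizure activity, classes 2-5 = non-seizure). File and column
# names follow the public UCI release and are assumptions.
import pandas as pd

df = pd.read_csv("Epileptic Seizure Recognition.csv")   # hypothetical file name
X = df.drop(columns=["y"]).select_dtypes("number")      # EEG samples per row
y_binary = (df["y"] == 1).astype(int)                   # 1 = seizure, 0 = otherwise

print(y_binary.value_counts())
```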
3.2 Children’s Hospital Boston, Massachusetts Institute of Technology (CHB-MIT) Epilepsy Dataset This dataset, produced by Children’s Hospital Boston and the Massachusetts Institute of Technology (CHB-MIT), is hosted on a PhysioNet server, is publicly accessible, and is one of the benchmark datasets in the field of epilepsy research. It was collected by employing the Cygwin application, which connects with the PhysioNet server. It gives the number of seizure and non-seizure EEG recordings for each CHB patient. There are 23 individuals in the trial, aged 3–22 years, including 5 boys and 17 girls. The seizure and non-seizure recordings for every patient are saved in European data format (.edf), which shows the spikes with seizure start and finish timings and may be examined using an “EDFbrowser”. The core datasets are in 1D format and comprise EEG signals that were recorded using a variety of channels positioned on the brain’s surface following the 10–20 International System. All signals in the dataset were sampled at a rate of 256 Hz [17, 18].
3.3 Real-Time Epilepsy Dataset The data were collected in .csv files as raw waveform signals from 16 probes or channels placed around the scalp. The sampling rate is 1024 Hz, allowing frequencies of up to 256 Hz to be analysed per channel. Two separate subsets of the dataset, each lasting 10 s, were created and collected using the international 10–20 electrode placement technique. The recordings, collected from an EEG clinic in Bangalore, comprise seizure and healthy subjects.
Fig. 1 Flowchart of the proposed work
4 Methodology The main objective of this study is to comprehensively analyze feature selection methods applied to EEG data processing to improve classification accuracy for providing best prediction results. The following are the ten (10) main feature selection strategies used in this work: Relief, F-classification (F-classif), Mutual Information (Mutual Info), Low Variance, Univariate, Recursive Feature Elimination with Random Forest model (RFERF), Tree based, Recursive Feature Elimination with Linear Regression (RFE-LR), L1-regularization or Lasso (L1-based) and Embedded approach. The Python 3.7 language is used to implement each approach. Figure 1 illustrates the flowchart for the proposed model. The first step is EEG signal acquisition. The signals are acquired by proper positioning of the sensors over the scalp. The captured signals are then subjected to the next step called data preprocessing, which helps to remove artifacts or noise in the datasets. In this proposed model, the collected EEG data utilizes .csv data format. As a preprocessing, we standardize the data using a standard scalar. The third step is feature selection where the dimensionality of the signals will be reduced by selecting the most appropriate or relevant features from the datasets. The feature selection techniques help to improve the predictive performance of machine learning classifiers, and it also helps in emphasizing or detecting components of interest in an acquired EEG signal. Furthermore, using the classification model correctly classifies the seizure and healthy subjects. Here, we have used the random forest classification model, which constructs decision trees on various samples and uses their majority decision for the classification.
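A minimal sketch of the acquisition-free part of this chain (standard scaling, feature selection, random forest classification) is given below using scikit-learn. The relief selector itself is not part of scikit-learn, so a built-in univariate selector stands in purely for illustration, and the synthetic `X`, `y` stand in for the preprocessed EEG feature matrix and binary labels.

```python
# Sketch of the chain in Fig. 1: standard scaling, feature selection,
# then random forest classification. SelectKBest stands in for the
# relief selector purely for illustration.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=178, random_state=0)  # stand-in for EEG features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=20)),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
])
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```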
4.1 Feature Selection One of the crucial phases in addressing machine learning problems is feature selection. Feature selection is an automatic method that selects relevant features, reducing the input variable by considering only relevant data and omitting the undesired noise in the data.
Fig. 2 Feature selection methods hierarchy
The key component of the data analysis is feature selection, which involves mapping the existing feature characteristics onto a lower-dimensional space which helps to boost model performance and lower the computational time as well as the complexity of data [19]. By excluding pointless or extraneous elements from the feature set as a whole, this approach reduces the dimensionality of input data. It improves the predictive performance of model by selecting essential features and eliminating redundant and irrelevant features. In this proposed work, we have used majorly three different feature selection techniques with ten different methods such as Relief, F-classification (F-classif), Mutual Information (Mutual Info), Low Variance, Univariate, Recursive Feature Elimination with Random Forest model (RFERF), Tree based, Recursive Feature Elimination with Linear Regression (RFE-LR), L1-regularization or Lasso (L1-based) and Embedded approach, respectively. Filter, wrapper and embedded approaches are the three main groups into which the feature selection techniques have been categorized. Figure 2 describes the hierarchy of these methods. Filter Method Filter techniques are generally not computationally expensive because they depend on the statistical properties of explanatory factors and their relationship with the outcome variable [20]. Instead of focusing on cross-validation performance, the filter methods focus on the intrinsic characteristics of the features, i.e., their relevance is often not computationally costly. Relief is one such prominent method under filter method approach. Relief algorithms do not rely on conditions like the attributes conditional independence when estimating their quality. In instances where there are significant connections between attributes, they are practical and can accurately evaluate the quality of attributes. One of the most effective preprocessing algorithms is relief algorithms, primarily considered a feature subset selection approach used in a preprocessing step. They have been effectively applied in various contexts and are general feature estimators. Because it is confined to classification tasks with two classes, we used it for feature selection modeling in this proposed model. Relief algorithm’s main
principle is to estimate an attribute’s quality by how effectively it can distinguish between instances that are close to one another. Wrapper Method Wrapper techniques use various feature combinations to create a prediction model before choosing the collection of feature characteristics with the best evaluation performance. These methods are typically slow and time-consuming [21]. As a result, choosing the subset of features for large-scale problems is inappropriate. Based on the classifier’s performance, wrapper techniques evaluate the value of the features. The wrapper technique is a learning-related intrinsic model-building measure. Wrapper approaches improve classifier performance to address the realtime problem, but because they include repeated learning stages and cross-validation, they are computationally more expensive than filter methods. Embedded Method Embedded approaches utilize built-in feature selection. During model training, the integrated embedded model automatically chooses features by concurrently doing feature selection and model fitting [22]. Embedded techniques are somewhat comparable to wrapper methods, given that they are also employed to enhance the performance of a learning model.
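The sketch below shows, under the assumption that the data are already scaled into `X` and `y`, how several of the ten selectors named earlier map onto scikit-learn building blocks across the filter, wrapper and embedded families; the relief method itself is not in scikit-learn and is typically taken from a separate package such as skrebate, so it is omitted here.

```python
# Sketches of a few of the ten selectors listed above, grouped by family.
# X and y stand in for the scaled EEG feature matrix and binary labels.
from sklearn.feature_selection import (SelectKBest, f_classif, mutual_info_classif,
                                       VarianceThreshold, RFE, SelectFromModel)
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.linear_model import LinearRegression, LassoCV
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=50, random_state=0)

selectors = {
    "F-classif":    SelectKBest(f_classif, k=10),                        # filter
    "Mutual info":  SelectKBest(mutual_info_classif, k=10),              # filter
    "Low variance": VarianceThreshold(threshold=0.1),                    # filter
    "RFE-RF":       RFE(RandomForestClassifier(n_estimators=50), n_features_to_select=10),  # wrapper
    "RFE-LR":       RFE(LinearRegression(), n_features_to_select=10),    # wrapper
    "L1-based":     SelectFromModel(LassoCV(cv=3)),                      # embedded
    "Tree based":   SelectFromModel(ExtraTreesClassifier(n_estimators=50)),  # embedded
}
for name, sel in selectors.items():
    X_reduced = sel.fit_transform(X, y)
    print(f"{name:12s} -> {X_reduced.shape[1]} features kept")
```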
4.2 Classification: Random Forest Classifier An effective and quick machine learning algorithm is random forest. Because it can efficiently run on massive databases, it requires less training and prediction time and is resilient to nonlinear data distribution [23]. The classifier can explain each processing step in terms that people can comprehend. As a result, it contributes to highly accurate knowledge that is human-interpretable [24]. In this study, the random forest classifier is used since it is proven efficient in managing vast amount of realtime analysis of nonlinear EEG data. According to the evaluation of simulation results of this proposed model, for accurately identifying epileptic episodes in real time, the random forest method with feature selection is efficient and reliable.
4.3 Performance Evaluation With the availability of fundamental terms like true positives (TP) (predicted and positive), false positives (FP) (predicted positive but negative), true negatives (TN) (predicted and negative) and false negatives (FN), the performance of the classification model (random forest classifier) is assessed using these parameters [25]. The authors estimated the model’s accuracy, precision and recall based on these equations, as shown in Eqs. (1), (2) and (3). Accuracy: The ratio of accurately predicted data points among all the available data points is termed accuracy.
\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \quad (1)

Precision: The fraction of positively predicted data points that are truly positive is known as precision.

P = \frac{TP}{TP + FP} \quad (2)

Recall: The ratio of correctly recovered positive points among all actually positive data points is termed recall.

R = \frac{TP}{TP + FN} \quad (3)
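The following short check evaluates Eqs. (1)–(3) on an illustrative set of counts and compares them with the equivalent scikit-learn calls; the counts are made up for the example.

```python
# Checking Eqs. (1)-(3) on made-up counts, and the equivalent sklearn calls.
from sklearn.metrics import accuracy_score, precision_score, recall_score

TP, FP, TN, FN = 40, 5, 45, 10   # illustrative counts only

accuracy  = (TP + TN) / (TP + TN + FP + FN)   # Eq. (1)
precision = TP / (TP + FP)                    # Eq. (2)
recall    = TP / (TP + FN)                    # Eq. (3)
print(accuracy, precision, recall)            # 0.85 0.888... 0.8

# The same values recovered from label vectors:
y_true = [1] * (TP + FN) + [0] * (TN + FP)
y_pred = [1] * TP + [0] * FN + [0] * TN + [1] * FP
print(accuracy_score(y_true, y_pred),
      precision_score(y_true, y_pred),
      recall_score(y_true, y_pred))
```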
5 Results and Discussion In this proposed approach, the authors have utilized three different EEG-based seizure datasets to test the suggested approach. As indicated in Sect. 3, the benchmark datasets are gathered from online repositories, and one further dataset is real-time EEG data from an EEG clinic in Bangalore. The performance scores of the UCI epilepsy dataset, the CHB-MIT epilepsy dataset and the real-time epilepsy dataset are recorded in Tables 2, 3 and 4, respectively. All ten feature selection approaches are used in this suggested study to produce reliable results; among them, the relief feature selection method stands out with the maximum potential accuracy of 100% for both the CHB-MIT and real-time epilepsy datasets. The relief feature selection technique likewise outperforms the others with a 90% accuracy rate for the UCI epilepsy dataset. Compared to other existing models from the literature, the suggested model is straightforward and reliable.
Table 2 Performance scores of UCI epilepsy dataset

Feature selection method   Test accuracy (%)   Train accuracy (%)
Relief                     90                  100
F-classif                  85                  100
Mutual info                85                  100
Low variance               87                  100
Univariate                 86                  100
RFE-RF                     86                  100
Tree based                 84                  100
L1-based                   86                  100
RFE-LR                     85                  100
Embedded                   83                  100
Table 3 Performance scores of CHB-MIT epilepsy dataset

Feature selection method   Test accuracy (%)   Train accuracy (%)
Relief                     100                 100
F-classif                  100                 100
Mutual info                100                 100
Low variance               100                 100
Univariate                 100                 100
RFE-RF                     100                 100
Tree based                 100                 100
L1-based                   100                 100
RFE-LR                     100                 100
Embedded                   100                 100

Table 4 Performance scores of real-time epilepsy dataset

Feature selection method   Test accuracy (%)   Train accuracy (%)
Relief                     100                 100
F-classif                  100                 100
Mutual info                100                 100
Low variance               100                 100
Univariate                 100                 100
RFE-RF                     99                  100
Tree based                 100                 100
L1-based                   100                 100
RFE-LR                     99                  100
Embedded                   99                  100
The proposed study provides a clear insight to data scientists working on epileptic seizure detection using EEG signals with a list of the most promising feature selection techniques based on the prediction performance accuracies.
6 Conclusions The suggested model helps in clearly differentiating between seizure and non-seizure patterns by taking into account a few straightforward feature selection techniques for an epileptic seizure classification. It should be emphasized that the outcomes of this work can be regarded as promising because there is not much literature on feature selection which helps to enhance the efficacy of epileptic seizure classification or prediction. In order to classify seizures and healthy subjects from EEG signals, a
feature selection model is integrated with the capabilities of supervised machine learning classifiers like the random forest classifier. The collected findings reveal that the technique was quite efficient in terms of accuracy and processing speed. Across all the datasets used, and within the processing time required by the proposed model, the relief approach was used to obtain high-quality features with reduced dimensions. These characteristic features assisted the random forest model in achieving the best level of classification accuracy. Additionally, the suggested approach can be used for similar problems where high accuracy and little computational complexity are desired.
References 1. Hussein R, Ahmed MO, Ward R, Wang ZJ, Kuhlmann L, Guo Y (2019) Human intracranial EEG quantitative analysis and automatic feature learning for epileptic seizure prediction. arXiv preprint arXiv:1904.03603. https://doi.org/10.48550/arXiv.1904.03603 2. Yang S, Li B, Zhang Y, Duan M, Liu S, Zhang Y, Feng X, Tan R, Huang L, Zhou F (2020) Selection of features for patient-independent detection of seizure events using scalp EEG signals. Comput Biol Med 119:103671. https://doi.org/10.1016/j.compbiomed.2020.103671 3. Singh A, Trevick S (2016) The epidemiology of global epilepsy. Neurol Clin 34(4):837–847. https://doi.org/10.1016/j.ncl.2016.06.015 4. Farahmand S, Sobayo T, Mogul DJ (2018) Noise-assisted multivariate EMD-based mean-phase coherence analysis to evaluate phase-synchrony dynamics in epilepsy patients. IEEE Trans Neural Syst Rehabil Eng 26(12):2270–2279. https://doi.org/10.1109/TNSRE.2018.2881606 5. Fisher RS, Boas WV, Blume W, Elger C, Genton P, Lee P, Engel J Jr (2005) Epileptic seizures and epilepsy: definitions proposed by the International League Against Epilepsy (ILAE) and the International Bureau for Epilepsy (IBE). Epilepsia 46(4):470–472. https://doi.org/10.1111/ j.0013-9580.2005.66104.x 6. Parvizi J, Kastner S (2018) Promises and limitations of human intracranial electroencephalography. Nat Neurosci 21(4):474–483. https://doi.org/10.1038/s41593-018-0108-2 7. Harpale V, Bairagi V (2021) An adaptive method for feature selection and extraction for classification of epileptic EEG signal in significant states. J King Saud Univ Comput Inform Sci 33(6):668–676. https://doi.org/10.1016/j.jksuci.2018.04.014 8. Li S, Zhou W, Yuan Q, Geng S, Cai D (2013) Feature extraction and recognition of ictal EEG using EMD and SVM. Comput Biol Med 43(7):807–816. https://doi.org/10.1016/j. compbiomed.2013.04.002 9. Ghayab HR, Li Y, Abdulla S, Diykh M, Wan X (2016) Classification of epileptic EEG signals based on simple random sampling and sequential feature selection. Brain inform 3(2):85–91. https://doi.org/10.1007/s40708-016-0039-1 10. Salma MU (2016) PSO based fast K-means algorithm for feature selection from high dimensional medical data set. In: 2016 10th international conference on intelligent systems and control (ISCO), pp 1–6. IEEE. https://doi.org/10.1109/ISCO.2016.7727092 11. Saraswat M, Pal R, Singh R, Mittal H, Pandey A, Chand Bansal J (2020) An optimal feature selection approach using IBBO for histopathological image classification. In: Congress on intelligent systems 2020. Springer, Singapore, pp 31–40. https://doi.org/10.1007/978-981-334582-9_3 12. Salma MU (2017) Reducing the feature space using constraint-governed association rule mining. J Intell Syst 26(1):139–152. https://doi.org/10.1515/jisys-2015-0059 13. D’Alessandro M, Esteller R, Vachtsevanos G, Hinson A, Echauz J, Litt B (2003) Epileptic seizure prediction using hybrid feature selection over multiple intracranial EEG electrode
contacts: a report of four patients. IEEE Trans Biomed Eng 50(5):603–615. https://doi.org/10.1109/TBME.2003.810706
14. Aarabi A, Wallois F, Grebe R (2006) Automated neonatal seizure detection: a multistage classification system through feature selection based on relevance and redundancy analysis. Clin Neurophysiol 117(2):328–340. https://doi.org/10.1016/j.clinph.2005.10.006
15. Andrzejak RG, Lehnertz K, Mormann F, Rieke C, David P, Elger CE (2001) Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: dependence on recording region and brain state. Phys Rev E 64(6):061907. https://doi.org/10.1103/PhysRevE.64.061907
16. Kassahun Y, Perrone R, De Momi E, Berghöfer E, Tassi L, Canevini MP, Spreafico R, Ferrigno G, Kirchner F (2014) Automatic classification of epilepsy types using ontology-based and genetics-based machine learning. Artif Intell Med 61(2):79–88. https://doi.org/10.1016/j.artmed.2014.03.001
17. Shoeb A, CHB-MIT scalp EEG database. https://doi.org/10.13026/C2K01R
18. MohanBabu G, Anupallavi S, Ashokkumar SR (2021) An optimized deep learning network model for EEG based seizure classification using synchronization and functional connectivity measures. J Ambient Intell Hum Comput 12(7):7139–7151. https://doi.org/10.1007/s12652-020-02383-3
19. Priyanka S, Dema D, Jayanthi T (2017) Feature selection and classification of epilepsy from EEG signal. In: 2017 international conference on energy, communication, data analytics and soft computing (ICECDS). IEEE 2017, pp 2404–2406. https://doi.org/10.1109/ICECDS.2017.8389880
20. Vora S, Yang H (2017) A comprehensive study of eleven feature selection algorithms and their impact on text classification. In: 2017 computing conference. IEEE 2017, pp 440–449. https://doi.org/10.1109/SAI.2017.8252136
21. Sanz H, Valim C, Vegas E, Oller JM, Reverter F (2018) SVM-RFE: selection and visualization of the most relevant features through non-linear kernels. BMC Bioinform 19(1):1–8. https://doi.org/10.1186/s12859-018-2451-4
22. Cherrington M, Thabtah F, Lu J, Xu Q (2019) Feature selection: filter methods performance challenges. In: 2019 international conference on computer and information sciences (ICCIS). IEEE 2019, pp 1–4. https://doi.org/10.1109/ICCISci.2019.8716478
23. Singh K, Malhotra J (2019) IoT and cloud computing based automatic epileptic seizure detection using HOS features based random forest classification. J Ambient Intell Hum Comput 1–6. https://doi.org/10.1007/s12652-019-01613-7
24. Siddiqui MK, Morales-Menendez R, Huang X, Hussain N (2020) A review of epileptic seizure detection using machine learning classifiers. Brain Inform 7(1):1–8. https://doi.org/10.1186/s40708-020-00105-1
25. Rabby MK, Islam AK, Belkasim S, Bikdash MU (2021) Epileptic seizures classification in EEG using PCA based genetic algorithm through machine learning. In: Proceedings of the 2021 ACM southeast conference 2021, pp 17–24. https://doi.org/10.1145/3409334.3452065
Adaptive Manta Ray Foraging Optimizer for Determining Optimal Thread Count on Many-core Architecture
S. H. Malave and S. K. Shinde
Abstract In high-performance computing, choosing the right thread count has a big impact on execution time and energy consumption. It is typically considered that the total number of threads should equal the number of cores to achieve maximum speedup on multicore processor systems. Changes in thread count at the hardware and OS levels influence memory bandwidth utilization, thread migration rate, cache miss rate, thread synchronization, and context switching rate. As a result, analyzing these parameters for complex multithreaded applications and finding the optimal number of threads is a major challenge. The suggested technique in this paper is an improvement on the traditional Manta Ray Foraging Optimization, a bio-inspired algorithm that has been used to handle a variety of numerical optimization problems. To determine the next probable solutions based on the present best solution, the suggested approach uses three foraging steps: chain, cyclone, and somersault. The proposed work is simulated on NVIDIA-DGX Intel Xeon-E5 2698-v4 using the wellknown benchmark suite The Princeton Application Repository for Shared Memory Computers (PARSEC). The results show that, compared to the existing approach, the novel AMRFO-based prediction model can determine the ideal number of threads with very low overheads. Keywords Parallel processing · Threads · Nature-inspired
1 Introduction Engineering and scientific applications have become more data-intensive, necessitating the development of new high-performance computer systems and methodologies which can efficiently execute many tasks on available processors. Multithreading refers to the use of parallel programming methods on a shared memory multicore processor topology. It is the capability of the Central Processing Unit (CPU) and operating system to execute numerous threads simultaneously [1]. The primary goal of
multithreading is to allow concurrent execution of two or more blocks of code in order to maximize CPU utilization. Every part of a program is referred to as a thread in this context. Multithreading is a relatively new method of computing technology that is widely used in today's programming. Multiple threads are used by programmers for a variety of purposes, including building responsive servers that communicate with multiple clients, doing complex calculations simultaneously on a multiprocessor for better throughput, and constructing sophisticated user interfaces [2]. High-Performance Computing (HPC) development tools are necessary to make programming on HPC machines easier. Various programming tools have been developed to assist programmers in writing efficient parallel programs. Therefore, in the past few decades, multiprocessors have come into universal usage in every kind of computer system, ranging from personal computers to supercomputers. The OpenMP programming language now has a more comprehensive set of directives that cover a broader spectrum of parallelization options than just shared memory. When we consider the future of OpenMP, we see a further expansion of support for a variety of parallelization schemes, as well as the inclusion of support for performance monitoring and debugging tools [3]. The recent introduction of general-purpose graphics processing units (GPGPU) as a low-cost processor architecture has given HPC a significant boost in computing capacity. Nvidia recently launched CUDA for programming GPGPUs, considerably easing the strain on programmers. CUDA includes C/C++ APIs, tools, and a hardware abstraction mechanism to assist programmers in running parallel applications on GPGPUs. A thread is also termed a lightweight sub-process that can operate simultaneously with the other threads of a parallel program. Parallelism can be exploited at several different levels, so its benefit is difficult to state exactly in general. Moreover, executing manifold computations simultaneously makes execution faster. In addition, the process of parallelizing software can lead to deadlocks, race conditions, bugs, and other issues in the program. Dividing work among threads also requires some form of communication between them [4]. This communication introduces overhead which may degrade performance. When a large number of threads use the CPU cache concurrently, it can cause cache overflow and a high cache miss rate. Even though parallel programming on multicore processors is a mature technology, it is important to detect the ideal level of parallelism that should be utilized for a specified task in order to increase the average CPU utilization of multicore processors over a given amount of time [5]. Predicting the performance of an application with numerous software threads, without actually running the tasks with multiple threads, is an essential element in enabling such effective usage of the CPU [6]. However, designing and forecasting the performance of programs is a difficult task due to the growing complexity of system architectures. Traditional analytical performance methods can estimate the thread count based on hardware and program features, but they are naturally not accurate enough, or they need at least implementation details of the target processor to gather hardware-specific traces or implementation statistics such as memory traces and instruction counts, which increases the prediction time.
2 Motivations and Problem Statement The functions accomplished by mainframes and desktops are becoming extremely complex as computers are used in every aspect of our lives. In addition, the amount of data that modern computers can manage has significantly increased. Modern personal computers often include two to eight cores, allowing many tasks (threads) to be executed at the same time. However, too many threads in an application may cause contention risks with other application threads, such as poor cache locality, thread migrations, and CPU time stealing from other threads, and if there are too few threads, resources would be wasted. The number of threads generated by a program at the time of execution is growing by the day, making thread management more challenging. The parallel programming problems on multicore systems are categorized into seven groups: Synchronization, Task Granularity, Load Balancing, Data Sharing, Resource Sharing, Data Locality, and Input or Output. To enable proper task execution, the operating system must address all of these problems. These problems are also termed overheads in thread management. As the number of threads increases, the overheads also increase, and the cost of the overheads can surpass the benefits. Thus, finding the optimum number of threads for a given problem is a challenging task. Figure 1 shows the execution time taken by the PARSEC benchmarks streamcluster, swaptions, and ferret for sample input data on a 40-core Xeon processor computer system. The X-axis represents the number of threads used, while the Y-axis represents the time taken by the program to complete its execution for the specified thread count. It can be seen that streamcluster scales nicely up to 10 threads but becomes inefficient as the thread count exceeds 40. Both swaptions and ferret perform well up to 20 and 12 threads, respectively, but do not improve significantly beyond that. Figure 2 depicts the region of Fig. 1 where these benchmarks failed to scale as the number of processors increased. It is always believed that the number of threads in multithreaded applications should match the number of cores in order to have the greatest performance on multicore processor systems. This is not the case, however, for the benchmarks here. Figure 2 shows a random variation in execution time; hence no simple rule can be adopted to discover the thread count with the shortest execution time. To address this issue, a unique searching optimization approach for forecasting thread count in multithreaded applications is proposed. The major contributions of the paper are as follows:
• To study the use of nature-inspired searching algorithms to obtain the optimum thread count.
• To recommend an optimum number of threads and to improve the processing time using the adaptive Manta Ray foraging optimization technique.
The rest of the paper is organized as follows: recent papers are reviewed in Sect. 3, and the suggested approach of finding an optimal number of threads based on the AMRFO is discussed in Sect. 4. The simulation outcomes are described in Sect. 5, and the conclusion and future scope of the research work are discussed in the last section.
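The sketch below illustrates the kind of measurement behind Fig. 1: timing the same workload over a range of worker counts. It is only an illustration, not the paper's setup — the dummy task stands in for a PARSEC benchmark, and worker processes are used because pure-Python threads do not scale for CPU-bound work.

```python
# Sketch of the measurement behind Fig. 1: run the same workload with
# different worker counts and record the wall-clock time.
# The dummy task below stands in for a PARSEC benchmark.
import time
from concurrent.futures import ProcessPoolExecutor

def busy_task(n):
    return sum(i * i for i in range(n))

def measure(worker_count, tasks=64, size=200_000):
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=worker_count) as pool:
        list(pool.map(busy_task, [size] * tasks))
    return time.perf_counter() - start

if __name__ == "__main__":
    for workers in (1, 2, 4, 8, 16, 32):
        print(f"{workers:2d} workers: {measure(workers):.3f} s")
```

Plotting such timings against the worker count reproduces the general shape of Fig. 1: speedup up to some point, then stagnation or degradation as the overheads listed above start to dominate.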
Fig. 1 Execution time taken by the PARSEC benchmarks streamcluster (left), swaptions (middle), and ferret (right) for sample input data
Fig. 2 Zoomed-in area of Fig. 1 showing the regions where the benchmarks were unable to scale with the number of threads
3 Related Work Researchers have developed a number of optimization strategies to address optimization issues in the engineering area. These algorithms are used to locate the best solutions in the search space that exist for the given problem. These methods were developed as a result of machine learning, genetic algorithms, simulations of the behavior of biological organisms in nature, and gradient-based mathematical models. Manta ray foraging is an optimization technique inspired by strategies used by Manta rays for searching food in the water. The manta rays’ cognitive processes served as an inspiration for the MRFO algorithm presented by Weiguo et al. [7]. It has three stages, including somersault, chain, and cyclone foraging. The MRFO has found the best solutions to a variety of optimization issues by taking into account the positions of Manta rays in the specified context. In the fields of high performance and cloud computing, MRFO can also be used to solve complex optimization problems. The study done by Moore et al. [8] employs a combination of Manta ray foraging and Harris-hawk optimization to resolve the load balancing issue, which is a wellknown challenge in high-performance computing and is proven to be superior to other methods currently being used in the industry. Applications of MRFO can also be found in the area of image compression, data mining, and renewable energy as presented by Nora et al. [9], Lakshmi and Krishnamurthy [10] and Saleh et al. [11]. A population-based optimization algorithm called the symbiotic organism search (SOS) was presented by Min-Yuan, Doddy [12]. It mimics the symbiotic relationships that exist between species in an environment. This algorithm uses three relationships
that exist between organisms to search for optimal solutions in the given search space: mutualism, commensalism, and parasitism. This method is used by Mohammed et al. [13] in high-performance computing to solve task scheduling problems in cloud computing.

Ant colony optimization (ACO) is a well-evolved nature-inspired method used in computer science and combinatorial optimization; it is a probabilistic technique for solving computational problems that can be reduced to finding efficient routes through graphs, as described by Dorigo et al. [14]. ACO uses concepts such as cooperation, adaptability, and self-organization, inspired by those shown by ant colonies, to effectively solve complicated engineering problems. It has been used to solve various complex problems such as the traveling salesman problem and the 0/1 knapsack problem by Patricia et al. [15] and Chauhan et al. [16]. ACO is also used in grid computing to solve job scheduling problems by Idris et al. [17].

Genetic algorithms, optimization algorithms based on adaptive heuristic search methods, are used by researchers to solve task scheduling problems in HPC, as shown by Kassab et al. [18]. Survival of the fittest and genetics are the foundations of genetic algorithms. These algorithms use random search guided by historical data to focus the search on regions of the solution space where the chances of finding the best answer are higher. Particle swarm optimization (PSO) is also a nature-based algorithm which uses numerical techniques in scientific computing to solve optimization problems by repeatedly attempting to improve candidate solutions, as presented by Kennedy et al. [19]. It solves problems by using a population of possible solutions, here referred to as particles, and moving them across the problem space in accordance with simple mathematical rules over each particle's position and velocity. The PSO algorithm is known for its ability to solve very complex problems such as large-scale optimization, an NP-hard problem, as presented by Gupta et al. [20]. These nature-inspired algorithms are also used to address security issues in wireless ad hoc networks. The work by Kaushik et al. [21] shows that algorithms derived from nature can enhance network security, and it provides a thorough analysis of the benefits and drawbacks of the security methods currently in use.

Numerous methods have been created by researchers to identify parallelism in serial programs, but they do not assist in determining the number of threads required at runtime. The basic guideline is to create as many threads as the system has processors. A few CPU-intensive programs fall under this category, while memory- and IO-intensive applications do not, as shown by Solmaz and Lu [22]. Programmers must therefore use profiling tools and some manual effort to choose the number of threads, and the execution time may be negatively impacted if they select the wrong number. Several approaches have been developed to estimate the performance of applications on particular hardware for varying degrees of parallelism. To accomplish this, features of an application that correspond to a single thread are mined to forecast speedups for different numbers of threads, as presented by Pestel et al. [23]
and Agarwal et al. [24]. These features are gathered after executing the application of interest on the target hardware with a single thread. Speedups for various multithreaded configurations of the program on the desired hardware are then estimated from the extracted features. This enables a multithreaded application to determine the ideal number of threads to use after performing only a single-threaded run. Since the application must be executed once on a single thread to mine the features needed to detect the ideal number of threads, this approach is not worthwhile for applications intended to be run only once or twice in their lifespan; finding the ideal number of threads therefore remains the major challenge for applications intended to be run numerous times.

By dynamically adjusting the thread count, an energy-effective thread mapping was introduced by Tao et al. [25] for heterogeneous multicore architectures. A regression-based thread count prediction model (TCPM) was used to identify the optimal number of threads according to the features of the heterogeneous multicore architecture and the runtime behavior of the program. Further, dynamic predictive thread mapping (DPTM) was employed, which uses the prediction model to find the ideal thread count and dynamically alters the number of active hardware threads based on the phase changes of the executing program, in order to achieve the best energy efficiency.

A dynamic analysis (DA) method was suggested by Jihyun et al. [26] to examine the causes and kinds of concurrency bugs that arise in multi-process environments. In the suggested scheme, the false detection rate could be reduced through a hooking technique. By analyzing shared memory calls, bugs between processes and threads were examined. The developed scheme was implemented in a Linux environment using the Ewha COncurrency Detector (ECO). Based on ECO, concurrency bugs such as deadlocks, atomicity violations, and other violations were identified.

To solve the large-scale traveling salesman problem (TSP), a multicore and multi-thread-based optimization was presented by Xin et al. [27]. The suggested approach was implemented in the Delphi language to solve medium and large-scale TSP instances from TSPLIB, and it can considerably speed up the search procedure without loss of solution quality.

On heterogeneous multi-processing architectures, performance and energy trade-offs for parallel applications were studied by Demetrios and Georgiou [28] for cores sharing a single instruction set. To analyze performance and power consumption, a new analytical model was developed, whose parameters were fitted using a few well-sampled offline measurements. These models were then used to estimate application performance and energy consumption over the entire configuration space. The resulting offline predictions identify the Pareto-optimal configurations of the model, which can be employed to guide the choice of configuration.
4 Proposed Methodology

Due to the widespread availability of multicore computers, there has been a lot of interest in developing methods for achieving maximum performance on multicore architectures. The performance of a multithreaded application depends on the number of threads that execute on the multicore system. Therefore, predicting the optimal thread count that yields good performance is a significant task. Many methodologies have been developed for predicting the optimal thread count of multithreaded applications executing on multicore computer systems. The limitations of these approaches are their low computing performance and low prediction accuracy. Therefore, in this paper, an optimal thread count prediction model is introduced using MRFO to increase computing performance and decrease energy consumption. By combining the cyclone foraging, somersault foraging, and chain foraging of MRFO, the new AMRFO algorithm has been developed to improve exploitation and exploration capabilities.

The idea is to run the multithreaded application multiple times on a small quantity of data and measure how long each run takes. It therefore requires executing the application on the target hardware before its actual use. The user and the algorithm jointly decide how many times this process is repeated. AMRFO is a searching method in which each iteration generates a new thread count based on the positions of the manta rays in the given search space. The application is then run with the generated number of threads, and the run is compared with previous runs to see whether it takes less time to complete. Finally, the algorithm returns the thread count that takes the least time to execute.
4.1 Mathematical Models Representing Cognitive Activities of Manta Rays

At first, the manta ray population is randomly initialized as given by:

x_{rand}^{d} = Lb^{d} + r\,(Ub^{d} - Lb^{d})    (1)

Here, x_{rand}^{d} is an arbitrary location in the search space, Lb^{d} and Ub^{d} are the lower and upper limits of the dth dimension, respectively, and r is an arbitrary number in the range [0, 1].
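A minimal sketch of this initialization is given below; the array shapes and names (n_rays, dim, lb, ub) are illustrative assumptions, not the authors' code:

```python
import numpy as np

def init_population(n_rays, dim, lb, ub, rng=None):
    """Randomly place each manta ray inside [lb, ub], as in Eq. (1)."""
    rng = rng or np.random.default_rng()
    lb = np.asarray(lb, dtype=float)   # lower bound per dimension, Lb^d
    ub = np.asarray(ub, dtype=float)   # upper bound per dimension, Ub^d
    r = rng.random((n_rays, dim))      # r ~ U[0, 1]
    return lb + r * (ub - lb)          # x_rand^d = Lb^d + r (Ub^d - Lb^d)
```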
Chain Foraging. To form a foraging chain, manta rays travel from head to tail. The model of chain foraging is given by:

x_i^d(t+1) = \begin{cases} x_i^d(t) + r\,(x_{best}^d(t) - x_i^d(t)) + \alpha\,(x_{best}^d(t) - x_i^d(t)), & i = 1 \\ x_i^d(t) + r\,(x_{i-1}^d(t) - x_i^d(t)) + \alpha\,(x_{best}^d(t) - x_i^d(t)), & i = 2, \ldots, N \end{cases}    (2)
Here, x_{i-1}^d(t) denotes the location of the (i − 1)th manta ray in the dth dimension at time t, and x_i^d(t) denotes the location of the ith manta ray. The plankton with the highest concentration is denoted by x_{best}^d(t), and α is a coefficient given by:

\alpha = 2 r \sqrt{|\log(r)|}    (3)
Cyclone Foraging. This behavior is modeled by the following expressions:

x_i^d(t+1) = \begin{cases} x_{best}^d + r\,(x_{best}^d(t) - x_i^d(t)) + \beta\,(x_{best}^d(t) - x_i^d(t)), & i = 1 \\ x_{best}^d + r\,(x_{i-1}^d(t) - x_i^d(t)) + \beta\,(x_{best}^d(t) - x_i^d(t)), & i = 2, \ldots, N \end{cases}    (4)

\beta = 2 \exp\!\left(r_1 \frac{T - t + 1}{T}\right) \sin(2\pi r_1)    (5)
Here, T is the maximum number of iterations, β is the weight factor, and r_1 is an arbitrary number in the range [0, 1]. By taking the prey as the reference location, the candidates search around it, which provides suitable exploitation of the good solution region. To enhance exploration, the same process can also be carried out by taking an arbitrary location as the reference location.

Somersault Foraging. In this process, the individual position is updated to enhance the local search ability:

x_i^d(t+1) = x_i^d(t) + S\,(r_2\, x_{best}^d - r_3\, x_i^d(t)), \quad i = 1, 2, \ldots, N    (6)
Adaptive Somersault Foraging. Here, S denotes the somersault coefficient, which is equal to 2, and r_2 and r_3 are arbitrary numbers in the range [0, 1]. Because S is a constant, the step size cannot adapt during the search, which may trap the algorithm at a local optimum. Hence, a Cauchy mutation is introduced into this stage to improve the exploration capability and help the search escape local optima. When the Cauchy mutation is included, MRFO becomes adaptive: it explores the region around the best solution found so far and converges toward the best solution without getting stuck in a local minimum.
The one-dimensional Cauchy probability density function can be given as:

f(x) = \frac{1}{\pi}\,\frac{1}{x^2 + 1}, \quad -\infty < x < \infty    (7)

The corresponding cumulative distribution function (CDF) of the Cauchy distribution is:

F(x) = \frac{1}{2} + \frac{1}{\pi} \arctan(x)    (8)
The Cauchy random variable is used as the mutation step size. A larger step size helps an individual jump out of a local optimum once it has fallen into one, while a smaller step size speeds up convergence when the individual is searching close to the ideal solution. The new update model of this process is defined as:

x_i^d(t+1) = x_i^d(t) + C\,(r_2\, x_{best}^d(t) - r_3\, x_i^d(t))    (9)
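A minimal sketch of the Cauchy-mutated somersault update of Eq. (9), assuming NumPy for the Cauchy draw (names are illustrative):

```python
import numpy as np

def adaptive_somersault(x, x_best, rng=None):
    """Apply the Cauchy-mutated somersault update of Eq. (9) to one candidate x."""
    rng = rng or np.random.default_rng()
    c = rng.standard_cauchy()             # Cauchy-distributed step size C
    r2, r3 = rng.random(), rng.random()   # r2, r3 ~ U[0, 1]
    return x + c * (r2 * x_best - r3 * x)
```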
Here, C is a Cauchy-distributed random number. By merging the new cyclone and somersault foraging schemes with the chain foraging strategy of MRFO, AMRFO is obtained.

Adaptive Manta Ray Foraging Optimization (AMRFO). Finding the optimal number of threads is a significant factor affecting program performance. In this section, the new, efficient AMRFO algorithm is introduced to predict the optimal number of threads in parallel processing. The model reveals the association between the number of threads and program performance. The algorithm contains three stages: chain foraging, cyclone foraging, and somersault foraging with the Cauchy mutation approach. Because the current position of a manta ray specifies a number of threads, the fitness function is evaluated at these positions, and the best position is recorded as the best solution. The proposed optimal thread count prediction model consists of the following steps:
1. Set the number of manta rays, N, to a positive integer.
2. Determine the position of each manta ray using Eq. (1). Positions of manta rays are integers and indicate numbers of threads.
3. Set the number of cores or processors available for computation as the initial best position.
4. Repeat the following steps until the maximum number of iterations is reached or the desired solution is found.
   a. For each manta ray:
      i. Draw a random number r in [0, 1]; if it is less than 0.5, perform cyclone foraging using Eq. (4) and update the position.
      ii. Otherwise, perform chain foraging using Eq. (2) and update the position.
      iii. Compute the fitness value for the new position obtained in the above steps.
      iv. Make the new position of the manta ray the best position if its fitness value is less than the fitness value of the previously recorded best position.
   b. For each manta ray: perform somersault foraging using Eq. (9), compute the fitness value, and update the best position.
5. The best position obtained in the above steps indicates the optimum number of threads.

In this context, the fitness value is the time it takes for the program to run with the specified number of threads. The following steps are performed to obtain the fitness value of a manta ray:
1. Get the current position of the manta ray.
2. Get the multithreaded application and the sample input data for which the thread count is to be determined.
3. Run the application with the number of threads indicated by the manta ray's current position and return the execution time as the fitness value.
A simplified sketch of this search loop and fitness evaluation is given below.
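The following is an illustrative, heavily simplified sketch of the above procedure; the command template, the search bounds, and the stripped-down foraging updates are our own assumptions for illustration and not the authors' implementation:

```python
import math
import random
import subprocess
import time

def fitness(thread_count, cmd_template):
    """Run the target application with the given thread count and return its wall-clock time."""
    start = time.perf_counter()
    subprocess.run(cmd_template.format(threads=thread_count), shell=True, check=True)
    return time.perf_counter() - start

def amrfo_thread_search(cmd_template, n_rays=5, max_iter=10, lb=1, ub=80, n_cores=40):
    rng = random.Random(0)
    # Step 2: integer positions (thread counts) inside [lb, ub]
    pos = [rng.randint(lb, ub) for _ in range(n_rays)]
    # Step 3: start from the core count as the initial best guess
    best, best_fit = n_cores, fitness(n_cores, cmd_template)
    for t in range(1, max_iter + 1):
        for i in range(n_rays):
            r = rng.random()
            if r < 0.5:   # cyclone foraging (Eq. 4), simplified: move around the best position
                beta = 2 * math.exp(rng.random() * (max_iter - t + 1) / max_iter) * math.sin(2 * math.pi * rng.random())
                cand = best + r * (best - pos[i]) + beta * (best - pos[i])
            else:         # chain foraging (Eq. 2), simplified: omit the (i-1)th-neighbour term
                alpha = 2 * r * math.sqrt(abs(math.log(rng.random() + 1e-12)))
                cand = pos[i] + r * (best - pos[i]) + alpha * (best - pos[i])
            pos[i] = min(max(int(round(cand)), lb), ub)
            f = fitness(pos[i], cmd_template)
            if f < best_fit:
                best, best_fit = pos[i], f
        for i in range(n_rays):   # adaptive somersault foraging with a Cauchy step (Eq. 9)
            c = math.tan(math.pi * (rng.random() - 0.5))   # standard Cauchy sample
            cand = pos[i] + c * (rng.random() * best - rng.random() * pos[i])
            pos[i] = min(max(int(round(cand)), lb), ub)
            f = fitness(pos[i], cmd_template)
            if f < best_fit:
                best, best_fit = pos[i], f
    return best, best_fit
```

As a usage illustration only, `amrfo_thread_search("run_benchmark.sh --threads {threads}")` would search for the thread count that minimizes the measured run time of whatever command the template launches.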
5 Results and Discussion

The proposed AMRFO method is simulated using the well-known benchmark suite PARSEC [29]. Six benchmarks from the suite are used: ferret, streamcluster, freqmine, swaptions, vips, and vorland. The experimental system consists of an Intel Xeon E5-2698-v4 2.2 GHz server. This machine contains 40 logical cores with 256 GB of primary memory. We conducted our research on the Linux operating system because it provides a broad range of tools for examining and understanding application behavior. We conducted each experiment 10 times and then averaged the results. The PARSEC benchmark suite defines six input data sets for each program; the input data set called "simsmall" is small in size and is used in AMRFO's fitness function to measure the execution time. The speedup of a parallel program is a good way to assess its performance. If a sequential program on a single core takes T(1) seconds to complete and a parallel program on P processors takes T(P) seconds, then the speedup S(P) is defined as

S(P) = \frac{T(1)}{T(P)}    (10)
The speedup is calculated for the same benchmark programs to estimate the prediction accuracy of the proposed AMRFO-based prediction scheme. The speedup obtained with the predicted thread count is compared with that obtained using the system's full thread count, as shown in Table 1. Here N is the thread count obtained using AMRFO. The
Table 1 Performance of benchmarks

Benchmark           T(1)     T(40)    S(40)    N    T(N)    S(N)     δ (%)
Ferret (ft)         3.231    0.543    5.950    37   0.523   6.177    3.82
Streamcluster (sc)  5.007    1.573    3.183    12   0.691   7.246    127.64
Freqmine (fq)       5.785    1.084    5.336    28   1.05    5.509    3.24
Swaptions (sp)      4.619    0.387    11.935   38   0.244   18.930   58.61
Vips (vp)           3.946    0.312    12.647   38   0.292   13.513   6.85
Vorland (vr)        120.4    11.192   10.761   57   9.08    13.265   23.26
improvement in AMRFO over the traditional approach is defined as

\delta = \frac{S(N) - S(40)}{S(40)}    (11)
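As a worked check using the streamcluster row of Table 1: S(40) = 3.183 and S(N) = 7.246, so δ = (7.246 − 3.183)/3.183 ≈ 1.276, i.e., an improvement of about 127.6%, which matches the 127.64% reported in the table up to rounding.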
It is apparent that the results obtained with AMRFO are significantly better than those obtained by assuming a thread count equal to the number of cores in the system. The speedups achieved for the PARSEC programs are shown in Table 1. The columns "T(1)" and "T(40)" in this table show the program execution times with a single thread and with 40 threads, respectively. Running a program with a single thread is equivalent to running the serial version of the code. Since the simulation machine has 40 processors, it would seem reasonable to assume that 40 threads will result in the greatest speedup. The ideal thread count predicted by the AMRFO method is indicated in column "N" and its execution time is shown in column "T(N)". The speedups achieved at thread counts of 40 and N are shown in the columns "S(40)" and "S(N)", respectively. The last column "δ" indicates the improvement in speedup when programs are executed with the optimal thread count instead of 40 threads. Ferret, streamcluster, freqmine, swaptions, and vips produce better results when the thread count is less than 40, whereas vorland achieves its best result with a thread count of 57, which is more than 40. Compared to running with 40 threads, streamcluster showed a 127% increase in speedup. Streamcluster, swaptions, and vorland all show significant speedups, demonstrating the algorithm's applicability. The graphical comparisons of the speedups in Table 1 are shown in Fig. 3. Figure 4 shows the comparison between execution times with a single thread and with the optimal thread count. To fit correctly in the graph, the execution time of vorland has been scaled down. All of the benchmarks show a significant amount of speedup.
6 Conclusion In this paper, an efficient thread prediction model is developed using the AMRFO algorithm. The prediction model easily determines the optimum number of threads to maximize speedups. Simulation results show that the suggested algorithm efficiently
Fig. 3 The comparisons of speedups between 40 number of threads and the optimal thread count obtained by AMRFO
Fig. 4 The comparisons of execution time taken by the serial code with single thread and the parallel code executed with optimal thread count obtained by AMRFO
explores the available search space and quickly converges to an optimal solution. With the addition of the Cauchy mutation, the algorithm could explore more search space without getting stuck in local optima. Finding thread count using the method described in this paper is simple.
References 1. Hossein S, Homayoun H (2017) Scheduling multithreaded applications onto heterogeneous composite cores architecture. In: 2017 Eighth international green and sustainable computing conference (IGSC). IEEE 2. Rinard M (2001) Analysis of multithreaded programs. Int Static Anal Symp 2126:1–19 3. De Supinski BR, Scogland TRW, Duran A, Klemm M, Bellido SM, Olivier SL, Terboven C, Mattson TG (2018) The ongoing evolution of openmp. In: Proceedings of the IEEE 4. Langmead B, Wilks C, Antonescu V, Charles R (2019) Scaling read aligners to hundreds of threads on general-purpose processors. Bioinformatics 421–432 5. Nagasakaa Y, Matsuoka S, Azad A, Buluc A (2019) Performance optimisation, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors. Parallel Comput 6. Moore RW, Childers BR, Xue J (2015) Performance modeling of multithreaded programs for mobile asymmetric chip multiprocessors. In: 2015 IEEE 17th international conference on high performance computing and communications, pp 957–963 7. Weiguo Z, Zhang Z, Wang L (2014) Manta ray foraging optimisation: an effective bio-inspired optimiser for engineering applications. Eng Appl Artif Intell 8. Mohammad H, Swaleha Z (2021) Mantaray modified multi-objective Harris hawk optimisation algorithm expedites optimal load balancing in cloud computing. J King Saud Univ Comput Inf Sci 9. Nora A, Hanan T, Halawani SM, Abdelkhalek A, Laxmi L (2022) Manta ray foraging optimisation with vector quantization based microarray image compression technique. Intell Neurosci 10. Lakshmi N, Krishnamurthy M (2022) Association rule mining based fuzzy manta ray foraging optimisation algorithm for frequent itemset generation from social media. Concurrency Computat Pract Exper 11. Saleh A, Omran WA, Hasanien HM, Tostado-Véliz M, Alkuhayli A, Jurado F (2022) Manta Ray foraging optimisation for the virtual inertia control of islanded microgrids including renewable energy sources. Sustainability 12. Min-Yuan C, Doddy P (2014) Symbiotic organisms search: a new metaheuristic optimisation algorithm. Comput Struct 139 13. Mohammed A, Ngadi MA, Shafi’i Muhammad A (2016) Symbiotic organism search optimisation based task scheduling in cloud computing environment. Future Gener Comput Syst 56:640–650 14. Dorigo M, Stützle T (2019) Ant colony optimisation: overview and recent advances. In: Gendreau M, Potvin JY (eds) Handbook of metaheuristics. In: International series in operations research & management science, vol 272. Springer, Cham 15. Patricia G, Osorio RR, Pardo XC, Banga JR, Ramón D (2022) An efficient ant colony optimisation framework for HPC environments. Appl Soft Comput 114 16. Chauhan R, Sharma N, Sharma H (2022) An ant system algorithm based on dynamic pheromone evaporation rate for solving 0/1 knapsack problem. In: Saraswat M, Sharma H, Balachandran K, Kim JH, Bansal JC (eds) Congress on intelligent systems. Lecture notes on data engineering and communications technologies, vol 114. Springer, Singapore 17. Idris H, Ezugwu AJ, Sahalu A, Aderemi A (2017) An improved ant colony optimisation algorithm with fault tolerance for job scheduling in grid computing systems
18. Kassab A, Nicod J, Philippe L, Rehn-Sonigo V (2018) Assessing the use of genetic algorithms to schedule independent tasks under power constraints. In: 2018 International conference on high performance computing & simulation (HPCS), pp 252–259 19. Kennedy J, Eberhart R (1995) Particle swarm optimisation. In: Proceedings of ICNN’95— international conference on neural networks, vol 4, pp 1942–1948 20. Gupta S, Kumari R, Kumar S (2022) Limac¸on inspired particle swarm optimisation for largescale optimisation problem. In: Saraswat M, Sharma H, Balachandran K, Kim JH, Bansal JC (eds) Congress on intelligent systems. Lecture notes on data engineering and communications technologies, vol 111. Springer, Singapore 21. Kaushik R, Singh V, Kumari R (2021) A review of nature-inspired algorithm-based multiobjective routing protocols. In: Sharma H, Saraswat M, Kumar S, Bansal JC (eds) Intelligent learning for computer vision. CIS 2020. Lecture notes on data engineering and communications technologies, vol 61. Springer, Singapore 22. Solmaz S, Lu L (201) Memory bandwidth prediction for HPC applications in NUMA architecture. In: IEEE 5th international conference on data science and systems (HPCC/SmartCity/DSS), pp 1115–1122 23. Pestel SD, Den Steen SV, Akram S, Eeckhout L (2018) Rppm: rapid performance prediction of multithreaded applications on multicore hardware. In: IEEE international symposium on performance analysis of systems and software (ISPASS), pp 183–186 24. Agarwal N, Jain T, Zahran M (2019) Performance prediction for multi-threaded applications. In: International workshop on AI-assisted design for architecture 25. Tao J, Zhang Y, Zhang X, Du X, Dong X (2019) Energy-efficient thread mapping for heterogeneous many-core systems via dynamically adjusting the thread count 26. Jihyun P, Choi B, Jang S (2020) Dynamic analysis method for concurrency bugs in multiprocess/multi-thread environments. Int J Parallel Prog 48:1032–1060 27. Xin W, Ma L, Zhang H, And Liu Y (2021) Multi-core-, multi-thread-based optimisation algorithm for large-scale traveling salesman problem. Alexandria Eng J 60:189–197 28. Demetrios C, Georgiou, K (2020) Performance and energy trade-offs for parallel applications on heterogeneous multi-processing systems. In: IFIP/IEEE 27th international conference on very large scale integration (VLSI-SoC), pp 232–233 29. Bienia C, Kumar S, Singh JP, In KL (2018) The parsec benchmark suite: characterization and architectural implications. In: Proceedings of the 17th international conference on parallel architectures and compilation techniques
Iterated Local Search Heuristic for Integrated Single Machine Scheduling and Vehicle Routing Gabriel P. Félix, José E. C. Arroyo, and Matheus de Freitas
Abstract In this paper we address a problem that integrates the single machine scheduling problem and the vehicle routing problem. In this problem, N jobs must be executed on a machine and, using a set of vehicles, the jobs must be delivered to the respective customers. The objective of the problem is to determine the sequencing of the jobs on the machine and the routes of the vehicles, in order to minimize the total fixed costs, the total travel costs of the vehicles and the total weighted tardiness of the jobs. To solve the problem, a mixed integer linear programming model is first proposed. Motivated by the computational complexity of the problem, two hybrid heuristics based on the iterated local search (ILS) and random variable neighborhood descent (RVND) metaheuristics are proposed. The performances of the proposed heuristics are evaluated and compared on a set of randomly generated instances. The results show that one of the proposed heuristics has a superior performance, as can be seen through the comparison with a genetic algorithm from the literature. On average, the solutions obtained by our heuristic were within 0.67% of the best known solutions. Keywords Production scheduling · Vehicle routing · Metaheuristics
G. P. Félix · J. E. C. Arroyo (B) · M. de Freitas
Department of Computer Science, Universidade Federal de Viçosa, Viçosa, MG, Brazil
e-mail: [email protected]
G. P. Félix e-mail: [email protected]
M. de Freitas e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_18

1 Introduction

The supply chain is the set of activities/processes that involve the production, storage and transport of products or services. These activities include purchasing raw materials, controlling inventory, producing the products and delivering the products to end customers on time. All these activities must be very well planned and optimized so that quality products can be delivered and generate positive results. Production
and distribution operations are linked directly, with no intermediate steps [3]. The main objectives pursued by the supply chain are as follows: cost reduction, meeting deadlines and customer satisfaction. This work aims to propose a decision-making approach for an integrated production scheduling and vehicle routing problem. In this integrated problem, a given set of jobs of different sizes must be processed on a single machine (available in a factory) and delivered to the respective customers by a heterogeneous fleet of vehicles, taking into account delivery times. The objective is to minimize total weighted delivery tardiness and transport costs (related to variable vehicle costs and vehicle travel costs/times).

Production scheduling and vehicle routing problems are two combinatorial optimization problems widely studied in the literature, and they are usually addressed separately [2, 6, 14]. There are few works in the literature that address the integration of production scheduling and vehicle routing. Some related works are presented below. Ullrich [13] studied an integrated problem that consists of scheduling a set of jobs on parallel machines with machine-dependent ready times. Processed jobs are distributed using a heterogeneous fleet of vehicles. This author considers processing times, delivery time windows and service times. To minimize the total tardiness of the jobs, the author presents a mixed integer linear programming (MILP) model, two classic decomposition approaches and a genetic algorithm (GA). Chang et al. [3] addressed a problem where jobs are first processed on unrelated parallel machines and then distributed to customers by capacitated vehicles with no intermediate inventory. They present a mathematical model for the problem and proposed an ant colony algorithm to minimize the weighted sum of total weighted job delivery time and the total distribution cost. Karaoğlan and Kesen [7] studied the problem of production and distribution with limited shelf life of the products. They considered a single vehicle for transport, where the vehicle can make multiple trips. To determine the minimum time required to produce and deliver all customer demands, they proposed a branch-and-cut algorithm. Zou et al. [15] addressed the integrated production scheduling and vehicle routing problem in a make-to-order environment. To minimize the maximum order delivery time, a GA is proposed. Tamannaei and Rasti-Barzoki [12] addressed an integrated problem of production scheduling on a single machine and vehicle routing with the objective of minimizing the sum of total weighted tardiness and transport costs, including the fixed cost of the vehicle and the cost of travel in the transport network. These authors considered a fleet of homogeneous vehicles and jobs of equal size, and they proposed a MILP model, a branch-and-bound algorithm and a GA. Recently, Liu et al. [8] studied production scheduling and vehicle routing in an integrated way, where there is a single machine for production and a limited number of homogeneous vehicles for transport. To minimize the sum of order delivery times, the authors presented a MILP model and a variable neighborhood search (VNS) heuristic.

In this work, we propose two multi-start hybrid heuristics based on the iterated local search (ILS) metaheuristic, where the ILS local search is guided by the random variable neighborhood descent (RVND) method. RVND uses seven neighborhood structures which are explored in random order.
The heuristics are compared with
a GA proposed by Tamannaei and Rasti-Barzoki [12]. To solve small instances of the problem under study, we present a MILP model which is solved by the CPLEX solver. The remainder of this paper is organized as follows. In Sect. 2, we describe the problem under study and present the MILP model. The proposed heuristics and their components are detailed in Sect. 3. In Sect. 4, we present the experimental tests performed to analyze the performance of the proposed heuristics. Finally, in Sect. 5, we conclude this paper.
2 Problem Definition and MILP Formulation

The integrated problem of this paper consists of two subproblems: single machine scheduling and vehicle routing. In this integrated problem, denoted by SeqRot, N jobs must be executed on a single machine, and, using a set of K vehicles, these jobs must be delivered to their respective clients. For each job i (i = 1…N), the following parameters are known: processing time P_i, size s_i, due date d_i and penalty w_i per unit of delivery tardiness. For each vehicle k, its capacity Q_k and its fixed cost of use F_k are known. Jobs are produced in a factory representing the depot (i = 0), and the delivery locations of the jobs are spread over a geographic area, in such a way that the travel times between each pair of locations are known. The travel time from location i to location j is denoted by t_ij, where t_ij = t_ji, ∀i, j = 0…N. The objective of the SeqRot problem is to determine the processing order of the jobs and the routes of the used vehicles, in order to minimize the sum of total weighted tardiness, transportation costs and total travel time.

It is assumed that all jobs are available to be processed from an initial time zero, and the machine processes one job at a time. In the SeqRot problem, the vehicle departure times must also be determined. A vehicle can only leave the factory after completing the processing of all jobs transported in the vehicle, that is, the vehicle routing subproblem discussed here considers ready times [1]. The time spent to load and unload jobs is not considered in the problem. Vehicles leave the factory, visit each assigned customer once and return to the factory. It is worth mentioning that the SeqRot problem is NP-hard, since its subproblems, single machine scheduling with total tardiness minimization and vehicle routing with a heterogeneous fleet (VRHF), are NP-hard problems [4, 5].

Figure 1 shows a solution for an instance with N = 6 jobs and K = 3 vehicles. In this figure, it can be seen that the job processing sequence is [5, 3, 6, 2, 1, 4], and three vehicles are used to deliver the jobs. The delivery routes for vehicles 3, 1 and 2 are, respectively, {0 → 5 → 3 → 0}, {0 → 6 → 2 → 0} and {0 → 1 → 4 → 0}, where 0 represents the depot. The departure times (or start times) for vehicles 3, 1 and 2 are S_3 = 88, S_1 = 180 and S_2 = 301, respectively.
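The data of a SeqRot instance can be gathered in a small container; the sketch below is purely illustrative (class and field names are our own, not from the paper):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Job:
    proc_time: float   # P_i
    size: float        # s_i
    due_date: float    # d_i
    weight: float      # w_i, penalty per unit of delivery tardiness

@dataclass
class Vehicle:
    capacity: float    # Q_k
    fixed_cost: float  # F_k

@dataclass
class Instance:
    jobs: List[Job]
    vehicles: List[Vehicle]
    travel: List[List[float]]  # symmetric matrix t[i][j]; index 0 is the depot/factory
```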
2.1 MILP Model

Next, an integer linear programming (ILP) model is presented. This model is an adaptation of the model proposed by [12]. In this work, we considered vehicles with variable capacities and costs and jobs with different sizes. The decision variables of the model are as follows: A_{ij} = 1 if job i is executed before job j and 0 otherwise; C_i is the completion time of job i; Y_k = 1 if vehicle k is used and 0 otherwise; S_k is the start time of vehicle k; X_{ij}^{k} = 1 if vehicle k drives from the customer of job i to the customer of job j, and 0 otherwise; D_i is the delivery time (or arrival time) of job i; and T_i is the tardiness of job i. The ILP model is as follows:

min \sum_{i=0}^{N}\sum_{j=0}^{N}\sum_{k=1}^{K} t_{ij} X_{ij}^{k} + \sum_{k=1}^{K} F_k Y_k + \sum_{i=1}^{N} w_i T_i    (1)

s.t.
\sum_{k=1}^{K}\sum_{j=0, j \ne i}^{N} X_{ij}^{k} = 1, \quad \forall i = 1 \ldots N    (2)
\sum_{i=0}^{N}\sum_{j=0, j \ne i}^{N} s_i X_{ij}^{k} \le Q_k Y_k, \quad \forall k = 1 \ldots K    (3)
\sum_{j=1}^{N} X_{0j}^{k} = Y_k, \quad \forall k = 1 \ldots K    (4)
\sum_{i=0}^{N} X_{ih}^{k} = \sum_{j=0}^{N} X_{hj}^{k}, \quad \forall h = 0 \ldots N; \ \forall k = 1 \ldots K    (5)
A_{ij} + A_{ji} = 1, \quad \forall i, j = 0 \ldots N; \ i \ne j    (6)
A_{ij} + A_{jr} + A_{ri} \ge 1, \quad \forall i, j, r = 0 \ldots N; \ i \ne j \ne r    (7)
C_j = \sum_{i=0, i \ne j}^{N} P_i A_{ij} + P_j, \quad \forall j = 1 \ldots N    (8)
S_k \ge C_j - G \left(1 - \sum_{i=0, i \ne j}^{N} X_{ij}^{k}\right), \quad \forall j = 1 \ldots N; \ \forall k = 1 \ldots K    (9)
D_j \ge S_k + t_{0j} - G \left(1 - X_{0j}^{k}\right), \quad \forall k = 1 \ldots K; \ \forall j = 1 \ldots N    (10)
D_j \ge D_i + t_{ij} - G \left(1 - \sum_{k=1}^{K} X_{ij}^{k}\right), \quad \forall i, j = 1 \ldots N    (11)
T_i \ge D_i - d_i, \quad \forall i = 1 \ldots N    (12)
T_i \ge 0, \ D_i \ge 0, \ S_k \ge 0, \quad \forall i = 1 \ldots N; \ \forall k = 1 \ldots K    (13)
X_{ij}^{k} \in \{0, 1\}, \ A_{ij} \in \{0, 1\}, \ Y_k \in \{0, 1\}, \quad \forall i, j = 0 \ldots N; \ \forall k = 1 \ldots K    (14)
Fig. 1 Example of a solution for an instance with six jobs and three vehicles
Equation (1) defines the function to be minimized. Constraints (2) ensure that each client (owner of a job) is visited exactly once. Constraints (3) indicate that if a vehicle k is used (Yk = 1), the total size of jobs carried in this vehicle does not exceed its capacity. Constraints (4) guarantee that if a vehicle is used, it leaves the depot. Constraints (5) ensure that a vehicle must arrive and leave a customer. Constraints (6) and (7) determine a valid sequencing of jobs on the machine. The model considers a dummy job ‘0’ to indicate the predecessor of the first job in the sequence or the successor of the last job. Job completion times are determined by constraints (8). Constraints (9) guarantee that if task j is transported by vehicle k, then Sk ≥ C j . Constraints (10) determine the arrival time of a vehicle to the first customer of its route. Constraints (11) determine the arrival times of a vehicle to other customers on its route. G is a sufficiently large positive number. Constraints (12) determine the delivery tardiness of jobs. The constraints (13) and (14) determine the domain of the decision variables.
3 Multi-start Hybrid Iterated Local Search Heuristic

The proposed heuristic to solve the SeqRot problem is a hybrid algorithm based on the ILS and random variable neighborhood descent (RVND) metaheuristics. ILS [9] is a simple metaheuristic used to solve combinatorial optimization problems. From an initial solution, ILS iteratively applies a perturbation method to diversify the current solution, and the perturbed solution is improved by a local search (LS) heuristic. The LS is based on RVND, which uses a number nv of randomly ordered neighborhoods to systematically modify the current solution. Algorithm 1 presents the pseudocode of the proposed multi-start hybrid heuristic. The algorithm, called ILS_RVND_1, has three input parameters: pert (perturbation level), NIterILS (maximum number of iterations without ILS improvement) and Nrestarts (number of restarts of the ILS_RVND_1 algorithm). The ILS algorithm starts (or restarts) with a new solution s (step 3). This initial solution s is improved with the RVND local search (step 4). The ILS iteratively executes steps 5 through 10, until the stop condition is satisfied. In step 6, the solution s is perturbed, and the
obtained solution is improved in step 7. In steps 8 and 9, the best solution s obtained by the ILS is updated. Note that the ILS terminates if s is not improved for NIterILS consecutive iterations. If there is an improvement of s, the ILS iteration counter j is reset (step 10). After the ILS is finished, the best global solution s* is updated (steps 11–12). ILS_RVND_1 is restarted Nrestarts times, and the solution s* is returned as the output.

Algorithm 1: ILS_RVND_1(pert, NIterILS, Nrestarts)
1   f(s*) ← ∞;
2   for i ← 1 to Nrestarts do
3       s ← Initial_Solution();
4       s ← RVND_1(s);
5       for j ← 1 to NIterILS do
6           s′ ← Perturbe(s, pert);
7           s′ ← RVND_1(s′);
8           if s′ is better than s then
9               s ← s′;
10              j ← 1;   // ILS iterations restart
11      if s is better than s* then
12          s* ← s;
return: s*
A solution of the SeqRot problem is represented by a vector (of size N + K ) that stores the indices of the jobs and vehicles. The first position of the vector is the index of a vehicle. For example, for N = 6 and K = 3, the solution shown in Fig. 1 is represented by the vector [V3 , 5, 3, V1 , 6, 2, V2 , 1, 4]. This representation divides the set of jobs into K subsets (routes), Rk , where 0 ≤ |Rk | ≤ K , ∀k = 1, . . . , K . If |Rk | = 0, the vehicle k is not used for delivery of jobs. Note that, in this representation, the job processing order is the same as the delivery order. To start the execution of the ILS_RVND_1 algorithm, a solution is constructed in a greedy way by the following steps. (1) Initially, the jobs are ordered using a priority rule (obtaining a sequence of jobs S). (2) The vehicle with the lowest cost is chosen, and the first job of the sequence S is inserted in this vehicle. (3) As long as there is space in the vehicle, the next job of the sequence S is inserted in the vehicle. (4) If there are still jobs not allocated to a vehicle, the next lowest cost vehicle is chosen and steps (3) and (4) are repeated. To obtain the ordered sequence S, three priority rules were used [10]: Apparent Tardiness Cost (ATC), Weighted Modified Due Date (WMDD) and Weighted Earliest Due Date (WEDD). The priority indices of these rules are defined in Eqs. (15), (16) and (17).
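As an illustration of this encoding, the short sketch below (a hypothetical helper, not from the paper) splits such a vector into one route per vehicle:

```python
def decode(solution, vehicle_ids):
    """Split a sequence like ['V3', 5, 3, 'V1', 6, 2, 'V2', 1, 4] into routes per vehicle."""
    routes, current = {}, None
    for item in solution:
        if item in vehicle_ids:      # a vehicle marker opens a new (possibly empty) route
            current = item
            routes[current] = []
        else:                        # a job index is appended to the currently open route
            routes[current].append(item)
    return routes

# Example from Fig. 1: processing order 5, 3, 6, 2, 1, 4 served by vehicles V3, V1, V2
print(decode(['V3', 5, 3, 'V1', 6, 2, 'V2', 1, 4], {'V1', 'V2', 'V3'}))
# {'V3': [5, 3], 'V1': [6, 2], 'V2': [1, 4]}
```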
I_{ATC}(i) = \frac{w_i}{P_i} \exp\!\left[-\frac{\max(d_i - P_i - t, 0)}{\bar{P}}\right]    (15)

I_{WMDD}(i) = \frac{\max(P_i, d_i - t)}{w_i}    (16)

I_{WEDD}(i) = \frac{d_i}{w_i}    (17)
where P is the average of the processing times of all jobs, and t is the sum of the processing times of the jobs already inserted in S.
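A minimal sketch of these three priority indices, assuming the usual convention that ATC gives higher values to higher-priority jobs while WMDD and WEDD give lower values to higher-priority jobs (function names are ours):

```python
import math

def atc_index(w, p, d, t, p_mean):
    """Apparent Tardiness Cost (ATC), Eq. (15): larger value = higher priority."""
    return (w / p) * math.exp(-max(d - p - t, 0.0) / p_mean)

def wmdd_index(w, p, d, t):
    """Weighted Modified Due Date (WMDD), Eq. (16): smaller value = higher priority."""
    return max(p, d - t) / w

def wedd_index(w, d):
    """Weighted Earliest Due Date (WEDD), Eq. (17): smaller value = higher priority."""
    return d / w
```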
3.1 RVND Local Search The pseudocode of the RVND local search is presented in Algorithm 2. The algorithm, called RVND_1, receives as input the solution s to be improved. This algorithm uses nv neighborhoods. Initially, the neighborhoods are ordered randomly (step 1). The algorithm performs steps 2–8, while there is improvement of the solution s. At each iteration i, the i-th neighborhood is used and the best solution s of this neighborhood is selected. The algorithm terminates if no solution s better than s is found in all the nv neighborhoods (i.e., if s is a local optimum for all neighborhoods). If there is an improvement for the solution s, it is updated and the neighborhood counter is reset (i.e., again all neighborhoods will be used in random order).
Algorithm 2: RVND_1(s)
1   Neighborhood[] ← RandomlyOrderedNeighborhoods(nv);   // nv: number of neighborhoods
2   for i ← 1 to nv do
3       N_i ← Neighborhood[i];   // select the i-th neighborhood
4       s′ ← BestNeighborSolution(N_i(s));
5       if s′ is better than s then
6           s ← s′;
7           Neighborhood[] ← RandomlyOrderedNeighborhoods(nv);
8           i ← 1;   // restart of algorithm RVND
return: s
Neighborhood Structures. Seven neighborhood structures are used in the RVND_1 local search: three of the Intra-Route type (job movements within a vehicle route) and four of the Inter-Route type (job movements between different routes). It is noteworthy that, in the adopted solution representation, the processing order of the jobs is the same as the delivery order of the jobs.

Intra-Route Neighborhoods
Swap-Adj-Job(R)—Swap two adjacent jobs in the route R.
Insert-Job(R)—A job is inserted in another position of the route R.
2-opt(R)—In the route R, two non-adjacent arcs are deleted and two new arcs are added, generating a new route R′.
Inter-Route Neighborhoods. These neighborhoods consider movements of jobs or vehicles that change two or more routes at the same time. These movements can generate solutions that are infeasible with respect to the vehicle capacities, but such solutions are not considered (not evaluated). The four neighborhoods used are as follows:
Swap-Job(R1, R2)—Swap a job i from route R1 with a job j from route R2.
Insert-Job(R1, R2)—Inserts a job i from R1 into the other positions of R2.
Swap-Adj-Vehicles(R1, R2)—Swap two adjacent vehicles carrying their batches of jobs (i.e., swap two routes R1 and R2). For example, for the solution of Fig. 1, represented by [V3, 5, 3, V1, 6, 2, V2, 1, 4], a movement generated by this neighborhood can be to exchange the routes of vehicles V3 and V1, generating the solution [V1, 6, 2, V3, 5, 3, V2, 1, 4].
Insert-Vehicle(R)—Inserts the route R of a vehicle before or after another route. For example, for the solution represented by [V3, 5, 3, V1, 6, 2, V2, 1, 4], a movement generated by this neighborhood is to insert the route of vehicle V3 after the route of vehicle V2, generating the solution [V1, 6, 2, V2, 1, 4, V3, 5, 3]. In this new solution, the jobs of vehicle V1 will be processed first and the vehicle V3 will be the last to leave the depot.
Perturbation. Two perturbation mechanisms are used. The first consists of randomly selecting two different routes and randomly swapping pert jobs. The second consists of randomly selecting two different routes; then a job (chosen at random) is removed from the first route and inserted into the other route. These perturbation mechanisms can be considered as multiple Inter-Route moves: Swap-Job(R1, R2) and Insert-Job(R1, R2).
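As an example of how an Inter-Route move can be screened for capacity feasibility, the following hypothetical sketch implements Swap-Job; infeasible neighbors are discarded rather than evaluated, as described above:

```python
def swap_job(routes, sizes, capacities, k1, i1, k2, i2):
    """Inter-Route Swap-Job: exchange the job at position i1 of route k1 with the job
    at position i2 of route k2. Returns the new routes, or None if a vehicle
    capacity would be exceeded."""
    r1, r2 = list(routes[k1]), list(routes[k2])
    j1, j2 = r1[i1], r2[i2]
    load1 = sum(sizes[j] for j in r1) - sizes[j1] + sizes[j2]
    load2 = sum(sizes[j] for j in r2) - sizes[j2] + sizes[j1]
    if load1 > capacities[k1] or load2 > capacities[k2]:
        return None                      # infeasible neighbor: not evaluated
    r1[i1], r2[i2] = j2, j1
    new_routes = dict(routes)
    new_routes[k1], new_routes[k2] = r1, r2
    return new_routes
```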
3.2 ILS_RVND_2 In this work, a second multi-start heuristic, called ILS_RVND_2, is developed. The LS of this heuristic is based on the RVND procedure proposed by [11]. In this local search, Intra-Route neighborhood structures are used only if the current solution is improved using an Inter-Route neighborhood. The pseudocode of this local search (named RVND_2) is presented in Algorithm 3. This algorithm receives as input the solution s to be improved. Initially, the set of nv1 = 4 Inter-Route neighborhoods is randomly ordered (step 1). At each iteration i, the best solution s of the i-th Inter-Route neighborhood is determined (step 4). If s is better than solution s, the current best solution is updated (step 6), a IntraRoute local search is performed using the nv2 = 3 Intra-Route neighborhoods (steps 7–12), and the counter of Inter-Route neighborhoods is reset to use all Inter-Route neighborhoods again (steps 13–14). The Intra-Route local search ends if the solution s is not improved with any of the Intra-Route neighborhoods.
Algorithm 3: RVND_2(s)
1   Inter-Route[] ← Randomly_Order_Inter-Route_Neighborhoods(nv1);
2   for i ← 1 to nv1 do
3       N_i ← Inter-Route[i];   // select the i-th Inter-Route neighborhood
4       s′ ← BestNeighborSolution(N_i(s));
5       if s′ is better than s then
6           s ← s′;
7           Intra-Route[] ← Randomly_Order_Intra-Route_Neighborhoods(nv2);
8           for j ← 1 to nv2 do
9               N_r ← randomly choose a neighborhood from the set Intra-Route[j..nv2];
10              s′ ← BestNeighborSolution(N_r(s));
11              if s′ is better than s then
12                  s ← s′; j ← 1;
13          i ← 1;
14          Inter-Route[] ← Randomly_Order_Inter-Route_Neighborhoods(nv1);
return: s
4 Computer Experiments

To evaluate the performance of the heuristics ILS_RVND_1 and ILS_RVND_2, they are compared with the CPLEX solver, which solves the MILP model of the problem (for small instances), and with a genetic algorithm (denoted by GA_LS) presented by [12]. All algorithms and the ILP model were coded in C++ and run on a computer with an Intel Core i7 CPU, 4.00 GHz, and 32 GB of RAM. The metric used to compare the algorithms is the Relative Percentage Deviation (RPD) with respect to the best known solution. This metric is defined as RPD = (f_method − f_best)/f_best × 100%, where f_best is the best known value of the objective function (obtained among all the compared methods) and f_method is the value obtained by the evaluated algorithm. The algorithms are evaluated on two sets of instances of the SeqRot problem. The instances were generated based on the work of [13]. The first set contains small instances, where N ∈ {8, 10, 15, 20} and K ∈ {3, 4, 5, 6}. The second set contains medium–large instances, where N ∈ {50, 80, 100} and K ∈ {5, 8, 10, 12}. We performed a full factorial experiment with the input parameters of the algorithms ILS_RVND_1 and ILS_RVND_2: pert, NIterILS and Nrestarts. Due to space limitations, parameter calibration results are not shown. The best parameters found for the ILS_RVND_1 algorithm were Nrestarts = 10, NIterILS = 100, pert = 5. For ILS_RVND_2, the best parameters were Nrestarts = 10, NIterILS = 100, pert = 2.
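As a quick illustration with hypothetical numbers: if the best known objective value for an instance is f_best = 1000 and a heuristic returns f_method = 1006.7, then its RPD is (1006.7 − 1000)/1000 × 100% = 0.67%.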
4.1 Results for Small Instances In the first experiment, the algorithms ILS_RVND_1, ILS_RVND_2 and GA_LS are compared with the CPLEX 12.8 solver on small instances. CPLEX is used to solve
Fig. 2 95% confidence intervals for all compared methods: (a) small size instances, (b) medium–large size instances
the ILP model of the problem (Sect. 2.1) with a CPU time limit of 3600 s for each instance. For all methods (heuristics and CPLEX), the RPD with respect to the best known solution is determined. Each heuristic is run ten times for each instance. We consider the best and average (Avg.) RPDs. For the set of instances with N = 10, 15 and 20, the algorithm ILS_RVND_1 determines the best solutions (it presents the smallest RPDs, best and Avg.). The performance of GA_LS is better than that of ILS_RVND_2. For all instances of this group, the three heuristics determined better results than CPLEX; for this instance group, CPLEX did not find any optimal solution within 1 h. Regarding runtime, CPLEX spent the maximum time (1 h) on every instance, whereas for instances with N = 20 the heuristics spent less than 2 s. The results of the heuristics (for small instances) are validated through a statistical test. The result of the test is shown in Fig. 2a, which displays the means plot and Tukey's Honestly Significant Difference (HSD) intervals at the 95% confidence level. We can see that ILS_RVND_1 and GA_LS have the best averages (they are statistically equivalent). These algorithms are significantly better than ILS_RVND_2. We can also observe that all heuristics significantly improved the results of CPLEX.
4.2 Results for Medium–Large Instances

For medium–large instances, the heuristics ILS_RVND_1 and ILS_RVND_2 are compared against GA_LS. Each instance was solved ten times by the three heuristics. The best (Best) and average results (Avg.) were considered in the comparisons. The quality of the solutions found is also evaluated by the RPD (%) measure. For the three heuristics, the RPDs are presented in Table 1. In this table, bold values represent the best results. The RPDs are classified by the number of jobs (N) and number of vehicles (K). For each instance class, there are 45 instances. It can be seen that, for all instance classes (except for group 50 × 5), the RPD values (Best and Avg.) of the ILS_RVND_1
Table 1 Average RPDs (%) and CPU time (s) for medium–large size instances

            ILS_RVND_1              ILS_RVND_2              GA_LS
N × K       Best   Avg.   Time      Best   Avg.   Time      Best   Avg.   Time
50 × 5      0.41   0.89   17.57     12.02  13.91  8.48      0.21   1.48   35.41
50 × 8      0.00   0.41   16.93     10.90  13.14  11.90     0.93   2.22   36.60
50 × 10     0.00   0.31   16.32     10.11  11.86  14.44     1.15   2.38   36.55
50 × 12     0.00   0.31   16.46     9.05   10.75  17.22     1.40   2.43   38.87
80 × 5      0.71   1.63   68.03     14.52  15.80  28.37     0.89   2.40   180.42
80 × 8      0.05   0.65   58.88     16.53  18.05  35.36     1.76   3.38   162.78
80 × 10     0.00   0.59   55.00     16.13  17.63  40.32     2.19   3.43   154.89
80 × 12     0.00   0.40   53.02     14.56  16.19  46.44     2.11   3.29   160.59
100 × 5     0.61   1.24   136.08    14.63  15.95  51.34     0.91   2.44   415.69
100 × 8     0.00   0.69   114.50    17.12  18.90  61.28     2.03   3.40   357.94
100 × 10    0.01   0.46   103.80    16.86  18.56  68.83     2.24   3.57   340.52
100 × 12    0.00   0.45   97.80     16.64  18.17  77.04     2.37   3.48   343.74
Average     0.15   0.67   62.87     14.09  15.74  38.42     1.52   2.83   188.67
algorithm are smaller than the respective values of the other algorithms. GA_LS algorithm obtained the smallest value of best for group 50 × 5 (the smallest size instances of second set). For all other instances, ILS_RVND_1 outperforms the other algorithms. By considering the overall average of the algorithms, GA_LS is better than ILS_RVND_2. The algorithms ILS_RVND_1, ILS_RVND_2 and GA_LS have overall averages (Avg.) of 0.67%, 15.74% and 2.83%, respectively. Table 1 also shows the average CPU times (Time) spent by the algorithms. It is observed that the GA_LS algorithm presents the longest times and the ILS_RVND_2 the smallest ones. For the algorithms ILS_RVND_1 and GA_LS, the CPU time generally increases as the number of vehicles decreases. By considering the average results (Avg.), a statistical test is performed to verify if the observed differences are in fact significant. Figure 2b displays the graph of means and Tukey’s HSD intervals, at 95% confidence, obtained from the statistical test in all medium–large instances. Once again, the best performance of the ILS_RVND_1 heuristic is clear. Its confidence interval does not overlap with the intervals of other heuristics. That is, ILS_RVND_1 is statistically better than the other heuristics. It is also observed that the second best algorithm is the GA_LS. We believe that ILS_RVND_1 is much better than the ILS_RVND_2, because the first algorithm uses all seven neighborhoods (Inter and Intra-Route) which are chosen randomly in the RVND_1 local search. The ILS_RVND_2 algorithm only uses IntraRoute neighborhoods if the solution is improved with an Inter-Route neighborhood. ILS_RVND_2 generally uses only the three Inter-Route neighborhoods. This makes the RVND_2 local search much faster; however, it loses in the quality of the solutions found.
5 Conclusions This article addressed a problem related to production scheduling and distribution operations. An integrated single machine scheduling and vehicle routing problem is addressed, in order to minimize the sum of the weighted total tardiness of the jobs, the total cost of the used vehicles and the total travel time. This class of problem can be commonly found in companies that do not keep finished goods in stock, carry out make-to-order and express delivery services. In an attempt to determine optimal solutions to the problem, a MILP model was developed. As the problem is NP-hard, two hybrid heuristics based on the ILS and RVND metaheuristics were proposed. Priority rules were used to build good quality solutions. The solutions were improved with the RVND local search which uses seven neighborhoods. The parameters of the proposed heuristics were calibrated. Computational experiments showed that the ILS_RVND_1 heuristic presents better solutions than the ILS_RVND_2 heuristic. The proposed heuristics were compared against a genetic algorithm (GA_LS) from the literature. ILS_RVND_1 was better than GA_LS which in turn was better than ILS_RVND_2. The obtained results were validated by a statistical analysis. Acknowledgements This work was supported by CAPES and CNPq.
References 1. Archetti C, Feillet D, Speranza MG (2015) Complexity of routing problems with release dates. Eur J Oper Res 247(3):797–803 2. Braekers K, Ramaekers K, Van Nieuwenhuyse I (2016) The vehicle routing problem: state of the art classification and review. Comput Ind Eng 99:300–313 3. Chang YC, Li VC, Chiang CJ (2014) An ant colony optimization heuristic for an integrated production and distribution scheduling problem. Eng Opt 46(4):503–520 4. Chen ZL (2010) Integrated production and outbound distribution scheduling: review and extensions. Oper Res 58(1):130–148 5. Du J, Leung JYT (1990) Minimizing total tardiness on one machine is NP-hard. Math Oper Rese 15(3):483–495 6. Fink M, Morillo L, Hanne T, Dornberger R (2022) Optimizing an inventory routing problem using a modified Tabu search. In: Congress on intelligent systems. Springer, Singapore, pp 577–586 7. Karao˘glan ˙I, Kesen SE (2017) The coordinated production and transportation scheduling problem with a time-sensitive product: a branch-and-cut algorithm. Int J Prod Res 55(2):536–557 8. Liu L, Li W, Li K, Zou X (2020) A coordinated production and transportation scheduling problem with minimum sum of order delivery times. J Heuristics 26(1):33–58 9. Lourenço HR, Martin OC, Stützle T (2003) Iterated local search. In: Handbook of metaheuristics. Springer, pp 320–353 10. Molina-Sánchez L, González-Neira E (2016) Grasp to minimize total weighted tardiness in a permutation flow shop environment. Int J Ind Eng Comput 7(1):161–176 11. Penna PHV, Subramanian A, Ochi LS (2013) An iterated local search heuristic for the heterogeneous fleet vehicle routing problem. J Heuristics 19(2):201–232
12. Tamannaei M, Rasti-Barzoki M (2019) Mathematical programming and solution approaches for minimizing tardiness and transportation costs in the supply chain scheduling problem. Comput Ind Eng 127:643–656 13. Ullrich CA (2013) Integrated machine scheduling and vehicle routing with time windows. Eur J Oper Res 227(1):152–165 14. Zarandi MHF, Asl AAS, Sotudian S, Castillo O (2020) A state of the art review of intelligent scheduling. Artif Intell Rev 53(1):501–593 15. Zou X, Liu L, Li K, Li W (2018) A coordinated algorithm for integrated production scheduling and vehicle routing problem. Int J Product Res 56(15):5005–5024
Modeling Volatility of Cryptocurrencies: GARCH Approach B. N. S. S. Kiranmai and Viswanathan Thangaraj
Abstract Since its inception in 2009, cryptocurrency’s acceptance remains controversial because of its nature, absence of inherent worth, and ambiguous issuance authority. This resulted in the significant volatility and unpredictability of the cryptocurrency price, leading to the loss of investors’ funds. In this study, we model the heteroskedastic volatility of cryptocurrencies and forecast the future price using GARCH. Historical analysis of cryptocurrencies shows volatility ranging from medium to extremely high volatility. Like other financial assets, cryptos exhibit volatility clustering, i.e., high volatility leading to further high volatility and low volatility leading to low volatility forming volatility clusters. In this paper, we apply seven variants, namely Standard GARCH (SGARCH), IGARCH (1,1), EGARCH (1,1), GJR-GARCH (1,1), Asymmetric Power ARCH (APARCH) (1,1), Threshold GARCH (TGARCH) (1,1), and Component GARCH (CGARCH) (1,1), of the GARCH model to forecast the volatility of cryptocurrencies and identify the bestfit model based on forecasting accuracy. We find that among the seven variants of GARCH, EGARCH and CGARCH models appropriately forecast the volatility of cryptocurrencies. We conclude that exponential and component GARCH provides the appropriate future price of meme and utility coins in the short run. Keywords Cryptocurrency · GARCH models · ARCH effect · Volatility
1 Introduction The market value of cryptocurrencies increased exponentially from less than the US $20 billion in January 2017 to over US $3 trillion as of November 2021 [1]. To make optimal asset allocation decisions, it is essential to have accurate real-time forecasts B. N. S. S. Kiranmai (B) · V. Thangaraj Symbiosis Institute of Business Management, A Constituent of Symbiosis International (Deemed) University, Bengaluru, India e-mail: [email protected] V. Thangaraj e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_19
for cryptocurrency returns. It is, therefore, not surprising that the emerging literature has analyzed the predictability of cryptocurrency returns using various (linear and nonlinear) models and (economic, financial, and behavioral) predictors [2–9]. Investors, governments, and businesses alike have shown a strong interest in the volatility characteristics of financial markets due to its predominance and importance in many areas of risk management, security pricing, monetary policy, and asset allocation. For years, researchers and practitioners have studied, predicted, and modeled the volatility of financial markets and assets on an empirical and hypothetical level. Cryptocurrencies have become increasingly important in the financial world and the financial system, so it becomes increasingly important to anticipate their volatility, given the complex dynamics underlying their volatility. In contrast to traditional currencies, cryptocurrencies are digital or virtual currencies and mediums of exchange that use cryptography to protect financial transactions. Due to their deflationary and decentralized nature, most cryptocurrencies are characterized by a limited supply and being decentralized and thus immune to central banking systems and governmental interference [10]. As a result, they offer many advantages over traditional payment methods, including speed and liquidity, low transaction costs, and anonymity. Cryptocurrencies are unregulated and digital, making them attractive targets for hackers [11]. The previous research primarily concentrates on Bitcoin, but there is no literature on classifying cryptocurrencies based on utility. In this paper, we used the GARCH models to forecast the cryptocurrency returns, test the predictive capacity of the GARCH models, and test whether the results vary between meme and utility coins.
2 Literature Review Cryptocurrency volatility research is extensive, well-known, and documented. Recent research by Kurihara and Fukushima [12] has found that there are a variety of different models of generalized univariate ARCH that can be used to analyze Bitcoin volatility. The study found that both symmetric and asymmetric generalized ARCH models provide a good description of Bitcoin prices over short- and long-term periods, and traders should consider both long-term and short-term volatilities while analyzing prices. Katsiampa [13] studied a variety of models of generalized ARCH to understand Bitcoin volatility. The Hannan–Quinn methods and Akaike Bayesian were used to determine the most accurate model for the study. The data analysis shows that the generalized ARCH model is effective and concluded that both short-term and long-term volatilities are important factors to consider when assessing conditional variance. As stated by Chu et al. [14], cryptocurrencies, with the exception of Bitcoin, have received little attention from researchers due to their complex mathematical models. Chu et al. [14] looked at a variety of generalized ARCH models and error distributions to better understand the behavior of seven cryptocurrencies. Empirical
evidence proposes that the normal distribution is the most common distribution that is successful and that the symmetric integrated generalized ARCH model is the most accurate model in most instances. The Markov-switching generalized ARCH model was tested on Bitcoin to test its capability to forecast the value-at-risk [15]. Generalized ARCH models in the presence of regime changes, according to Ardia [15], lead to insufficient risk predictions. Based on their results, the two Markov-switching generalized ARCH models show the highest degree of reliability for forecasting Bitcoin risks. In their study of Bitcoin/USD price volatility, Naimy and Hayek [16] focused on the GARCH, EWMA, and EGARCH models and found that out of these models, EGARCH (1,1) outperformed the others in both in-sample and out-of-sample. GARCH models have been provided for the first time on seven of the most popular cryptocurrencies by Chu et al. [14]. The models were evaluated by using information criteria and fitting 12 different types of GARCH models. The integrated GARCH and Glosten, Jagannathan, and Runkle GARCH models offer the most accurate fitting in-sample among the cryptocurrencies. According to Bouoiyour and Selmi [17], the volatility of Bitcoin is higher in the events of negative shocks than in the events of positive shocks, suggesting a leverage effect is involved. Trucíos [18] used robust generalized ARCH and generalized autoregressive score models to analyze a cryptocurrency portfolio’s risk and expected shortfall. This paper studies how Bitcoin returns and popularity are correlated by fitting a generalized ARCH (1,1) model to daily Google Trends data, Wikipedia data, and Twitter data to determine whether web content can be predictive. Charles and Darné [19] extended Katsiampa’s study by identifying jumps in Bitcoin returns. They estimated generalized ARCH models on filtered data and used a semiparametric test to detect jumps. Based on filtered returns, the research team concludes that Bitcoin returns are characterized by jumps and that the best result is shown by AR-GARCH. Catania et al. [20] and other researchers argue that volatility models that take into account leverage and skewness can be more accurate than traditional models. Stavroyiannis and Babalos [21] estimated the fractionally integrated APARCH modeling (1, d,1) model with skewness using data collected from 2013 to 2016. When considering how Bitcoin could be used as a hedge, the researchers concluded that the US market is not a good environment to use Bitcoin as a hedge or diversifier. Based on the study by Balcilar et al. [22], trading volume can be used to forecast volatility for Bitcoin using a quantiles-based approach. The researchers concluded that trading volume could predict returns, but it is not nearly as precise as predicting volatility when the market is in median mode. Most studies focus on Bitcoin, with a strong correlation between cryptocurrencies. However, Burnie found that forks of cryptocurrencies have a similar correlation. To make accurate predictions for cryptocurrency prices, [23] examined the past to understand better how prices change, and a machine learning algorithm was used to predict the 12 most liquid cryptocurrencies. Plakandaras et al. [6] used different machine learning techniques to forecast the price of the cryptocurrency. Hyun et al.
[24] studied the directional dependence structure among cryptocurrencies using copula and neural network models. Using GARCH and EGARCH models with independent variables, Dyhrberg [25] compared the volatility of gold, US dollar, and Bitcoin. In his view, as a medium of exchange and store of value, Bitcoin has potential uses in portfolio management and in financial markets, as it resides between US dollar and gold on a scale from a pure store of value to pure medium of change. In another paper, Dyhrberg [26], using the asymmetric GARCH method, explored the short-term hedging potential of Bitcoin against FTSE index and US dollar. According to Cermak [27], in China, the US, and Europe, Bitcoin’s volatility behaves similarly to that of fiat currencies when GARCH (1,1) is used as an explanatory variable. However, that is not the case in Japan. Among four major cryptocurrency returns, Kumar and Anandarao [28] explore the dynamics of volatility spillover. The period covered by this study was August 2015–January 2018. According to results, Bitcoin volatility spillovers are statistically significant from Ethereum and Litecoin, with spillovers increasing after 2017. Chu et al. [14] estimated the volatility of the seven most popular cryptocurrencies using 12 GARCH-type models. The GJR-GARCH (1,1) and GARCH (1,1) were both best fits for Dogecoin, while the GJR-GARCH (1,1) was best fitted for Ripple. With the help of the multivariate GARCH model, Holtappels [29] quantified how cryptocurrencies behaved in comparison to certain fiat currencies and indexes. There is a strong correlation between the past variance values of cryptocurrencies and their current variance, and the variance forecast of cryptocurrencies is exploding. Recently, as a part of their study on the Bitcoin stylized facts related to volatility [30, 31]. The purpose of this paper is to forecast the price returns of meme coins and utility coins. The rationale for predicting the returns of these cryptocurrencies is as follows: Meme coins are like cryptocurrencies that are inspired by memes that are popular on the internet and social media. The first meme-based cryptocurrency was Dogecoin (DOGE). Meme coins tend to be highly volatile. The main reason these new cryptocurrencies are gaining popularity so quickly is because they are communitydriven and often get a lot of support from online communities. The study found that Dogecoin and Shiba Inu are the two dominant cryptocurrencies, with Monacoin and Dogelon Mars having high market share after them. Utility coins chosen for the study are Ethereum (ETH) and Ethereum Classic (ETC) and are decentralized blockchain with smart contract capabilities on which DeFi gaming can be developed. Monero and Verge are cryptocurrencies that focus on privacy.
3 Methodology We use seven GARCH-type models, namely the Standard GARCH (SGARCH), IGARCH (1,1), EGARCH (1,1), GJR-GARCH (1,1), Asymmetric Power ARCH (APARCH) (1,1), Threshold GARCH (TGARCH) (1,1), and Component GARCH
(CGARCH) (1,1), to model the time-varying volatility of the selected eight cryptocurrencies.
3.1 Input Data For this paper, we have taken six-month daily prices from 08-09-2021 to 08-03-2022 for four utility coins (Ethereum (ETH), Ethereum Classic (ETC), Monero (XMR), and Verge (XVR)) and four meme coins (Dogecoin (DOGE), Shiba Inu coin (SHIB), Monacoin (MONA), and Dogelon coin (ELON)), taking USD as the base currency for all the coins.
3.2 Generalized Autoregressive Conditional Heteroskedasticity (GARCH) Model

The GARCH model is applied to forecast the volatility of cryptocurrencies. The model predicts the volatility based on the past lags of cryptocurrencies' prices and past errors in prediction. The general nature of financial assets is that the current price is influenced by the recent past with significant lag influence. The optimal lag order influencing the current price is denoted as "p" in GARCH, and the lag order for past errors is denoted as "q".

Steps in the GARCH model. Following are the steps followed in building the GARCH model:
• Examine the Autoregressive Conditional Heteroskedasticity (ARCH) effect in the historical price movement of cryptocurrencies.
• Identify the optimal lag orders p and q.
• Construct a GARCH model of order (p, q).
• Optimize the model parameters.
• Forecast the n-step-ahead volatility of cryptocurrencies.

Equation of the ARCH and GARCH model. The ARCH model detects the autocorrelation among residuals in the forecasting model. The variance of $y_t$ conditional on $y_{t-1}$ at time t is modeled as

$\operatorname{Var}(y_t \mid y_{t-1}) = \sigma_t^2 = \alpha_0 + \alpha_1 y_{t-1}^2$
We impose the constraints $\alpha_0 \ge 0$ and $\alpha_1 \ge 0$ to avoid negative variance. GARCH models the volatility using the orders p and q. The equation of the GARCH model is given below:
$\sigma_t^2 = \omega + \alpha_1 u_{t-1}^2 + \beta_1 \sigma_{t-1}^2, \qquad \varepsilon_t \sim \text{i.i.d.}(0, 1)$

The three components in the conditional variance equation are: $\omega$, the long-run variance; $u_{t-1}^2$, the squared residual at time t − 1; $\sigma_{t-1}^2$, the conditional variance at time t − 1; and $\alpha_1$ and $\beta_1$, the parameters to be estimated.
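The paper does not reproduce its estimation code, so the following is only a minimal sketch of the workflow described above: it fits a subset of the GARCH-family variants to one series of daily log returns with the Python arch package and ranks them by AIC. The file name eth_usd.csv, the column name close, and the choice of the arch package are assumptions, not the authors' setup; the remaining variants used in the paper (IGARCH, APARCH, TGARCH, CGARCH) are available in other toolkits such as R's rugarch.

```python
# Illustrative sketch only (not the authors' code): fit a few GARCH-family
# variants to one return series and compare them by AIC.
import numpy as np
import pandas as pd
from arch import arch_model

prices = pd.read_csv("eth_usd.csv", index_col=0, parse_dates=True)["close"]  # hypothetical file
returns = 100 * np.log(prices).diff().dropna()

specs = {
    "SGARCH(1,1)":    dict(vol="GARCH", p=1, o=0, q=1),
    "GJR-GARCH(1,1)": dict(vol="GARCH", p=1, o=1, q=1),   # adds a leverage term
    "EGARCH(1,1)":    dict(vol="EGARCH", p=1, o=1, q=1),  # log-variance form, no positivity constraints
}

fits = {name: arch_model(returns, mean="Constant", **spec).fit(disp="off")
        for name, spec in specs.items()}
for name, res in sorted(fits.items(), key=lambda kv: kv[1].aic):
    print(f"{name:15s} AIC = {res.aic:10.3f}")
```

The model with the smallest AIC would be retained, mirroring the selection rule applied in Sect. 4.3.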
4 Results and Analysis 4.1 Descriptive Statistics Table 1 shows the descriptive statistics of cryptocurrencies. We examine the nature of distribution using skewness, kurtosis, and the Jarque–Bera test. The test results show that all cryptocurrencies do not exhibit uniform price movement. The returns are positively skewed for DOGE, ETC, ETH, and SHIB. The returns are negatively skewed for other currencies. The Jarque–Bera test indicates that the returns of cryptocurrencies are not normally distributed.
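As a hedged illustration of how Table 1-style statistics can be assembled (this is not the authors' code; prices is assumed to be a pandas Series of daily USD closing prices for one coin):

```python
# Sketch: descriptive statistics and the Jarque-Bera normality test for one series.
import pandas as pd
from scipy import stats

def describe(prices: pd.Series) -> dict:
    jb_stat, jb_p = stats.jarque_bera(prices)
    return {
        "Mean": prices.mean(),
        "Maximum": prices.max(),
        "Minimum": prices.min(),
        "Std. dev.": prices.std(),
        "Skewness": stats.skew(prices),
        "Kurtosis": stats.kurtosis(prices, fisher=False),
        "Jarque-Bera": jb_stat,
        "Probability": jb_p,                                   # p-value of the Jarque-Bera test
        "Sum sq. dev.": float(((prices - prices.mean()) ** 2).sum()),
    }
```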
4.2 ARCH Effect Table 2 indicates the results of the ARCH effect test on the residuals of cryptocurrencies. Forecasting volatility using GARCH is possible only when there is autocorrelation among the residuals. We perform ARCH tests individually for every cryptocurrency for a 6-lag period at the 5% significance level. The results indicate the presence of the ARCH effect for all currencies. When there is an ARCH effect, a conclusion can be drawn that the current variance is influenced by the past variances of some lag order. It reveals the pattern of volatility clustering, where periods of high volatility are followed by further high volatility and periods of low volatility are followed by low volatility.
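A minimal sketch of such an ARCH-LM test, assuming a constant-mean model for the returns (the helper name and the use of statsmodels are assumptions, not the paper's implementation):

```python
# Sketch: ARCH-LM score and p-value for lags 1..6, mirroring the layout of Table 2.
import numpy as np
from statsmodels.stats.diagnostic import het_arch

def arch_effect(returns: np.ndarray) -> None:
    resid = returns - returns.mean()   # residuals of a constant-mean model
    for m in range(1, 7):
        lm_stat, lm_pvalue, *_ = het_arch(resid, nlags=m)
        print(f"lag {m}: score = {lm_stat:8.2f}, p-value = {lm_pvalue:.3e}")
```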
4.3 GARCH Models The volatility patterns of cryptocurrencies are influenced by macro-econometric factors, investor sentiment, and the rules and regulations governing cryptocurrencies in various countries. It is pertinent for investors in cryptocurrencies to understand the nature of volatility and also predict future volatility. The visual examination and application of the ADF test show the heteroskedastic nature of volatility among cryptos.
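The ADF test referred to above can be run, for example, with statsmodels; the snippet below is only a sketch under the assumption that returns holds the daily return series of one coin.

```python
# Sketch: augmented Dickey-Fuller test on a return series.
from statsmodels.tsa.stattools import adfuller

def run_adf(returns) -> None:
    adf_stat, p_value, used_lags, n_obs, crit_values, _ = adfuller(returns, autolag="AIC")
    print(f"ADF statistic = {adf_stat:.3f}, p-value = {p_value:.4f}, lags used = {used_lags}")
    print("Critical values:", crit_values)
```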
Table 1 Descriptive statistics for cryptocurrency data (figures in USD)

| Coin | Mean | Maximum | Minimum | Std. dev. | Jarque-Bera | Probability | Sum sq. dev. |
| DOGE | 0.194997 | 0.300447 | 0.117105 | 0.047233 | 12.49373 | 0.001937 | 0.403798 |
| ELON | 9.58E-07 | 2.29E-06 | 5.02E-08 | 5.86E-07 | 6.585054 | 0.03716 | 6.21E-11 |
| ETC | 41.61738 | 60.90315 | 24.04541 | 11.4143 | 18.66985 | 0.000088 | 23,581.82 |
| ETH | 3554.011 | 4812.087 | 2405.181 | 656.0606 | 10.34317 | 0.005676 | 77,905,216 |
| MONA | 1.332609 | 1.836872 | 0.959027 | 0.239239 | 10.05825 | 0.006545 | 10.35955 |
| SHIB | 3.07E-05 | 8.00E-05 | 7.00E-06 | 1.49E-05 | 15.27889 | 0.000481 | 4.02E-08 |
| XMR | 218.4726 | 289.7658 | 144.479 | 43.26938 | 13.71626 | 0.001051 | 338,875.3 |
| XVR | 0.018565 | 0.03504 | 0.008964 | 0.0062 | 8.498553 | 0.014275 | 0.006959 |
Table 2 ARCH effect for meme coins and utility coins

| LAG | 1 | 2 | 3 | 4 | 5 | 6 |
| Score_ETH_USD | 174.01 | 337.61 | 494.53 | 645.22 | 789.34 | 928.48 |
| P-Value_ETH_USD | 9.854E-40 | 4.8798E-74 | 7.3E-107 | 2.521E-138 | 2.339E-168 | 2.613E-197 |
| Score_ETC_USD | 175.75 | 343.24 | 505.14 | 661.02 | 810.37 | 955.05 |
| P-Value_ETC_USD | 4.1133E-40 | 2.9274E-75 | 3.662E-109 | 9.568E-142 | 6.618E-173 | 4.693E-203 |
| Score_XMR_USD | 172.75 | 333.58 | 485.58 | 630.84 | 768.19 | 897.24 |
| P-Value_XMR_USD | 1.8578E-39 | 3.6733E-73 | 6.359E-105 | 3.273E-135 | 8.789E-164 | 1.483E-190 |
| Score_XVR_USD | 170.69 | 331.52 | 483.77 | 626.93 | 763.05 | 890.97 |
| P-Value_XVR_USD | 5.2411E-39 | 1.0261E-72 | 1.572E-104 | 2.302E-134 | 1.139E-162 | 3.37E-189 |
| Score_DOGE_USD | 171.39 | 333.16 | 490.41 | 643.30 | 788.81 | 928.06 |
| P-Value_DOGE_USD | 3.675E-39 | 4.5132E-73 | 5.704E-106 | 6.561E-138 | 3.055E-168 | 3.227E-197 |
| Score_SHIB_USD | 155.94 | 299.07 | 424.19 | 539.83 | 640.56 | 725.94 |
| P-Value_SHIB_USD | 8.7108E-36 | 1.1407E-65 | 1.2742E-91 | 1.626E-115 | 3.467E-136 | 1.529E-153 |
| Score_MONA_USD | 171.24 | 332.54 | 486.04 | 631.71 | 768.09 | 896.50 |
| P-Value_MONA_USD | 3.9617E-39 | 6.1519E-73 | 5.053E-105 | 2.12E-135 | 9.226E-164 | 2.143E-190 |
| Score_ELON_USD | 160.41 | 292.18 | 397.24 | 487.93 | 573.94 | 663.96 |
| P-Value_ELON_USD | 9.2163E-37 | 3.5881E-64 | 8.7595E-86 | 2.73E-104 | 8.609E-122 | 3.685E-140 |
The families of GARCH models have the power to model and predict heteroskedastic volatility. GARCH facilitates modeling conditional heteroskedasticity, IGARCH detects the persistence of past shocks in the price movement, EGARCH captures the non-contemplated volatility which GARCH could not model, GJR-GARCH improves the forecasting accuracy through the leverage effect, and APARCH reveals the components of volatility such as fat tails, the persistence of volatility, asymmetry, and the leverage effect. CSGARCH decomposes the volatility into permanent and transitory components. We apply the families of the GARCH model and evaluate the forecasting accuracy of all models using AIC. The model with minimum AIC is considered optimum to model and predict the volatility of cryptocurrencies. We model the volatility of all cryptocurrencies by applying all families of GARCH, and the results are tabulated below. The interpretation and analysis of the results of GARCH models of volatility for every cryptocurrency from Tables 3, 4, 5, 6, 7, 8, 9, and 10 are explained in detail under the discussion section.

Table 3 GARCH models for ETH_USD

| ETH_USD | GARCH | IGARCH | EGARCH | GJR-GARCH | APARCH | CSGARCH |
| Alpha | 2.74E-11 | 5.35E-09 | -0.340 | 7.66E-09 | 8.38E-02 | 1.15E-07 |
| Beta | 9.78E-01 | 1.00E+00 | 0.652 | 1.00E+00 | 4.72E-01 | 1.06E-05 |
| a + β | 9.78E-01 | 1.00E+00 | 0.312 | 1.00E+00 | 5.56E-01 | 1.07E-05 |
| U | - | - | -0.062 | -4.68E-02 | 1.00E+00 | - |
| δ | - | - | - | - | 1.39E+00 | - |
| ρ | - | - | - | - | - | 1.00E+00 |
| Ø | - | - | - | - | - | 1.00E-08 |
| AIC | 12.823 | 12.814 | 12.781 | 12.805 | 12.82 | 12.858 |
Table 4 GARCH models for ETC_USD

| ETC_USD | GARCH | IGARCH | EGARCH | GJR-GARCH | APARCH | CSGARCH |
| Alpha | 3.65E-03 | 0.030019056 | -0.109 | 0.011 | 1.13E-11 | 0.467 |
| Beta | 9.94E-01 | 0.969980944 | 0.143 | 0.996 | 9.80E-01 | 3.89E-02 |
| a + β | 9.97E-01 | 1.00E+00 | 3.44E-02 | 1.008 | 9.80E-01 | 5.06E-01 |
| U | - | - | 0.510 | -0.018 | 1.00E+00 | - |
| δ | - | - | - | - | 3.50E+00 | - |
| ρ | - | - | - | - | - | 9.98E-01 |
| Ø | - | - | - | - | - | 0.038 |
| AIC | 3.9549 | 3.958 | 3.9265 | 3.9615 | 3.9301 | 3.935 |
Table 5 GARCH models for XMR_USD

| XMR_USD | GARCH | IGARCH | EGARCH | GJR-GARCH | APARCH | CSGARCH |
| Alpha | 3.76E-09 | 1.16E-09 | -0.123 | 3.30E-06 | 4.85E-20 | 0.038 |
| Beta | 9.99E-01 | 1.00E+00 | -0.233 | 1.00E+00 | 9.96E-01 | 0.422 |
| a + β | 9.99E-01 | 1.00E+00 | -3.56E-01 | 1.00E+00 | 9.96E-01 | 4.60E-01 |
| U | - | - | -0.0537 | -5.18E-02 | -1.00E+00 | - |
| δ | - | - | - | - | 2.52E+00 | - |
| ρ | - | - | - | - | - | 0.989 |
| Ø | - | - | - | - | - | 0.000 |
| AIC | 7.4033 | 7.3967 | 7.41 | 7.3918 | 7.4178 | 7.4316 |
Table 6 GARCH models for XVR_USD

| XVR_USD | GARCH | IGARCH | EGARCH | GJR-GARCH | APARCH | CSGARCH |
| Alpha | 1.28E-01 | 1.34E-01 | 0.182 | 1.11E-01 | 5.89E-02 | 1.16E-01 |
| Beta | 8.57E-01 | 8.66E-01 | 0.984 | 9.47E-01 | 8.57E-01 | 7.02E-01 |
| a + β | 9.85E-01 | 1.00E+00 | 1.17E+00 | 1.06E+00 | 9.16E-01 | 8.18E-01 |
| U | - | - | -0.080 | -1.56E-01 | -6.65E-01 | - |
| δ | - | - | - | - | 2.35E+00 | - |
| ρ | - | - | - | - | - | 9.94E-01 |
| Ø | - | - | - | - | - | 9.98E-02 |
| AIC | -10.852 | -10.861 | -10.972 | -10.954 | -10.92 | -10.837 |
Table 7 GARCH models for DOGE_USD

| DOGE_USD | GARCH | IGARCH | EGARCH | GJR-GARCH | APARCH | CSGARCH |
| Alpha | 0.634251327 | 2.09E-01 | -0.213 | 7.34E-02 | 2.44E-01 | 6.03E-01 |
| Beta | 0.147795389 | 7.91E-01 | 0.407 | 1.10E-01 | 5.92E-02 | 1.19E-01 |
| a + β | 7.82E-01 | 1.00E+00 | 0.194 | 1.84E-01 | 3.03E-01 | 7.22E-01 |
| U | - | - | 0.756 | 7.05E-01 | 6.00E-01 | - |
| δ | - | - | - | - | 2.98E+00 | - |
| ρ | - | - | - | - | - | 1.00E+00 |
| Ø | - | - | - | - | - | 7.04E-02 |
| AIC | -6.5984 | -6.5643 | -6.5951 | -6.6127 | -6.6076 | -6.6261 |
Table 8 GARCH models for SHIB_USD

| SHIB_USD | GARCH | IGARCH | EGARCH | GJR-GARCH | APARCH | CSGARCH |
| Alpha | 5.00E-02 | 5.00E-02 | 5.38E-01 | 5.00E-02 | 5.00E-02 | 1.44E-02 |
| Beta | 9.00E-01 | 9.50E-01 | 9.68E-01 | 9.00E-01 | 9.00E-01 | 9.99E-02 |
| a + β | 9.50E-01 | 1.00E+00 | 1.51E+00 | 9.50E-01 | 9.50E-01 | 1.14E-01 |
| U | - | - | -3.04E-01 | 5.00E-02 | 5.00E-02 | - |
| δ | - | - | - | - | 2.00E+00 | - |
| ρ | - | - | - | - | - | 9.01E-01 |
| Ø | - | - | - | - | - | 6.60E-02 |
| AIC | -22.019 | -22.049 | -22.987 | -2.20E+01 | -21.951 | -22.155 |
Table 9 GARCH models for MONA_USD

| MONA_USD | GARCH | IGARCH | EGARCH | GJR-GARCH | APARCH | CSGARCH |
| Alpha | 1.49E-01 | 1.65E-01 | -0.09393797 | 0.065 | 0.171 | 1.71E-07 |
| Beta | 8.36E-01 | 8.35E-01 | 0.94994412 | 0.787 | 0.848 | 1.36E-01 |
| a + β | 9.85E-01 | 1.00E+00 | 8.56E-01 | 8.51E-01 | 1.02E+00 | 1.36E-01 |
| U | - | - | 0.308 | 0.255 | 0.199 | - |
| δ | - | - | - | - | 0.627 | - |
| ρ | - | - | - | - | - | 9.84E-01 |
| Ø | - | - | - | - | - | 1.36E-01 |
| AIC | -3.1929 | -3.2029 | -3.2335 | -3.2144 | -3.2242 | -3.1678 |
Table 10 GARCH models for ELON_USD

| ELON_USD | GARCH | IGARCH | EGARCH | GJR-GARCH | APARCH | CSGARCH |
| Alpha | 5.00E-02 | 5.00E-02 | 1.32E-01 | 5.00E-02 | 5.00E-02 | 5.90E-02 |
| Beta | 9.00E-01 | 9.50E-01 | 9.52E-01 | 9.00E-01 | 9.00E-01 | 2.47E-01 |
| a + β | 9.50E-01 | 1.00E+00 | 1.08E+00 | 9.50E-01 | 9.50E-01 | 3.06E-01 |
| U | - | - | 1.25E+00 | 5.00E-02 | 5.00E-02 | - |
| δ | - | - | - | - | 2.00E+00 | - |
| ρ | - | - | - | - | - | 4.17E-01 |
| Ø | - | - | - | - | - | 2.01E-01 |
| AIC | -28.259 | -28.416 | -29.539 | -28.233 | -28.216 | -25.47 |
Table 3 presents the results of Ethereum ETH_USD. The parameters of GARCH are significant at 5% level for all GARCH. The AIC value is low when the EGARCH model is applied indicating that this model is best fit for forecasting ETH_USD. Table 4 presents the results of Ethereum Classic ETC_USD. The parameters of GARCH are significant at 5% level for all GARCH except GJR-GARCH. The AIC value is low when EGARCH model is applied indicating that this model is best fit for forecasting ETC_USD. Table 5 presents the results of Monero XMR_USD. The parameters of GARCH are significant at 5% level for all GARCH except GJR-GARCH. The AIC value is low when GJR-GARCH model is applied indicating that this model is best fit for forecasting XMR_USD. Table 6 presents the results of Verge XVR_USD. The parameters of GARCH are significant at 5% level for all GARCH. The AIC value is low when the EGARCH model is applied indicating that this model is best fit for forecasting XVR_USD. Table 7 presents the results of the Dogecoin DOGE_USD. The parameters of GARCH are significant at 5% level for all GARCH except EGARCH. The AIC value is low when CSGARCH model is applied indicating that this model is best fit for forecasting DOGE_USD. Table 8 presents the results of Shiba Inu coin SHIB_USD. The parameters of GARCH are significant at 5% level for all GARCH. The AIC value is low when the EGARCH model is applied indicating that this model is best fit for forecasting SHIB_USD. Table 9 presents the results of the Monacoin MONA_USD. The parameters of GARCH are significant at 5% level for all GARCH. The AIC value is low when the EGARCH model is applied indicating that this model is best fit for forecasting MONA_USD. Table 10 presents the results of the Dogelon Mars coin ELON_USD. The parameters of GARCH are significant at 5% level for all GARCH. The AIC value is low when the EGARCH model is applied indicating that this model is best fit for forecasting ELON_USD.
5 Discussion The basic premise of feature engineering is to improve forecasting accuracy. The naïve GARCH model can model the conditional volatility by estimating the parameters of long-run variance, autoregressive (AR) and moving average components (MA). The AR terms show the influence of past price on the current error, and the MA components indicate the lag relationship between the current and past errors in forecasting. The model is reasonably fit enough to apply when there is conditional heteroskedasticity. It also assumes that the shocks happened in the past will die out soon with the increase in time. However, GARCH has its own limitation in predicting volatility when the shocks are persistent. Therefore, the forecasting accuracy can be improved by adding more features and optimize the estimation parameters. The
families of GARCH models such as IGARCH and EGARCH would increase the estimation parameters. Application of the variants of GARCH shows that including more features facilitates decomposing the patterns of volatility into its various components. The forecasting accuracy of the GARCH models is tested by applying the Akaike Information Criterion (AIC). The model with the minimum AIC is considered to be the best fit due to less information loss and more accuracy in forecasting. The analysis of results shows that EGARCH is the appropriate model to forecast the volatility of ETH_USD, ETC_USD, XVR_USD, SHIB_USD, MONA_USD, and ELON_USD. The obtained minimum AIC value for EGARCH is 12.781 for ETH_USD, 3.9265 for ETC_USD, -10.972 for XVR_USD, -22.987 for SHIB_USD, -3.2335 for MONA_USD, and -29.539 for ELON_USD. Research reports indicate that Ethereum highly correlates with stock markets. The advantage of using EGARCH is that it forecasts volatility without imposing any parameter restrictions, whereas standard GARCH requires the parameters to be non-negative. The CSGARCH model improves the forecasting accuracy of DOGE_USD.
6 Conclusion We apply seven types of GARCH models to the volatility of cryptocurrencies and identify the model that best fits for forecasting. We find that the EGARCH and CSGARCH models provide the best fit. EGARCH is an extension of the GARCH model, which is applied to model conditional heteroskedasticity. The volatility of cryptocurrencies changes over time, and it can be modeled by the EGARCH model. The appeal of investing in cryptocurrencies has been increasing over the years, and the market has been witnessing more cryptocurrencies in circulation. Still, the world has not concluded whether buying a cryptocurrency is an investment or speculation, whether cryptocurrencies will compete with fiat currencies, or whether countries will legalize or ban trading in cryptocurrencies. Since cryptocurrencies are unregulated, various factors induce their volatility. As a result, cryptocurrencies exhibit volatility that changes over time. Application of EGARCH models would improve the forecasting accuracy of cryptocurrencies' volatility.
References 1. Iyer T (2022) Cryptic connections: spillovers between crypto and equity markets. 13 2. Catania L, Grassi S, Ravazzolo F (2019) Forecasting cryptocurrencies under model and parameter instability. Int J Forecast 35(2):485–501. https://doi.org/10.1016/j.ijforecast.2018. 09.005 3. Nasir MA, Huynh TLD, Nguyen SP, Duong D (2019) Forecasting cryptocurrency returns and volume using search engines. Financ Innov 5(1):2. https://doi.org/10.1186/s40854-018-0119-8 4. Sun X, Liu M, Sima Z (2020) A novel cryptocurrency price trend forecasting model based on LightGBM. Financ Res Lett 32:101084. https://doi.org/10.1016/j.frl.2018.12.032
5. Kraaijeveld O, De Smedt J (2020) The predictive power of public Twitter sentiment for forecasting cryptocurrency prices. J Int Finan Markets Inst Money 65:101188. https://doi.org/10. 1016/j.intfin.2020.101188 6. Plakandaras V, Bouri E, Gupta R (2021) Forecasting Bitcoin returns: is there a role for the US–China trade war? J Risk. Accessed: 26 May 2022. (Online). Available: https://www.risk. net/node/7796966 7. Sebastião H, Godinho P (2021) Forecasting and trading cryptocurrencies with machine learning under changing market conditions. Financ Innov 7(1):3. https://doi.org/10.1186/s40854-02000217-x 8. Bouri E, Gupta R (2021) Predicting Bitcoin returns: Comparing the roles of newspaper- and internet search-based measures of uncertainty. Financ Res Lett 38:101398. https://doi.org/10. 1016/j.frl.2019.101398 9. Koki C, Leonardos S, Piliouras G (2022) Exploring the predictability of cryptocurrencies via Bayesian hidden Markov models. Res Int Bus Financ 59:101554. https://doi.org/10.1016/j. ribaf.2021.101554 10. Fantazzini D, Nigmatullin E, Sukhanovskaya V, Ivliev S (2016) Everything you always wanted to know about bitcoin modelling but were afraid to ask. I. Appl Econometrics 44:5–24 11. Grinberg R (2011) Bitcoin: an innovative alternative digital currency. 4(50) 12. Kurihara Y, Fukushima A (2018) How does price of bitcoin volatility change? Int Res Econ Finan 2:8. https://doi.org/10.20849/iref.v2i1.317 13. Katsiampa P (2017) Volatility estimation for Bitcoin: a comparison of GARCH models. Econ Lett 158:3–6. https://doi.org/10.1016/j.econlet.2017.06.023 14. Chu J, Chan S, Nadarajah S, Osterrieder J (2017) GARCH modelling of cryptocurrencies. J Risk Financ Manage 10(4). Article no. 4. https://doi.org/10.3390/jrfm10040017 15. Ardia D, Bluteau K, Rüede M (2019) Regime changes in Bitcoin GARCH volatility dynamics. Financ Res Lett 29:266–271. https://doi.org/10.1016/j.frl.2018.08.009 16. Naimy VY, Hayek MR (2018) Modelling and predicting the Bitcoin volatility using GARCH models. Int J Math Model Numer Optimisation 8(3):197–215. https://doi.org/10.1504/IJM MNO.2018.088994 17. J. Bouoiyour and R. Selmi, “What Does Bitcoin Look Like?,” p. 44, 2015. 18. Trucíos C, Tiwari AK, Alqahtani F (2020) Value-at-risk and expected shortfall in cryptocurrencies’ portfolio: a vine copula–based approach. Appl Econ 52(24):2580–2593. https://doi. org/10.1080/00036846.2019.1693023 19. Charles A, Darné O (2019) Volatility estimation for Bitcoin: replication and robustness. Int Econ 157:23–32. https://doi.org/10.1016/j.inteco.2018.06.004 20. Catania L, Grassi S, Ravazzolo F (2018) Predicting the Volatility of cryptocurrency time-series. In: Corazza M, Durbán M, Grané A, Perna C, Sibillo M (eds) Mathematical and statistical methods for actuarial sciences and finance. Springer International Publishing, Cham pp 203– 207. https://doi.org/10.1007/978-3-319-89824-7_37 21. Stavroyiannis S, Babalos V (2017) Dynamic properties of the bitcoin and the US market. Social Science Research Network, Rochester, NY, SSRN Scholarly Paper 2966998, May 2017. https:// doi.org/10.2139/ssrn.2966998 22. Balcilar M, Bouri E, Gupta R, Roubaud D (2017) Can volume predict Bitcoin returns and volatility? A quantiles-based approach. Econ Model 64:74–81. https://doi.org/10.1016/j.eco nmod.2017.03.019 23. Akyildirim E, Goncu A, Sensoy A (2021) Prediction of cryptocurrency returns using machine learning. Ann Oper Res 297(1):3–36. https://doi.org/10.1007/s10479-020-03575-y 24. 
Hyun S, Lee J, Kim J-M, Jun C (2019) What coins lead in the cryptocurrency market: using copula and neural networks models. J Risk Financ Manage 12:3. (Article no. 3). https://doi. org/10.3390/jrfm12030132 25. Dyhrberg AH (2016) Bitcoin, gold and the dollar—a GARCH volatility analysis. Financ Res Lett 16:85–92. https://doi.org/10.1016/j.frl.2015.10.008 26. Dyhrberg AH (2016) Hedging capabilities of bitcoin. Is it the virtual gold? Financ Res Lett 16:139–144. https://doi.org/10.1016/j.frl.2015.10.025
27. Cermak V (2017) Can bitcoin become a viable alternative to fiat currencies? An empirical analysis of bitcoin’s volatility based on a GARCH model. SSRN J. https://doi.org/10.2139/ ssrn.2961405 28. Kumar AS, Anandarao S (2019) Volatility spillover in crypto-currency markets: some evidences from GARCH and wavelet analysis. Phys A 524:448–458. https://doi.org/10.1016/j.physa. 2019.04.154 29. Holtappels LA (2018) Cryptocurrencies: modelling and comparing time-varying volatility— the MGARCH approach. 35 30. Nikolova V, Trinidad Segovia JE, Fernández-Martínez M, Sánchez-Granero MA (2020) A novel methodology to calculate the probability of volatility clusters in financial series: an application to cryptocurrency markets. Mathematics 8(8):1216. https://doi.org/10.3390/math8081216 31. Dimitrova V, Fernández-Martínez M, Sánchez-Granero MA, Trinidad Segovia JE (2019) Some comments on bitcoin market (in) efficiency. PLoS ONE 14(7):e0219243. https://doi.org/10. 1371/journal.pone.0219243
Digital Boolean Logic Equivalent Reversible Quantum Gates Design Bikram Paul, Nupur Choudhury, Eeshankur Saikia, and Gaurav Trivedi
Abstract Digital logic equivalent quantum gates for easy integration of quantum logic with existing classical counterparts are proposed in this work. The proposed quantum gates are used to develop algorithms to emulate classical logic operations using the matrix transfer function method. The number of ancilla qubits for developing quantum gates is reduced by introducing swap operation. The proposed basic classical gates are implemented and validated for their correctness on IBM quantum experience (IBM QE) by applying 1024, 2048, and 8096 input samples. Subsequently, several classical circuits are realized employing the proposed set of classical equivalent quantum gates, which are validated with a significant number of statistical measurement samples. The resultant probabilistic measurements are more than 72% in favor of all the desired outcome for both basic gates and classical circuits, whereas the cumulative probability of the top two expected qubit states is greater than 95%. Keywords Qubit · Quantum gate · Classical logic · Quantum circuits · Reversible logic
1 Introduction According to Moore’s law, the system becomes more complex daily as devices become increasingly integrated. Additionally, this worsens the issue of power dissipation at lower technological nodes. By recording each intermediate state in the memory and allowing for the reversibility of computations, every output may be linked back to its original input. The energy dissipation that is the paramount need in low-power VLSI design can be decreased using reversible logic gates. Reversible logic allows the system to function forward and backward and halt or return to any B. Paul (B) · G. Trivedi Indian Institute of Technology Guwahati, Guwahati, Assam 781039, India e-mail: [email protected] N. Choudhury · E. Saikia Guwahati University, Guwahati, Assam 781014, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_20
point in the computation’s history. Reversible circuits have unique characteristics that make them useful for optical technology, nanotechnology, and quantum computing applications. Reversible circuits are now a popular technology for cutting-edge, power-efficient computing systems. Due to the widespread belief that feedback is not conceivable in reversible logic, researchers first concentrated primarily on synthesizing reversible logic for combinational circuits [9, 26]. Toffoli asserted, however, that a reversible sequential circuit is feasible if a delay element delivers the feedback because the feedback data will be available as the input to the reversible combinational circuit in the following clock cycle. The mathematical transfer function of a quantum gate is expressed as a 2-D square matrix where the number of elements of a row or column depends on all the possible states of the input qubits. A single qubit can be represented as a column vector, and a gate acting on that qubit can be represented by a 2 × 2 matrix. As we know, quantum gates always satisfy unitary property [17, 29] and the relation, U U = I , where U is the transfer function, and I is an identity matrix. This relation implies that the sum of the probability amplitudes of all the input states must be 1. This property ensures the reversible logic in quantum gates, which means inputs can be regenerated by feeding the output to a quantum circuit as input. If we consider {α, β} and {α , β } are the values for the probability amplitudes of the qubit before and after a particular operation applied, then the relation, |α|2 + |β|2 = |α |2 + |β |2 = 1, must hold. Transfer functions of a few single input quantum gates are given below where I , X , Y , and Z are Pauli’s identity gate, NOT gate, and multiplication of a qubit by i and −1, respectively. I =
$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}; \quad X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}; \quad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}; \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$
The phase gate S changes the phase of the qubit by 90°, and a Hadamard gate H sets equal probabilities of |0⟩ and |1⟩; they are expressed as below.
$S = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}; \quad H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$
Multiple qubits quantum gates are essential for implementing complex quantum algorithms to have reversible [26] properties. Four possible states exist for two qubits as input; therefore, the transfer functions are represented by 4 × 4 matrices. For example, the transfer function of a two-input controlled quantum CNOT gate can be represented below. ⎡ ⎤ 1000 ⎢0 1 0 0⎥ ⎥ CNOT = ⎢ ⎣0 0 0 1⎦ 0010
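As a small, self-contained check (not taken from the paper) of the reversibility property discussed above, the quoted CNOT matrix can be verified to be unitary:

```python
# Sketch: the two-qubit CNOT transfer matrix and a unitarity check (U^dagger U = I),
# which is the property that makes the gate reversible.
import numpy as np

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

assert np.allclose(CNOT.conj().T @ CNOT, np.eye(4))
print("CNOT is unitary, hence reversible.")
```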
Quantum Information Processing Qubit is a 2-D Hilbert space (H ) expressed as H 2 ≈ C 2 , where C is the average computational basis of each quantum state. Qubits are represented as vectors states from the Hilbert space |ϕ = c0 |0 + c1 |1, where |0 and |1 are an orthonormal set or standard computational basis and c0 , c1 ∈ C, |c0 |2 + |c1 |2 = 1. The physical realization of a qubit can be the spin of a quantum particle expressed as |ϕ = c↑ |↑ + c↓ |↓ where |0 = |↑ and |1 = |↓, or energy levels of an atom or ion, or opposite super-conducting fluxes in a super-conducting flux, etc. Quantum logic operations are the transformation of quantum state vectors in a Hilbert space. It is to mention that all the operations are unitary; therefore, they obey reversible logic criteria [5, 16]. A quantum state of n qubits is a vector in 2n -dimensional Hilbert space denoted by n ⊗ H 2 = H 2 ⊗ H 2 ⊗ . . . H 2 (n − times) = H 2n . Measurement of a single qubit |ϕ = c0 |0 + c1 |1 in the standard computational basis gives classical bit of information as given below. • If the probability |c0 |2 gives the measurement result M = 0 and then immediately after the measurement quantum state shall collapse to |ψ = |0; • If the probability |c1 |2 gives the measurement result M = 1 and then immediately shall collapse after the measurement quantum state to |ψ = |1. While designing a quantum computer and communication system, it is impossible to eliminate classical circuits, especially during data storage and other Boolean operations. The shortcomings of the existing quantum gates can be observed while replicating basic Boolean logical operations, which consume many quantum gates with poor probabilistic outcomes. Analogous to classical architecture, a quantum processor contains quantum micro-architectures for processor, memory, and other peripheral connections with the qubits. Therefore, directly mapping classical logic onto quantum circuits will facilitate a new paradigm for conventional quantum computing algorithms and reduce the constraints for classical and quantum logic integration. Various hybrid circuits must be designed to integrate classical and quantum counterparts seamlessly. In this work, we developed a library of new quantum gates that demonstrate one-to-one mapping of basic classical Boolean logic in the quantum domain. Quantum equivalent of classical two-input AND, OR, NAND, NOR, and XOR gates is proposed, which shows significant favorable probabilistic measure. These gates facilitate researchers to map classical logic circuits on quantum processing units easily. Several classical logic circuits employing the proposed gates are implemented with more than 72% favorable probabilistic measures. The rest of the paper is organized as follows. Section 2 presents a brief literature review of the proposed work. Section 3 presents the formulation of proposed quantum gates with corresponding transfer functions. Subsequently, the proposed quantum gates and the statistical outcomes obtained from IBM QE [14] are in Sect. 4. A detailed design methodology of generic classical logic circuits and a few common digital logic employing proposed quantum gates are elaborated in the same section, which is also realized using IBM QE. Finally, Sect. 5 concludes the work by citing future possibilities in the pertinent studies.
2 Literature Review Conventional computers operating under classical Boolean logic are restricted only to performing deterministic computational operations [19]. Although classical supercomputers have improved their performance and power consumption, many NP-hard problems such as protein folding, accurate weather modeling, genetic engineering, and astronomical simulations cannot be done efficiently on these deterministic systems [6, 9, 21, 23]. These hurdles can be mitigated by employing heterogeneous computational models [2, 7, 27]. Advance heterogeneous computational devices comprised quantum processors [24] as central processing unit and classical components for data handling and other peripheral activities. The heterogeneous systems are segmented into classical CPU, FPGA, and GPU and quantum CPU-based operations which helps efficient utilization of quantum algorithms [15] to boost computation performance. Quantum computers compute information non-deterministically [13, 20] based on qubits which can hold “0” or “1” or both state simultaneously due to qubit’s superposition property. To design a reliable quantum computer, a vast range of qubit operations and transfer functions have to be developed, such as quantum NOT gate, swap gate, Hadamard gate, Pauli X/Y/Z gate, Toffoli gate, and Fredkin gate. [1, 8, 18] which generally expressed in the form of complex matrix transformations [4, 25, 28]. All these quantum gates transform appropriate microwave or physical input signals to the output qubit states [10, 11, 22] and facilitate massive parallelism in computation.
3 Formulation of Proposed Quantum Gates Quantum gates act as qubits to change their state in a controlled manner where the input state vectors transform under the influence of Hamiltonian operation according to the time-varying Schrödinger equation [3, 12]. Thereafter to determine the next state of a qubit, the time-varying Schrödinger equation needs to be solved for a particular quantum circuit. Physical quantum gates represent the equivalent operation of solving the time-varying Schrödinger equation to evaluate the next state of a qubit with respect to the initial qubit state vector configuration. Quantum gates are modeled by linear unitary transformations bounded by the postulates of quantum information theory. Due to unitary transformation, all the practical quantum gates are reversible where individual input states can be determined from the output qubit states. In addition to unitary transformation, the quantum gates preserve the inner product. For example, the matrix transfer function of a single input CNOT can be represented
|input-i as i output-i|, and the basic elements |0 and |1 can be expressed as 1 0 and . Here, the theoretical formulation of gate transfer functions for all the 0 1 proposed classical logic equivalent quantum gates is discussed in detail.
Fig. 1 Quantum AND gate
3.1 Quantum AND Gate Formulation A three-qubit gate is proposed to replicate classical AND operation where 2-input |A and |B are controlled bits, and the third qubit, initialized as |0, is an ancillary bit that is utilized for storing and manipulation of the result. Fig. 1 depicts qubit configuration for the same. The inputs to AND gate are |000, |010, |100, |110 and the corresponding outputs |000, |010, |100, |111. The result of the operation is stored in the third bit. The outputs for three state qubits are obtained from the Kronecker product of the single qubit basis matrix below. Therefore, similarly, the other states can be denoted as follows: ⎡ ⎤ 1 ⎢0⎥ ⎢ ⎥ ⎡ ⎤ ⎡ ⎤ ⎢ ⎥ 1 1 ⎢0⎥ ⎢0 ⎥ ⎢0⎥ ⎢0⎥ 1 1 1 ⎢ ⎥ ⎥ → |000 = ⎢ ⎥ ⊕ |00 = |0 ⊕ |0 = ⊕ =⎢ = ⎢0⎥ ⎣0 ⎦ ⎣0⎦ 0 0 0 ⎢ ⎥ ⎢0⎥ 0 0 ⎢ ⎥ ⎣0⎦ 0 Therefore, similarly the other sates can be denoted as following: ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 0 0 0 0 ⎢0⎥ ⎢0⎥ ⎢0⎥ ⎢0 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢0⎥ ⎢0⎥ ⎢1⎥ ⎢0 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢0⎥ ⎢0⎥ ⎢0⎥ ⎢ ⎥ ⎥ ; |100 = ⎢ ⎥ ; |110 = ⎢ ⎥ ; |111 = ⎢0⎥ . |010 = ⎢ ⎢0⎥ ⎢1⎥ ⎢0⎥ ⎢0 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢0⎥ ⎢0⎥ ⎢0⎥ ⎢0 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎣1⎦ ⎣0⎦ ⎣0⎦ ⎣0 ⎦ 0 0 0 1 The transfer function of quantum AND gate (MAND ) can be expressed as,
$M_{\mathrm{AND}} = \sum_i |\text{input-}i\rangle\langle\text{output-}i|$
= |000 000| + |010 010| + |100 100| + |110 111| ⎤ ⎡ 10000000 ⎢0 0 0 0 0 0 0 0 ⎥ ⎥ ⎢ ⎢0 0 1 0 0 0 0 0 ⎥ ⎥ ⎢ ⎢0 0 0 0 0 0 0 0 ⎥ ⎥ =⎢ ⎢0 0 0 0 1 0 0 0 ⎥ ⎥ ⎢ ⎢0 0 0 0 0 0 0 0 ⎥ ⎥ ⎢ ⎣0 0 0 0 0 0 0 1 ⎦ 00000000 The basis states of a three-qubit input are represented as, a1 |000 + a2 |001 + a3 |010 + a4 |011 + a5 |100 + a6 |101 + a7 |110 + a8 |111. For input to thebreak gate as |000, the probability amplitude a1 set as 1, and the rest are set as 0. Therefore, input matrix appears to be [10000000], and to perform AND operation, the input matrix is multiplied with the gate matrix and produced output [10000000] which implies the probability a1 = 1 assigned to the vector state |000. Similarly, all the inputs and their corresponding outputs can be determined simultaneously. For input to the gate |010, |100, and |110, the probability amplitude a3 , a5 , a7 are 1 and the input matrix appears to be [00100000], [00001000], and [00000010] produced output as [00100000], [00001000], and [00000001] after AND gate operation, respectively. The output probability amplitudes of the ancilla qubit a8 are 1 only for input |110. All the transformations can be expressed as given below. [10000000] × MAND = [10000000]; [00100000] × MAND = [00100000]; [00001000] × MAND = [00001000]; [00000010] × MAND = [00000001].
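The construction just described can be checked numerically; the sketch below (an illustration, not the authors' code) builds M_AND from the outer-product definition and confirms the four input-to-output mappings under left multiplication by a row state vector:

```python
# Sketch: build M_AND = sum_i |input-i><output-i| and verify the stated mappings.
import numpy as np

def basis(bits: str) -> np.ndarray:
    """Computational-basis vector |bits> as a 1-D array of length 2**len(bits)."""
    v = np.zeros(2 ** len(bits))
    v[int(bits, 2)] = 1.0
    return v

io_pairs = [("000", "000"), ("010", "010"), ("100", "100"), ("110", "111")]
M_AND = sum(np.outer(basis(i), basis(o)) for i, o in io_pairs)

for i, o in io_pairs:
    assert np.array_equal(basis(i) @ M_AND, basis(o))   # row vector times matrix
print("All four AND-gate mappings are reproduced.")
```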
3.2 Quantum OR Gate Formulation A three-input quantum OR gate is proposed to replicate classical OR operation where 2-input |A and |B are controlled bits, and the third qubit, initialized as |0, is an ancillary bit that is utilized for storing and manipulation of the result. Fig. 2
Fig. 2 Quantum OR gate
depicts qubit configuration for the same. The inputs to OR gate are |000, |010, |100, |110 and the corresponding outputs are |110, |101, |011, |001 and the transfer function of quantum⎡OR gate is MOR .⎤MOR = |000 110| + |010 101| + 00000010 ⎢0 0 0 0 0 0 0 0⎥ ⎥ ⎢ ⎢0 0 0 0 0 1 0 0⎥ ⎥ ⎢ ⎢0 0 0 0 0 0 0 0⎥ ⎥. ⎢ |100 011| + |110 001| = ⎢ ⎥ ⎢0 0 0 1 0 0 0 0⎥ ⎢0 0 0 0 0 0 0 0⎥ ⎥ ⎢ ⎣0 1 0 0 0 0 0 0⎦ 00000000 The basis states of a three-qubit input are represented as a1 |000 + a2 |001 + a3 |010 + a4 |011 + a5 |100 + a6 |101 + a7 |110 + a8 |111, and similar to the proposed above quantum logic gate, all the simultaneous output transformations can be expressed as below. [10000000] × MOR = [00000010]; [00100000] × MOR = [00000100]; [00001000] × MOR = [00010000]; [00000010] × MOR = [01000000].
3.3 Quantum NAND Gate Formulation The proposed quantum NAND gate consists of three inputs where |A and |B are controlled input qubits and the third qubit, initialized as |0, is an ancillary bit utilized to store the result after passing through the quantum NAND gate; Fig. 3 illustrates the proposed gate configuration. The inputs to quantum NAND gate are |000, |010, |100, and |110 and the corresponding outputs are |001, |011, |101, and |110. The transfer function for the quantum NAND gate is denoted by MNAND and represented as MNAND = |000 001| + |010 011| + |100 101| + |110 110|. ⎡
$M_{\mathrm{NAND}} = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}$
The basis states of a 3-qubit input is represented as, a1 |000 + a2 |001 + a3 |010 + a4 |011 + a5 |100 + a6 |101 + a7 |110 + a8 |111 and similar to the proposed above quantum logic gate, all the simultaneously output transformations can be expressed as below.
Fig. 3 Quantum NAND gate
[10000000] × MNAND = [01000000]; [00100000] × MNAND = [00010000] [00001000] × MNAND = [00000100]; [00000010] × MNAND = [00000010].
3.4 Quantum NOR Gate Formulation Quantum NOR gate has three inputs where |A and |B are input qubits, and the third ancillary qubit is initialized as |0, which is employed to the result. Figure 4 presents the proposed NOR gate configuration where |000, |010, |100, and |110 are the input state vectors, and their corresponding outputs are |111, |100, |010, and |000. The transfer function for the quantum NOR gate is denoted by MNOR which can be expressed as MNOR = |000 111| + |010 100| + |100 010| + |110 000|. ⎤ ⎡ 00000001 ⎢0 0 0 0 0 0 0 0 ⎥ ⎥ ⎢ ⎢0 0 0 0 1 0 0 0 ⎥ ⎥ ⎢ ⎢0 0 0 0 0 0 0 0 ⎥ ⎥ MNOR = ⎢ ⎢0 0 1 0 0 0 0 0 ⎥ ⎥ ⎢ ⎢0 0 0 0 0 0 0 0 ⎥ ⎥ ⎢ ⎣1 0 0 0 0 0 0 0 ⎦ 00000000 The 3-qubit inputs are represented as a1 |000 + a2 |001 + a3 |010 + a4 |011 + a5 |100 + a6 |101 + a7 |110 + a8 |111, and all the simultaneous output transformations can be expressed as below.
Fig. 4 Quantum NOR gate
[10000000] × MNOR = [00000001]; [00100000] × MNOR = [00001000]; [00001000] × MNOR = [00100000]; [00000010] × MNOR = [10000000].
3.5 Quantum XOR Gate The quantum XOR gate is similar to previous gates if the given inputs are |00, |01, |10, and |11. XOR gate produces the outputs as |00, |01, |11, and |10, respectively. The matrix representation of the quantum XOR gate is MXOR =
i |input i output i| = |00 00| + |01 01| + |10 11| + |11 10|, and simultaneous output after XOR operations is given below: ⎡
⎤ 1000 ⎢0 1 0 0⎥ ⎥ MXOR = ⎢ ⎣0 0 0 1⎦ 0010 [1000] × MXOR = [1000]; [0100] × MXOR = [0100]; [0010] × MXOR = [0001]; [0001] × MXOR = [0010]. Table 1 presents the comparison between classical and the proposed quantum gates based on their logical expressions. We have derived all the theoretical transfer functions of all the proposed classical equivalent quantum gates for three-qubit input-output configuration and optimized three-input two-output configuration. The following section discusses the practical implementation of the proposed basic quantum gates and a few specific classical circuits, along with a generic methodology to convert classical logic circuits to quantum circuits.
Table 1 Quantum counterpart of the basic classical gates

| Gate | Classical expression | Derived quantum expression |
| NOT | ¬A | A ⊕ 1 |
| AND | A · B | 0 ⊕ (A · B) |
| NAND | ¬(A · B) | 0 ⊕ (A · B) ⊕ 1 |
| OR | A + B | 0 ⊕ 1 ⊕ (¬A · ¬B) |
| NOR | ¬(A + B) | 0 ⊕ 1 ⊕ (¬A · ¬B) ⊕ 1 |
| XOR | A ⊕ B | A ⊕ B |
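A short sanity check (not part of the paper) that the derived quantum expressions of Table 1 reduce to the classical truth tables when XOR is read as addition modulo 2:

```python
# Sketch: verify the Table 1 expressions against the classical gates on all inputs.
from itertools import product

for a, b in product((0, 1), repeat=2):
    and_q  = 0 ^ (a & b)                      # 0 XOR (A AND B)
    nand_q = (0 ^ (a & b)) ^ 1
    or_q   = 0 ^ 1 ^ ((a ^ 1) & (b ^ 1))      # 0 XOR 1 XOR (NOT A AND NOT B)
    nor_q  = (0 ^ 1 ^ ((a ^ 1) & (b ^ 1))) ^ 1
    assert and_q == (a & b) and nand_q == 1 - (a & b)
    assert or_q == (a | b) and nor_q == 1 - (a | b)
print("Derived expressions match the classical gates on all inputs.")
```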
4 Experimental Results, Validation, and Discussions 4.1 Physical Implementation on IBM QE The physical implementation of all the quantum gates proposed above is realized with the help of IBM QE. Figure 5 displays the physical implementation of the basic gates and the histogram of the percentage probability. From the histogram, it can be clearly observed that all the average probabilistic results match with the above theoretical formulations. Here, AND gate maximum probability measured for inputs 00, 01, 10, and 11 are 66.113%, 64.258%, 72.363%, and 71.094%, respectively. The OR gate’s maximum probability for inputs 00, 01, 10, and 11 is 69.434%, 72.07%, 57.129%, and 70.898%, respectively. The maximum probability measured of the NAND gate for inputs 00, 01, 10, and 11 are 67.773%, 68.066%, 72.656%, and 71.582%. Finally, maximum probabilities measured for NOR inputs 00, 01, 10, and 11 are 72.266%, 70.898%, 67.188%, and 66.895%. All these results are showcased in Table 2 for 1024, 2048, and 8096 statistical outcomes.
4.2 Validation of Proposed Quantum Gates The previous sections present the theoretical formulation of the proposed quantum gates and their practical validation on IBM QE. These classical equivalent quantum gates can be considered a library of basic gates, and utilizing them. Any classical combinational logic circuits can be realized using them. To validate our claim here, three specific simple classical circuits, viz. half-adder, half-subtract, full-adder, and a generic Boolean logic design using min-term and max-term expressions are designed with our proposed basic quantum logic gates, and the practical statistical outcome after realization on IBM QE is also presented. The subsequent sections present a circuit diagram using proposed quantum gates and their corresponding theoretical outcome probability with practical implementation results obtained from IBM QE. The cumulative probability of the desired state against the non-desired state is noted in Tables 3, 4, and 5, where the second and third columns are desired outcomes with theoretical probability. Thereafter, three separate groups of result for 1024, 2048, and 8096 samples are delineated. Each group has four columns; out of these columns, the first two are cumulative expected output probability of the state and the second two are of non-desired states. Although theoretically, there is no limitation while designing classical logic circuits but due to the limitation of IBM QE, the restriction is drawn up to five-qubit systems only. Moreover, in some cases, the result showcased are bounded within five-qubit (e.g., quantum full-adder need six-qubit for all the outcomes, but here only sum is taken to demonstrate), and due to this, the test cases changed a single bit in consecutive state transitions. A quantum half-adder circuit is designed with the proposed quantum gates, which circuit consists of a quantum XOR gate followed by a quantum AND gate, and the
Fig. 5 Experimental circuits for all the four gates obtained from IBM QE
Fig. 6 Quantum circuit for half-(adder and subtracter)
probability of correct output qubit state vectors is found to be above 50%. The circuit implementation of the quantum half-adder is delineated in Fig. 6a and the experimental results of the quantum half-adder are at par with the theoretical values, which can be seen in the first part of Table 3. Similarly, the quantum half-subtracter is designed with one quantum XOR, one quantum NOT, and one quantum AND gate. Further, one QXOR gate is introduced between 2nd and 3rd qubit so that the 2nd qubit, input to both QXOR and QAND, can be satisfied. The circuit is run multiple times on IBM QE, and the results are matched with theoretical outcomes for all possible inputs. The circuit implementation of the quantum half-subtract is shown in Fig. 6b, and statistical probabilistic measurement results are physical realization as tabulated in the second part of Table 3.
4.3 Quantum Full-Adder Quantum full-adder designed with proposed quantum gates, and it is presented in Fig. 7, which contains two quantum XOR, two quantum AND, and one quantum OR gate. The first QAND gate is associated with qubit (1, 2, 4); the first QXOR gate is with qubit (1, 2); the second QAND gate is with qubit (2, 3, 5); the second QXOR is with qubit (2, 3), and the QOR is with qubit (4, 5, 6). The result is taken from the qubit-4 and the carry by the qubit-6. The step-by-step circuit operations are shown below. |A, B, C, 0, 0, 0 → |A, B, C, A . B, 0, 0 → |A, B, C, A . B, 0, 1 → |A, A ⊕ B, C, A . B, 0, 1 → |A, A ⊕ B, A ⊕ B ⊕ C, A . B, (A ⊕ B) . C, 1 → A, A ⊕ B, A ⊕ B ⊕ C, A . B, (A ⊕ B) . C, 1
6. → A, A ⊕ B, A ⊕ B ⊕ C, A . B, (A ⊕ B) . C, A . B.(A ⊕ B) . C .
1. 2. 3. 4. 5.
The circuit implementation of the quantum full-adder is shown in Fig. 7, and the experimental results of the quantum full-adder are at par with the theoretical values that are tabulated in Table 4. It is to be noted that only five qubits are available in IBM QE; thus, probabilistic outcomes for the first five qubits are tabulated in Table 4.
NOR
NAND
OR
|000 |010 |100 |111 |110 |101 |011 |001 |001 |011 |101 |110 |111 |100 |010 |000
|000 |010 |100 |110 |000 |010 |100 |110 |000 |010 |100 |110 |000 |010 |100 |110
AND
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Outcome Theoretical State Prob.
Input
Operation
|000 |010 |100 |111 |110 |101 |011 |001 |001 |011 |101 |110 |111 |100 |010 |000 79.59 82.323 85.351 82.715 82.325 83.301 79.786 83.399 80.469 83.69 83.692 85.645 84.669 82.812 82.911 77.734
|001 |011 |101 |110 |111 |100 |100 |000 |000 |010 |100 |111 |110 |101 |011 |001
IMB QE 1024 shots State Freq. State 20.41 17.676 14.649 17.285 17.675 16.699 20.214 16.601 19.531 16.31 16.308 14.355 15.331 17.188 17.089 22.266
Freq.
Table 2 Experimental results obtained from IBM QE for proposed gates
|000 |010 |100 |111 |110 |101 |011 |001 |001 |011 |101 |110 |111 |100 |010 |000 79.174 68.311 70.679 67.846 63.354 67.016 72.511 74.462 75.463 75.269 69.215 61.645 70.752 69.8 66.992 83.3
|001 |011 |101 |110 |111 |100 |010 |000 |000 |010 |100 |111 |110 |101 |011 |001
IMB QE 2048 shots State Freq. State 20.826 31.689 29.321 32.154 36.646 32.984 27.489 25.538 24.537 24.731 30.785 38.355 29.248 30.2 33.008 16.7
Freq. |000 |010 |100 |111 |110 |101 |011 |001 |001 |011 |101 |110 |111 |100 |010 |000
75.195 69.361 68.678 68.945 64.05 68.14 73.584 72.876 73.572 75.646 69.947 61.072 73.975 82.336 76.513 85.144
|001 |011 |101 |110 |111 |100 |010 |000 |000 |010 |100 |111 |110 |101 |011 |001
IMB QE 8096 shots State Freq. State
24.805 30.639 31.322 31.055 35.95 31.85 26.416 27.124 26.428 24.354 30.053 38.928 26.025 17.664 23.487 14.856
Freq.
Table 3 Experimental results obtained from IBM QE for quantum half-adder and half-subtracter. The theoretical output state is expected with probability 1; each IBM QE column lists the measured frequency (%) of the correct state, with the observed erroneous state and its frequency (%) in parentheses.

Operation        Input    Theoretical  IBM QE 1024 shots         IBM QE 2048 shots         IBM QE 8096 shots
Half adder       |000⟩    |000⟩        82.032 (|010⟩: 17.968)    87.768 (|010⟩: 12.232)    79.87 (|010⟩: 20.13)
Half adder       |010⟩    |010⟩        76.074 (|000⟩: 23.926)    74.901 (|000⟩: 25.099)    73.962 (|000⟩: 26.038)
Half adder       |100⟩    |110⟩        85.254 (|100⟩: 14.746)    79.687 (|100⟩: 20.313)    81.934 (|100⟩: 18.066)
Half adder       |110⟩    |101⟩        89.844 (|111⟩: 10.156)    82.227 (|111⟩: 17.773)    76.465 (|111⟩: 23.535)
Half subtracter  |0000⟩   |1000⟩       95.705 (|1100⟩: 4.295)    95.508 (|1101⟩: 4.492)    95.142 (|1101⟩: 4.858)
Half subtracter  |0100⟩   |1111⟩       81.541 (|1011⟩: 18.459)   80.152 (|1010⟩: 19.848)   80.517 (|1010⟩: 19.483)
Half subtracter  |1000⟩   |0100⟩       83.202 (|0000⟩: 16.798)   82.324 (|0001⟩: 17.676)   83.129 (|0001⟩: 16.871)
Half subtracter  |1100⟩   |0010⟩       94.043 (|0110⟩: 5.957)    92.286 (|0111⟩: 7.714)    93.042 (|0111⟩: 6.958)
Table 4 Experimental results obtained from IBM QE for quantum full-adder. The theoretical output state is expected with probability 1; each IBM QE column lists the measured frequency (%) of the correct state, with the observed erroneous state and its frequency (%) in parentheses.

Input     Theoretical  IBM QE 1024 shots           IBM QE 2048 shots           IBM QE 8096 shots
|00000⟩   |00011⟩      77.296 (|00111⟩: 22.704)    63.647 (|00111⟩: 36.353)    77.002 (|00111⟩: 22.998)
|00100⟩   |00111⟩      64.258 (|00011⟩: 35.742)    44.556 (|00011⟩: 55.444)    30.151 (|00011⟩: 69.849)
|01000⟩   |01111⟩      48.437 (|01011⟩: 51.563)    48.413 (|01011⟩: 51.587)    29.98 (|01011⟩: 70.02)
|01100⟩   |01010⟩      69.727 (|01110⟩: 30.273)    61.695 (|01110⟩: 38.305)    66.284 (|01110⟩: 33.716)
|10000⟩   |11111⟩      52.344 (|11011⟩: 47.656)    32.788 (|11011⟩: 67.212)    51.989 (|11011⟩: 48.011)
|10100⟩   |11010⟩      53.809 (|11110⟩: 46.191)    64.746 (|11110⟩: 35.254)    72.912 (|11110⟩: 27.088)
|11000⟩   |10001⟩      65.821 (|10101⟩: 34.179)    59.155 (|10101⟩: 40.845)    58.68 (|10101⟩: 41.32)
|11100⟩   |10101⟩      29.59 (|10001⟩: 70.41)      31.616 (|10001⟩: 68.384)    38.122 (|10001⟩: 61.878)
Fig. 7 Quantum full-adder
4.4 Quantum Circuit Design from Min/Max-Term Equations The proposed quantum gates are employed to realize quantum circuits from Boolean min-term and max-term equations, in order to demonstrate general circuit implementation with the proposed gates. Here, a three-input min-term equation, $Y(A, B, C) = \sum m(0, 2, 3, 6, 7)$, is considered and implemented, after simplification, with the fundamental quantum gates. From the min-term equation we get $Y = \bar{A}\bar{B}\bar{C} + \bar{A}B\bar{C} + \bar{A}BC + AB\bar{C} + ABC = (B + \bar{A}\bar{C})(B + \bar{B}) = B + \bar{A}\bar{C}$. Here, in quantum logic, $A + B$ is obtained from the operation $|0\rangle \oplus 1 \oplus \bar{A}\cdot\bar{B}$, and similarly, $A \cdot B$ is obtained from $|0\rangle \oplus A\cdot B$. Thus, the min-term equation can be represented by the quantum gates as expressed below.

$$\bar{A}\bar{C} + B = |0\rangle \oplus 1 \oplus \overline{\left(|0\rangle \oplus \bar{A}\bar{C}\right)}\cdot\bar{B}$$

The quantum circuit depicted in Fig. 8a comprises two quantum NOT, one quantum AND, and one quantum OR gate, and it is run multiple times on IBM QE to validate the desired outcomes for all possible inputs. The implementation results of the designed quantum circuit for the min-term Boolean expression satisfy the theoretical values, as can be seen in Table 5. As an alternative approach to the same circuit, we employed the quantum gates to develop the quantum circuit depicted in Fig. 8b from the Boolean max-term equation, $Y(A, B, C) = \prod M(1, 4, 5)$. From the max-term equation, the simplified expression can be written as $Y = (A + B + \bar{C})(\bar{A} + B + C)(\bar{A} + B + \bar{C}) = (\bar{A} + B)(B + \bar{C})$. Applying a strategy similar to the previous approach, the quantum gates can represent the max-term equation as expressed below.
Fig. 8 Quantum circuit for min-term and max-term approach
Table 5 Experimental results obtained from IBM QE for min/max equations. The theoretical output state is expected with probability 1; each IBM QE column lists the measured frequency (%) of the correct state, with the observed erroneous state and its frequency (%) in parentheses.

Input     Theoretical  IBM QE 1024 shots           IBM QE 2048 shots           IBM QE 8096 shots
|00000⟩   |11101⟩      49.316 (|11100⟩: 50.684)    59.815 (|11100⟩: 40.185)    50.659 (|11100⟩: 49.341)
|00100⟩   |11010⟩      41.406 (|11011⟩: 58.594)    57.954 (|11011⟩: 42.041)    60.034 (|11011⟩: 39.966)
|01000⟩   |10101⟩      61.328 (|10100⟩: 38.672)    41.944 (|10100⟩: 58.056)    62.439 (|10100⟩: 37.561)
|01100⟩   |10011⟩      54.004 (|10010⟩: 45.996)    57.715 (|10010⟩: 42.285)    55.517 (|10010⟩: 44.483)
|10000⟩   |01110⟩      50.593 (|01111⟩: 49.407)    52.392 (|01111⟩: 47.608)    57.519 (|01111⟩: 42.481)
|10100⟩   |01010⟩      49.707 (|01011⟩: 50.293)    59.571 (|01011⟩: 40.429)    48.181 (|01011⟩: 51.819)
|11000⟩   |00111⟩      65.136 (|00110⟩: 34.864)    62.305 (|00110⟩: 37.695)    49.085 (|00110⟩: 50.915)
|11100⟩   |00011⟩      58.887 (|00010⟩: 41.113)    64.429 (|00010⟩: 35.571)    67.005 (|00010⟩: 32.995)
$$(\bar{A} + B)(B + \bar{C}) = |0\rangle \oplus \left(|0\rangle \oplus 1 \oplus A\cdot\bar{B}\right)\cdot\left(|0\rangle \oplus 1 \oplus \bar{B}\cdot C\right)$$

The circuit depicted in Fig. 8b comprises two quantum NOT, one quantum AND, and two quantum OR gates. The experimental results are the same as in the min-term case, since the underlying Boolean expression is identical, and are shown in Table 5. It is to be noted that although some of the desired outcomes are slightly below 50% in probabilistic measure, the cumulative probability of the desired state together with its nearest qubit configuration is more than 95%. Therefore, quantum error-correcting logic can be applied to enhance the probability of the exact desired qubit configurations. From all of the above implementations, we can be confident that the proposed normal and modified classical-equivalent quantum gates are capable of composing full-fledged complex classical logic circuits with favorable probabilistic measures. Therefore, the designed quantum gates can serve as a building-block library for classical-to-quantum logic circuit conversion.
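As a quick sanity check on the algebra above, the following short Python sketch exhaustively verifies that the simplified sum-of-products form B + A'C' and the product-of-sums form (A' + B)(B + C') both realize the min-term specification m(0, 2, 3, 6, 7); it is a purely classical check, not part of the quantum implementation.

```python
# Exhaustive classical check that the two simplified forms derived above agree with
# the min-term specification Y(A, B, C) = sum of minterms m(0, 2, 3, 6, 7).
minterms = {0, 2, 3, 6, 7}
for A in (0, 1):
    for B in (0, 1):
        for C in (0, 1):
            y_truth = int(4 * A + 2 * B + C in minterms)
            y_sop = B | ((1 - A) & (1 - C))              # B + A'.C'    (sum of products)
            y_pos = ((1 - A) | B) & (B | (1 - C))        # (A'+B).(B+C') (product of sums)
            assert y_truth == y_sop == y_pos
print("B + A'C' and (A'+B)(B+C') both realize m(0, 2, 3, 6, 7)")
```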
5 Conclusion The work presented in this manuscript is to design basic classical gates with quantum circuits, which will help compose a wide range of classical computational logic circuits on quantum computers while retaining all the quantum properties. The primary contribution is to develop a library of classical logic equivalent quantum gates, which can replicate classical logic over quantum computers to make the integration of classical and quantum computing seamless. All the proposed gates are implemented on IBM QE for 1024, 2048, and 8096 samples and validated with the theoretical outcomes. All the proposed gates exhibit favorable probabilistic measurement, i.e., more than 72%, after realization on IBM QE online quantum computer. Quantum physical circuit development is an ongoing research area, and the technology of generating qubits is not standardized; the design of time-dependent components is in the development phase, and the upcoming target is to design time-dependent sequential quantum logic.
Adaptive Modulation Classification with Deep Learning for Various Number of Users and Performance Validation P. G. Varna Kumar Reddy
and M. Meena
Abstract Automatic modulation classification (AMC) is a key component of modern wireless frameworks. It is commonly used in various military and commercial applications such as electronic surveillance and cognitive radio. Various modulation types have to be considered when classifying complex received data. This paper presents an overview of the various feature-based (FB) AMC techniques, which are mainly concerned with the balance between generalization capability and the computational constraints of the algorithm. A robust strategy is then presented for handling this task using a convolutional neural network (CNN). This paper presents a method that can characterize the signals received by a system without the need for handcrafted feature extraction; it can also take advantage of the features of the received signals. In addition, the confusion matrix is drawn and analyzed. The logistic regression algorithm performance is measured in terms of accuracy, sensitivity, specificity and F1_score for various users who are maintaining various distances from the base station. Keywords Adaptive modulation · Convolutional neural network · Machine learning · Logistic regression algorithm
1 Introduction The advancement of societies relies on the availability of reliable and secure wireless communication networks. These networks are the cornerstones of modern society. The continuously increasing interest in establishing better connectivity will require the development of more advanced wireless infrastructure [1]. In this article, we concentrate on exploring multi-Tb/s, intelligent 6G wireless networks for 2030 and beyond. We introduce a 6G vision and examine the use cases and key capabilities of 6G. Current wireless networks rely heavily on computational models to design their systems. These models do not provide the necessary information to make informed decisions regarding the design of the networks. In addition, they do P. G. Varna Kumar Reddy (B) · M. Meena Department of Electronics and Communication Engineering, Vels Institute of Science, Technology and Advanced Studies (VISTAS), Chennai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_21
not have accurate numerical equations for certain components of the systems. This means that the design of these networks becomes subject to testing [2], where the authors discussed advanced wireless networks with software-defined radio, but link adaptation is not applied. Due to the complexity of wireless systems, they require a lot of computational time and energy to maintain their operation. This is why machine learning is expected to play a vital role in the development of 6G networks. This technology can be used to perform various tasks such as analyzing and mapping models that are not commonly used in the computational systems. In addition to improving the efficiency of the wireless infrastructure, it is also expected that machine learning instruments can be used to perform various tasks at the local level. In the future, machine learning will allow the continuous control and examinations of 6G networks. This will make it easier for the users to manage their complex wireless networks [3]. However, it will also become more difficult to deal with the various complications that come with using a wireless network in a static and rigid manner. In a wireless communication network, the number of users that are connected to a receiver and transmitter pair can be identified by the type and frequency of the propagation in the medium used. In Fig. 1, the active users are connected to a receiver in one of the three different propagations. The first type of propagation is known as the direct path, which is used when the distance between a receiver and the transmitter is low [4]. The sky wave propagation is then used when the distance between the receiver and the transmitter is high. The third type of propagation is known as the space propagation, which is used when the receiver and the transmitter are located in geographical locations [5]. The receiver and the transmitter must then choose a modulation that is appropriate for the signal power requirements of the network. This is done through the automatic classification of multiple modulations.
1.1 Automatic Modulation Classification (AMC) An automatic modulation classification technique is utilized to identify the kind of modulation used in non-cooperative wireless frameworks. The conventional approach is based on the assumption that the signal-to-noise ratio (SNR) of the received signal is the same during testing and training. However, when testing and training are carried out under different conditions, the outcome of the assessment might be unpredictable. In most cases, the existing schemes for signal classification are not feasible due to their limited generalization capability. This paper presents a CNN-SVM model that can help the secondary user identify the primary user. For instance, at a reduced signal-to-noise ratio, the model can still help recognize the presence of the primary user. The literature [6] shows that the AMC scheme proposed there achieves higher classification accuracy in both noisy and time-varying channels when compared to conventional learning-based schemes. This group of researchers has presented a variety of CNN-based signal representations that incorporate various components. These include the
Fig. 1 Wireless communication network
examination of in-phase/quadrature and cyclic signal waveforms [7, 8], eye diagrams [9], the Choi–Williams distribution [10], and constellation-diagram-related images [11]. In addition, they have also utilized a group of constellations, or one with a suitable channel, to produce an improved constellation [12]. The receiver section of the proposed multichannel system receives multiple signals and then sends them to the processing stages as shown in Fig. 2 [13]. The proposed model shows that the signals are decoded using the same algorithm that is used for the classification of AM/FM/PM and other similar signals. However, instead of relying on a single receiver, an automatic modulation classifier is used to identify the type of modulation that is required to generate the demodulated signal.
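To make the CNN route concrete, the following Keras sketch classifies raw I/Q frames into modulation classes. The 128-sample frame length, the layer sizes, and the eight-class output are illustrative assumptions, not the exact architecture evaluated in this paper.

```python
# Illustrative Keras sketch of a CNN modulation classifier operating on raw I/Q frames.
# Layer sizes, frame length and class count are assumptions made for illustration only.
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 8                                  # e.g. BPSK, QPSK, 8-PSK, 16/64-QAM, PAM4, ...

model = tf.keras.Sequential([
    layers.Input(shape=(128, 2)),                # 128 complex samples as (I, Q) pairs
    layers.Conv1D(64, 8, activation="relu"),
    layers.MaxPooling1D(2),
    layers.Conv1D(32, 8, activation="relu"),
    layers.MaxPooling1D(2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_iq, train_labels, validation_split=0.1, epochs=20)  # with a labelled dataset
```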
1.2 Machine Learning A machine learning (ML) model is a processing framework that is used to learn the characteristics of a system that cannot be captured by a numerical model. Such a model can then perform various tasks such as interaction with an intelligent agent and classification [14]. Once a model learns the multiple characteristics of a system, it can then effectively perform the assigned task by utilizing some of its
Fig. 2 Automatic modulation classifier at the receiver end
mathematical computation. ML algorithms should be executed at various levels of the system [15], such as the management layer, the core, and the radio base stations. This can be done with the help of device programmability or configuration. The increasing number of standards related to the development of ML models might create the need for a more data-driven system architecture. This type of architecture can be applied in areas where management has to rely on various sources of information. For instance, link adaptation and physical-layer algorithms can be performed in a more predictable and controlled manner with the help of ML agents. Currently, most of the time, the ML algorithms used in such systems are deployed statically [16]. However, with the ability to change their execution and usage progressively, they can be more useful in improving the efficiency of the system. Another advantage of this type of architecture is that it allows the design of the system to be automated. Most of the time, the techniques used in this process are based on the classification and feature extraction stages shown in Fig. 3. In deep learning techniques, the classification and feature extraction stages are usually combined to speed up the process of identification. Although deep learning is considered a subset of machine learning, it has proven to be very effective in complex applications. In certain frameworks, transmitters can freely pick the modulation of the signals that they want to transmit. A receiver must then recognize the chosen modulation so that the signals can be demodulated and the transmission is effective. An adequate way to address this issue is by implementing automatic modulation classification (AMC). This method has been widely used in the past two
Fig. 3 Working of machine learning and deep learning
decades to improve the spectrum efficiency of transmitters. Usually, the conventional algorithms for automatic modulation classification fall into two categories: likelihood based [17] and feature based [18]. Although likelihood-based techniques can theoretically achieve the optimal solution, they tend to suffer from high computational complexity. The various techniques used for this purpose depend on the probability of correctly receiving the signal; for instance, classification and feature extraction methods are commonly used. One of the most important elements in the development of feature-based (FB) systems is the choice of classifier, which allows us to reduce the number of tests and improve the generalization capability. In machine learning, the data is fed into a tool that is designed to perform various tasks, such as filtering and cleaning the data (shown in Fig. 4). The goal of this process is to maintain the consistency of the data collected by the system. In order to achieve this, different algorithms are used to apply varying conditions to the data. The results are then represented in visual graphs. The performance of FB techniques fundamentally relies upon the extracted feature set. Features must be designed manually to accommodate the corresponding group of modulations, and the channel environment might not be reliable in all conditions.
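A minimal illustration of the feature-based route is sketched below: a handful of simple amplitude, phase and moment statistics of a received complex baseband frame feed a conventional classifier. The chosen statistics are illustrative stand-ins for the higher-order cumulant features usually used, and the frames/labels variables are assumed to be provided by a labelled dataset.

```python
# Hedged sketch of the feature-based (FB) route: simple statistics of a received frame
# feed a conventional classifier; these features are stand-ins, not the paper's set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def extract_features(x):
    """x: 1-D complex numpy array holding one received frame."""
    amp = np.abs(x)
    phase = np.unwrap(np.angle(x))
    power = np.mean(amp ** 2) + 1e-12
    return np.array([
        np.std(amp) / (np.mean(amp) + 1e-12),        # amplitude spread
        np.std(np.diff(phase)),                      # instantaneous-frequency spread
        np.mean(amp ** 4) / power ** 2,              # kurtosis-like fourth-order moment
        np.abs(np.mean(x ** 2)) / power,             # non-conjugate second-order moment
    ])

# X = np.stack([extract_features(f) for f in frames])      # frames: list of complex arrays
# clf = RandomForestClassifier(n_estimators=100).fit(X, labels)
```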
Fig. 4 Steps involved in machine learning process
The goal of feature selection techniques in machine learning is to find the best set of features that allows one to build useful models of the studied phenomena. Considering these factors, deep learning (DL) techniques, which can automatically extract features, have been embraced. DL is a part of AI that has made remarkable progress due to its classification capability. It is being widely used in various fields, such as image classification [19] and natural language processing. The classification accuracy of DL techniques is much better than that of other classifiers. Most of the time, the various techniques used in an AMC strategy [20] are carried out in two phases: preprocessing and classification. In the former, the models are used to deal with preprocessing tasks, while in the latter, they are used for feature extraction and transformation [21]; in the latter, the strategy depends on a deep belief network (DBN). Dai and colleagues proposed an interclass classification framework that combines the various techniques used in the preprocessing and classification phases [22]. The main idea behind this strategy is that the classification capability of the models is equivalent to the sum
of their testing and training results. However, due to the nature of the approach, the outcome of the assessment is often inaccurate.
2 System Model The comparison between the conventional model and the presented automatic modulation classification (AMC) method is shown in Fig. 5. The interaction between signal detection and demodulation at the receiver is carried out through the multiple steps of the system. Some of the commonly used methods for this kind of classification are ANN designs [19]. Unlike the traditional methods, which rely on a threshold selected by the user, the threshold in an ANN is learned automatically. This allows the system to perform well on various modulation types, such as AM, FM, FSK, and ASK [23]. The emergence of deep neural networks has greatly contributed to research in video, speech, and image processing [24]. The ability of deep learning algorithms is mainly exploited in applications where existing models suffer from errors; these applications can then benefit from the vast amount of information that is stored in such models [25]. Deep learning has been presented in the literature as a promising technique for identifying multiple modulation types with a convolutional neural network. A convolutional long short-term deep neural network has also been presented in this area, leveraging the strengths of both long short-term memory (LSTM) and CNN; the main advantage of this approach is that it combines the features of both the CNN and LSTM models, leveraging the complementarity of these two networks. In our contribution, a CNN is designed with a logistic regression technique to estimate the modulation scheme of any user. The proposed system predicts the type of modulation scheme based on the training dataset given by us. The performance of the proposed system is discussed in Sect. 3 in terms of various parameters. Estimation
Fig. 5 Proposed system model on AMC and CNN
of a modulation scheme based on other characteristics of a signal (distance from the base station) is a novelty of our work.
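The distance-driven prediction step can be sketched as follows with scikit-learn's logistic regression; the distance thresholds, class indices and synthetic samples below are placeholders used only to illustrate the idea, not data from the proposed system.

```python
# Sketch of the distance-driven step: a logistic regression maps a user's distance from
# the base station to a modulation class. Thresholds and labels are toy placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
distance_m = rng.uniform(10, 1000, size=(2000, 1))
# toy labelling rule: nearer users can sustain denser constellations
labels = np.digitize(distance_m.ravel(), bins=[250, 500, 750])   # 0..3, e.g. 64-QAM ... BPSK

clf = LogisticRegression(max_iter=1000)
clf.fit(distance_m, labels)
print(clf.predict([[120.0], [900.0]]))   # near user -> dense scheme, far user -> robust scheme
```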
3 Results and Discussion The CNN trained to identify the eight modulation types, commonly used in television programs, was able to predict the different frame types based on the data collected from over a thousand samples [26]. It was able to generate several PAM4 frames using the various factors that affect the modulation quality. The same method was used to generate 10,000 frames, which were then used for training, validation, and testing: about 80% of the frames were used for training, while 10% were used for validation and 10% for testing. During the training phase, the network used the training and validation frames to improve the classification accuracy. Training achieved about 95% accuracy, and the validation frames were used to monitor and further improve the classification (the confusion matrix is shown in Fig. 6). However, the network was not able to identify the appropriate modulation types for the 16- and 64-QAM frame types. The issue was caused by the fact that each frame contains only 128 symbols. The network also confused the QPSK and 8-PSK frames since the constellations of these two types appear similar once the fading channel and frequency offset are combined. We presume that all users are at varying distances from the base station with different modulation schemes; Fig. 7 shows the confusion matrices generated for 1000, 500, 100, and 10 users, respectively, based on the logistic regression algorithm. The performance of the algorithm is measured with different parameters such as accuracy, precision, sensitivity, specificity, and F1_score, and their respective values for different users are shown in Table 1. It is observed that this algorithm worked well
Fig. 6 Confusion matrix for various modulation schemes
Fig. 7 Confusion matrix for the number of users a 1000, b 500, c 100, d 10
Table 1 Analysis of parameters with the various number of users

Parameter     1000 users   500 users   100 users   10 users
Accuracy      0.831        0.838       0.79        0.7
Precision     0.908        0.911       0.897       0.75
Sensitivity   0.903        0.909       0.868       0.857
Specificity   0.127        0.130       0.0         0.333
F1_score      0.906        0.910       0.882       0.799
for a larger number of users compared to a smaller number of users, which is a highly desirable property for our future research work.
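For reference, the sketch below shows one way the parameters reported in Table 1 can be derived from a binary confusion matrix (modulation assigned correctly versus incorrectly) on the held-out test split; the label vectors are placeholders rather than the paper's data.

```python
# Deriving the five Table 1 metrics from a binary confusion matrix; placeholder labels.
from sklearn.metrics import confusion_matrix

y_true = [1] * 900 + [0] * 100                         # placeholder ground truth
y_pred = [1] * 830 + [0] * 70 + [1] * 87 + [0] * 13    # placeholder predictions
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
sensitivity = tp / (tp + fn)                           # recall / true-positive rate
specificity = tn / (tn + fp)                           # true-negative rate
f1_score    = 2 * precision * sensitivity / (precision + sensitivity)
print(accuracy, precision, sensitivity, specificity, f1_score)
```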
4 Conclusions This paper presented a detailed analysis on automatic modulation classification strategy with relevant figures. Further, the motivation behind the machine learning
and the necessity of ML algorithms for AMC systems is presented. The simulation results confirm the performance improvement of the proposed AMC scheme and also provide a way of optimizing the parameters of the AMC neural networks. In future, we will investigate and analyze the automatic modulation classification (AMC) technique by using various machine learning algorithms while maintaining high spectrum efficiency and energy efficiency. AMC will be applied to signals based on their distance from the base station, and this work will be extended with the help of non-orthogonal multiple access (NOMA), which is an emerging multiple access technology for future wireless systems. A comparative analysis of different algorithms will be presented. For practical proof, all the simulation results will be tested with the help of an experimental setup based on software-defined radio (SDR).
References 1. Bajracharya R et al (2022) 6G NR-U based wireless infrastructure UAV: standardization, opportunities, challenges and future scopes. IEEE Access 10:30536–30555 2. Siva Kumar Reddy B (2021) Experimental validation of non-orthogonal multiple access (NOMA) technique using software defined radio. Wirel Pers Commun 116(4):3599–3612 3. Hu S et al (2021) Distributed machine learning for wireless communication networks: techniques, architectures, and applications. IEEE Commun Surv Tutorials 23(3):1458–1493 4. Siva Kumar Reddy B (2019) Experimental validation of spectrum sensing techniques using software-defined radio. In: Nanoelectronics, circuits and communication systems. Springer, Singapore, pp 97–103 5. Siva Kumar Reddy B, Lakshmi B (2016) Adaptive modulation and coding with channel estimation/equalization for WiMAX over multipath faded channels. In: Wireless communications, networking and applications. Springer, New Delhi, pp 459–472 6. Zhou Y, Lin T, Zhu Y (2020) Automatic modulation classification in time-varying channels based on deep learning. IEEE Access 197508–197522 7. O’Shea TJ, Corgan J, Clancy TC (2016) Convolutional radio modulation recognition networks. In: Proceedings of international conference on engineering applications of neural network Aberdeen, U.K., pp 213–226 8. Wang Y et al (2020) Deep learning-based cooperative automatic modulation classification method for MIMO systems. IEEE Trans Veh Technol 69(4):4575–4579 9. Wang D et al (2017) Modulation format recognition and OSNR estimation using CNN-based deep learning. IEEE Photon Technol Lett 29(19):1667–1670 10. Zhang M, Diao M, Guo L (2017) Convolutional neural networks for automatic cognitive radio waveform recognition. IEEE Access 5:11074–11082 11. Li R, Li L, Yang S, Li S (2018) Robust automated VHF modulation recognition based on deep convolutional neural networks, IEEE Commun Lett 22(5):946–949 12. Gao Q, Lim S, Jia X (2018) Hyperspectral image classification using convolutional neural networks and multiple feature learning. Remote Sens 10(2):299–308 13. Kumar Y (2020) Automatic modulation classification based on constellation density using deep learning. IEEE Commun Lett 24(6):1275–1278 14. Luo F-L (ed) (2020) Machine learning for future wireless communications 15. Zappone et al (2018) Modelaided wireless artificial intelligence: Embedding expert knowledge in deep neural networks towards wireless systems optimization. ArXiv e-prints 16. Simeone O (2018) A very brief introduction to machine learning with applications to communication systems. IEEE Trans Cogn Commun Netw 4(4):648–664
17. Xu JL (2011) IEEE Trans Syst Man Cybernet Part C Appl Rev 41(4):455–469 18. Siva Kumar Reddy B (2018) Advancement in wireless technologies and networks. In: Emerging wireless communication and network technologies. Springer, Singapore, pp 3–11 19. Siva Kumar Reddy B, Modi D, Upadhyay S (2019) Performance evaluation of various digital modulation techniques using GNU radio. In: Innovations in infrastructure. Springer, Singapore, pp 13–20 20. Wu Y et al (2016) Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 21. Fu J, Zhao C, Li B, Peng X (2015) Deep learning based digital signal modulation recognition. In: The proceedings of the third international conference on communications, signal processing, and systems. Springer, pp 955–964 22. Dai A, Zhang H, Sun H (2016) Automatic modulation classification using stacked sparse autoencoders. In: 2016 IEEE 13th international conference on in signal processing (ICSP), pp 248–252 23. Zhou S et al (2019) A robust modulation classification method using convolutional neural networks. EURASIP J Adv Sign Proces 1–15 24. Huang G et al (2017) Densely connected convolutional networks. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR) 25. Borovkova S, Tsiamas I (2019) An ensemble of LSTM neural networks for high-frequency stock market classification. J Forecast 38(6):600–619 26. O’Shea TJ, Roy T, Clancy TC (2018) Over-the-Air deep learning based radio signal classification. IEEE J Sel Top Sign Proces 12(1):168–179
Video Analysis to Recognize Unusual Crowd Behavior for Surveillance Systems: A Review P. Shreedevi and H. S. Mohana
Abstract In recent years, with increasing population growth and global urbanization, the crowd phenomenon and surveillance has become a topic of active research. A crowd can give rise to several unprecedented unusual situations which may be due to mass panic, intentional pushing, stampede or crowd crushes. In India, according to the National Crime Records Bureau (NCRB) report, there have been 3550 incidents of stampede from 2001 to 2015 resulting in the death of over 2901 people. Thus, surveillance and monitoring of activities of potentially high crowded spaces for abnormality detection are of primary concern which demands high quality of video analysis. Conventional surveillance systems are being replaced by intelligent systems with the advancement in technology. The behavior analysis employing visual surveillance demands progressive and state-of-the-art researches in computer vision and artificial intelligence. This paper gives background knowledge and aggregates different techniques and current developmental researches in crowd analysis encompassing motion detection, behavior recognition, and anomaly detection and assesses their challenges. Keywords Anomaly detection · Behavior recognition · Crowd analysis · Motion estimation · Crowd tracking
1 Introduction The on-going urbanization and increasing world demography have resulted in crowd or group of people serving as a medium of cultural exchange and economic development. Huge crowds can be seen in almost all places like subway stations, airports,
P. Shreedevi (B) Malnad College of Engineering, Hassan, Karnataka, India e-mail: [email protected] H. S. Mohana Navkis College of Engineering, Hassan, Karnataka, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_22
malls, railway stations, and stadiums. The crowd is a collection of people or individuals assembled in a specific place. The formation of the crowd varies in different situations. The nature of the crowd in a music concert varies from the crowd engaged in a protest. This condition in which the term ‘crowd’ is being used signifies a group of individuals in terms of size, duration, structure, intention, proximity, and cohesion of individuals. Firstly, the security of crowd event in large gatherings is of the highest significance. Abnormal behavior or an undesired occasion in dense crowds would lead to a cascade of unpleasant consequences due to the synergic effect of human interactions. This growing crowd phenomenon has put forth enormous challenges to security, public safety, and public management. Moreover, greater the magnitude of the crowd, the more strenuous and harder is the visual surveillance for the human eye. Accordingly, authorities prescribe automatic intelligent tools and solutions for crowd behavior analysis and for aiding the identification and localization of abnormal events within crowds. Thus, the increased demand for public safety at crowded places has led to active research in automated video analysis of crowded scenes in visual surveillance and computer vision. Visual surveillance systems are essential to collect information and to keep track of people, events, and activities. It plays a crucial part in the present world in guaranteeing security to the areas where huge numbers of people are likely to congregate. As a result, one can usually notice surveillance cameras being installed in streets, stadiums, temples, shopping complexes, and places of mass gathering to ensure security and safety. Computer graphics and computer vision techniques employ computational methods for synthesizing the crowd phenomenon aiding with statistical and mathematical models by extracting features and detecting crowd events [1]. There are several techniques and algorithms formulated for the analysis of crowd involving crowd tracking, crowd density estimation, crowd behavior recognition, and anomaly detection. However, conventional video surveillance techniques do not hold good for crowded scenes involving greater density variations, severe occlusions, and complex crowd dynamics in the scene [1, 2]. Thus, the evolving intelligent surveillance systems equipped with fast learning algorithms have replaced the traditional video surveillance over the past decades. Detecting anomalous human behavior involves modeling and classification of human activities on definite criteria. These become difficult due to the randomness and unprecedented human behavior. The observed human movements are partitioned into discrete states and are appropriately classified whether the behavior is suspicious or not. Most of the researchers divide the problem into two or three sub-problems. At the first stage, image processing techniques are incorporated for analyzing data which is analogous to a lower-level image processing for extracting primitive events followed by the second stage in which structural analysis is conducted for the data collected in the former step, i.e., it is an advanced level artificial intelligence module that involves detecting more intricate and abstract behavior patterns. Figure 1 depicts a general framework to perform crowd analysis in a given video involving crowd localization, motion pattern analysis, crowd tracking, and their behavior evaluation.
Fig. 1 General framework of a standard crowd analysis system [4]
The behavior analysis of crowd scenes finds several applications such as: • Crowd management: The formation of huge crowds is anticipated whenever there are public gatherings, sports events, concerts, or festivals. The crowd analysis is used to build up effective strategies for carrier crowd flow and movement and safety by avoiding any crowd disasters. • Visual surveillance: It allows tracking and monitoring the actions of individuals in a crowd. Thus, crowd analysis is used to detect any abnormal events like stone pelting or identifying suspects among the crowd. Today, almost all places are equipped with CCTVs to ensure safety. However, traditional surveillance systems may fail for higher density objects in terms of computation and accuracy. • Public space design: The layouts of public space like airport terminals, train stations, theaters, mass events are designed on the guidelines of crowd analysis. This provides an easy flow of people and assures proper safety. • Crowd mathematical models are employed in networked applications that allow the interaction of users and the computing environment thus improving human life experience. • Intelligent Environment: The analysis of crowd is essential for supporting crowd or an individual in the intelligent environment as it involves the interpretation of data gained by studying the natural movements of objects or group of people. The behavior analysis of crowd [3] is a novel area of research in computer vision with a variety of applications such as automatic detection of natural disasters, violent events, panic and escapes behavior, riots, or chaotic scenarios in crowds. Crowd behavior analysis provides a better understanding of crowd dynamics and related people’s behavior through developing resilient surveillance or crowd control systems,
designing and organizing public spaces and improving computer animation models mainly used in special effects or video games. The flow and organization of this review paper is as follows. Section 2 of the paper gives some insight and background knowledge about the crowd, its structure and formation. Section 3 gives detailed information of different stages employed in crowd scene analysis like crowd motion segmentation, crowd density estimation, crowd tracking, crowd behavior recognition, and anomaly detection. This section also describes various approaches and techniques involved in each of those stages. Section 4 describes recent technological developments in visual surveillance, and Sect. 5 provides an analysis of the existing research gap in crowd anomaly detection. Section 6 gives a detailed description of frequently exploited datasets for crowd analysis. The review paper is concluded in Sect. 7.
2 Interpreting and Understanding Crowd The crowd is hierarchically categorized into individuals, groups, and crowds. An individual or a person is the primary unit of a crowd. The collection of individuals result in a group, and a collection of such groups will form a crowd with a set of motivations and intentions [1]. This categorization allows for a flexible analysis of a broad range of crowd densities and complex behaviors. Crowded scenes are divided into two categories [4] based on the order of their motions, structured and unstructured crowds. The crowd advances coherently in the same direction with no frequent change in direction in a structured crowded scene. The spatial location consists of one major crowd behavior at the time. An unstructured crowded scene has a disordered and random crowd motion in which the spatial location contains collective crowd behaviors as the movement of participants is in random, distinct directions at different times. They have different dynamic and visual characteristics. The analysis of crowd can also be performed at macroscopic and microscopic levels. Since the crowd is a collection of individuals, microscopic analysis is adopted whenever the behavior of an individual is of prime concern and macroscopic analysis is adopted to analyze the global motions of the crowd or group of people. Crowd behavior modeling can be attained by two major approaches: • Continuum-based approach: preferred for macroscopic-level analysis where medium and high-density crowds are considered. In this technique, the whole crowd is regarded as a fluid with particles which involves methods from statistical mechanism and thermodynamics. • Agent-based approach: Preferred when individuals present in the crowd are to be examined, i.e., at microscopic level, these individuals are considered as autonomous agents.
In particular, recognizing specific crowd behaviors like unanticipated convergence or divergence of crowd flow are essentially important for the prevention of hazardous accidents in real-time and forensic purposes. Thus, examining these behaviors of the crowd is of vital importance as any unexpected critical situation can be brought under control before it turns worse. To analyze the crowd, crowd dynamics and visual information are required. The crowded scene analysis involves crowd motion detection, crowd density estimation, crowd tracking, and crowd behavior determination.
3 Crowd Analysis People counting, people tracking, and crowd behavior analysis are the types of crowd analysis techniques based on computer vision.
3.1 Motion Segmentation and Object Detection The primary step in activity recognition is motion detection. It aims to segment regions that correspond to moving objects in a scene. There are many approaches to motion detection, which involve differences between frames, background subtraction, and motion estimation using optical flow methods. Object detection faces several difficulties such as issues with the environment and illumination, movement of small objects, and motion blur in a video. Individual tracking in a crowd is not the same as crowd movement tracking. Crowd tracking is arduous due to hard occlusion, multiple motion patterns, and partial pose changes of people in the crowd. Figure 2 illustrates a general architecture of a video segmentation method where the entire video is divided into frames and segmented to identify the region of interest. The temporal differencing method extracts the moving regions by considering the difference in pixels between two or three consecutive frames. It is extremely adaptive to dynamic environments. However, this results in holes inside moving entities. Background subtraction performs segmentation in scenes with a relatively static background, that is, moving objects are detected by taking differences between the current image and the reference background image in a pixel-by-pixel fashion. However, it is exceptionally sensitive to lighting changes in the environment. Optical flow methods are computationally complex, require specialized hardware for real-time applications, and are sensitive to noise. Yang et al. [5] use SimpleFlow, one of the optical flow techniques, to realize crowd segmentation accurately. This method combines both optical flow and streak flow for crowd segmentation. It is observed that with the SimpleFlow method a better accuracy can be achieved than with the conventional LK pyramid flow method. Bhatti et al. [6] propose a framework based on a fully convolutional network. This uses two models, one pre-trained using the ImageNet image database and the other trained by extracting features. On applying binary classification (logistic regression) on all
Fig. 2 General architecture of video segmentation method
the images in ImageNet, foreground and background are separated. The isolated foregrounds are used to sort out the contours for the object which is to be segmented, which in turn are used to develop a mask of that object. The mask is created for one frame in a video sequence, and the same mask is used to process all the frames to segment the object. Features extracted from this mask are fed as training data to the second model. Once the second model is trained, it is subjected to the testing phase, by inputting an online video stream. Accuracy and recall performance metrics are used to evaluate the model, and the proposed model shows ~ 77% accuracy, ~ 90% recall which is higher than the usual unsupervised and semi-supervised techniques for object segmentation.
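The two classical motion-segmentation routes discussed above can be sketched in a few lines of OpenCV; the video path is a placeholder and the parameters are generic defaults, so this is an illustrative baseline rather than the pipelines of [5] or [6].

```python
# Minimal OpenCV sketch of background subtraction (MOG2) and dense optical flow
# (Farneback) for motion segmentation; the input path is a placeholder.
import cv2

cap = cv2.VideoCapture("crowd_clip.mp4")                  # placeholder input video
backsub = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)

ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = backsub.apply(frame)                        # foreground = moving crowd regions
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    prev_gray = gray                                      # fg_mask/flow feed later analysis stages
cap.release()
```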
3.2 People Counting and Density Estimation Crowd density estimation measures the comfort level of a crowd or group of people in public spaces and is used to identify potentially dangerous situations like occurrence of a stampede or an agitated violent mob. There are three categories of models for
crowd density estimation: pixel-based analysis, texture-based analysis, and objectbased analysis. • Pixel-based analysis: This relies mainly on local features for estimating the number of people. • Texture-based analysis: Algorithms that work on texture analysis use high-level features while analyzing image patches. • Object-based analysis: The methods which use object-level analysis produce better results in contrast to pixel-level analysis and texture analysis. This method can also be used to identify individuals in lower density crowds. However, clutters and severe occlusions in denser crowds make the individual counting problem highly strenuous. The authors [7] have proposed a crowd counting technique called multi-scale head detection which evaluates crowd density more accurately and efficiently. The foreground of the image is extracted by using gradient difference, and the input images are split by applying patches in different scales. A density map is used to evaluate the crowd count and integrate different scale density maps with the weight of the perspective map. Hong Mo et al. [8] perform head size estimation through competitive models by considering the whole crowd distribution either sparse or dense. This is accomplished by finding out the radius of inserted points and iteratively updating the head size. The crowd is then divided into different density parts by utilizing the estimated head size to generate high fidelity head masks. The results have shown that this technique can quantitatively achieve crowd counting. Crowd density estimation involves direct approach or detection-based approach and indirect approach or feature-based approach. In the direct approach, the individuals in the crowd scene are directly segmented followed by detecting the number of people. The indirect approach involves techniques in which features are learned by using learning algorithms or location-based smartphones for estimating the number of people. More recently due to the upsurge of deep learning, several convolution neural network-based algorithms are proposed for crowd counting [7–9].
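A common building block behind CNN-based counting is the Gaussian density map, sketched below: each annotated head position is spread with a Gaussian kernel so that the integral of the map equals the person count. The head coordinates, image size and kernel width are placeholder assumptions.

```python
# Sketch of the density-map idea used by CNN crowd counters; coordinates are placeholders.
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(head_points, height, width, sigma=4.0):
    dm = np.zeros((height, width), dtype=np.float32)
    for x, y in head_points:                   # (column, row) head annotations
        dm[int(y), int(x)] += 1.0
    return gaussian_filter(dm, sigma)          # smoothing preserves the total count

heads = [(40, 60), (40, 62), (120, 30), (200, 180)]
dm = density_map(heads, height=240, width=320)
print(round(dm.sum()))                         # ~4, the number of annotated heads
```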
3.3 Crowd Tracking The task of object tracking close to video frame rate is challenging, and it becomes highly difficult if elements in the background are similar to foreground objects. For example, a man is moving past a crowd. Crowd tracking involves recognizing the position of the same person in consecutive frames. Similarly, a crowd can be traced by exploring individual trajectories of that crowd. While there are so many approaches for crowd tracking, clutter and severe or partial occlusions tend to make tracking of an individual in denser crowds a very challenging one. However, it is interesting to know that crowd tracking and people counting are related to each other as both of them aim at identifying individuals present in the crowd. People counting normally
requires only an estimate of the people irrespective of the position of each person. In contrast, the tracking determines the position of every person in the scene as a time function. Some of the object-based approaches employed for people counting can also be used to initialize crowd tracking thus accomplishing both density estimation and crowd tracking. Tracking can be categorized into different methods: • Region-based Tracking: Tracking is done according to variation of image regions corresponding to the moving objects. For this type of tracking, the motion regions are normally detected by subtracting the background from the current images. • Contour-based Tracking: These algorithms track only the contour of the object rather than tracking the whole set of pixels comprising the object. • Feature-based Tracking: These methods exploit features of video for tracking parts of the object. Firstly, elements are extracted and then clustered into high-level features. Finally, the features are matched between the images. • Model-based tracking: These algorithms track objects by matching with the projected object model. This method is mainly used in human tracking. This model-based human tracking mainly involves three main tasks: – Construction of human body models. – Representation of motion models and its constraints – Prediction and search strategies. • Hybrid tracking: These algorithms are a hybrid between the region-based and the feature-based techniques and possess the advantages of both. They consider the object as an entity and then by tracking its parts. A novel crowd tracking system [10] is formulated that combines low-level key point tracking, mid-level patch tracking, and high-level group evolution in a single framework. In this work, the crowd is regarded as a set of distinct and stable midlevel patches. The low-level key point tracking provides local motions which aid to detect mid-level patches with stable internal motions and patches are organized into groups with collective motions. The experimental result shows that the tracker tracks the target object accurately and robustly. Ren et al. [11] propose a sparse kernelized correlation filter which is mainly designed for eliminating response variations that are caused by the presence of occlusions, distractor objects, and illumination changes. This SKCF response map is then fused with crowd density map that is estimated with the help of convolution neural network to yield a refined response map. This framework significantly enhances tracking of people in crowd scenes.
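To illustrate the basic mechanics of tracking as data association across frames, the deliberately simple centroid tracker below greedily matches each new detection to the nearest existing track. Real crowd trackers such as the SKCF or the patch/group trackers discussed above add appearance models, density priors and occlusion handling on top of this idea.

```python
# A deliberately simple centroid tracker: associate detections to nearby existing tracks.
import numpy as np

class CentroidTracker:
    def __init__(self, max_distance=50.0):
        self.next_id = 0
        self.tracks = {}                        # track id -> last known (x, y)
        self.max_distance = max_distance

    def update(self, detections):
        """detections: list of (x, y) centroids found in the current frame."""
        assigned = {}
        for det in detections:
            det = np.asarray(det, dtype=float)
            if self.tracks:
                ids = list(self.tracks)
                dists = [np.linalg.norm(det - self.tracks[i]) for i in ids]
                j = int(np.argmin(dists))
                if dists[j] < self.max_distance:   # greedy match, no uniqueness constraint
                    assigned[ids[j]] = det
                    continue
            assigned[self.next_id] = det           # otherwise start a new track
            self.next_id += 1
        self.tracks = assigned
        return assigned

tracker = CentroidTracker()
print(tracker.update([(10, 10), (100, 50)]))
print(tracker.update([(12, 11), (103, 49)]))       # the same two IDs persist
```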
3.4 Feature Extraction and Motion Information Large videos have huge datasets which in turn have large number of variables that require more computing resources for processing. Thus, feature extraction is used as a part of dimensionality reduction process where the number of features in a dataset
is reduced and new features are created from the original features by discarding the original ones. However, the new features extracted can still describe the original dataset with accuracy. Thus, extraction of motion information from a series of images becomes one of the important steps. Generally, there are three important techniques used to extract motion information from a series of images, namely optical flow, trajectory-based features, and region-based features. (a) Optical flow features: It is one of the most commonly used methods. An optical flow field is a vector field that is an approximation of the two-dimensional flow fields from image intensities. Though several methods have been developed, it is difficult to achieve accurate measurements. (b) Trajectory-based features: Trajectories are obtained at the locations of specific points on an object with respect to time. They are relatively simple to extract. In this technique, the tokens are detected in every frame, and corresponding tokens are found in subsequent frames. These tokens include interest points, edges, corners, limbs, and regions. These tokens are expected to be distinctive enough for the generation of motion trajectories. (c) Region or image-based features: These are the features that are generated from a relatively larger region in applications where extraction of precise motion is not necessary. Table 1 depicts some of the existing feature extraction and motion pattern analysis techniques that can be applied on structured and unstructured crowds to analyze the nature of density.
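As an example of a region-level motion descriptor of the "motion histogram" kind listed in Table 1, the sketch below bins dense optical-flow orientations into a magnitude-weighted histogram; the bin count and magnitude threshold are illustrative choices.

```python
# Histogram of optical-flow orientations (HOF-style) over a dense flow field.
import numpy as np

def flow_orientation_histogram(flow, bins=8, min_magnitude=0.5):
    fx, fy = flow[..., 0].ravel(), flow[..., 1].ravel()
    mag = np.hypot(fx, fy)
    moving = mag > min_magnitude                     # keep only clearly moving pixels
    hist, _ = np.histogram(np.arctan2(fy, fx)[moving],
                           bins=bins, range=(-np.pi, np.pi), weights=mag[moving])
    return hist / (hist.sum() + 1e-12)               # normalized descriptor

# synthetic flow field moving uniformly to the right -> one dominant orientation bin
flow = np.zeros((120, 160, 2), dtype=np.float32)
flow[..., 0] = 2.0
print(flow_orientation_histogram(flow))
```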
3.5 Crowd Behavior Recognition The principal aim of crowd behavior modeling is to identify abnormal events in the crowd like riots, stampede, panic, and violence act or to identify abnormal behavior of individuals within the crowd. There are two approaches of behavior recognition depending on whether the crowd is observed as a cluster or individuals in a single entity. In object-based approach, the crowd is regarded as a group of individuals where both the individuals and their behavior are deduced. The holistic approach regards crowd as a single entity and applies to structured crowd scene of medium to high densities. Object-based approach: In unstructured crowds, the individuals move randomly in various directions, and hence, it becomes strenuous to identify the abnormal behavior that is a result of inappropriate actions of individuals. So, in object-based approach, the main focus is on expressing both structure and behavior of information systems into small modules that integrate both data and process. Therefore, it helps in capturing the whole scene to distinguish abnormal activities. A person running in a crowd where all the individuals are walking will be identified as an abnormal event
Table 1 Feature extraction and motion pattern analysis techniques

References            Datasets              Feature                              Applicable scenes         Density level
Baig et al. [12]      UCF                   Optical flow                         Structured                High
Shao et al. [13]      CUHK                  Optical flow                         Structured/Unstructured   High
Zitouni et al. [14]   PETS                  Optical flow                         Unstructured              High
Yuncheng et al. [15]  TGIF                  Motion histogram                     Structured                Low
Bera et al. [16]      ARENA                 Cluster flow                         Structured/Unstructured   Low/High
Hao et al. [17]       UMN                   Optical flow                         Structured                High
Moustafa et al. [18]  GC                    Optical flow                         Structured                High
Lamba et al. [19]     PETS/UCF/CUHK         Crowd flow                           Unstructured              High
Yang et al. [20]      PETS                  Optical flow                         Unstructured              High
Irfan et al. [21]     UCF                   Motion histogram                     Structured/Unstructured   High
Usman [22]            GP                    Optical flow                         Structured/Unstructured   High/Low
Roy et al. [23]       ANN                   Optical flow                         Structured/Unstructured   High
Nag et al. [24]       Drone Crowd dataset   Convolution block attention module   Structured/Unstructured   High/Low
Chipade et al. [25]   Agoraset, PETS2009    Optical flow + motion histogram      Structured/Unstructured   High/Low
leading to false detections when the crowd is considered as a single entity. Such problems are overcome by using the object-based approach. Conventional object-based methods suffer from the complexity of segregating individuals while performing segmentation and detection on each individual, and they cannot handle densely crowded scenes. However, recent research has tried to overcome this by considering low-level features and interactions. The authors of [26] have proposed multi-target multi-camera tracking (MTMCT) for real-time applications, in which the object of interest is tracked in different views separately by using a single camera. This online hierarchical algorithm stores the appearance of multiple objects and their information to create a dynamic gallery. The MTMC dataset is used to assess the efficiency and effectiveness of the framework
by calculating the threshold value. This technique shows promising results with improved tracking performance. In [27], a new model is put forth for pedestrian behavior modeling by considering stationary crowd groups as the key component. The model investigates pedestrian behaviors via inference on the interactions among pedestrians and stationary crowd groups. The performance of the proposed model is exhibited by diverse applications, including personality classification, destination prediction, walking path prediction, and abnormal event detection. Holistic approach: In the holistic approach, the moving objects are individuals in densely crowded scenes. Since they appear very small, their features are often ignored. The main objective of this approach is to comprehend the overall behavior of the crowd to judge whether any abnormality exists with regard to the actions of individuals in particular. The holistic approach observes the activity of the crowd as a whole rather than particular movements, i.e., when a crowd scene is considered, it estimates the number of people instead of identifying individuals. The motion or movements of the people are captured as frames, from which features are extracted; this also helps in motion estimation using different techniques. In [28], crowd behavior is modeled using the holistic approach. Sparse coding is used for identifying and locating abnormal behaviors. A two-part dictionary is used to save general training features, and test movements are analyzed by rebuilding features extracted from the test video based on the available dictionary, which is generated in an unsupervised way using sparse combinations. If the reconstruction error is high at this stage, the test video cannot be represented well by the behaviors available in the dictionary, thereby detecting and locating abnormal behaviors. This model is capable of detecting and locating multiple abnormal events simultaneously in the scene. Gaoa et al. [29] present a novel method that detects violent crowd behavior by combining the techniques of compressive sensing and deep learning. A novel hybrid random matrix (HRM) is constructed and trained to extract more discriminatory features. This hybrid matrix satisfies the restricted isometry property (RIP), and through it, the high-dimensional features can be projected into a low-dimensional space. Additionally, the reduced-dimension features of the crowd behavior are extracted by a deep neural network, and these learnt features are exploited for classification. Bappaditya Mandal et al. [30] perform holistic analysis of crowd behavior using spatial partitioning trees. The crowd behavior attributes or classes containing feature maps are used to create subclasses, and an eigen-modeling method is used for regularizing the features of these subclasses. The crowd behavior attribute classes are extracted from normal videos using a 1-nearest neighbor (NN) classifier. Experimental results of the proposed work exhibit superior performance compared to other standard techniques on large crowd behavior recognition.
3.6 Crowd Anomaly Detection

Crowded scenes suffer from severe clutter, occlusions between individuals, and extreme ambiguities. A crowd scene manifests both psychological characteristics and goal-oriented dynamics. Thus, it can exhibit unanticipated and complicated behavior, as it is always the collective behavior of individuals [3]. It becomes difficult to segment and trace individual behavior due to occlusions. The performance of conventional object tracking methods tends to decrease as crowd density increases, for which novel methods need to be developed. The definition of an abnormal event itself is subjective, as even a rare and outstanding event can sometimes be considered abnormal, or events that have not happened before may be considered abnormal [1, 3]. Table 2 depicts a brief comparison of some anomaly detection techniques evaluated on different datasets with different methods, whose performance can be estimated by parameters like area under the ROC curve (AUC), accuracy (ACC), average precision (AP), and error rate (ERR). The challenges of crowd modeling and monitoring have resulted in crowd analysis, whose main aim is not to analyze a normal crowd but to detect and derive deviations from it, which is called anomaly detection.

Table 2 Comparison of some anomaly detection techniques

Reference | Dataset | Methodology | AUC | ACC | AP | ERR
Shuaibu et al. [31] | WWW | 3D scale convolutional neural network | 0.9530 | – | – | –
Rejitha et al. [32] | UMN | Optical flow method (Farneback algorithm) | 0.9785 | 92.33% | – | –
Hu et al. [33] | UCSD Ped1 | Single-hidden-layer feedforward neural network model | 0.8090 | – | – | 26.3%
Nady et al. [34] | UMN | Feature extraction (STACOG features) + clustering algorithm (K-medoids) | 0.7008 / 0.9633 | – | – | 4.6%
Mahadevan et al. [35] | UCSD | Mixture of dynamic texture (MDT) model | 0.735 | 75% | – | 25%
Zhang et al. [36] | PETS 2009 | K-means clustering algorithm + Hungarian algorithm + spatial location features | 0.7773 | 89% | – | –
Esan et al. [37] | UCSD Ped1 | CNN-long short-term memory | 88.5% | – | 0.891 | –

Crowd anomaly detection can be divided
into two kinds, local anomaly detection and global anomaly detection, based on where the abnormal behavior occurs.
• Global anomaly detection: Any abnormal behavior that occurs across the entire frame is global abnormal behavior. When unexpected events like fires, explosions, stampedes, or transportation disasters occur, they create a state of fear and panic in the crowd and transform the entire crowd dynamics into a different state, i.e., the regular motion pattern is disturbed.
• Local anomaly detection: The occurrence of anomalous activity in a particular area of the frame is local abnormal behavior. Local anomaly detection often requires identifying the position where the anomalous event happens. Some of the popular local anomaly detection techniques are flow field models, hidden Markov models (HMM), dynamic texture, and sparse representation.
Traditional human detection and activity identification methods often fail to perform well in crowded scenes due to the appearance of many objects in an image. Thus, it is challenging to analyze human activity in a crowd scene. Texture information such as spatiotemporal frequency, spatiotemporal gradient, and mixtures of dynamic textures are some of the cues used by researchers to detect abnormal behavior in a crowd. Optical flow techniques are used to identify motion features like the spatial saliency of motion features, clustered motion patterns, and local motion histograms, and a few works have used the particle advection method and the social force model on dense flow fields for analyzing crowd movements. When abnormal situations are detected, another challenging task is how effectively this information can be made accessible for search and analysis. Often, various technologies are utilized for the storage, indexing, and accessibility of videos in different video formats, resulting in diverse and non-interoperable video search systems. Moreover, ambiguities, visual occlusions, scene semantics, and complex behaviors make the analysis of a crowd scene even more challenging. The authors of [38] present crowd anomaly detection based on outlier rejection. The algorithm produces a feature for each superpixel, which comprises the contribution of the adjacent superpixels whose motion direction conforms to the dominant direction of motion in the region. For classification purposes, univariate Gaussian discriminant analysis along with the K-means algorithm is used. This technique extracts features over non-overlapping superpixels in the scene. The speed, power, and accuracy of this technique are superior to many handcrafted feature-based and deep learning-based approaches; however, further improvement is possible if the proposed feature is blended with a deep neural network. Yang et al. [39] have suggested a two-stream framework that incorporates raw data and optical flow as supplementary information, since the optical flow method is not able to satisfactorily encode motion differences while addressing normal motions. The proposed global-local analysis scheme executes anomaly detection and localization together, and the network is trained to minimize a weighted Euclidean loss. The effectiveness of the approach is superior when
compared to existing approaches. However, the proposed approach lacks generalization ability; it also requires additional training to apply a well-trained detection model to other scenes. The detection accuracy can still be improved by employing adversarial training. The HUAD, or Hierarchical Urban Anomaly Detection, framework [40] detects regional anomalies. Based on taxi and subway data, a traffic flow matrix is derived by generating regions and period sequences. In the second phase, candidate abnormal regions are obtained. Historical anomaly scores are obtained by predicting traffic using a long short-term memory (LSTM) network. Then, using adjacent periods, historical anomalies, and adjacent regions, refined anomaly characteristics are produced. Finally, abnormal regions are identified by a one-class support vector machine (OC-SVM). Zhang et al. [41] present a method that uses the evolution of spatial position relationship (ESPR) to identify three kinds of abnormal crowd behavior: collective movement, crowd gathering, and crowd spreading. An advanced optical flow approach is incorporated to extract optical flow features, the tightness level of all moving objects is analyzed by a fuzzy algorithm, and lastly, abnormal events are identified through the ESPR feature. Hasan et al. [42] have proposed new methods, a handcrafted spatiotemporal framework and an auto-encoder, to learn motion signatures in video frames. This model uses video as input; features of the frames are extracted and labeled, and as a result abnormal and regular events are detected in the frames. Multiple datasets, namely Avenue, UCSD pedestrian, and subway, are used for the experiments. The convolutional auto-encoder gives a more precise regularity score when compared to the auto-encoder based on handcrafted features. The convolutional layers obtain low- and high-level filter responses that describe regularities and irregularities in the video frame, and the proposed method improves regularity recognition by relating a single frame to past and future regular motions. Table 3 illustrates some of the state-of-the-art techniques for crowd analysis dealing with crowd behavior recognition, detection of abnormal events, recognition of crowd motion patterns, crowd density estimation, and crowd tracking.
Table 3 Survey on crowd analyses and techniques

References | Key issue | Algorithm | Dataset | Results
Gaoa et al. [43] | Detection of violent crowd behavior | Hybrid random matrix, deep neural network architecture | Crowd violence benchmark dataset | AUC = 91.61%
Rejitha and George [44] | Abnormal crowd behavior detection | Farneback optical flow | UMN | AUC (indoor) = 0.9516
Shehzed et al. [45] | Crowd counting, abnormal event detection | Silhouette template matching, Gaussian clusters | PETS2009, UMN | Tracking accuracy = 88.7%, Detection accuracy = 95.5%
Matkovic et al. [46] | Crowd behavior recognition, motion pattern recognition | Optical flow-based tracker with CNN-based detector, fuzzy logic function | UMN | Precision = 0.9675, Recall = 0.884
Salim et al. [47] | Crowd detection and tracking | Background subtraction, shadow removal using Kalman filter | PETS2009 | Average efficiency = 83.14%
Zhao et al. [48] | Crowd density estimation and crowd stability analysis | Improved multi-column CNN (4-column CNN) | Shanghai dataset, UCF_50 | Accuracy = 83.26%, Error = 16.74%
Wang et al. [49] | Detection of coherent crowd groups | Self-weighted multiview clustering method | CUHK dataset | Accuracy = 83%

4 Technological Advancement in Video Surveillance

The advancement of video surveillance systems is driven by several technology trends. Current developments in signal processing aid the growth of intelligent video surveillance systems, particularly through flexible adaptation of the video data collection rate when a security indicator is identified. Advanced big data infrastructures, characterized by volume, variety, veracity, and velocity, have unlocked new prospects for video data storage and access. In particular, gathering large volumes of video from various cameras is now much simpler than it was in the past. Over the past few years, several data streaming systems have been developed, offering functionalities for streaming analytics and stream management as part of big data systems. The rapid development in the field of artificial
intelligence has resulted in more sophisticated deep learning techniques, like Google's Alpha AI engine, that make the video surveillance process more effective. AI can thus enable predictive analytics for security operations to anticipate security incidents. With the combination of smart objects and IoT devices, security systems will provide next-generation surveillance and security functionalities. In recent years, drones have been deployed to provide video surveillance functionalities in addition to conventional fixed cameras. The present digital transformation in the industry enables the convergence of physical and cyber-security measures, so surveillance systems can be easily integrated with cyber-security systems. In addition to the above-discussed technologies, it is very important to have a proper architecture for the video surveillance infrastructure. Modern surveillance systems therefore use the edge/fog computing paradigm to process video surveillance data nearer to the field, as it uses minimal bandwidth and enables efficient real-time security monitoring. Hence, the edge/fog computing paradigm is an ideal choice for present video surveillance technologies.
5 Research Gap Analysis

Two crucial concerns emerge while performing anomaly detection in real-life crowd scenes. Firstly, typical behaviors are often multimodal, such as people moving in various directions, and are observed sequentially. Secondly, the environment cannot be expected to be always stationary, as even normal patterns may undergo unprecedented changes. Most of the research papers in the literature survey have considered a stationary background for abnormal crowd behavior recognition using a static camera, but crowd scenes with a dynamic background and a moving camera still need to be addressed. It is always challenging to analyze human interactions and activities with complicated temporal logic, for instance, two people meeting, walking together, and then separating. The appearance of too many subjects in a frame of a crowd scene results in extreme clutter, severe occlusions, and ambiguities, which are some of the reasons for the poor performance of conventional human detection and activity recognition methods in crowded scenes. Thus, summarizing the above:
• It is difficult to detect abnormal interactions and atomic activities of individuals in a crowd scene due to severe occlusions and extreme clutter. This challenge may arise due to:
– A complex or dynamic environment.
– Change in features like shape and direction with time.
– Unpreventable similarities among objects.
– Changes in illumination, shadows, and merging and splitting of targets.
– Adverse weather conditions.
• Tracking of target objects with identical colors may fail when these objects participate together in an occlusion event. This poses a potential drawback when classification has to be achieved based on color features.
• In the case of detection of loitering individuals, there is a lack of efficient existing technology to compress the multiple views of the same object into a single high-dimensional point.
• The emotional aspects of crowd behavior require more research. Analyzing the emotion of crowd behavior can help uncover social moods, which is beneficial for video surveillance.
6 Crowd Video Datasets

To work with crowd scenarios, real-life datasets are needed to validate the experiments. Table 4 summarizes some freely available crowd video datasets with a brief description, the number of videos, their accessibility, and the labeling level.
Table 4 Summary of existing crowd video datasets

Reference | Description | Accessibility | No. of videos | Label
UMN [50] | 3 different indoor and outdoor scenarios with an escape event | Yes | 11 | All
Violent-flows [51] | Video sequences of crowd violence with real-world scenarios | Yes | 246 | Partial
UCSD [52] | 36 testing and 34 training video clips in subset 1; 12 testing and 16 training video clips in subset 2 | Yes | 98 | All
CUHK [53] | 1 crowd video sequence and 1 traffic video sequence | Yes | 2 | Partial
Rodriguez's [54] | Large collection of crowd videos with 100 labeled object trajectories | No | 520 | Partial
PETS2009 [55] | 8 video sequences of different crowd activities with calibration data | Yes | 8 | All
UCF [56] | Crowd videos, vehicle flows, and high-density moving objects | Yes | 38 | Partial
QMUL [57] | 1 pedestrian video sequence and 3 traffic video sequences | Yes | 4 | Partial
7 Conclusion

In this review work, an overview of general strategies and recent developments in the different stages of crowd video analysis systems is presented. The existing state-of-the-art techniques for each crucial issue are described by focusing on the different steps involved in crowd analysis, like density estimation, people counting, motion detection, crowd tracking, crowd behavior recognition, and anomaly detection in scenarios with massive crowds. Although a substantial amount of work has been done in crowd scene analysis, it is still a topic of open research from the perspective of artificial intelligence and machine learning. Crowd scene analysis using deep learning is gaining momentum and could be a prospective solution. Further, crowd scenes and human behaviors can be analyzed more effectively by combining data from remote and multisensory surveillance systems that use natural language description and improved human behavior recognition. This survey gives a future direction for
the enhancement of crowd scene analysis using artificial intelligence and machine learning and development of state-of-the-art techniques.
References 1. Thida M, Yong YL, Climent-Pérez P, Eng H-L, Remagnino P (2013) A literature review on video analytics of crowded scenes. In: Intelligent multimedia surveillance. Springer, Berlin, Germany, pp 17–36 2. Jacques JCS, Musse SR, Jung CR (2010) Crowd analysis using computer vision techniques. IEEE Sign Process Mag 27(5):66–77 3. Hu W, Tan T, Wang L, Maybank S (2004) A survey on visual surveillance of object motion and behaviours. IEEE Trans Syst Man Cybern Appl Rev 34(3) 4. Zhan B, Monekosso DN, Remagnino P, Velastin SA, Xu L-Q (2008) Crowd analysis: a survey. Mach Vis Appl 19(5–6):345–357 5. Yang H-Y, Zhao H-A, Zhou P (2017) Crowd segmentation using simpleflow. In: International conference on network and information systems for computers (ICNISC), pp 107–110 6. Bhatti MH, Azeem M, Younis H (2019) Object segmentation in video sequences by using single frame processing. In: 13th International conference on open source systems and technologies (ICOSST) 7. Ma T, Ji Q, Li N (2018) Scene invariant crowd counting using multiscales head detection in video surveillance. IET Image Process 12(12):2258–2263 8. Mo H, Ren W, Xiong Y, Pan X, Zhou Z, Cao X, Wu W (2020) Background noise filtering and distribution dividing for crowd counting. IEEE Trans Image Process 29:8199–8212 9. Ahuja KR, Charniya NN (2019) A survey of recent advances in crowd density estimation using image processing. In: Proceedings of the fourth international conference on communication and electronics systems (ICCES 2019) IEEE conference record # 45898; IEEE Xplore, pp.1207– 1213. ISBN: 978-1-7281-1261-9 10. Zhu F, Wang X, Yu N (2018) Crowd tracking by group structure evolution. IEEE Trans Circ Syst Video Technol 28(3) 11. Ren W, Kang D, Tang Y, Chan AB (2018) Fusing crowd density maps and visual object trackers for people tracking in crowd scenes. In: IEEE/CVF conference on computer vision and pattern recognition.https://doi.org/10.1109/CVPR.2018.0056 12. Baig MW, Baig MS, Bastani V, Barakova EI, Marcenaro L, Regazzoni CS, Rauterberg M, Perception of emotions from crowd dynamics. In: IEEE international conference on digital signal processing (DSP), pp 703–707 13. Shao J, Dong N, Zhao Q (2015) An adaptive clustering approach for group detection in the crowd. In: IEEE international conference on systems, signals and image processing (IWSSIP), pp 77–80 14. Zitouni MS, Dias J, Al-Mualla M, Bhaskar H (2015) Hierarchical crowd detection and representation for big data analytics in visual surveillance. IEEE international conference on systems, man, and cybernetics, pp 1827–1832 15. Li Y, Song Y, Cao L, Tetreault J, Goldberg L, Jaimes A, Jiebo (2016) In: IEEE conference on computer vision and pattern recognition (CVPR), pp 4641–4650 16. Bera A, Kim S, Manocha D (2016) Realtime anomaly detection using trajectory-level crowd behaviour learning. In: IEEE conference on computer vision and pattern recognition workshops (CVPRW), pp 1289–1296 17. Hao Y, Xu Z, Wang J, Liu Y, Fan J (2017) An effective video processing pipeline for crowd pattern analysis. In: 23rd International conference on automation and computing (ICAC) 18. Moustafa AN, Hussein ME, Gomaa W (2017) Gate and common pathway detection in crowd scenes using motion units and meta-tracking. In: International conference on digital image computing: techniques and applications (DICTA)
19. Lamba S, Nain N (2017) A Large scale crowd density classification using spatio-temporal local binary pattern. In: 13th International conference on signal-image technology and internet-based systems (SITIS), pp 296–302 20. Yang M, Rashidi L, Rao AS, Rajasegarar S, Ganji M, Palaniswami M, Leckie C (2018) Clusterbased crowd movement behaviour detection. In: Digital image computing: techniques and applications (DICTA) 21. Irfan M, Tokarchuk L, Marcenaro L, Regazzoni C (2018) Anomaly detection in crowds using multi-sensory information. In: 15th IEEE international conference on advanced video and signal based surveillance (AVSS) 22. Usman I (2019) Anomalous crowd behaviour detection in time varying motion sequences. In: 4th World conference on complex systems (WCCS) 23. Roy A, Biswas N, Saha SK, Chanda B (2019) Classification of moving crowd based on motion pattern. In: IEEE region 10 symposium (TENSYMP), pp 102–107 24. Nag S, Khandelwal Y, Mittal S, Mohan CK, Qin AK (2021) ARCN: a real-time attention-based network for crowd counting from drone images. In: 2021 IEEE 18th India council international conference (INDICON), pp 1–6. https://doi.org/10.1109/INDICON52576.2021.9691659 25. Chipade A, Bhagyawant P, Khade P, Mahajan RC, Vyas V (2021) Computer vision techniques for crowd density and motion direction analysis. In: 2021 6th international conference for convergence in technology (I2CT), pp 1–4. https://doi.org/10.1109/I2CT51068.2021.9417993 26. Chou Y-S, Wang C-Y, Chen M-C, Lin S-D, Mark Liao H-Y (2019) Dynamic gallery for realtime multi-target multi-camera tracking. In: 16th IEEE international conference on advanced video and signal based surveillance (AVSS). https://doi.org/10.1109/AVSS.2019.8909837 27. Yi S, Li H, Wang X (2015) Understanding pedestrian behaviours from stationary crowd groups. In: IEEE 2015, pp 3488–3496 28. Masoudirad SM, Hadadnia J (2017) Anomaly detection in video using two-part sparse dictionary in 170 FPS. In: 3rd international conference on pattern recognition and image analysis (IPRIA 2017) 19–20 Apr 2017, pp 133–139 29. Gaoa M, Jiang J, Maa L, Zhoud S, Zoua G, Pana J, Liub Z (2019) Violent crowd behaviour detection using deep learning and compressive sensing. In: The 31th Chinese control and decision conference (2019 CCDC), pp 5329–5333 30. Mandal B, Fajtl J, Argyriou V, Monekosso D, Remagnino P (2018) Deep residual network with subclass discriminant analysis for crowd behaviour recognition. In: ICIP 2018, pp 938–942 31. Shuaibu AN, Malik AS, Faye I (2017) Adaptive feature learning CNN for behaviour recognition in crowd scene. In: Proceedings of the IEEE international conference on signal and image processing applications (IEEE ICSIPA 2017), Malaysia, pp 357–361 32. Rejitha MR, George SN (2019) An unsupervised abnormal crowd behaviour detection technique using farneback algorithm. IEEE 33. Hu J, Zhu E, Wang S, Liu X, Guo X, Yin J (2019) An efficient and robust unsupervised anomaly detection method using ensemble random projection in surveillance videos. Sensors (Basel). https://doi.org/10.3390/s19194145 34. Nady A, Atia A, Abutabl AE (2018) Real-Time abnormal event detection in crowded scenes. J Theor Appl Inf Technol 6064–6074 35. Mahadevan V, Li W, Bhalodia V, Vasconcelos N (2010) Anomaly detection in crowded scenes. In: Proceedings of IEEE conference computer vision pattern recognition, pp 1975–1981 36. Zhang L, Han J (2020) Recognition of abnormal behavior of crowd based on spatial location feature. 
In: 2020 IEEE 9th joint international information technology and artificial intelligence conference (ITAIC), pp 736–741. https://doi.org/10.1109/ITAIC49862.2020.9338944 37. Esan DO, Owolavi PA, Tu C (2020) Anomalous detection system in crowded environment using deep learning. In: 2020 International conference on computational science and computational intelligence (CSCI), pp 29–35.https://doi.org/10.1109/CSCI51800.2020.00012 38. Khan MUK, Park H-S, Kyung C-M (2019) Rejecting motion outliers for efficient crowd anomaly detection. IEEE Trans Inf Forensics Secur 14(2):541–556 39. Yang B, Cao J, Wang N, Liu X (2019) Anomalous behaviours detection in moving crowds based on a weighted convolutional auto encoder-long short-term memory network. IEEE Trans Cogn Dev Syst 11(4):473–482
40. Kong X, Gao H, Alfarraj O, Ni Q, Zheng C, Shen G (2020) HUAD: hierarchical urban anomaly detection based on spatio-temporal data. In: Special section on artificial intelligence (AI)empowered intelligent transportation systems, vol 8, pp 26573–26582 41. Zhang M, Li T, Yue Y, Li Y, Hui P, Zheng Y (2019) Urban anomaly analytics: description, detection and prediction. J Latex Class Files 14(8):1–18 42. Hasan M, Choi J, Neumann J, Roy-Chowdhury AK, Davis LS (2016) Learning temporal regularity in video sequences. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 1–40 43. Gaoa M, Jiang J, Maa L, Zhoud S, Zoua G, Pana J, Liub Z (2019) Violent crowd behavior detection using deep learning and compressive sensing. In: The 31th Chinese control and decision conference, pp 5329–5333. https://doi.org/10.1109/CCDC.2019.8832598 44. Rejitha MR, George SN (2019) An unsupervised abnormal crowd behavior detection technique using farneback algorithm. In: IEEE international conference on electronics, computing and communication technologies, pp 1–5. https://doi.org/10.1109/CONECCT47791.2019.901 2845 45. Shehzed A, Jalal A, Kim K (2019) Multi-Person tracking in smart surveillance system for crowd counting and normal/abnormal events detection. In: 2019 International conference on applied and engineering mathematics, pp 163–168. https://doi.org/10.1109/ICAEM.2019.885 3756 46. Matkovic F, Marˇcetic D, Ribaric S (2019) Abnormal crowd behaviour recognition in surveillance videos. In: 15th International conference on signal-image technology and internet-based systems, pp 428–435. https://doi.org/10.1109/SITIS.2019.00075 47. Salim S, Khalifa OO, Rahman FA, Lajis A (2019) Crowd detection and tracking in surveillance video sequences. In: IEEE International conference on smart instrumentation, measurement and application, pp 1–6. https://doi.org/10.1109/ICSIMA47653.2019.9057300 48. Zhao R, Dong D, Wang Y, Li C, Ma Y, Enríquez VF (2022) Image-Based crowd stability analysis using improved multi-column convolutional neural network. IEEE Trans Intell Transp Syst 23(6):5480–5489. https://doi.org/10.1109/TITS.2021.3054376 49. Wang Q, Chen M, Nie F, Li X (2020) Detecting coherent groups in crowd scenes by multiview clustering. IEEE Trans Pattern Anal Mach Intell 42(1):46–58. https://doi.org/10.1109/TPAMI. 2018.2875002 50. UMN Crowd Dataset, Department of Computer Science and Engineering, University of Minnesota, Minneapolis 51. Hassner T, Itcher Y, Kliper-Gross O (2012) Violent flows: real-time detection of violent crowd behavior. In: 3rd IEEE international workshop on socially intelligent surveillance and monitoring (SISM) at the IEEE conferences on computer vision and pattern recognition (CVPR), Rhode Island, June 2012 52. UCSD Anomaly Detection Dataset, University of California, San Diego 53. Wang X, Ma X, Grimson WEL (2009) Unsupervised activity perception in crowded and complicated scenes using hierarchical Bayesian models. IEEE Trans Pattern Anal Mach Intell 31(3):539–555 54. Rodriguez M, Sivic J, Laptev I, Audibert J-Y (2011) Data-driven crowd analysis in videos. In: Proceedings of IEEE international conference on computer vision, Nov 2011, pp 1235–1242 55. Ferryman J, Shahrokni A (2009) PETS2009: dataset and challenge. In: 2009 Twelfth IEEE international workshop on performance evaluation of tracking and surveillance, Snowbird, UT, USA, 2009, pp 1–6.https://doi.org/10.1109/PETS-WINTER.2009.5399556 56. 
Ali S, Shah M (2007) A Lagrangian particle dynamics approach for crowd flow segmentation and stability analysis. In: Proceedings of IEEE conference on computer vision and pattern recognition, June 2007, pp 1–6 57. QMUL Junction Dataset, School of Computer Science and Engineering, Nanyang Technological University, Singapore
Prediction of Drug-Drug Interactions Using Support Vector Machine W. Mohammed Abdul Razak , R. Rishabh , and Merin Meleet
Abstract Drug-drug interactions can prompt various health issues when a person takes multiple drugs simultaneously. Predicting these events in advance can save lives. A polynomial kernel SVM was proposed for predicting drug-drug interactions (DDIs) between a drug pair. A unique fingerprint of each drug was considered, and then the fingerprints of every drug pair were combined into another unique fingerprint using a novel method, which then acts as a feature vector for the machine learning model. The SVM implemented gave an accuracy of 91.6%. Moreover, the proposed model could also predict novel interactions between drugs not present in the dataset.

Keywords Drug-drug interactions · Deep learning · Fingerprinting · Support vector machine · Polynomial kernel
W. Mohammed Abdul Razak (B) · R. Rishabh · M. Meleet
R V College of Engineering, Mysore Rd, RV Vidyaniketan Post, Bengaluru, Karnataka 560059, India
e-mail: [email protected]
R. Rishabh e-mail: [email protected]
M. Meleet e-mail: [email protected]

1 Introduction

Drug-drug interactions take place when multiple drugs are consumed, causing interactions between the drugs. Such interactions can give rise to side effects and prove fatal to the human body, and such problems must be handled with utmost care by any doctor who prescribes multiple drugs. About 38.1% of adults in the USA aged 18 to 44 years were prescribed three or more drugs in a time span of 30 days [1]. Self-medication has also been on the rise worldwide, especially in economically deprived communities, with reported prevalence of 14% in South Africa, 13% in the USA, 11% in Australia, and 11% in Germany [2]. Self-medication is another
factor, wherein a person unknowingly consumes harmful drug combinations that can prove fatal. Drug interactions mainly occur at the pharmacodynamic (PD) and pharmacokinetic (PK) levels [3]. PD DDIs arise when the pharmacological effect of a drug is altered by another drug; such interactions can be deliberate in a few treatments like cancer chemotherapy, but adverse or unintended PD interactions must be monitored [4]. PK interactions, on the other hand, can develop if one drug alters the absorption, distribution or metabolism properties of another drug. A survey [5] done in a private hospital during 2015–2016 showed that in 211 prescriptions, a total of 369 DDIs were identified, of which around 4% were serious, 66% were significant and 29% were minor, and the majority of the interactions were found to be pharmacodynamic (51.21%) in nature. An efficient method that can correctly predict DDIs is the need of the hour, since clinical trials are slow, costly and impractical when large-scale data and experimental conditions are limited. Hence, computational methods have been introduced by researchers to accelerate the prediction process [6]. Recently, there has been a rise in the research and development of computational methods that can predict DDIs efficiently. Zhang and Mei [7] showed that these computational methods can be broadly categorized into three types, namely network-based methods [6, 8], similarity-based methods [9, 10] and machine learning methods [9, 10]. Network-based methods [8] construct networks of drugs that graphically indicate whether any two drugs interact. In the network, each node represents a drug, and an edge between two nodes denotes a DDI between the connected drugs. The more interactions a drug participates in, the greater the risk involved; hence, the degree of each node reflects the proportion of risk of a particular drug in the network. Similarity-based methods follow the logic that if a drug is similar to another interacting drug, then it most likely also interacts with that drug. Drug structural and chemical properties can be used as similarity metrics. In [11], in addition to interaction profile fingerprints (IPFs) representing drug-drug interactions in Boolean form, a similarity matrix is populated with similarity measures among all pairs using the Tanimoto coefficient. The third category, machine learning, has been the most popular and widely used approach to predict DDIs. It combines various similarity metrics and network-based drug representations to create a unique feature vector representing an input drug pair, which is then fed into the model for training. Zhang and Mei [7] describe machine learning frameworks used to integrate heterogeneous data, including kernel methods [1], ensemble learning [12] and deep learning [9, 13].
2 Literature Review

Ryu et al. in [13] developed a multi-label deep neural network model that predicts multiple DDI types for a given pair of drugs, even though in the gold standard dataset considered, 99.8% of drug pairs are reported to have
only a single DDI type. Hyun et al. [13] show how a feature vector is created by first generating a structural similarity profile (SSP) for each drug, which captures a structural feature that uniquely describes that drug. Two SSPs are then combined, producing a feature vector for a given pair of drugs. Feature vectors are created in this manner for all the considered drug pairs to develop a deep neural network that predicts the DDI type. In [9], Zhang et al. have also created a deep neural network for prediction of the DDI type and showed a new way to represent drugs. The deep predictor for drug-drug interactions (DPDDI) in [9] consists of two prominent parts: (i) a feature extractor based on a graph convolution network (GCN), which characterizes each drug in a graph embedding space; the representation is a feature vector in a latent space that captures the topological relationships of neighboring drugs by transforming complex data into a simpler representation that is easier to process and analyze, which can also mean reducing the dimensionality of the data representation. The researchers found that this GCN-based feature extraction was advantageous (>20%) compared to features derived from biological, anatomical or chemical properties of drugs; (ii) a deep neural network-based predictor model that concatenates the extracted feature vectors in a novel manner to create a single feature vector representing the input drug pair, used to train a DNN to predict the DDIs. Another paper by Zhang et al. in [12] presented a novel algorithm that predicts DDIs. The feature vectors were created by considering four types of features and then fed as input to the prediction module; this feature extraction method was proposed as a novel approach, and the authors again employed a neural network architecture based on a CNN. Rohani et al. in [10] developed an NDD system that extracts high-level features by combining two predefined similarity measurement procedures; this integration process is the novelty of the system, which is combined with a neural network model that predicts novel interactions between two unknown drugs. Rawat et al. in [14] efficiently predicted DDIs via deep learning methods: recurrent neural networks, convolutional neural networks and mixture density networks are integrated to efficiently predict DDIs, and a comparative analysis was performed on the results, with extensive experiments conducted on benchmark datasets. In [6], Zhang et al. presented a neural network framework, DANN-DDI, to estimate 10 different types of DDIs, achieving an accuracy of 88.74%; more significantly, the presented model could evaluate novel interactions between drugs and the DDIs associated with them. Yin et al. in [15] used a unified embedding feature to represent each drug and then generated a drug-drug pair feature using a concatenation algorithm, which is fed as input to a deep neural network model to predict 65 kinds of DDIs. The results of this model showed improved predictive performance due to the feature extraction.
In [16], Almustafa applied classification algorithms, namely support vector machine, logistic regression, AdaBoost and a Naïve Bayes classifier, to predict heart disease. The classifiers were compared with one another, and accuracies as high as 99% were achieved. Ayvaz and Dere in [11], while predicting new interactions between drugs, used an approach that derives DDI predictions for unknown drugs from the properties that make drugs similar to one another and from the already known DDIs. Wang et al. in [8] proposed a novel framework to predict DDIs by first generating a large-scale graph representation of the data, which is then processed to reduce its dimensionality; the learned modules are then used, through a link prediction process, to efficiently compute rich DDI information. In [17], Gonzalez et al. generated drug pair feature vectors from separate similarity matrices for each drug and used them as input to an encoder-based model, generating two distinct feature vectors representing each drug in the input pair. The encoded feature vectors are concatenated, and finally the concatenated drug pair vectors are used to generate a drug interaction probability distribution for each drug pair.
3 Dataset Details

The dataset used is a subset of the gold standard DrugBank [18] dataset. The dataset originally consisted of 191,878 drug-drug interactions with 86 different types of interactions. However, this dataset was heavily unbalanced, with 99.8% of drug pairs reported to have only a single DDI type. A relatively balanced dataset with 33,124 drug-drug interactions comprising 10 different types of interactions was considered. Figure 1 shows the different DDI types and their sample sizes.
Fig. 1 Figure showing the types of drug-drug interactions and their sample size
4 Methodology

A methodology that takes a drug pair as input and classifies the type of interaction between them as output is presented. The simplified molecular-input line-entry system (SMILES) representation of each drug is obtained; SMILES translates a chemical's molecular structure into a string, which can then be used for computational purposes. Figure 2 shows the chemical structure of a commonly used drug, paracetamol. A molecular fingerprint of each drug was generated using MACCS keys. A molecular fingerprint is a way of encoding the molecular structure of a chemical compound into a bit-string. Molecular ACCess System (MACCS) keys are 166-bit 2D structure fingerprints used to measure molecular similarity; each bit represents whether a particular chemical fragment is present or absent in the chemical structure. Figure 3 shows the steps involved in predicting the DDI type: the fingerprints generated for the given drug pair are converted into a combined feature vector.
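As an illustration of this step, the short sketch below generates a MACCS fingerprint from a SMILES string. The paper does not name a toolkit; RDKit is assumed here, and the SMILES string is the paracetamol example from Fig. 2 (RDKit's MACCS vector has 167 positions, with bit 0 unused by convention).

```python
from rdkit import Chem
from rdkit.Chem import MACCSkeys

# Paracetamol (Fig. 2); any valid SMILES string works the same way
mol = Chem.MolFromSmiles("CC(=O)NC1=CC=C(O)C=C1")
fp = MACCSkeys.GenMACCSKeys(mol)            # MACCS structural-key fingerprint
bits = [int(b) for b in fp.ToBitString()]   # 0/1 list usable as a feature vector
print(sum(bits), "structural keys set out of", len(bits))
```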
4.1 Methodology Implemented for Combining the Two Drug Fingerprints

For each bit position, the combined feature vector is assigned:
• 0 if the chemical fragment is absent from the chemical structures of both drugs in the pair,
• 1 if the fragment is present in drug 1 and absent in drug 2,
• 2 if the fragment is present in drug 2 and absent in drug 1,
• 3 if the fragment is present in both drugs.
The combined fingerprint was converted into a NumPy array and then used as a feature vector for training the machine learning model.
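A possible implementation of this combination rule is sketched below. It is not the authors' code, just a direct translation of the 0/1/2/3 encoding described above, and it works on any two equal-length 0/1 fingerprints (such as the MACCS bit lists generated earlier).

```python
import numpy as np

def combine_fingerprints(fp1, fp2):
    """Encode presence/absence of each fragment in the two drugs as 0, 1, 2 or 3."""
    fp1 = np.asarray(fp1, dtype=int)
    fp2 = np.asarray(fp2, dtype=int)
    combined = np.zeros_like(fp1)
    combined[(fp1 == 1) & (fp2 == 0)] = 1   # fragment only in drug 1
    combined[(fp1 == 0) & (fp2 == 1)] = 2   # fragment only in drug 2
    combined[(fp1 == 1) & (fp2 == 1)] = 3   # fragment in both drugs
    return combined                          # 0 where the fragment is absent in both

# Tiny illustrative fingerprints (real MACCS keys have 166 informative bits)
drug1 = [1, 0, 1, 0]
drug2 = [1, 1, 0, 0]
print(combine_fingerprints(drug1, drug2))   # -> [3 2 1 0]
```

Note that the encoding is order-sensitive (1 and 2 distinguish which drug contains the fragment), which is what allows the model to learn asymmetric interaction types such as "drug 1 increases the serum concentration of drug 2".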
SMILE - “CC(=O)NC1=CC=C(O)C=C1”
Fig. 2 Above figure is the chemical structure of paracetamol also known as acetaminophen
Fig. 3 Two drug fingerprints are combined into a unique feature vector which act as the training data for the machine learning model
4.2 Support Vector Machine

Support vector machine (SVM) is a supervised machine learning model that analyzes data and performs classification and regression analysis. In this paper, an SVM with a polynomial kernel was implemented to classify the various drug-drug interactions. The polynomial kernel function is used with SVMs and other kernelized models to represent the similarity of the feature vectors in a feature space over polynomials of the original variables. This allows the learning of nonlinear models. For a d-degree polynomial, the polynomial kernel is defined as

K(x, y) = (x^T y + c)^d    (1)
where x and y are the input feature vectors and c ≥ 0 is a free parameter trading off the influence of higher-order versus lower-order terms in the polynomial. The SVM with the following hyperparameters was used for classifying the data: kernel = 'poly', degree = 4, gamma = 'auto', coef0 = 1 and C = 10. The kernel parameter selects the type of hyperplane used for classifying the data; a polynomial kernel was required because multiple classes were present in our data. The gamma hyperparameter is used for nonlinear hyperplanes: the higher the gamma value, the more closely the model tries to fit the training data set. The coef0 hyperparameter determines how much the model is influenced by high-degree polynomials. C is the penalty parameter of the error term, which determines how smooth or sharp the decision boundary will be for classifying the training samples correctly. When the kernel is set to 'poly', the degree parameter sets the degree of the polynomial used to find the separating hyperplane.
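The classifier described above can be reproduced, for example, with a scikit-learn SVC; the library is not named in the paper, and the random arrays below merely stand in for the combined fingerprints and DDI-type labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Placeholder data: 1000 drug pairs, 166-position combined fingerprints (values 0-3)
X = np.random.randint(0, 4, size=(1000, 166))
y = np.random.randint(0, 10, size=1000)      # stand-in for the 10 DDI types

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Polynomial-kernel SVM with the hyperparameters listed in the text
clf = SVC(kernel="poly", degree=4, gamma="auto", coef0=1, C=10)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```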
5 Results

An overall accuracy of 91.6% was obtained, with individual scores for each type of interaction listed below. Our model could also identify novel interactions between drugs that were not present in our dataset. The confusion matrix, receiver operating characteristic (ROC) and precision-recall curves were also plotted. The precision, recall and F1-score of each DDI type are given in Table 1. Precision indicates the quality of the positive predictions made by the model, recall represents the model's ability to detect positive samples, and the F1-score combines the two metrics to summarize the predictive performance of the model. Table 2 describes the real-world effectiveness of the machine learning model by considering interactions that were not present in the training dataset.
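For reference, the three metrics reported in Table 1 are related as follows; the counts below are illustrative only and are not taken from the paper's confusion matrix.

```python
# Toy example: metrics for one DDI class from confusion-matrix counts
tp, fp, fn = 92, 9, 8   # illustrative true positives, false positives, false negatives

precision = tp / (tp + fp)                            # quality of positive predictions
recall = tp / (tp + fn)                               # ability to detect positive samples
f1 = 2 * precision * recall / (precision + recall)    # harmonic mean of the two

print(f"precision={precision:.2f}, recall={recall:.2f}, F1={f1:.2f}")
```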
Table 1 Precision, recall and F1-score obtained for each type of interaction

DDI type ID and name | Precision | Recall | F1-score
0: When drug 1 is combined with drug 2, the risk or severity of adverse effects can be increased | 0.91 | 0.92 | 0.92
1: When drug 2 is combined with drug 1, the risk or severity of adverse effects can be increased | 0.90 | 0.91 | 0.91
2: When drug 1 is combined with drug 2, the serum concentration of drug 1 can be increased | 0.92 | 0.91 | 0.91
3: When drug 2 is combined with drug 1, the serum concentration of drug 2 can be increased | 0.92 | 0.90 | 0.91
4: When drug 1 and drug 2 are combined, the metabolism of drug 1 can be decreased | 0.92 | 0.91 | 0.91
5: When drug 1 and drug 2 are combined, the metabolism of drug 2 can be decreased | 0.90 | 0.90 | 0.90
6: Drug 1 can increase the QTc-prolonging activities of drug 2 | 0.94 | 0.96 | 0.95
7: Drug 2 can increase the QTc-prolonging activities of drug 1 | 0.93 | 0.93 | 0.93
8: Drug 2 can increase the serotonergic activities of drug 1 | 0.98 | 0.96 | 0.97
9: Drug 2 can increase the central nervous system depressant (CNS depressant) activities of drug 1 | 0.94 | 0.94 | 0.94
Table 2 Interaction types for novel drug pairs

Drug 1 | Drug 2 | DDI type
Benazepril | Potassium acetate | 1
Benazepril | Amiodarone | 2
Digoxin | Verapamil | 2
Norfloxacin | Theophylline | 4
Cimetidine | Metformin | 1
Bismuth | Furosemide | 1
Hydrochlorothiazide | Diclofenac | 2
Warfarin | Diclofenac | 5
Warfarin | Ciprofloxacin | 2
Warfarin | Sulfamethizole | 0
Many commonly prescribed drugs, such as Benazepril, which is used to treat high blood pressure, and Digoxin, which is used to treat heart conditions, were considered. Our model accurately predicted the interactions between these drugs. Many of the novel drugs considered for testing are prescribed in combination for patients. Thus, it is of utmost importance to predict their DDIs in advance to prevent further complications. A false positive is classified as a type I error, and a false negative as a type II error. Type I and type II classifier errors can be inferred from Fig. 4.
Fig. 4 Above figure is a confusion matrix showing the performance of the machine learning model
Figure 5 shows the ROC plot and the precision-recall curves. The ROC plot is used as a measure of performance of a classifier and the precision-recall curves help us understand how relevant the predicted results are.
Fig. 5 Six classes were chosen at random and the above curves were generated, a ROC curve with the area under the curve also being computed, b precision-recall curve
6 Conclusion

This research work shows that machine learning models other than neural networks can be used to accurately predict drug-drug interactions. Ample research has been done to predict DDIs accurately using neural networks, but none using support vector classifiers. The proposed model created a distinct feature vector in a novel manner by combining two unique drug fingerprints. To demonstrate the performance of the model, novel interactions were considered, and the correct DDI types were predicted for drug pairs on which the model had no learning experience whatsoever. The machine learning model built could be used to classify more DDI types if the DrugBank dataset were more balanced across all the classes.
References 1. Wang J, Duan G, Pan Y, Wu FX, Yan C (2019) DDIGIP: predicting drug-drug interactions based on Gaussian interaction profile kernels. Bioinformatics 20(Suppl 15):538 2. Bennadi D (2013) Self-medication: a current challenge. J Basic Clin Pharmacol 5(1):19–23 3. Cascorbi I (2012) Drug interactions–principles, examples and clinical consequences. Deut Ärzte Int 109(33–34):546–556 4. Straubinger RM, Niu J, Mager DE (2019) Pharmacodynamic drug-drug interactions. Clin Pharmacol Ther 105(6):1395–1406 5. Karim N, Hoor T, Farooqui R, Muneer M (2018) Potential drug-drug interactions among patients prescriptions collected from medicine out-patient setting. Pak J Med Sci 34(1):144–148 6. Zhang Y et al. (2022) Enhancing drug-drug interaction prediction using deep attention neural networks. IEEE/ACM Trans Comput Biol Bioinf 7. Zhang K, Mei S (2021) A machine learning framework for predicting drug–drug interactions. Nat Sci Rep 11:17619 8. Wang M et al. (2021) Drug-drug interaction predictions via knowledge graph and text embedding: instrument validation study. JMIR Med Inf 9. Zhang SW, Feng YH, Shi JY (2020) DPDDI: a deep predictor for drug-drug interactions. Bioinformatics 21:419 10. Eslahchi C, Rohani N (2019) Drug-drug interaction predicting by neural network using integrated similarity. Sci Rep 9:13645 11. Ayvaz S, Dere S (2020) Prediction of drug-drug interactions by using profile fingerprint vectors and protein similarities. Healthc Inf Res 42–49 12. Zang T et al (2022) CNN-DDI: a learning-based method for predicting drug–drug interactions using convolution neural networks. Bioinformatics 23:88 13. Kim HU, Ryu JY, Lee SY (2018) Deep learning improves prediction of drug–drug and drug– food interactions. PNAS 14. Rawat P et al (2020) Efficient prediction of drug–drug interaction using deep learning models. IET Sys Bio 14(4):211–216 15. Yin P-W et al. (2021) Prediction of the drug–drug interaction types with the unified embedding features from drug similarity networks. Front Pharmacol 16. Almustafa Khaled M (2020) Prediction of heart disease and classifiers’ sensitivity analysis. Bioinformatics 21:278 17. Perez Gonzalez NA, Schwarz K, Allam A et al. (2021) AttentionDDI: Siamese attention-based deep learning method for drug–drug interaction predictions. BMC Bioinf 22:412 18. Sayeeda Z, Feunang YD, Assempour N et al. (2018) DrugBank 5.0: a major update to the drug bank database for 2018. Nucleic Acids Res 46(D1)
Dynamic Load Scheduling Using Clustering for Increasing Efficiency of Warehouse Order Fulfillment Done Through Pick and Place Bots Cysil Tom Baby
and Cyril Joe Baby
Abstract The domain of warehouse automation has been picking up due to the vast developments in e-commerce, owing to growing demand and the need to improve customer satisfaction. One crucial component that needs to be integrated into large warehouses is the automated pick and place of orders from the storage facility using automated vehicles integrated with a forklift (Pick and Place bots). Even with automation being employed, there is a lot of room for improvement in the current technology, as the loading of the bots is inefficient and not dynamic. This paper discusses a method to dynamically allocate load between the Pick and Place bots in a warehouse during order fulfillment. This dynamic allocation is done using clustering, an unsupervised Machine Learning algorithm; in particular, fuzzy C-means clustering is used to improve the efficiency of warehouse automation. The discussed algorithm improves the efficiency of order fulfillment significantly, which is demonstrated in this paper using multiple simulations showing around a 35% reduction in order fulfillment time and around a 55% increase in efficiency.

Keywords Order picking · Dynamic order allocation · Machine Learning · Swarm robotics · Clustering · Fuzzy C-means clustering · Master Bot · Slave Bot
C. T. Baby (B) CHRIST (Deemed to be University), Bangalore 560029, India
e-mail: [email protected]
C. J. Baby Fupro Innovation Private Limited, Mohali 160055, India

1 Introduction

Warehouses play a vital role in the twenty-first century. Today, most e-commerce companies want to ship orders as quickly as possible to increase customer satisfaction. Companies that guarantee one-day delivery rely on the fast retrieval of goods from warehouses and fast sorting for dispatch. For large warehouses, hiring people to perform such activities is not a good decision; instead, companies are going for automation. There are various automation solutions for warehouses,
but there is no solution available that can individually pick and place orders from warehouse racks to receiving or dispatching counters. The order picking or order preparation operation is one of a logistic warehouse's processes. It consists of taking and collecting articles in a specified quantity before shipment to satisfy customers' orders. It is a basic warehousing process and has an important influence on the supply chain's productivity, which makes order picking one of the most controlled logistic processes and one of the warehouse management system's functionalities. To achieve lower warehousing costs and accurate and faster order fulfillment, higher inventory traceability and intelligent automation systems must be used [1, 2]. One such automated system is the smart butler used by gray orange (see Fig. 1, left), which picks up entire stations with multiple storage units and can move at a relatively high speed. One major disadvantage of this smart butler is that it cannot carry individual units from one place to another without carrying the entire station. Another example of an intelligent automated robot is the pick-up robot by Magazino (see Fig. 1, right). It is a smart bot that picks up the required inventory and places it on the shelves built within it. When its shelves are full, it goes to the unloading station. One major drawback of this design is the limited number of shelves, which limits the efficiency of the bot. This drawback can be solved by using more bots. Another way to solve this issue is through a Master–Slave Bot mechanism, which is discussed in this paper. Another drawback of this system is that it is inefficient in the case of unequal workloads between the different Master Bots. Let us assume that there are 4 Master Bots working in a warehouse for order collection. In an ideal situation, for a 100-item order, each Master Bot picks up 25 items to fulfil the order. In current warehousing solutions, the Master Bots are all assigned to respective regions of the warehouse, and each Master Bot operates and collects order items in its own region. In practical situations, the order items are not equally divided between the different regions, and hence one of the Master Bots will be loaded more and will become a bottleneck in the process. This makes the solution inefficient. This paper discusses a more efficient system to solve this drawback.
Fig. 1 Left: Butler bot by GreyOrange and right: pick-up robot by Magazino
2 Proposed Methodology
Using the concepts of Swarm Robotics and Machine Learning, we dynamically distribute the workload among the Master Bots. In a traditional warehouse with Swarm Robotics, the bots are assigned predefined regions of the warehouse. Once a set of orders arrives, the job is distributed among the Master Bots based on the region from which each item has to be collected [3, 4]. The disadvantage of this approach is that a bot with fewer items to pick finishes its task earlier and remains idle. This problem can be solved by dynamically distributing the task among the bots. The flow diagram of the proposed methodology is shown in Fig. 2, and an illustrative sketch of the dispatch loop follows the list below.
1. Orders are obtained.
2. Once the order buffer is reached, the set of orders is given to the bots.
3. The job of collecting the orders is dynamically divided among the Master Bots using Machine Learning (clustering).
4. Each Master Bot gets its own set of order items to fetch.
5. A Slave Bot is allocated to each Master Bot and follows its Master Bot to store the collected products.
6. The Master Bot goes to the respective shelves, collects the products, and places them in the Slave Bot.
7. Once the Slave Bot's capacity is full, it goes to the unloading area, while another Slave Bot is deployed to follow its Master Bot.
8. Once the task is completed, the next buffer of orders is taken up for execution. The task is considered completed when all the order items have reached the unloading area.
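A minimal Python sketch of the order-buffering and dispatch loop described in the list above. The names used here (OrderBuffer-style deque, cluster_items, fulfil) are hypothetical placeholders for illustration, not the paper's implementation; the buffer size and bot count are assumptions.

```python
from collections import deque

BUFFER_SIZE = 100          # assumed buffer threshold before a batch is dispatched
NUM_MASTER_BOTS = 4        # assumed number of Master Bots

def dispatch_loop(incoming_orders, cluster_items, master_bots):
    """Buffer incoming order items and hand each Master Bot its own cluster.

    `incoming_orders` yields (item_id, x, y) tuples, `cluster_items` is any
    function that splits a batch into len(master_bots) groups (e.g. the fuzzy
    C-means allocation sketched in Sect. 2.3), and each Master Bot exposes a
    blocking `fulfil(items)` method -- all of these are assumed interfaces.
    """
    buffer = deque()
    for item in incoming_orders:
        buffer.append(item)
        if len(buffer) < BUFFER_SIZE:
            continue
        batch = [buffer.popleft() for _ in range(BUFFER_SIZE)]
        groups = cluster_items(batch, n_groups=len(master_bots))
        # Each Master Bot (with its relay of Slave Bots) works its own group;
        # the next batch is taken up only once every group reaches unloading.
        for bot, group in zip(master_bots, groups):
            bot.fulfil(group)
```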
2.1 Master–Slave Bot Mechanism In a Master–Slave Bot mechanism, there are two types of Bots used in order collection in the warehouse—a Master Bot and a Slave Bot. The Master Bot, as the name suggests, would be the one collecting the inventory and placing it in the Slave Bot
Fig. 2 Flow diagram of the proposed methodology
318
C. T. Baby and C. J. Baby
Fig. 3 Master bot (left) and slave bot (right)
one by one. The Master Bot is equipped with a forklift mechanism to pick and place the items of the order, while the Slave Bot is equipped with shelves on which the Master Bot places the collected items. This mechanism solves the problem of the picking bot having to travel to the order unloading area whenever its shelving capacity is reached [5]. The Master Bot keeps picking and placing the items of the order. Whenever the Slave Bot's capacity is almost reached, the Slave Bot signals another Slave Bot to take its place. Once a new Slave Bot reaches the Master Bot, the old Slave Bot goes to the order unloading area and unloads the order. This relaying between Slave Bots whenever the current Slave Bot is full continues until the entire order assigned to the particular Master Bot is fulfilled. The Master and Slave Bots communicate and coordinate with each other through Swarm Robotics. The Master Bots control the functioning of the Slave Bots, and each Master Bot, along with multiple Slave Bots, works together to fulfil the quantum of orders assigned to that particular Master Bot [6]. The schematic diagrams of the Master and Slave Bots are shown in Fig. 3.
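The relay behaviour can be pictured with a short, illustrative sketch. The capacity value, the class layout and the assumption that the idle Slave Bot pool never runs dry are all simplifications for illustration, not the authors' implementation.

```python
class SlaveBot:
    """Illustrative shelf-carrying Slave Bot with a fixed shelf capacity."""

    def __init__(self, capacity=12):
        self.capacity = capacity      # assumed number of shelf slots
        self.shelves = []

    def is_full(self):
        return len(self.shelves) >= self.capacity

    def receive(self, item):
        self.shelves.append(item)


class MasterBot:
    """Illustrative Master Bot that picks items and relays full Slave Bots."""

    def __init__(self, slave_pool):
        # Pool of idle Slave Bots waiting near the unloading area;
        # assumed to be large enough for the assigned workload.
        self.slave_pool = slave_pool
        self.current_slave = slave_pool.pop()

    def fulfil(self, items):
        delivered = []
        for item in items:
            self.current_slave.receive(item)  # forklift places the picked item
            if self.current_slave.is_full():
                # The full Slave Bot leaves for the unloading area and a fresh
                # one takes its place, so the Master Bot never stops picking.
                delivered.extend(self.current_slave.shelves)
                self.current_slave = self.slave_pool.pop()
        delivered.extend(self.current_slave.shelves)  # last partial load
        return delivered
```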
2.2 Dynamic Order Allocation
As noted above, a traditional Swarm Robotics warehouse assigns each bot a predefined region, and once a set of orders arrives the job is distributed among the bots based on the region from which each item has to be collected. Bots that receive fewer orders finish earlier and remain idle, so we instead distribute the task dynamically among the Master Bots. This is done using clustering, an unsupervised Machine Learning algorithm: fuzzy C-means clustering is used to dynamically allocate the order items among the Master Bots. This clustering step divides the order items evenly between the Master Bots and thereby increases the overall efficiency of order fulfillment. Each order item location is treated as an (x, y) point on a 2D plot, and the number of clusters is predefined and equal to the number of Master Bots; clustering performs very well at partitioning such 2D data into a predefined number of clusters, so fuzzy C-means can divide the order items evenly between the Master Bots. Hierarchical clustering methods are avoided because a few points forming a bridge between two clusters can cause those clusters to be merged, and elongated clusters may be split with portions of adjacent elongated clusters merged [7, 8]. Fuzzy C-means clustering is used to cluster multi-dimensional data by assigning each data point a membership value in each cluster center, with values ranging from 0 to 1. This method is more powerful than traditional hard-threshold clustering (K-means), where each point is assigned a crisp, exact label (hard clustering). The algorithm works by assigning a membership value to each data point for each cluster center based on the distance between them: the closer a data point lies to a cluster center, the higher its membership value for that center. See Fig. 4 for an example of fuzzy C-means clustering.
Fig. 4 Example of fuzzy C-means clustering
Figure 4 illustrates how the data are clustered: the three clusters are separated, and each data point is assigned a membership value with respect to each of the three cluster centroids.
2.3 Fuzzy C-means Clustering Formula

$$\mu_{ij} = \frac{1}{\sum_{k=1}^{c}\left(\frac{d_{ij}}{d_{ik}}\right)^{\frac{2}{m-1}}} \quad (1)$$

$$v_j = \frac{\sum_{i=1}^{n}(\mu_{ij})^{m} x_i}{\sum_{i=1}^{n}(\mu_{ij})^{m}}, \quad \forall\, j = 1, 2, \ldots, c \quad (2)$$

Here, 'n' is the number of data points, 'v_j' represents the jth cluster center, 'm' is the fuzziness index with m ∈ [1, ∞), 'c' is the number of cluster centers, 'μ_ij' is the membership of the ith data point to the jth cluster center, and 'd_ij' is the Euclidean distance between the ith data point and the jth cluster center.

$$J(U, V) = \sum_{i=1}^{n}\sum_{j=1}^{c}(\mu_{ij})^{m}\,\|x_i - v_j\|^{2} \quad (3)$$

where ||x_i − v_j|| is the Euclidean distance between the ith data point and the jth cluster center.

Algorithm. The main objective of this algorithm is to reduce data density in a single cluster by dividing the cluster elements into equal proportions across the total number of clusters being formed [9, 10].
• Consider X = {x_1, x_2, …, x_n} the set of data points and V = {v_1, v_2, …, v_c} the set of cluster centers.
• Randomly select the c cluster centers, as this algorithm is unsupervised.
• Calculate the fuzzy membership μ_ij using Eq. (1).
• Compute the fuzzy centers v_j using Eq. (2).
• The previous two steps are repeated until the minimum value of J is achieved or ||U^(k+1) − U^(k)|| < β, where k is the iteration step, β ∈ [0, 1] is the termination criterion, and U = (μ_ij)_{n×c} is the fuzzy membership matrix.
'J' is the objective function [11–13].
Advantages
• Compared to the K-means algorithm, fuzzy C-means works better on overlapped datasets [14, 15].
• Unlike K-means, where each data point must belong exclusively to one cluster center (a crisp set), each data point is assigned a membership value and may therefore partially belong to more than one cluster [16–18].
Disadvantages
• Lowering the value of β can achieve better results, but at the expense of more iterations and longer run time [19–21].
• The underlying factors could be unequally weighted because the Euclidean distance is used [22, 23].
• The number of clusters has to be predefined; automatic detection of the number of clusters is complex [24, 25].
In warehousing, clustering can be used for the dynamic allocation of workload during order fulfillment. In this paper, order fulfillment is used as a case study to show the results of applying the fuzzy C-means clustering algorithm to make the process optimized and more efficient.
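A compact NumPy sketch of Eqs. (1)–(3) and of the allocation step, assuming order-item locations are given as (x, y) coordinates and four Master Bots. This is an illustrative re-implementation under those assumptions, not the authors' code.

```python
import numpy as np

def fuzzy_c_means(points, c, m=2.0, beta=1e-4, max_iter=100, seed=0):
    """Cluster 2D points with fuzzy C-means (Eqs. 1-3); return centers, memberships, J."""
    rng = np.random.default_rng(seed)
    n = len(points)
    u = rng.random((n, c))
    u /= u.sum(axis=1, keepdims=True)                # valid membership matrix U
    for _ in range(max_iter):
        um = u ** m
        centers = (um.T @ points) / um.sum(axis=0)[:, None]           # Eq. (2)
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                         # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)                  # Eq. (1)
        if np.linalg.norm(new_u - u) < beta:          # ||U(k+1) - U(k)|| < beta
            u = new_u
            break
        u = new_u
    objective = np.sum((u ** m) * d ** 2)             # Eq. (3)
    return centers, u, objective

# Allocation: each order-item location goes to the Master Bot whose cluster
# gives it the highest membership value.
items = np.random.default_rng(1).uniform(0, 50, size=(100, 2))   # 100 item locations
centers, u, J = fuzzy_c_means(items, c=4)                        # 4 Master Bots
assignment = u.argmax(axis=1)                         # Master Bot index per item
print([int((assignment == k).sum()) for k in range(4)])
```

Note that taking the arg-max membership does not by itself force the four groups to be exactly equal in size; enforcing the equal split described in the paper would require an additional balancing step on top of this sketch.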
3 Computational Experimental Results
3.1 Case 1. A Warehouse with 108 Products Shelved in 12 Two-Sided Shelves (See Fig. 5)
Environment Details
Number of Bots: 4
Simulation: An order of 100 products.
Traditional Warehousing. This job scheduling scenario is simulated using the Arena simulation software. The distances between each pair of products are found using a shortest path algorithm and fed to Arena as an Excel sheet. The Arena simulation gives the total time taken by each bot. The division of tasks among the bots is shown in Fig. 6, and Table 1 gives the order delivery time for each bot. As seen here, the job scheduling is not optimized: bot 2 takes 51 min, while all other bots finish collecting their orders within 27.5 min and remain idle for the rest of the time.
Optimized Warehousing. This job scheduling scenario is simulated using the Arena simulation software. The distances between each pair of products are found using a shortest path algorithm and fed to Arena as an Excel sheet. The simulation on the Arena
Fig. 5 Warehouse layout with different density distribution of products for Case 1
Fig. 6 Division of tasks between the BOTs (Case 1: Traditional Warehousing)
software gives the total time taken by each bot. The division of tasks among the bots is shown in Fig. 7, and Table 2 gives the order delivery time for each bot.
Table 1 Time for delivery of order for each BOT (Case 1: Traditional Warehousing)

Bot   Number of items picked   Delivery time (approx.) (minutes)
1     22                       27.5
2     49                       51
3     13                       23.5
4     16                       15
Fig. 7 Division of tasks between the BOTs (Case 1: Optimized Warehousing)
Table 2 Time for delivery of order for each BOT (Case 1: Optimized Warehousing)

Bot   Number of items picked   Delivery time (approx.) (minutes)
1     23                       27
2     26                       28
3     25                       33
4     26                       28
As seen in this case, the bots take a similar amount of time to finish their jobs, and the idle time is close to zero. The algorithm reduces the delivery time of an order of 100 products by approximately 18 min.
Fig. 8 Warehouse layout with different density distribution of products for Case 2
3.2 Case 2. A Warehouse with 30 Products Shelved in 6 One-Sided Shelves (See Fig. 8)
Environment Details
Number of Bots: 3
Simulation: An order of 50 products.
Traditional Warehousing. This job scheduling scenario is simulated using the Arena simulation software. The distances between each pair of products are found using a shortest path algorithm and fed to Arena as an Excel sheet. The Arena simulation gives the total time taken by each bot. The division of tasks among the bots is shown in Fig. 9, and Table 3 gives the order delivery time for each bot. As seen here, the job scheduling is not optimized: bot 1 takes 22 min, while the other bots finish collecting their orders within 12.5 min and remain idle thereafter.
Optimized Warehousing. This job scheduling scenario is simulated using the Arena simulation software. The distances between each pair of products are found using a shortest path algorithm and fed to Arena as an Excel sheet. The Arena simulation gives the total time taken by each bot. The division of tasks among the bots is shown in Fig. 10, and Table 4 gives the order delivery time for each bot. As seen in this case, the bots take a similar amount of time to finish their jobs, and the idle time is reduced. The algorithm reduces the delivery time of an order of 50 products by approximately 8 min. The results of both cases show that the optimized warehousing solution reduces the order delivery time and the idle time of the bots.
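The product-to-product distances referred to in both cases can be approximated with a grid model of the warehouse floor. The sketch below assumes the networkx package, a unit-cell grid, and made-up shelf and pick locations; it writes a CSV file that could then be imported into the simulation tool (the paper feeds an Excel sheet to Arena).

```python
import csv
import networkx as nx

def distance_matrix(grid_shape, blocked_cells, product_cells, out_path="distances.csv"):
    """Shortest-path distances between product locations on a warehouse grid.

    `grid_shape`, `blocked_cells` (shelf footprints) and `product_cells`
    (pick faces, as (row, col) tuples) are assumed inputs; the aisle layout of
    the real warehouse would replace this toy grid.
    """
    rows, cols = grid_shape
    G = nx.grid_2d_graph(rows, cols)            # 4-connected aisle grid
    G.remove_nodes_from(blocked_cells)          # cells occupied by shelving
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["from", "to", "distance"])
        for a in product_cells:
            lengths = nx.single_source_shortest_path_length(G, a)
            for b in product_cells:
                writer.writerow([a, b, lengths.get(b, -1)])   # -1: unreachable

# Example: a 10 x 12 grid with one shelf block and three pick locations.
distance_matrix((10, 12),
                blocked_cells=[(r, 5) for r in range(2, 8)],
                product_cells=[(2, 4), (7, 6), (9, 11)])
```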
Fig. 9 Division of tasks between the BOTs (Case 2: Traditional Warehousing)

Table 3 Time for delivery of order for each BOT (Case 2: Traditional Warehousing)

Bot   Number of items picked   Delivery time (approx.) (minutes)
1     24                       22
2     10                       9
3     16                       12.5
Fig. 10 Division of tasks between the BOTs (Case 2: Optimized Warehousing)
Table 4 Time for delivery of order for each BOT (Case 2: Optimized Warehousing)

Bot   Number of items picked   Delivery time (approx.) (minutes)
1     17                       11
2     17                       14
3     16                       12.5
Table 5 Time saved by optimization for each case

Case   Number of products in order   Time saved by optimization (minutes)   Percentage reduction in order fulfillment time (%)
1      100                           18                                     35.29
2      50                            8                                      36.36
Table 6 Increase in efficiency for each case

Case   No. of orders fulfilled in 1 h (traditional warehousing)   No. of orders fulfilled in 1 h (optimized warehousing)   Percentage increase in efficiency using proposed algorithm (%)
1      117                                                        181                                                      54.70
2      136                                                        214                                                      57.35
4 Conclusions
The results of both cases show that the optimized warehousing solution reduces the order delivery time and the idle time of the bots. Hence, the proposed algorithm optimizes the current warehousing solution in a very effective manner. Using this dynamic workload allocation algorithm, an approximately equal number of order items can be assigned to each Master Bot, so the entire order fulfillment process becomes optimized and efficient (Tables 5 and 6). With the clustering algorithm running on the bots, they can be deployed in very large warehouses to reduce the total time taken to collect orders. The approach can also be implemented in large libraries and supermarkets for quicker response and better efficiency. Complete automation of warehouses using Swarm Robotics would improve the handling of demand over time, as the bots are scalable, reliable and flexible.
5 Limitations and Future Works
1. One obvious drawback of the Master Bot–Slave Bot system is its limited shelving capacity, even though it is better than existing models. The Master Bot can be modified to increase its load-carrying capacity, and the capacity of the Slave Bots can be adjusted to the industry in which the system is used.
2. The number of clusters cannot be optimized within fuzzy C-means itself, which can waste resources for orders with a small number of items to pick up. This can be addressed by considering a set of n − 1 scenarios, employing 2 to n clusters, where n is the number of Master Bots assigned to the warehouse, and running an algorithm to determine the best case before passing it on to the proposed algorithm.
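One possible realisation of this pre-selection step is sketched below, assuming the scikit-fuzzy package's cmeans interface (which returns the fuzzy partition coefficient, FPC, as its last value). Choosing FPC as the selection criterion is an assumption of this sketch; the paper does not fix a criterion.

```python
import numpy as np
import skfuzzy as fuzz

def pick_cluster_count(item_xy, n_master_bots, m=2.0):
    """Try c = 2 .. n Master Bots and keep the c with the best partition quality."""
    data = np.asarray(item_xy, dtype=float).T       # skfuzzy expects features x samples
    best_c, best_fpc = 2, -np.inf
    for c in range(2, n_master_bots + 1):
        cntr, u, u0, d, jm, p, fpc = fuzz.cluster.cmeans(
            data, c, m, error=0.005, maxiter=1000)
        if fpc > best_fpc:                          # FPC near 1 = crisp, well-separated
            best_c, best_fpc = c, fpc
    return best_c

# Example with 40 random item locations and up to 4 Master Bots.
rng = np.random.default_rng(0)
print(pick_cluster_count(rng.uniform(0, 50, size=(40, 2)), n_master_bots=4))
```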
References
1. Reaidy PJ, Gunasekaran A, Spalanzani A (2015) Bottom-up approach based on Internet of Things for order fulfillment in a collaborative warehousing environment. Int J Prod Econ 159:29–40
2. Poudel DB (2013) Coordinating hundreds of cooperative, autonomous robots in a warehouse. Jan 27(1–13):26
3. Yang P, Zhao Z, Shen ZJM (2021) A flow picking system for order fulfillment in e-commerce warehouses. IISE Trans 53(5):541–551
4. Liang C, Chee KJ, Zou Y, Zhu H, Causo A, Vidas S, Cheah CC et al. (2015) Automated robot picking system for e-commerce fulfillment warehouse application. In: The 14th IFToMM World Congress
5. Li G, Lin R, Li M, Sun R, Piao S (2019) A master-slave separate parallel intelligent mobile robot used for autonomous pallet transportation. Appl Sci 9:368. https://doi.org/10.3390/app9030368
6. Anand A, Nithya M, Sudarshan TSB (2015) Coordination of mobile robots with master-slave architecture for a service application. In: Proceedings of 2014 international conference on contemporary computing and informatics, IC3I 2014, pp 539–543. https://doi.org/10.1109/IC3I.2014.7019647
7. Rokach L, Maimon O (2005) Clustering methods. In: The data mining and knowledge discovery handbook, pp 321–352
8. Prakash J, Kumar BV (2021) An empirical analysis of hierarchical and partition-based clustering techniques in optic disc segmentation. In: Sharma H, Saraswat M, Kumar S, Bansal JC (eds) Intelligent learning for computer vision. CIS 2020. Lecture notes on data engineering and communications technologies, vol 61. Springer, Singapore. https://doi.org/10.1007/978-981-33-4582-9_7
9. Ruspini EH (1969) A new approach to clustering. Inf Control 15(1):22–32
10. Madhulatha TS (2012) An overview on clustering methods. arXiv preprint arXiv:1205.1117
11. Rokach L, Maimon O (2015) Clustering methods. In: Data mining and knowledge discovery handbook. Springer, Boston, pp 321–352
12. Omran MG, Engelbrecht AP, Salman A (2007) An overview of clustering methods. Intell Data Anal 11(6):583–605
13. Bezdek JC, Ehrlich R, Full W (1984) FCM: the fuzzy C-means clustering algorithm. Comput Geosci 10(2–3):191–203
14. Ghosh S, Dubey SK (2021) Comparative analysis of K-means and fuzzy C-means algorithms. Int J Adv Comput Sci Appl 4(4)
15. Cannon RL, Dave JV, Bezdek JC (1986) Efficient implementation of the fuzzy C-means clustering algorithms. IEEE Trans Pattern Anal Mach Intell 2:248–255
16. Saxena A, Prasad M, Gupta A, Bharill N, Patel OP, Tiwari A, Lin CT et al. (2017) A review of clustering techniques and developments. Neurocomputing 267:664–681
17. Narayanan SJ, Soundrapandiyan R, Perumal B, Baby CJ (2019) Emphysema medical image classification using fuzzy decision tree with fuzzy particle swarm optimization clustering. In: Smart intelligent computing and applications. Springer, Singapore, pp 305–313
18. Narayanan SJ, Perumal B, Baby CJ, Bhatt RB (2019) Fuzzy decision tree with fuzzy particle swarm optimization clustering for locating users in an indoor environment using wireless signal strength. In: Harmony search and nature inspired optimization algorithms. Springer, Singapore, pp 217–225
19. Narayanan SJ, Baby CJ, Perumal B, Bhatt RB, Cheng X, Ghalib MR, Shankar A (2021) Fuzzy decision trees embedded with evolutionary fuzzy clustering for locating users using wireless signal strength in an indoor environment. Int J Intell Syst 36(8):4280–4297
20. Itagi A, Baby CJ, Rout S, Bharath KP, Karthik R, Rajesh Kumar M (2021) Lisp detection and correction based on feature extraction and random forest classifier. In: Microelectronics, electromagnetics and telecommunications. Springer, Singapore, pp 55–64
21. Sundar S, Baby CJ, Itagi A, Soni S (2020) Adaptive sensor ranking based on utility using logistic regression. In: Soft computing for problem solving. Springer, Singapore, pp 365–376
22. BitBio Homepage. https://2-bitbio.com/. Accessed on 11 Feb 2022
23. Baby CJ, Singh H, Srivastava A, Dhawan R, Mahalakshmi P (2017) Smart bin: an intelligent waste alert and prediction system using machine learning approach. In: International conference on wireless communications, signal processing and networking (WiSPNET). IEEE, pp 771–774
24. Baby CJ, Mazumdar A, Sood H, Gupta Y, Panda A, Poonkuzhali R (2018) Parkinson's disease assist device using machine learning and Internet of Things. In: International conference on communication and signal processing (ICCSP). IEEE, pp 0922–0927
25. Baby CJ, Das KJ, Venugopal P (2020) Design of an above knee low-cost powered prosthetic leg using electromyography and machine learning. In: Soft computing for problem solving. Springer, Singapore, pp 339–348
Deploying Fact-Checking Tools to Alleviate Misinformation Promulgation in Twitter Using Machine Learning Techniques
Monikka Reshmi Sethurajan and K. Natarajan
Abstract In the present era, a growing portion of our lives is spent interacting online with social media platforms, thanks to the adoption of the latest technology and the proliferation of smartphones. Obtaining news from social media platforms is quicker, easier and cheaper than from traditional media such as television and newspapers. As a consequence, social media is also being exploited to spread misinformation. This study constructs a fake-advertisement corpus (FakeAds) comprising tweets about product advertisements. The objective of the FakeAds corpus is to explore the impact of misinformation in advertising and marketing material for particular products, and to study which kinds of products are targeted most on Twitter to draw consumers' attention. The products include cosmetics, fashion, health and electronics. The corpus is novel both in its topic (the role of Twitter in spreading misinformation related to product promotion and advertising) and in its fine-grained annotations. The annotation guidelines were framed under the guidance of domain experts, and the annotation was performed by two domain experts, resulting in high-quality annotation with an agreement F-score as high as 0.976 for text classification. Keywords Fake news · Fact-checking · Misinformation · Social media · Twitter
1 Introduction
Social media is a very fast and easy-to-access channel for disseminating news and, every second of the day, huge numbers of people access and interact with online news [1]. Over the years, social media platforms, including Twitter, Facebook, YouTube, and Instagram, have become an integral part of our day-to-day lives [2]. Many people now find and gather news from social media rather than from
traditional news organizations, as an increasing share of our time is spent online on social media channels. One of the most familiar social media platforms is Twitter, whose number of subscribers has been rising rapidly since its establishment in 2006 [3]. In recent years, Twitter has provided a panel where individuals can interact with others and sustain social ties. Individuals use Twitter to share their day-to-day activities, happenings, thoughts, and feelings with their contacts, which makes Twitter both a valuable source of data and a major target for a vast range of research areas and practice. According to a report published in 2021 [4], Twitter has 340 million subscribers and carries over 500 million tweets a day and 200 billion tweets a year [4, 5]. It is fast and easy to use social media to obtain the latest news or to see advertisements for different products [6], and any news spreads much faster via social media, no matter where in the world an event takes place [1]. However, regardless of the benefits offered by social media platforms, the credibility and quality of news on social media is low in comparison with traditional news channels, including TV, newspapers, and other trusted news sources, owing to the freedom afforded to social media channels to express (false) ideas and circulate (fake) news and (misleading) adverts [1]. Therefore, social media encourages the vast and rapid dissemination of "fake news" that intentionally contains misinformation. Indeed, one survey report [7] found that around 60% of subscribers expected news on social media platforms to be false, yet millions of people who share misinformation through retweets believe it to be real. Twitter is widely used to spread false information promoting products and brands. For example, in the US alone, 60% of adults who depend on social media platforms for news consumption also share misinformation [5, 8]. Individuals receive advertisements on social media based on their interests and their awareness of the facts and content mentioned in the circulated advertisements. Around 54% of people around the globe have expressed concerns about misinformation [1]. Additionally, the younger generation is more heavily influenced by online news than older generations, which further accelerates the spread of news to millions and billions of users [9]. Online advertisements also tend to target the younger generations and try to promote products relevant to their lifestyles, such as skincare products and technological gadgets, in eye-catching ways to reach as many people as possible around the globe [1]. Widespread misinformation has the potential to have a large negative impact on society and individuals [10]. For example, in 2008, a false report regarding the bankruptcy of United Airlines' parent company caused the stock price to drop by 76%. Twitter [11, 12] was widely used to spread fake and biased news during the last two US presidential election periods [13]. Following the last presidential election, it was estimated that about 1 million tweets were related to the false news story "Pizzagate". Thus, in 2016, the Macquarie dictionary named "fake news" its word of the year [3]. Misinformation is created for a variety of reasons, but mainly for financial and political gain [3]. For example, as most misinformation is spread
by propagandists, it usually conveys influential messages and persuades individuals, in different ways, to accept biased or false information [3, 14, 15]. From a marketing perspective, misinformation also presents false information to promote a specific idea or product; if spread with malicious intent, it can be used by a competitor to damage the reputation of a specific brand or company. According to Twitter's policy, a warning tag is applied to any tweet containing doubtful or misleading information about the COVID-19 pandemic that goes against the guidance on COVID-19 from authoritative sources, and Twitter is still working on the public conversation to make sure that credible and authentic information is available to users [16]. Misinformation detection on social media in general, and on Twitter in particular, has therefore become a growing research topic that attracts considerable attention [3]. Although considerable effort has been made toward misinformation detection on websites and in news articles, very little effort has been put into exploring Twitter and, to the best of our knowledge, no prior work has focused on the influence of misinformation on marketing and promoting products, solely on Twitter, in order to tackle the rise and spread of false information and to enhance the automatic detection of misinformation on this specific social media platform. To enable research into fake news detection on Twitter about misleading advertisements for different products that target the consumer, and to help alleviate the negative impacts caused by misinformation, both for consumers and for the news ecosystem, it is crucial to develop techniques to automatically detect false news on social media platforms. Machine learning (ML)-based text mining (TM) tools have the potential to automatically detect misinformation related to product marketing. However, developing TM tools depends on textual corpora in which relevant news is marked up by expert annotators. Such annotated corpora serve both as training datasets for ML-based named entity recognition methods and as gold standards for the systematic assessment of new methodologies. This research aims to explore how Twitter is used to disseminate false marketing information through deliberately misleading or fake adverts; the contribution of this research is threefold:
• To utilize Twitter as a social media resource to explore the use of fake and false news to promote products.
• To build annotated datasets of fake and real advertisements related to cosmetics, fashion, health, and technology products.
• To make the corpus freely available in order to encourage the development of ML-based Text Mining (TM) systems for the automatic extraction and classification of details relating to misinformation intended to mislead the consumer by promoting false products. The developed TM systems can ultimately be supportive data resources for the research community, in addition to the study of social media credibility in promoting products and circulating fake advertisements.
2 Literature Survey
Although numerous manual fact-checking websites have emerged to investigate whether news is real, manual checking does not scale with the volume and speed of online information dissemination, particularly on social media platforms. Automated fact-checking applications have emerged to overcome this issue and to address the need for automation and scalability. Yet present approaches lack a comprehensive dataset with multi-dimensional information for sensing the characteristics of false information, which limits the precision achievable by machine learning models. To address this limitation, Twitter social media data have been collected and transformed to identify additional important attributes that affect the accuracy of machine learning methods in deciding whether news is fake or real using a data mining approach; that study presents the mechanisms for identifying the important attributes of tweets and an application architecture for systematically automating the characterization of online news [17]. Using a dataset gathered from Signal Media and a list of sources from OpenSource.co, one study applied term frequency-inverse document frequency (TF-IDF) of bi-grams and probabilistic context-free grammar (PCFG) detection to a corpus of around 11,000 articles. The dataset was tested on multiple classification algorithms: bounded decision trees, gradient boosting, support vector machines, random forests, and stochastic gradient descent. The authors found that TF-IDF of bi-grams fed to a stochastic gradient descent model identified non-credible sources with 77.2% accuracy, with PCFGs having some impact on recall [18]. Likewise, other authors have given a detailed discussion of the tools and extensions already available for detecting false information, presented numerous systems formulated by researchers to counter false information, and reviewed several fact-checking websites that help social media subscribers verify the information they encounter; such work helps the public understand general tactics for identifying false information. On an existing Kaggle dataset, LSTM and Bi-LSTM classifiers were run, and an accuracy of 91.51% was attained using the Bi-LSTM classifier [18]. The concept-based mining model using threshold (CMMT) and the fuzzy similarity-based concept mining model using feature clustering (FSCMM-FC) are two text categorization methods presented for fact-checking. These methods use feature extraction and reduction, train the classifier, and then classify the documents using support vector machines after preprocessing them at the sentence, document, and integrated corpus levels. While FSCMM-FC reduces the features by locating the feature points using fuzzy C-means, CMMT eliminates the less frequent features by applying a threshold to the extracted features. The experimental findings showed that CMMT and FSCMM-FC achieved feature reductions of 95.8% and 94.695% and classification accuracies of 85.41% and 93.43%, respectively [19].
Machine learning techniques, particularly supervised learning, have been proposed for the detection of false information. Specifically, a dataset of real and fake news was employed to train a machine learning model using the Scikit-learn library in Python. Features were extracted from the dataset using textual representation models such as TF-IDF, bi-gram frequency, and Bag-of-Words. Two classification approaches, linear and probabilistic, were tested on the content and the title to check whether the content is real or fake and whether the title is clickbait or non-clickbait, respectively. The outcome of the experiments was that linear classification performs best with the TF-IDF model for content classification, while bi-gram frequency gave lower accuracy for title classification than TF-IDF and Bag-of-Words [20]. The problem statement and the characteristics of progress in false information detection are also discussed; a common taxonomy is provided in [21]:
• Propaganda: news stories that generally have political motives and attempt to push the subscriber toward an agenda.
• Clickbait: news articles that are generally false and are created to increase Internet traffic to the page and thereby add to revenue.
• Opinion or commentary: articles in which the author generally attempts to shape readers' opinions about certain later events.
Semantic features comprise semantic scores and opinion words; semantic evaluation requires text semantic mining, which is challenging. Statistical features consist of simple statistics of the news article such as word count, punctuation, emoticons used, and hashtag topics [22]. Analysis of explicit and implicit profile features has highlighted the correlation between real or fake news and the profiles of subscribers [23]. A classification model employs features to identify false Twitter threads, in which non-expert crowdsourced workers rather than journalists are leveraged in PHEME and CREDBANK [24]. The dataset was tested over several machine learning algorithms and the results compared; the stochastic gradient descent model performed best using the TF-IDF feature set. Check-It uses text features extracted from the article's headline and body, which have been used broadly to detect false information [25, 26]. For the detection of fake news in Facebook news posts, a Naïve Bayes classifier has been employed; about 74% accuracy was attained, and ways to enhance the classifier were discussed [3, 27]. Comparative investigations across several studies indicate that feature extraction and support vector machines have become the preferred choices among existing systems, with respective usage rates of 43.3% and 26.08% [28]. Recent studies have also concentrated on semi-supervised INLDA, which is utilized to extract multi-word characteristics from text reviews and categorize the text to identify bogus news [29].
Several studies have provided a remarkable understanding of how features are utilized in the decisions made by models, and an unbiased exploration of XGB models has been performed. Misinformation detection is a worldwide issue across many nations and languages, and numerous researchers attempt to establish remedies for multi-language misinformation detection by constructing training models and fresh datasets over datasets in various languages.
3 Existing Methodology
3.1 Corpus Construction
The FakeAds corpus consists of tweets collected from Twitter using the Tweet Scraper tool for the period between January 1, 2017 and December 30, 2021. We targeted this five-year span because product marketing through social media was very common during this period. The following list of keywords was employed to retrieve relevant tweets: marketing, advertisement, digital marketing, social media marketing, and online promotion. We used the hashtagify tool to find highly ranked, trending, and popular hashtags, and to find hashtags highly related to marketing and advertising; the chosen search keywords correspond to hashtags that hashtagify ranks as highly related to marketing hashtags. The tweets were further shortlisted by the annotators of this task, who are English instructors, and only tweets containing information directly related to the task in question were retained, resulting in 5000 tweets. Manual inspection of the collected tweets revealed that the products discussed in the tweets generally belong to one of the following broad categories: cosmetics, health, fashion, and electronics. These categories were therefore used as the product classes in the FakeAds corpus. The tweets were annotated at two levels:
• At tweet level, where each tweet was annotated as fake or real.
• At word level, where for each tweet the product was classified into one of the following classes: cosmetics, health, fashion, and electronics.
In the tweet-level annotation task, the tweets were annotated as either fake or real. This annotation task is a binary classification, and we used the Amazon Mechanical Turk (AMT) tool to annotate the tweets. AMT is a crowdsourcing marketplace introduced by Amazon that is becoming increasingly familiar as an annotation tool for NLP research, including word similarity, word sense disambiguation, temporal ordering, and text entailment. To ensure the quality of the annotations produced by AMT, we applied country and high-acceptance-rate crowd filters, so that only annotators with a 95% success rate on previous AMT Human Intelligence Tasks (HITs) who were located in the United States were accepted for the task. These two filters were chosen because they lowered
the pool of workers, which has been shown to be effective in reducing the incidents of spamming and cheating found in previous studies. The same set of annotation guidelines was shared and used by the annotators to ensure high-quality and reliable annotations. According to the guidelines, the annotators need to consider two factors before deciding whether a tweet is fake or real: account-related features (e.g., profile information such as the numbers of followers and followed accounts) and tweet-related features (e.g., lexical and syntactic features of the tweet). Each of the 5000 tweets was annotated by three workers, resulting in 5000 × 3 = 15,000 annotations in total. For each tweet, the majority class was chosen and the tweet was given that label: fake or real; a sketch of this aggregation step is given below. In total, we collected 5000 tweets, out of which 2914 (58.28%) were labeled as real news, while 2086 (41.72%) were labeled as misinformation. Figure 1 [14] shows the distribution of tweets that contained either fake or accurate content in the FakeAds corpus. While 41% of the tweets in FakeAds were annotated as fake, real information related to product promotion still represents the higher percentage of product advertisements. This was expected, as the Twitter platform is used by many trustworthy organizations to disseminate real adverts and information. For the multi-class annotation task, the tweets were annotated at word level, marking mentions of products in the following classes: cosmetics, fashion, health, and electronics. The annotation was carried out with the COGITO service, where every tweet was annotated by two annotators for mentions of the product type, using the same set of annotation guidelines provided in supplementary materials file S1. The annotation consisted of marking up all statements in the corpus relating to the four semantic types listed in Table 1 [14].
Fig. 1 Spread of fake and real tweets in the FakeAds corpus as observed
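The tweet-level aggregation described above (three AMT judgements per tweet, majority label kept) can be sketched in a few lines; the input format and the example data are assumptions for illustration.

```python
from collections import Counter

def majority_label(judgements):
    """Return the majority class from a list of worker labels ('fake'/'real')."""
    label, _ = Counter(judgements).most_common(1)[0]
    return label

# Example: 3 AMT judgements per tweet, keyed by tweet id (hypothetical data).
raw = {
    "t1": ["fake", "fake", "real"],
    "t2": ["real", "real", "real"],
}
gold = {tweet_id: majority_label(votes) for tweet_id, votes in raw.items()}
print(gold)   # {'t1': 'fake', 't2': 'real'}
```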
Table 1 Annotated entity classes in the FakeAds corpus

Entity type   Description
Cosmetic      Product mentions related to skincare, body care, or make-up, for example, lipsticks, creams, etc.
Electronic    Products that require electric currents or electromagnetic fields to work, for example, electronic devices, phones, cameras, computers, etc.
Health        Product mentions related to supplements that promote the wellbeing of individuals, e.g., vitamins, herbs, etc.
Fashion       Products related to accessories such as clothing, shoes, bags, jewelry, fragrances, etc.
Figure 2 [14] describes the most common product types in the corpus and their distribution in the FakeAds corpus. As shown in Fig. 2 [14], there was considerable emphasis on Twitter on promoting cosmetic products, e.g., skincare, make-up, etc. The cosmetic class represents 83% of the annotations in the FakeAds corpus, and health-related products come next with 10% of the total annotations. The less-dominant and less-targeted products in advertisements on Twitter, and in the FakeAds corpus, were electronics and fashion. It was also noted that people on Twitter tended to discuss fashion and electronics products less frequently in the context of advertising when compared with cosmetics. Table 2 [14] summarizes the statistics of the corpus and shows the total number of fake and real annotations and the distribution of the different products among fake and real tweets. Figure 1 [14] visualizes the distribution of products that are targeted most by misinformation, namely cosmetic and health products, compared to real information for these two types of products. On the other hand, it is worth mentioning that the amount of real news information related to electronic and fashion products is significantly higher than the amount of misinformation targeting these two product types. This is because online advertisements for products on social media platforms tend to target the younger generations and try to promote products relevant to their lifestyles, such as skincare products and supplements.
4 Proposed Methodology
In this section, we present our fake news detection approach, which uses machine learning and parsing methods. The primary algorithm displays information about the content, title, date, and author for each input link to an article by calling various sub-algorithms, each of which is meaningful on its own. We briefly outline the essential concepts below. To validate the title and content of an article, we employed machine learning models, which allowed us to: (1) determine whether the article title is clickbait or not; and (2) determine whether the article content is fake or real.
Fig. 2 Distribution of product types in FakeAds corpus
Table 2 Statistics of the FakeAds corpus annotations

       Total number of annotations   Health   Cosmetics   Fashion   Electronics
Real   6159                          300      5691        135       33
Fake   4807                          200      4527        66        14
Machine learning was employed to address the categorization issues, while parsing was employed for data extraction. There are several kinds of machine learning algorithms, as indicated in the literature survey. Our method requires a dataset as input, and the dataset determines what the algorithm outputs: if the input is a fake/real news dataset, the output indicates whether an article is fake or real; if the input contains a dataset of clickbait and non-clickbait titles, the output indicates whether the title is clickbait or not.
4.1 Experimental Setup
The machine learning process of our application consists of the following steps:
1. For content recognition for the FakeAds corpus, we utilized the fake and genuine news dataset from Kaggle. The dataset is available at https://www.kaggle.com/datasets/nohaalnazzawi/the-fakeads-corpus.
2. For title detection of falsely promoted products, we utilized the dataset described in [7]. These datasets are well maintained, well labeled, and prepared for use in feature extraction.
3. We examined both clickbait headlines and words as tokens in the articles to determine whether they had a substantial influence on whether the content was fake or authentic.
4. We selected the following models for text representation: the term frequency bi-gram model, the term frequency-inverse document frequency (TF-IDF) model, and the Bag-of-Words model.
5. We used the main linear and probabilistic classification methodologies together with these three text representation models.
6. We imported the dataset and ran it through the classifiers; the test data were not used in training the classifiers.
Dataset: Since Python (https://www.python.org) includes numerous effective packages that enable it to handle any form of data (images, text, audio, etc.) and task (machine learning, deep learning, web development, etc.), we used Python to create our application. Scikit-learn, Pandas, and two external APIs were utilized: Scikit-learn (www.scikit-learn.org), Pandas (www.pandas.pydata.org), (1) the News API (https://newsapi.org), and (2) the Google NLP API (www.cloud.google.com/natural-language/).
Metrics: Here we go through the outcomes produced when each classifier was combined with each text representation model. We utilized the metrics module from the Scikit-learn package to calculate each classifier's prediction accuracy and then displayed these findings as a ROC curve. After splitting the dataset into train and test data, the accuracy is obtained on the test data: the labels are removed from the test data and stored in a separate variable, the classifier is presented with the test data to obtain its predictions, and the predictions are compared with the withheld labels. The accuracy score is calculated as the proportion of correct predictions.
Baselines: We compare the proposed strategy to the seven existing baseline approaches, which combine the text representations (Bag-of-Words (BoW), bi-grams, and term frequency-inverse document frequency (TF-IDF)) with the Naive Bayes and Linear Support Vector classifiers, and we illustrate the results of the classifiers using accuracy tables and ROC curves.
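A minimal scikit-learn sketch of the kind of pipeline described in this setup, pairing a TF-IDF representation (unigrams and bi-grams) with a probabilistic and a linear classifier. The toy texts, labels and the 60/40 split are placeholders, not the paper's corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Placeholder corpus; in practice the FakeAds tweets and their fake/real labels
# would be loaded from the CSV files mentioned above.
texts = ["miracle cream removes wrinkles overnight", "brand launches new phone lineup",
         "this pill melts fat in two days", "store announces seasonal clothing sale",
         "secret serum cures all skin problems", "retailer opens a new flagship store"]
labels = ["fake", "real", "fake", "real", "fake", "real"]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.4, stratify=labels, random_state=42)  # 60/40 split

for name, clf in [("MultinomialNB", MultinomialNB()), ("LinearSVC", LinearSVC())]:
    model = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),   # unigrams + bi-grams
        ("clf", clf),
    ])
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name, accuracy_score(y_test, pred), f1_score(y_test, pred, pos_label="fake"))
```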
4.2 Review Article Title and Content Input
The inputs are the headline and the article content, which are classified as fake or real and as clickbait or not. Using the following steps, the proposed tool can fact-check information by text classification.
Step 1: Read the dataset of fake and real news and divide it into train and test sets.
Step 2: Create the text model representation (i.e., BoW, TF-IDF) from the train and test data.
Step 3: Fit the train data to the following machine learning classifiers: (1) Naive Bayes, which represents a probabilistic approach to classification, and (2) a linear support vector machine, which represents a linear technique.
Step 4: Using the trained classifiers, determine whether the article's headline is clickbait or not.
Step 5: Using the trained classifiers, determine whether the article's content is fake or authentic.
4.3 Regarding Applying the Algorithm
1. Evaluating content and title: Verifying content and titles is comparable; the only variation is the dataset used. The dataset was initially read using the Pandas package. We maintain our datasets (for the title and the content) in two distinct CSV files containing two kinds of information:
• News (real or false): the CSV file contains 12,789 news stories, each classified as either true or fraudulent.
• Titles: each title in the dataset has a label indicating whether it is clickbait or not.
After reading the datasets, we carried out each stage of the supervised learning method for detecting false news using the Scikit-learn module.
2. Examining the publication date: Scikit-learn's cosine similarity and the https://newsapi.org API were both used for this investigation.
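One plausible reading of the publication-date check is comparing the article text against headlines returned for the relevant period by the news API. The sketch below shows only the similarity step under that assumption; the fetched headlines are passed in as an assumed input, and the example strings are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def most_similar_headline(article_text, fetched_headlines):
    """Rank externally fetched headlines by TF-IDF cosine similarity to the article."""
    vect = TfidfVectorizer(stop_words="english")
    matrix = vect.fit_transform([article_text] + fetched_headlines)
    sims = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    ranked = sorted(zip(fetched_headlines, sims), key=lambda p: p[1], reverse=True)
    return ranked[0]   # best-matching headline and its similarity score

# Hypothetical headlines, e.g. as returned by the News API for the article's date.
headlines = ["Airline denies bankruptcy rumour", "New phone released this week"]
print(most_similar_headline("Rumour says the airline filed for bankruptcy", headlines))
```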
5 Results Analysis
For the research on title filtering, we utilized a dataset that included comparisons from several studies of news stories and randomly split it into training and testing data; about 60% of the data constitute the training set and 40% the testing set, following the methodological approach shown in Fig. 4. This builds on how the distribution of fake and real products is classified in Fig. 3, using the corpus and the fine-grained product annotation listed in Table 2. The classifiers employed the TF-IDF model and achieved an F1 score of 0.9768, as shown in Fig. 5. The accuracy results demonstrate that both classifiers obtained the same accuracy score with the BoW model. The multinomial Naive Bayes classifier, on the other hand, provided higher accuracy for the bi-gram model. This is determined by the bi-gram term-document matrix, whose components are token-pair frequencies: since a pair of tokens with a very high frequency may suggest a higher probability for one label than the other, using the pair frequencies to compute probabilities yields better results than converting these frequencies into a linear space.
Fig. 3 Distribution of fake and real products in FakeAds corpus
Fig. 4 Visualizing the algorithm applied on training and test data
Fig. 5 Results of the model evaluation using several measures (ROC score and F1 score) for the deployed machine learning techniques
To certify that the established corpus was of high quality, the annotations followed the set of guidelines prepared for the annotators, who were native English speakers and experts in the field of annotation. We calculated the Inter-Annotator Agreement (IAA) between the two annotators; a high IAA score provides reassurance that the corpus annotations are trustworthy and of high quality, as shown in Fig. 4. Each text classification label quantifies the attributes. In calculating the IAA in terms of F-score, we followed numerous related studies. The F-score is the same regardless of which annotation set is taken as the gold standard. To perform the evaluation, the annotation set produced by one annotator is taken as the gold standard, i.e., the correct set of annotations, and the total number of entities is the total number of corpus items annotated by that annotator. In this study, the annotations established by the first annotator were taken as the gold standard, and the Inter-Annotator Agreement, in terms of F-score, recall and precision, was calculated against it. Precision (P) is the percentage of correctly annotated entities produced by the second annotator relative to the annotations established by the first annotator (the presumed gold standard). It is computed as the ratio between the true positive (TP) entities and the total number of entities marked up by the second annotator:

$$P = \frac{TP}{TP + FP} \quad (1)$$

where FP denotes false positives. Recall (R) is the percentage of gold-standard entities recovered by the second annotator, computed as the ratio between TP and the total number of annotations in the gold standard:

$$R = \frac{TP}{TP + FN} \quad (2)$$

where FN denotes false negatives. The F-score is the harmonic mean of precision and recall:

$$F\text{-}score = \frac{2 \times P \times R}{P + R} \quad (3)$$
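A small sketch of how the IAA computation in Eqs. (1)–(3) can be applied to two annotators' span sets, taking the first annotator as the gold standard; the tuple-based span format and the example annotations are assumptions for illustration.

```python
def iaa_scores(gold_spans, other_spans):
    """Precision, recall and F-score of annotator 2 against annotator 1 (gold)."""
    gold, other = set(gold_spans), set(other_spans)
    tp = len(gold & other)                 # spans both annotators marked identically
    fp = len(other - gold)                 # extra spans from annotator 2
    fn = len(gold - other)                 # gold spans annotator 2 missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Hypothetical annotations: (tweet_id, start, end, class) tuples.
a1 = {(1, 0, 8, "cosmetics"), (1, 15, 22, "health"), (2, 3, 9, "fashion")}
a2 = {(1, 0, 8, "cosmetics"), (2, 3, 9, "electronics")}
print(iaa_scores(a1, a2))   # (0.5, 0.333..., 0.4)
```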
Table 2 [14] reports the statistics and the IAA for the annotation of product types in the FakeAds corpus. Overall, the annotators agreed most of the time on annotating cosmetics and health products, and the F-scores for these two classes were the highest, at 0.94 and 0.86, respectively. The reason for this high score is that the mentions and examples of cosmetics and health products were very straightforward, and the annotators could easily recognize and classify the mentions. On the other hand, the F-scores for fashion and electronic products were generally lower than those for cosmetics and health, because the numbers of tweets for electronics and fashion products were the lowest in the corpus compared with the number of examples of cosmetics and
health products. In addition, there were more conflicts among the annotators regarding which types of products belonged to these two classes. For example, the second annotator annotated general words, e.g., clothes, bags, jumpers, etc., and this contributed to the low precision, especially for the fashion class, where the second annotator annotated irrelevant products as fashion (e.g., annotating very general descriptions of a fragrance, such as luxurious scents, instead of specific product mentions). The low recall for electronics products arose because the second annotator did not annotate every mention of electronic products and did not cover the broad range of electronic devices; for example, he did not correctly annotate electronic devices related to skincare and health care, such as skincare devices and airbrushes. To demonstrate the significance of the established dataset, we compared the FakeAds corpus with other generally accessible datasets in the misinformation detection domain, which are reported in Table 3 [14]. We compared our dataset with CREDBANK [24] and the CheckThat sub-task 1 (check-worthiness) Twitter dataset [26], because they use Twitter as a textual source and share a few characteristics with the FakeAds corpus. Apart from the datasets listed in Table 3 [14], the algorithm can also be applied to the FacebookHoax, BuzzFeed, and LIAR datasets; since the annotators are specified, it can easily be implemented with the text mining approach. As shown in Table 3 [14], the FakeAds corpus differs from the existing datasets in its specific domain, namely false advertisements promoting products, and in its rich annotations at two levels: at tweet level, where each tweet is categorized as fake or real, and at mention level, where the product mentioned is assigned one of the following classes: health, cosmetics, fashion, or electronics, as illustrated by the entity classes in Table 1. This makes it a valuable resource for training and evaluating ML-based techniques. The annotation results are satisfactory, with an inter-annotator agreement F-score of 0.976. Table 3 thus gives a brief overview of the various social media datasets to which the algorithm can be applied for title-based classification of fake news and the accuracy that can be achieved.
6 Conclusion
Our central goal in this paper was to provide the research community with a dataset that supports the study of misinformation detection on Twitter, targeting information that misleads the consumer by falsely promoting products. The corpus consists of 5000 tweets, annotated at two levels: (1) each tweet is annotated as fake or real; (2) each tweet is annotated at word level, classifying the product into one of the following classes: cosmetics, health, fashion, or electronics. We envision that this will be a useful data resource for the community to further the study of social media credibility in promoting products and circulating fake advertisements.
Table 3 Comparison of the FakeAds corpus with other comparable corpora as specified in the baselines

Dataset | Size | Text genre | Topic | Categories/labels | Annotation level | # Of Annotators | Agreement measurement
FaceBookHoax | 15,500 | Facebook posts | Scientific news sources versus conspiracy news sources | Hoax, no-hoax | Sentence level | Machine rather than human annotators | F-score (0.270)
BuzzFeed | 2283 | News articles | Presidential election, political biases: mainstream, left-leaning, and right-leaning | A blend of truth and false, largely true, primarily false, and no factual material | Sentence level | 3 annotators | Approximately 0.02% points
LIAR | 12.8 K | Sentences from PolitiFact | Politics | False, hardly true, half true, mostly true, and true | Sentence level | 2 annotators | 0.44 and 0.48 for recall and F1-score
FakeAds | 5000 | Tweets | Marketing and fake news | Binary classes: fake, real; multi-classes: health, cosmetics, fashion, electronic | Tweet level / Mention level | 3 annotators for the binary annotation, 2 annotators for the multi-class annotation | F-score (0.976)
CREDBANK | 60 M | Tweets | Real world events | Assuredly incorrect / probably incorrect / uncertain / probably correct / certainly correct | Sentence level | 1736 unique annotators from AMT | Intraclass correlations (ICC) (0.77)
CheckThat sub-task 1: check-worthiness estimation on Twitter | 1312 | Tweets | COVID-19 and politics | Not worth fact-checking / worth fact-checking | Sentence level | 3 annotators | Majority voting averaged (0.597)
Deploying Fact-Checking Tools to Alleviate Misinformation … 343
344
M. R. Sethurajan and K. Natarajan
The proposed research could also provide a broader view of misinformation related to marketing and help to enhance Twitter's policy so as to provide more credible and authentic information related to promoted products. It will also help to indicate which types of products are targeted most by propagandists who distribute misinformation to attract more consumers. The generated corpus can serve as a gold standard for the development and evaluation of TM tools that classify each tweet as real or fake and extract product mentions related to cosmetics, health, fashion, and electronics. For example, in the future, we plan to use classical ML-based NER and compare it with state-of-the-art contextual word embeddings (e.g., BERT) on the FakeAds corpus to automatically classify tweets as fake or real and to extract the product type discussed in the tweets.
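As a rough illustration of the planned classical ML baseline for the tweet-level fake/real task, the following is a minimal scikit-learn sketch; the function and variable names and the model choice are assumptions made for illustration and do not reflect the authors' actual experiments:

```python
# Sketch: a simple fake/real tweet classifier baseline (TF-IDF features +
# logistic regression). Names are illustrative; the FakeAds experiments may
# use different models, features, and data splits.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.pipeline import make_pipeline

def train_fake_ads_baseline(train_texts, train_labels, test_texts, test_labels):
    """train/test_texts: lists of tweet strings; labels: 'fake' or 'real'."""
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=2),   # word + bigram features
        LogisticRegression(max_iter=1000),
    )
    model.fit(train_texts, train_labels)
    print(classification_report(test_labels, model.predict(test_texts)))
    return model
```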
7 Future Work Despite the contributions presented earlier, we acknowledge certain limitations. This work's main limitation lies in the product classes. While we tried to make sure that the product categories (i.e., classes) were broad enough to include all the products mentioned in the FakeAds corpus, the categories used may not be broad enough to cover products that were not mentioned in the corpus, for example, products related to sports equipment, furniture, cars, etc. Another potential limitation is the size of the corpus: because of the cost of manual annotation in terms of time and money, we were only able to annotate 5000 tweets. However, the size of the corpus is comparable to the popular misinformation datasets mentioned in Table 3 and exceeds the size of some of them, e.g., the BuzzFeed and CheckThat datasets. ML-based text mining methods require a large dataset for accurate models to be trained and tested; hence, increasing the size of the corpus would give more accurate results for training and evaluating ML models. Title-based article classification also helps in identifying the bot accounts that generate fake news on Twitter. In the future, we plan to use the generated corpus as a gold standard for the development and evaluation of TM tools, and we also plan to broaden the corpus by including more product classes and text from other social media platforms, supported by a text classification tool.
References 1. Meel P, Vishwakarma DK (2020) Fake news, rumor, information pollution in social media and web: a contemporary survey of state-of-the-arts, challenges and opportunities. Expert Syst Appl 153:112986 2. Wang W, Chen L, Thirunarayan K, Sheth AP (2014) Cursing in english on twitter. In: Proceedings of the 17th ACM conference on computer supported cooperative work and social computing, Baltimore. Association for Computing Machinery, New York, pp 415–425, 15–19 Feb 2014
3. Shu K, Sliva A, Wang S, Tang J, Liu H (2017) Fake news detection on social media: a data mining perspective. ACM SIGKDD Explor Newsl 19:22–36 4. Aslam S (2018) Twitter by the numbers: stats, demographics and fun facts. Omnicore, San Francisco 5. Khan T, Michalas A, Akhunzada A (2021) Fake news outbreak 2021: can we stop the viral spread? J Netw Comput Appl 190:103112 6. Aldwairi M, Alwahedi A (2018) Detecting fake news in social media networks. Procedia Comput Sci 141:215–222 7. Martin N (2022) How social media has changed how we consume news. Forbes. Retrieved from https://www.forbes.com/sites/nicolemartin1/2018/11/30/how-social-mediahas-changed-how-we-consume-news/?sh=40c30d723c3c. Accessed on 22 June 2022 8. Wong Q (2022) Fake news is thriving thanks to social media users, study finds. CNET. Retrieved from https://www.cnet.com/tech/social-media/fake-news-more-likely-to-spread-onsocial-media-study-finds/. Accessed on 22 June 2022 9. Nasir JA, Khan OS, Varlamis I (2021) Fake news detection: a hybrid CNN-RNN based deep learning approach. Int J Inf Manag Data Insights 1:100007 10. Aslam N, Ullah Khan I, Alotaibi FS, Aldaej LA, Aldubaikil AK (2021) Fake detect: a deep learning ensemble model for fake news detection. Complexity 2021:5557784 11. Murayama T, Wakamiya S, Aramaki E, Kobayashi R (2021) Modeling the spread of fake news on Twitter. PLoS ONE 16:e0250419 12. Carvalho C, Klagge N, Moench E (2011) The persistent effects of a false news shock. J Empir Financ 18:597–615 13. Bovet A, Makse HA (2019) Influence of fake news in Twitter during the 2016 US presidential election. Nat Commun 10:7 14. Alnazzawi N, Alsaedi N, Alharbi F, Alaswad N (2022) Using social media to detect fake news information related to product marketing: the FakeAds corpus. Data 7(4):44. https://doi.org/ 10.3390/data704004 15. Klein DO, Wueller JR (2018) Fake news: a legal perspective. Australas Policy 10:11 16. Roth Y, Pickles N (2022) Updating our approach to misleading information. Twitter Blog. Retrieved from https://blog.twitter.com/en_us/topics/product/2020/updating-our-approach-tomisleading-information. Accessed on 22 June 2022 17. Nyow NX, Chua HN (2019) Detecting fake news with tweets’ properties. In: 2019 IEEE conference on application, information and network security (AINS), IEEE 18. Mugdha SBS, Ferdous SM, Fahmin A (2020) Evaluating machine learning algorithms for bengali fake news detection. In: 2020 23rd International conference on computer and information technology (ICCIT), IEEE 19. Puri S (2021) Efficient fuzzy similarity-based text classification with SVM and feature reduction. In: Advances in intelligent systems and computing. Springer Singapore, Singapore, pp 341–356 20. Al Asaad B, Erascu M (2018) A tool for fake news detection. In: 2018 20th International symposium on symbolic and numeric algorithms for scientific computing (SYNASC), IEEE 21. Campan A, Cuzzocrea A, Truta TM (2017) Fighting fake news spread in online social networks: actual trends and future research directions. In: IEEE international conference on big data (Big Data), Boston 22. Egele M, Stringhini G, Kruegel C, Vigna G (2017) Towards detecting compromised accounts on social networks. IEEE Trans Dependable Secure Comput 14(4):447–460 23. Kai S, Wang S, Liu H (2018) Understanding user profiles on social media for fake news detection. In: IEEE conference on multimedia information processing and retrieval (MIPR), Miami 24. Buntain C, Golbeck J (2017) Automatically identifying fake news in popular twitter threads. 
In: IEEE international conference on smart cloud (SmartCloud), New York 25. Gilda S (2017) Evaluating machine learning algorithms for fake news detection. In: IEEE 15th student conference on research and development (SCOReD), Putrajaya
26. Granik M, Mesyura V (2017) Fake news detection using naive Bayes classifier. In: IEEE first Ukraine conference on electrical and computer engineering (UKRCON), Kiev 27. Sharma H, Saraswat M, Yadav A, Kim JH, Bansal JC (eds) (2021) Congress on intelligent systems. Advances in intelligent systems and computing. https://doi.org/10.1007/978-981-336981-8 28. Puri S (2021) A review on dimensionality reduction in fuzzy- and SVM-based text classification strategies. In: Advances in intelligent systems and computing. Springer Singapore, Singapore, pp 613–631 29. Pathik N, Shukla P (2021) IN-LDA: an extended topic model for efficient aspect mining. In: Advances in intelligent systems and computing. Springer Singapore, Singapore, pp 359–370
Lane Sensing and Tracing Algorithms for Advanced Driver Assistance Systems with Object Detection and Traffic Sign Recognition P. C. Gagan Machaiah and G. Pavithra
Abstract Advanced driver assistance systems (ADASs) and autonomous vehicles are expected to increase safety, lower energy and fuel consumption, and reduce pollutants from road traffic. The major features of an advanced driver assistance system include lane detection and tracking. Lane detection is the technique of finding colored line marks on the road, while lane tracking aims to help the vehicle continue traveling along a predetermined course. Hence, automatic lane detection using convolutional neural network (CNN) models has gained popularity. This paper also addresses object classification and detection, a demanding topic in computer vision and image processing, using convolutional neural networks. We deployed convolutional neural networks on the Keras platform with TensorFlow support. The experimental results illustrate the amount of time needed to train, test, and generate the model in a constrained computing environment. Here, we trained the system on about 80 images, and detection takes only a couple of seconds with good accuracy. Traffic sign recognition, a significant area of research in ADAS, is also carried out; it is crucial for driverless vehicles and is frequently used to read stationary or moving road signs along the side of the road. A comprehensive recognition system is made up of traffic sign detection (TSD) and categorization (TSC). Because traffic sign recognition is typically deployed on portable devices, the model must maintain detection accuracy while preserving speed. The model developed in this study is 99.89% accurate. Keywords Lane sensing · Lane monitoring system · Advanced driver assistance systems (ADASs) · Lane departure warning system · Lane sensing system · Sensors · Convolutional neural networks · Deep learning (DL) · Traffic sign · Keras · TensorFlow
P. C. Gagan Machaiah · G. Pavithra (B) ECE Department, Dayananda Sagar College of Engineering, Bangalore, Karnataka, India P. C. Gagan Machaiah e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_26
1 Introduction Around 73% of accidents are caused by driver error, such as failing to see signs or ignoring a blind spot, according to statistics [1, 2], which motivates manufacturers to develop safer vehicles through the creation of ADAS. ADAS implementations aim to avoid accidents and to reduce the damage they inflict by identifying lane departures and potential collisions early. In this work, an algorithm is created for lane detection and tracking under various weather conditions, such as sunny and cloudy. The need for autonomous vehicles has grown significantly in recent years as a result of rising traffic volumes and increasingly congested roads around the world. As a result, it is important to create an intelligent driver aid system that can either notify the driver of harmful situations or intervene while the vehicle is on the road. Such technologies will become more complex in the ensuing decades, enabling complete vehicle autonomy. In particular, lane, object, and traffic sign detection are three key components in the development of such autonomous systems. The most recent development in deep learning, the deep neural network (DNN), made object recognition easier by learning as much as possible from data. Deep learning algorithms are a subset of machine learning algorithms, which are good at finding patterns but typically need large amounts of data. Convolutional neural networks (CNNs) are the most widely used method for increasing the accuracy of image classification. A CNN is a special kind of neural network that functions similarly to a regular neural network but begins with a convolution layer. Berkaya et al. [3] presented recognition of traffic signs as crucial in intelligent driving systems such as autonomous and assisted driving. The two categories of road sign identification techniques are manual (hand-crafted) feature methods and deep learning techniques. Yang et al. [4] noted that traditional recognition techniques, such as specific color recognition and other hand-crafted feature methods, needed manual labeling and feature extraction, which significantly slowed down system operation; in addition to adding to the workload, manual labeling made it challenging to ensure correctness. Chaiyakhan et al. [5] observed that SVMs and random forests are typically used with such hand-crafted features, although this approach can be challenging because of fuzzy feature boundaries when recognizing signs in images. The development of automatic TSR systems aids the driver in a variety of ways to ensure his or her safety, which also protects the safety of pedestrians. The primary objective of these systems is to identify and recognize traffic signs while a driver is on the road; the system can guide and warn drivers to avoid danger even under difficult conditions such as poor lighting, weather changes, and damaged signs. Aghdam et al. [6] showed how the rapid advancement of deep learning in recent years has altered the detection process. Over time, the study of neural networks has gained popularity among academics. With the invention and quick adoption of neural networks, tedious human annotation is no longer necessary because these networks can extract features from an image directly. The network can obtain many features, particularly for complex images, and these features are ultimately employed for target classification.
Bouti et al. [7] built on advances in deep learning, which have accelerated the development of traffic sign recognition, and developed a LeNet-based model to obtain a good classification effect on the GTSRB dataset. The remainder of the paper is structured as follows. Previous works are briefly summarized in Sect. 2. The methodology is given in Sect. 3, whereas evaluation results are presented in Sect. 4. In Sect. 5, conclusions and recommendations for future work are made.
2 Previous Works The model-based technique for lane detection uses geometric characteristics [8–10]. The learning-based strategy has two phases, training and classification: a model is built during training, using previously observed errors and system characteristics encoded as program variables. Kang et al. [11] proposed a kinematics-based fault-tolerant system that can recognize the lane even when the environment makes it impossible to capture a usable road image. The kinematic model projects the lane by considering variables such as the vehicle's length and speed. The camera input is described using a clothoid cubic polynomial curve road model, and the lane coefficients of the clothoid model remain available in the absence of camera input. The lane restoration strategy is employed to overcome this loss, and the anticipated lane is determined from the road's curvature and past curvature rate. A lane detecting system was proposed by Priyadarshini et al. [12]. A grayscale image is created from the recorded video, the noise is eliminated by applying a Gaussian filter, the edges are found using the Canny edge detection technique, and the length of the lane is determined using a Hough transform. A Raspberry Pi-based robot equipped with ultrasonic sensors is used to replicate the proposed method in order to calculate the separation between nearby cars. Hong et al. [13] covered video processing methods for identifying how the illumination of the lanes changes in a region of interest for straight-line roadways. The study emphasizes the approaches used, including selecting the right color space and identifying the region of interest. Following the capture of the desired image, a color segmentation procedure utilizing region splitting and clustering techniques is carried out, and a merging process is then used to suppress the noise in the image.
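The grayscale, Gaussian filter, Canny edge, and Hough transform pipeline attributed to [12] above can be sketched with OpenCV as follows; this is a generic illustration of that classical approach with placeholder file names and parameter values, not the cited authors' implementation:

```python
# Sketch of a classical lane-detection pipeline: grayscale -> Gaussian blur ->
# Canny edges -> probabilistic Hough transform. Parameters are illustrative.
import cv2
import numpy as np

frame = cv2.imread("road_frame.jpg")            # placeholder input image
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # grayscale conversion
blur = cv2.GaussianBlur(gray, (5, 5), 0)        # suppress sensor noise
edges = cv2.Canny(blur, 50, 150)                # edge map

# Detect line segments that could be lane markings
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                        minLineLength=40, maxLineGap=20)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(frame, (x1, y1), (x2, y2), (0, 255, 0), 3)  # overlay lanes
cv2.imwrite("lanes_overlay.jpg", frame)
```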
As it can be difficult to detect the lane and keep the vehicle on course under various circumstances, Son et al. [14] presented a method that exploits the illumination characteristics of lanes under diverse conditions. The process entails determining the vanishing point; a Canny edge detector is used to assess the bottom half of the image, and in the second stage a Hough transformation is used to select the yellow or white lanes. Based on the illumination characteristic, binary representations of the white and yellow lanes are obtained; the lanes are traced, their intercepts on the y-axis are computed, and matching segments are grouped to create long lanes. An autonomous lane-changing system with three modules, perception, motion planning, and control, was suggested by Chae et al. [15]; LIDAR sensor input is used to identify nearby cars. During motion planning, the vehicle selects a lane-changing mode and then plans the necessary motion while taking into account the safety of adjacent vehicles. For longitudinal acceleration and for choosing the steering angle, a model predictive control (MPC) scheme based on a linear quadratic regulator (LQR) is utilized, while stochastic model predictive control is employed for lateral acceleration. A reinforcement learning-based approach was proposed by Wang et al. [16]. Two kinds of lane change controllers are used: longitudinal control and lateral control. The intelligent driver model, a car-following model, is selected as the longitudinal controller, and the lateral controller is implemented through reward learning. Suh et al. [17] considered the yaw rate, acceleration, and lane change time as the basis for the reward function; a Q-function approximator is suggested to achieve a continuous action space in order to get around static rules. A specially created simulation environment is used to test the proposed technology, and extensive simulation is expected to be used to evaluate the approximator's effectiveness in various real-time settings. Szegedy et al. [18] addressed object detection with deep neural networks (DNNs), which have recently demonstrated excellent performance in image classification tasks. The authors go a step further and use DNNs for object localization, focusing not only on classification but on accurately localizing objects of different classes. They formulate detection as a regression problem on object bounding box masks and present a multi-scale inference procedure that produces high-resolution object detections at low cost, reporting state-of-the-art results on Pascal VOC. Benjamin and Goyal [19] surveyed the use of deep neural networks for object recognition, a vibrant area of research that has made significant strides in recent years; the paper highlights the most recent developments in this area, condenses the historical background of neural network research, and reports results on benchmark datasets, before providing a few examples of applications. In 2015, Hijazi and colleagues [20] used convolutional neural networks (CNNs) to solve pattern and image recognition problems because they have several advantages over competing technologies. Samer Hijazi describes the difficulties in utilizing
CNNs within embedded systems and illustrates the key imaging and computer vision features of the Cadence® Tensilica® Vision P5 digital signal processor (DSP), as well as the software that makes it suitable for CNN applications in many image-processing and recognition tasks. Pohtongkam and Srinonchat [21] recognized everyday objects from tactile images, with the texture of items recorded over sections corresponding to different portions of the human palm. Segmentations into 15, 20, and 26 regions produced 15, 20, and 26 vectors, respectively; in each sequence of tests, the total of each segment is calculated and converted to a binary picture, and the vector data is then sorted to carry out a 300-series training process and a second 300-series testing procedure. Malykhina and Militsyn [22] present a hypothetical scenario for processing aerial photos, which includes neural network-based image categorization, binarization, filtering, searching for specific items, identifying highways, and tying targets to the terrain using a cross-correlation function. Neural networks were used for both object classification and picture classification; given the variety of objects' shapes, sizes, and rotations, classification errors that did not surpass 10% may be regarded as satisfactory. The problem of road traffic sign recognition (TSR) has been the subject of numerous studies in the literature. Paclik et al. [23] claim that the pioneering research on automatic traffic sign detection was originally presented in Japan in 1985. Following that, different techniques were developed by various researchers in order to create a successful traffic sign detection and recognition system (TSDR) and to reduce all of the aforementioned problems. Preprocessing, detection, tracking, and recognition are the four phases of an effective TSDR system. Tagunde and Uke [24] noted that enhancing the visual quality of the images is the primary objective of preprocessing; based on two crucial properties, color and shape, various methods are utilized to reduce the impact of the surroundings on the test images. Gündüz et al. [25] aimed to verify candidate traffic signs (TS) after a thorough search inside the input image; traffic sign detection seeks to identify regions of interest (ROIs) in which those signs are expected to be found, and various methods have been suggested to find these ROIs. The most often used techniques for color-based thresholding include HSV/HSI transformation [26, 27]; region growing [28], YCbCr color indexing [29], and color space conversion [30] are three further examples. Shape-based algorithms were developed to strengthen the detection stage, because color information can be easily influenced by poor lighting or changing weather conditions. There are numerous methods for detecting shapes, and they are widely known for their effectiveness and quick processing times; the most popular ones include edge detection with the Hough transformation [31, 32], while similarity detection [33], distance transform matching [34], and Haar-like features [35] are also well known for shape detection. Chaudhary et al. [36] addressed the path planning problem with fixed obstacles together with other robot navigation issues. Finding an optimal and collision-free route to the target is the goal of the path planning problem. A variety of network topologies and training techniques are employed to create a network model that predicts
the turning inclination that the point-mass robot will use to avoid obstacles on the way to the goal. In that work, the performance of various feedforward neural network models is compared and contrasted; the outcomes indicate that a 10-neuron feedforward neural network model with Bayesian regularization outperformed the others. The models were utilized in two distinct obstacle-avoidance situations, and the robot's paths demonstrate that it safely navigated around potential hazards and arrived at its target. Xiao et al. [37] took advantage of prior structural knowledge of lane markings and proposed a recurrent slice convolution module (referred to as RSCM), a recurrent network structure made up of several slice convolution units (called SCUs). Propagating the prior structural information through the SCUs gives the RSCM a stronger semantic representation. They additionally construct a distance loss that takes the prior lane marking structure into account; the overall loss function, created by combining segmentation loss and distance loss, can be used to train the lane detection network more stably. The outcomes of their experiments demonstrate the potency of the approach: on lane detection benchmarks, it achieves good computing efficiency while maintaining reasonable detection quality. Yao and Chen [38] suggested an enhanced attention deep neural network (DNN), a lightweight semantic segmentation architecture designed for fast computation with little memory, which consists of two branches working at different resolutions. The suggested network creates dense feature maps for prediction tasks by integrating fine features obtained from local pixel interactions with global contexts at low resolution. On two well-known lane detection benchmarks (TuSimple and CULane), the introduced network achieves results that are comparable to those of state-of-the-art techniques, with faster computation, averaging 258 frames per second (FPS) on the CULane dataset, while requiring only 1.56 M model parameters in total. This study makes the application of lane detection in memory-constrained devices realistic and meaningful.
3 Methodology The main objectives of our proposed system are: (i) lane sensing and lane tracing, (ii) object detection, and (iii) traffic sign recognition.
3.1 Lane Sensing and Lane Tracing Data Acquisition The lane data was gathered from a public website, and the collected dataset was used for training. Figure 1 shows the input data collected from the website. Data Preprocessing The video input is converted to mp4 format and downscaled before training and fitting the model; the processed data is then fed to the model. Data preprocessing is the process of putting raw data into a comprehensible format; since we cannot work directly with raw data, it is a crucial stage in data mining, and the data quality should be examined before applying machine learning. Data Augmentation Data augmentation is performed to prevent overfitting of the model. It is a group of techniques that artificially increases the amount of data by creating additional data points from existing data, either by making minor adjustments to the data or by generating new data points with deep learning models. Model Training The model is trained using a deep learning convolutional neural network whose two main layer types are the convolutional layer and the pooling layer. Finally, the trained CNN model senses and traces the lane from the given input data.
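To illustrate the model training step described above, the following is a minimal Keras/TensorFlow sketch of a small convolutional-plus-pooling classifier for lane frames; the input size, class count, and data loading are assumptions rather than the exact configuration used in this work:

```python
# Minimal Keras sketch of a conv + pooling network for lane sensing frames.
# Input shape, class count, and data loading are placeholders.
from tensorflow.keras import layers, models

def build_lane_model(input_shape=(128, 128, 3), num_classes=2):
    model = models.Sequential([
        layers.Conv2D(16, 3, activation="relu", input_shape=input_shape),
        layers.MaxPooling2D(),                 # pooling layer reduces spatial size
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_lane_model()
# model.fit(train_frames, train_labels, epochs=10, validation_split=0.2)
```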
Fig. 1 Input data for lane sensing and lane tracing
Fig. 2 Input image given for object detection
3.2 Object Detection We use Python to implement the program. Figure 2 shows the input data given to the model for object recognition. For feature extraction, we consider the following features. RGB: RGB values represent colors (ranging from 0 to 255 for red, green, and blue), and the system extracts the RGB estimate of each pixel. When the framework processes a new image, it converts it to this numerical representation and compares the patterns of numbers against the data it has learned, assigning a confidence score to each class. Grayscale: a grayscale version of the image is also created. Typically, the predicted class is the one with the highest confidence score. The dataset is then used to train a CNN, which is capable of extracting the features and classifying them accurately. The resulting object recognition model recognizes the object category with good results.
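A hedged sketch of this classification step is given below, assuming a CNN saved from the training stage and OpenCV for the RGB/grayscale preprocessing; the file paths, image size, and class names are placeholders, not artifacts of this study:

```python
# Sketch: preprocess an image (RGB + grayscale) and obtain per-class
# confidence scores from a previously trained Keras CNN. Paths and class
# names are placeholders.
import cv2
import numpy as np
import tensorflow as tf

CLASS_NAMES = ["person", "car"]                       # example classes only
model = tf.keras.models.load_model("object_cnn.h5")   # hypothetical saved model

img = cv2.imread("test_image.jpg")                    # BGR, values 0-255
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)          # grayscale feature view
print("Mean grayscale intensity:", float(gray.mean()))

x = cv2.resize(rgb, (128, 128)).astype("float32") / 255.0
scores = model.predict(x[np.newaxis, ...])[0]         # confidence per class
print(dict(zip(CLASS_NAMES, scores.round(3))))
print("Predicted:", CLASS_NAMES[int(np.argmax(scores))])
```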
3.3 Traffic Sign Recognition The target neural network built in this study is trained on the training set, and its recognition accuracy is checked on the validation set; training is then continued according to the validation findings, and finally the network's accuracy on the test set is evaluated. The distribution of the 43 German Traffic Sign Recognition Benchmark (GTSRB) categories is shown in Fig. 3: the vertical axis is the number of samples in each category, and the horizontal axis lists the 43 categories. To balance the dataset, this paper employs data augmentation. Imgaug, an image-processing library for machine learning, is used for this purpose; there are numerous ways to augment an image, including rotation, blur, grayscale conversion, etc. Accordingly, this article employs Imgaug to enlarge the GTSRB data and split it into manageable batches for network training, which strengthens the network's capacity for generalization and also lightens the computational burden.
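The Imgaug augmentation step could look like the following minimal sketch; the particular augmenters and parameter ranges are illustrative choices, not necessarily those used in this study:

```python
# Sketch: augmenting a batch of GTSRB images with imgaug (rotation, blur,
# grayscale), then yielding mini-batches for training. Ranges are examples.
import numpy as np
import imgaug.augmenters as iaa

augmenter = iaa.Sequential([
    iaa.Affine(rotate=(-10, 10)),          # small random rotations
    iaa.GaussianBlur(sigma=(0.0, 1.0)),    # mild blur
    iaa.Grayscale(alpha=(0.0, 1.0)),       # partial/full grayscale conversion
])

def augmented_batches(images, labels, batch_size=64):
    """images: uint8 array (N, H, W, 3); labels: array of class ids."""
    idx = np.random.permutation(len(images))
    for start in range(0, len(images), batch_size):
        batch = idx[start:start + batch_size]
        yield augmenter(images=images[batch]), labels[batch]
```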
Fig. 3 Distribution classes of traffic signs
Data augmentation is a popular way of enlarging the image set to increase network generalization ability. To increase dataset size and effectiveness, this study applies 50% image shading on the training set, 50% image color conversion, and random cropping and filling of specified pixels. The convolutional neural network developed in this study, which we refer to as TS-CNN, has a total of 10 layers, most of which are convolutional and pooling layers. Convolutional layers are gradually added to extract feature maps from the input image, max-pooling is used to reduce the dimensionality of the feature maps, and features at various scales are obtained by stacking more layers. To classify the traffic signs, the fully connected layer applies a softmax function after performing a dimensional transformation on the input features. Figure 4 shows the flowchart of our proposed system.
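For illustration, a minimal Keras sketch of a convolution/pooling/softmax classifier for the 43 GTSRB classes is shown below; the layer sizes are assumed and do not claim to reproduce the exact TS-CNN configuration:

```python
# Sketch of a small TS-CNN-style classifier for 43 GTSRB classes
# (conv + max-pooling blocks followed by a softmax head). Layer sizes are
# illustrative, not the exact TS-CNN configuration.
from tensorflow.keras import layers, models

def build_ts_cnn(input_shape=(48, 48, 3), num_classes=43):
    model = models.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),  # softmax classification head
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```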
4 Evaluation Results Figure 5 shows the lane sensing and tracing output for the given input using the convolutional neural network. Figure 6 shows the object classification result, with a high recognition rate for the given data sample. Figure 7 shows the accuracy and loss metrics obtained for traffic sign recognition with the CNN algorithm. Finally, the model obtained 99.29% accuracy.
Fig. 4 Flowchart for our proposed system
Fig. 5 Detecting and tracing the lane
Fig. 6 Object recognition with two different classes
Fig. 7 Loss and accuracy after 4 epochs
5 Conclusion In this work, CNN-based approaches were applied to advanced driver assistance systems (ADAS) for lane detection and tracking. We proposed an efficient and reliable approach for detecting and tracking the lane; the suggested algorithm is simple, and the designed algorithmic program was validated successfully. We also used an online dataset to train and test object identification in images, testing single classes such as people and cars, and found that the availability of computational resources is a significant factor in the development of neural network systems. Using Python, building the model, speeding up processing, and analyzing the object recognition system across more categories required minimal time. Finally, this paper proposes a lightweight convolutional neural network suited for classifying and recognizing traffic signs. The network recognizes traffic signs using straightforward convolution and pooling operations, ensures the algorithm's computational efficiency, and is tested on GTSRB data. It also features a straightforward architecture, strong scalability, and a processing time that is faster than the detection speed of existing techniques. In the future, we plan to experiment with new benchmark datasets and to recognize traffic signs under bad conditions.
References 1. Statastics of road accidents in India from 2013–2016. Retrieved from https://data.gov.in/resour ces/staticstics-road-accidents-india/. Accessed on 13 Mar 2018 2. WHO Global status report on road safety 2015. Retrieved from http://www.who.int/violence_ injury_prevention/road_safety_status/2015/en/. Accessed 15 Mar 2018 3. Berkaya SK, Gunduz H, Ozsen O, Akinlar C, Gunal S (2016) On circular traffic sign detection and recognition. Expert Syst Appl 48:67–75 4. Yang Y, Luo H, Xu H, Wu F (2015) Towards real-time traffic sign detection and classification. IEEE Trans Intell Transp Syst 17(7):2022–2031 5. Chaiyakhan K, Hirunyawanakul A, Chanklan R, Kerdprasop K, Kerdprasop N (2015) Traffic sign classification using support vector machine and image segmentation. IEEE Access 2017 6. Aghdam HH, Heravi EJ, Puig D (2016) A practical approach for detection and classification of traffic signs using convolutional neural networks. Robot Auton Syst 84:97–112 7. Bouti A, Mahraz MA, Riffi J, Tairi H (2019) A robust system for road sign detection and classification using LeNet architecture based on convolutional neural network. Soft Comput 1–13 8. Zhou S, Jiang Y, Xi J, Gong J, Xiong G, Chen H (2020) A novel lane detection based on geometrical model and Gabor filter. In: Proceedings of the 2010 IEEE intelligent vehicles symposium, pp 59–64 9. Zhao H, Teng Z, Kim H, Kang D (2013) Annealed particle filter algorithm used for lane detection and tracking. J Autom Control Eng 1:31–35 10. Paula MB, Jung CR (2013) Real-time detection and classification of road lane markings. In: Proceedings of the 2013 XXVI conference on graphics, patterns and images, Arequipa, Peru, pp 5–8 11. Kang CM, Lee SH, Kee SC, Chung CC (2018) Kinematics-based fault-tolerant techniques: lane prediction for an autonomous lane keeping system. Int J Control Autom Syst 16:1293–1302 12. Priyadharshini P, Niketha P, Saantha Lakshmi K, Sharmila S, Divya R (2019) Advances in vision based lane detection algorithm based on reliable lane markings. In: Proceedings of the 2019 5th international conference on advanced computing and communication systems (ICACCS), Coimbatore, pp 880–885 13. Hong G-S, Kim B-G, Dorra DP, Roy PP (2019) A survey of real-time road detection techniques using visual color sensor. Multimed Inf Syst 5:9–14 14. Son J, Yoo H, Kim S, Sohn K (2019) Real-time illumination invariant lane detection for lane departure warning system. Expert Syst Appl 42 15. Chae H, Jeong Y, Kim S, Lee H, Park J, Yi K (2018) Design and vehicle implementation of autonomous lane change algorithm based on probabilistic prediction. In: Proceedings of the 2018 21st international conference on intelligent transportation systems (ITSC), pp 2845–2852 16. Wang P, Chan CY, de La Fortelle A (2018) A reinforcement learning based approach for automated lane change maneuvers. In: Proceedings of the 2018 IEEE intelligent vehicles symposium (IV), Changshu, pp 1379–1384 17. Suh J, Chae H, Yi K (2018) Stochastic model-predictive control for lane change decision of automated driving vehicles. IEEE Trans Veh Technol 67:4771–4782 18. Szegedy C, Toshev A, Erhan D (2013) Deep neural networks for object detection. In: Advances in neural information processing systems, pp 2553–2561 19. Goyal S, Benjamin P (2014) Object recognition using deep neural networks. IEEE Access 20. Hijazi S, Kumar R, Rowen C (2015) Using convolutional neural networks for image recognition. IEEE Access 21. 
Pohtongkam S, Srinonchat J (2016) Object recognition from human tactile image using artificial neural network. In: 2016 13th International conference on electrical engineering/electronics, computer, telecommunications and information technology (ECTI-CON). IEEE, pp 1–6 22. Militsyn A, Malykhina G (2016) Application of dynamic neural network to search for objects in images. In: 2016 International conference on industrial engineering, applications and manufacturing (ICIEAM). IEEE, pp 1–3
23. Paclik P, Novovicová J, Duin RPW (2006) Building roadsign classifiers using a trainable similarity measure. IEEE Trans Intell Transp Syst 7(3):309–321 24. Tagunde GA, Uke NJ (2012) Detection, classification and recognition of road traffic signs using color and shape features. Int J Adv Technol Eng Res 2(4):202–206 25. Gündüz H, Kaplan S, Günal S, Akınlar C (2013) Circular traffic sign recognition empowered by circle detection algorithm. In: Proceedings of the 21st signal processing and communications applications conference (SIU ‘13). IEEE, New York, pp 1–4 26. Maldonado-Bascón S, Lafuente-Arroyo S, Gil-Jimenez P, Gómez-Moreno H, López-Ferreras F (2007) Road-sign detection and recognition based on support vector machines. IEEE Trans Intell Transp Syst 8(2):264–278 27. Tagunde GA, Uke NJ (2012) Detection, recognition and recognition of road traffic signs using colour and shape features. Int J Adv Technol Eng Res 2(4):202–206 28. Priese L, Rehrmann V (1993) On hierarchical color segmentation and applications. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR ‘93). IEEE, New York, pp 633–634 29. Swain MJ, Ballard DH (1991) Color indexing. Int J Comput Vis IEEE Access 7(1):11–32 30. Hechri A, Mtibaa A (2012) Automatic detection and recognition of road sign for driver assistance system. In: Proceedings of the 16th IEEE Mediterranean electrotechnical conference (MELECON’12), pp 888– 891 31. Overett G, Petersson L (2011) Large scale sign detection using HOG feature variants. In: Proceedings of the IEEE intelligent vehicles symposium (IV ‘11), pp 326–331 32. Møgelmose A, Trivedi MM, Moeslund TB (2012) Visionbased traffic sign detection and analysis for intelligent driver assistance systems: perspectives and survey. IEEE Trans Intell Transp Syst 13(4):1484–1497 33. Vitabile S, Pollaccia G, Pilato G, Sorbello E (2001) Road signs recognition using a dynamic pixel aggregation technique in the HSV color space. In: Proceedings of the 11th international conference on image analysis and processing (ICIAP ‘01), pp 572–577 34. Gavrila DM (1999) Traffic sign recognition revisited. In: Mustererkennung, pp 86–93 35. Ferlin BH, Zimmermann K (2009) Towards reliable traffic sign recognition. In: Proceedings of the IEEE intelligent vehicles symposium, pp 324–329 36. Chaudhary AK, Lal G, Prasad A, Chand V, Sharma S, Lal A (2021) Obstacle avoidance of a point-mass robot using feedforward neural network. In: 2021 3rd Novel intelligent and leading emerging sciences conference (NILES), pp 210–215 37. Xiao D, Zhuo L, Li J, Li J (2021) Structure-prior deep neural network for lane detection. J Vis Commun Image Represent JAT 81 38. Yao Z, Chen X (2022) Efficient lane detection technique based on lightweight attention deep neural network. J Adv Transp 2022:13
Exploring Open Innovation in the Workplace Through a Serious Game: The Case of Datak Eleni G. Makri
Abstract Gameplay for immersive teaching, learning and training bears a significant design and learning challenge for software developers, academics, researchers and learners. There seems to be inconsistent evidence regarding serious game learning outcomes per se, when compared with traditional modes of instruction and scarce findings addressing open innovation-related serious game learning agency across diverse learner cohorts including workplace. Therefore, this study builds on exploring open innovation-related awareness and attributes surveyed within a serious gameplay multinational organizational context. We report on 45 Greek employees’ preliminary open innovation-associated attributes after gaming when compared with seminar instruction as part of workshop training in open innovation and sustainable development during the year 2021–2022. The trainees perceived the serious game as a supportive instructional tool for open innovation-linked understanding and attitudes in their company. The obtained evidence is discussed along with conceptual and practical implications and streams for further research for serious game open innovation and sustainable development instruction and practice. Keywords Serious games · Open science · Sustainable development · Open innovation · Workplace · Greece
1 Introduction In recent years, advanced technology has evolved for different teaching, learning and training contexts, learner cohorts and disciplines worldwide [1, 2]. Among others, serious games offer immersive space for instruction and co-development of awareness and attributes across a diverse learner audience about real-world challenges. Despite serious game learning contributions [3–5], there tends to be no consistent evidence about the impact of gaming on open innovation and sustainable development within different learning environments [6]. In loving memory of my parents, doctor Georgios Makris and teacher Georgia Tsiotou-Makri. E. G. Makri (B) Unicaf, Larnaca, Cyprus e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_27
In this context, the current paper elaborates on the research performed as part of workshop training in open innovation and sustainable development within a multinational organization in Greece, exploring employees' shared open innovation-related attributes during traditional and gameplay instruction in 2021/2022. The study was implemented with 45 full-time employees to assess their learning outcomes post-seminar and post-gaming instruction along the open innovation-associated continuum, as a way to foster efficient open science and open innovation management through serious gameplay and vice versa. Resting on the above rationale, the research question addressed by the present study is as follows: • Are employees indicating the same levels of favorable company orientation toward open innovation-related knowledge and attitudes after the seminar and after gameplay?
2 Related Work Kloeckner et al. [7] elaborate on their design and assessment of a serious game aimed to instruct systems design thinking for open innovation in Brazil. It was developed based on design thinking iterative processes starting from inspiration, idea generation and implementation. Beginning from a deeper understanding of systems design thinking as problem-solving scenarios to the designated solution depicting the game’s final goal. The game design process included the following steps: (a) The authors used their experience in design thinking instruction to develop the learning objectives of the game along with the corresponding teaching challenges met in the design thinking literature; (b) a workshop with experts in design thinking was organized for further feedback on the game and learning mechanics that should be integrated (including design thinking resources, tools and gamification elements); (c) three pilot game application assessments accompanied by a structured questionnaire survey. The game application interventions included overall 312 graduate and postgraduate students, researchers, professors and design thinking professionals who practiced the game in groups. However, the authors rest on their introductory learning outcomes obtained from 18 undergraduate students, 22 professors and 26 design thinking professionals, respectively. Serious gaming learning activities were indicated to be well structured throughout the game in an attractive way, facilitating learner satisfaction with their enjoyable gaming learning practice. Time passing while playing with the game was reported as not evident, thereby fostering immersion during gameplay. In addition, the learning activities appeared to be challenging ones, not demanding prior knowledge and within a group work collaborative learning space. All gamer cohorts agreed that the game seemed to bear promising for: improved design thinking comprehension, developed knowledge, attitudes and skills related to design thinking (as problem-solving procedure/micro-level); realizing the differences between insights and ideas throughout design thinking process, enhanced
use of design thinking strategies, generating more ideas, higher value added contribution to design thinking customers and (or) professionals and finally, improved creative self-confidence (macro-level). Overall, professionals compared to students did exhibit higher scores on gaming learning assessment with respect to improved use of design thinking framework, guidance on reaching a suitable solution and (or) improvement of self-confidence, thus offering promising evidence for the use of the game as an enjoyable and motivating learning instructional tool for collaborative multidisciplinary work on design thinking subject area. Mettler and Pinto [8] illustrate their stakeholder co-development serious game iterative design approach (i.e., design partners involved 10 engineering management experts from Spain, Italy, Germany and Switzerland taking part in regular testing and fine tuning sessions) used to co-design (i.e., open innovation-associated), and preliminary assess a serious game (Intime) aimed to (a) foster knowledge and skills about production networks (mapped into open innovation-related awareness and attributes) (micro-level) and (b) transfer and reproducibility of scientific/research knowledge (i.e., open science-related) to broader supply chain/engineering management professionals (macro-level). The codesign partners collaborated to define the learning objectives of the game, the learning profile, and possible learning expectations of the engineering/supply chain managers as the targeted professional audience. The aforementioned learning outcomes were further validated by expert industry stakeholders. The Intime was first developed as a board game. An all-day workshop event was organized to practice and assess the board game in relation to the expected context-specific serious game learning goals. The feedback from the play event was used to develop the prototype of the Intime serious game. According to that, in-game performance assessment measures and extended debriefing sessions with in-person open-ended interviews were further integrated into the gaming learning experience assessment. An additional all-day Intime serious game event was further organized to garner the gamers/supply chain management professionals feedback with the gaming version of the Intime based on in-person open-ended interviews and debriefing sessions. The professional crosscountry gamers did offer positive feedback as regards the Intime serious game and learning mechanics post-gameplay. Their insights, reflections, expectations and active gameplay experience perceptions stressing the need of moderate complexity embedded into the game to foster the defined learning gaming goals and tend to correspond to Intime learning challenges that the authors (and co-developers) claim to integrate in future version(s) of their game. In the aforementioned studies, both codeveloped serious games relate to open innovation topic but within different aspects, involve diverse learner and professional cohorts as co-design and assessment audience, share different iterative co-design process, include micro-and macro-level of gaming learning objectives and report improved learning outcomes obtained from expert gamers.
3 Method 3.1 Study Design 45 full-time employees of a multinational organization in Greece participated in the current study as part of their onsite 3-h workshop on open innovation and sustainable development. The aim of the survey was to explore trainees' perceptions of open innovation-related workplace aspects post-seminar (i.e., traditional mode of instruction) and after gameplay (i.e., immersive mode of instruction), respectively. In other words, following inconsistent evidence [9] and the limited research that compares conventional with immersive instruction [10], the aim was to investigate whether learning with the particular co-designed (i.e., open innovation solution) serious game (i.e., Datak) fosters a more favorable open innovation-associated organizational orientation than traditional learning. After providing informed consent, the employees were first required to fill in demographic information regarding gender, department/division, job role, and residency. Next, they attended an hour of seminar (including completion of relevant assigned tasks on open innovation topics). Afterward, they answered 47 self-designed open innovation-related self-assessment questions following Brunswicker and Chesbrough [11] and Cosh and Zhang [12]. The items were modified for the needs of the present study to address open innovation-associated concepts and attributes exercised within the trainees' company. The participants/attendees were required to complete the 47 open innovation-related self-assessment questions post-seminar and post-gameplay, respectively. The first two of the 47 questions assessed the frequency with which open innovation is exercised in their company and the range of concepts known to them in their organization in terms of the use of open innovation. The remaining 45 evaluated the seminar and (or) gameplay as a mode of instruction for the open innovation concept and open innovation practice, as part of the specific aspects of the open innovation process explored in their organization. The attendees completed their open innovation-associated self-assessment items based on their overall experience with the open innovation seminar introduced (post-seminar). In the second and third hour of their workshop, the instructor/facilitator first introduced them to the Datak open science/open innovation serious game through a demo (https://datak.ch/). Participants were motivated and instructed to play the game for as long as they pleased. Following Cheung and Ng [13], they experienced gaming in pairs and (or) small groups across all game levels and activities for up to one hour. Following Hauge et al. [14], the workshop instructor acted further as game facilitator/moderator throughout gameplay and encouraged learners to take part in shared discussions leading to reflection during and after gaming. The short debriefing session post-gameplay revealed favorable responses regarding the trainees' gaming learning experience. Post-gaming, the attendees were required to complete the same open innovation-related self-assessment items as post-seminar (post-gameplay).
Fig. 1 Screenshot of Datak game
3.2 Description of the Game Datak is a serious game created by Radio Television Switzerland with the active participation of software developers and support from youth communities (co-development application) to raise citizen awareness about privacy and big data protection, mapped into open science and open innovation [15–17] (https://www.gamesforchange.org/games/datak-a-serious-game-about-personal-data/, https://www.datak.ch/). Its aim is to instruct adult players on how their personal data is used across different research and workplace situations and to make them aware of the associated benefits and risks. The game is available in English, French, German, and Italian. Players assume the role of a recent recruit hired to work for the mayor of a town and to manage its social media services, and they are confronted with various dilemmas in their day-to-day working lives. Decisions have to be made in given big data privacy scenarios (in their private lives as well as for the town community) against the clock, interspersed with videos from YouTubers and relevant factual data tied to the scenarios and decisions. Some of the particular tasks undertaken by the new recruit are, for example, whether or not to approve a project to install CCTV cameras in town, or whether to pass on citizens' details or people-oriented data to organizations, public service infrastructure, or political parties. Once a project is completed, the actual investigation result can be accessed, along with a trove of useful tips on how to manage personal and (or) people-oriented data. Figure 1 illustrates a screenshot of the Datak serious game.
3.3 Data Analysis and Results 23 male and 22 female employees of a multinational organization (N = 45) from Attica (N = 30) and other regions (N = 15) provided full open innovation-related self-assessments after the seminar and after gaming instruction, respectively. The preliminary exploration of the frequency and the distribution of the level of agreement/disagreement that participants reported for each of the corresponding open innovation-associated workplace self-assessment items post-seminar and post-gameplay is presented in Tables 1, 2, 3, 4, 5 and 6 below.
Table 1 Frequency of open innovation exercise and concepts known in own organization in terms of the use of open innovation post-seminar and post-gaming (N = 45)
1. Exercise of open innovation by your organization. Post-seminar: Sometimes 100% (45). Post-gameplay: Sometimes 20% (9); Almost 42.2% (19); Always 37.8% (17).
2. Concepts known to you in your organization in terms of the use of open innovation. Post-seminar: Networking 28.9% (13); R&D 28.9% (13); Social media 28.9% (13); Knowledge management 13.3% (6). Post-gameplay: Co-creation/co-development 31.1% (14); Open source 31.1% (14); Open data and sharing 17.8% (8); IP 13.3% (6); Inclusivity 6.7% (3).
Tables 1, 2, 3, 4, 5 and 6 illustrate that, overall, employee perceptions of open innovation-associated concepts and organizational practice orientation appear more favorable after gaming than after the seminar instruction; in other words, taken together, the gameplay experience related to more open innovation-associated workplace orientation aspects than the post-seminar instruction, as reported next. In particular, as Table 1 denotes, while all attendees (100%) indicated after the seminar that their organization practices open innovation only sometimes, post-gaming the majority of employees (80%) perceived that their company exercises open innovation either almost or always (Datak's positive effect on the open innovation exercised by the corresponding company). In terms of the open innovation concepts known to participants after the seminar and post-gaming, 28.9% of them reported networking, R&D, and social media, and 13.3% knowledge management post-seminar, whereas post-gameplay they indicated co-creation/co-development and open source as the most widely known open innovation-related concepts in their company (62.2% combined), followed by open data and sharing (17.8%) and IP (13.3%), respectively. It seems, therefore, that Datak triggered more concepts directly related to the open innovation-associated subject than the seminar.
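The percentage-and-count entries reported in Tables 1, 2, 3, 4, 5 and 6 can be tabulated from raw responses in a few lines; a minimal pandas sketch is given below, assuming a long-format DataFrame with hypothetical column names ('item', 'phase', 'response'), as an illustration rather than the study's actual analysis script:

```python
# Sketch: tabulating percentage and count of each response option per item
# and per instruction phase (post-seminar vs post-gameplay). Column names
# are hypothetical, not the study's data files.
import pandas as pd

def frequency_table(responses: pd.DataFrame) -> pd.DataFrame:
    counts = (responses
              .groupby(["item", "phase", "response"])
              .size()
              .rename("N")
              .reset_index())
    totals = counts.groupby(["item", "phase"])["N"].transform("sum")
    counts["%"] = (100 * counts["N"] / totals).round(1)
    return counts

# Example with toy data (3 of the 45 respondents)
toy = pd.DataFrame({
    "item": ["exercise_of_open_innovation"] * 3,
    "phase": ["post-gameplay"] * 3,
    "response": ["Sometimes", "Almost", "Always"],
})
print(frequency_table(toy))
```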
Table 2 Level of agreement/disagreement with usage as part of open innovation process in own organization, post-seminar and post-gaming (N = 45). Each item reads "The seminar/game can be used as part of open innovation process in your organization for …". Post-seminar, all 45 participants (100%) answered "Neither disagree nor agree" to every item. Post-gameplay responses:
1. For business intelligence: Neither disagree nor agree 35.6% (16); Rather agree 64.4% (29).
2. For systems design thinking: Neither disagree nor agree 20% (9); Rather agree 44.4% (20); Completely agree 35.6% (16).
3. For technology innovation: Rather agree 44.4% (20); Completely agree 55.6% (25).
4. For new product or service: Rather agree 51.1% (23); Completely agree 48.9% (22).
5. For product development: Rather agree 57.8% (26); Completely agree 42.2% (19).
6. For manufacturing and distribution: Neither disagree nor agree 31.1% (14); Rather agree 68.9% (31).
In addition, as reported in Tables 2, 3, 4, 5, and 6, most trainees completely agreed that the game can be used to instruct the concept of open innovation (64.4%), and rather and (or) completely agreed that Datak can be further employed as part of the open innovation process in their organization (86.6%): for business intelligence (64.4%), for systems design thinking (80%), for technology innovation (100%), as new product or service (100%), for product development (100%), and (or) for manufacturing and (or) distribution (68.9%). Likewise, the majority of attendees either rather (51.1%) or completely agreed (48.9%) that the game can be used to instruct open innovation in their company for new development ideas and better comprehension of customers' motives and needs; as an avenue to new technologies and to speed up time to market (completely agreed; 100%); and 46.7% rather and 53.3% completely agreed for Datak to be used to improve innovation success.
Table 3 Level of agreement/disagreement with usage as instruction for open innovation in own organization, post-seminar and post-gaming (N = 45). Post-seminar, all 45 participants (100%) answered "Neither disagree nor agree" to every item. Post-gameplay responses (items 3-11 read "The seminar/game can be used as part of open innovation in your organization …"):
1. The seminar/game can be used to instruct open innovation concept: Neither disagree nor agree 11.1% (5); Rather agree 24.4% (11); Completely agree 64.4% (29).
2. The seminar/game can be employed to instruct open innovation practice: Neither disagree nor agree 13.3% (6); Rather agree 44.4% (20); Completely agree 42.2% (19).
3. For new development ideas: Rather agree 51.1% (23); Completely agree 48.9% (22).
4. For better understanding of customers' motives and needs: Rather agree 51.1% (23); Completely agree 48.9% (22).
5. As route to new technologies: Completely agree 100% (45).
6. To speed up time for market: Rather agree 100% (45).
7. To improve innovation process: Rather agree 46.7% (21); Completely agree 53.3% (24).
8. For risk sharing in a complex context: Rather agree 77.8% (35); Completely agree 22.2% (10).
9. To reinforce supplier relationship: Neither disagree nor agree 40% (18); Rather agree 60% (27).
10. For open data and sharing: Completely agree 100% (45).
11. For IP management guidelines: Neither disagree nor agree 28.9% (13); Rather agree 20% (9); Completely agree 51.1% (23).
Further, 77.8% rather and 22.2% completely agreed to use the game for risk sharing in a complex context, for strengthening supplier relationships (rather agreed; 60%), for open data and sharing (completely agreed; 100%), and (or) for IP management guidelines (completely agreed; 51.1%; neutral; 28.9%; rather agreed; 20%). Moreover, most participants rather and (or) completely agreed that the game can be applied in the success of open innovation in their organization for idea sharing (100%), to support middle (rather agreed; 62.2%; completely agreed; 37.8%) and top management (rather agreed; 55.6%; completely agreed; 44.4%), to capture external ideas systematically (100%), for project management (completely agreed; 53.3%; rather agreed; 35.6%), to support open innovation orientation (completely agreed; 68.8%; rather agreed; 31.1%), for training in the corresponding concepts and methods (rather agreed; 51.1%; completely agreed; 48.9%), for IP management (rather agreed; 51.1%; completely agreed; 48.9%), for resource allocation (rather agreed; 62.2%; completely agreed; 37.8%) and (or) for open data and sharing (100%).
Table 4 Level of agreement/disagreement with application in success of open innovation in own organization, post-seminar and post-gaming (N = 45). Each item reads "The seminar/game can be applied in success of open innovation in your organization …". Post-seminar, all 45 participants (100%) answered "Neither disagree nor agree" to every item. Post-gameplay responses:
1. For idea sharing: Completely agree 100% (45).
2. For support of middle management: Rather agree 62.2% (28); Completely agree 37.8% (17).
3. For assistance of top management: Rather agree 55.6% (25); Completely agree 44.4% (20).
4. To capture external ideas in a systematic way: Completely agree 100% (45).
5. For project management performance: Neither disagree nor agree 11.1% (5); Rather agree 35.6% (16); Completely agree 53.3% (24).
6. For open innovation orientation: Rather agree 31.1% (14); Completely agree 68.8% (31).
7. For open innovation training concept and methods: Rather agree 51.1% (23); Completely agree 48.9% (22).
8. For IP management guidelines: Rather agree 51.1% (23); Completely agree 48.9% (22).
9. For resource allocation: Rather agree 62.2% (28); Completely agree 37.8% (17).
10. For open data and sharing: Completely agree 100% (45).
Similarly, all trainees completely agreed (100%) that Datak can be implemented as a source of information for inflow innovation for internal resources and within their organization; within the group of parent and (or) subsidiary companies per se (rather agreed; 80%; completely agreed; 20%) and for other organizations and markets (rather agreed; 51.1%; completely agreed; 48.9%), end users and customers (100%), software developers (100%), business service providers and competitor organizations (rather agreed; 100%), within university and research infrastructure (completely agreed; 100%), consulting agencies (neutral; 60%; rather agreed; 22.2%; completely agreed; 17.8%), R&D enterprises (rather agreed; 55.6%; completely agreed; 44.4%), higher education and communities of practice (100%), public sector service companies (rather agreed; 86.7%; completely agreed; 13.3%), industry stakeholders (100%) and (or) public authorities and government (rather agreed; 51.1%; completely agreed; 48.9%). Finally, most attendees either rather or completely agreed (rather agreed; 51.1%; completely agreed; 48.9%) that the game can be employed as a source of information for outflow innovation and, in particular, for knowledge transfer or technology to external partner(s). In sum, therefore, the Datak game appears to relate trainees to more open innovation-associated organizational orientation and open innovation-linked features in comparison with a traditional (seminar) mode of instruction. Figure 2 outlines the conceptual model that underpins the indicated post-Datak gameplay findings.
4 Discussion The present study aims to be innovative by grappling with, embedding and exploring two learning challenges: serious game instruction and the open innovation-related awareness and attitudes investigated post-seminar and after gaming employee training, accordingly. The indicated evidence seems to associate employees of a multinational company with a more direct open innovation-linked organizational culture after gaming.
Table 5 Level of agreement/disagreement with application as source of information for inflow innovation, post-seminar and post-gaming (N = 45). Post-seminar, all 45 participants (100%) answered "Neither disagree nor agree" to every item. Post-gameplay responses (each item reads "The seminar/game can be applied as source of information for inflow innovation …", with items 3-16 referring to the group of parent and (or) subsidiary companies):
1. For internal resource: Completely agree 100% (45).
2. Within the organization: Completely agree 100% (45).
3. Within the group of parent and (or) subsidiary companies: Rather agree 80% (36); Completely agree 20% (9).
4. For other organizations and markets: Rather agree 51.1% (23); Completely agree 48.9% (22).
5. For end users and customers: Completely agree 100% (45).
6. For software developers: Completely agree 100% (45).
7. For business service providers: Rather agree 100% (45).
8. For competitor organizations: Rather agree 100% (45).
9. Within university and research infrastructure: Completely agree 100% (45).
10. Within consulting agencies: Neither disagree nor agree 60% (27); Rather agree 22.2% (10); Completely agree 17.8% (8).
11. Within R&D enterprises: Rather agree 55.6% (25); Completely agree 44.4% (20).
12. Within higher education institutions: Completely agree 100% (45).
13. Within communities of practice: Completely agree 100% (45).
14. Within public sector service companies: Rather agree 86.7% (39); Completely agree 13.3% (6).
15. Within industry stakeholders: Completely agree 100% (45).
16. Within public authorities and government: Rather agree 51.1% (23); Completely agree 48.9% (22).
Table 6 Level of agreement/disagreement with application as source of information for outflow innovation, post-seminar and post-gaming (N = 45). Post-seminar, all 45 participants (100%) answered "Neither disagree nor agree" to both items. Post-gameplay responses:
1. The seminar/game can be employed as source of information for outflow innovation: Rather agree 51.1% (23); Completely agree 48.9% (22).
2. The seminar/game can be employed as source of information for outflow innovation for knowledge transfer or technology to external partner(s): Rather agree 51.1% (23); Completely agree 48.9% (22).
Fig. 2 Outline of conceptual model reflecting Datak post-gameplay open innovation-related continuum. The model links: exercise of open innovation by the organization; known concepts (co-creation/development, open data and sharing, open source, IP); the open innovation process in the organization (business intelligence, systems design thinking, technology innovation, new product or service, manufacturing/distribution); open innovation instruction in the organization (new development ideas, better understanding of customer motives and needs, route to new technologies, speed up time for market, improve innovation success, risk sharing in a complex context, reinforce supplier relationship, open data and sharing, IP management guidelines); success of open innovation in the organization (idea sharing, support of middle and top management, capture external ideas systematically, project management, open innovation orientation and training, IP management, resource allocation, open data and sharing); the game as source of information for inflow innovation (for internal resources, within the organization, within the group of parent/subsidiary companies, other organizations and markets, end users and customers, software developers, business service providers, competitor organizations, university and research infrastructure, consulting agencies, R&D enterprises, communities of practice, public sector service, industry stakeholders, and public authorities and government); and as source of information for outflow innovation for knowledge transfer or technology to external partner(s).
In addition, the study further relates serious gaming instruction with facilitating inflow and outflow open innovation-related specific knowledge, attitudes and skills (KAS) as perceived through the following spaces: systems design thinking, collective entrepreneurship (co-creation/co-development), business intelligence, open-source software, IP management, technology innovation, risk sharing, R&D enterprises, open data and sharing; new product development, supplier relationships, manufacturing and distribution, marketing; parent and subsidiary companies within university, research infrastructure, industry, communities of practice, public service, government, knowledge and technology transfer, accordingly. It thereby expands previous findings within different organizational contexts that (a) explored and linked the aforementioned particulars of open science and open innovation-related serious gaming knowledge and attributes instruction [7] (multidisciplinary collaborative work; systems design thinking for innovation; academics and professionals; Brazil) with analogous ones perceived in the current gameplay instruction by Greek multinational workers, and (b) instructed and assessed the supply chain management awareness and attributes of management engineering professionals through a serious game designed for production networks as a co-creation/co-development industry stakeholder open innovation initiative assessed in Spain, Italy, Germany and Switzerland [8]. Further, it extends prior exploration and evidence of how to facilitate smarter workplace inflow and outflow (open) innovation-associated awareness and competencies from other work operating systems ([12]: small, medium and large UK enterprises; [11]: small and medium companies in Europe and the USA; executives) to the present multinational organizational environment in Greece and to the current serious Datak game challenge-based learning employee assessment, in particular. Moreover, it links open science with open innovation, or else open innovation in science [18], concepts and competencies and explores them within serious Datak gameplay instruction. Alongside the aforementioned preliminary evidence, therefore, it might be useful to assess in future (a) whether the current game can be further developed to integrate additional open innovation in science-related concepts and attributes [19] and (b) whether, explored within diverse doctoral innovation network curricula [20], it additionally fosters advanced early career researchers' open science/open innovation-related and sustainable development long-term awareness and competencies assessed in an engaged and immersive learning space.
5 Conclusion After gaming, compared with post-traditional instruction, multinational employees perceived the explored serious game as an encouraging immersive instructional tool for open science and open innovation-associated learning. The current findings seem to indicate the promising capacity of serious games as open science and open innovation-related instructional solutions in the workplace, and they enable the potential to integrate serious games into doctoral innovation network curricula in the future to foster open innovation in science through challenge-based rich learning.
Acknowledgements The author thankfully acknowledges the support provided by the organization and the time and effort that attendees allocated in making this study possible.
References 1. Kariapper RKAR, Pirapuraj P, Suhail Razeeth MS, Nafrees ACM, Fathima Roshan M (2021) Adaption of smart devices and virtual reality (VR) in secondary education. In: Sharma H, Saraswat M, Kumar S, Bansal JC (eds) Intelligent learning for computer vision. CIS 2020. Lecture notes on data engineering and communications technologies, vol 61. Springer, Singapore. https://doi.org/10.1007/978-981-33-4582-9_43 2. Datt G, Tewari N (2021) Educator’s perspective towards the implementation of technologyenabled education in schools. In: Sharma H, Saraswat M, Yadav A, Kim JH, Bansal JC (eds) Congress on intelligent systems. CIS 2020. Advances in intelligent systems and computing, vol 1334. Springer, Singapore. https://doi.org/10.1007/978-981-33-6981-8_43 3. Oceja J, Gonzalez Fernandez N (2020) Development of civic competence through digital game experiences: perspectives of international video-game designers. Icono14 18(2):296–317. https://doi.org/10.7195/ri14.v18i2.1416 4. Riopel M, Nenciovici L, Potvin P, Chastenay P, Charland P, Blanchette Sarrasin J, Masson S (2020) Impact of serious games aon science learning achievement compared with more conventional instruction: an overview and a meta-analysis. Stud Sci Educ 56. https://doi.org/ 10.1080/03057267.2019.1722420 5. Rodríguez López F, Arias-Oliva M, Pelegrín-Borondo J, Marín-Vinuesa LM (2021) Serious games in management education: an acceptance analysis. Int J Manag Educ 19(3):100517. https://doi.org/10.1016/j.ijme.2021.100517 6. De la Torre R, Onggo BS, Corlu CG, Nogal M, Juan AA (2021) The role of simulation and serious games in teaching concepts on circular economy and sustainable energy. Energies 14(4):1–21, 1138. https://doi.org/10.3390/en14041138 7. Kloeckner AP, Scherer JO, Ribeiro JLD (2021) A game to teach and apply design thinking for innovation. Int J Innov (IJI) 9(3):557–587. https://doi.org/10.5585/iji.v9i3.20286 8. Mettler T, Pinto R (2015) Serious games as a means for scientific knowledge transfer—a case from engineering management education. IEEE Trans Eng Manag 62(2):256–265. https://doi. org/10.1109/TEM.2015.2413494 9. Rodela R, Ligtenberg A, Bosma R (2019) Conceptualizing serious games as a learning-based intervention in the context of natural resources and environmental governance. Water 11(2):245. https://doi.org/10.3390/w11020245 10. Li K, Hall M, Bermell-Garcia P, Alcock J, Tiwari A, González-Franco M (2017) Measuring the learning effectiveness of serious gaming for training of complex manufacturing tasks. Simul Gaming 48(6):770–790. https://doi.org/10.1177/1046878117739929 11. Brunswicker SH, Chesbrough H (2018) The adoption of open Innovation in large firms. Res Technol Manag 61(1):35–45. https://doi.org/10.1080/08956308.2018.1399022 12. Cosh A, Zhang JJ (2011) Open innovation choices—what is British enterprise doing? UK Innovation Research Centre 13. Cheung SY, Ng KY (2021) Application of the educational game to enhance student learning. Front Educ 6:623793. https://doi.org/10.3389/feduc.2021.623793 14. Hauge JB, Söbke H, Bröker T, Lim T, Luccini AM, Ing D, Kornevs M, Meijer S (2021) Current competencies of game facilitators and their potential optimization in higher education: multimethod study. JMIR Serious Games 9(2):e25481. https://doi.org/10.2196/25481 15. Datak Homepage, https://gamesforchange.org/games/datak-a-serious-game-about-personaldata/. Accessed on 07 Jan 2021
16. Beck S, Bergenholtz C, Bogers M, Brasseur T-M, Conradsen ML, Di Marco D, Distel AP, Dobusch L, Dörler D, Effert A, Fecher B, Filiou D, Frederiksen L, Gillier T, Grimpe C, Gruber M, Haeussler C, Heigl F, Hoisl K, Hyslop K, Kokshagina O, LaFlamme M, Lawson C, Lifshitz-Assaf H, Lukas W, Nordberg M, Norn MT, Poetz M, Ponti M, Pruschak G, PujolPriego L, Radziwon A, Rafner J, Romanova G, Ruser A, Sauermann H, Shah SK, Sherson JF, SuessReyes J, Tucci CL, Tuertscher P, Vedel JB, Velden T, Verganti R, Wareham J, Wiggins A, Xu SM (2022) The open innovation in science research field: a collaborative conceptualisation approach. Ind Innov 29(2):136–185. https://doi.org/10.1080/13662716.2020.1792274 17. Burgos D (ed) (2020) Radical solutions and open science: an open approach to boost higher education. In: Lecture notes in educational technology (LNET). Springer Singapore. https:// doi.org/10.1007/978-981-15-4276-3 18. Bogers M, Chesbrough H, Moedas C (2018) Open innovation: research, practices, and policies. Calif Manag Rev 60(2):5–16. https://doi.org/10.1177/0008125617745086 19. UNESCO recommendation on open science. Paris, France (2021) 20. Teo EA (2020) State-of-the-art-analysis of the pedagogical underpinnings in open science, citizen science and open innovation activities. In: Triantafyllou E (ed) INOS consortium. Retrieved from https://inos-project.eu/
Blockchain-Based Secure and Energy-Efficient Healthcare IoT Using Novel QIRWS-BWO and SAES Techniques
Y. Jani and P. Raajan
Department of Computer Science, Muslim Arts College (Affiliated to Manonmaniam Sundaranar University, Abishekapatti, Tirunelveli-627012), Thiruvithancode, Tamil Nadu 629174, India
Abstract To monitor patients' medical status, various algorithms have been developed. Nevertheless, the major complications are higher energy consumption along with security. Thus, by utilizing the novel Quadratic Interpolation and Roulette Wheel Selection Black Widow Optimization (QIRWS-BWO) and Swapping-centric Advanced Encryption Standard (SAES) methodologies, a Blockchain (BC)-centered security and energy-efficient measure has been proposed in healthcare IoT. Firstly, by utilizing the Binary Streebog Hashing Algorithm (BSHA), patients undergo registration, and a hash code is generated during registration. Then, by employing the Spearman Rho Correlation Coefficient Gaussian Mixture Model (SRCGMM), the patients' nodes are clustered. Then, by utilizing the QIRWS-BWO, the cluster heads (CHs) are chosen. Afterward, by SAES, the data are encrypted before they are accessed. The outcomes displayed that, when analogized with the prevailing methodologies, the proposed model shows better performance. Keywords Streebog Hashing Algorithm (SHA) · Gaussian Mixture Model (GMM) · Black Widow Optimization (BWO) · Swapping-centric Advanced Encryption Standard (SAES) · Blockchain
1 Introduction In various application domains, greater attention has been gained by the IoT, which is a transformative along with emerging paradigm [1]. Sensors and actuators that are automatically together with intelligently linked are incorporated with machines and physical objects [2, 3]. Regarding the sensed data, decisions are taken autonomously by an IoT device, or it can communicate [4]. To provide multiple features, most IoT technologies are reflected by healthcare applications [5, 6]. The IoT-centric healthcare services have wireless interfaces along with Internet connections; thus, they support mobility [7]. Nevertheless, security challenges like the authentication, the
exchange of data, and the need for energy efficiency are augmented owing to the increasing usage of IoT services [8, 9]. Data privacy, single point of failure, system vulnerability, and centralized data stewardship are the issues healthcare data management systems suffer [10]. The BC-centric model aids in assuring the best healthcare data management system [11, 12]. In the BC, the blocks contain records, which subsume transaction details between the system and users [13, 14]. Regarding security, audit, transparency, along with trust, the telemedicine outcomes are enhanced by utilizing the BC technology [15]. For security as well as energy efficient in healthcare IoT, several algorithms have been utilized. Nevertheless, in the prevailing methodologies, higher energy consumption, reduced latency, and non-assured security of data transmission are the drawbacks. Therefore, by utilizing QIRWS-BWO together with SAES methodologies, the work has proposed BC-centric security and energyefficient healthcare IoT. The paper’s remaining parts are structured as follows: The related works are reviewed in Sect. 2; the proposed methodology is explicated in Sect. 3; the proposed model is analyzed in Sect. 4; lastly, the paper is concluded in Sect. 5.
2 Literature Survey Saba et al. [16] presented an energy-efficient model utilizing the Internet of Medical Things (IoMT) meant for e-health care. In data transmission, the private–public keycentric digital authentications were integrated to make certain its validation together with integrity. The outcomes displayed that the presented model was energy efficient as well as highly secure. Nevertheless, there might occur cryptographic attacks since the values were chosen randomly. Jan et al. [17] introduced a lightweight together with secure communication for data exchanged among the devices of healthcare infrastructure. The outcomes demonstrated that better performance was achieved by the presented one than the prevailing models. Nonetheless, the one-step registration in the offline phase was highly complicated. Bharathi et al. [18] elucidated energy-efficient clustering aimed at IoT-centric Sustainable Healthcare Systems. Firstly, the data acquisition was performed then selecting the CH, the sensed data were transferred to the cloud subsystem. The outcomes displayed the model’s better performance. Nevertheless, under a varying number of IoT sensors, the EEPSOC algorithm demonstrated its maximum energy-efficient characteristics. Rahman et al. [19] presented a BC-centered Mobile Edge Computing (MEC) system. For a larger part of humanity, therapy diagnostic and analytical data were provided by the model with MEC. The outcomes proved the model’s superior performance. Nevertheless, increased data storage time was the limitation here.
Wang et al. [20] introduced distributed security architecture. To secure data transmission for connected health, node authentication, a hybrid signature, and a consensus technique were included in this solution. The outcomes displayed that the presented one obtained better performance. Nevertheless, larger key sizes and lower computation efficiency were the drawbacks of this model. Ashutosh Dhar Dwivedi et al. presented several opportunities along with industrial applications of 5G-enabled IoT devices. Here, by utilizing the BDN, network scalability problems were resolved. In BC, privacy was a major issue; this problem was addressed by utilizing a ledger grounded on ZKP. Kebira Azbeg et al. developed BlockMedCare, a secure healthcare system. To speed up the data storage process, an Ethereum BC-centric proof of authority was employed. The experiential outcome displayed that in terms of security face, better performance was achieved by the model than the prevailing methodologies. Koosha Mohammad Hossein et al. suggested an architecture termed BC Health that enabled data owners to proffer their required access policies over their privacysensitive healthcare data. The experiential evaluation confirmed that regarding computation together with processing time, the BC Health’s efficacy was enhanced; it also proved the model’s resilience against various security attacks. Jafar A. Alzubi et al. recommended a BC-guided highly secure system for medical IoT devices by utilizing Lamport Merkle Digital Signature (LMDS). The LMDSG’s root was determined by a Centralized Healthcare Controller (CHC) by utilizing LMDS Verification (LMDSV). The experiential evaluation demonstrated that in medical IoT systems, higher security, and minimum CT as well as CO were ensured by the presented model more than the other prevailing methodologies.
3 Proposed Secure and Energy-Efficient System By utilizing a novel QIRWS-BWO along with the SAES technique, the security and energy-efficient centered healthcare IoT has been proposed in this work. In order to attain energy-efficient data transmission, the patients (nodes) are clustered together and are transferred to the cloud server through the cluster head. Finally, the data are encrypted and authenticated by using SAES and BSHA. Figure 1 exhibits the proposed model’s block diagram.
3.1 Registration Primarily, for the registration process, patients enter their details, and the separate blocks are generated for every single patient in the BC during registration. User name (un), password (pw), location (loc), time (t), and mobile number (mbn) are
Fig. 1 Block diagram of the proposed model
the patient details stored in the BC. $R_{us}$ specifies the data registered by the user. It is expressed as

$$R_{us} \xrightarrow{\;U_d \in \{un \,\|\, loc \,\|\, t \,\|\, pw \,\|\, mbn\}\;} B_c \tag{1}$$
where the patient details are specified as Ud and the BC is signified as Bc . For security purposes, the hash code was created after registration.
3.2 Hash Code Generation Here, the BSHA model is utilized to generate the hash code. For generating a fixed-size hash value, a compression function in SHA is utilized. Nevertheless, the hash conversion is highly time consuming along with complex owing to the random compression. Thus, for compression, the input data are binarized and negated, and the AND operation is utilized. Therefore, the proposed model is renowned as BSHA. Initially, the registered data $R_{us}$ are binarized as well as negated as

$$U_d \xrightarrow{\text{binarization}} B_d \tag{2}$$

$$B_{neg} = \;\sim\!(B_d) \tag{3}$$

where outputs from the negation ($\sim$) of binarized data $B_d$ are notated as $B_{neg}$. After that, by utilizing the block cipher, the data are compressed into $X_d$, which is expressed as

$$X_d = S\big(M_R \circ T_p \circ S_b(h \bullet N_b),\; m_b\big) \bullet h \bullet m_b \tag{4}$$

where the message block be $m_b$, the variables be $h$, the block counter be $N_b$, and the block cipher be $S$. Firstly, let the key, which is identical to the data size, be $K(B_n)$:

$$K(B_n) = K(B_n)_1, K(B_n)_2, \ldots, K(B_n)_e \tag{5}$$
where the number of keys be $(e)$. In add round key ($A_r$), the AND operation is performed betwixt the key and the data as

$$A_r = K(B_n) \bullet B_n \tag{6}$$

After that, every single byte of data is replaced by the sub-bytes ($S_b$) according to the row and column of the S-box. Subsequently, to rearrange the rows and columns of the matrix, the transposition ($T_p$) is performed; similarly, to multiply every single row of the matrix by the matrix over the field, the mix rows ($M_R$) step takes place. Next, to present the final hash value $H(X_d)$, the hash value is created by iterating the compression function $X_d$:

$$H(X_d) \xrightarrow{\;X_d\;} H\big(X_d^{(E-1)}, m_E\big) \tag{7}$$

where the hash of the previous blocks is denoted as $H\big(X_d^{(E-1)}\big)$ and the number of blocks is indicated as $E$. To upload data, the patient should log into the system by entering the user name, password, and hash code.
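For illustration only, the following toy sketch mirrors the BSHA steps described above (binarize, negate, an AND-based compression keyed on the chaining value, and block-wise iteration). It is not the authors' BSHA and not a cryptographically secure hash; the record fields, block size, and round constant are assumptions.

```python
# A much-simplified sketch of the BSHA idea described above: binarize and negate the
# registered record, compress each block with an AND-based round keyed on the previous
# chaining value, and chain the blocks. Illustrative toy only, not the authors' BSHA.
BLOCK_BITS = 64

def binarize(record: str) -> str:
    """Concatenate the record fields and turn them into a bit string (Eq. (2))."""
    return "".join(f"{byte:08b}" for byte in record.encode("utf-8"))

def negate(bits: str) -> str:
    """Bitwise negation of the binarized data (the '~' step in Eq. (3))."""
    return "".join("1" if b == "0" else "0" for b in bits)

def compress(block: int, chain: int) -> int:
    """AND-based round mixed with the chaining value (stand-in for Eqs. (4)-(6))."""
    key = chain ^ 0xA5A5A5A5A5A5A5A5            # round key derived from the chain value
    mixed = (block & key) ^ ((block << 1) | (block >> (BLOCK_BITS - 1)))
    return mixed & ((1 << BLOCK_BITS) - 1)

def bsha_like_hash(record: str) -> str:
    bits = negate(binarize(record))
    bits += "0" * (-len(bits) % BLOCK_BITS)      # pad to a multiple of the block size
    blocks = [int(bits[i:i + BLOCK_BITS], 2) for i in range(0, len(bits), BLOCK_BITS)]
    chain = 0
    for block in blocks:                         # iterate the compression (Eq. (7))
        chain = compress(block, chain)
    return f"{chain:016x}"

# Registration record un || loc || t || pw || mbn (field values are hypothetical).
print(bsha_like_hash("alice||athens||2022-05-01||s3cret||5551234"))
```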
3.3 Initialization of Node and Clustering The nodes are initialized after the login phase. After that, to recognize the varied groups within the nodes, the clustering strategy is applied using SRCGMM. In the normal GMM, the directional relationship between two nodes is measured by the covariance calculation; however, the strength of the relationship between two nodes is not measured. Thus, the Spearman Rho Correlation Coefficient is utilized. Therefore, the proposed methodology is termed SRCGMM. The initialized nodes are modeled as $I_n$,

$$I_n = [I_1, I_2, I_3, \ldots, I_a] \tag{8}$$

where the number of nodes be $a$. The probability distribution function of the GMM for $I_n$ is formulated as

$$p\big(I_n^{(i)}\big) = \sum_{k=1}^{K} \pi_k\, G(I_n \mid \mu_k, C_k) \tag{9}$$

where the influence factor be $\pi_k$, the mean $\mu_k$, the Spearman Rho Correlation Coefficient $C_k$, the Gaussian distribution $G$, the number of clusters $k = (1, 2, \ldots, K)$, and the $K$th cluster $K$. The parameters $\mu_k$, $\pi_k$, $C_k$ are estimated by utilizing the Expectation–Maximization steps: by utilizing the hidden variables, the E-step computes the maximum likelihood estimation, and to compute the parameters' values, the M-step maximizes
the maximum likelihood value attained in the E-step, which are derived as

$$\gamma(j, k) = \frac{\pi_k\, G(I_n \mid \mu_k, C_k)}{\sum_{j=1}^{K} \pi_j\, G(I_n \mid \mu_j, C_j)}; \quad \mu_k = \frac{\sum \gamma(j, k)\, I_n^{(i)}}{\sum \gamma(j, k)}; \quad \pi_k = \frac{1}{M} \sum \gamma(j, k) \tag{10}$$

where, to gauge the strength of the relationship betwixt the two nodes and the direction of the relationship, the Spearman Rho Correlation Coefficient ($C_k$), which is a bivariate analysis, is utilized:

$$C_k = 1 - \frac{6 \sum d^2}{a\,(a^2 - 1)} \tag{11}$$

where the difference betwixt the strengths of two nodes be $d$. The relation betwixt the two nodes will be weaker if $C_k$ is 0. Lastly, the nodes are clustered.
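As a rough illustration of this clustering step, the sketch below groups hypothetical node feature vectors with scikit-learn's standard Gaussian mixture model and computes Spearman's rho (Eq. (11)) between two nodes. The paper's SRCGMM replaces the covariance term inside the mixture with the Spearman coefficient, which the off-the-shelf library does not do, so this is only an approximation under stated assumptions.

```python
# Illustrative sketch: cluster hypothetical node feature vectors with a standard GMM and
# compute Spearman's rho between two nodes. This is a stand-in for SRCGMM, not the
# paper's exact model; the feature columns are assumed.
import numpy as np
from scipy.stats import spearmanr
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical node features: (x position, y position, residual energy) for 45 nodes.
nodes = rng.random((45, 3))

K = 4  # number of clusters, chosen arbitrarily for the sketch
gmm = GaussianMixture(n_components=K, random_state=0).fit(nodes)
labels = gmm.predict(nodes)                  # cluster index of every node
responsibilities = gmm.predict_proba(nodes)  # gamma(j, k) in Eq. (10)

# Spearman's rho between two nodes' feature profiles, C_k = 1 - 6*sum(d^2)/(a*(a^2-1)).
rho, _ = spearmanr(nodes[0], nodes[1])
print(labels[:10], round(float(rho), 3))
```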
3.4 Cluster Head Selection Here, the QIRWS-BWO model is utilized to select the CH. Regarding the black widow spider's unique mating behavior, the BWO, a meta-heuristic algorithm, is developed. Nevertheless, lower solution accuracy, slow convergence rate, falling into local optima, and lack of diversity are the drawbacks that exist in BWO. Thus, the QIRWS-BWO is proposed here to trounce the aforementioned issues. Generally, the population having a number of widows (i.e., number of clusters $k$) with size $Z$ is initialized, in which every single widow is specified as an array of $1 \times Z$, where the widow is represented as $w = [w_1, w_2, w_3, \ldots, w_Z]$. Regarding the nodes' residual energy, average distance to the cluster leader, node distance, node density, and node's position, the widow's fitness is computed. Next, the widow's fitness is estimated as

$$\text{fitness} = f[w_1, w_2, w_3, \ldots, w_Z] \tag{12}$$

By utilizing the QI technique, a new population is generated by selecting an optimal individual along with two other individuals from the initial population. QI, a curve-fitting methodology, is utilized for constructing a quadratic function. It is expressed as

$$O_p^Z = \frac{1}{2} \times \frac{\big((w_u^Z)^2 - (w_v^Z)^2\big)\, f + \big((w_v^Z)^2 - (w^Z)^2\big)\, f_u + \big((w^Z)^2 - (w_u^Z)^2\big)\, f_v}{(w_u^Z - w_v^Z)\, f + (w_v^Z - w^Z)\, f_u + (w^Z - w_u^Z)\, f_v} \tag{13}$$
where the two individuals of the contemporary population be $w_u$ and $w_v$, and the fitness values of the respective individuals be $f_u$ and $f_v$. Afterward, the quadratic function yields the population $w_{new(u)} = [w_{u1}, w_{u2}, w_{u3}, \ldots, w_{uZ}]$. After that, the parent selection process is performed. In this, for mating, a pair of parents is chosen by utilizing RWS; then, they are recombined for creating offspring for the subsequent generation. In RWS, a fixed point is selected on the wheel circumference and the wheel is rotated; the wheel's region that comes in front of the fixed point is selected as the parent. The pair of parents $P$ is computed as

$$P = \frac{\text{fitness}}{\sum_{\text{fitness}=1}^{Z} \text{fitness}} \tag{14}$$

Afterward, to create the next generation, the selected pairs begin to mate. An array termed $\alpha$ is generated in the procreation phase. Then, the offspring generated by utilizing alpha are modeled as

$$\text{offspring}_1 = \alpha \times P_1 + (1 - \alpha) \times P_2; \quad \text{offspring}_2 = \alpha \times P_2 + (1 - \alpha) \times P_1 \tag{15}$$

where the parents be $P_1$ and $P_2$. Next, after or before mating, the female black widow eats her husband in the cannibalism phase. Next, by selecting a number of individuals randomly from the population, the mutation is performed. Lastly, the population is updated; subsequently, the stopping condition is verified. In such a manner, the CH, which is notated as $C_{head}$, is selected by employing the QIRWS-BWO methodology. Thus, the selected CHs are used to transfer the data to the base station.
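A minimal sketch of the two components that distinguish QIRWS-BWO from plain BWO, namely quadratic interpolation (Eq. (13)) and roulette wheel parent selection (Eq. (14)), together with the procreation rule of Eq. (15), is given below. The fitness function, population encoding, and parameter values are placeholders, not the paper's exact formulation.

```python
# Sketch of quadratic interpolation (Eq. (13)), roulette wheel selection (Eq. (14)),
# and procreation (Eq. (15)) used by QIRWS-BWO. Fitness and encoding are placeholders.
import random

def quadratic_interpolation(w, wu, wv, f, fu, fv):
    """Dimension-wise QI point from three individuals and their fitness values."""
    num = (wu**2 - wv**2) * f + (wv**2 - w**2) * fu + (w**2 - wu**2) * fv
    den = (wu - wv) * f + (wv - w) * fu + (w - wu) * fv
    return w if den == 0 else 0.5 * num / den

def roulette_wheel_select(population, fitness):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    r = random.uniform(0, total)
    acc = 0.0
    for individual, fit in zip(population, fitness):
        acc += fit
        if acc >= r:
            return individual
    return population[-1]

def procreate(p1, p2, alpha):
    """Offspring as in Eq. (15)."""
    child1 = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
    child2 = [alpha * b + (1 - alpha) * a for a, b in zip(p1, p2)]
    return child1, child2

# Toy usage: widows are candidate CH assignments encoded as real-valued vectors.
population = [[random.random() for _ in range(5)] for _ in range(6)]
fitness = [sum(w) for w in population]        # placeholder fitness (higher is better)
p1 = roulette_wheel_select(population, fitness)
p2 = roulette_wheel_select(population, fitness)
children = procreate(p1, p2, alpha=random.random())
refined = [quadratic_interpolation(w, u, v, fitness[0], fitness[1], fitness[2])
           for w, u, v in zip(population[0], population[1], population[2])]
print(len(children), refined[:2])
```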
3.5 Encryption By employing the SAES, the data are encrypted before transferring them to the BC server. AES is a symmetric block cipher. In the normal AES, better security is provided by the mix columns step; however, it consumes larger computations, thus making the algorithm's performance slow. Thus, a swapping operation is utilized instead of the mix columns. Thereby, the proposed methodology is renowned as SAES. Figure 2 depicts the architecture of the proposed SAES.
Fig. 2 Architecture of the proposed SAES
Step 1: Let the data from the base station be considered as the input data, which is signified as $D$; in addition, the key, whose size is identical to the data size, is specified as $K(D)$:

$$D = [D_0, D_1, D_2, \ldots, D_{15}] \tag{16}$$

$$K(D) = K(D)_0, K(D)_1, \ldots, K(D)_{15} \tag{17}$$

Step 2: Firstly, in the add round key, the key is added to the data by performing the XOR operation. It is expressed as

$$A = D \oplus K(D) \tag{18}$$

Step 3: In round 1, the sub-bytes transformation replaces every single byte of the input data according to the row and column of the S-box, which makes certain that alterations in individual state bits propagate quickly across the cipher text.
Step 4: After that, the shift rows transformation is performed. When performing this operation, the first row remains unchanged while the other rows execute the shift operation.
Step 5: Next, the swapping operation is conducted, where the rows and columns are exchanged in the matrix. Lastly, by utilizing the XOR operation, the add round key is executed to amalgamate the key with the data. For every single round, the process is repeated. The data are securely amassed in the BC after encryption.
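The sketch below illustrates the structure of one SAES-style round as described in Steps 1-5, with a transpose standing in for the swapping of rows and columns. The S-box is a toy permutation rather than the real AES S-box and key scheduling is omitted, so this is an assumption-laden illustration of the round structure only, not the authors' SAES.

```python
# One simplified SAES-style round (Steps 1-5): XOR the round key, substitute bytes,
# shift rows, then swap rows and columns in place of AES's MixColumns. Toy S-box,
# no key schedule: structural illustration only.
from typing import List

State = List[List[int]]  # 4x4 matrix of bytes

TOY_SBOX = [(17 * i + 5) % 256 for i in range(256)]  # placeholder substitution table

def to_state(block: bytes) -> State:
    return [list(block[i:i + 4]) for i in range(0, 16, 4)]

def add_round_key(state: State, key: State) -> State:
    return [[s ^ k for s, k in zip(srow, krow)] for srow, krow in zip(state, key)]

def sub_bytes(state: State) -> State:
    return [[TOY_SBOX[b] for b in row] for row in state]

def shift_rows(state: State) -> State:
    return [row[i:] + row[:i] for i, row in enumerate(state)]

def swap_rows_columns(state: State) -> State:
    """The swapping step that replaces MixColumns: exchange rows and columns."""
    return [list(col) for col in zip(*state)]

def saes_like_round(block: bytes, key: bytes) -> bytes:
    state, round_key = to_state(block), to_state(key)
    state = add_round_key(state, round_key)   # Step 2
    state = sub_bytes(state)                  # Step 3
    state = shift_rows(state)                 # Step 4
    state = swap_rows_columns(state)          # Step 5 (swap instead of mix columns)
    state = add_round_key(state, round_key)
    return bytes(b for row in state for b in row)

print(saes_like_round(b"patient_record16", b"0123456789abcdef").hex())
```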
3.6 Blockchain BC is a system in which the data are stored in such a manner that altering, hacking, or cheating the system is difficult. A number of transactions are included in every single block in a chain. After securely storing data in BC, the doctor who needs to access the data should first register and login into the BC. After login, the particular patient’s details should be entered by the doctor. Then, the hash code will be created. Figure 3 demonstrates the BC’s architecture.
Fig. 3 Architecture of the blockchain
The doctor is regarded as an authorized user if the hash codes generated by the doctor and the patient are the same; otherwise, an alert message will be forwarded to the hospital and the patient.
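The access check can be pictured as follows: the hash recomputed from the details the doctor enters is compared with the hash stored in the patient's block, and a mismatch triggers the alert. The sketch below uses SHA-256 and minimal block fields purely as stand-ins; the paper itself uses BSHA and a fuller block structure.

```python
# Illustrative access check: the doctor's request is authorized only when the hash it
# produces matches the hash stored in the patient's block; otherwise an alert is raised.
# Block fields and the use of SHA-256 are assumptions for the sketch, not the paper's BSHA.
import hashlib
from dataclasses import dataclass

@dataclass
class Block:
    index: int
    previous_hash: str
    record_hash: str          # hash code generated at patient registration

def record_hash(details: str) -> str:
    return hashlib.sha256(details.encode("utf-8")).hexdigest()

def authorize(chain: list, patient_details_entered: str) -> bool:
    """Doctor is authorized only if the recomputed hash matches a stored block."""
    entered = record_hash(patient_details_entered)
    if any(block.record_hash == entered for block in chain):
        return True
    print("ALERT: hash mismatch - notifying hospital and patient")
    return False

genesis = Block(0, "0" * 64, record_hash("alice||athens||2022-05-01||s3cret||5551234"))
chain = [genesis]
print(authorize(chain, "alice||athens||2022-05-01||s3cret||5551234"))  # True
print(authorize(chain, "mallory||unknown||2022-05-02||guess||0000000"))  # False + alert
```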
4 Results and Discussion The QIRWS-BWO is evaluated in Python. Here, the data are gathered from the MHEALTH dataset.
4.1 Dataset Description The Mobile HEALTH (MHEALTH) dataset includes body motion and vital sign recordings obtained from 10 volunteers of diverse profiles while performing various physical activities.
4.2 Performance Analysis of Clustering Regarding response time, clustering time, energy consumption, latency, and throughput, the performance evaluation is done for the proposed SRCGMM. Gaussian Mixture Model (GMM), K-means, and Hierarchical Clustering (HC) are the prevailing methodologies with which the proposed one is analogized. The comparative analysis of the proposed along with the prevailing methodologies is demonstrated in Table 1. When analogized with the prevailing models, a lower energy consumption of 1120 J was achieved by the proposed model.
Table 1 Comparative analysis in terms of energy consumption, throughput, clustering time, response time, and latency
Proposed SRCGMM: energy consumption 1120 J; throughput 5678 bps; clustering time 1054 ms; response time 1023 ms; latency 1043 ms.
GMM: 1387 J; 3422 bps; 1134 ms; 1276 ms; 1365 ms.
K-means: 2856 J; 3126 bps; 2987 ms; 2655 ms; 2398 ms.
HC: 5438 J; 2971 bps; 3452 ms; 4532 ms; 3755 ms.
In the same way, the proposed system attained a higher throughput of 5678 bps than the prevailing models. Similarly, the proposed model's clustering time, response time, and latency are 1054, 1023, and 1043 ms, which are lower than those of the prevailing methodologies. Therefore, it is confirmed that in clustering, superior performance was attained by the proposed system.
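As a quick arithmetic check (not part of the paper), the relative gains implied by Table 1 can be computed directly from the tabulated values:

```python
# Small illustrative calculation from the Table 1 values: relative improvement of the
# proposed SRCGMM over each baseline in energy consumption and throughput.
metrics = {  # technique -> (energy J, throughput bps, clustering ms, response ms, latency ms)
    "Proposed SRCGMM": (1120, 5678, 1054, 1023, 1043),
    "GMM": (1387, 3422, 1134, 1276, 1365),
    "K-means": (2856, 3126, 2987, 2655, 2398),
    "HC": (5438, 2971, 3452, 4532, 3755),
}
proposed = metrics["Proposed SRCGMM"]
for name, values in metrics.items():
    if name == "Proposed SRCGMM":
        continue
    energy_saving = 100 * (values[0] - proposed[0]) / values[0]
    throughput_gain = 100 * (proposed[1] - values[1]) / values[1]
    print(f"vs {name}: {energy_saving:.1f}% less energy, {throughput_gain:.1f}% more throughput")
```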
4.3 Performance Analysis of Cluster Head Selection Black Widow Optimization (BWO), Mayfly Optimization Algorithm (MOA), Bacterial Foraging Optimization (BFO), and Small String Optimization (SSO) are the prevailing methodologies with which the proposed QIRWS-BWO is analogized regarding the fitness outcome. Figure 4 exhibits the performance evaluation of the proposed together with the prevailing models. The proposed model's fitness value is 5422 for iteration 5. Likewise, the fitness values range from 6245 to 8634 for iterations ranging from 10 to 25. When analogized with the prevailing methodologies, the proposed one attained a higher fitness value. Thus, it is proved that for cluster head selection, a better performance was attained by the proposed model.
Fig. 4 Performance analysis of the proposed QIRWS-BWO and the existing model
4.4 Performance Analysis of Encryption Regarding encryption time, decryption time, and security level, the proposed SAES's performance is assessed, which is then analogized with the prevailing Advanced Encryption Standard (AES), Diffie Hellman Secret Key–Double Elliptic Curve Cryptography (DHSK–DECC), Elliptic Curve Cryptography (ECC), Rivest–Shamir–Adleman (RSA), and Diffie Hellman methodologies. The comparative analysis of the proposed together with the existing models is given in Table 2. The encryption time, decryption time, algorithmic complexity, and security level of the proposed system are 1096 ms, 1123 ms, 67,382 ms, and 98%, respectively; its encryption and decryption times are lower than those of the prevailing models. Likewise, the proposed model consumes 127,625,148 kb of memory for encryption and 132,964,507 kb of memory for decryption. Likewise, the proposed method requires 846 ms to generate the hash code and 875 ms to verify the hash code. Thus, the proposed SAES safeguards the patients' health data against intruders.
5 Conclusion In this work, by employing the novel QIRWS-BWO and SAES models, a BC-centric secure and energy-efficient healthcare IoT has been proposed. Next, regarding numerous metrics, the proposed methodologies' performance is analyzed experimentally. The outcomes displayed that better performance was attained by the proposed methodology, achieving an energy consumption of 1120 J and a security level of 98%, respectively. It is concluded that the proposed one is highly effective along with secure data transmission in healthcare IoT. In the future, with some enhanced methodologies, the work will be extended to perform attack detection during data transmission.
Table 2 Comparative analysis of the proposed model
Encryption time (ms): Proposed SAES 1096; AES 1125; DHSK-DECC 1226; ECC 3214; RSA 4532; Diffie Hellman 4672.
Decryption time (ms): Proposed SAES 1123; AES 1303; DHSK-DECC 1345; ECC 3121; RSA 4326; Diffie Hellman 5437.
Security level (%): Proposed SAES 98; AES 97; DHSK-DECC 96; ECC 90; RSA 89; Diffie Hellman 82.
Memory usage on encryption (kb): Proposed SAES 127,625,148; AES 157,678,839; DHSK-DECC 162,678,476; ECC 192,890,452; RSA 219,074,571; Diffie Hellman 258,450,723.
Memory usage on decryption (kb): Proposed SAES 132,964,507; AES 163,865,786; DHSK-DECC 179,405,623; ECC 207,653,481; RSA 246,945,623; Diffie Hellman 387,564,936.
Hash code generation time (ms): Proposed SAES 846; AES 885; DHSK-DECC 997; ECC 1274; RSA 1985; Diffie Hellman 2794.
Hash code verification time (ms): Proposed SAES 875; AES 956; DHSK-DECC 1068; ECC 1764; RSA 2356; Diffie Hellman 2789.
Algorithm complexity (ms): Proposed SAES 67,382; AES 62,546; DHSK-DECC 61,294; ECC 54,893; RSA 50,734; Diffie Hellman 44,689.
References 1. Pirbhulal S, Samuel OW, Wu W, Sangaiah AK, Li G (2019) A joint resource-aware and medical data security framework for wearable healthcare systems. Futur Gener Comput Syst 95:382– 391 2. Tao H, Bhuiyan ZA, Abdalla AN, Hassan MM, Zain JM, Hayajneh T (2018) Secured data collection with hardware based ciphers for IoT-based healthcare. IEEE Internet Things 6(1):410–420 3. Cho Y, Kim M, Woo S (20158) Energy efficient IoT based on wireless sensor networks for healthcare. In: 20th International conference on advanced communications technology. IEEE, Chuncheon, 11–14 Feb 2018 4. Hossain M, Riazul Islam SM, Ali F, Kwak KS, Hasan R (2017) An internet of things based health prescription assistant and its security system design. Future Gener Comput Syst 82(4):422–439 5. Almulhim M, Zaman N (2018) Proposing secure and lightweight authentication scheme for IoT based e-health applications. In: 20th International conference on advanced communications technology. IEEE, Chuncheon, 11–14 Feb 2018 6. Dewangan K, Mishra M (2018) Internet of things for healthcare a review. Int J Adv Manag Technol Eng Sci 8(3):526–534 7. Aktas F, Ceken C, Erdemli YE (2017) IoT based healthcare framework for biomedical applications. J Med Biol Eng 38(6):966–979 8. Almulhim M, Islam N, Zaman N (2019) A lightweight and secure authentication scheme for iot based e-health applications. IJCSNS Int J Comput Sci Netw Secur 19(1):107–120 9. Sun Y, Lo FPW, Lo B (2019) Security and privacy for the internet of medical things enabled healthcare systems a survey. IEEE Access 7:183339–183355 10. Ismail L, Materwala H, Zeadally S (2019) Lightweight blockchain for healthcare. IEEE Access 7:149935–149951 11. Farouk A, Alahmadi A, Ghose S, Mashatan A (2020) Blockchain platform for industrial healthcare vision and future opportunities. Comput Commun 154:223–235 12. Aujla GS, Jindal A (2020) A decoupled blockchain approach for edge-envisioned IoT-based healthcare monitoring. IEEE J Sel Areas Commun 39(2):491–499 13. Khatoon A (2020) A blockchain based smart contract system for healthcare management. Electronics 9(1):1–23 14. Jamil F, Ahmad S, Iqbal N, Kim DH (2020) Towards a remote monitoring of patient vital signs based on IoT-based blockchain integrity management platforms in smart hospitals. Sensors 20(8):1–26 15. Shahbazi Z, Byun YC (2020) Towards a secure thermal energy aware routing protocol in wireless body area network based on blockchain technology. Sensors 20(12):1–26 16. Saba T, Haseeb K, Ahmed I, Rehman A (2020) Secure and energy efficient framework using internet of medical things for e-healthcare. J Infect Public Health 13(10):1567–1575 17. Jan MA, Khan F, Mastorakis S, Adil M, Akbar A, Stergiou N (2021) LightIoT lightweight and secure communication for energy efficient IoT in health informatics. IEEE Trans Green Commun Netw 5(3):1202–1211 18. Bharathi R, Abirami T, Dhanasekaran S, Gupta D, Khanna A, Elhoseny M, Shankar K (2020) Energy efficient clustering with disease diagnosis model for IoT based sustainable healthcare systems. Sustain Comput Inf Syst 28(5):1–28 19. Rahman A, Shamim Hossain M, Loukas G, Hassanain E, Rahman SS, Alhamid MF, Guizani M (2018) Blockchain based mobile edge computing framework for secure therapy applications. IEEE Access 6:72469–72478 20. Wang R, Liu H, Wang H, Yang Q, Wu D (2019) Distributed security architecture based on blockchain for connected health architecture, challenges, and approaches. IEEE Wirel Commun 26(6):30–36
Plant Pathology Using Deep Convolutional Neural Networks
Banushruti Haveri and K. Shashi Raj
ECE Department, Dayananda Sagar College of Engineering, Bangalore, Karnataka, India
Abstract Plant infections have affected people worldwide, resulting in a 13% reduction in crop production. Plant pathology tries to enhance the wellness of plants when they are exposed to harmful environmental elements such as pH, humidity, temperature, wetness, and others. Left undetected, diseases can worsen the state of the environment and raise pollution through the usage of harmful substances. The current standard approaches are expensive and time consuming. Therefore, automatic disease detection aids in quickly recognizing diseases in plants. EfficientNet, a deep convolutional neural network model, is utilized in this study to identify illnesses in apple leaves. The data set was gathered from the Kaggle website and consists of four classes: apple scab, healthy, black rot, and apple cedar rust. In order to process and improve the data set for finer categorization, data processing and data augmentation techniques are applied. With the EfficientNet-B3 model, precision is almost 100%, and MATLAB is used to identify leaf disease in real time. Keywords Deep learning · Convolutional neural networks · Deep learning models · Data augmentation · EfficientNet-B3 · Disease detection and classification · MATLAB
1 Introduction Apples, among the most commonly cultivated fruits, are produced all over the world and contribute significantly to worldwide productivity. An increase in the number of illnesses will reduce the apple yield, so identifying apple leaf diseases quickly and accurately is crucial. The data set for this study was gathered from the Kaggle website [1] and the real world, and it consists of 2403 training photos that have been divided into four classes: healthy, apple scab, apple cedar rust, and black rot.
The traditional methods rely on farmers' visual inspection, followed by the use of pesticides. This consumes more time and is a difficult process, which leads to misdiagnosis. Machine learning (ML) [2] techniques provide a way to increase the use of automatic plant disease detection. Computer capabilities for object identification, data sensing and interpretation, categorization, and feature extraction are improved through machine learning and computer vision. Data collection, pre-processing of data, augmentation of data, training the model, and testing it are the steps included in the classification and identification of diseases in plants. Unsupervised learning faces the issue of emerging patterns brought on by clustering, while supervised learning has the challenge of training data, which takes time and expertise. These machine learning methods [3, 4] largely rely on augmenting and pre-processing the data [5]. However, these systems still have a fairly poor identification rate and can overlook stronger features because the feature selection process is human-based. Deep learning, a particular type of ML algorithm, is used in the agriculture domain. Convolutional neural networks can automatically extract features, eliminating the need for laborious pre-processing and giving the best accuracy. CNN models need a lot of data but very few neurons for their training. Finding the optimal network model architectures and enhancing the data are challenging. A neural network with only a convolution and pooling layer is a rudimentary CNN model. A deep CNN [6] is a multilayer perceptron-style neural network, which has several layers. In terms of efficiency, speed, and accuracy, DNN models outperform CNNs, whereas a CNN, which needs a more complex architecture, on the other hand offers a better visual connection. The shortcomings of CNN models, namely more pooling layers and parameters and longer calculation times, are overcome in this research through the use of a more accurate DNN model, EfficientNet-B3 [7]. EfficientNet uses feature reuse, model scaling, and parameter reduction to maintain correctness. The remainder of the article is organized as follows: we discuss earlier works, current systems, and their shortcomings in Sect. 2. In Sect. 3, we provide a brief overview of the proposed method, originality, and goals. We outline the experimental protocols and evaluation methodology in Sect. 4. We analyse our experiments and give quantitative data in Sect. 5. Section 6 brings the paper to a close.
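For orientation, a hedged sketch of the kind of pipeline this introduction describes, EfficientNet-B3 fine-tuned on the four apple-leaf classes with simple data augmentation, is given below. The directory layout, image size, and training settings are assumptions rather than the authors' exact configuration (whose real-time detection additionally uses MATLAB).

```python
# Sketch: EfficientNet-B3 with ImageNet weights fine-tuned on four apple-leaf classes,
# with simple augmentation. Paths and hyperparameters are assumed, not the paper's.
import tensorflow as tf
from tensorflow.keras import layers

IMG_SIZE = (300, 300)   # EfficientNet-B3's default input resolution
NUM_CLASSES = 4         # healthy, apple scab, apple cedar rust, black rot

train_ds = tf.keras.utils.image_dataset_from_directory(
    "plant_pathology/train",          # hypothetical path to the Kaggle images
    image_size=IMG_SIZE, batch_size=32)

augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

base = tf.keras.applications.EfficientNetB3(
    include_top=False, weights="imagenet", input_shape=IMG_SIZE + (3,))
base.trainable = False                # start by training only the new head

inputs = tf.keras.Input(shape=IMG_SIZE + (3,))
x = augment(inputs)
x = base(x, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=10)
```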
2 Previous Works and Their Limitations Traditional methods of plant illness control involve human scouting, which frequently results in misdiagnosis and excessive use of pesticides that are harmful to the environment and the economy [8]. Additionally, in this time-consuming operation, laboratory settings are needed. In order to save the produce, an automatic disease identifying system that uses photographs of apple leaves can aid in fast identification. One of the more well-liked and reliable classical machine learning methods is K-Means Clustering (KMC), which combines Random Forests and Support Vector Machines (SVM). Most commonly used is SVM. An enhanced mixed methodology
or hybrid methodology is applied in the SVM model for picture smoothing, noise reduction, spot enhancement, and colour separation. Box filtering, Gaussian, and median filters are the three pipeline techniques used in this strategy. The colour and texture of each disease may vary, and EM and SCP algorithms can recognize these variations. The two classes of downy and powdery mildew in grape disease detection require colour and texture traits, respectively. K-means clustering is used to segment these features. Contrast, homogeneity, entropy, diagonal, and difference variance are examples of texture features. The linear SVM is then fed the nine colour and nine texture features for class categorization. To examine colour and texture only in the spot area, the Otsu approach for lesion segmentation in the L*a*b* colour space is utilized in this model. The severity function is used to compare white pixels in the spot regions to all other pixels, and only the region of interest, i.e. the region with the greatest disease, is trained using the linear Support Vector Machine. However, these methods don't work well with angle and shade and are better suited for photographs with uniform backgrounds. They significantly rely on feature extraction to improve accuracy and prevent overfitting, and important features are extracted through data pre-treatment and augmentation techniques. To eliminate a time-consuming pre-processing operation, deep learning techniques are more helpful. The most often used model for classifying plant diseases is the CNN. In order to extract and fuse illness spots, convolutional neural networks utilize SSD to detect objects [9]. Nine different disease groups are categorized using enhanced CNN models by altering the parameters, pooling combinations, and including ReLU functions. Model overfitting is avoided by using data augmentation techniques. Normalization is then applied to achieve accuracy. However, the parameters are reduced because just the first classifier is trained. With this addition, accuracy has increased to 97.9%. Singh et al. [10] propose a multilayer neural network using an AlexNet architecture, which has a convolutional layer with a pooling layer and a ReLU function to reduce the parameters and to improve the extraction of features. A flatten function is used to transform photographs into a one-dimensional array. ANN and hybrid metaheuristic feature selection were implemented together with a feed-forward network [11]. To alter pixel intensities and capture the most information possible, contrast enhancement is performed. Normalization is then used to hasten convergence during the back-propagation procedure. Cross-entropy is utilized to estimate the loss, while the Adam optimizer is employed for optimization. To discriminate between affected and common regions, channel-spatial attention (CSA) and a region proposal network (RPN) were used after data augmentation in [12]. The Fast Translation algorithm and SoC are responsible for the accuracy. Leaf life is discovered and analysed using the colour data from the pixels that make up the image. To recognize the disease and the appropriate pesticide to employ, the processed data are then compared with deep learning (DL) data sets.
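To make the classical pipeline summarized above concrete, the following hedged sketch segments the lesion area with K-means clustering in L*a*b* colour space, extracts a few GLCM texture features, and feeds colour plus texture features to a linear SVM. Thresholds, feature choices, and class labels are placeholders rather than the cited papers' exact settings.

```python
# Hedged sketch of a classical lesion-segmentation + SVM pipeline: K-means on a*b*
# chroma to isolate the diseased spot, GLCM texture features, linear SVM classifier.
import numpy as np
from skimage.color import rgb2lab, rgb2gray
from skimage.feature import graycomatrix, graycoprops
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def lesion_mask(rgb_image: np.ndarray, n_clusters: int = 3) -> np.ndarray:
    """Cluster a*b* chroma values and treat the darkest cluster as the diseased spot."""
    lab = rgb2lab(rgb_image)
    ab = lab[:, :, 1:].reshape(-1, 2)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(ab)
    labels = labels.reshape(rgb_image.shape[:2])
    darkest = min(range(n_clusters), key=lambda k: lab[:, :, 0][labels == k].mean())
    return labels == darkest

def colour_texture_features(rgb_image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Mean colour of the spot region plus a few GLCM texture properties."""
    gray = (rgb2gray(rgb_image) * 255).astype(np.uint8)
    glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256, symmetric=True)
    texture = [graycoprops(glcm, prop)[0, 0]
               for prop in ("contrast", "homogeneity", "energy")]
    colour = rgb_image[mask].mean(axis=0) if mask.any() else rgb_image.mean(axis=(0, 1))
    return np.concatenate([colour, texture])

# Toy training data: random arrays standing in for labelled leaf photographs.
rng = np.random.default_rng(0)
images = rng.random((8, 64, 64, 3))
labels = [0, 0, 1, 1, 0, 1, 0, 1]            # e.g. 0 = downy mildew, 1 = powdery mildew
features = np.array([colour_texture_features(img, lesion_mask(img)) for img in images])
clf = SVC(kernel="linear").fit(features, labels)
print(clf.predict(features[:2]))
```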
period for the fruit. With the assistance of specialists, the training process was carried out using photos classified according to fruit mellowness. The GoogLeNet and AlexNet [13] architectures use fewer parameters and are additionally improved to strengthen the precision by involving supplementary layers in the pooling area and an activation function between connected layer pairs. When this is ignored, it causes a loss of rotational and spatial information, which is responsible for lower precision. It can be inferred that resemblance has to remain discernible in photos in spite of size and orientation variations of the disease patch, that the advantages and disadvantages of the model choices matter, and that the procedure needs to be simple to use. The proposed technique was developed because classification gives more accuracy if the model uses spot segments as the region of interest (ROI). In [14], novel supervised systems were created to improve the functionality of RBVSC. The input pictures were initially gathered; then, using mean orientation-based super-pixel segmentation, the retinal vessels were divided up. In addition, feature vectors were extracted from the segmented regions using a CNN. To categorize the "vessel" and "non-vessel" regions, classification is performed by a Support Vector Machine (SVM) on the collected features. The CNN and SVM combination quickly and accurately detects the patterns by learning the feature values from the raw images. In [15], an automated plant disease detection model for the Internet of Things has been created. For the purpose of taking pictures of plant leaves, the suggested design positions the nodes over the simulated environment. The system keeps a sink node that aids IoT-based monitoring by gathering data from the automated plant disease detection module. A median filter is used as a pre-processing step on the node pictures to make them appropriate for plant disease identification. After that, the image is segmented, and segment-level and pixel-level features are recovered from it. The purpose of [16] was to make clear the specifics of diseases and how artificial intelligence can quickly identify them: it discusses the autonomous detection of plant diseases using machine learning and deep learning, focuses on the shift in machine learning techniques over the past 5 years, thoroughly examines many data sets pertaining to diseased plants, and also addresses the difficulties and issues with the current systems. In [17], a unique CNN architecture is suggested for categorizing images of ladies finger plant leaves into different groups, namely diseased, leaf burnt, and healthy. The data set includes 1087 cases of leaves from the ladies finger plant, of which 456 are considered healthy (non-diseased), 509 are considered pest- and disease-affected, and 121 are considered to have leaf burn from fertilizer overuse. The photographs were captured on-site at several villagers' farms in the Tiruvannamalai area of Tamil Nadu, India, and 96% classification accuracy was attained using the suggested CNN architecture. The paper [18] suggests a solution to the scheduling issue that arises when using UAVs for agricultural plant protection tasks including spraying pesticides, flying, and charging. UAVs are initially given jobs to complete, and then, schedules are
slotted. The approach uses the Dragonfly Algorithm to quickly find a schedule that is close to ideal. The proposed strategy is applied and tested, and a thorough discussion of the method's performance evaluation is included, along with a list of the factors that go into choosing the ideal answer. The structure of a simulation-based strategy is provided in [19], along with a process for putting it into practice. When using simulation to balance operating performance and planning cost, the study also incorporates mathematical techniques and heuristic methodologies, and an example shows how the suggested strategy can accomplish the objective of production and better plant layout planning. In order to reduce the number of parameters and the computational cost of plant leaf disease detection and classification, a new hybrid CNN strategy based on Inception was proposed [20]. A data set of 49,135 pictures consisting of 30 different classes from fourteen distinct plants, covering both diseased and healthy leaves, was used to train and evaluate the proposed hybrid model using k-fold cross-validation. The new model, which offers approximately a 74.99% parameter reduction compared to a standard CNN, has a high accuracy of 99.27% and an average accuracy of 99%, demonstrating good accuracy despite the drastically decreased number of parameters. In order to identify diseases in photos of tomato leaves, the Modified InceptionResNet-V2 (MIR-V2) Convolutional Neural Network is employed along with a pre-trained model in [21]. A self-collected data set containing one healthy class and seven possible classifications of tomato leaf diseases serves as the training set for the suggested model. Different metrics, including learning rate, dropout, number of epochs, batch size, and accuracy, are used to assess the performance of the model. The F1-score is 97.94%, while the disease categorization precision rate for the used network is 98.92%. The findings confirm the viability of the proposed strategy and demonstrate its potency in identifying diseases. To reduce memory use and processing costs, the proposed DPD-DS includes a Light Head Region Convolutional Neural Network (R-CNN). By altering the backbone's structure and the anchor proportions in the RPN network, it improves computing efficiency and detection accuracy. The approach's viability and robustness are examined by contrasting the DPD-DS model with current state-of-the-art models, and the observational findings show that the suggested strategy outperforms existing approaches in terms of precision, recall, and mean average precision (mAP).
Additionally, the suggested framework's identification time is reduced dramatically, by about a factor of two, increasing the model's ability to identify disease in leaves [22]. A hyperparameter optimization method for a deep CNN is also proposed for the purpose of identifying plant species. This scheme is based on the artificial bee colony (ABC) algorithm and is referred to as the optimal deep CNN (ODC) classifier. It is applied to a ready-made leaf image data set called Folio, which includes 637 pictures of plants from 32 different species. To increase the effectiveness of the different classifiers, the images underwent various pre-processing steps such as segmentation, scaling, and augmentation. Performance evaluation measures including sensitivity, accuracy, specificity, and F1-score are used to determine the effectiveness of the system by comparing the results of the achieved ODC from the test phase with the literature [23].
3 Proposed System
More pooling layers are added to popular CNN architectures to achieve high accuracy with fewer parameters, which results in a loss of spatial and feature information. The popular DNN model EfficientNet aims at addressing these drawbacks of previous models. The suggested method considers the RGB values of the photos and the distribution of each channel. After exploratory data analysis to derive insights, the method enriches the data using annotation and augmentation techniques, in particular flipping, rotation, Canny edge detection, brightness modification, and blurring, to reduce overfitting and raise accuracy. EfficientNet-B3, a DNN-based ImageNet pre-trained model, is then fitted to the augmented data set, and accuracy is evaluated over two epochs. Figure 1 shows the images of leaves from each category. By consistently scaling depth, width, and image size, EfficientNet [7] gets over this limitation and improves working speed even with a small data set. The network requires more channels and layers as the image gets bigger in order to increase the receptive field and capture more accurate patterns. EfficientNet-B3 is implemented because it employs the fewest parameters and provides the best level of accuracy; the comparison with other deep CNN models is given in Fig. 2. Along with loss, feature information may "fade off" as it travels through numerous levels, specifically the pooling layers. EfficientNet connects every layer from all previous levels to the next in order to ensure maximum feature propagation; feature concatenation and reuse make a smaller network with fewer channels and parameters possible. The system we provide is meant to achieve the following goals:
• To create an automatic classifier that can recognize the specified classes.
• To correctly classify a testing data set without labels.
• To spot leaves that have several illnesses.
• To address unusual symptoms and multiple disease classes.
• To correctly segment spots with better accuracy.
• To perform real-time leaf disease detection using MATLAB.
Fig. 1 Leaves of plants in each category (panels: Healthy, Apple scab, Black rot, Cedar apple rust)
Fig. 2 Performance evaluation of EfficientNet against other potent DL models [7]
Fig. 3 Number of images in each class
4 Methodology
4.1 Data Collection
About 2403 images of apple leaf diseases are taken from the Kaggle website [1]. The data set is divided into a training data set and a test data set, which are labelled and unlabelled, respectively. About 1460 photos are classed as healthy, 130 as rusty, 405 as scabby, and 408 as black rot. Figure 3 shows the number of images in each class.
4.2 Data Pre-processing
Unlabelled images are removed from the data set, and wrong labels are corrected. Annotation and augmentation techniques are used to enhance this pre-processed data, which is further divided into training and validation data sets. The Keras ImageDataGenerator is applied so that every image in a batch has the same size. The flow chart of our proposed system is given in Fig. 4.
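As a minimal, hedged sketch of this step (the directory layout, target size, and split fraction are assumptions, not values taken from the paper), the Keras ImageDataGenerator can resize every image to a common shape while splitting off a validation subset:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values to [0, 1]; hold out 20% of the labelled images for validation.
datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)

# flow_from_directory resizes every image to the same target size (300x300 here,
# the native EfficientNet-B3 resolution) so that each batch has a uniform shape.
train_gen = datagen.flow_from_directory(
    "data/train",                # hypothetical path: one sub-folder per class
    target_size=(300, 300),
    batch_size=32,
    class_mode="categorical",
    subset="training",
)
val_gen = datagen.flow_from_directory(
    "data/train",
    target_size=(300, 300),
    batch_size=32,
    class_mode="categorical",
    subset="validation",
)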
Fig. 4 Flow chart for our proposed method
4.3 Data Analysis
The RGB values of the images are analysed by obtaining the mean value of each channel over the complete training data set, and the channel distribution is examined. Although the blue channel has the most consistent distribution among the photos, it also varies: the greener portions of an image have extremely low blue values, whereas diseased sections have high blue values, so the blue channel may be the key to identifying disease in unhealthy leaves.
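A small sketch of how the per-channel means might be approximated over the training generator from the previous sketch (the batch count and variable names are illustrative):

import numpy as np

def channel_means(generator, batches=50):
    """Approximate the per-channel (R, G, B) mean over a Keras image generator."""
    sums, count = np.zeros(3), 0
    for _ in range(batches):
        images, _ = next(generator)               # images: (batch, H, W, 3) in [0, 1]
        sums += images.mean(axis=(1, 2)).sum(axis=0)
        count += images.shape[0]
    return sums / count

# e.g. means = channel_means(train_gen); a comparatively high blue mean in
# diseased-leaf batches would support the observation in the text.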
4.4 Data Augmentation
Data augmentation is carried out using the Keras ImageDataGenerator class [13], which performs flipping, rotating, blurring, zooming, and shrinking operations on every image and generates the augmented images. These methods produce an enhanced data set that balances the class distribution.
402
B. Haveri and K. Shashi Raj
The employed algorithms are as follows; a code sketch is given after this list.
i. The flipping technique operates on the image channels by altering the index: in vertical and horizontal flipping, the row and column orders are interchanged, respectively.
ii. Blurring uses a Gaussian distribution to add a little noise without entirely hiding the spots.
iii. The Keras ImageDataGenerator is used to rotate and skew images by small angles. The ImageDataGenerator takes a batch of input images and alters each one using a variety of random changes, including resizing, brightness modification, blurring, rotating, and more. It is not an additive process, though: it only returns the newly transformed augmented data in place of the original photos.
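A hedged sketch of such an augmentation pipeline with the Keras ImageDataGenerator; Gaussian blurring is not a built-in option, so it is attached here through the preprocessing_function hook, and all parameter values are assumptions rather than the paper's settings:

import numpy as np
from scipy.ndimage import gaussian_filter
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def mild_blur(image):
    # Add a slight Gaussian blur on the spatial axes only, without hiding the spots.
    return gaussian_filter(image, sigma=(1, 1, 0))

aug = ImageDataGenerator(
    rescale=1.0 / 255,
    horizontal_flip=True,              # column order swapped
    vertical_flip=True,                # row order swapped
    rotation_range=15,                 # small-angle rotation
    shear_range=5,                     # small skew
    brightness_range=(0.8, 1.2),       # brightness modification
    preprocessing_function=mild_blur,  # blur is not built in, hence a custom hook
)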
4.5 Model Training
The convolution layer of the CNN moves a 2D kernel along the width and height of the image and, at every step, estimates the scalar product with a smaller sub-matrix; with more data, this algorithm generates more accurate findings. The pooling layer, which operates like the convolutional layer but without a kernel, reduces the dimensions of the feature maps by computing the maximum value in a window rather than the dot product of a sub-matrix and a kernel. The Rectified Linear Unit (ReLU) activation function adds nonlinearity and expands the model's capacity: it yields 0 if x is negative and x in all other cases. Model EfficientNet-B3: It is common modelling practice to arbitrarily expand a CNN's depth or width or to use large input images during training and testing. However, this frequently results in excessive tuning and poorer efficiency: arbitrary scaling is helpful at first but quickly reaches saturation for many factors. The compound scaling in the EfficientNet model is shown in Fig. 5. EfficientNet [7] delivers greater performance by keeping the architecture organized with fewer parameters while consistently scaling the depth, width, and image size. Each dimension is scaled using a set of fixed, computed scaling factors; the best set of coefficients for each dimension is found using a grid search that discovers the relationship between the various network dimensions under a predetermined constraint, such as 2× more FLOPS. Figure 6 shows the basic architecture of the EfficientNet model. The design incorporates an inverted structure known as the Mobile Inverted Convolution (MBConv), also called the Inverted Residual Block, which uses skip connections in the narrow sections and compresses the network to match the number of channels at the beginning. The narrow-to-wide-to-narrow approach used by MBConv [7] is an inverted version of the conventional convolutional block: it first widens the channels with a 1 × 1 convolution, then applies a 3 ×
Fig. 5 Compound scaling in the EfficientNet model [7]
Fig. 6 Basic EfficientNet architecture [7]
3 depthwise convolution, which results in fewer parameters, and finally reduces the channel count with a 1 × 1 convolution. The EfficientNet-B3 model scales the model dimensions with very good parameter efficiency, resulting in fewer parameters and a smaller, shallower architecture.
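A minimal transfer-learning sketch of fitting an ImageNet-pretrained EfficientNet-B3 to the four apple-leaf classes; the classification head, dropout rate, learning rate, and input size are assumptions rather than the paper's exact settings:

import tensorflow as tf
from tensorflow.keras.applications import EfficientNetB3
from tensorflow.keras import layers, models

# ImageNet-pretrained backbone; the original classification head is replaced for 4 classes.
# Note: recent tf.keras EfficientNet variants rescale inputs internally, so raw 0-255
# pixels can be fed directly; if the generator already rescales to [0, 1], drop that rescale.
base = EfficientNetB3(include_top=False, weights="imagenet",
                      input_shape=(300, 300, 3), pooling="avg")

model = models.Sequential([
    base,
    layers.Dropout(0.3),                     # assumed regularisation
    layers.Dense(4, activation="softmax"),   # healthy, scab, rust, black rot
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Two epochs, as in the paper's experiments.
history = model.fit(train_gen, validation_data=val_gen, epochs=2)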
4.6 Model Testing
The EfficientNet-B3 model is trained on the expanded training data set and reaches a high level of accuracy. The same model is then evaluated on the validation and test data sets for testing and improvement. About 1800 unlabelled photos comprise the testing data, which must be divided into the four classes.
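Continuing the earlier sketches, the unlabelled test images could be assigned to the four classes as follows (paths and batch size are assumptions):

import numpy as np

# Unlabelled test images: flow without labels, keep the file order stable.
test_gen = datagen.flow_from_directory(
    "data/test",                 # hypothetical path containing one flat sub-folder
    target_size=(300, 300),
    batch_size=32,
    class_mode=None,
    shuffle=False,
)

probs = model.predict(test_gen)                    # (n_images, 4) class probabilities
class_names = list(train_gen.class_indices.keys())
predicted = [class_names[i] for i in np.argmax(probs, axis=1)]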
Fig. 7 Loss and accuracy in the EfficientNet model after two epochs
5 Experimental Results
Losses decrease while accuracy increases steadily. The validation metrics do not exhibit as much instability and fluctuation as the training metrics, which stabilize quite quickly after one or two epochs, and both training and validation measures gradually improve. At epoch 2, the EfficientNet-B3 model's categorical accuracy was 1.0000, its training loss was 0.2855, and its validation loss was 0.0703. Figure 7 shows the loss and accuracy after two epochs of the EfficientNet model.
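The loss and accuracy curves of Fig. 7 can be reproduced from the history object returned by model.fit in the earlier training sketch; this plotting snippet is illustrative only:

import matplotlib.pyplot as plt

# Plot the training/validation curves recorded by model.fit (cf. Fig. 7).
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history.history["loss"], label="train loss")
ax1.plot(history.history["val_loss"], label="val loss")
ax1.set_xlabel("epoch")
ax1.legend()
ax2.plot(history.history["accuracy"], label="train accuracy")
ax2.plot(history.history["val_accuracy"], label="val accuracy")
ax2.set_xlabel("epoch")
ax2.legend()
plt.show()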
6 Conclusion
An automated plant disease detection system that works from input images of sick leaves is urgently required. Traditional machine learning approaches are effective, but they struggle with non-uniform backgrounds, cannot reliably extract features automatically, and require a complex pipeline for data gathering. Deep CNN models extract and merge the attributes without any hand-crafted filtering, and data augmentation is necessary to avoid overfitting and reach high accuracy. Edge detection, flipping, convolution, blurring, and other image annotation and enhancement techniques can be utilized to create
models with more precise training data. After applying random changes, the ImageDataGenerator class offers a quick way to create a new data set that is many times larger than the original. While prominent CNN models like AlexNet, VGG, GoogLeNet, and others exhibit great accuracy, they are quite complex models with more pooling layers and parameters, which results in spatial information loss. The DNN model EfficientNet is therefore proposed to classify leaf diseases from leaf images even with little training data. Its ability to uniformly scale model dimensions and to reuse features through concatenation results in fewer parameters and an accuracy of 100% over 2 epochs. Strong validation approaches, stacking, and ensembling can result in models that are even more precise and reliable.
References 1. https://www.kaggle.com/c/plant-pathology-2020-fgvc7 2. Vijayakumar T, Vinothkanna R (2020) Mellowness detection of dragon fruit using deep learning strategy. J Innov Image Process (JIIP) 02(01):35–43 3. Ratnasari EK, Mentari M, Dewi RK, Hari Ginardi RV (2014) Sugarcane leaf disease detection and severity estimation based on segmented spots image. In: Proceedings of international conference on information, communication technology and system (ICTS) 4. Padol PB, Yadav A (2016) SVM classifier based grape leaf disease detection. In: Conference on advances in signal processing (CASP) 5. Khan MA, Chouhan SS, Kaul A, Singh UP, Jain S (2019) An optimized method for segmentation and classification of apple diseases based on strong correlation and genetic algorithm based feature selection. IEEE Access 7 6. Huang G, Liu Z, van der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: CVPR 7. Tan M, Le QV (2019) EfficientNet: rethinking model scaling for convolutional neural networks. In: International conference on machine learning 8. Jiang P, Chen Y, Liu B, He D, Liang C (2019) Real-time detection of apple leaf diseases using deep learning approach based on improved convolutional neural networks. IEEE Access 7 9. Zhang X, Qiao Y, Meng F, Fan C, Zhang M (2018) Identification of maize leaf diseases using improved deep convolutional neural networks. IEEE Access 6 10. Singh UP, Chouhan SS, Jain S, Jain S (2019) Multilayer convolution neural network for the classification of mango leaves infected by anthracnose disease. IEEE Access 7 11. Pham TN, Tran LV, Dao SVT (2017) Early disease classification of mango leaves using feedforward neural network and hybrid metaheuristic feature selection. IEEE Access 12. Chandy A (2019) pest infestation identification in coconut trees using deep. J Artif Intell Capsule Netw 01(01):10–18 13. Wang X et al. (2017) Detection-and-classification-of-apple-tree-disease-based-on-deeplearning-algorithm. IEEE Access 14. Balasubramanian K, Ananthamoorthy NP (2021) Robust retinal blood vessel segmentation using convolutional neural network and support vector machine. J Ambient Intell Hum Comput 12:3559–3569 15. Mishra M, Choudhury P, Pati B (2021) Modified ride-NN optimizer for the IoT based plant disease detection. J Ambient Intell Hum Comput 12:691–703 16. Nanehkaran YA, Zhang D, Chen J et al. (2020) Recognition of plant leaf diseases based on computer vision. J Ambient Intell Hum Comput
17. Selvam L, Kavitha P (2020) Classification of ladies finger plant leaf using deep learning. J Ambient Intell Hum Comput 18. Sun F, Wang X, Zhang R (2020) Task scheduling system for UAV operations in agricultural plant protection environment. J Ambient Intell Hum Comput 19. Zhang Z, Wang X, Wang X et al (2019) A simulation-based approach for plant layout design and production planning. J Ambient Intell Hum Comput 10:1217–1230 20. Tuncer A (2021) Cost-optimized hybrid convolutional neural networks for detection of plant leaf diseases. J Ambient Intell Hum Comput 12:8625–8636 21. Kaur P, Harnal S, Gautam V et al. (2022) A novel transfer deep learning method for detection and classification of plant leaf disease. J Ambient Intell Hum Comput 22. Kavitha Lakshmi R, Savarimuthu N (2021) DPD-DS for plant disease detection based on instance segmentation. J Ambient Intell Hum Comput 23. Erkan U, Toktas A, Ustun D (2022) Hyperparameter optimization of deep CNN classifier for plant species identification using artificial bee colony algorithm. J Ambient Intell Hum Comput
Performance Evaluation of Sustainable Development Goals Employing Unsupervised Machine Learning Approach Indranath Chatterjee and Jayaraman Valadi
Abstract NITI Aayog has been publishing reports on the performance of the various Indian States and union territories (UTs) under the Sustainable Development Goals (SDGs) agenda. For this purpose, various socio-economic information related to poverty, food security, health care, employment, terrestrial ecosystems, law and order, etc., has been used. NITI Aayog has applied statistical transformations such as normalization to these data points and applied a straightforward, globally accepted, and robust classification methodology. The objective of NITI Aayog's methodology is to generate an aggregated score for every State/UT based on the achievement of the respective goals, keeping the national-level target as the optimal level to adhere to. Finally, NITI Aayog has classified the States/UTs into different clusters based on the aggregated scores. In this paper, we consider the growth rates of the above-stated data points to capture the year-on-year progression of the States/UTs on each SDG and apply machine learning-based clustering algorithms to create different homogeneous clusters of States/UTs. We analyse the characteristics of each cluster and try to identify the important differentiating factors. We also compare the results of the different machine learning algorithms and assess the similarity of the solutions produced by these algorithms. Keywords Unsupervised machine learning · Clustering · Graph distance
1 Introduction In September 2015, 193 countries including India committed to the Sustainable Development Goals (SDGs): ‘A blueprint to achieve a better and more sustainable future for all people and the world by 2030’. The NITI Aayog (National Institution for Transforming India)-as the apex public policy think tank of the Government of India, I. Chatterjee (B) · J. Valadi Department of Computing and Data Science, FLAME University, Pune, India e-mail: [email protected] J. Valadi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_30
develops SDG Index at States, UTs, and districts level [1]. India SDG Index is the world’s first government-led sub-national measure of SDG progress incorporating various socio-economic information related to poverty, food security, healthcare, employment, terrestrial ecosystems, law and order, etc. [2]. It has been developed to capture the progress of all States and union territories (UTs) in their journey towards achieving the SDGs [3]. The third edition of the NITI Aayog SDG India Index (2020–2021) computes goal-wise quantitative scores on the 16 SDGs, and a qualitative assessment on Goal 17 for each state and UT, covering: SDG 1: No Poverty, SDG 2: Zero Hunger, SDG 3: Good Health and Well-Being, SDG 4: Quality Education, SDG 5: Gender Equality, SDG 6: Clean Water and Sanitation, SDG 7: Affordable and Clean Energy, SDG 8: Decent Work and Economic Growth, SDG 9: Industry, Innovation and Infrastructure, SDG 10: Reduced Inequality, SDG 11: Sustainable Cities and Communities, SDG 12: Responsible Consumption and Production, SDG 13: Climate Action, SDG 14: Life Below Water, SDG 15: Life on Land, SDG 16: Peace, Justice and Strong Institutions, and SDG 17: Global Partnerships [4]. Evaluation framework developed by NITI Aayog is simple and robust [5]. The methodology has seven broad stages, namely 1. Selection of indicators (i.e. identification of suitable indicators from the National Indicator Framework on SDGs and mapped with the targets), 2. Consultation with stakeholders (the latest data on the selected indicators was collected in collaboration with the respective ministries and MoSPI), 3. Target setting (target value for 2030 was set for each indicator), 4. Normalization of raw indicator values (normalization of indicator values to a standard scale of 0–100), 5. Computation of State/UT scores (estimated as the average of the normalized values of all indicators under the Goal, for each State/UT), 6. Computation of composite Index score (the composite score is the arithmetic mean of the goal scores for 16 goals for each States/UTs by assigning equal weight to each goal. This score is an indication of the overall position of the States/UTs in their journey towards achieving the SDGs), and 7. Categorization of States/UTs (the States/UTs were classified into the four categories—Achiever, Front Runner, Performers and Aspirants—based on their distance from target). As per the SDG INDIA INDEX 3.0—2020–2021, 15 out of the 28 States are in the Front Runner category and 13 States in the Performer category, among the UTs one falls in the Performer category and seven are in the Front Runner categories [5].
2 Motivation The Index has played a key role in driving the SDG agenda in India. It has raised awareness of the SDGs at many levels—within government, media, researchers, and civil society organizations. Index 3.0 is a useful instrument for judging the progress of the States/UTs in adopting and implementing the SDG agendas and policies across the country. The Government has been focusing immensely on the successful implementation of such policies and has proposed to spend Rs. 3,042,230
Crore in 2020–2021 [6]. The methodology that NITI Aayog has developed and implemented is straightforward, robust, and globally acknowledged. A machine learning-based analysis could provide more insight and information from the data: a set of new micro-clusters of States/UTs can be created, along with a better understanding of the unique characteristics of each micro-cluster and of how the Indian States/UTs are differentiated in terms of socio-economic parameters. We can also identify which indicators are more important in driving the progression. Another important aspect of the Index 3.0 report is that the performance of the States/UTs over the years had not been considered while developing the segmentation. In our analysis, we have considered these aspects in detail and have conducted a comparative study of different machine learning algorithms, analysing the clusters each of them forms.
3 Methodology The methodology section has been divided into four sub-sections. We will be discussing the data sources, challenges related to data and different data treatments in Sect. 3.1. Section 3.2 consists of different machine learning algorithms that have been used in this study. Details of clustering results and their profiling analysis are presented in Sect. 3.3. Section 3.4 contains the comparison of different clustering solutions produced by different machine learning algorithms.
3.1 Data Three years of raw data sources related to SDG performance have been used [7]. In total, 62, 100, and 115 indicators have been taken from the 'Baseline report—2018', the 'V2.0 report—2019–2020' [8], and the 'V3.0 report—2020–2021', respectively. Firstly, the data related to the 'Baseline report—2018' has been ignored in this analysis due to its very limited number of indicators. Secondly, any indicators whose raw data sources pre-date 2018 have been removed from this study; hence, we have considered only indicators based on recent raw data sources. Thirdly, 72 indicators common to the years 2019–2020 and 2020–2021 have been identified. Finally, we have removed indicators if there was a change in the raw data source between 2019–2020 and 2020–2021. Three different types of data treatment have been performed to prepare the final data set. Firstly, we have removed indicators where the value has changed drastically or the raw data source has changed between the two consecutive years stated earlier; e.g. an extreme change in value has been noted for indicator SDG 1.3, 'Proportion of the population (out of total eligible population) receiving social protection benefit under Maternity Benefit', which is hence removed from the analysis. Similarly, indicator
SDG 4.1 has been removed because its raw data source was 'Adjusted Net Enrolment Ratio (ANER) in elementary education (class 1–8)' in 2019–2020, whereas it changed to 'Adjusted Net Enrolment Ratio in Elementary (Class 1–8) and Secondary (Class 9–10) education' in 2020–2021. Secondly, we have removed any variables whose values are completely identical between the two successive years, e.g. SDG 4.3, 'Gross Enrolment Ratio in Higher education (18–23 years)'. Thirdly, data-level adjustments have been performed where the reported data level changed; e.g. for SDG 8.10, 'Proportion of women account holders under PMJDY', the data was reported on a 100-point scale in 2019–2020 but on a 10-point scale in 2020–2021, and SDG 12.a was 'Installed capacity of grid interactive bio power per 10 lakh population (MW)' in 2019–2020 but was changed to 'Installed Capacity of Grid Interactive Bio Power per 100,000 population' in 2020–2021. Finally, the Union Territory of Ladakh has been removed from this analysis due to the non-availability of raw data for the year 2019–2020. The year-on-year growth rate was calculated for 36 States/UTs and 45 indicators, and this data set has been used as the input for the machine learning models.
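A hedged pandas sketch of the growth-rate computation described above; the file names, column layout, and index name are assumptions, not the authors' actual data files:

import pandas as pd

# Hypothetical wide-format tables: rows = States/UTs, columns = the common indicators.
v2 = pd.read_csv("sdg_index_2019_20.csv", index_col="state_ut")   # assumed file names
v3 = pd.read_csv("sdg_index_2020_21.csv", index_col="state_ut")

common = v2.columns.intersection(v3.columns)

# Year-on-year growth rate per indicator; this 36 x 45 matrix feeds the clustering step.
growth = (v3[common] - v2[common]) / v2[common]
growth = growth.drop(index="Ladakh", errors="ignore")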
3.2 Algorithms Principal Component Analysis (PCA) has been used to reduce the dimensionality of the data set [9]; the key findings of the PCA are explained in subsequent sections. For clustering the States/UTs, we have used multiple unsupervised machine learning algorithms: K-Means [10], density-based spatial clustering of applications with noise (DBSCAN) [11], and Affinity Propagation (AF) [12] have been selected for their distinct theoretical perspectives and methodologies. The distinctions between these algorithms in terms of parameters, scalability, use cases, and distance metrics are summarized in Table 1.
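A minimal scikit-learn sketch of the pipeline described in this section and used in Sect. 3.3 (two retained PCs, seven K-Means clusters); the DBSCAN neighbourhood parameters are assumptions, since the paper does not report them:

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, DBSCAN, AffinityPropagation

X = StandardScaler().fit_transform(growth.values)   # 36 States/UTs x 45 growth rates

# Two principal components were retained after profiling (see Sect. 3.3).
pcs = PCA(n_components=2).fit_transform(X)

labels = {
    "kmeans": KMeans(n_clusters=7, random_state=0).fit_predict(pcs),
    "dbscan": DBSCAN(eps=0.5, min_samples=3).fit_predict(pcs),   # eps/min_samples assumed
    "affinity": AffinityPropagation(random_state=0).fit_predict(pcs),
}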
3.3 Results from Clustering Analysis
Findings from Principal Component Analysis (PCA) PCA has been performed on the 45 indicators, and a scree plot has been used to identify the optimal number of Principal Components (PCs). As per the analysis, 7 PCs would be the optimal selection for the dataset (refer to Fig. 1). However, a maximum of 2 PCs has been retained after several iterations and extensive profiling of the clusters, because using more than 2 PCs forced all clustering algorithms to merge several clusters together.
Table 1 Summary of algorithms employed
1. K-Means. Parameters: number of clusters. Scalability: very large n samples, medium n clusters (with Mini-Batch code). Use case: general purpose, even cluster size, flat geometry, not too many clusters, inductive. Geometry (metric used): distances between points.
2. Affinity Propagation. Parameters: damping, sample preference. Scalability: not scalable with n samples. Use case: many clusters, uneven cluster size, non-flat geometry, inductive. Geometry (metric used): graph distance (e.g. nearest-neighbour graph).
3. DBSCAN. Parameters: neighbourhood size. Scalability: very large n samples, medium n clusters. Use case: non-flat geometry, uneven cluster sizes, outlier removal, transductive. Geometry (metric used): distances between nearest points.
Fig. 1 Selection of optimal number of principal components
Composition of Clusters Under Different Algorithms K-Means, DBSCAN, and AF have produced three different solutions, with optimal numbers of clusters of 7, 6, and 8, respectively. The number of natural clusters in each solution, their size distributions, and the compositions of States/UTs are presented in Tables 2 and 3.
Table 2 No. of States/UTs under the three different clustering solutions (K-Means / DBSCAN / AF)
Cluster 1: 12 / 7 / 5
Cluster 2: 1 / 4 / 1
Cluster 3: 1 / 3 / 1
Cluster 4: 4 / 3 / 2
Cluster 5: 6 / 10 / 1
Cluster 6: 9 / 9 / 8
Cluster 7: 3 / NA / 9
Cluster 8: NA / NA / 9
Grand total: 36 / 36 / 36
Profiling of Clusters All clustering solutions have been developed using the first two PCs (derived from the growth rates of the 45 indicators over two successive years), but for in-depth profiling of the clusters we have used the actual values of the indicator variables from the year 2020–2021. Out of the 45 indicators, certain indicators have higher significance in defining and explaining the variation among the clusters; the indicators with higher standard deviation have been selected for the profiling analysis. A total of 9 indicators (specifically from 9 different sub-goals) covering 8 different goals (i.e. Good Health and Well-Being; Gender Equality; Industry, Innovation, and Infrastructure; Sustainable Cities and Communities; Responsible Consumption and Production; Climate Action; Life on Land; and Peace, Justice, and Strong Institutions) turned out to be the most important indicators across all three solutions. This implies that all States/UTs are performing competitively in the goals other than these 8, where deviation in their performances was evident. Three indicator variables, 'Total case notification rate of Tuberculosis per 1,00,000 Populations', 'Installed sewage treatment capacity as a Per. of sewage generated in urban areas' and 'Per. use of nitrogenous fertilizer out of total N, P, K (Nitrogen, Phosphorous, Potassium)', were the common important variables for all three clustering algorithms. 'CO2 saved from LED bulbs per 1000 Populations (Tonnes)' and 'Forest cover as a Per. of total geographical area' were important indicators for the DBSCAN and AF solutions, whereas 'Per. of renewable energy out of total installed generating capacity (including allocated shares)' was a uniquely important indicator for the AF solution. The means of the indicators across all clusters are shown in Fig. 2. It is also evident for all three solutions that, even though there are variations in the centroids, the clusters are closely associated, implying that the States/UTs are performing very competitively to achieve the targeted SDGs.
Table 3 Composition of States/UTs
Cluster 1. K-Means: Andhra Pradesh, Gujarat, Haryana, Madhya Pradesh, Punjab, Rajasthan, Uttarakhand, Andaman and Nicobar Islands, Dadra and Nagar Haveli, Jammu and Kashmir, Lakshadweep, and Puducherry. DBSCAN: Gujarat, Goa, Himachal Pradesh, Karnataka, Manipur, Uttar Pradesh, Delhi. AF: Assam, Bihar, Jharkhand, Mizoram, West Bengal.
Cluster 2. K-Means: Nagaland. DBSCAN: Arunachal Pradesh, Chhattisgarh, Kerala, Odisha. AF: Maharashtra.
Cluster 3. K-Means: Tripura. DBSCAN: Andhra Pradesh, Andaman and Nicobar Islands, Puducherry. AF: Nagaland.
Cluster 4. K-Means: Assam, Bihar, Jharkhand, Mizoram. DBSCAN: Telangana, Chandigarh, Daman and Diu. AF: Sikkim, Tamil Nadu.
Cluster 5. K-Means: Arunachal Pradesh, Chhattisgarh, Kerala, Maharashtra, Meghalaya, Odisha. DBSCAN: Nagaland, Tripura, Assam, Bihar, Jharkhand, Mizoram, Maharashtra, Sikkim, Tamil Nadu, West Bengal. AF: Tripura.
Cluster 6. K-Means: Goa, Himachal Pradesh, Karnataka, Manipur, Telangana, Uttar Pradesh, Chandigarh, Daman and Diu, Delhi. DBSCAN: Haryana, Madhya Pradesh, Punjab, Rajasthan, Uttarakhand, Dadra and Nagar Haveli, Jammu and Kashmir, Lakshadweep, Meghalaya. AF: Gujarat, Haryana, Madhya Pradesh, Rajasthan, Uttarakhand, Dadra and Nagar Haveli, Jammu and Kashmir, Lakshadweep.
Cluster 7. K-Means: Sikkim, Tamil Nadu, West Bengal. DBSCAN: NA. AF: Goa, Himachal Pradesh, Karnataka, Manipur, Uttar Pradesh, Delhi, Telangana, Chandigarh, Daman and Diu.
Cluster 8. K-Means: NA. DBSCAN: NA. AF: Arunachal Pradesh, Chhattisgarh, Kerala, Odisha, Andhra Pradesh, Andaman and Nicobar Islands, Puducherry, Punjab, Meghalaya.
Fig. 2 Profiling of clusters using best six variables (one panel each for the K-Means, DBSCAN, and AF solutions, plotting cluster-wise mean values of indicators such as total case notification rate of Tuberculosis per 1,00,000 population, installed sewage treatment capacity as a percentage of sewage generated in urban areas, percentage use of nitrogenous fertilizer out of total N, P, K, CO2 saved from LED bulbs per 1000 population, forest cover as a percentage of total geographical area, ratio of female to male labour force participation rate, mobile tele-density, and cognizable crimes against children per 1,00,000 population)
3.4 Comparison of Clustering Solutions
Distribution of Cluster Centroids for K-Means, DBSCAN, and AF The distribution of cluster centroids with respect to PC1 and PC2 is represented in Fig. 3. For all the algorithms, most of the clusters are closely associated in terms of centroid values, except one cluster, resulting in low heterogeneity, which confirms our earlier conclusion. It has also been noticed that the clusters under K-Means and AF are almost similar in terms of the distribution of centroid values (ranging between − 4.95 and 9.17), while the DBSCAN clusters are noticeably dissimilar (ranging between − 1.03 and 1.52).
Fig. 3 Distribution of cluster centroids (panels for K-Means, DBSCAN, and AF; centroid values range roughly between − 4.95 and 9.17 for K-Means and AF and between − 1.03 and 1.52 for DBSCAN)
The absolute range of the cluster centroids is significantly higher for K-Means and AF (~ 14.12, with standard deviations of ~ 3.35) compared with the DBSCAN solution (absolute range of ~ 2.55 and standard deviation of 0.71), which makes K-Means and AF more suitable algorithms for this dataset. Creation of Cluster Groups for Profile Comparison In the previous section, we observed that the three clustering solutions advocate different numbers of optimal clusters and that the cluster compositions are also not the same. Hence, we have formed seven different groups (of clusters) to prepare a common platform for algorithm comparison. Scaled centroid values and the distances among them have been used to generate these groups. Table 4 gives the details of the seven newly formed groups and their components. All groups consist of two or more formerly defined clusters. For example, Group 1 has been formed using K-Means cluster 2 (State: Nagaland) and AF cluster 3 (State: Nagaland), whereas Group 5 consists of K-Means cluster 5 (Arunachal Pradesh, Chhattisgarh, Kerala, Maharashtra, Meghalaya, Odisha), DBSCAN cluster 2 (Arunachal Pradesh, Chhattisgarh, Kerala, Odisha), AF cluster 8 (Arunachal Pradesh, Chhattisgarh, Kerala, Odisha, Andhra Pradesh, Andaman and Nicobar Islands, Puducherry, Punjab, Meghalaya), and AF cluster 2 (Maharashtra). The comparison analysis has been conducted using the same 45 base indicator variables from 2020–2021, with the same variable selection criterion (i.e. standard deviation) as before, but considering more variables. All the comparison results are shown using 100% stacked diagram plots, depicted in Fig. 4. Comparison of the Group Profiles The distribution of centroid values across selected variables for Group 4 is shown in the first stacked diagram of Fig. 4. Noticeably, the cluster centroids are closely associated for all the solutions, but K-Means and AF have generated almost identical results, and the DBSCAN result is dissimilar for certain base variables. For 'Installed sewage treatment capacity as a Per. of sewage generated in urban areas', the average values for K-Means and AF are 7.1 and 7.8, respectively, whereas for DBSCAN it is 23.2. The average value for the indicator 'Per. use of nitrogenous fertilizer out of total N, P, K (Nitrogen, Phosphorous, Potassium)' is 41.9 for DBSCAN but 67.3 and 63.9 for K-Means and AF, respectively. For Group 6, K-Means and AF have produced identical results. Similar conclusions have been drawn for Group 7: the DBSCAN clusters are comparable with K-Means and AF for certain indicators, but noticeably different patterns have been recorded for others. No clear conclusion can be drawn for Group 5; for certain indicators the K-Means and AF solutions are similar, while for others the AF and DBSCAN solutions are closer. Overall, we can conclude that for most of the groups K-Means and AF have produced almost identical results, and in certain cases the DBSCAN solution is in the vicinity.
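As a complementary check not used in the paper, label-agreement scores such as the adjusted Rand index could quantify how similar the three partitions are; a short sketch, reusing the labels dictionary from the Sect. 3.2 sketch:

from itertools import combinations
from sklearn.metrics import adjusted_rand_score

# Pairwise agreement between the three partitions of the 36 States/UTs;
# values near 1 indicate nearly identical groupings.
for a, b in combinations(labels, 2):
    print(a, "vs", b, round(adjusted_rand_score(labels[a], labels[b]), 3))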
Table 4 Rearrangement of States/UTs into groups (algorithm, cluster number, number of States/UTs in parentheses, and their names)
Group 1. K-Means cluster 2 (1): Nagaland. AF cluster 3 (1): Nagaland.
Group 2. K-Means cluster 3 (1): Tripura. AF cluster 5 (1): Tripura.
Group 3. K-Means cluster 7 (3): Sikkim, Tamil Nadu, West Bengal. AF cluster 4 (2): Sikkim, Tamil Nadu.
Group 4. K-Means cluster 4 (4): Assam, Bihar, Jharkhand, Mizoram. AF cluster 1 (5): Assam, Bihar, Jharkhand, Mizoram, West Bengal. DBSCAN cluster 5 (10): Nagaland, Tripura, Assam, Bihar, Jharkhand, Mizoram, Maharashtra, Sikkim, Tamil Nadu, West Bengal.
Group 5. K-Means cluster 5 (6): Arunachal Pradesh, Chhattisgarh, Kerala, Maharashtra, Meghalaya, Odisha. AF cluster 8 (9): Arunachal Pradesh, Chhattisgarh, Kerala, Odisha, Andhra Pradesh, Andaman and Nicobar Islands, Puducherry, Punjab, Meghalaya. DBSCAN cluster 2 (4): Arunachal Pradesh, Chhattisgarh, Kerala, Odisha. AF cluster 2 (1): Maharashtra.
Group 6. K-Means cluster 6 (9): Goa, Himachal Pradesh, Karnataka, Manipur, Telangana, Uttar Pradesh, Chandigarh, Daman and Diu, Delhi. AF cluster 7 (9): Goa, Himachal Pradesh, Karnataka, Manipur, Uttar Pradesh, Delhi, Telangana, Chandigarh, Daman and Diu. DBSCAN cluster 1 (7): Gujarat, Goa, Himachal Pradesh, Karnataka, Manipur, Uttar Pradesh, Delhi. DBSCAN cluster 4 (3): Telangana, Chandigarh, Daman and Diu.
Group 7. K-Means cluster 1 (12): Andhra Pradesh, Gujarat, Haryana, Madhya Pradesh, Punjab, Rajasthan, Uttarakhand, Andaman and Nicobar Islands, Dadra and Nagar Haveli, Jammu and Kashmir, Lakshadweep, Puducherry. AF cluster 6 (8): Gujarat, Haryana, Madhya Pradesh, Rajasthan, Uttarakhand, Dadra and Nagar Haveli, Jammu and Kashmir, Lakshadweep. DBSCAN cluster 6 (9): Haryana, Madhya Pradesh, Punjab, Rajasthan, Uttarakhand, Dadra and Nagar Haveli, Jammu and Kashmir, Lakshadweep, Meghalaya. DBSCAN cluster 3 (3): Andhra Pradesh, Andaman and Nicobar Islands, Puducherry.
Fig. 4 Comparison of group profiles (100% stacked diagrams for Groups 4–7, comparing the K-Means, DBSCAN, and AF cluster means across indicators such as total case notification rate of Tuberculosis, installed sewage treatment capacity, nitrogenous fertilizer use, CO2 saved from LED bulbs, forest cover, mobile tele-density, ratio of female to male labour force participation rate, number of courts per 1,00,000 population, percentage of women account holders in PMJDY, and percentage of schools with separate toilet facility for girls)
For all the clusters, sub-goals from 'Good Health and Well-Being', 'Gender Equality', 'Industry, Innovation, and Infrastructure', 'Sustainable Cities and Communities', 'Responsible Consumption and Production', 'Climate Action', 'Life on Land' and 'Peace, Justice and Strong Institutions' have turned out to be the important differentiators with higher standard deviation. The largest number of indicators has been selected from 'Peace, Justice and Strong Institutions', followed by 'Life on Land', 'Climate Action' and 'Industry, Innovation, and Infrastructure'. Interestingly, sub-goals from 'Decent Work and Economic Growth' and 'Clean Water and Sanitation' have been identified as uniquely important indicators for Group 6 and Group 7, respectively, along with the above-stated goals. And for both the
cases, indicators are associated with women empowerment, ‘Percentage of women account holders in PMJDY’ and ‘Percentage of schools with separate toilet facilities for girls’.
4 Conclusion This section outlines the results derived from the analysis of the input data and discusses the outcomes. Although there are deviations in year-on-year performances, the States/UTs are performing competitively across all SDGs to accomplish the national-level targets. Higher variation in the States/UTs' performances has been recorded in the goals Good Health and Well-Being; Gender Equality; Industry, Innovation, and Infrastructure; Sustainable Cities and Communities; Responsible Consumption and Production; Climate Action; Life on Land; and Peace, Justice, and Strong Institutions. Three variables, 'Total case notification rate of Tuberculosis per 100,000 population', 'Installed sewage treatment capacity as a Per. of sewage generated in urban areas' and 'Per. use of nitrogenous fertilizer out of total N, P, K (Nitrogen, Phosphorous, Potassium)', turn out to be the most important common indicators across all three clustering solutions. The distribution of cluster centroids shows that the K-Means and AF solutions are more analogous to each other for this dataset than to the DBSCAN solution, and similar conclusions follow from the profile comparison analysis. In conclusion, we can say that the K-Means and AF solutions have generated the most similar results, while the DBSCAN clustering results are comparable up to a certain limit. Acknowledgements We would like to express our special thanks and gratitude to NITI Aayog for their extensive research, planning, development of methodology, and successful execution and implementation of the SDG framework for the Indian economy. Their work not only inspires several agencies and researchers but has also educated and made aware millions of people of the contribution and the progress of India on the SDG agendas. We would like to thank all the executives of NITI Aayog and the Ministry of Planning for publishing the reports and sharing the insights and data sets, without which this study would not have been possible. Financial Disclosure This work is not supported by any agency. Conflict of Interest None.
References 1. National portal of India homepage, https://www.niti.gov.in/. Accessed on 15 Aug 2022 2. Open government data (OGD) platform India homepage, https://data.gov.in/. Accessed on 15 Aug 2022 3. Baseline report (2018) Government of India, New Delhi. NITI Aayog homepage, https:// www.niti.gov.in/sites/default/files/2020-07/SDX_Index_India_Baseline_Report_21-12-2018. pdf. Accessed on 15 Aug 2022 4. Ministry of statistics and programme implementation (MOSPI) (2019) Sustainable development goals national indicator framework baseline report 2015–16, Government of India, New Delhi. MOSPI homepage, https://mospi.gov.in/documents/213904/0/SDG+National+Indica tor+Framework+Baseline+Report%2C+2015-16.pdf/290cae2d-900b-a3ee-5091-ea7969d5c de5?t=1594032410420. Accessed on 15 Aug 2022 5. NITI Aayog and United Nations (2020–2021) SDG India index and dashboard 2020–2021. Government of India, New Delhi. NITI Aayog homepage, https://www.niti.gov.in/writeread data/files/SDG_3.0_Final_04.03.2021_Web_Spreads.pdf. Accessed on 15 Aug 2022 6. iced.cag.gov.in homepage, https://www.intosaicommunity.net/document/articlelibrary/Bud get_of_India_2020-2021_and_Sustainable_Development_Goals.pdf. Accessed on 12 Sep 2022 7. NITI Aayog homepage, https://sdgindiaindex.niti.gov.in/#/ranking. Accessed on 15 Aug 2022 8. NITI Aayog and United Nations (2020) SDG India index and dashboard 2019–2020. NITI Aayog homepage, https://www.niti.gov.in/sites/default/files/2020-07/SDG-India-Index2.0.pdf. Accessed on 15 Aug 2022 9. Jolliffe IT (2002) Principal component analysis, 2nd edn. Springer, New York 10. Wu J (2012) Advances in K-means clustering: a data mining thinking, 1st edn. Springer Berlin, Heidelberg 11. Omer AF, Mohammed HA, Awadallah MA Khan Z, Abrar SU, Shah MD (2022) Big data mining using K-Means and DBSCAN clustering techniques. In: Big data analytics and computational intelligence for cybersecurity, 1st edn. Springer, Cham 12. Li W (2012) Clustering with uncertainties: an affinity propagation-based approach. In: Lecture notes in computer science, 1st edn. Springer Berlin, Heidelberg
Performance Analysis of Logical Structures Using Ternary Quantum Dot Cellular Automata (TQCA)-Based Nanotechnology Suparba Tapna, Kisalaya Chakrabarti, and Debarka Mukhopadhyay
Abstract Ternary Quantum-Dot Cellular Automata (TQCA) is an emerging nanotechnology that promises lower power consumption and smaller size, with faster speed compared with transistor technology. In this article, we propose a novel architecture of a level-sensitive scan design (LSSD) in TQCA. Such circuits are helpful for the design of numerous logical and functional circuits. Simulation results of the proposed TQCA circuits are obtained using the QCADesigner tool. To realize a particular specification, the parameter values need to be found using the Schrödinger equation; here, we have optimized the different parameters in the Schrödinger equation. Keywords TQCA · LSSD · Quantum phenomenon for combinational as well as sequential logic · J-K flip-flop · Schrodinger equation · Energy · Power
1 Background Complementary metal–oxide–semiconductor (CMOS) technology is in widespread use in present-day semiconductor fabrication. A drawback of CMOS is that the power consumed increases as the speed goes up, yet some applications need less power together with more speed. Technologies such as carbon nanotube field-effect transistors and quantum dot cellular automata (QCA) can give more speed along with less power consumption and highly parallel processing [1, 2]. The new, emerging processing platforms, alternatives to CMOS, need not explicitly be limited to just two states. One such possible future processing platform is the quantum-dot cellular automaton (QCA). The idea was presented in the mid-1990s by Lent et al. [3] and S. Tapna (B) Durgapur Institute of Advanced Technology and Management, Durgapur, India e-mail: [email protected] K. Chakrabarti Haldia Institute of Technology, Haldia, India D. Mukhopadhyay Christ (Deemed to be University), Bengaluru, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_31
demonstrated in a laboratory environment soon afterwards by Bernstein et al. [4]. What followed was an exciting period with the development of a practically complete set of logic functions, as well as more complex processing structures, but all in the domain of binary logic. The first advancement of QCAs to native ternary processing was performed by Lebar Bajec et al. [5–7]. The authors redesigned the basic unit, the binary QCA (bQCA) cell, to allow the representation of three logic values and named it simply the ternary QCA (tQCA) cell. The subsequent research performed by Pečar et al. [8, 9] shows that the introduction of adiabatic pipelining is essential for an elegant implementation of the basic ternary logic gates. The similarity of the structure of the tQCA logic gates proposed by Pečar et al. to the structure of the corresponding bQCA logic gates opens up the possibility of using design rules like those developed for the binary domain. The initial results are encouraging, but the design of complex processing elements is currently at its first steps. Indeed, although the approach was productive for the design of basic logic gates, one cannot simply replicate (or translate) the designs proposed for the bQCA platform. The designs proposed for the ternary CMOS platform cannot be relied upon either: these usually use primitives, such as the TXOR gate, for which there are no current tQCA equivalents, or do not rely on binary logic but represent ad hoc solutions exploiting physical effects. The logic of less efficient implication turned out to be successful for the design of the bQCA memorizing cell [10, 11]. The control logic is, on the other hand, intended to promote an efficient implementation of an n-trit register that relies on an array of n ternary memorizing cells. Here, we present the design of one of the most essential ternary processing elements, which can store one trit (ternary digit) of information, the ternary memorizing cell. It relies on proven approaches from bQCA design and efficient use of the currently available tQCA primitives (ternary inverter, ternary majority voting gate, ternary wire). Its core is centred on the memory-shifting concept, realized here as a level-sensitive scan design (LSSD), shown in Fig. 1, comprising a multiplexer (MUX) for choosing either normal operation with the data input or scan operation with the scan input [12]. It has a control input for selecting either the data or the scan input. It is larger than a typical flip-flop (as a MUX is incorporated here), adding about 20–30% of area per flip-flop. The proposed architecture forms part of an integrated-circuit manufacturing test procedure. LSSD is a design-for-test (DFT) scan technique that uses separate system and scan clocks to distinguish normal and test modes. Latches are used in pairs; each has a normal data input, data output, and clock for system operation. For test operation, the two latches form a master/slave pair with one scan input and one scan output, and the non-overlapping scan clocks, which are held low during system operation, cause the output data to be latched when pulsed high during scan. This paper deals with the TQCA implementation of some of these combinational and sequential circuits.
Fig. 1 Basic structure of standard level-sensitive scan design element [3]
Problem Domain The literature survey above covers work that explains individual logical structures, but previous studies generally address only specific circuits in isolation. In this research work, both combinational and sequential architectures are proposed together with the LSSD. The problem domain is therefore motivated by the literature survey: the proposed work aims to overcome this shortcoming of the existing studies.
Novelty The novelty of this work lies in the comparison between the level-sensitive scan design (LSSD) and a scan flip-flop implemented in CMOS with respect to area improvement, and in the analysis of the performance of an LSSD implemented in TQCA.
1.1 QCA Basics C. S. Lent proposed the first standard of quantum-dot cellular automata [1]. To implement a system that encodes data as electron position, it is necessary to construct a container in which an electron can be trapped and "counted" as absent or present. As Fig. 2 shows, a quantum dot does exactly this by setting up a region of low potential surrounded by a high-potential ring. In its conventional form, the technology relies on the interaction of bistable QCA cells. Each cell consists of four quantum dots between which electrons can tunnel [13] and is charged with two free electrons. These electrons tend to occupy antipodal sites because of their mutual electrostatic repulsion. In a QCA cell, there are two equivalent arrangements that are energetically indistinguishable, as shown in Fig. 2. These two arrangements are denoted as the cell polarizations.
Fig. 2 QCA cells
Fig. 3 QCA wires
In the QCA charge configuration, binary data is encoded by using one cell polarization to represent logic "1" and the other to represent logic "0".
1.2 QCA Wires Another significant component in QCA design is the wire. In a QCA wire, the binary signal propagates from input to output because of the electrostatic interactions among cells. Since each cell's polarization tends to align with that of its neighbors, a straight arrangement of standard cells is used to transmit binary data from one point to another [14, 15]. In such a wire, every free cell aligns in the same way as the input (driver) cell; thus the data contained in the input is transmitted down the wire. In addition, the separation between cells and between dots is an important parameter governing the Coulombic interaction in an ordinary QCA structure. A QCA wire is shown in Fig. 3. In a QCA wire, the computational power is provided by the Coulomb interaction among cells, without electric current flowing between cells and consequently without the associated power dissipation. A different type of wire, used to transport information from one location to another like the binary wire, is the inversion chain. In an inversion chain, every QCA cell is rotated by 45° with respect to the traditional QCA cell. Each cell in the chain inverts the signal of its neighbor owing to the Coulombic interaction of the electrons. Depending on the number of cells between input and output, the output may therefore be either a copy or an inverted version of the input [3], as shown in Fig. 4. For wire crossover, a total of three main layers is used in the multilayer design. The multilayer crossover appears conceptually straightforward; however, there are questions concerning its realization, since it requires two overlapping active layers.
Fig. 4 Multilayer crossover
1.3 QCA Clock Zones To operate correctly, QCA circuits require a clock. QCA clocks modulate the tunneling barriers between the quantum dots so that the electrons within a cell are encouraged to make a prescribed change in their configuration [16]. There are two kinds of switching of the tunneling barriers: adiabatic and abrupt. When the tunneling barriers between the dots are raised, the cell is eventually locked into a fixed polarization; when the barriers are gradually lowered, the electrons are again allowed to tunnel between the quantum dots. Adiabatic switching is preferable to abrupt switching because it ensures that the circuit remains in a stable ground state at every moment of operation and does not settle in an excited state [17, 18]. Adiabatic switching of the clocks is used in this study. There are four main QCA clock zones, namely clock 0, clock 1, clock 2, and clock 3, as shown in Fig. 5. Each clock signal contains four sections: rising edge, low level, falling edge, and high level. These phases are called release, hold, relax, and switch, respectively. Figure 5 illustrates the four phases of the clock. The fundamental purpose of the QCA clocks is to guarantee the proper transfer of information from one place to another in the circuit [19]. During the hold phase, the tunneling barrier of the quantum dots is high; hence, the electrons are strongly confined and the polarization is fixed. During the release phase, the tunneling barrier is lowered and the electrons gradually become free to move. In the relax phase, the electrons are free to tunnel among the quantum dots and the cell has no effect on neighboring cells [20, 21]. In the switch phase, the tunneling barrier is gradually raised, forcing the electrons into the most stable state [22, 23].
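To make the clocking scheme more tangible, the following illustrative MATLAB sketch (not taken from the paper) draws the four clock zones as the same trapezoidal signal shifted by a quarter of the clock period, so that each zone passes through its four phases in sequence, as in Fig. 5. The trapezoid shape and the vertical offsets are only for visualization.

% Illustrative sketch: four QCA clock zones as quarter-period-shifted trapezoids
t  = linspace(0, 2, 801);                 % two normalized clock periods
ph = @(t) (mod(t,1) < 0.25).*(4*mod(t,1)) ...
   + (mod(t,1) >= 0.25 & mod(t,1) < 0.50) ...
   + (mod(t,1) >= 0.50 & mod(t,1) < 0.75).*(3 - 4*mod(t,1));
figure; hold on;
for zone = 0:3
    plot(t, ph(t - zone/4) + 1.5*zone);   % vertical offset so the four zones are visible
end
xlabel('time (clock periods)'); ylabel('clock 0 ... clock 3 (offset)');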
1.4 Majority Gate and Inverters QCA circuits can be efficiently built using only majority gates and inverters. A majority gate has three inputs, A, B, and C, and output M(A,
Fig. 5 Clocking scheme Fig. 6 Majority gate
Fig. 7 QCA layout of inverter
B, C), as shown in Fig. 6. The majority gate realizes the following function: Maj(A, B, C) = AB + BC + CA. AND and OR gates can easily be constructed from the majority gate by fixing one of its inputs to "0" or "1", respectively [24]. The QCA layout of the inverter is given in Fig. 7.
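As a quick illustration (not part of the original paper), the following MATLAB check confirms that fixing one input of the majority function to 0 or 1 yields AND or OR, as stated above:

% Majority function and its reduction to AND/OR
maj = @(A,B,C) (A & B) | (B & C) | (C & A);          % Maj(A,B,C) = AB + BC + CA
[A, B] = ndgrid([0 1], [0 1]);                       % all four combinations of A and B
isequal(maj(A, B, zeros(size(A))), logical(A & B))   % true: Maj(A,B,0) = A AND B
isequal(maj(A, B, ones(size(A))),  logical(A | B))   % true: Maj(A,B,1) = A OR B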
1.5 QCA with Ternary Quantum Phenomenon In the preceding sections, we presented a brief overview of the binary quantum-dot cell. When it is considered as an alternative processing platform of the future, its basic limitation is precisely its bistability. Given that there is no technological constraint preventing a cell from being built with more than four quantum dots, our research has concentrated on the study of multi-
Table 1 Ternary quantum phenomenon with binary quantum dot cells
Cell A | Cell B | O/P
0 | 0 | 1
0 | 1 | 0
1 | 0 | 0
1 | 1 | 0
state cells. The quantum-dot cell considered here contains eight quantum dots occupied by two electrons, as introduced in [25], and is regarded as the ternary quantum-dot cell. Its dimension is taken to be a√2, where dots 1–4 correspond exactly to those of the binary quantum-dot cell, and the neighboring separation between cell centers in a QCA structure made of ternary quantum-dot cells is maintained as shown in Fig. 8. The structure of the cell and the possible arrangements of the two electrons are also given in Fig. 8; the four arrangements with the largest spatial separation are marked as states "A", "B", "C", and "D", while the remaining arrangements correspond to energetically unfavored positions and are marked as "X" states. The "A" and "B" states (the two diagonal arrangements) are, as in the binary quantum-dot cell, interpreted as binary logic "0" and "1". The two additional states, "C" and "D", are both defined as the logic value ½; the distinction between them in driver and target cells becomes clear in the discussion that follows. It is important to note that, from the point of view of operation, the characteristics of the ternary quantum-dot cell are not modified in any way: a QCA structure can consist of ternary cells just as it can of binary cells. For reasons of clarity, such a structure is referred to as a ternary QCA structure (Fig. 7 and Table 1).
1.6 Ternary Logic Utilizing Ternary QCA Compositions Our investigation has been driven by the assumption that the QCA structures used to implement the binary logic functions and the wire could, when built with ternary quantum-dot cells, implement multi-valued logic functions and wires, while avoiding the "thermal death" (unpredictable output or long settling times) that occurs in large arrays of unclocked coupled quantum-dot cells. Accordingly, we constructed them using ternary quantum-dot cells and determined the corresponding ground state by means of an exhaustive study of the ground state of the individual quantum-dot cells for all investigated input states. Figure 9 depicts the behavior of a wire of ternary quantum-dot cells.
Fig. 8 QCA cell with eight quantum dots and the corresponding tunneling paths (a) and the possible arrangements of two occupying electrons (b) [26]
As mentioned, there are now four possible data states in the structure. If the driver cell is in state "A" or "B", it can be shown that all of the internal cells assume the same state, as does the target cell, which corresponds exactly to the behavior observed in the binary wire (see Figs. 8 and 9). When the driver cell is in state "C" or "D", the recorded ground state shows the internal and target cells assuming alternating states. This reasonably implied that either the wires must be confined to odd lengths, or the two states must be interpreted as the same logic value. For reasons of clarity, the latter approach was taken, which gives the wire the capacity to transmit three logic values, "0", "1", and ½, and thus to behave as a ternary wire. Having settled on three logic values, the next logical step was to check whether the QCA structures used to implement the binary logic functions NOT, OR, and AND, if built with ternary quantum-dot cells, implement their ternary counterparts. The examination has been based on Lukasiewicz's ternary truth tables [27].
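For reference, the usual three-valued (Lukasiewicz/Kleene) definitions over the logic values {0, ½, 1} are NOT x = 1 − x, x AND y = min(x, y), and x OR y = max(x, y). The short MATLAB sketch below tabulates them; this is the standard convention, shown only for orientation, while the paper's own tables are taken from [27].

% Three-valued truth tables over {0, 1/2, 1} (standard Lukasiewicz/Kleene convention)
v = [0 0.5 1];
[X, Y]   = ndgrid(v, v);
NOTtable = 1 - v          % ternary NOT of 0, 1/2, 1
ANDtable = min(X, Y)      % rows: x = 0, 1/2, 1; columns: y = 0, 1/2, 1
ORtable  = max(X, Y)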
Fig. 9 Behavior of a wire of ternary quantum-dot cells [27] Fig. 10 Behavior of the QCA inverter when constructed by using ternary quantum dot cells
As is evident from Fig. 10, the ternary NOT logic function is performed by the QCA inverter constructed using ternary quantum-dot cells. If the driver cell's state is "A" (logic value 0), the target cell assumes state "B" (logic value 1), and vice versa, without any error. Likewise, if the driver cell's state is "C" or "D" (logic value ½), the target cell also assumes state "C" or "D" (logic value ½), so the logic values of both cells are consistent. Computing the ground states of the majority gate structure (Fig. 6) for all possible input states, however, showed that the structure does not behave as expected. This invalidated our basic assumption of a straightforward transition to ternary logic. Figure 11 shows the different states reached through ternary logic with two adjacent quantum cells in the majority gate of the proposed design.
Fig. 11 Behavior of the majority gate when constructed with ternary quantum dot cells
2 Proposed Plans with Mechanism Flip-flops and latches are essential building blocks of sequential digital circuits. Because of large internal intra-die process variations, it is important to guarantee correct logical and temporal functionality, so both functional and timing checks have to be performed on the manufactured structure [28]. Figure 12 shows the block diagram of a level-sensitive scan design. LSSD ("level-sensitive scan design") is an integrated-circuit production testing procedure. It is a DFT (design-for-test) technique that employs separate scan clocks and architecture to distinguish normal mode from test mode. Two latches are used, each having a normal data input, data output, and clock for system operation. For testing, the two latches form a master/slave pair with one scan input and one scan output, together with non-overlapping scan clocks "A" and "B" that are held low during system operation but cause the scan data to be latched when pulsed high. The cell includes a multiplexer and a "D" (or J-K) flip-flop. "A" and "B" are also the multiplexer's inputs, while "S" acts as the select signal. During normal operation of the scan design, the "S" signal is low; "SE" is high in scan mode, and the input of the D flip-flop is then driven by the output selected by signal "B". Such cells are substantially the same as ordinary flip-flops with an extra control pin that allows them to operate either as a standard flip-flop (working mode) or as part of the scan chain (testing mode) [10]. They are used for testing and are preferred over built-in test units. In this proposed work, the LSSD is explored as a logical structure in the quantum (TQCA) domain and compared, over several parameters, with its CMOS counterpart in order to demonstrate a more efficient performance.
Fig. 12 Block diagram of a level-sensitive scan design (LSSD)
Fig. 13 Proposed level-sensitive scan design (LSSD) in TQCA
Implementation Using QCA Designer The proposed mechanism has been implemented and simulated in the QCADesigner tool. The proposed circuit is first laid out as a basic logical structure in the software tool and then simulated in order to observe the change of the output waveform over several clock pulses.
Fig. 14 Simulation result of level-sensitive scan design (LSSD) in TQCA
Figure 12 depicts the QCA scan flip-flop architecture, where "A" and "B" are the inputs to the MUX together with the select pin (S). The MUX output is the input to the D flip-flop [29, 30], whose output is (Q). The result of the simulation of the suggested structure [31] is shown in Fig. 14.
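To clarify the intended operation, the following behavioral sketch (illustrative only, not the paper's layout or netlist) models the scan cell of Figs. 12 and 13 in MATLAB: a 2-to-1 multiplexer selects between the data input A and the scan input B according to the select signal S, and a D flip-flop samples the selected value at each clock step. The stimulus vectors are arbitrary example values.

% Behavioral model of MUX + D flip-flop (one sample per clock step)
mux = @(A, B, S) (~S & A) | (S & B);          % S = 0 -> data path, S = 1 -> scan path
A = [1 0 1 1 0];  B = [0 1 1 0 0];  S = [0 0 1 1 0];   % example stimulus (assumed)
Q = false(1, numel(A));
state = false;                                 % initial flip-flop state
for kk = 1:numel(A)
    Q(kk) = state;                             % output before the clock edge
    state = mux(A(kk), B(kk), S(kk));          % D input captured at the clock edge
end
disp(Q)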
3 Result and Discussion The circuit uses a smaller number of cells compared with conventional circuits and is more densely packed. The circuit designs were simulated using QCADesigner, and the outcomes are presented below. One hundred and eighty-five QCA cells are used to design the scan flip-flop, with an area of 0.30 µm². This is one of the basic circuit designs that can be combined with both combinational and sequential logic to build progressively more complex designs in TQCA nanotechnology and in embedded systems, for example an arithmetic logic unit (ALU) or a field-programmable gate array (FPGA) [22]. In addition, a fault-tolerant majority gate is used, which makes the circuit more suitable for physical realization and allows both sequential and combinational behavior to be captured in a single circuit. Figure 14 shows the simulation of the presented level-sensitive scan design; the waveform indicates that it works correctly and that the planned operations are carried out. The variation of the output waveform as the clock changes at each interval is also shown.
Fig. 15 QCA cell in 2D representation
Fig. 16 Characteristic curve
Energy and Power Analysis Let the superposition state of an electron tunneling between the dots in the x direction be ψ(x). The Fourier transform expresses the superposition [32] of the electron's state ψ(x), which is considered for the two-dimensional representation of the quantum cell in Fig. 15:

$$\psi(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\phi(k)\,e^{ikx}\,dk \qquad (1)$$

where φ(k) represents the amplitude of the superposition wave, k = 2π/λ is the wave propagation number, λ is the wavelength, and i = √−1. The characteristic curve of an electron wave moving through the channels is illustrated in Fig. 16 [19]. Whenever the electron is localized, it is positioned at x = 0 and e^{ikx} = 1; this means that electron waves with different frequencies interfere constructively and no oscillations appear, so ψ(x) reaches its peak at x = 0. For other values of x, the components e^{ikx} entering Eq. 1 give rise to oscillations, from which the value of ψ(x) is obtained. At x₂, e^{ikx} reaches its minimum and the characteristic curve reaches its negative peak. As x grows from this point, the value of e^{ikx} grows again, resulting in the growth of ψ(x). As stated before, the clock signal is the energy provider that allows the electrons to change their state. We assume that the inter-dot channels experience a potential energy V(x) in the positive x direction. The time-independent Schrödinger wave equation is then

$$\frac{d^2\psi(x)}{dx^2} + \frac{2m}{\hbar^2}\bigl(E_n - V(x)\bigr)\psi(x) = 0 \qquad (2)$$

where m denotes the electron mass and ℏ the reduced Planck constant [33, 34]. E_n expresses the infinite set of discrete energy levels corresponding to all possible non-negative integer values of n, where n is the quantum number. Equation 2 can be reduced to

$$E_n = \frac{n^2\pi^2\hbar^2}{2md^2} + V(x) \qquad (3)$$

where d is the cell dimension [19]. When a voltage of V volts is applied to the cell with junction capacitance C, the expression becomes [19]

$$\frac{n^2\pi^2\hbar^2}{2md^2} + V(x) = \frac{1}{2}CV^2 \qquad (4)$$

This expression helps to compute the operating RMS voltage of the system. The energy and power are analyzed further in terms of the Schrödinger equation [12]. Let the quantum number be denoted by n, the reduced Planck constant by ℏ, the electron mass by m, the cell area by a², the number of cells in the design by N, and the number of clock phases used by k. Here, n = 10 and n₂ = 3. The energy provided to the whole circuit equals the energy provided by the clock signal,

$$E_{lat} = \frac{n^2\pi^2\hbar^2 N}{2ma^2} \qquad (5)$$

For the energy dissipation,

$$E_{disp} = \frac{\pi^2\hbar^2\bigl(n^2-1\bigr)N}{2ma^2} \qquad (6)$$

The incident energy frequency is

$$F_{occ} = \frac{\pi\hbar\bigl(n^2 - n_2^2\bigr)}{2ma^2} \qquad (7)$$

and the frequency of the dissipated energy is

$$F_{rec} = \frac{\pi\hbar\bigl(n^2 - n_2^2\bigr)}{2ma^2} \qquad (8)$$

The difference of the frequency levels is

$$F_{rec} - F_{occ} = \frac{\pi\hbar\bigl(n_2^2-1\bigr)N}{2ma^2} \qquad (9)$$

The time expected to reach the quantum level is

$$T_1 = \frac{1}{F_{occ}} \qquad (10)$$

and the dispersal time to the relaxed state is

$$T_2 = \frac{1}{F_{rec}} \qquad (11)$$

The time required for the cells in a clock zone to move to the succeeding polarization is

$$T = T_1 + T_2 \qquad (12)$$

Finally, the time needed to propagate through the whole design is

$$T_p = T + (k-1)\,T_2\,N \qquad (13)$$
Using Eq. 5, the value obtained for E_lat is 4.90 × 10⁻²² J, and Eq. 6 gives E_disp = 4.86 × 10⁻²² J. The incident energy frequency obtained from Eq. 7 is F_occ = 1.16 × 10⁹ Hz, and Eq. 8 gives F_rec = 2.33 × 10¹¹ Hz. Since T₁ = 1/F_occ and T₂ = 1/F_rec, Eqs. 10 and 11 give T₁ = 8.62 × 10⁻¹⁰ s and T₂ = 4.29 × 10⁻¹² s. The time required to reach the next polarization is T = 8.66 × 10⁻¹⁰ s, and the total time required to propagate through the entire proposed design is T_p = 3.24 × 10⁻⁹ s, or 3.24 ns. For the overall power analysis, P = E_disp/T_p gives a power of 1.5 × 10⁻¹³ W, or 0.15 pW. The variation for the different values of n with respect to time in the Schrödinger equation has also been computed and plotted, as illustrated in Figs. 17 and 18, using the open-source script [35]. The power and energy requirements are summarized in Tables 2 and 3. The logic behind the proposed circuit is developed at the gate level, and the circuit layout is then created. These structures are then simulated in QCADesigner, a design and simulation tool developed through ongoing research by the Walus Group at the University of British Columbia. The system is converted into a QCA design using majority gates and inverters. This tool allows the designer to develop and simulate QCA designs quickly. QCADesigner has provided engineers with a new platform, and various international organizations have published simulation results that use this tool [19, 23].
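As a convenience, the following MATLAB sketch encodes Eqs. (5)–(13) so that the reader can plug in cell parameters of their own choice; since the cell dimension a, the cell count N, and the quantum numbers used by the authors are not all restated in this excerpt, no specific numerical output is claimed, and the free-electron mass used below is an assumption.

% Equations (5)-(13) as reusable anonymous functions
hbar = 1.0546e-34;                 % reduced Planck constant (J s)
m    = 9.109e-31;                  % electron mass (kg) - assumed free-electron mass
E_lat  = @(n,N,a)      n.^2*pi^2*hbar^2.*N ./ (2*m*a.^2);        % Eq. (5)
E_disp = @(n,N,a)      pi^2*hbar^2*(n.^2-1).*N ./ (2*m*a.^2);    % Eq. (6)
F_occ  = @(n,n2,a)     pi*hbar*(n.^2 - n2.^2) ./ (2*m*a.^2);     % Eq. (7)
F_rec  = @(n,n2,a)     pi*hbar*(n.^2 - n2.^2) ./ (2*m*a.^2);     % Eq. (8)
F_diff = @(n2,N,a)     pi*hbar*(n2.^2 - 1).*N ./ (2*m*a.^2);     % Eq. (9)
T1     = @(Focc)       1./Focc;                                  % Eq. (10)
T2     = @(Frec)       1./Frec;                                  % Eq. (11)
T      = @(T1v,T2v)    T1v + T2v;                                % Eq. (12)
Tp     = @(Tv,T2v,k,N) Tv + (k-1).*T2v.*N;                       % Eq. (13)
P      = @(Ed,Tpv)     Ed ./ Tpv;                                % average power P = E_disp / Tp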
Fig. 17 Schrodinger 2D wave equation for 1st approach (n = 3) of quantum number
Fig. 18 Schrodinger 2D wave equation for 2nd approach (n = 4) of quantum number
Table 2 Different parameter-level optimization for power and energy analysis of the implemented design in TQCA
Theoretical aspect | Expression
Energy provided to the whole circuit (= energy provided by the clock signal) | E_lat = n²π²ℏ²N / (2ma²)
Energy dissipation | E_disp = π²ℏ²(n² − 1)N / (2ma²)
Incident energy frequency | F_occ = πℏ(n² − n₂²) / (2ma²)
Frequency of the dissipated energy | F_rec = πℏ(n² − n₂²) / (2ma²)
Difference of the frequency levels | F_rec − F_occ = πℏ(n₂² − 1)N / (2ma²)
Time expected to reach the quantum level | T₁ = 1/F_occ
Dispersal time to the relaxed state | T₂ = 1/F_rec
Time required for cells in a clock zone to go to the succeeding polarization | T = T₁ + T₂
Final time needed to propagate through the whole design | T_p = T + (k − 1)T₂N

Table 3 Power and energy analysis of the implemented design in TQCA
Design | Energy (J) | Power (W)
LSSD | 4.86 × 10⁻²² | 1.5 × 10⁻¹³

Table 4 Performance analysis of the implemented design in TQCA
Design | No. of cells | Area (µm²) | Delay (clock cycles)
LSSD | 185 | 0.30 | 4
Scan flip-flop (existing) [36] | 90 | 0.56 | 4
The results obtained from this tool are then compared with the theoretical values to check that the circuit is correct. The performance analysis of the proposed design is given in Table 4 for different parameters (number of cells, area, and delay), and is further compared with the parameter-level optimization of the scan flip-flop of the existing design [36]. The main analysis addresses the ultimate goal of this research work, namely the improvement in area of the proposed design with respect to the existing design, which is presented in Table 5. For the comparison of the level-sensitive scan design with the existing scan flip-flop [36] from the literature survey, a comparative analysis has been made in terms of the area used (in µm²) to show that less area is required and to quantify the improvement over the existing CMOS implementation.
Fig. 19 Graphical depiction of level-sensitive scan design (LSSD) in TQCA
Fig. 20 Representation of area improvement of two designs
Table 5 Comparative study of the implemented design
Design | Area (µm²) | Improvement (%)
LSSD (in TQCA) | 0.30 | 46.42
Scan flip-flop (in CMOS technology) [36] | 0.56 | –
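As a quick check, the improvement figure reported in Table 5 follows directly from the two areas:

% Area improvement of the TQCA LSSD over the CMOS scan flip-flop (values from Tables 4 and 5)
area_cmos = 0.56;  area_tqca = 0.30;                       % in um^2
improvement = (area_cmos - area_tqca) / area_cmos * 100    % ~46.4 %, matching Table 5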
4 Conclusions and Future Directions This paper describes how a level-sensitive scan structure is implemented. The designs presented are fundamentally smaller than equivalent circuits in CMOS technology. The designs are evaluated on the basis of the number of QCA cells; all of them are properly clocked and have been verified with the QCA tool. The performance of the recommended design in terms of cell count, area, and latency meets the design requirements. The fundamental design behavior has also been illustrated and shown to satisfy all important parameters of the performance analysis, so that the LSSD can be carried over to other domains for constructing more efficient architectures; from this perspective, further enhancement of the performance remains a relevant direction for this kind of logical structure (Figs. 19 and 20).
References 1. Huang J, Lombardi F (2007) Design and test of digital circuits by quantum-dot cellular automata. Artech House Inc, Norwood, MA, USA 2. Tougaw P, Lent C, Porod W, Bernstein GH (1994) Logical devices implemented using quantumcellular automata 75(3):1818–1825 3. Lent CS, Tougaw PD, Porod W, Bernestein GH (1993) Quantum cellular automata. Nanotechnology 4(1):49–57 4. Bernstein G, Bazan G, Chen M, Lent C, Merz J, Orlov A, Porod W, Snider G, Tougaw P (1996) Practical issues in the realization of quantum-dot cellular automata. Superlattices Microstruct 20:447–559 5. Bajec IL, Mraz M (2005) Towards multi-state based computing using quantum-dot cellular automata. In: Teucher C, Adamatzky A (eds) Unconventional Computing 2005: from cellular automata to wetware. Luniver Press, Beckington, pp 105–116 6. Bajec IL, Zimic N, Mraz M (2006) The ternary quantum-dot cell and ternary logic. Nanotechnology 17(8):1937–1942 7. Bajec IL, Zimic N, Mraz M (2006) Towards the bottomup concept: extended quantum-dot cellular automata. Microelectron Eng 83(4-9):1826–1829 8. Pecar P, Mraz M, Zimic N, Janez M, Bajec IL (2008) Solving the ternary QCA logic gate problem by means of adiabatic switching. Jpn J Appl Phys 47(6):5000–5006 9. Pecar P, Ramsak A, Zimic N, Mraz M, Bajec IL (2008) Adiabatic pipelining: a key to ternary computing with quantum dots. Nanotechnology 19(49):495401 10. Walus K, Jullien GA, Dimitrov VS (2003) RAM design using quantum-dot cellular automata. Nanotechnol Conf Trade Show 2:160–163
11. Frost S, Rodrigues A, Janiszewski A, Raush R, Kogge P (2002) Memory in motion: a study of storage structures in QCA. In: 8th International symposium on high performance computer architecture (HPCA-8), first workshop on non-silicon computation (NSC-1), Boston, Massachusetts 12. Vankamamidi V, Ottavi M, Lombardi F (2008) Two-dimensional schemes for clocking/timing of QCA circuits,. IEEE Trans Comput Aided Des Integr Circuits Syst 27:34–44 13. Compano R, Molenkamp L, Paul DJ (2000) Technology roadmap for nanoelectronics. Eur Comm IST Programme. Future Emerg Technol 1–104 14. Bernstein GH, Imre A, Metlushko V, Orlov A, Zhou L, Ji L, Csaba G, Porod W (2005) Magnetic QCA systems. Microelectron J 36:619–624 15. Pomeranz I (2019) Extended transparent-scan. IEEE Trans Very Large Scale Integr (VLSI) Syst 27(9):2096–2104 16. Karmakar R, Chattopadhyay S, Kapur R (2020) A scan obfuscation guided design for-security approach for sequential circuits. IEEE Trans Circ Syst II 67(3)1–5 17. Kim J, Lee S, Kang S (2019) Test-Friendly data-selectable self-gating (DSSG). IEEE Trans Very Large Scale Integr (VLSI) Syst 27(8):1972–1976 18. Kim J, Ibtesam M, Kim D, Jung J, Park S (2020) CAN-Based aging monitoring technique for automotive ASICs with efficient soft error resilience. IEEE 8(2169-3536):22400–22410 19. Mukhopadhyay D, Dutta P (2015) A study on energy optimized 4 dot 2 electron two dimensional quantum dot cellular automata logical reversible flip-flops. Microelectron J 46:519–530 20. Selection of primary output vectors to observe under multicycle tests. IEEE Trans Very Large Scale Integr (VLSI) Syst 28(1):1–7 21. Kim K, Wu K, Karri R (2007) The robust QCA adder designs using composable QCA building blocks. IEEE Trans Comput Aided Des Integr Circuits Syst 26:176–183 22. Kanda M, Hashizume M, Ali FAB, Yotsuyanagi H, Lu S-K (2020) Open defect detection not utilizing boundary scan flip-flops in assembled circuit boards. IEEE Trans Compon Packag Manuf Technol 10(5):895–907 23. Walus K, Dysart TJ, Jullien GA, Budiman RA (2004) QCADesigner: a rapid design and simulation tool for quantum-dot cellular automata. IEEE Trans Nanotechnoly 3(1):2631 24. Taskin B, Hong B (2008) Improving line-based QCA memory cell design through dual phase clocking. IEEE Trans VLSI 16(12):1648–1656 25. Kim K, Wu K, Karri R (2006) Quantum-dot cellular automata design guideline. IEICE Trans Fundam Electron Commun Comput Sci 89(6):1607–1614 26. Niemer MT, Kogge PM (2001) Problems in designing with QCAs: layout = timing. Int J Circuit Theory Appl 29:49–62 27. Borkowski L (ed) (1970) Lukasiewicz: selected works. North-Holland Publishing Company, Amsterdam 28. Mukherjee N, Tille D, Sapati M, Liu Y, Mayer J, Milewski S, Moghaddam E, Rajski J, Solecki J, Tyszer J (2021) Time and area optimized testing of automotive ICs. IEEE Trans Very Large Scale Integr (VLSI) Syst 29(1):1–13 29. Juracy Leonardo R, Moreira Matheus T, Kuentzer Felipe A, Moraes Fernando G, Amory Alexandre M (2018) An LSSD compliant scan cell for flip-flops. In: IEEE international symposium on circuits and systems (ISCAS) 30. Agarwal A, Hsu S, Realov S, Anders M, Chen G, Kar M, Kumar R, Sumbul H, Knag P, Kaul H, Mathew S, Kumashikar M, Krishnamurthy R, De V (2020) Time-Borrowing fast Mux-D scan flip-flop with on-chip timing/power/VM I N characterization circuits in 10 nm CMOS. In: IEEE international solid-state circuits conference 31. Juracy LR, Moreira MT, Kuentzer FA, de Morais AA (2016) Optimized design of an LSSD scan cell. 
IEEE Trans Very Large Scale Integr (VLSI) Syst 1–4 32. Fock V (1986) Quantum mechanics. Mir Publisher, Moscow 33. Chakrabarti K (2020) Realization of original quantum entanglement state from mixing of four entangled quantum states. In: Castillo O, Jana D, Giri D, Ahmed A (eds) Recent advances in intelligent information systems and applied mathematics. ICITAM 2019. Studies in computational intelligence, vol 863. Springer, Cham. https://doi.org/10.1007/978-3-030-34152712
34. Chakrabarti K (2021) Is there any spooky action at a distance? In: Maji AK, Saha G, Das S, Basu S, Tavares JMRS (eds) Proceedings of the international conference on computing and communication systems. Lecture notes in networks and systems, vol 170. Springer, Singapore. https://doi.org/10.1007/978-981-33-4084-865 35. https://www.mathworks.com/matlabcentral/fileexchange/62204-2d-wave-equationsimulation 36. Lee S, Cho K, Choi S, Kang S (2020) A new logic topology-based scan chain stitching for test-power reduction. IEEE Trans Circ Syst II 67(12):1–5
An MLP Neural Network for Approximation of a Functional Dependence with Noise Vladimir Hlavac
Abstract Multilayer perceptron (MLP) neural networks used for approximation of the functional dependency are capable of generalization and thus to a limited noise removal, for example from measured data. The following text shows the effect of noise on the results obtained when data is interpolated by a neural network on several functions of two and one function of three variables. The function values obtained from the trained neural network showed on average ten times lower deviations from the correct value than the data on which the network was trained, especially for higher noise levels. The obtained results confirm the suitability of using a neural network for an interpolation of unknown functional dependencies from measured data, even when the noise load cannot be removed. Keywords MLP neural network · Function approximation · Function interpolation · Gaussian noise · Noise reduction
1 Introduction The ability of a neural network to approximate an unknown function has been investigated since back propagation was discovered [1, 2]. It was soon shown that it can approximate any smooth function if a sufficient number of neurons are used in the hidden layer [3], with the number of neurons needed depending on the required accuracy. Soon the effect of noise, which affects the training result, was also investigated [4]. Later, however, attention was focused on removing noise before training [5], even using smoothing methods, which means loss of information. In contrast, it is possible to find neural network applications intended for removing noise from data [6, 7], for example for image evaluation [8, 9], or resolution enhancement [10]. Nevertheless, neural networks offer the possibility to generalize and thereby reduce the influence of noise [11], especially symmetrical noise. Recent research focuses on deep neural networks [12–14]. V. Hlavac (B) Faculty of Mechanical Engineering, Czech Technical University in Prague, Prague, Czech Republic e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_32
In many cases, an unknown function can be approximated, when sufficient data is measured [15, 16] or collected [17]. While being a universal approximator, a multilayer perceptron (MLP) neural network can easily approximate inverse function, simply by changing inputs and outputs before training [18]. This allows solving many mathematical problems, where finding an inverse function can be problematic, for example, solving partial differential equations [19]. In the case of partial differential equations and a lack of data, the physics-informed neural networks have been developed [20, 21]. Another example is the inverted kinematics of a planar [22, 23] or a 3D [24, 25] robotic manipulator, including the case of an obstacle avoidance [26]. For a redundant manipulator, the forward kinematics allows to reach the same point by infinitely many combinations of settings. In this case, a unique solution can be selected by an additional (e.g., fitness) function [27]. The process means to generate many solutions and evaluate them [28]. The best of them should be trained. In this case, the best solution cannot be exactly selected, and residual uncertainty has similar features, as additional noise. The aim of this article is to follow up on [4], to demonstrate the network behavior for noisy data, prepared for selected smooth functions of two variables (and one function of three variables) and to quantify this effect. Functions, used for testing, have been taken from the genetic programming, used for a function approximation [29, 30] or symbolic regression [31, 32], where many simple testing functions have been used [33]. Functions (1) and (5) were used for testing the application in [34, 35]. Function “peaks” (4) have been selected, because of its implementation in MATLAB. The function of three variables comes from [36]. Note that benchmark functions for neural networks exist [37–39], but they are not suitable for the type of tests presented here. The noise has been added by the MATLAB function awgn [40]. All functions were used to ease repeating of the proposed tests. Used MATLAB scripts are available [41].
2 Description of the Solved Problem A neural network is able to interpolate any smooth function if we use enough neurons in the hidden layers, as have been proven in [3], and recently investigated in [42, 43]. For more complex functions, it may be more advantageous to use multiple hidden layers [44]. However, the batch training libraries available in MATLAB (nftool, trainbr, etc.) only work with one hidden layer, which limits the ability to interpolate any function. Compared to two or three hidden layers, an order of magnitude more hidden neurons and also more training cycles could be required.
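For orientation, the following illustrative MATLAB lines (not the author's script) show the difference in practice: the fitting workflow used in this paper keeps a single hidden layer, while feedforwardnet accepts a vector of layer sizes when several hidden layers are wanted. The layer sizes chosen here are arbitrary examples.

% One hidden layer vs. two hidden layers in MATLAB's Deep Learning Toolbox
net1 = fitnet(100);              % one hidden layer with 100 neurons
net2 = feedforwardnet([20 20]);  % two hidden layers of 20 neurons each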
2.1 A Simple Polynomial Function As the first selected smooth function, an example of the following nonlinear function has been chosen (1) (Fig. 1): f (x, y) = x 3 y − x y 3
(1)
The values of the function were multiplied by three to prepare the data. The values were calculated for an array of 100 × 100 elements, for a range of approximately ± 2.5 (with a step of 0.05, see the following program):

figure(1)
[x,y] = meshgrid(-2.45:0.05:+2.5);
z = 3.*x.*x.*x.*y - 3.*x.*y.*y.*y;
mesh(x,y,z)
For a neural network, however, the data must consist of individual samples—the array of input values will contain the x and y coordinates of each point, and the array of required output values will contain the values of the trained function, possibly modified by noise (for a neural network, the order of samples does not matter during training). The data was transformed with the following program:

a = zeros(2,10000); b = zeros(1,10000);
for i = 1:100
    for j = 1:100
        b(1,(j-1)*100+i) = z(j,i);
        a(1,(j-1)*100+i) = j - 50;
        a(2,(j-1)*100+i) = i - 50;
    end
end

Fig. 1 Simple trained function (x³y − xy³), multiplied by three
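Before the evaluation step shown next, the network itself has to be trained. A minimal sketch of how such a network could be obtained is given below; it assumes the fitnet/trainbr workflow and the 80/10/10 data split described later in this section, and the awgn call for producing noisy targets follows the noise levels used in the experiments (the exact script of the author is available in [41]).

% Assumed training setup (sketch, not the author's exact script)
bn  = awgn(b, 20, 'measured');        % e.g. 20 dB signal-to-noise ratio (Communications Toolbox)
net = fitnet(33, 'trainbr');          % one hidden layer of 33 neurons, Bayesian regularization
net.divideFcn = 'dividerand';         % random 80/10/10 split of the samples
net.divideParam.trainRatio = 0.8;
net.divideParam.valRatio   = 0.1;
net.divideParam.testRatio  = 0.1;
trained_mlp_nn = train(net, a, bn);   % a: 2-by-10000 inputs, bn: 1-by-10000 noisy targets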
Therefore, during the tests, the neural network did not work with input values in the range of ± 2.5, but ± 50, which is not a problem in the result (this is multiplication by a constant, which the MLP network includes in the weights for individual neurons). The resulting vectors a(1,:) and a(2,:) were used for input values and vector b for output values. To evaluate the result, the prepared values of the inputs were inserted back into the trained neural network, and the result was transformed into a two-dimensional array.

c = trained_mlp_nn(a);
e = zeros(100);
for i = 1:100
    for j = 1:100
        e(j,i) = c(1,(j-1)*100+i);
    end
end
This would allow the trained function to be graphed again. The result is not distinguishable by eye from the input. For the evaluation, it is therefore necessary to define the achieved accuracy by the sum of the deviations. The sum of the absolute values of the deviations (2) was chosen:

$$d = \sum_{i=1}^{n}\sum_{j=1}^{n}\left|z_{i,j} - e_{i,j}\right| \qquad (2)$$
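In MATLAB, with the 100-by-100 arrays z (noise-free reference) and e (network response) built above, the measure of Eq. (2) is a one-liner:

% Sum of absolute deviations, Eq. (2)
d = sum(abs(z(:) - e(:)));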
In (2), z is the array of the values of the selected function without noise and e is the result generated by the trained neural network (denomination respects the source examples). Neural network replacement itself causes fitting error even for noise-free data. First, three possible neural network training methods, available in MATLAB, were compared. The neural networks used had ten neurons, and the data was divided into 80% training and 10% testing and validation. Results for the three available methods are given in Table 1 (the median value of the fitting error from the three tests is presented): Table 1 Results for the three available methods Used method
Levenberg–Marquardt
Bayesian regularization
Scaled conjugate gradient
Error
3605.90
248.72
20,188.48
Table 2 Results for different sizes of network Nodes in the hidden layer
10
14
20
33
50
70
100
Error
248.72
90.30
46.72
5.67
3.96
1.18
0.71
The “Bayesian regularization” method [45] was chosen. According to MATLAB’s description, this algorithm typically requires more time, but can result in good generalization for difficult, small, or noisy data sets. Training stops according to adaptive weight minimization (regularization). There was also good experience with this method when approximating inverse kinematics [27]. Next, it was tested how many neurons are enough in the hidden layer. For ten from the previous experiment, networks with 14, 20, 33, 50 were tested, and with regard to the harder interpolable functions in the next chapters, also networks with 70 and 100 neurons. Errors of the trained networks are given in Table 2 (medians from three tests). A network of 33 neurons of the hidden layer was chosen, because further increasing the number of neurons no longer brings substantial refinement of the method. The MATLAB function awgn(data,snr) [40] was used to generate noise, which adds Gauss noise, with a defined signal–noise ratio (so less noise, so bigger ratio). The unit is dB, which is logarithmic. The test started from zero noise (snr = infinity), and then 25, 20, 15, 10, and 5 dB were used. For noisy data, neural network training and error evaluation were performed five times for each noise level. The data was used the same for all five experiments to train the network. The resulting errors of the trained networks (evaluated using (2)) are given in Table 3. The last line contains the average from the five training sessions. This average is presented in Fig. 2 (circles), together with the error, evaluated for the source data, and used for training (diamond shape symbol, ten times bigger scale). The relative error can be related to the range of values that the function takes on a given interval. Using the max and min functions, a range of 90.21 was determined for this function. Since the function evaluates the sum of all errors, it is still necessary to divide by their number. The resulting error is in hundredths of a percent (3). This Table 3 Resulting errors evaluated for the function (1) Noise level
No noise
25 dB
20 dB
15 dB
10 dB
5 dB
Error in data
0
439.41
795.06
1426.67
2489.52
4425.74
1
11.84
40.36
85.87
134.99
267.43
480.59
2
2.62
42.16
82.15
123.69
259.93
469.00
3
16.85
66.15
88.57
156.40
267.20
429.40
4
1.99
42.87
82.69
153.74
271.01
432.56
5
2.75
43.55
78.70
139.68
254.24
483.25
Average
7.21
47.02
83.60
141.70
263.96
458.96
Fig. 2 Green line and diamond shape markers represent the sum of the absolute values of the deviations in the data (left scale). The blue line, circles, and the right scale (10× smaller scale) represent the sum of the absolute values of the deviations from the data obtained by interpolating the function with a neural network trained on noisy data (average of five trials)
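The roughly tenfold noise reduction visible in Fig. 2 can be checked directly from the first and last rows of Table 3; the short calculation below (values copied from Table 3) forms the ratio between the error contained in the noisy training data and the average error of the trained networks:

% Ratio of data error to average network error per noise level (from Table 3)
err_data = [439.41 795.06 1426.67 2489.52 4425.74];   % 25, 20, 15, 10, 5 dB
err_net  = [ 47.02  83.60  141.70  263.96  458.96];   % averages over five trained networks
ratio = err_data ./ err_net    % roughly 9.3-10.1, i.e. about ten times lower error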
Table 4 Average relative error Noise level
No noise (%)
25 dB (%)
20 dB (%)
15 dB (%)
10 dB (%)
5 dB (%)
Aver. rel. error
0.00079
0.0052
0.0092
0.0157
0.0292
0.0508
relative error evaluated for the (1) (data in the last line of Table 3) is given in Table 4. n n i=1 j=1 z i, j − x i, j (3) δ = 100% × |(x) − min(x)| · n 2 Except for the data without added noise, the function values obtained from the trained neural network show on average ten times lower deviations from the correct value than the data on which the network was trained.
2.2 The MATLAB “Peaks” Function This function (4) [46] is used by the MATLAB documentation for illustrative examples, such as graphs. For a neural network, it is difficult to interpolate and requires
either a two-layer network or more neurons in a hidden layer. For a larger number of neurons, it was necessary to increase the maximum number of training cycles (default is 1000). Table 5 shows the results of a preliminary testing (if the value is not shown, the training method stopped at a lower number of training cycles due to reaching the convergence conditions, for example, the results achieved in the test data set stopped improving). The median from three or five experiments is presented for each of the network training parameters combinations. f (x, y) = 3(1 − x)2 e−x −(y+1) 2 2 1 x 2 2 − x 3 − y 5 e−x −y − e−(x+1) −y − 10 5 3 2
2
(4)
Seventy neurons in the hidden layer and 3000 cycles have been selected. Results are given in Table 6. For this function, the error is bigger, but even without the presence of noise. The ratio of the error in the data to the error of the resulting function appears to be close to 7:1 for high noise levels. Table 7 documents the well-known fact from the literature that a network with a small number of neurons generalizes better, while a network with a larger number of neurons tends to adapt even to the noise present in the data [47] (signal-to-noise ratio 5 dB, up to 5000 training cycles, column 70 taken from the previous experiment). Table 5 Resulting errors of tests without noise Hidden layer
1000 Epochs
2000 Epochs
3000 Epochs
5000 Epochs
10 Neurons
3987.82
–
–
–
14 Neurons
2078.76
–
–
–
20 Neurons
1234.48
–
–
–
33 Neurons
373.96
202.42
–
–
50 Neurons
38.65
16.37
20.74
23.60
70 Neurons
9.23
5.72
4.64
4.32
100 Neurons
2.93
1.79
1.40
1.28
Table 6 Resulting errors evaluated for the function “peaks” Noise level
No noise
25 dB
20 dB
15 dB
10 dB
5 dB
Error in data
0
447.45
797.62
1415.68
2510.64
4495.25
1
9.09
72.63
111.08
204.29
329.76
605.79
2
6.26
71.01
108.29
199.64
368.00
553.20
3
7.74
67.84
115.57
211.17
338.07
531.25
4
8.29
67.52
107.46
199.37
276.42
539.41
5
7.32
65.58
116.45
197.82
355.27
551.18
Average
7.74
68.92
111.77
202.46
333.51
556.16
Table 7 Training error for selected number of neurons in the hidden layer 20 Nodes
25 Nodes
33 Nodes
50 Nodes
70 Nodes
100 Nodes
133 Nodes
Error in data
4493.78
4463.61
4490.65
4487.67
4495.25
4479.18
4495.10
1
1104.95
883.05
545.19
550.34
605.79
601.84
901.21
2
1220.94
807.25
669.11
561.49
553.20
640.40
771.29
3
1105.60
815.04
543.59
557.58
531.25
615.76
833.98
4
1068.83
927.15
583.29
572.71
539.41
604.40
804.56
5
1428.48
825.15
591.64
481.90
551.18
639.38
788.29
Average
1185.76
851.53
586.56
544.80
556.16
620.35
819.87
The Bold value in the last line is the lowest value. It means this artificial neural network settings was the best
Note, that in this case, noisy data has been generated for each of the trained networks separately, so the stochastic behavior of the awgn function is shown.
2.3 A Logarithmic Function In practical use, logarithmic dependencies are common. The following function was used for the tests: f (x, y) = log log(y) · x + log log(x) · y − 2
x y
(5)
A neural network interpolates this function without the presence of noise very well. Twenty-five nodes in the hidden layer were selected. The result of the experiment is given in Table 8. The error in the trained network is on average twelve times lower than in the source data. Table 8 Resulting errors evaluated for function (5) Noise level
No noise
25 dB
20 dB
15 dB
10 dB
5 dB
Error in data
0
453.64
790.67
1427.09
2526.04
4513.15
1
1.16
47.43
72.40
103.82
169.14
431.39
2
0.67
24.40
73.31
126.50
260.46
335.30
3
0.25
27.10
65.96
151.83
198.77
367.29
4
0.57
41.74
61.21
121.56
221.82
407.17
5
0.11
36.06
88.30
87.37
232.26
365.78
Average
0.55
36.06
88.30
87.37
232.26
365.78
Table 9 Resulting errors evaluated for the function (6) Noise level
No noise
25 dB
20 dB
15 dB
10 dB
5 dB
Error in data
0
3094.23
5500.02
9775.85
17,380.7
31,023.2
1
29.80
181.12
339.12
575.81
962.11
1524.69
2
54.19
181.12
324.21
481.60
884.96
1310.10
3
42.57
170.53
346.77
487.68
926.24
1600.96
4
59.33
178.44
320.57
532.06
789.74
1461.95
5
60.67
184.98
310.05
538.51
820.75
1444.37
Average
49.31
180.23
328.14
523.13
876.76
1468.42
2.4 A Function with Three Variables A function of three variables typically represents the distribution of values in space, for example the concentration of substances or the distribution of heat. As an example of a simple nonlinear function, the function (6) was chosen: f (x, y, z) = x y 2 z 3 + x 2 y 3 z + x 3 yz 2
(6)
All variables are in the interval [− 2, + 2] with step 0.1. While in the previous cases, 10,000 samples were used to train the network; here it is 413 , i.e., 68,921 samples. After similar tests as above, 37 neurons were selected in the hidden layer. Training was limited to a maximum of 1500 cycles for time reasons. Better results can be achieved with more cycles. It took about 20 min to train one network on the used computer. The resulting errors of the function approximation are summarized in Table 9. For higher levels of noise, the resulting error of the neural network is up to twenty-times lower than error in the data used for training. The bigger filtering effect is probably caused by a higher number of trained samples.
3 Conclusion Except for the data without added noise, the function values obtained from the trained neural network show on average ten times lower deviations from the correct value than the data on which the network was trained. The MATLAB source programs referenced in this text are available online [41]. Overall, it can be stated that although the noise contained in the data significantly worsens the quality of the trained neural network, this neural network was able to remove (in average) nine-tenths of the added Gaussian noise, and it can therefore be used for removing noise from data for which we do not know the mathematical description of the real functional dependence. In the case of the tested simple function, of course better results would be obtained by fitting the known function using the
method of the least squares, or any gradient method [48]. In practice, when we often do not know the expected theoretical dependence, this possibility to interpolate data by a neural network has a legitimate use.
References 1. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536 2. Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Math Control Signals Syst 4(2):303–314 3. Gallant W (1988) There exists a neural network that does not make avoidable mistakes. In: IEEE 1988 international conference on neural networks, San Diego, CA, USA 4. Marquez LO, Hill T (1993) Function approximation using backpropagation and general regression neural networks. In: Hawaii international conference on system sciences 5. Steege FF, Stephan V, Groß HM (2012) Effects of noise-reduction on neural function approximation. In: Proceedings of 20th European symposium on artificial neural networks, computational intelligence and machine learning (ESANN 2012) 6. Badri L (2010) Development of neural networks for noise reduction. Int Arab J Inform Technol 7(3) 7. Goyal P, Benner P (2022) Neural ODEs with irregular and noisy data. Preprint on Researchgate.net, May 2022. https://doi.org/10.48550/arXiv.2205.09479 8. Cocianu C, Stan A (2016) A novel noise removal method using neural networks. Informatica Economic˘a 20(3) 9. Borodinov N, Neumayer S, Kalinin S (2019) Deep neural networks for understanding noisy data applied to physical property extraction in scanning probe microscopy. In: NPJ Comput Mater 5(25). https://doi.org/10.1038/s41524-019-0148-5 10. Balaji Prabhu B, Narasipura O (2020) Improved image super-resolution using enhanced generative adversarial network a comparative study. In: Sharma H, Saraswat M, Kumar S, Bansal J (eds) Lecture notes on data engineering and communications technologies. Springer, Singapore 11. Carozza M, Rampone S (2000) Function approximation from noisy data by an incremental RBF network. Pattern Recogn 32(12). https://doi.org/10.1016/S0031-3203(99)00101-6 12. Kratsios A (2021) The universal approximation property. Ann Math Artif Intell 89:435–469 13. Song H, Kim M, Park D, Shin Y, Lee JG (2022) Learning from noisy labels with deep neural networks: a survey. IEEE Trans Neural Netw Learn Syst 14. Hu S, Pei Y, Liang PP, Liang YC (2019) Robust modulation classification under uncertain noise condition using recurrent neural network. In: 2018 IEEE global communications conference (GLOBECOM) 15. Samson A, Chandra S, Manikant M (2021) A deep neural network approach for the prediction of protein subcellular localization. Neural Netwk World 29–45. https://doi.org/10.14311/NNW. 2021.31.002 16. Abeska Y, Cavas L (2022) Artificial neural network modelling of green synthesis of silver nanoparticles by honey. Neural Netw World 1–4. https://doi.org/10.14311/NNW.2022.32.001 17. Sarveswara RP, Lohith K, Satwik K, Neelima N (2022) Qualitative classification of wheat grains using supervised learning. In: Saraswat M, Sharma H, Balachandran K, Kim JH, Bansal JC (eds) Congress on intelligent systems. Lecture notes on data engineering and communications technologies, vol 111. Springer, Singapore. https://doi.org/10.1007/978-981-16-9113-3_7 18. Elshafiey I, Udpa L, Udpa S (1992) A neural network approach for solving inverse problems in NDE. In: Review of progress in quantitative nondestructive evaluation. advances in cryogenic engineering, vol 28 19. Bar-Sinai Y, Hoyer S, Hickey J, Brenner MP (2019) Learning data-driven discretizations for partial differential equations. Appl Math 116(31):15344–15349
20. Yuan L, Ni Y-Q, Deng X-Y, Hao S (2022) A-PINN: auxiliary physics informed neural networks for forward and inverse problems of nonlinear integro-differential equations. J Comput Phys 462 21. Yang L, Meng X, Karniadakis GE (2021) B-PINNs: bayesian physics-informed neural networks for forward and inverse PDE problems with noisy data. J Comput Phys 425 22. Shah J, Rattan SS, Nakra BC (2012) Kinematic analysis of a planar robot using artificial neural network. Int J Rob Autom 1(3):145–151 23. Hlavac V (2022) MLP neural network for a kinematic control of a redundant planar manipulator. In: Mechanisms and machine science. Springer, Cham 24. Shah SK, Mishra R, Ray LS (2020) Solution and validation of inverse kinematics using deep artificial neural network. Mater Today Proc 26(2):1250–1254 25. Rivas CEA (2022) Kinematics and control of a 3-DOF industrial manipulator robot. In: Congress on intelligent systems. Lecture notes on data engineering and communications technologies 26. Chembulya VV, Satish MJ, Vorugantia HK (2018) Trajectory planning of redundant manipulators moving along constrained path and avoiding obstacles. Procedia Comput Sci 133(2018):627–634. In: International conference on robotics and smart manufacturing 27. Hlavac V (2021) Neural network for the identification of a functional dependence using data preselection. Neural Netw World 2:109–124 28. Hlavac V (2021) Kinematics control of a redundant planar manipulator with a MLP neural network. In: Proceedings of the international conference on electrical, computer, communications and mechatronics engineering, mauritius 29. Brandejsky T (2019) GPA-ES algorithm modification for large data. In: Proceedings of the computational methods in systems and software. Springer, Cham 30. Nicolau M, Agapitos A (2021) Choosing function sets with better generalisation performance for symbolic regression models. In: Genetic programming and evolvable machines, vol 22, pp 73–100 31. Zhong J, Feng WCL, Ong Y-S (2020) Multifactorial genetic programming for symbolic regression problems. In: IEEE transactions on systems, man, and cybernetics: systems, vol 50, no 11, pp 4492–4505 32. Aldeia GSI, França FOD (2020) A Parametric study of interaction-transformation evolutionary algorithm for symbolic regression. In: 2020 IEEE congress on evolutionary computation (CEC) 33. McDermott J (2012) Genetic programming needs better benchmarks. In: GECCO ‘12: Proceedings of the 14th annual conference on Genetic and evolutionary computation, July, 2012 34. Hlavac V (2016) A program searching for a functional dependence using genetic programming with coefficient adjustment. In: Smart cities symposium Prague 2016, Prague 35. Hlavac V (2017) Accelerated genetic programming. In: MENDEL 2017. Advances in intelligent systems and computing, Brno 36. Davidson J, Savic D, Walters G (2003) Symbolic and numerical regression: experiments and applications. Inf Sci 150:95–117 37. Dhar VK, Tickoo AK, Koul R, Dubey BP (2010) Comparative performance of some popular artificial neural network algorithms on benchmark and function approximation problems. Pramana J Phys 74(2):307–324, 2010 38. Yang S, Ting T, Man K, Guan S-U (2013) Investigation of neural networks for function approximation. Procedia Comput Sci 17:586–594 39. Malan K, Cleghorn C (2022) A continuous optimisation benchmark suite from neural network regression. In: Rudolph G, Kononova AV, Aguirre H, Kerschke P, Ochoa G, Tušar T (eds) Parallel problem solving from nature—PPSN XVII., PPSN 2022. 
Lecture notes in computer science, vol 13398. Springer, Cham. https://doi.org/10.1007/978-3-031-14714-2_13 40. Matlab documentation (2022) Add white Gaussian noise. Available: https://www.mathworks. com/help/comm/ref/awgn.html. Last accessed 17 June 2022 41. Matlab sources (2022) (Online). Available: http://users.fs.cvut.cz/hlavac/MLP&noise.zip. Last accessed 07 July 2022
42. Liu J, Ni F, Du M, Zhang X, Que Z, Song S (2021) Upper bounds on the node numbers of hidden layers in MLPs. Neural Netw World 297–309 43. Sekeroglu B, Dimililer K (2020) Review and analysis of hidden neuron number effect of shallow backpropagation neural networks. Neural Netw World 97–112 44. Lu L, Jin P, Pang G (2021) Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nat Mach Intell 3:218–229 45. Matlab documentation (2022) Bayesian regularization backpropagation. Available: https:// www.mathworks.com/help/deeplearning/ref/trainbr.html. Last accessed 09 May 2022 46. Matlab documentation (2022) Peaks function. Available: https://www.mathworks.com/help/ matlab/ref/peaks.html. Last accessed 16 June 2022 47. Gurney K (1997) An introduction to neural networks. UCL Press 48. Hlavac V (2018) Genetic programming with either stochastic or deterministic constant evaluation. Neural Netw World 2:119–131
Evaluation of Sound Propagation, Absorption, and Transmission Loss of an Acoustic Channel Model in Shallow Water

Ch. Venkateswara Rao, S. Swathi, P. S. R. Charan, Ch. V. V. Santhosh Kumar, A. M. V. Pathi, and V. Praveena

Abstract Acoustic communication has proven to be the most flexible and widely used method in underwater situations because of the low attenuation (signal reduction) of sound in water and the ability to communicate over long distances. However, the performance of acoustic propagation underwater is affected by physical factors such as temperature, pressure (depth), salinity, and chemical composition (boric acid, magnesium sulfate). The influence of the aforementioned parameters alters the velocity of acoustic transmission, which inherently affects the connectivity of the network. Since research in the undersea environment has grown rapidly, proficient underwater communication systems are essential in order to attain reliable communication. Hence, it is essential to analyze the velocity of sound propagation underwater subject to various medium parameters. The sound speed changes in the shallow sea environment are illustrated in this work via an acoustic channel that has been modeled. The sound speed variations have been estimated by varying temperature and salinity along with depth in a shallow water scenario. The proposed model also evaluates the absorption and transmission losses underwater.

Keywords Absorption · Attenuation · Acoustic channel · Sound speed · Transmission loss · Temperature · Salinity · Underwater communication
Ch. Venkateswara Rao (B) · S. Swathi · P. S. R. Charan · Ch. V. V. Santhosh Kumar · A. M. V. Pathi · V. Praveena Department of ECE, Vishnu Institute of Technology, Bhimavaram 534202, India e-mail: [email protected] Ch. V. V. Santhosh Kumar e-mail: [email protected] V. Praveena e-mail: [email protected] S. Swathi Department of ECE, SRKR Engineering College, Bhimavaram 534202, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_33
1 Introduction

Underwater acoustic sensor networks (UASN) are made up of submerged sensor nodes used to collect data on areas of rivers and oceans that have not yet been studied. These networks are distributed over the area to be studied and include a configurable number of anchored and floating sensors as well as vehicles [1]. The nodes create single-hop or multi-hop communication links between themselves. Underwater equipment can communicate through optical, radio, electromagnetic, and acoustic waves [2]. Among these, most researchers favor acoustic communication since it has a large range and can carry digital data across an underwater channel. The key investigations are driven by the UASN's network attributes (node mobility, transmission range, and power), the underwater medium properties (temperature, pressure, salinity, and pH), and the channel constraints (limited bandwidth, multipath fading, transmission loss, absorption, limited battery, and limited data capacity) [3]. Temperature, salinity, and sound speed with respect to depth are the primary elements that are relevant in an underwater environment [4]. The underwater environment is also unpredictable due to a number of factors, including wave height, turbidity currents, water pressure, chemical composition of the water, and wave speed [5]. In order to construct a trustworthy network, the suggested channel model must be able to recognize changes in underwater medium characteristics and network attributes. The performance of an acoustic channel model is affected by both the operating frequency and the underwater sound speed [6]. The sound speed underwater is a function of temperature, salinity, depth, and pH; as the temperature and salinity vary with depth, the sound speed alters accordingly [7]. Seawater salinity varies with both water depth and geographic location. Salinity is determined by measuring the ocean's dissolved salt concentration and is expressed in parts per thousand (ppt). While the average salinity ranges from 31 to 37 ppt, it is less than 30 ppt in polar regions. Salinity changes are more irregular at lower depths than at higher depths, as a result of the water's chemical composition and the fact that the ocean surface is warmer than its depths. These variations in salinity and temperature affect the sound speed in underwater acoustic transmission [8]. The sound speed decreases gradually with increasing depth in the upper layers, whereas it picks up again in the middle layers of the ocean [9], purely because of the differences in temperature, salinity, pressure (depth), and chemical composition of the sea water [10]. The absorption of sound in sea water, on the other hand, depends on salinity, temperature, transmission distance, and frequency of operation [11]. The creation of links between the sensor nodes is affected by these irregular variations in sound speed (dependent on temperature, salinity, water depth, and pH) and absorption (dependent on acoustic frequency and transmission distance). Furthermore, effective communication links between the sensor nodes are a prerequisite for network connectivity. In order to accomplish reliable communication among the sensor nodes in an underwater network, it is essential to incorporate a workable acoustic channel
model that considers the impact of sound speed, absorption losses, and transmission losses. Since sound speed is affected by underwater medium elements like temperature and salinity, this work focuses on analyzing the characteristics of an acoustic channel model that also provides an estimate of the absorption and transmission losses in shallow water.
2 Literature Review

The ocean is an unpredictable environment which changes with geographical region, season, and time. Fundamental physical properties of the ocean such as temperature, salinity, pressure, and pH adversely affect the performance and connectivity of communication networks. The several complications (transmission loss, multipath, Doppler effect, absorption) in the underwater medium make UASNs a fascinating area for researchers to work in. With the developments in sensor technology and wireless communication, UASNs have attracted many researchers and contributed significantly to this field. Using the essential principles of physics, the authors in [12] presented and explored the underlying physics of basic wave propagation before contrasting the issues and effects of adopting various communication carriers (acoustic, EM, and optical). The foremost difficulties with UWSN are presented in an overview, and the technologies currently in use are examined in [13]. The impact of propagation parameters such as speed of sound, channel latency, absorption, scattering, multipath, waveguide effects, and ambient noise on underwater communication has been examined in [14]. The authors in [15] analyzed the dependency of the channel capacity upon depth and temperature by taking into account enhanced propagation loss and ambient noise models. To make large-scale system design easier, a statistical propagation model [16] has been created. A mathematical model [17] that converts atmospheric pressure to depth and depth to atmospheric pressure has been proposed to reduce the errors in sound speed in oceans and seas. An experimental setup has been carried out in [18], which illustrates the effect of salinity, temperature, and pressure on the physical characteristics of the deployed environment and on sound speed variations. An algorithm [19] has been presented which estimates the sound speed at a particular location over time and improves underwater localization accuracy. An acoustic channel model has been established [20] for underwater network simulation. A real-time measurement has been carried out [21] to measure the path loss of an underwater acoustic channel. Recent advances in deep learning and artificial intelligence have been adopted for modeling underwater acoustic channel characteristics to attain accuracy and throughput. A framework for underwater channel modeling based on deep learning has been proposed [22] for improving the accuracy of the channel model. In [23], the major statistical properties of the channel model have been identified and analyzed.
Observing underwater conditions is a challenging task. Terrestrial and aerial surveillance technology cannot be adopted directly, since optical and electromagnetic waves suffer excessive attenuation in water. However, varied network requirements for terrestrial sensor networks have been studied in terms of network metrics [24], link creation [25], hop count [26], fading environment [27], capacity [28], node distribution [29], node failures [30], distance models [31], link reliability [32], and interference [33]. The literature provides insight into how underwater acoustic propagation is influenced by variations of the underwater medium parameters. Hence, it is crucial to address and incorporate the effect of temperature, salinity, and the absorption and transmission losses that arise from variations in the sound speed. These variations in the sound speed due to the unpredictable underwater environment alter the link formation in the network. Hence, this work is focused on analyzing the acoustic channel characteristics in shallow water regions by considering the effect of absorption and transmission losses.
3 Methodology For simulating an underwater acoustic channel, an acoustic channel that takes into account the impact of temperature, salinity, absorption, and transmission losses on sound speed has been developed. Initially, the sound speed has been evaluated at various depths by varying salinity and temperature in shallow water scenarios, and then, the losses due to the variation in sound speed have been estimated.
3.1 Sound Speed

One key distinction between acoustic and EM propagation is the extraordinarily slow speed at which sound travels through water. Factors such as temperature, salinity, and pressure affect the sound speed undersea. Near the ocean's surface, the speed of sound is typically around 1520 m/s, which is five orders of magnitude slower than the speed of light but about four times faster than the speed of sound in air. Changes in the environment (temperature, salinity, and depth) have a direct impact on the sound speed in water. Accounting for temperature, salinity, and depth, the sound speed underwater is defined by Eq. (1), where T is the temperature expressed in degrees Celsius, C is the sound speed expressed in m/s, S is the salinity stated in parts per thousand (ppt), z is the depth expressed in meters, and the constants a_1 through a_9 take the values presented in Table 1 [34].

C(T, S, z) = a_1 + a_2 T + a_3 T^2 + a_4 T^3 + a_5 (S − 35) + a_6 z + a_7 z^2 + a_8 T (S − 35) + a_9 T z^3   (1)
Table 1 Sound speed calculation coefficients

a1 = 1448.96           a4 = 2.374 × 10^−4      a7 = 1.675 × 10^−7
a2 = 4.591             a5 = 1.340              a8 = −1.025 × 10^−2
a3 = −5.304 × 10^−2    a6 = 1.630 × 10^−2      a9 = −7.139 × 10^−13
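As an illustration of Eq. (1), the short Python sketch below evaluates the nine-term formula with the coefficients of Table 1; the function name and the test values are illustrative only.

```python
# Illustrative sketch of Eq. (1): nine-term sound speed formula with the
# coefficients of Table 1 (function name and test values are examples only).

A = [1448.96, 4.591, -5.304e-2, 2.374e-4, 1.340,
     1.630e-2, 1.675e-7, -1.025e-2, -7.139e-13]

def sound_speed(T, S, z):
    """Sound speed C (m/s) for temperature T (deg C), salinity S (ppt), depth z (m)."""
    a1, a2, a3, a4, a5, a6, a7, a8, a9 = A
    return (a1 + a2 * T + a3 * T**2 + a4 * T**3
            + a5 * (S - 35.0) + a6 * z + a7 * z**2
            + a8 * T * (S - 35.0) + a9 * T * z**3)

if __name__ == "__main__":
    # Example point inside the ranges of Table 2 (20 deg C, 33 ppt, 100 m depth)
    print(round(sound_speed(20.0, 33.0, 100.0), 2))  # roughly 1520 m/s, as quoted above
```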
3.2 Absorption Loss

The absorptive loss of acoustic wave propagation is frequency dependent. The total absorption coefficient is given by Eq. (2), and the absorption loss [35] can be represented using Eq. (3), where R_t is the transmission range in meters, L_ab is the path loss expressed in decibels, and α is the absorption coefficient expressed in decibels per kilometer.

α = A_1 P_1 f_1 f^2 / (f_1^2 + f^2) + A_2 P_2 f_2 f^2 / (f_2^2 + f^2) + A_3 P_3 f^2   (2)

L_ab = α × R_t × 10^−3   (3)
3.3 Transmission Loss

Transmission loss (TL) is defined [36] as the cumulative loss of acoustic strength caused by an acoustic pressure wave as it moves away from its source, Eq. (4).

TL_shallow = 10 log_10(R_t) + L_ab   (4)
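A minimal sketch of Eqs. (2)–(4) is given below. Because the paper does not list the relaxation parameters A1, P1, f1, A2, P2, f2, A3 and P3, they are left as inputs to be supplied from the underlying absorption model (boric acid, magnesium sulfate and pure-water terms).

```python
# Sketch of Eqs. (2)-(4). The relaxation parameters are inputs because their
# values are not listed in the paper; they come from the absorption model used.
import math

def absorption_coefficient(f_khz, A1, P1, f1, A2, P2, f2, A3, P3):
    """Total absorption coefficient alpha in dB/km at frequency f_khz (kHz), Eq. (2)."""
    f_sq = f_khz ** 2
    boric = A1 * P1 * f1 * f_sq / (f1 ** 2 + f_sq)   # boric acid relaxation term
    mgso4 = A2 * P2 * f2 * f_sq / (f2 ** 2 + f_sq)   # magnesium sulfate relaxation term
    water = A3 * P3 * f_sq                           # pure-water (viscous) term
    return boric + mgso4 + water

def absorption_loss(alpha_db_per_km, Rt_m):
    """Absorption (path) loss L_ab in dB over a range Rt_m metres, Eq. (3)."""
    return alpha_db_per_km * Rt_m * 1e-3

def transmission_loss_shallow(Rt_m, L_ab_db):
    """Shallow-water transmission loss in dB, Eq. (4)."""
    return 10.0 * math.log10(Rt_m) + L_ab_db
```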
4 Implementation Parameters The parameters considered for simulating the proposed channel model are listed in Table 2.
5 Experimental Results The main factors which are vulnerable in an underwater environment are temperature, salinity, and sound speed with respect to depth. In addition, numerous parameters
Table 2 Implementation parameters

Parameter      Range
Depth          0–500 m
Temperature    0–30 °C
Salinity       30–35 ppt
Frequency      0–100 kHz
pH             7.8
such as wave height, turbidity currents, water pressure, chemical composition of the water, and wave speed make the underwater environment unpredictable in nature. It is necessary that the proposed channel model identify the changes in underwater medium characteristics and network properties in order to deploy a reliable network. Generally, the performance of an acoustic channel model depends on the speed of sound underwater and the operating frequency. The sound speed underwater is a function of temperature, salinity, depth, and pH; as the temperature and salinity vary with depth, the sound speed alters accordingly. The variation in sound speed with respect to changes in temperature (see Fig. 1) and salinity (see Fig. 2) has been investigated at various scenario depths. It is identified from Fig. 1 that the sound speed gradually decreases with decreasing ocean temperature. These temperature variations with respect to depth cause fluctuations in the sound speed, which inherently affect the communication links among the sensor nodes. In addition to the temperature variation, it is observed that the sound speed gradually increases with increasing scenario depth (see Fig. 1). The salinity of sea water changes with geographical region and with water depth. Salinity is obtained by measuring the concentration of dissolved salts in the ocean and is expressed in parts per thousand (ppt). The mean value of salinity lies between 31 and 37 ppt, whereas in polar regions this value is less than 30 ppt. The salinity variations are more irregular at lower depths than at higher depths (see Fig. 2); this is due to the chemical composition of the water and the fact that temperatures are higher at the surface than at the ocean bottom. These changes in salinity and temperature affect the sound speed in underwater acoustic communication. Typically, sound travels at a speed of 1500 m/s in seawater; however, this varies with geographical region, season, and water depth. Figures 1 and 2 show the changes in the sound speed with respect to depth: the sound speed decreases gradually with increasing depth in the upper layers, whereas it picks up again in the middle layers of the ocean, purely because of the differences in temperature, salinity, pressure (depth), and chemical composition of the sea water. The transmission losses underwater are frequency dependent (see Fig. 3) and are also directly proportional to the transmission range of the nodes in the network. It is observed from Fig. 3 that the transmission losses are smaller at higher depths than at lower depths; as the depth D increases the transmission losses reduce, while they increase with frequency. During propagation, the acoustic wave energy may be converted into other forms and absorbed by the medium.
Fig. 1 Sound speed variations with temperature
For any kind of physical wave propagating through it, the imperfection of the material directly controls the absorptive energy loss. For acoustic waves this imperfection is the inelasticity of the medium, which turns the wave energy into heat. It is depicted in Fig. 4 that the absorptive loss of acoustic wave propagation is frequency dependent. Viscous absorption has a major impact above 100 kHz, whereas ionic relaxation caused by boric acid has a large impact at low frequencies (up to a few kHz) and relaxation caused by magnesium sulfate has a considerable impact at middle frequencies (up to a few hundred kHz). Hence, the total absorption (see Fig. 4) is a combination of the absorptions due to boric acid, magnesium sulfate, and pure water, respectively.
6 Conclusion An acoustic channel model that examines the impact of underwater medium factors including temperature, salinity, and pH on sound speed has been suggested in this work. The proposed channel model investigates the effect of salinity and temperature at a fixed pH by varying different depths in shallow water scenarios. The proposed channel model also investigates the effect of absorption due to various chemical
Fig. 2 Sound speed variations with salinity
compositions of water and of transmission losses with respect to frequency. The simulation results demonstrated that the transmission losses and absorption losses are frequency dependent: as the frequency increases, these two losses increase. The sound speed increases with depth as the temperature and salinity decrease gradually along the depth.
Fig. 3 Transmission loss with frequency
Fig. 4 Total absorption
References 1. Sozer EM, Stojanovic M, Proakis JG (2000) Underwater acoustic networks. IEEE J Ocean Eng 25(1):72–83 2. Akyildiz IF, Pompili D, Melodia T (2005) Underwater acoustic sensor networks: research challenges. Ad Hoc Netw 3(3):257–279 3. Barbeau M, Garcia-Alfaro J, Kranakis E, Porretta S (2017) The sound of communication in underwater acoustic sensor networks. In: Ad Hoc networks. Lecture notes of the institute for computer sciences, social informatics and telecommunications engineering vol 223 4. Akyildiz IF, Pompili D, Melodia T (2004) Challenges for efficient communication in underwater acoustic sensor networks. ACM Sigbed Rev Spec Issue Embedd Sens Netw Wirel Comput 1(2):3–8 5. Stojanovic M, Preisig J (2009) Underwater acoustic communication channels: propagation models and statistical characterization. IEEE Commun Mag 47(1):84–89 6. Jindal H, Saxena S, Singh S (2014) Challenges and issues in underwater acoustics sensor networks: a review. In: International conference on parallel, distributed and grid computing Solan, pp 251–52 7. Ismail NS, Hussein L, Syed A, Hafizah S (2010) Analyzing the performance of acoustic channel in underwater wireless sensor network. In: Asia international conference on modelling and simulation, pp 550–555 8. Wanga X, Khazaiec S, Chena X (2018) Linear approximation of underwater sound speed profile: precision analysis in direct and inverse problems. Appl Acoust 140:63–73 9. Ali MM, Sarika J, Ramachandran R (2011) Effect of temperature and salinity on sound speed in the central Arabian sea. Open Ocean Eng J 4:71–76 10. Kumar S, Prince S, Aravind JV, Kumar GS (2020) Analysis on the effect of salinity in underwater wireless optical communication. Mar Georesour Geotechnol 38(3):291–301 11. Hovem J (2007) Underwater acoustics: propagation, devices and systems. J Electr Ceram 19:339–347 12. Lanbo L, Shengli Z, Jun-Hong C (2008) Prospects and problems of wireless communication for underwater sensor networks. Wirel Commun Mob Comput 8:977–994 13. Garcia M, Sendra S, Atenas M, Lloret J (2011) Underwater wireless ad-hoc networks: a survey. In: Mobile ad hoc networks: current status and future trends pp 379–411 14. Preisig J (2006) Acoustic propagation considerations for underwater acoustic communications network development. Mobile Comput Commun Rev 11(4):2–10 15. Sehgal A, Tumar I, Schonwalder J (2009) Variability of available capacity due to the effects of depth and temperature in the underwater acoustic communication channel. In: OCEANS 2009-EUROPE, Bremen, pp 1–6 16. Llor J, Manuel PM (2013) Statistical modeling of large-scale path loss in underwater acoustic networks. Sensors 13:2279–2294 17. Leroy CC, Parthiot F (1998) Depth-pressure relationships in the oceans and seas. J Acoust Soc Am 103(3):1346–1352 18. Yuwono NP, Arifianto D, Widjiati E, Wirawan (2014) Underwater sound propagation characteristics at mini underwater test tank with varied salinity and temperature. In: 6th International conference on information technology and electrical engineering (ICITEE), pp. 1–5 19. Shi H, Kruger D, Nickerson JV (2007) Incorporating environmental information into underwater acoustic sensor coverage estimation in estuaries. In: MILCOM 2007—IEEE military communications conference, pp 1–7 20. Morozs N, Gorma W, Henson BT, Shen L, Mitchell PD, Zakharov YV (2020) Channel modeling for underwater acoustic network simulation. IEEE Access 8:136151–136175 21. Lee HK, Lee BM (2021) An underwater acoustic channel modeling for internet of things networks. 
Wirel Pers Commun 116:2697–2722 22. Onasami O, Adesina D, Qian L (2021) Underwater acoustic communication channel modeling using deep learning. In: 15th International conference on underwater networks and systems (WUWNet’21), China
23. Zhu X, Wang C-X, Ma R (2021) A 2D non-stationary channel model for underwater acoustic communication systems. In: IEEE 93rd vehicular technology conference (VTC2021-Spring), pp 1–6 24. Chaturvedi SK, Padmavathy N (2013) The influence of scenario metrics on network reliability of mobile ad hoc network. Int J Performability Eng 9(1):61–74 25. Venkata Sai Kumar B, Padmavathy N (2020) A hybrid link reliability model for estimating path reliability of mobile ad hoc network. Procedia Comput Sci 171:2177–2185 26. Venkata Sai B, Padmavathy N (2017) A systematic approach for analyzing hop count and path reliability of mobile ad hoc networks. In: International conference on advances in computing, communications and informatics, pp 155–160 27. Padmavathy N, Chaturvedi SK (2016) A systematic Approach for evaluating the reliability metrics of MANET in shadow fading environment using monte carlo simulation. Int J Performability Eng 12:265–282 28. Padmavathy N, Chaturvedi SK (2015) Reliability evaluation of capacitated mobile ad hoc network using log-normal shadowing propagation model. Int J Reliab Saf 9(1):70–89 29. Padmavathy N, Anusha K (2018) Dynamic reliability evaluation framework for mobile ad-hoc network with non-stationary node distribution. Communication and Computing Systems, CRC Press, Taylor and Francis 30. Padmavathy N, Teja JRC, Chaturvedi SK (2017) Performance evaluation of mobile ad hoc network using MonteCarlo simulation with failed nodes. In: 2nd International conference on electrical, computer and communication technologies, pp 1–6 31. Padmavathy N (2019) An efficient distance model for the estimation of the mobile ad hoc network reliability. In: 4th International conference on information, communication and computing technology, pp 65–74 32. Venkateswara Rao Ch, Padmavathy N (2022) Effect of link reliability and interference on twoterminal reliability of mobile ad hoc network. In: Advances in data computing, communication and security. Lecture notes on data engineering and communications technologies, vol 106, pp 555–565 33. Rao CV, Padmavathy N, Chaturvedi SK (2017) Reliability evaluation of mobile ad hoc networks: with and without interference. In: IEEE 7th international advance computing conference, pp 233–238 34. Mackenzie KV (1981) Nine-term equation for sound speed in the oceans. J Acoust Soc Am 70(3):807–812 35. Etter PC (2003) Underwater acoustic modeling and simulation, 3rd edn. Spon Press, New York 36. Padmavathy N, Venkateswara Rao Ch (2021) Reliability evaluation of underwater sensor network in shallow water based on propagation model. J Phys Conf Ser 1921(012018):1–17
A Competent LFR in Renewable Energy Micro-grid Cluster Utilizing BESO Technique O. P. Roy, Sourabh Prakash Roy, Shubham, and A. K. Singh
Abstract This paper investigates the load frequency regulation of two interconnected areas consisting of dish stirling solar generator, micro hydro turbine, biogas generator, and flywheel in area 1, whereas wind turbine, tidal generator, biogas generator, and battery in area 2. In addition to it, a super magnetic energy storage device is included in both the areas to damp out the frequency oscillation quickly. After description of the system unit, a collation of the system with various performance indices is carried out to give the dominance of integral square error (ISE) among the performance indices. Furthermore, system frequency and tie-line characteristic are compared utilizing Bald eagle search optimizer (BESO), Black widow optimizer algorithm (BWOA), genetic algorithm (GA), and teaching learning-based optimizer (TLBO). The optimization of tilt integral tilt derivative with filter (TI-TDF), tilt integral derivative with filter (TIDF II), and proportional integral derivative with filter (PIDF) controller parameters is executed using the mentioned algorithms. Moreover, separate case studies are validated to study proposed system performance under various circumstances. The validation shows that the proposed TI-TDF when tuned with BESO performs efficiently and gives minimum error with ISE performance. Keywords Dish stirling solar generator · Biogas generator · Micro hydro turbine · Tidal generator · Wind turbine · Bald eagle search optimizer
O. P. Roy · S. P. Roy · Shubham (B) · A. K. Singh Department of Electrical Engineering, NERIST, Nirjuli, Arunachal Pradesh, India e-mail: [email protected] O. P. Roy e-mail: [email protected] S. P. Roy e-mail: [email protected] A. K. Singh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_34
1 Introduction Eco-friendly green renewable energy sources are the substitute to address the fastgrowing global energy requirement. As the energy production from solo renewable sources is less, so it is required to interconnect more than one renewable energy source forming a microgrid to fulfill energy demand efficiently. In spite of having several advantages like eco-friendliness, non-exhaustibility, and easy accessibility, renewable energy sources also face challenges like intermittency, and weather dependency [1, 2]. Because of this non-reliable nature of renewable energy sources, these are also integrated with energy storage devices to increase performance and minimize shortcoming of the sources. To create a resilient and reliable microgrid, several sources are connected forming multi-area systems [3]. Interconnection of large renewable energy sources into the microgrid leads to increased frequency oscillation, which should be damped via a suitable controller. Some recent papers which explain the modeling of multi-area systems with different conventional as well as non-conventional sources are [4–7]. These papers describe three area interconnected systems encompassing non-reheat thermal power system wind-driven plants integrated with renewable energy sources namely photovoltaic and wind turbine to increase the efficiency of the system. The performance of actual high voltage direct current tie-line power in dish-stirling solar thermal systems is studied in paper [8]. The interconnection of PV with thermal turbines along with energy storage systems is considered in paper [9]. The modeling of emerging renewable energy sources like wave energy conversion systems and pumped hydropower energy storage is illustrated in [10, 11]. In order to make the interconnected area similar to the actual scenario nonlinearities and delays are added to the system. Nonlinearities for example dead-band, saturation, rate generation constraint, etc., not only makes the system realistic but it also complicates the same. This complexity influences the system performance as investigated in [12].
2 Literature Survey Load frequency regulation controller is employed in the microgrid cluster for system stability and improved performance. The most basic and widely accepted controllers like PID, PI, PD, and PIDF [13] are incorporated in single, two, and multi-area systems to stabilize the system as explored in paper [14–17]. In [15], modified sine cosine algorithm (mSCA) optimized multistage fractional-order PD-PI (MSFOPDPI) controller is utilized for single area with solar and wind generation unit. Also, the performance is compared with PID controllers optimized by crow search algorithm (CSA), artificial bee colony (ABC), cuckoo search (CS), gravitational search algorithm (GSA), dragonfly (DA), and genetic algorithm (GA) techniques. Some of the paper illustrating recent application of multistage and cascaded controller as
fractional order proportional integral–fractional order proportional derivative (FOPI– FOPD) [18], feed forward fractional order PID (FFOPID) [19], tilt derivative with filter/1 + tilt integral (TDF/(1 + TI)) controller [7] for load frequency regulation in multi area microgrid system (MµS) has been studied for present study. In addition to it, paper [4, 20] investigates the performance controller with three degrees of freedom (3-DOF). Some of the robust control strategies are discussed in [21–23]. The load frequency response of the MµS system with sliding mode control, terminal sliding mode control, and adaptive sliding mode control is presented in [5, 9, 21, 24]. Also, the system response of solar cells, wind turbines, and fuel cells with EV systems using artificial neural network methods are effectively validated in [22]. After the selection of the proper controller, it is necessary to obtain the controller parameter with the help of optimization algorithms. They are an important emerging domain. Various optimization methods are used for frequency regulation in MµS. Some the recent algorithms like marine predators algorithm (MPA) [25], Quasioppositional dragonfly algorithm (QODA) [14], and modified salp swarm algorithm (MSSA) [17]. Also, tuning of multistage controllers like proportional integral 1 + proportional derivative (PI − (1 + PD)) using grasshopper algorithmic technique (GOA) [12], tilt integral derivative (TID) using equilibrium optimizer (EO) [4], TDF/(1 + TI) using modified sine cosine algorithm (MSCA) [7], artificial bee colony algorithm (ABC) [21], dragonfly search algorithm (DSA) [18], Harris’ Hawks optimization (HHO) [20], crow search algorithm (CSO) [8], movable damped wave algorithm (MDVA) [6], bees algorithm (BA) [24], harmony search algorithm (HAS) [26], ennoble class topper optimization algorithm (ECTOA) [27], chaotic crow search algorithm (CCSA) [12], jellyfish search optimization algorithm (JSOA) [28], and gray wolf optimization (GWO) [29, 30] are utilized to regulate the controller parameters values. In addition to this, firefly optimization algorithm (FOA) is used to tune fractional order PI in two area pumped hydropower energy storage [11]. In this paper, we are using a newly developed nature-based algorithm BESO which is explicitly used to tune various controllers [31, 32]. Performance comparison of the different optimization algorithm tells the superiority of BESO algorithm. Some of the major benefit of this algorithm is its accuracy, rapid convergence rate, and avoidance of local optimum. Moreover, BESO is not yet used for regulating the controller parameter in two area renewable MµS.
3 Contribution and Organization

The objective of the proposed research paper is to make the best use of the power generated to meet the appropriate load requirement. The contributions of the paper are as follows:
(a) Design of a two-area interconnected microgrid with dish stirling solar generator, micro hydro generator, wind generator, tidal generator, and biogas generator as renewable energy sources, with the incorporation of storage devices like flywheel, battery, and super magnetic energy storage.
(b) Design of the fractional-order controllers TI-TDF, TIDF II, and PIDF and tuning of their parameters with different algorithms in order to compare various control strategies and algorithms.
(c) Evaluation of a comparative study of different performance indices.
(d) Calculation of the transient and steady-state values of the frequency and tie-line power responses.
(e) Time-domain and robustness analysis of the system by simulating various case studies.
The arrangement of the remaining part of the paper includes–The configuration of proposed two-area system modeling is described in Sect. 2. The controller design methodology is carried out in Sect. 3. Section 4 includes selection of performance index for the system. Moving on further, Sect. 5 gives a detailed overview of the BESO algorithm and its advantages. Time-domain evaluations of various case studies as well as simulation results are explained in Sects. 6 and 7, respectively. Following this, conclusion is expressed in Sect. 8. Lastly, references are listed at the end of the chapter.
4 Description of Proposed Two Area System The proposed model is a two area system which encompasses biogas turbine generating unit (BgT), super magnetic energy storage devices (SMESD), static load (SL), and dynamic load (DL) are deployed commonly in both areas as shown in Fig. 1. In spite of these other components in area 1 includes Dish Stirling solar generating plant (DS-SG), micro hydro turbine (MHT) generating unit and flywheel energy storage devices (FESD). Other components in area 2 comprise wind turbine generating unit (WT), tidal generating unit (TG), and battery energy storage device (BESD). Table 1 discusses values of the components of the proposed system.
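For illustration, the first-order generating-unit blocks of the form K/(1 + sT) quoted in Table 1 can be assembled with scipy.signal, as sketched below. The gains and time constants used are the DS-SG, WT and TG values from Table 1; the full two-area model (controllers, tie-line, storage devices and loads) is not reproduced here.

```python
# Illustrative sketch: first-order generating-unit blocks G(s) = K / (1 + sT)
# built with scipy.signal, using Table 1 values (DS-SG: K=1, T=5; WT: K=1,
# T=1.542; TG: K=1, T=0.08). Not the full two-area Simulink model.
from scipy import signal

def first_order_block(K, T):
    """Transfer function G(s) = K / (1 + sT)."""
    return signal.TransferFunction([K], [T, 1.0])

G_dssg = first_order_block(1.0, 5.0)     # dish Stirling solar generating unit
G_wt   = first_order_block(1.0, 1.542)   # wind turbine generating unit
G_tg   = first_order_block(1.0, 0.08)    # tidal generating unit

# Step response of the DS-SG block to a unit power-reference change
t, y = signal.step(G_dssg)
print(y[-1])  # settles towards the steady-state gain K = 1
```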
5 Bald Eagle Search Optimization Algorithm In 2019, H. A. Alsattar et al. has developed the BESO algorithm [31]. Bald eagle is a raptor bird mostly found in North America. It is a nature inclined metaheuristic algorithm that tries to duplicate the chasing and hunting scheme of Bald eagle for fishes. In Fig. 2, the hunting plan of action can be splitted into three steps, namely, selecting space, searching space and swooping along with their respective governing equations which are utilized in Fig. 5. Selecting Space: Bald eagles pick up the space which has the maximum number of prey as well as that is not too far from their home. The best space selected by them can be expressed as Eq. (1),
Fig. 1 Schematic diagram of proposed two area system
Anew(i) = Abest + α ∗ rand(Amean − Ai )
(1)
where A_best is the space currently selected by the bald eagle, α lies between 1.5 and 2 and decides the position change, rand is a random number between 0 and 1, and A_mean indicates the information of the previously selected spaces. Searching Space: The bald eagle moves within the search space to look for prey. It follows a spiral pattern in the selected space to speed up the searching process. The prime location for swooping is given by Eq. (2),

A_new(i) = A_i + y(i) ∗ (A_i − A_{i+1}) + x(i) ∗ (A_i − A_mean)   (2)

x(i) = r(i) sin(θ(i)) / max(|r(i) sin(θ(i))|),   y(i) = r(i) cos(θ(i)) / max(|r(i) cos(θ(i))|)   (3)

θ(i) = a ∗ π ∗ rand,   r(i) = θ(i) + R ∗ rand   (4)
where a ∈ [5, 10] dictates the corner between the search points and R ∈ [0.5, 2] establishes the search cycles. Swooping: In this step, bald eagles accelerate from the best location towards their selected prey. Mathematically, this is described by Eq. (5),

A_new(i) = rand ∗ A_best + x1(i) ∗ (A_i − c1 ∗ A_mean) + y1(i) ∗ (A_i − c2 ∗ A_best)   (5)

x1(i) = r(i) sinh(θ(i)) / max(|r(i) sinh(θ(i))|),   y1(i) = r(i) cosh(θ(i)) / max(|r(i) cosh(θ(i))|)   (6)
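A compact NumPy sketch of the three BESO position updates (select, search and swoop, Eqs. (1)–(7)) applied to a generic cost function is given below. It is a simplified illustration of the update rules with c1 = c2 = 2, not the exact implementation used for the controller tuning in this paper.

```python
# Simplified NumPy sketch of the BESO updates (select / search / swoop) on a
# generic cost function; bounds handling and the LFC objective are reduced to
# a toy example, so this only illustrates the structure of Eqs. (1)-(7).
import numpy as np

rng = np.random.default_rng(0)

def beso(cost, dim, pop=10, iters=50, lb=-5.0, ub=5.0, alpha=2.0, a=10, R=1.5):
    X = rng.uniform(lb, ub, (pop, dim))
    fit = np.apply_along_axis(cost, 1, X)
    best = X[fit.argmin()].copy()

    for _ in range(iters):
        mean = X.mean(axis=0)

        # 1) Select space, Eq. (1)
        X = best + alpha * rng.random((pop, 1)) * (mean - X)

        # 2) Search space: spiral move, Eqs. (2)-(4)
        theta = a * np.pi * rng.random(pop)
        r = theta + R * rng.random(pop)
        x = (r * np.sin(theta)) / np.max(np.abs(r * np.sin(theta)))
        y = (r * np.cos(theta)) / np.max(np.abs(r * np.cos(theta)))
        X = X + y[:, None] * (X - np.roll(X, -1, axis=0)) + x[:, None] * (X - mean)

        # 3) Swoop towards the prey, Eqs. (5)-(7), assuming c1 = c2 = 2
        theta = a * np.pi * rng.random(pop)
        r = theta
        x1 = (r * np.sinh(theta)) / np.max(np.abs(r * np.sinh(theta)))
        y1 = (r * np.cosh(theta)) / np.max(np.abs(r * np.cosh(theta)))
        X = (rng.random((pop, 1)) * best
             + x1[:, None] * (X - 2.0 * mean) + y1[:, None] * (X - 2.0 * best))

        X = np.clip(X, lb, ub)
        fit = np.apply_along_axis(cost, 1, X)
        if fit.min() < cost(best):
            best = X[fit.argmin()].copy()
    return best, cost(best)

# Toy usage: minimise the sphere function in 4 dimensions
print(beso(lambda v: float(np.sum(v ** 2)), dim=4))
```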
Table 1 Values of system component parameters Symbols Dish stirling solar generating unit (DS-SG) [39] Micro hydro turbine generating unit (MHT) [41]
Flywheel and battery energy storage devices (FBSD) [38]
System nomenclature K DS -SG G DS -SG = 1+sT , K DS-SG = DS-G DS -SG
Value 1, 5
valve gain and T DS-SG = valve time constant G MHT =
K MHT 1+sTMHT
K PH 1+sTPH
−1+sTW 1+0.5sTW ,
K MHT = MHT governor gain and T MHT = time constant, K PH = penstock gain and T PH = time constant, T W = turbine time constant 1 1 1 T f BESD = 1+sT 1+sTCM 1+sTDM , C
0.5, 0.2, 5, 28.75, 1
0.1, 0.001, 0.1
T C , T CM , T DM are time constant of flywheel or battery converter, command measurement and delay measurement K WT Wind turbine generating G WT = 1+sT 1, 1.542 , K WT = WT gain and WT unit (WT) [42] T WT = time constant K TG Tidal generating unit G TG = 1+sT 1, 0.08 , K TG = TG valve gain TG (TG) [42] and T = valve time constant TG
Biogas turbine generating unit (BgT) [43]
Super magnetic energy storage device (SMESD) [40]
System dynamic load (SDL) Sensitive load
G BgT =
1+s X C (1+sYC )(1+sbB )
1+sTCR 1+sTBG
1 1+sTBT ,
0.6, 1, 0.05, 0.01, 0.23, 0.2
XC, Y C , bB , T CR , T BG , T BT are lead time, lag time, valve actuator, combustion reaction time constant, biogas time constant, turbine time constant respectively 1+sTS1 1+sTS3 K SMESD G SMESD = 1+sT 1+sTS4 1+sTSMESD , 0.121, 0.800, 0.011, S2 0.148, 0.297 and 0.03 T , T , T , T are compensator time S1
S2
S3
S4
constants, K SMESD = SMESD gain and T SMESD = time constant 1 G RMD = D+s M , D and M are damping
0.012, 0.2
factor and inertia constant respectively Step load is 0.04 pu for 0 to 100 s, increases from 0.04 to 0.05 pu at 100 s and at 0.05 pu for the rest 200 s
θ(i) = a ∗ π ∗ rand,   r(i) = θ(i)   (7)
where, c1 , c2 ∈ [1, 2]. It is required to have an appropriate selection of optimization algorithms which provides effective, fast tuning and less calculative optimization procedure. The paper stabilizes two areas MµS with BESO [31], BWOA [33], GA [34], and TLBO [35]. Figure 4 shows the convergence curve for optimization function which recommends
Fig. 2 Steps of hunting pattern of Bald eagle
Table 2 Algorithms parameters

Algorithm            Parameters
BESO                 c1, c2, α = 2, a = 10, R = 1.5
BWOA                 −1 < beta2 < 1, 0.4 < m < 0.9
GA                   Pm = 0.01, Pc = 0.8, tournament selection
TLBO                 ri ∈ [0, 1]
Common parameters    Max iter = 50, pop size = 5, sim time = 200
the supremacy of BESO algorithm. The essential parameters, maximum iteration, population size, and simulation time period utilized for optimization are outlined in Table 2. Moreover, Table 5, depicts the result of Wilcoxon test for the utilized algorithms.
6 Controller Design Scheme for the Proposed System An overview of the control strategy used is explained in this section. It is very necessary to opt for an appropriate controller for a well-organized and efficient response of the system. The frequency and tie-line power characteristics of the proposed two area system with SL of 5% at 100 s are compared with reference to PID, TIDF II [36],
Table 3 Parameters of different controllers

                         PID             TIDF II         TITDF
Overshoot    Δf1         1.525 × 10^−2   1.009 × 10^−2   1.248 × 10^−2
             Δf2         8.015 × 10^−3   5.702 × 10^−3   1.142 × 10^−2
             ΔP12        2.218 × 10^−3   3.738 × 10^−4   5.687 × 10^−4
Undershoot   Δf1         4.466 × 10^−2   4.447 × 10^−2   4.467 × 10^−2
             Δf2         3.843 × 10^−2   3.803 × 10^−2   3.785 × 10^−2
             ΔP12        1.195 × 10^−3   1.332 × 10^−3   2.137 × 10^−3
and TI-TDF [37, 44] controller upgraded BESO. The overshoot and undershoot for all the controllers are presented in Table 3. It is clear from the parameters that the TITDF controller is superior. So, TI-TDF is selected for study for various case studies of generating sources which will be explained in Sect. 6. In Fig. 3, the schematic arrangement of TI-TDF control strategy has been represented. The TI-TDF controller output is expressed as Eq. (8), K T1 K T2 Nc KI P = 1 + + 1 + s KD f s s + Nc λ1 λ2 s s
(8)
where K T1 , K T2 , λ1 , λ2 , K I , K D , N c are tilt integral gains, fractional order value, integral gain, derivative gain, and filter coefficient, respectively.
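As a numerical illustration, the frequency response of a basic tilt-integral-derivative-with-filter element, the kind of building block from which the TI-TDF law of Eq. (8) is composed, can be evaluated directly at s = jω, which avoids approximating the fractional term s^(1/λ). All gain values in the sketch are placeholders rather than the BESO-tuned parameters of this paper.

```python
# Frequency response of a basic tilt-integral-derivative-with-filter element
# evaluated at s = jw; gains are illustrative placeholders, not tuned values.
import numpy as np

def tidf_response(w, KT, lam, KI, KD, Nc):
    s = 1j * w
    return KT / s ** (1.0 / lam) + KI / s + KD * Nc * s / (s + Nc)

w = np.logspace(-2, 2, 400)                 # rad/s
C = tidf_response(w, KT=1.0, lam=3.0, KI=0.5, KD=0.2, Nc=100.0)
mag_db = 20 * np.log10(np.abs(C))
phase_deg = np.degrees(np.angle(C))
print(mag_db[0], phase_deg[0])              # low-frequency gain dominated by the integral term
```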
Fig. 3 Schematic arrangement of TI-TDF controller
Fig. 4 Comparison of convergence curve of BESO, BWOA, GA, and TLBO algorithm
7 Performance Indices Selection

The methodology used to select the performance index for the system plays a crucial role in the error-optimization problem. The integral square error (ISE) furnishes improved performance compared to the integral time square error (ITSE), integral absolute error (IAE), and integral time absolute error (ITAE). Keeping ISE in consideration for LFC of the proposed microgrid areas, the objective function for a two-area system can be formulated as in Eq. (9). Table 4 shows the comparative study of the objective function for all the BESO-optimized controllers. The comparative study shows the smallest error, of value 0.001520, for ISE with the TI-TDF controller, while the maximum error, of value 10.87, is shown by the ITAE performance index for the same controller [45]. Hence, it can be concluded that the ISE-based TI-TDF, when tuned with BESO, gives a significantly smaller objective-function error and improved performance.

J_min = ∫_0^T [ (Δf_1)^2 + (Δf_2)^2 + (ΔP_12)^2 ] dt   (9)
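The four indices compared in Table 4 can be computed from sampled error signals by numerical integration, as in the hedged sketch below; the decaying signals used are synthetic examples, not simulation outputs of the proposed system.

```python
# Sketch of the four performance indices of Table 4, computed from sampled
# error signals by trapezoidal integration. The signals below are synthetic.
import numpy as np

def indices(t, df1, df2, dp12):
    e2 = df1**2 + df2**2 + dp12**2            # squared-error term of Eq. (9)
    e1 = np.abs(df1) + np.abs(df2) + np.abs(dp12)
    return {
        "ISE":  np.trapz(e2, t),              # Eq. (9)
        "ITSE": np.trapz(t * e2, t),
        "IAE":  np.trapz(e1, t),
        "ITAE": np.trapz(t * e1, t),
    }

t = np.linspace(0.0, 200.0, 4001)             # 200 s simulation horizon
decay = np.exp(-0.05 * t)
print(indices(t, 0.01 * decay, 0.008 * decay, 0.002 * decay))
```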
8 Result Analysis The propounded two areas microgrid system model is implemented in MATLAB 2016 application. The comprehensive model is replicated for the simulation duration of 200 s. Three controllers, namely, PID, TIDF II, and TI-TDF are utilized to analyze the model performance. The total number of iteration and search agent is taken as 50 and 10 respectively, as the J min converges within it. The proposed work includes sustainable energy units such as solar, microturbine, wind, and tidal energy sources. These sustainable energies are mostly dependent on sustaining weather conditions. The power generated with the help of these is utilized to dispense the load required in
Fig. 5 Flowchart of BESO algorithm

Table 4 Comparative analysis of performance index for TI-TDF controller

Controller    ISE       ITSE     IAE     ITAE
PID           0.0016    0.014    0.19    11.01
TIDF II       0.0015    0.014    0.18    10.71
TITDF         0.0015    0.013    0.17    10.87
the microgrid. BESO algorithm is employed to adjust the controller for coherent load frequency control. Figures 6 and 7 represent frequency and tie-line power outcome is compared for TI-TDF, TIDF II and PID control techniques. It has been noticed that TI-TDF is giving best superior response compared to others. Consequently, the TITDF controller is tuned using BESO, BWOA, GA, and TLBO algorithms. Among these BESO and BWOA are present day advanced algorithms whose response is collated with GA and TLBO responses (Table 5).
Fig. 6 Response of BESO-based TI-TDF, TIDF II, and PID controller a frequency response of area 1. b Frequency response of area 2. c Tie-line power responses
Fig. 7 a Frequency response of BESO, BWOA, GA and TLBO in area 1. b Frequency response of in area 2. c Tie-line power for BESO, BWOA, GA and TLBO
Table 5 Wilcoxon test result for algorithms

BESO versus    p-value      Remark (poor performance)
BWOA           6.39E−05     Than BESO
GA             1.59E−05     Than BESO
TLBO           6.34E−05     Than BESO
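A Wilcoxon signed-rank comparison of the kind reported in Table 5 can be reproduced with scipy.stats, as sketched below; the paired objective-function values are made-up placeholders, not the run data of this paper.

```python
# Hedged sketch of a Wilcoxon signed-rank comparison between BESO and a rival
# algorithm over repeated runs; the arrays below are placeholder values only.
from scipy.stats import wilcoxon

beso_runs = [0.00152, 0.00153, 0.00151, 0.00155, 0.00152, 0.00154, 0.00150, 0.00153]
bwoa_runs = [0.00170, 0.00168, 0.00175, 0.00172, 0.00169, 0.00174, 0.00171, 0.00173]

stat, p_value = wilcoxon(beso_runs, bwoa_runs)
print(f"Wilcoxon statistic = {stat}, p-value = {p_value:.2e}")
# A small p-value indicates the paired difference is significant, i.e. the rival
# algorithm performs worse than BESO on this objective.
```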
9 Conclusion This paper studies the load frequency control of two area microgrid interconnected systems. The system consists of DS-SG, MHT, WT, TG, and BgT as the renewable energy generators. It also comprises FESD, BESD, and SMESD as energy storage devices. First, in order to select the superior algorithm and control strategy, several model responses are compared. It is found that a BESO tuned TI-TDF controller is superior to other controllers. Ultimately, the time domain and sensitivity analysis of these case studies substantiate robustness and stability of the system. The above work can be further extended to incorporate weighted performance index, inclusion of communication delays, system nonlinearities, state of the art recent controllers and algorithms and comparing the same.
References 1. Da Rosa AV, Ordóñez JC (2021) Fundamentals of renewable energy processes. Academic 2. Ramesh M, Yadav AK, Pathak PK (2021) An extensive review on load frequency control of solar-wind based hybrid renewable energy systems. Energy Sources Part A: Recovery Util Environ Effects 1–25 3. Guha D, Roy PK, Banerjee S (2021) Equilibrium optimizer-tuned cascade fractional-order 3DOF-PID controller in load frequency control of power system having renewable energy resource integrated. Int Trans Electr Energy Syst 31(1):e12702 4. Guo J (2021) Application of a novel adaptive sliding mode control method to the load frequency control. Eur J Control 57:172–178 5. Fathy A, Alharbi AG (2021) Recent approach based movable damped wave algorithm for designing fractional-order PID load frequency control installed in multi-interconnected plants with renewable energy. IEEE Access 9:71072–71089 6. Mishra D, Nayak PC, Bhoi SK, Prusty RC (2021) Design and analysis of multi-stage TDF/(1+ TI) controller for load-frequency control of AC multi-islanded microgrid system using modified sine cosine algorithm. In: 2021 1st Odisha international conference on electrical power engineering, communication and computing technology (ODICON). IEEE, pp 1–6 7. Babu NR, Saikia LC (2021) Load frequency control of a multi-area system incorporating realistic high-voltage direct current and dish-stirling solar thermal system models under deregulated scenario. IET Renew Power Gener 15(5):1116–1132 8. Wang Z, Liu Y (2021) Adaptive terminal sliding mode based load frequency control for multi-area interconnected power systems with PV and energy storage. IEEE Access 9:120185–120192 9. Yakout AH, Attia MA, Kotb H (2021) Marine predator algorithm based cascaded PIDA load frequency controller for electric power systems with wave energy conversion systems. Alex Eng J 60(4):4213–4222
10. Lalparmawii R, Datta S, Deb S, Das S (2021) Load frequency control of a photovoltaicpumped hydro power energy storage based micro-grid system. In: 2021 10th IEEE international conference on communication systems and network technologies (CSNT). IEEE, pp 312–317 11. Vedik B, Kumar R, Deshmukh R, Verma S, Shiva CK (2021) Renewable energy-based load frequency stabilization of interconnected power systems using quasi-oppositional dragonfly algorithm. J Control Autom Electr Syst 32(1):227–243 12. Asgari S, Suratgar AA, Kazemi MG (2021) Feed forward fractional order PID load frequency control of microgrid using harmony search algorithm. Iran J Sci Technol Trans Electr Eng 1–13 13. Roy SP, Mehta RK, Roy OP (2021) Illustration of load frequency control of hybrid renewable system with tuned PIDF controller. In: Asian conference on innovation in technology (ASIANCON). IEEE, pp 1–6 14. Mishra S, Prusty RC, Panda S (2021) Performance analysis of modified sine cosine optimized multistage FOPD-PI controller for load frequency control of an islanded microgrid system. Int J Numer Model: Electron Netw Devices Fields e2923 15. Mahto T, Thakura PR, Ghose T (2021) Wind–diesel-based isolated hybrid power systems with cascaded PID controller for load frequency control. In: Advances in smart grid automation and Industry 4.0. Springer, Singapore, pp 335–343 16. Mohanty D, Panda S (2021) Modified salp swarm algorithm-optimized fractional-order adaptive fuzzy PID controller for frequency regulation of hybrid power system with electric vehicle. J Control Autom Electr Syst 32(2):416–438 17. Çelik E (2021) Design of new fractional order PI–fractional order PD cascade controller through dragonfly search algorithm for advanced load frequency control of power systems. Soft Comput 25(2):1193–1217 18. Ghosh A, Singh O, Ray AK, Jamshidi M (2021) A gravitational search algorithm-based controller for multiarea power systems: conventional and renewable sources with variable load disturbances and perturbed system parameters. IEEE Syst Man Cybern Mag 7(3):20–38 19. Guha D, Roy PK, Banerjee S (2021) Disturbance observer aided optimised fractional-order three-degree-of-freedom tilt-integral-derivative controller for load frequency control of power systems. IET Gener Transm Distrib 20. Bagheri A, Jabbari A, Mobayen S (2021) An intelligent ABC-based terminal sliding mode controller for load-frequency control of islanded micro-grids. Sustain Cities Soc 64:102544 21. Safari A, Babaei F, Farrokhifar M (2021) A load frequency control using a PSO-based ANN for micro-grids in the presence of electric vehicles. Int J Ambient Energy 42(6):688–700 22. Mishra S, Nayak PC, Prusty UC, Prusty RC (2021) Model predictive controller based load frequency control of isolated microgrid system integrated to plugged-in electric vehicle. In: 2021 1st Odisha international conference on electrical power engineering, communication and computing technology (ODICON). IEEE, pp 1–5 23. Shouran M, Anayi F, Packianather M (2021) The bees algorithm tuned sliding mode control for load frequency control in two-area power system. Energies 14(18):5701 24. Sobhy MA, Abdelaziz AY, Hasanien HM, Ezzat M (2021) Marine predators algorithm for load frequency control of modern interconnected power systems including renewable energy sources and energy storage units. Ain Shams Eng J 25. Latif A, Suhail Hussain SM, Das DC, Ustun TS (2021) Double stage controller optimization for load frequency stabilization in hybrid wind-ocean wave energy based maritime microgrid system. 
Appl Energy 282:116171 26. Rai A, Das DK (2021) Ennoble class topper optimization algorithm based fuzzy PI-PD controller for micro-grid. Appl Intell 1–23 27. Khokhar B, Dahiya S, Singh Parmar KP (2021) A novel hybrid fuzzy PD-TID controller for load frequency control of a standalone microgrid. Arab J Sci Eng 46(2):1053–1065 28. Roy SP, Mehta RK, Singh AK, Roy OP (2022) A novel application of jellyfish search optimisation tuned dual stage (1+ PI) TID controller for microgrid employing electric vehicle. Int J Ambient Energy 1–28 29. Roy SP, Singh AK, Mehta RK, Roy OP (2022) Frequency control of GWO-optimized two-area microgrid with TIDF-II, I-PD and I-TD. In: Sustainable energy and technological advancements. Springer, Singapore, pp 267–277
30. Roy SP, Singh AK, Mehta RK, Roy OP (2022) Application of GWO and TLBO algorithms for PID tuning in hybrid renewable energy system. In: Computer vision and robotics. Springer, Singapore, pp 483–496 31. Peña-Delgado AF, Peraza-Vázquez H, Almazán-Covarrubias JH, Cruz NT, García-Vite PM, Morales-Cepeda AB, Ramirez-Arredondo JM (2020) A novel bio-inspired algorithm applied to selective harmonic elimination in a three-phase eleven-level inverter. Math Probl Eng 2020 32. Roy SP, Singh AK, Mehta RK, Roy OP (2022) A novel application of BESO-based isolated micro-grid with electric vehicle. In: Sustainable energy and technological advancements. Springer, Singapore, pp 597–609 33. Das DC, Roy AK, Sinha N (2012) GA based frequency controller for solar thermal–diesel–wind hybrid energy generation/energy storage system. Int J Electr Power Energy Syst 43(1):262–279 34. Rao RV, Savsani VJ, Vakharia DP (2011) Teaching–learning-based optimization: a novel method for constrained mechanical design optimization problems. Comput Aided Des 43(3):303–315 35. Latif A, Das DC, Ranjan S, Barik AK (2019) Comparative performance evaluation of WCAoptimised non-integer controller employed with WPG–DSPG–PHEV based isolated two-area interconnected microgrid system. IET Renew Power Gener 13(5):725–736 36. Alsattar HA, Zaidan AA, Zaidan BB (2020) Novel meta-heuristic bald eagle search optimisation algorithm. Artif Intell Rev 53(3):2237–2264 37. Babaei M, Abazari A, Muyeen SM (2020) Coordination between demand response programming and learning-based FOPID controller for alleviation of frequency excursion of hybrid microgrid. Energies 13(2):442 38. Arya Y (2019) Impact of hydrogen aqua electrolyzer-fuel cell units on automatic generation control of power systems with a new optimal fuzzy TIDF II controller. Renew Energy 139:468– 482 39. Stine WB, Diver RB (1994) A compendium of solar dish/stirling technology 40. Kumari S, Shankar G (2019) Maiden application of cascade tilt-integral–tilt-derivative controller for performance analysis of load frequency control of interconnected multi-source power system. IET Gener Transm Distrib 13(23):5326–5338 41. Latif A, Das DC, Barik AK, Ranjan S (2020) Illustration of demand response supported co-ordinated system performance evaluation of YSGA optimized dual stage PIFOD − (1 + PI) controller employed with wind-tidal-biodiesel based independent two-area interconnected microgrid system. IET Renew Power Gener 14(6):1074–1086 42. Rasul MG, Ault C, Sajjad M (2015) Bio-gas mixed fuel micro gas turbine co-generation for meeting power demand in Australian remote areas. Energy Procedia 75:1065–1071 43. Muthu D, Venkatasubramanian C, Ramakrishnan K, Sasidhar J (2017) Production of biogas from wastes blended with cow dung for electricity generation—a case study. In: IOP conference series: earth and environmental science, vol 80, no 1. IOP Publishing, p 012055 44. Pati SS, Mishra SK (2019) A PSO based modified multistage controller for automatic generation control with integrating renewable sources and FACT device. Int J Renew Energy Res (IJRER) 9(2):673–683 45. El-Fergany AA, El-Hameed MA (2017) Efficient frequency controllers for autonomous two-area hybrid microgrid system using social-spider optimiser. IET Gener Transm Distrib 11(3):637–648
Deep Learning-Based Three Type Classifier Model for Non-small Cell Lung Cancer from Histopathological Images Rashmi Mothkur
and B. N. Veerappa
Abstract Lung cancer is becoming one of the most menacing cancers to human health. Small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) are the two kinds of lung cancer, which are classified based on patterns in behavior and treatment response. Non-small cell lung cancer is classified as lung adenocarcinomas, lung squamous cell carcinomas, and large cell carcinoma. The two most prevalent types of NSCLC are lung adenocarcinoma, which accounts for about 40% and lung squamous cell carcinoma (LUSC), which accounts for almost 25–30% of all lung cancers. Building an automated categorization system for these two primary NSCLC subtypes is vital for building a computer-aided diagnostic system (CAD). CAD can improve the quality and efficiency of medical image analysis by increasing diagnosis accuracy and stability, reducing the chance of wrong diagnosis due to subjective factors and missed diagnosis. With the rapid development of Convolutional Neural Networks (CNN) in image processing, a variety of CNN architectures have emerged, that achieve outstanding image classification performance. In this paper InceptionV3, DenseNet-201, and XceptionNet are selected as candidate networks due to their outstanding classification performance with 99.07%, 95.63%, 98.9%, respectively. The results of our study shows that InceptionV3 performs well in the categorization of types of non-small cell lung cancer histopathological images and benign images. Keywords InceptionV3 · XceptionNet · DenseNet · Adam optimizer
1 Introduction Adenocarcinoma accounts for nearly half of all lung cancers and is the most common histologic subtype in most countries. Lung adenocarcinoma has a wide range of clinical, radiologic, molecular, and pathologic features. As a result, there is a lot of R. Mothkur (B) Department of CSE, Dayananda Sagar University, Bangalore, India e-mail: [email protected] B. N. Veerappa Department of Studies in CSE, University BDT College of Engineering, Davanagere, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_35
misunderstanding, and it is difficult to differentiate adenocarcinoma from squamous cell carcinoma. This categorization is required to help with patient therapy and outcome prediction. Due to technological limitations, early detection and tailored treatment remain the best options for lung cancer, and clinical data clearly reveal that treatment success is closely connected with the stage of cancer at the time of diagnosis. Early detection can effectively elevate patients' survival rates to 49%; therefore, prevention and early diagnosis are critical components of the current lung cancer management strategy. Computer-aided diagnosis (CAD) has become increasingly popular in medical image analysis in recent years because of significant advancements in computer technology, particularly computer vision. CAD can considerably increase the quality and efficiency of medical image analysis by boosting the accuracy and stability of diagnosis while lowering the time it takes. This reduces the risk of inaccurate diagnosis due to subjective factors and of missed diagnosis due to human eye oversight. It also reduces the radiologist's effort significantly, making it a viable diagnostic method for general examination and large-scale medical image processing. Furthermore, imaging diagnosis relies on the observation and interpretation of large numbers of medical images and avoids alternative diagnostic approaches such as needle biopsy, making CAD an appropriate screening tool [1, 2]. The convolutional neural network (CNN)'s performance in visual feature extraction has made it a popular approach for image classification problems. Based on existing CNN designs, the goal of this study is to create an appropriate CNN architecture for non-small cell lung cancer image categorization. This network must capture characteristics of lung cancer, both benign and malignant, in order to improve its suspiciousness categorization. The current computer-assisted approaches for analyzing slides with lung cancers focus on classifying one or two forms of lung cancer and distinguishing tumor from non-tumor. The majority of the work is accomplished using a single CNN architecture. Our key contributions are as follows:
i. This paper proposes applying InceptionV3, DenseNet-201, and XceptionNet to the classification of lung adenocarcinoma, lung squamous cell carcinoma, and benign patches.
ii. An Adam optimizer is used to update the multiple variables, causing the loss to be minimized with far less effort.
iii. Various basic model evaluation metrics are calculated. These metrics help us select the best model amongst the different models trained.
2 Related Work

For the purpose of diagnosing mediastinal metastases of NSCLC, Wang et al. [3] compared four conventional machine learning algorithms to one deep learning system, AlexNet. The simultaneous use of image patches from two modalities, PET and CT, may have negatively impacted the CNN's performance. Due to lymph nodes' diminutive diameter, the study prioritized diagnostic criteria above textural clues when assessing them. Despite lacking essential diagnostic qualities, the CNN
performed similarly to top approaches, which prompted research into subsuming diagnostic facets into newly developed dual-form PET/CT images. Despite the obvious differences in appearance across nodules, Hussein et al. [4] successfully employed a CNN to classify lung nodules as malignant or non-malignant. The suggested model is a multiple-view CNN that creates three 2D patches, each correlating to a different dimension, using the initial mean intensity projection. In order to calculate the malignancy score, the CNN network then extracts features from the improved input images using Gaussian Process Regression. Regression accuracy for high-level features is 86.58% (0.59 SEM percent); however, when the CNN is included, the regression accuracy increases to 92.31% (1.59 SEM percent). DFCNet, a deep fully convolutional neural network for lung cancer detection and stage classification in CT images, was suggested by Masood et al. [5]. Using metastatic data from wireless implantable devices or medical wearables with little power, the tumor was eventually identified as one of the four stages of lung cancer. DFCNet's accuracy was 84.58%, compared to CNNs' accuracy of 77.6%, and the proposed model was assessed to be sufficiently generic to cover a variety of cancer forms. Hussein et al. [6] devised a method for determining if lung nodules are cancerous. 3D CNN transfer learning was employed to improve nodule characterization. The architecture uses six characteristics and a malignancy label to fine-tune 3D CNNs. Each attribute and label is input into a 3D CNN with five convolutional layers, five max pooling layers, and two fully connected layers. Then comes feature fusion, which is followed by coefficient vector multiplication to provide a malignancy score. Incorporating PET scans with CT scans appears to be useful in terms of boosting diagnostic accuracy, as testing using the suggested model gave a precision of 91.26%. Chest CT images were utilized by Xie et al. [7] to distinguish between benign and malignant lung nodules. An adversarial auto-encoder-based unsupervised reconstruction network and a supervised classification network constitute the proposed model's components. The two components of the model are connected by learnable transition layers for adaptability. Using the LIDC-IDRI dataset, an extension of the model was utilized to characterize each nodule's overall characteristics, with 92.53% accuracy and 95.81% AUC observed. Humayun et al. [8] suggested an approach which is divided into three stages: first, data augmentation; second, classification using a CNN model; and third, localization. The suggested technique provides an efficient non-intrusive diagnostic tool for use in clinical examination. Compared to the most recent models, the suggested model has substantially fewer parameters. Depending on its size, the resilience of the intended dataset is also considered. The accuracy of VGG 16, VGG 19, and Xception at the 20th epoch is 98.83%, 98.05%, and 97.4%, respectively. Pandey and Kumar [9] explored various combinations of deep learning-based feature extractors and machine learning-based classifiers in order to distinguish between the two kinds of non-small cell lung cancer (NSCLC), adenocarcinoma (ADC) and squamous cell carcinoma (SCC). The ADC and SCC CT scan images utilized
in this study's experiments comprised 400 images for learning, 190 for validation, and 38 for testing. Seven automatic integrated models were built utilizing three classifiers (XGBoost, support vector machine, and a fully connected neural network) and variants of Inception, Xception, and VGG feature extractors. Training, validation, and testing accuracy for the most efficient optimum model, using InceptionResNetV2 + SVM, were 93.50%, 93.16%, and 94.74%, respectively.
3 Model and Framework

Although it may also be used with one-dimensional and three-dimensional data [8], the convolutional neural network (CNN) is a class of neural network model that was created to deal with two-dimensional images [10]. Convolution is the general idea of applying a filter to an input to produce an activation [11]. A stride is the step size of the kernel's sliding window. The pooling layer replaces the network's output at a given location with a summary statistic of the adjacent outputs. As a result, the spatial dimension of the representation is reduced, which cuts the computation and number of weights required. InceptionV3, DenseNet-201, and XceptionNet are the three CNN architectures used in the proposed model; they are detailed below.
3.1 InceptionV3

InceptionV3 is a CNN architecture that employs label smoothing and an auxiliary classifier to regularize the classifier by evaluating the effect of label-dropout during training. InceptionV3 is an Inception model similar to GoogleNet [12], which integrates numerous discretely sized convolutional filters into a new filter. This design considerably reduces the computational complexity and the quantity of parameters that need to be learnt. It has factorized 7 × 7 convolutions and 48 layers. In the naive Inception model, convolution is performed on an input image with three filter sizes of 1 × 1, 3 × 3, and 5 × 5; max pooling is also used. The results are concatenated and forwarded to the subsequent Inception module. Linear logits act as a non-linearity layer. The input image is of size 384 × 384 with the number of color channels set to 3. The batch size and validation steps are set to 16 and 32, respectively. The model is trained for 40 epochs with a learning rate of 0.001. A few more layers are added to the base model, with a softmax layer with 3 outputs as the final layer, for the 3 classes in our task. The dropout in the hidden layers is set to 0.6 and 0.5 to prevent the model from overfitting. An Adam optimizer is used to reduce the loss with less effort. As illustrated in Fig. 1, the trained model is utilized to predict the form of lung cancer, including adenocarcinomas, benign tumors, and squamous cell carcinomas.
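As a rough sketch of this configuration, the snippet below builds such a model in Keras. The pooling step, the sizes of the added hidden layers, and the use of ImageNet weights are assumptions; the paper only specifies the 384 × 384 × 3 input, the two dropout rates, the 3-way softmax output, the Adam optimizer, and the 0.001 learning rate.

```python
# Minimal sketch of the InceptionV3 setup described above (Keras).
# Layers marked as hypothetical are assumptions, not values from the paper.
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

base = tf.keras.applications.InceptionV3(
    include_top=False,            # drop the original 1000-class classifier
    weights="imagenet",           # assumption: ImageNet pretraining
    input_shape=(384, 384, 3))    # image size and 3 color channels from the paper

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(512, activation="relu"),   # hypothetical added hidden layer
    layers.Dropout(0.6),                    # dropout rates stated in the paper
    layers.Dense(128, activation="relu"),   # hypothetical added hidden layer
    layers.Dropout(0.5),
    layers.Dense(3, activation="softmax"),  # adenocarcinoma / benign / squamous
])

model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```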
Fig. 1 InceptionV3 architecture for lung cancer classification. Feature extraction part reprinted from “Transfer Learning-Based Image Visualization Using CNN” by Giri, Santosh & Joshi, Basanta, International Journal of Artificial Intelligence & Applications, 2019
3.2 DenseNet

DenseNet layers get extra input from all preceding layers and transmit their own feature maps to all subsequent layers. Each layer receives aggregate information from the layers before it, as shown in Fig. 2. The network can be narrower and more condensed, resulting in fewer channels, because each layer compiles feature maps from all prior levels. A pre-activation batch norm followed by a rectified linear unit and a 1 × 1 convolution reduces the model's complexity; the 3 × 3 convolution is then applied to this altered, lower-channel version of the input rather than to the input itself [13]. The suggested model employs the DenseNet-201 architecture, which has 201 layers. The transition layers, which include batch normalization, perform downsampling. Global average pooling is computed at the end.
3.3 XceptionNet

XceptionNet stands for extreme Inception. It is a deep CNN with 71 layers. Data enters through the entry flow, passes through eight iterations of the middle flow, and finally reaches the exit flow. As shown in Fig. 3, 36 convolutional layers make up the feature extraction backbone of the Xception architecture. These 36 convolutional layers are organized into
Fig. 2 DenseNet-201 architecture for lung cancer classification. Adapted from “Cerebral MicroBleeding Detection Based on Densely Connected Neural Network”, by Wang, Shuihua & Tang, Chaosheng & Sun, Junding & Zhang, Yu-Dong, Frontiers in Neuroscience, 2019
Fig. 3 XceptionNet architecture for lung cancer classification
14 modules, all of which have linear residual connections enclosing them, except for the first and last modules. In deep learning frameworks such as TensorFlow and Keras, a depthwise separable convolution entails first performing a depthwise convolution [14], i.e., a spatial convolution performed separately over each channel of an input, and then widening the depthwise convolution's output channels with a pointwise convolution, i.e., a 1 × 1 convolution.
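The two-step depthwise separable convolution described here can be illustrated with a few Keras layers; the shapes and filter counts below are arbitrary examples, not values from the paper.

```python
# Illustration of a depthwise separable convolution, using Keras layers.
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 64, 64, 32))          # dummy feature map with 32 input channels

# Step 1: depthwise convolution -- one spatial 3x3 filter per input channel.
depthwise = layers.DepthwiseConv2D(kernel_size=3, padding="same")(x)

# Step 2: pointwise (1x1) convolution -- mixes channels and widens them to 64.
pointwise = layers.Conv2D(filters=64, kernel_size=1, padding="same")(depthwise)

# Keras also bundles both steps into a single layer:
fused = layers.SeparableConv2D(filters=64, kernel_size=3, padding="same")(x)

print(pointwise.shape, fused.shape)            # both: (1, 64, 64, 64)
```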
4 Methodology

The architecture of the proposed system is shown in Fig. 4. The input is a lung cancer dataset consisting of 15,000 histopathological images belonging to three classes, namely lung adenocarcinoma, lung squamous cell carcinoma, and lung benign cells. Lung adenocarcinoma tumors originate in cells that typically secrete fluids such as mucus. Adenocarcinoma is most frequently seen in the outer layers of the lung. Squamous cells, which are flat cells that coat the lining of
Fig. 4 Proposed methodology
the lung's airways, are where squamous cell carcinomas begin. They are typically connected to a history of smoking and are located in the center of the lungs, close to a major airway (bronchus). A benign lung tumor is an abnormal growth of tissue that is neither cancerous nor serves any purpose. The model is trained using three different CNN architectures, namely InceptionV3, DenseNet-201, and XceptionNet. The Adam optimizer is then applied. In lieu of the conventional stochastic gradient descent method, the Adam optimization algorithm can be employed to update network weights iteratively depending on the training data. Using estimates of the first and second moments of the gradient, Adam modifies the learning rate for each neural network weight [15]. It works effectively for problems with large amounts of parameters or data. Due to its ability to deliver the best results, it is an effective algorithm in the field of deep learning. The test images are then automatically divided into three classes using the trained model. Each model is assessed based on performance criteria including accuracy, precision, recall, and F1 score.
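For reference, the per-weight first- and second-moment updates that Adam performs [15] are the standard ones below; the hyperparameters β₁, β₂, and ε are the usual defaults and are not specified in the paper (η denotes the learning rate, here 0.001).

```latex
% Standard Adam update rule (Kingma & Ba, ref. [15]); g_t is the gradient at step t.
\begin{align*}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, & \hat{m}_t &= \frac{m_t}{1-\beta_1^{\,t}},\\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2}, & \hat{v}_t &= \frac{v_t}{1-\beta_2^{\,t}},\\
\theta_t &= \theta_{t-1} - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t}+\epsilon}. &&
\end{align*}
```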
5 Experiment

5.1 Lung Cancer Histopathological Images

Larxel's lung cancer dataset [16] from Kaggle is used for evaluation purposes. There are 15,000 histopathological scans in this collection, divided into three classes: lung adenocarcinomas, lung squamous cell carcinomas, and lung benign tissues. Each image is a jpeg file with a resolution of 768 by 768 pixels. The images
were obtained using a sample of 750 original lung tissue images from sources that complied with and were authorized by the Health Insurance Portability and Accountability Act (250 benign lung tissue, 250 lung adenocarcinomas, and 250 lung squamous cell carcinomas). The 750 images have been augmented to 15,000 images using the augmentor package. The dataset is increased to 15,000 images using augmentor by the following augmentations: left and right rotations (up to 25°, 1.0 probability), as well as horizontal and vertical flips (0.5 probability). Each class has a total of 5000 images. The split ratio for the training, validation, and testing sets is 60:20:20. 9000 of the 15,000 images are used for training, 3000 for validation and 3000 for testing.
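A sketch of this augmentation with the Augmentor package is shown below; the class folder names are placeholders, while the rotation limit, flip probabilities, and per-class sample count follow the text.

```python
# Sketch of the augmentation described above using the Augmentor package.
import Augmentor

for class_dir in ["lung_aca", "lung_scc", "lung_n"]:      # hypothetical folder names
    p = Augmentor.Pipeline(source_directory=class_dir)
    p.rotate(probability=1.0, max_left_rotation=25, max_right_rotation=25)
    p.flip_left_right(probability=0.5)                    # horizontal flip
    p.flip_top_bottom(probability=0.5)                    # vertical flip
    p.sample(5000)                                        # 5000 images per class
```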
5.2 Experiment Details

Training hyperparameters are defined as follows, with input images resized to 384 × 384 pixels. The number of color channels is set to 3. A batch size of 16 training samples is used. With 3000 validation images and a batch size of 16, the total validation steps are set to 32. The learning rate is a hyperparameter that regulates how much the model changes each time its weights are updated in response to the estimated error; it is set to 0.001. The number of epochs determines the number of complete runs through the training dataset, which is set to 40 in our evaluation. The model is trained with three different candidate architectures, namely InceptionV3, XceptionNet, and DenseNet-201. Learning rate reduction, early stopping, and checkpoints are defined as model callbacks. Models frequently gain by decreasing the learning rate by a factor of 2–10 when learning stagnates. If no progress is shown after a certain number of "patience" epochs, this callback checks a monitored quantity and lowers the learning rate. Upon reaching a learning plateau, the validation loss is monitored for 5 epochs (patience = 5) and the learning rate is decreased by a factor of 0.1 (factor = 0.1). When a monitored parameter stops improving, early stopping terminates the training. Every epoch, the training loop checks whether the validation loss is still decreasing; if it is no longer decreasing, training terminates. The validation loss is monitored with a patience of 10 epochs (patience = 10). The training set size of 9000 with a batch size of 16 gives 562 steps per epoch. Similarly, for the validation set size of 3000 with a batch size of 16, the validation steps are set to 187. The model is run using Google Colaboratory with a GPU runtime environment.
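The callbacks described here map directly onto Keras utilities, as in the hedged sketch below; the monitored quantity, checkpoint filename, and restore-best-weights behavior are assumptions, while the patience values, factor, batch size, and step counts follow the text.

```python
# Sketch of the training callbacks and step counts described above (Keras).
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint

callbacks = [
    ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=5),
    EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
    ModelCheckpoint("best_model.h5", monitor="val_loss", save_best_only=True),
]

# Steps derived from the stated set sizes and batch size of 16:
steps_per_epoch = 9000 // 16       # = 562
validation_steps = 3000 // 16      # = 187

# model.fit(train_gen, epochs=40, steps_per_epoch=steps_per_epoch,
#           validation_data=val_gen, validation_steps=validation_steps,
#           callbacks=callbacks)
```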
5.3 Evaluation Metrics and Results

5.3.1 Accuracy

Accuracy is the ratio of rightly predicted samples to total samples. Equation 1 represents the accuracy.

\[ \text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{Total images sampled}} \quad (1) \]

5.3.2 Precision

Precision is the ratio of rightly predicted positive samples to total predicted positive samples. Equation 2 defines the precision.

\[ \text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \quad (2) \]

5.3.3 Recall

Recall, also termed sensitivity, is the ratio of rightly predicted positive samples to total samples in the actual class. Equation 3 represents the recall measure.

\[ \text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \quad (3) \]

5.3.4 F1 Score

The F1 score is the weighted average of precision and recall. Equation 4 represents the F1 score.

\[ \text{F1 score} = \frac{2 \times (\text{precision} \times \text{recall})}{\text{precision} + \text{recall}} \quad (4) \]
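These four metrics can be computed from predicted and true labels, for example with scikit-learn as sketched below; the label arrays and the macro averaging mode are illustrative assumptions.

```python
# Computing the four metrics above for the three-class problem with scikit-learn.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 1, 2, 2, 1, 0]      # hypothetical ground-truth class indices
y_pred = [0, 1, 2, 1, 1, 0]      # hypothetical predicted class indices

print("Accuracy :", accuracy_score(y_true, y_pred))
# The paper does not state how per-class scores are averaged; 'macro' is an assumption.
print("Precision:", precision_score(y_true, y_pred, average="macro"))
print("Recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1 score :", f1_score(y_true, y_pred, average="macro"))
```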
Results of our evaluation are listed in Table 1. The InceptionV3 model stands out with 99% accuracy compared to the DenseNet-201 and XceptionNet models. The model accuracy graphs for InceptionV3, XceptionNet, and DenseNet-201 are shown in Figs. 5, 6, and 7, respectively.

Table 1 Metrics of the different CNN models

Model          Accuracy (%)   Precision (%)   Recall (%)   F1 score (%)
InceptionV3    99.07          99.02           99.05        99.07
DenseNet-201   95.63          95.67           95.63        95.65
XceptionNet    98.9           98.9            98.9         98.9
Fig. 5 InceptionV3 model accuracy
Fig. 6 XceptionNet model accuracy
Table 2 reports the accuracy of the CNN architectures for the three classes, and Fig. 8 shows the comparative analysis of the three CNN architectures. Table 3 presents the comparison of the proposed method with state-of-the-art models and the accuracies obtained.
Fig. 7 DenseNet-201 model accuracy
Table 2 Accuracy (%) of CNN architectures based on classes

Classes                         InceptionV3   DenseNet-201   XceptionNet
Lung adenocarcinomas            98.7          91.2           98.7
Lung benign                     99.1          99.2           99.5
Lung squamous cell carcinomas   98.5          95.7           98
Fig. 8 Comparative analysis of different CNN architectures on three classes
6 Conclusion In this paper, we experimented with multiple CNN models namely InceptionV3, DenseNet-201, and XceptionNet. These three trained models are used to classify
Table 3 Comparison with state-of-the-art models

Authors                  Methodology                                              Accuracy (%)
Liu et al. [17]          CapsNet                                                  81.3
Tsukamoto et al. [18]    AlexNet, GoogLeNet (Inception V3), VGG16, and ResNet50   74.0, 66.8, 76.8 and 74.0
Teramoto et al. [19]     DCNN                                                     71
Baranwal et al. [20]     VGG-19                                                   92.1
Proposed method          InceptionV3, DenseNet-201, XceptionNet                   99.07, 95.63, 98.9
the images into lung adenocarcinomas, lung squamous cell carcinoma, and lung benign tissues. Further, we concluded that, out of these three models, InceptionV3 has the best performance and is the most suitable for this task. This illustrates the potential for using such approaches to automate diagnostic duties in the near future. Although deep learning-based classification systems have shown encouraging results, they are still heavily reliant on pre-processing techniques, external input, pooling strategies, multi-resolution models, and detection performance. Classifiers are becoming more generalizable, allowing them to cover a wider range of diseases at once, and the degree of treatment urgency has become a target to predict as well. Comparing the technique to current algorithms reveals its efficacy and shows promising results for clinical use. In the future, feature-based input might be employed instead of the raw data, which could improve the network learning. This could result in acceptable enhancements to the network model's performance.
References
1. Kobayashi T, Xu X-W, MacMahon H, Metz CE, Doi K (1996) Effect of a computer-aided diagnosis scheme on radiologist's performance in detection of lung nodules on radiographs. Radiology 199:843–848
2. MacMahon H, Engelmann R, Behlen FM et al (1999) Computer-aided diagnosis of pulmonary nodules: results of a large-scale observer test. Radiology 213:723–726
3. Wang H, Zhou Z, Li Y, Chen Z, Lu P, Wang W, Liu W, Yu L (2017) Comparison of machine learning methods for classifying mediastinal lymph node metastasis of non-small cell lung cancer from 18F-FDG PET/CT images. EJNMMI Res, Article No. 11
4. Hussein S, Gillies R, Cao K, Song Q, Bagci U (2017) Tumornet: lung nodule characterization using multi-view convolutional neural network with Gaussian process. In: IEEE 14th international symposium on biomedical imaging, pp 1007–1010
5. Masood A, Sheng B, Li P, Hou X, Wei X, Qin J, Feng D (2018) Computer-assisted decision support system in pulmonary cancer detection and stage classification on CT images. J Biomed Inform 79:117–128
6. Hussein S, Cao K, Song Q, Bagci U (2017) Risk stratification of lung nodules using 3D CNN-based multi-task learning. In: International conference on information processing in medical imaging. Springer, pp 249–260
7. Xie Y, Zhang J, Xia Y (2019) Semi-supervised adversarial model for benign-malignant lung nodule classification on chest CT. Med Image Anal 57:237–248
8. Humayun M, Sujatha R, Almuayqil SN, Jhanjhi NZ (2022) A transfer learning approach with a convolutional neural network for the classification of lung carcinoma. Healthcare (Basel) 10(6):1–10
9. Pandey A, Kumar A (2022) Deep features based automated multimodel system for classification of non-small cell lung cancer. In: IEEE Delhi section conference (DELCON), pp 1–7
10. Vinutha MR, Chandrika J (2021) Prediction of liver disease using regression tree. Int J Online Biomed Eng (iJOE) 17(02):164–172
11. Prashanth SJ, Prakash H (2021) A features fusion approach for neonatal and pediatrics brain tumor image analysis using genetic and deep learning techniques. Int J Online Biomed Eng (iJOE) 17(11):124–140
12. Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: European conference on computer vision, pp 818–833
13. Nguyen L, Lin D, Lin Z, Cao J (2018) Deep CNNs for microscopic image classification by exploiting transfer learning and feature concatenation. In: IEEE international symposium on circuits and systems (ISCAS), pp 1–5
14. Rahman T, Chowdhury ME, Khandakar A, Islam KR, Islam KF, Mahbub ZB, Kadir MA, Kashem S (2020) Transfer learning with deep convolutional neural network (CNN) for pneumonia detection using chest X-ray. Appl Sci 10(9)
15. Kingma DP, Ba J (2017) Adam: a method for stochastic optimization. Computing Research Repository
16. Borkowski AA, Bui MM, Thomas LB, Wilson CP, DeLand LA, Mastorides SM (2019) Lung and colon cancer histopathological image dataset (LC25000). arXiv:1912.12142v1 [eess.IV]
17. Liu H, Jiao Z, Han W, Jing B (2021) Identifying the histologic subtypes of non-small cell lung cancer with computed tomography imaging: a comparative study of capsule net, convolutional neural network, and radiomics. Quant Imaging Med Surg 11(6):2756–2765
18. Tsukamoto T, Teramoto A, Yamada A, Kiriyama Y, Sakurai E, Michiba A, Imaizumi K, Fujita H (2022) Comparison of fine-tuned deep convolutional neural networks for the automated classification of lung cancer cytology images with integration of additional classifiers. Asian Pac J Cancer Prev 1–10
19. Teramoto A, Tsukamoto T, Kiriyama Y, Fujita H (2017) Automated classification of lung cancer types from cytological images using deep convolutional neural networks. BioMed Res Int 1–6
20. Baranwal N, Doravari P, Kachhoria R (2021) Classification of histopathology images of lung cancer using convolutional neural network (CNN). arXiv:2112.13553
Cancer Classification from High-Dimensional Multi-omics Data Using Convolutional Neural Networks, Recurrence Plots, and Wavelet-Based Image Fusion

Stefanos Tsimenidis and George A. Papakostas

Abstract High-dimensional and multi-modal data pose an exceptional challenge in machine learning. With the number of features vastly exceeding the number of training instances, such datasets often bring established pattern recognition techniques to an awkward position: Traditional, shallow models crumble under the sheer complexity of the data, but deep neural networks will helplessly overfit. In this study, an innovative methodology takes up the task, in the case study of using multimodal biological data for binary classification of cancer types. Our deep learning approach entails transforming the data into images, integrating different modalities via wavelet-based image fusion, then extracting features, and classifying the data with pretrained convolutional neural networks. The results reveal that this framework has the potential to tackle high-dimensional data efficiently and effectively, learning from a low volume of complex data without overfitting, suggesting this to be a promising direction for further research.

Keywords Deep learning · Computer vision · Artificial intelligence · Multi-omics · Systems biology · Cancer
1 Introduction

High-dimensional datasets are too complex for shallow models to manage but contain too few training instances for deep models not to overfit. When the number of features ranges in the tens of thousands or more, deep neural networks are necessary, but when the number of training instances ranges in the hundreds or less, deep neural networks overfit. The present study aims to address this issue.

S. Tsimenidis · G. A. Papakostas (B), MLV Research Group, Department of Computer Science, International Hellenic University, Kavala 65404, Greece, e-mail: [email protected]
S. Tsimenidis, e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_36
A high-value case of high-dimensional data is in biomedical science, namely the multi-omics data (i.e., genomics, proteomics, etc.) from cancer clinic patients, used for diagnosis [1]. The multi-modal nature of such data means that for each patient we have a number of different datasets available, such as gene expression profiles, DNA methylation data, genomic logs of Copy Number Variants, and clinical information. These datasets routinely come with thousands, or tens of thousands, of features, or possibly more. With the laboratory costs to capture multi-omics data being high, the number of training instances in the datasets is, as a rule, less than a thousand, usually no more than a few hundred. In these tasks, dimensionality reduction is the rule, usually with autoencoders (AE) coupled with feature selection techniques (see Sect. 2). In our study, we reduce dimensionality with techniques from computer vision, namely CNN-based feature extraction and image fusion. Convolutional Neural Networks (CNNs) are perhaps the most successful Deep Learning (DL) models. By converting tabular data into images, we can take advantage of CNNs and, in our study, we apply and compare two methods for tabular-to-image data conversion: (1) a simple reshaping of a vector into a 2D matrix, which is taken to be a single-channel image, and (2) recurrence plots [2, 3]. These images will serve as training data for CNN models. Since the data is complex, we need a fairly deep model, but since we will train the model with only a few images, overfitting is almost guaranteed. We remedy this by utilizing state-of-the-art CNN architectures that come pretrained with the ImageNet dataset, and we test and compare four different models. The data integration takes place after the datasets have been converted into images, via wavelet-based image fusion. A block diagram of the complete pipeline is shown in Fig. 1. The strength of our proposed framework is three-fold. First, instead of training feature-extracting AEs with limited data, it takes advantage of state-of-the-art CNN models that come pretrained with thousands of images. Second, it can be applied for compression and dimensionality reduction of very high-dimensional data by dividing a dataset in half, converting it to images, fusing them, and then extracting features via the CNNs. Third, it could serve as a scalable, general-purpose, universal methodology for multi-modal data of all kinds. For example, a biometrics recognition system may integrate speech (sequential), fingerprints (image), and biomedical data (tabular). A truly universal framework should be able to integrate all these modalities. Our system, by converting tabular data into images, can potentially be utilized to fuse and integrate sequential, tabular, and image data and can learn from only a limited number of training instances yet avoid overfitting. Comparing our results with other papers is infeasible, as there are no benchmark datasets in the field. Scientists search, preprocess, combine, and use their own datasets, usually from The Cancer Genome Atlas [4]. Our goal is to compare the two tabular-to-image conversion techniques, to test which of four different pretrained models is the most suitable for images derived from non-image data, and to determine whether image fusion brings any improvement over the individual, single-omic datasets.
Fig. 1 High-level view of our proposed methodology
The next section reviews the publications in the field of DL-based modeling of biological multi-omics data. Section 3 lays out the various techniques used in our study. Section 4 describes our experimental methodology and examines the results. In Sect. 5, we reach conclusions and discuss the directions future research may take.
2 Related Work In this section, we review the research on deep learning-based classification of multimodal biological data (multi-omics ), notably for diagnosis and prediction of cancer. Given the nature of the data involved, a pervading theme throughout the various implementations is the reduction of dimensionality either through feature extraction, or feature selection, or a combination of both.
In [5], an Autoencoder (AE) extracts features from multi-omics data from hepatocellular carcinoma patients. Further feature selection is pursued with Cox-PH (Cox proportional hazards) [6], and then, traditional ML models learn to classify the data in terms of different cancer types. A similar setup [7] extracts features with an AE, selects the best of them with Cox-PH, and trains traditional models, followed by statistical analysis to extract biomedical insight and search for patterns in the genomic interactions in cancer patients. The exact same trilateral approach of AE, Cox-PH, and traditional ML has been explored in [8], this time with an all-inclusive set of six different types of omic data, plus a variety of biomedical data. In [9], the three-step procedure deploys a denoising, instead of a regular or a stacked AE for extraction, followed by selection based on feature coefficients computed via logistic regression. Conventional statistical analysis for the selection phase has been applied in [10], in the same overall scheme. In [11], two different AE architectures are tested for the feature extraction step. In the first approach, two omics datasets are concatenated, then processed by an AE. In the second approach, the concatenation takes place higher up the pipeline, each dataset starting with its own AE layer, then the layers concatenated and ending up into two output layers that reconstruct the data. After training the AE, the bottleneck layer is used for feature extraction. Both approaches proceed, after the AE-based feature extraction, to ANOVA for selection and to a support vector machine for classification. Experimental results showed the second arrangement to produce better performance. In [12], an experimental AE topology was tested on multi-omics datasets, arranged in pairs, from breast cancer patients. An AE takes a single-omic dataset as input but two datasets as output, learning to reconstruct a pair of omics datasets from a single one. Then an element-wise mean of the extracted, concatenated features is used as input for neural network classifiers performing both binary and multi-class classification. The unorthodox AE architecture, though interesting and creative, was surpassed by a conservative setup with an AE extracting features from concatenated pairs of multi-omics data. In [13], three AEs extract features from three multi-omics datasets, and then, their outputs are concatenated and combined with drug profiles, which contain both categorical and numerical data. This information is then used to train a neural network to predict the effect combinations of drugs will have when applied to various cancer cell lines. In [14], an initial random forest is trained with a combination of various multi-omics and drug profile data to classify viable targets for drug treatment in cancer therapy. The feature importance values of the random forest model are then retrieved and applied for feature selection, with the resulting dataset used to train a neural network. Throughout the reviewed literature, two major themes reappear. First, the everpresent dimensionality reduction often entails an AE but may also come with other techniques as well. Given the high-dimensionality of the data, simplifying the datasets seems necessary and inevitable. 
Second, various multi-omics datasets are combined rather than single omics, as it is generally assumed that data captured from different aspects of biological systems provide richer and more discriminative information, since each level of the biological system contains information not present in
other levels. Our approach interacts with these two themes in novel ways, as well as anticipates its application in other fields, adopting techniques fit to manage a vast assortment of multi-modal data, both tabular and sequential.
3 Proposed Approach

Dataset The data comes from the TCGA repository (https://portal.gdc.cancer.gov) and consists of three datasets: MiRNA expression, gene expression, and DNA methylation. The labels are two forms of cancer, breast and ovarian. All datasets are real valued and normalized. The total number of instances is 426, with 213 for each class. Datasets differ in the number of features, with the gene dataset containing 6347 features, the methylation dataset 3056, and the MiRNA dataset 94 features.

Data to raw images The three datasets were converted into images with two techniques. Firstly, each vector of each dataset was reshaped into a square matrix, with zero-padding at the end to fill any leftover empty slots; for example, a vector such as [0.2, 0.2, 0.2, 0.7] becomes the 2 × 2 matrix [[0.2, 0.2], [0.2, 0.7]], with zeros appended whenever the vector length is not a perfect square. The resulting matrices are taken as images of a single color channel (Fig. 2) and compiled into a dataset of images to train convolutional neural networks.

Data to recurrence plots The data is also converted into recurrence plots, a visualization technique used in dynamical systems theory, physics, and time-series analysis [2, 15–18]. A recurrence plot shows which parts of a signal or vector recur, and it shows at what specific position within the vector the recurrence takes place. In Fig. 3, we show examples of recurrence plots resulting from the data used in this study. A recurrence plot is a square matrix of binary values, either 0 or 1, with each line and column representing a portion of a vector that has been divided into equal parts. A time-window rolls over the vector, pairs of subvectors are compared, and a 2D matrix is computed, where each pixel reveals whether the Euclidean distance between the two subvectors lies below a user-specified threshold or not.
Fig. 2 Examples of raw images, generated by reshaping the numerical vectors into square matrices, adding zero-padding at the end
Fig. 3 Examples of recurrence plots produced from the omics data
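A minimal sketch of the two image representations described above is given below; the sub-vector window length and distance threshold are illustrative assumptions, since only the general procedure (zero-padded reshape, thresholded pairwise distances) is stated in the text.

```python
# Sketch of the two tabular-to-image conversions: raw images and recurrence plots.
import numpy as np

def to_raw_image(vec):
    """Zero-pad a 1-D vector and reshape it into a square single-channel image."""
    side = int(np.ceil(np.sqrt(len(vec))))
    padded = np.zeros(side * side)
    padded[:len(vec)] = vec
    return padded.reshape(side, side)

def to_recurrence_plot(vec, window=4, threshold=0.5):
    """Binary recurrence plot: 1 where two sub-vectors are closer than the threshold."""
    subs = np.array([vec[i:i + window] for i in range(len(vec) - window + 1)])
    dists = np.linalg.norm(subs[:, None, :] - subs[None, :, :], axis=-1)
    return (dists < threshold).astype(np.uint8)

vec = np.random.rand(94)                    # e.g. one MiRNA expression profile
print(to_raw_image(vec).shape)              # (10, 10)
print(to_recurrence_plot(vec).shape)        # (91, 91)
```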
Wavelet-based Image Fusion Other than training CNNs with images derived from the individual omics datasets, we also experiment with multi-omic data integration via wavelet-based image fusion [19, 20]. This fusion technique is based on the Discrete Wavelet Transform (DWT) [21]. Similar to the Discrete Fourier Transform, the DWT convolves a signal with a basis; but whereas in Fourier transforms the basis is a sinusoid and the signal is decomposed into its frequency content, in the DWT the bases are wavelets. These are short oscillations or spikes, and both the frequency and the spatio-temporal information of the signal are captured. In wavelet-based image fusion, two images are first decomposed, using the DWT, into sets of coefficients. Then, the coefficients of the two images are combined using one of a number of methods. Finally, the combined coefficients are passed through an Inverse Discrete Wavelet Transform that yields a single image, a fusion of the two initial images.
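A single-level version of this fusion can be sketched with PyWavelets as follows; the wavelet family and decomposition level are assumptions, while the coefficient-averaging rule c = (c1 + c2)/2 is the one used in this study.

```python
# Sketch of single-level wavelet-based fusion by coefficient averaging (PyWavelets).
import numpy as np
import pywt

def fuse_images(img1, img2, wavelet="haar"):
    cA1, (cH1, cV1, cD1) = pywt.dwt2(img1, wavelet)   # decompose image 1
    cA2, (cH2, cV2, cD2) = pywt.dwt2(img2, wavelet)   # decompose image 2
    fused = (
        (cA1 + cA2) / 2,                              # average approximation coefficients
        ((cH1 + cH2) / 2, (cV1 + cV2) / 2, (cD1 + cD2) / 2),  # average detail coefficients
    )
    return pywt.idwt2(fused, wavelet)                 # reconstruct the fused image

img_a = np.random.rand(128, 128)    # e.g. a gene-expression recurrence plot
img_b = np.random.rand(128, 128)    # e.g. a methylation recurrence plot
print(fuse_images(img_a, img_b).shape)   # (128, 128)
```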
Fig. 4 An example of wavelet-based image fusion on the recurrence plot generated in our study. The two images on top are fused into the image at the bottom
In our implementation, the coefficients are combined by averaging them, thus c = (c1 + c2)/2. An example of how the fused recurrence plots look is depicted in Fig. 4.

Convolutional Neural Networks In Convolutional Neural Networks (CNNs) [22–24], the hidden layers perform convolutions, that is, they perform a dot product between their kernel and the input matrix. As the kernel slides along the input matrix and the dot product is computed, a feature map is generated representing whether, and to what degree, the patterns encoded in the kernel correspond to the patterns in the input matrix. Convolutional layers are usually followed by global and/or local pooling layers, where the dimensionality of the feature map is reduced by grouping pixels together. A typical CNN consists of a cascade of convolutional and pooling layers, whose output is then flattened and fed into fully connected layers for additional processing and the final classification.
Fig. 5 a A VGG block and a VGG network. b An inception block. c A ResNet block
Pretrained models We use four different CNN architectures pretrained with the ImageNet dataset [25]. In our scheme, these models perform feature extraction, and the final classifications are generated by dense layers added on top. The four pretrained models we will be using for transfer learning [26] are discussed next. VGG16 A research group from Oxford University, named Visual Geometry Group (VGG), developed the idea of the VGG block (Fig. 5a). It consists of a series of convolutional layers, followed by a max-pool layer. A VGG network is a cascade of VGG blocks, followed by a number of dense layers that lead up to the final classification [29]. VGG16 contains 13 convolutional layers and 3 dense layers. InceptionV3 Inception blocks run convolutional kernels of various sizes in parallel (Fig. 5b). All the various convolutional layers receive the input matrix, process it, and then their outputs are concatenated and sent to the rest of the neural network. InceptionV3 [27] is an improvement of the Inception architecture, with a number of additions such as a side-classifier to integrate label information lower down the network, factorized 7 × 7 convolutions, and label smoothing. Inception-ResNetV2 A hybrid Inception-ResNet block processes the input with multiple kernels in parallel, in Inception fashion, but also possesses skip connections [30]. These skip connections are characteristic of Residual Networks (ResNet) (Fig. 5c) and enable the information to both skip the block and also go through it. On one hand, this addresses the vanishing gradient problem. On the other hand, they ensure adding a residual layer does not cause the model to forget the previous mapping; thus, the new model cannot be worse than the initial one. Xception An architecture originating from variations of the Inception block, Xception relies exclusively on depth-wise separable convolution layers [28]. In this case, the computation occurs in two steps, with a depth-wise convolution passing a single filter per input channel, then a point-wise convolution taking the results of the depth-wise convolutions and producing a linear combination of them.
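A hedged sketch of this arrangement in Keras is shown below, using the Xception base as an example; the input size and pooling choice are assumptions, while the frozen base, 256-neuron dense layer, 0.1 dropout, single sigmoid output, binary cross-entropy, Adam with learning rate 0.01, 30 epochs, early stopping, and batch size 20 follow the settings reported in Sect. 4.

```python
# Sketch of a frozen pretrained base with the dense classification head described above.
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.Xception(include_top=False, weights="imagenet",
                                      input_shape=(224, 224, 3), pooling="avg")
base.trainable = False                      # convolutional base is "locked"

model = models.Sequential([
    base,
    layers.Dense(256, activation="relu"),   # 256-neuron dense layer
    layers.Dropout(0.1),                    # dropout rate 0.1
    layers.Dense(1, activation="sigmoid"),  # single output neuron (binary task)
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=30, batch_size=20,
#           validation_data=(val_images, val_labels),
#           callbacks=[tf.keras.callbacks.EarlyStopping(patience=5)])
```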
Overall Methodology The bottom line of our proposed method is to convert high-dimensional, multi-modal data into images, integrate different data modalities by fusing the images via image fusion techniques, and use this data to train state-of-the-art, pretrained CNN models. Thus, by transforming the data and bringing it to the image domain we take advantage of cutting-edge computer vision techniques and transfer learning. In this particular experimental study, using omics data for cancer classification as a case study, we run tests for different combinations in three axes: omics dataset, image representation, and pretrained CNN model. Initially, we take three omics datasets and transform them into both raw images and recurrence plots, resulting in six datasets. We also generate, through wavelet-based image fusion, fused datasets comprised of all possible pairings of the initial, single-omic ones. The three individual omics datasets can be combined in three possible pairs, and this is done for each of the two image representations. Twelve different datasets now emerge, and each of these will be used to train four pretrained CNNs. Thus, we run a total of 48 experiments. These are the 6 × 2 × 4 possible combinations of these three parameter grids: (1) Omics Dataset [meth, gene, mirna, meth-gene, meth-mirna, mirna-gene] with the dual-omics generated via wavelet-based image fusion, (2) Image Representation [raw images, recurrence plots], and (3) Pretrained CNN Model [VGG16, InceptionV3, Inception-ResNetV2, Xception]. We measure accuracies for these experiments and evaluate the results, reporting our findings in the next section, as well as providing more technical details of the implementation.
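The 6 × 2 × 4 experiment grid described above can be enumerated explicitly, for example:

```python
# Enumerating the 6 x 2 x 4 combinations of dataset, representation, and model.
from itertools import product

datasets = ["meth", "gene", "mirna", "meth-gene", "meth-mirna", "mirna-gene"]
representations = ["raw images", "recurrence plots"]
models = ["VGG16", "InceptionV3", "Inception-ResNetV2", "Xception"]

grid = list(product(datasets, representations, models))
print(len(grid))          # 48 experiments
```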
4 Experiments and Results In this section, we discuss our experiments and examine the results. Data The data consists of three datasets, each representing an omic type: gene expression, MiRNA expression, and DNA methylation. The biological samples represented by the data are divided into 213 samples of one class (breast cancer) and 213 samples of a second class (ovarian cancer), for a total of 426 samples. The DNA methylation dataset has 3056 features, the gene expression dataset has 6347 features, and the MiRNA expression has 94. To make appropriate use of the limited number of training instances, we refrain from a static train-test split and apply tenfold cross-validation instead. Each fold contains 346 test samples (80% of total), 42 train samples (10% of total), and 38 validation samples (10% of total). Image Representations In order for CNNs to be utilized, the data, which consists of real-valued vectors, must be converted into some type of image representation. We do this in two ways: (1) a “raw image” representation, with a simple reshape of the vectors into square matrices, plus zero-padding at the end, and (2) recurrence plots. Since the datasets have varying dimensionality, the images that would be produced would be inconvenient both for the image fusion and for the CNN training phase. We remedy this in one way for the raw images, and in another for the recurrence plots. For the raw images, we concatenate the shorter datasets with themselves as many
Fig. 6 Accuracies from training with raw images, grouped by omics types
times as needed, each time changing the order of the features, making sure a variety of patterns will emerge for the CNNs to learn from. For the recurrence plots, we tune the relevant parameters (window and dimension) to produce plots of identical dimensions. We apply image fusion with the pywavelet package in Python. For each omics pair that we want to fuse, we decompose the datasets into wavelet coefficients, take the element-wise average of the coefficients (c = (c1 + c2 )/2), and then reverse transform the results into a set of images. The image fusion is applied for pairs of omics types: gene-MiRNA, gene-meth, and meth-MiRNA. We did this for both raw images and recurrence plots, resulting in six separate datasets. Each one of these six datasets is used to train four pretrained CNNs. CNN architecture and hyper-parameters For both types of image representation, and for each of the three datasets, four pretrained models are tested. We use the pretrained models from the keras framework in the Python programming language. The models come pretrained on the ImageNet, and we only keep the convolutional bases, adding one dense layer with 256 neurons on top, then a dropout layer with a rate of 0.1, and finally, an output neuron that generates the classification. The convolutional bases of the four pretrained models are “locked”, and their parameters do not change during training. Only the dense layers tune their internal parameters, learning to fit the extracted features from the convolutional bases. For the loss function, we apply binary cross-entropy. We use the Adam optimizer with a learning rate of 0.01. Training takes place for 30 epochs maximum, with the early stopping of 5 epochs, and a batch size of 20. Effect of omic type From a high level, the best overall classification performance across image representations and CNN models comes from the methylation dataset, with a mean accuracy of 76.84%. When we examine the results closer, there seems to be a high variance of performance with each CNN model, especially when raw images are used, as evidenced in Fig. 6. With raw images, most datasets yield inconsistent accuracy for each model, with Inception-ResNetV2 having the lowest accuracy, and Xception the highest. The gene dataset with the Xception model brings the highest accuracy, although the lowest accuracy comes from the gene dataset as well, with
Fig. 7 Accuracies from training with recurrence plots, grouped by omics types
Fig. 8 Accuracies per image representation, grouped by model
the Inception-ResNetV2 model. The best overall and most consistent across models, with the raw images, is the MiRNA expression dataset. With the recurrence plots, the performance is more consistent across CNN models (Fig. 7), with the methylation dataset having the best overall performance, and the gene dataset the worst.

Effect of image representation As already alluded to in the previous paragraph and figures, the recurrence plots proved superior in performance to the raw images. Recurrence plots achieved a mean accuracy, across datasets and models, of 75.38%, while raw images had an accuracy of 70.66%. We also see, in Fig. 8, that raw images have uneven performance, with Inception-ResNetV2 yielding the worst results, and Xception the best. The recurrence plots have a more even performance, with Xception achieving the highest overall performance.

Effect of pretrained CNN model (Fig. 9) The best overall performance across datasets and image representations is achieved by Xception, with a mean accuracy of 76.20%, and the worst by Inception-ResNetV2, with a mean accuracy of 68.03%. What is interesting is that, between these two models, when it comes to the recurrence plots based classification, both models have similar accuracy, with Xception mean accuracy 75.92%, and Inception-ResNetV2 mean accuracy 75.15%. The two models' status as best (Xception) and worst (Inception-ResNetV2) overall is attributed to their difference in performance with the raw images, with Xception achieving 76.47% mean accuracy, and Inception-ResNetV2 reaching a quite low mean accuracy of 60.90%.
Fig. 9 Accuracies per model, grouped by image representation

Table 1 Accuracies (%) achieved on the raw images. meth stands for DNA methylation, gene stands for gene expression, and mirna stands for MiRNA expression

Raw images    InceptionV3   Xception   VGG16   Inception-ResNetV2
meth          75.11         73.48      73.24   64.70
gene          77.23         82.44      69.73   50.14
mirna         77.26         77.50      75.14   72.79
meth-gene     67.05         73.54      69.70   62.11
meth-mirna    76.30         74.58      64.85   59.94
mirna-gene    70.41         77.29      75.59   55.74
Table 2 Accuracies (%) achieved on the recurrence plots. meth stands for DNA methylation, gene stands for gene expression, and mirna stands for MiRNA expression

Recurrence plots   InceptionV3   Xception   VGG16   Inception-ResNetV2
meth               81.07         85.11      82.20   79.88
gene               61.69         60.11      64.16   61.60
mirna              75.68         76.78      73.21   76.84
meth-gene          79.86         76.69      77.47   79.06
meth-mirna         78.48         82.14      81.42   78.53
mirna-gene         73.89         74.74      73.64   75.03
Thus, the distinction of which model fares better and which worse is more meaningful with the raw images than with the recurrence plots.

Final observations The complete results of the experiments are displayed in Tables 1 and 2. The best absolute accuracy was produced by the gene dataset with raw image representation and the Xception model, with an accuracy of 82.44%. The worst performance was produced by the gene dataset, raw image representation, and the
Inception-ResNetV2 model, with an accuracy of 50.14%. The recurrence plots fared generally better than the raw images, often surpassing the raw images by up to 20% in accuracy, although the opposite has been observed as well. It seems that each CNN model responds better to different types of data and patterns, and no general trend can be discerned. Choice of the model should be based on empirical grounds, with no a priori model to be preferred. Even the overall worst model, Inception-ResNetV2, can be the best in some arrangement of dataset and image representation, the case in our study being the mirna and the gene-mirna datasets coupled with the recurrence plot representation. VGG16 surpassed the other models with the gene dataset represented as recurrence plots. In the rest of the cases, Xception and InceptionV3 are the best models. Fusion does not seem to yield any synergistic effects, with the accuracy resulting from the fused dataset being somewhat of an average of the accuracies of the two individual, constituent datasets. One option that could change this would be to experiment with different methods to merge the DWT coefficients in the wavelet-based fusion process, since the only method we applied was to average the coefficients. The recurrence plots fare better with the meth, meth-gene, and meth-mirna datasets, while the raw images are best, though with quite uneven results across different models, with the gene, mirna, and gene-mirna datasets. In conclusion, the decision within any one of these three axes, that of datasets, image representation, and pretrained CNN model, must occur on a strictly empirical basis and after experimental tests. All possible combinations are on the table, with the agenda for further experimentation and testing on the fusion method.
5 Conclusion

In this study, we have proposed a framework for the preparation and classification of multi-modal, high-dimensional data, and we have applied this framework to the case study of multi-omics biological data for cancer classification. Our proposed scheme accounts for the complexity of multi-modal data by applying DL, as opposed to traditional ML models, and avoids the overfitting expected when deep neural networks encounter data with thousands of features but only hundreds of training instances. This is achieved mainly by applying transfer learning with state-of-the-art CNN models. Additionally, the use of techniques adopted from the field of computer vision, even though the raw data does not come in the form of images, ensures the generalizability of the framework to integrate multi-modal data from almost any field imaginable. Sequential, numerical, categorical, and image data can potentially be integrated and used to train cutting-edge models successfully, even if the number of training instances is as low as a few hundred. Our planned direction for future research and improvement would be, first of all, the image fusion process. Instead of averaging the DWT coefficients, we could apply an element-wise min or max, e.g., c = max(c1, c2). Or we could apply a different technique (mean, min, max) for each color channel, resulting in a richer
representation of the data. Second, we will experiment with more pretrained CNN models, which are quite popular in the computer vision field and routinely achieve state-of-the-art performance in problems previously deemed as unsolvable. Finally, our pipeline will be complemented by interpretability techniques such as grad-cam [31], with the end goal of arriving at a comprehensive framework for the complete, end-to-end integration, classification, and interpretation of a variety of multi-modal, high-dimensional data. Acknowledgements This work was supported by the MPhil program “Advanced Technologies in Informatics and Computers”, hosted by the Department of Computer Science, International Hellenic University, Kavala, Greece.
References
1. Zhu W, Xie L, Han J, Guo X (2020) The application of deep learning in cancer prognosis prediction. Cancers 12(3):603
2. Tziridis K, Kalampokas T, Papakostas GA (2021) EEG signal analysis for seizure detection using recurrence plots and tchebichef moments. In: 2021 IEEE 11th annual computing and communication workshop and conference (CCWC), pp 0184–0190. https://doi.org/10.1109/CCWC51732.2021.9376134
3. Thiel M, Romano MC, Kurths J (2004) How much information is contained in a recurrence plot? Phys Lett A 330(5)
4. https://portal.gdc.cancer.gov/repository
5. Chaudhary K, Poirion OB, Lu L, Garmire LX (2018) Deep learning-based multi-omics integration robustly predicts survival in liver cancer. Clin Cancer Res 24(6):1248–1259
6. Cox DR (1972) Regression models and life-tables. J R Stat Soc Ser B (Methodological) 34(2)
7. Lv J, Wang J, Shang X, Liu F, Guo S (2020) Survival prediction in patients with colon adenocarcinoma via multiomics data integration using a deep learning algorithm. Biosci Rep 40(12)
8. Takahashi S, Asada K, Takasawa K, Shimoyama R, Sakai A, Bolatkan A et al (2020) Predicting deep learning based multi-omics parallel integration survival subtypes in lung cancer using reverse phase protein array data. Biomolecules 10(10):1460
9. Guo L-Y, Wu A-H, Wang Y-X, Zhang L-P, Chai H, Liang X-F (2020) Deep learning-based ovarian cancer subtypes identification using multi-omics data. BioData Mining 13(1):1–12
10. Lee T-Y, Huang K-Y et al (2020) Incorporating deep learning and multi-omics autoencoding for analysis of lung adenocarcinoma prognostication. Comput Biol Chem 87:107277
11. Jun Y, Xiaoliu W, Lv M, Zhang Y et al (2020) A model for predicting prognosis in patients with esophageal squamous cell carcinoma based on joint representation learning. Oncol Lett 20(6):1–1
12. Tong L, Wu H, Wang MD (2021) Integrating multi-omics data by learning modality invariant representations for improved prediction of overall survival of cancer. Methods 189:74–85
13. Zhang T, Zhang L, Payne PRO, Li F (2021) Synergistic drug combination prediction by integrating multiomics data in deep learning models. In: Translational bioinformatics for therapeutic development. Springer, pp 223–238
14. Bazaga A, Leggate D, Weisser H (2020) Genome-wide investigation of gene-cancer associations for the prediction of novel therapeutic targets in oncology. Sci Rep 10(1):1–10
15. Webber CL Jr, Zbilut JP (1994) Dynamical assessment of physiological systems and states using recurrence plot strategies. J Appl Physiol
16. Marwan N, Thiel M, Nowaczyk NR (2002) Cross recurrence plot based synchronization of time series. Nonlin Process Geophys 9:325–331. https://doi.org/10.5194/npg-9-325-2002
17. Afonso LCS, Rosa GH, Pereira CR et al (2019) A recurrence plot-based approach for Parkinson's disease identification. Future Gener Comput Syst 94
18. Ioana C, Digulescu A, Serbanescu A, Candel I, Birleanu FM (2014) Recent advances in nonstationary signal processing based on the concept of recurrence plot analysis. In: Translational recurrences. Springer proceedings in mathematics and statistics, vol 103. Springer, Cham
19. Pajares G, de la Cruz JM (2004) A wavelet-based image fusion tutorial. Pattern Recogn 37(9)
20. Amolins K, Zhang Y, Dare P (2007) Wavelet based image fusion techniques—an introduction, review and comparison. ISPRS J Photogram Remote Sens 62(4)
21. Heil CE, Walnut DF (1989) Continuous and discrete wavelet transforms. SIAM Rev 31(4):628–666
22. Albawi S, Mohammed TA, Al-Zawi S (2017) Understanding of a convolutional neural network. In: 2017 International conference on engineering and technology (ICET)
23. Li Z, Liu F, Yang W, Peng S, Zhou J (2021) A survey of convolutional neural networks: analysis, applications, and prospects. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2021.3084827
24. Dixit S, Velaskar A, Munavalli N, Waingankar A (2021) Text recognition using convolutional neural network for visually impaired people. In: Sharma H, Saraswat M, Kumar S, Bansal JC (eds) Intelligent learning for computer vision (CIS 2020). Lecture notes on data engineering and communications technologies, vol 61. Springer, Singapore
25. Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60:84–90
26. Bhati GS, Garg AR (2021) Handwritten Devanagari character recognition using CNN with transfer learning. In: Sharma H, Saraswat Yadav A, Kim JH, Bansal JC (eds) Congress on intelligent systems (CIS 2020). Advances in intelligent systems and computing, vol 1335. Springer, Singapore
27. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA, pp 2818–2826. https://doi.org/10.1109/CVPR.2016.308
28. Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), Honolulu, HI, USA, pp 1800–1807
29. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
30. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Proceedings of the thirty-first AAAI conference on artificial intelligence (AAAI'17). AAAI Press, pp 4278–4284
31. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-CAM: visual explanations from deep networks via gradient-based localization. In: 2017 IEEE international conference on computer vision (ICCV), pp 618–626. https://doi.org/10.1109/ICCV.2017.74
Predicting Users’ Eat-Out Preference from Big5 Personality Traits Md. Saddam Hossain Mukta, Akib Zaman, Md. Adnanul Islam, and Bayzid Ashik Hossain
Abstract Social Networking Sites (SNS) such as Facebook and Twitter have become important places for sharing one's views, beliefs, and ideas and for communicating with family members and friends. These virtual places can capture a wide range of details about every user, which may reveal behavioral traits such as the user's preferences in daily life. In this study, we build a machine learning (ML) model to predict a user's eat-out preference from their Big5 personality traits derived from tweets. To this end, we collect users' check-ins from a location-aware social network, Foursquare. We then build an ML-based model from the content of users' tweets and their Foursquare check-ins, which allows us to predict users' eat-out preferences across various types of restaurants from their personality traits. We conduct an experiment with a total of 731 Twitter and Foursquare users, and the results show that users' Big5 personality traits have a strong association with their eat-out preferences. Our model achieves an average AUC score of 84% across all categories of restaurants. Keywords Big5 · Twitter · Eat-out · Foursquare · Regression
1 Introduction Today, Twitter has turned into a significant online communication tool. Scientists can determine human behavior, such as personality [12] and preferences [6] of related users, based on the textual information of these interactions, i.e., tweets. Additionally,
location-aware social networking services like Foursquare have grown tremendously in popularity recently. Users of these websites share their check-ins, which reveal crucial details about the places they have visited and their preferences. We derive users' demographic data by fusing information from several social networking sites. In light of this, it is now possible to gather new forms of data on user preferences and behaviors by merging user interactions from several social networking sites. Depending on their hobbies, habits, or other socioeconomic variables, customers may choose to eat at restaurants that fall into the cheap, mid-range, expensive, or very expensive categories. For instance, a person might go to an expensive or very expensive restaurant because of the appealing environment and decoration, the tasty cuisine, and the attractive way the food is presented. Cheap eateries, however, may be visited because they serve food quickly. Similarly, some people may favor mid-range restaurants since they offer better food, ambiance, and décor than low-quality restaurants while not being as pricey as luxury restaurants. According to a study [20], psychological traits like personality affect users' decisions about various lifestyle activities. In light of the above observations, we investigate how users' Big5 personality traits affect their real-life eat-out patterns. We first conduct a study that establishes a strong association between users' Big5 personality traits and their eat-out patterns derived from Twitter and Foursquare interactions. Several studies find correlations between food habits and eating styles [16, 20] from shared images and users' demographics in social media interactions. To the best of our knowledge, no prior study establishes a connection between users' personality traits and their eat-out patterns derived from their social media interactions. An extended abstract of this work has been published in [21], where we predict users' eat-out patterns from their word usage patterns in tweets. In this study, we make substantial improvements over that first version and present a detailed study of how Big5 personality traits influence a user's eat-out patterns. First, we collect tweets of 731 Twitter users who post Foursquare URLs in their tweets. We consider a total of 72,662 Foursquare URLs from these tweets in which users share restaurant information, categorized into four types depending on food price. We find 23,986, 36,187, 10,335, and 2154 links for the cheap, moderate, expensive, and very expensive categories of restaurants, respectively, from these tweets. Later, we compute the frequency of visits to different types of restaurants for each user and use these frequencies as the ground truth data. Then, we obtain users' Big5 personality traits [19] by feeding these tweets to the IBM Watson Personality Insights API (https://personality-insights-livedemo.mybluemix.net/) and linguistic feature vectors using pretrained Bidirectional Encoder Representations from Transformers (BERT) to create the personality trait and linguistic datasets. Then, we find the Pearson correlation between the personality traits and the users' frequency of visits to different categories of restaurants. Finally, we develop two multivariate Bi-LSTM regression models, namely the Big5 Regression Model (BRM) and the Linguistic Regression Model (LRM), using the personality trait and linguistic datasets, respectively. BRM obtains 34.5% (Moderate category) and 25.1% (Expensive category)
Predicting Users’ Eat-Out Preference from Big5 Personality Traits
513
R² strength for the highest- and the lowest-performing models, respectively, on the test dataset, outperforming the scores of LRM. To measure the strength of the best-performing BRM, we further evaluate the model with different binary classifiers by measuring the area under the Receiver Operating Characteristic (AUC-ROC) curve. It obtains an average AUC-ROC of 83.25%, with a maximum score of 93.1% (Moderate category) and a minimum score of 73.2% (Expensive category). In summary, we make the following contributions: • We are the first to integrate data from Twitter and Foursquare to predict users' eat-out patterns from their Big5 personality traits. • We obtain a strong relationship between restaurant categories and personality traits. • We develop a Bi-LSTM-based regression model to predict the price-based eat-out preferences of users from their Twitter data. • We demonstrate a comparative performance analysis of linguistic and personality features for predicting eat-out preference. Predicting the eat-out pattern of a user has numerous applications. For example, by knowing the eat-out preferences of Twitter users, restaurant owners can launch personalized advertisements. A food chain service provider can decide on a new business location after investigating the eat-out patterns of users in that locality. Moreover, it is also possible to estimate an economic profile of a region using this application.
2 Related Work Numerous studies on various cognitive and human characteristics, such as personality [12], values [17], and sentiment analysis [9, 18, 23], have been carried out using a single social networking site, such as Facebook or Twitter. From individuals' publicly accessible tweets on Twitter, Kumar et al. [12] predict Big5 personality scores. Study [17] identifies a user's shift in value orientations based on her Facebook word use patterns. Furthermore, only a small number of studies [3, 22] look at a person's political affiliation based on their tweets and retweets. Moreno-Sandoval et al. [15] present insights from Twitter data about consumer behavior by linking food-related content, emojis, and the respective demographics. An experiment regarding gender-specific food consumption behavior is conducted by Wagner et al. [16]. They examine a dataset of 15 million Flickr images and arrive at plausible conclusions. According to another study [11], individuals' preferences for certain movie genres may be predicted from the Big5 personality characteristics extrapolated from their tweets. Researchers may also estimate a person's age from the language they use in their tweets. According to [1], a person's tweets can predict their socioeconomic class, stage of life, and exact age. Additionally, authors use social media sites like Twitter to gather personal information about users, such as name [2], gender [25, 27], and education [28].
Similar to Facebook, Foursquare is a social networking service that enables users to share their positions with friends through check-ins. By analyzing Foursquare data, it is possible to learn many fascinating things about how people move around during the day and what they do. A number of studies, such as [29], examine the geographical characteristics of information exchanged via location-based systems like Foursquare. Combining several social networking sites, such as Twitter and Foursquare, to find intriguing human behavioral and psychological traits is an emerging study area. The study in [26] looks at the trends of users who reside close by, using Twitter food-related talk. Numerous studies demonstrate the correlation between personality factors and food intake preferences and behavior. According to Bartkiene et al. [4], the question of what and why we eat is complicated; numerous aspects, such as societal, dietary, biological, and psychological issues, are involved. The authors also argue that people's food choices are influenced by the characteristics of those foods, such as flavor, look, texture, and color, as well as psychological aspects including attitude, mood, and conduct. According to Pfeiler et al. [20], eating habits and food preferences are both directly and indirectly related to personality. In a study involving 224 female students, Heaven et al. [8] demonstrated that neuroticism and conscientiousness are directly related to an individual's eating habits. To the best of our knowledge, no previous research has looked at whether the Big5 personality traits are directly related to users' eating habits as shown through their use of social media.
3 Methodology In this study, we predict users' eat-out preference scores for restaurants in different price ranges from their tweets. Figure 1 illustrates the research framework with the following steps. Firstly, we collect the users' eat-out preferences in four categories (Cheap, Moderate, Expensive, and Very Expensive) and calculate the category-based relative frequency from the users' tweets. Secondly, we extract two types of features from the users' tweets: (a) linguistic feature vectors using BERT and (b) Big5 personality traits using the IBM Personality API. Finally, we develop two Bi-LSTM regression models, compare their performance, and select the best-performing model.
3.1 Data Collection One of the most difficult aspects of our work is extracting information about consumers' psychological characteristics and how frequently they visit a particular type of restaurant, because there is no single source from which we can get all of this data.
Predicting Users’ Eat-Out Preference from Big5 Personality Traits
515
Fig. 1 Research framework of our personality trait-based eat-out prediction
Therefore, we combine the Twitter and Foursquare datasets to gather psychological characteristics and the frequency of visits to each type of restaurant. We gather information from 731 Twitter users to develop our regression model, which forecasts how frequently people will visit certain restaurant categories, i.e., their preference for eating out. Then, in order to verify that our prediction model correctly predicts the frequency of visits to various restaurant categories in real life, we additionally gather data from 220 Twitter users. Since not all Twitter users include Foursquare links in their tweets, we start by looking for people who do. Because we identify Twitter users who are active on Twitter and who post Foursquare check-ins through tweets, we apply the judgemental sampling approach [13]. To locate these persons whose tweets contain links to Foursquare, we employ the Twitter advanced search approach. When a user shares Foursquare links, her tweets' check-ins typically include keywords like "4sq" and "Foursquare." Because examining a single language for all users is likely to provide consistent linguistic traits and personality ratings, we exclusively search tweets that contain English terms. To ensure that appropriate English language competency shows in their tweets, we choose users who reside in several US states, including "California", "Texas", "Florida", and "Virginia," among others. We observe that the Foursquare links for these states include the prices for the various restaurant types currently on the market. After choosing the Twitter handles of individuals who often tweet using Foursquare links, we utilize the Python package tweepy (http://www.tweepy.org) to gather those users' tweets. We discover 656,101 tweets in total from these 731 people. The greatest, least, and average number of tweets per user are 3210, 189, and 897.54, respectively. We tallied the number of Foursquare links in each user's file of recent tweets. We find a
total of 72,662 Foursquare links to various eateries within the tweets. Each link on the Foursquare website leads to a different web page. In certain instances, the location's type and other details are displayed right on the page; otherwise, the location page, which includes the restaurant's prices, can be accessed by clicking the name of the place. To learn more about the places that the Foursquare links point to, we perform HTML parsing. Not all links are related to eateries or other services in the food industry; as a result, we ignore any links that have nothing to do with restaurants or similar establishments. When a link points to a restaurant, Foursquare often classifies the eatery according to the cost of the food it serves. Based on pricing, we categorize the restaurant links as cheap, moderate, expensive, and very expensive. If the link is to a restaurant and it is listed in Foursquare's restaurant category, we can locate a dollar symbol ($) on the Foursquare link page. The dollar marks one ($), two ($$), three ($$$), and four ($$$$) denote, respectively, the cheap, moderately priced, expensive, and very expensive categories of restaurants. Users with fewer than 50 restaurant-related Foursquare check-ins are excluded. Then, using the equation indicated in Eq. 1, we determine the relative frequency of a user's visits to a certain restaurant type and utilize that information as the ground truth data. The percentages of links connected to the cheap, moderate, expensive, and very expensive restaurant categories that we identify overall are 33.01%, 48.80%, 14.22%, and 2.96%, respectively.
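To make the ground-truth construction concrete, the following minimal Python sketch shows one way the dollar-sign tier of each restaurant check-in could be turned into per-user relative visit frequencies, applying the 50-check-in cutoff described above. The helper names and data layout are illustrative assumptions, not the authors' code.

```python
from collections import Counter

# Foursquare price tiers shown on venue pages: "$", "$$", "$$$", "$$$$"
PRICE_TIERS = {1: "cheap", 2: "moderate", 3: "expensive", 4: "very_expensive"}

def relative_visit_frequencies(dollar_counts, min_checkins=50):
    """dollar_counts: list of ints (1-4), one per restaurant check-in of a user.
    Returns the per-category relative frequency used as ground truth, or None
    if the user has fewer than `min_checkins` restaurant check-ins."""
    if len(dollar_counts) < min_checkins:
        return None  # users with < 50 restaurant check-ins are excluded
    counts = Counter(PRICE_TIERS[d] for d in dollar_counts if d in PRICE_TIERS)
    total = sum(counts.values())
    return {tier: counts.get(tier, 0) / total for tier in PRICE_TIERS.values()}

# Example: a user whose check-ins are mostly in the moderate ($$) tier
print(relative_visit_frequencies([2] * 40 + [1] * 15 + [3] * 5))
```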
3.2 Feature Extraction To forecast consumers' preferences for eating out, we extract the relevant features from both the linguistic and the personality perspectives. Since both our independent variables (personality scores) and dependent variables (frequency of visits to various restaurant categories) are continuous values, we also apply Pearson's correlation (ρ) analysis to find meaningful correlations between these variables in the case of the Big5 personality traits. Linguistic feature extraction using BERT: Context-based vector representations of a specific word within a text are known as word embeddings. The method can identify a word's relationship to other words, as well as its semantic and syntactic similarity, in a document. For the purpose of creating word embeddings, we use a pretrained BERT model. By leveraging information from the full tweet, the BERT embedding layer creates token-level representations. The input features are organized as M^0 = {m_1, ..., m_N}, where m_n (n ∈ [1, N]) is the combination of the token, position, and segment embeddings corresponding to the input token x_n. The representations M^k = {m_1^k, ..., m_N^k} at the k-th transformer layer (0 ≤ k ≤ K) are computed as

M^k = Transformer_k(M^{k−1})    (1)
Predicting Users’ Eat-Out Preference from Big5 Personality Traits
517
The contextualized representations of the input tokens are thus given by M^k. The output of BERT, the contextualized representations M^K, is supplied as input to the regression block:

M^K = {m_1^K, ..., m_N^K} ∈ R^{N × dim_m}    (2)
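The paper does not name a particular BERT implementation, so the following sketch is only an illustration of how the token-level representations M^K could be obtained for a batch of tweets; the Hugging Face transformers package and the bert-base-uncased checkpoint are assumptions.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

def tweet_embeddings(tweets):
    """Return the final-layer contextualized token representations (M^K) per tweet."""
    inputs = tokenizer(tweets, padding=True, truncation=True, max_length=64,
                       return_tensors="pt")
    with torch.no_grad():
        outputs = bert(**inputs)
    return outputs.last_hidden_state  # shape: (batch, N tokens, dim_m = 768)

features = tweet_embeddings(["Lunch at a cheap diner near the office",
                             "Dinner at a fancy place tonight!"])
print(features.shape)
```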
Personality feature extraction using IBM Watson API: The Big5 model [10] is one of the most popular and well-studied models in personality research. It comprises five personality traits, namely Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. Using the IBM Watson API, we extract the scores of these five traits from the collected tweets. The scores vary between users based on their writing because, as Pennebaker et al. [7] observe, what people say and write actually reveals their behavior and personality. It can be quite challenging to pinpoint which personality traits specifically affect a person's decision to dine out at a given category of restaurant, even though one may believe they do. Since both our independent (personality traits) and dependent (visiting frequency of various categories of restaurants) variables are continuous, we use Pearson's correlation coefficient to determine the relationship between users' Big5 personality traits and their visiting frequencies for the various restaurant categories. Table 1 displays the Pearson correlations between the personality traits and the restaurant categories, where N = 731 and the critical value is p < 0.10. The table clearly shows that various personality traits are associated with particular restaurant categories.
Table 1 Pearson’s correlations between Big5 personality traits and visiting frequencies of different categories of restaurants Chp. Mod. Exp. V. Exp. Openn. Conscit. Extrav. Agree. Neuro.
− 0.087 − 0.109** − 0.060 − 0.109** − 0.016
* p < 0.05, ** p < 0.10
0.104** 0.020 − 0.007 0.012 0.018
0.003 0.098* 0.075* 0.107** 0.031
− 0.034 0.075* 0.055 0.080* 0.039
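A minimal sketch of the correlation analysis behind Table 1 is given below, assuming a pandas DataFrame with one row per user, the five trait scores, and the four relative visiting frequencies; the column names are hypothetical.

```python
import pandas as pd
from scipy.stats import pearsonr

TRAITS = ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"]
CATEGORIES = ["cheap", "moderate", "expensive", "very_expensive"]

def trait_category_correlations(df):
    """Pearson's r and p-value for every trait/category pair (cf. Table 1)."""
    table = {}
    for trait in TRAITS:
        table[trait] = {}
        for cat in CATEGORIES:
            r, p = pearsonr(df[trait], df[cat])
            table[trait][cat] = (round(r, 3), round(p, 3))
    return pd.DataFrame(table).T  # rows: traits, columns: restaurant categories
```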
4 Development and Comparison of the Models In this section, we develop two multivariate Bi-LSTM regression models, as shown in Fig. 1, named the Big5 Regression Model (BRM) and the Linguistic Regression Model (LRM). In both regression models, the visiting frequencies of the restaurant categories are the dependent variables. The Big5 personality traits and the linguistic feature vectors extracted by BERT are the independent variables of BRM and LRM, respectively. Architecture of the models: We develop a Bi-LSTM regression architecture that utilizes both sets of extracted features (a sketch of this architecture is given after Table 2). First, the extracted features are fed to a Bi-LSTM layer with 64 neurons and the return sequence set to True. The output of this layer is fed to a second Bi-LSTM layer with 32 neurons. Additionally, we add a dropout of 50% in both Bi-LSTM layers to avoid overfitting while training the neural networks. The output is then fed to three fully connected layers with 256, 128, and 4 neurons, respectively. We use mean squared error as the loss function and RMSprop with a learning rate of 0.0005 and ρ = 0.85 as the optimizer. We use tanh as the activation function for the hidden layers and linear for the output layer. We split the dataset into train (80%) and test (20%) sets and train the above architecture using both the linguistic feature vectors and the Big5 personality traits to obtain the trained LRM and BRM, respectively. Table 2 presents the performance of the trained regression models. We consider the R² of the regression models as the performance measure and report the scores on the train and test datasets. Results: The scores obtained by the developed BRM and LRM are satisfactory across all four price-based restaurant categories. On average, BRM (the Big5 personality model) attains R² scores of 34.55% and 30.93% on the train and test datasets, respectively. On the other hand, LRM (the BERT linguistic model) attains scores of 29.37% and 24.62% on the train and test datasets, respectively. Figure 2 highlights the comparison of performance between BRM and LRM on the test dataset, where BRM significantly outperforms LRM. The moderate ($$) category has the highest R² score (34.5%) compared with the other categories, and the expensive ($$$) category has the lowest R² score (25.1%). We also assess the prediction potential using supervised multi-class machine learning classification methods, inspired by the work of Sumner et al. [24]. We determine the median of the data and categorize restaurants into high-class and low-class categories based on whether the frequency of visits is above or below the median.
Table 2 Performance of the developed models (R² score) on the train and test datasets

                     Train dataset                            Test dataset
Models   Chp.    Mod.    Expen.   V. Exp.   Avg      Chp.    Mod.    Expen.   V. Exp.   Avg
BRM      34.3    39.8    27.0     37.12     34.55    31.0    34.5    25.1     33.12     30.93
LRM      31.8    33.4    23.1     29.17     29.37    25.8    29.4    19.1     24.17     24.62
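A minimal Keras sketch of the Bi-LSTM regressor described above follows. The layer sizes, dropout, activations, loss, and optimizer settings come from the text; how the Big5 or BERT features are framed as an input sequence is not specified in the paper, so the input shape (and reading the 50% dropout as the LSTM dropout argument) are assumptions.

```python
from tensorflow.keras import layers, models, optimizers

def build_regressor(timesteps, n_features):
    """Bi-LSTM regression model: 64- and 32-unit Bi-LSTM layers with 50% dropout,
    dense layers of 256, 128 and 4 units, tanh hidden activations, linear output."""
    model = models.Sequential([
        layers.Bidirectional(layers.LSTM(64, return_sequences=True, dropout=0.5),
                             input_shape=(timesteps, n_features)),
        layers.Bidirectional(layers.LSTM(32, dropout=0.5)),
        layers.Dense(256, activation="tanh"),
        layers.Dense(128, activation="tanh"),
        layers.Dense(4, activation="linear"),  # one output per restaurant category
    ])
    model.compile(loss="mse",
                  optimizer=optimizers.RMSprop(learning_rate=0.0005, rho=0.85))
    return model

# e.g. BRM with the Big5 traits framed as a 5-step sequence of scalars (assumed framing)
brm = build_regressor(timesteps=5, n_features=1)
brm.summary()
```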
Predicting Users’ Eat-Out Preference from Big5 Personality Traits
519
Fig. 2 Comparison of performance between BRM and LRM (test dataset)
The best classifier, together with its TPR, FPR, and AUC, is presented in Table 3 for determining the frequency of visits to each restaurant category [5]. As a baseline, we employ the ZeroR classifier, which has an average AUC score of 0.793. Using our classifiers, we obtain the lowest AUC score (0.732) for the Expensive category of restaurants and the highest AUC score (0.931) for the Moderate category of restaurants. We additionally obtain AUC values of 0.864 and 0.803 for the cheap and very expensive categories of restaurants, respectively. The top classifier consistently outperforms the baseline average for every category (Fig. 3).
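The classifiers in Table 3 are Weka-style (ZeroR baseline, Naive Bayes, REPTree). The sketch below reproduces the same median-split AUC evaluation in scikit-learn, with DummyClassifier standing in for ZeroR and GaussianNB for Naive Bayes; these substitutions and the 5-fold cross-validation are assumptions.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

def auc_for_category(X, visit_freq):
    """Median-split one category's visiting frequency into high/low and report AUC-ROC."""
    y = (visit_freq > np.median(visit_freq)).astype(int)
    scores = {}
    for name, clf in [("baseline (ZeroR-like)", DummyClassifier(strategy="most_frequent")),
                      ("naive_bayes", GaussianNB())]:
        proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
        scores[name] = roc_auc_score(y, proba)
    return scores
```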
5 Discussion In this study, we develop an eat-out preference prediction system by extracting Big5 personality traits from users' Twitter data. To the best of our knowledge, this is the first study to predict users' eat-out preferences from their Big5 personality traits derived from social media usage. The outcomes of our study lead to a few logical conclusions from Table 1. We discover that the majority of personality characteristics are connected to consumers' eat-out habits. We find a substantial correlation between the openness personality trait and the moderate category of restaurants. People who score high on openness are more likely to frequent moderately priced restaurants rather than expensive and very expensive ones, since they are more inclined to try new foods there. A strong inverse relationship exists between conscientiousness and the cheap restaurant category.
Table 3 Best performing classifier to predict different restaurant categories from Big5 personality traits

Restrnt. types   Best AUC obtaining classfr.   AUC    TPR    FPR
Cheap.           N. Bayes                      0.86   0.81   0.092
Moder.           Rep Tree                      0.93   0.91   0.046
Exp.             Rep Tree                      0.73   0.63   0.18
V. Exp.          N. Bayes                      0.80   0.72   0.13
Fig. 3 Comparison of the AUC-ROC of the best performing classifiers
Contrarily, attending the expensive and very expensive categories of restaurants is positively connected with the personality trait of conscientiousness. For instance, it is likely that customers who care about their friends and family members are more inclined to visit these restaurants because of the improved hygienic conditions, décor, and food quality (including flavor, odor, and texture). We discover a link between the extraversion trait and the expensive restaurant category. Additionally, we discover a high correlation between agreeableness and how frequently people visit the expensive and very expensive restaurant categories. We notice a substantial inverse relationship between the agreeableness trait and the cheap restaurant category. Similar to the conscientiousness trait, agreeable individuals are concerned about their friends' and family's health and hygiene while dining out, hence they frequently choose expensive and very expensive restaurants and tend not to eat at restaurants in the cheap category. On the other hand, we discover no link between the neuroticism trait and visits to any particular kind of restaurant. The study in [8] claimed that the conscientiousness and neuroticism personality traits have a significant impact on eating behavior. However,
Predicting Users’ Eat-Out Preference from Big5 Personality Traits
521
because neurotic persons might not be interested in sharing Foursquare check-ins about eating out, we may not have been able to establish any correlation between the neuroticism trait and eat-out activity. In our study, we also observed fewer check-ins from neurotic people for predicting eat-out choice. This is consistent with another study [14] on Facebook, which demonstrates that neurotic people are inclined to share less information with their friends. We note that, compared to other restaurant categories, the forecast for the expensive restaurants has lower potential. We also have only a small number of examples in our training dataset for the very expensive restaurant category; as a result, we obtain only weak forecast accuracy for restaurants in that price range. We note that the size of the datasets is between 250 and 300 in a recent well-cited work [12] on psycholinguistic research from social media. As a result, the size of our dataset (N = 731) is sufficient for employing psychological traits to forecast consumers' preferences for eating out.
6 Conclusion In this study, we extrapolated individuals' personality attributes from their tweets to predict their preferences for eating out. In order to determine how personality factors impact consumers' real-world choices for eating out, we made use of the data fusion of Twitter and Foursquare. By calculating correlations between them, we have shown which personality traits are better at predicting which restaurant categories. Next, we developed a model to forecast users' restaurant consumption based on their Big5 personality features. The key benefit of our method is that we can accurately forecast someone's preference for eating out even when they do not include Foursquare check-ins in their tweets.
References 1. Aletras N, Chamberlain BP (2018) Predicting twitter user socioeconomic attributes with network and language information. In: Proceedings of the 29th on hypertext and social media, pp 20–24 2. Álvarez-Carmona MÁ, Villatoro-Tello E, Villaseñor-Pineda L, Montes-y Gómez M (2022) Classifying the social media author profile through a multimodal representation. In: Intelligent technologies: concepts, applications, and future directions. Springer, pp 57–81 3. Ansari MZ, Aziz M, Siddiqui M, Singh K (2020) Analysis of political sentiment orientations on twitter. Procedia Comput Sci 167:1821–1828 4. Bartkiene E et al (2019) Factors affecting consumer food preferences: food taste and depression-based evoked emotional expressions with the use of face reading technology. BioMed Res Int 2019 5. Cantarino I, Carrion MA, Goerlich F, Martinez Ibañez V (2019) A ROC analysis-based classification method for landslide susceptibility maps. Landslides 16(2):265–282
6. Cardaioli M, Kaliyar P, Capuozzo P, Conti M, Sartori G, Monaro M (2020) Predicting twitter users’ political orientation: an application to the Italian political scenario. In: 2020 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM). IEEE, pp 159–165 7. Chung CK, Pennebaker JW (2018) What do we know when we liwc a person? Text analysis as an assessment tool for traits, personal concerns and life stories. In: The Sage handbook of personality and individual differences, pp 341–360 8. Golestanbagh N, Miraghajani M, Amani R, Symonds ME, Neamatpour S, Haghighizadeh MH (2021) Association of personality traits with dietary habits and food/taste preferences. Int J Prev Med 12(1):92 9. Islam MN, Khan NI, Roy A, Rahman MM, Mukta SH, Islam AN (2021) Sentiment analysis of Bangladesh-specific covid-19 tweets using deep neural network. In: 2021 62nd International scientific conference on information technology and management science of Riga technical university (ITMS). IEEE, pp 1–6 10. John OP (2021) History, measurement, and conceptual elaboration of the big-five trait taxonomy: the paradigm matures 11. Khan EM, Mukta MSH, Ali ME, Mahmud J (2020) Predicting users’ movie preference and rating behavior from personality and values. ACM Trans Interact Intell Syst (TiiS) 10(3):1–25 12. Kumar KP, Gavrilova ML (2019) Personality traits classification on Twitter. In: 2019 16th IEEE AVSS. IEEE, pp 1–8 13. Marshall MN (1996) Sampling for qualitative research. Fam Pract 13(6):522–526 14. Misirlis N, Lekakos G, Vlachopoulou M (2018) Associating facebook measurable activities with personality traits: a fuzzy sets approach. J Tourism Heritage Serv Mark 4(2):10–16 15. Moreno-Sandoval LG, Sánchez-Barriga C, Buitrago KE, Pomares-Quimbaya A, Garcia JC (2018) Spanish twitter data used as a source of information about consumer food choice. In: International cross-domain conference for machine learning and knowledge extraction. Springer, pp 134–146 16. Mostafa MM (2018) Mining and mapping halal food consumers: a geo-located twitter opinion polarity analysis. J Prod Mark 24(7):858–879 17. Mukta MSH, Ali ME, Mahmud J (2019) Temporal modeling of basic human values from social network usage. J Assoc Inf Sci Technol 70(2):151–163 18. Mukta MSH, Islam MA, Khan FA, Hossain A, Razik S, Hossain S, Mahmud J (2021) A comprehensive guideline for Bengali sentiment annotation. ACM Trans. Asian Low-Resour Lang Inf Process 21(2) 19. Oshio A, Taku K, Hirano M, Saeed G (2018) Resilience and big five personality traits: a meta-analysis. Personality Individ Differ 127:54–60 20. Pfeiler TM, Egloff B (2020) Personality and eating habits revisited: Associations between the big five, food choices, and body mass index in a representative Australian sample. Appetite 149:104607 21. Rahman MM, Majumder MTH, Mukta MSH, Ali ME, Mahmud J (2016) Can we predict eatout preference of a person from tweets? In: Proceedings of the 8th ACM conference on web science. ACM, pp 350–351 22. Sharma K, Ferrara E, Liu Y (2022) Characterizing online engagement with disinformation and conspiracies in the 2020 us presidential election. In: ICWSM, vol 16, pp 908–919 23. Singal A, Thiruthuvanathan MM (2022) Twitter sentiment analysis based on neural network techniques. In: Congress on intelligent systems. Springer, pp 33–48 24. Sumner C, Byers A, Boochever R, Park GJ (2012) Predicting dark triad personality traits from twitter usage and a linguistic analysis of tweets. In: ICMLA, vol 2. IEEE, pp 386–393 25. 
Vashisth P, Meehan K (2020) Gender classification using twitter text data. In: 2020 31st Irish signals and systems conference (ISSC). IEEE, pp 1–6 26. Vydiswaran VV, Romero DM, Zhao X, Yu D, Gomez-Lopez I, Lu JX, Iott BE, Baylin A, Jansen EC, Clarke P et al (2020) Uncovering the relationship between food-related discussion on twitter and neighborhood characteristics. J Am Med Inf Assoc 27(2):254–264
Predicting Users’ Eat-Out Preference from Big5 Personality Traits
523
27. Wang Z, Hale S, Adelani DI, Grabowicz P, Hartman T, Flöck F, Jurgens D (2019) Demographic inference and representative population estimates from multilingual social media data. In: The world wide web conference, pp 2056–2067 28. Xing W, Gao F (2018) Exploring the relationship between online discourse and commitment in twitter professional learning communities. Comput Educ 126:388–398 29. Yang D, Qu B, Yang J, Cudre-Mauroux P (2019) Revisiting user mobility and social relationships in lbsns: a hypergraph embedding approach. In: The world wide web conference, pp 2147–2157
Smart Accident Fatality Reduction (SAFR) System Daniel Bennett Joseph , K. Sivasankaran , P. R. Venkat , Srirangan Kannan , V. A. Siddeshwar , D. Vinodha , and A. Balasubramanian
Abstract Injuries and deaths resulting from road accidents are a growing public health problem in India. Road crash deaths increased by 31% from 2007 to 2017. Surveys show that a 10 min reduction in the medical response time can be statistically associated with an average decrease in the probability of death by one-third, both on motorways and on conventional roads. We propose an accident mitigation system that can be used in vehicles post-accident to reduce the death rate due to road accidents. The proposed framework is a hybrid decision algorithm that enables the system to take accurate and precise decisions to save lives. We use two conditions to decide whether the accident victim needs medical help or is safe. This valuable accident information is sent to hospitals along with the GPS location, enabling hospitals within a 5-km radius of the accident zone to see the information and respond to the accident immediately. This system will reduce the response time by at least 10 min, thereby reducing the death rate by one-third. Keywords Intelligent accident system · Accident intimation · Hybrid decision
1 Introduction Injuries and deaths resulting from road accidents are a growing public health problem in India. Nearly 2600 people get killed and 9000 get injured due to traffic accidents. The National Crime Records Bureau (NCRB) 2016 report [5] states that 464,674 collisions caused 148,707 traffic-related deaths in India. The number of road crash deaths has increased by 31% from 2007 to 2017. Another survey conducted by the Spanish public authority [6] concluded that a 10 min reduction of the
medical response time can be statistically associated with an average decrease in the probability of death by one-third, both on motorways and on conventional roads. These data from various countries and surveys show that accidents happen on roads for various reasons and therefore cannot be fully avoided. What can be avoided, however, is death caused by road accidents. The road accidents that lead to a high death rate do so because of delayed medical attention resulting from insufficient accident data, hospitals being unable to prioritize the accidents in need of immediate medical help, improper communication of accidents, and inadequate mitigation steps. Current systems in vehicles use devices such as airbags, seat belts, and crumple zones to prevent death in an accident. The efficiency of these devices has improved over the years, but they are still not foolproof and cannot always save the passenger. These systems have improved vehicle safety considerably but need to be complemented by post-accident systems that enable a quick emergency response to reduce deaths that might otherwise be imminent after an accident. This paper proposes a framework that uses a Raspberry Pi and a GSM-GPRS module with cloud database integration for an accident mitigation system that can be used post-accident in vehicles to reduce the death rate due to road accidents. The proposed framework uses a hybrid decision algorithm that enables the system to take accurate and precise decisions that lead to saving lives. The hybrid decision approach, called human–machine cooperative decision (HMCD), does not rely solely on the machine to take all the decisions but regards the human input as a crucial priority in the decision-making process. The HMCD process shows higher accuracy in decisions, which is essential as the system is in place to prevent death due to accidents. The system sends the HMCD algorithm output and the blood detection algorithm output to the hospitals in a 5-km range, along with the accident site's location coordinates, to make sure immediate action is taken and the response time of the hospitals is reduced by at least 10 min. The contributions of this paper can be summarized as below: • To the best of the authors' knowledge, this is one of the pioneering studies that explores the possibility of reducing deaths after accidents in road vehicles. • HMCD is utilized to address the human–machine interaction problem. It has proven to be an efficient method, which enables the accident mitigation system to achieve better results. • The performance of BloodFilter-SSD is compared with other benchmark approaches on the same dataset. Results suggest that BloodFilter-SSD outperforms the others on various evaluation metrics, i.e., accuracy, training loss, and validation loss. • As per the previously conducted surveys, our system will ensure that the death rate reduces by one-third under ideal conditions.
2 Related Work This section discusses solutions related to existing technologies and examines their advantages and disadvantages. Parveen et al. [1] propose an IoT-based automatic vehicle accident alert system. This system uses an IR sensor to detect the accident, and GSM and GPS devices send an SMS and the location to the users. In [1], the authors use an Arduino Uno for performing accident detection in the vehicle and send the data using a GSM module. Their system is an adequate solution for accident detection but does not give any information on the accident or the state of the victim. Durga Devi et al. [2] use a system that keeps track of the eye and head movements of the driver using an eye blink sensor and a camera with the Viola–Jones algorithm to find out whether the driver is distracted, in order to prevent an accident. Their approach includes sending the GPS location to emergency contacts for helping the accident victim. Their system uses a more sophisticated ARM controller but has the same drawback of not sensing the victim's state after the accident, as it concentrates only on sending information to the emergency contacts. Sadaphal [3] uses a different approach to communicate the accident to the emergency responders: an Android application allows emergency responders to get to the accident site by sending the GPS location. This approach is more accessible than the others as it includes an Android application for notifying the people registered in the app, along with a map feature for directions to the accident site. Watthanawisuth et al. [4] use an accelerometer to detect the accident and adopt the concept of a black box in a road vehicle that can send the location of the vehicle to alert family members to take emergency actions. This system is more reliable as the black box is safe even upon heavy impact and can send the information to the required target precisely. The black box also enables review of the accident history upon inspection by police or other authorities. Although the system gives accident data, it does not give the passengers' data or their status after the accident. Fernandes et al. [10] use eCall for alerting the hospitals of accidents, along with an Android application that uses the onboard accelerometer and gyroscope for accident detection. The system uses OBD-II for getting airbag and velocity information from the vehicle for accident detection. The proposed system is complete, with the only drawback being the use of a call service to notify hospitals, as this might impact the response time. The reviewed techniques and methodologies use IoT and Android applications to send the accident alert to family, emergency responders, ambulances, and police. However, these systems do not send data on whether the victim of the accident needs medical help or is safe, which could be used to take necessary actions. These systems also do not incorporate the involvement of hospitals for a quick emergency response to accidents.
3 Methodology 3.1 Introduction Accidents are imminent in vehicles; therefore, completely stopping and preventing them is a task that cannot be fully achieved. The alternative approach is to prevent death due to accidents rather than prevent the accident itself. To check whether an accident has occurred, the Raspberry Pi is connected to the onboard diagnostics (OBD) port of the vehicle. The vehicle velocity and airbag information is taken from the OBD port to check for an abrupt drop in velocity indicating high deceleration. This condition is checked and used to trigger the system, as any accident will lead to an abrupt drop in the velocity of the vehicle. To reduce the death rate due to road accidents, we propose a system that makes a hybrid decision based on both a machine learning model and human input to identify whether the victim of the accident requires medical help or is safe. The medical help request decision made by the hybrid decision approach checks two conditions: 1. presence of blood, and 2. consciousness. Once these conditions are checked using the input from the sensors, the system uses HMCD to decide whether the passenger/driver needs medical help or not. The solution is not complete with just the identification of the victim's need for medical help; it also includes notifying hospitals so they can take quick action in the emergency. The range identified for a quick emergency response from hospitals is 5 km. The 5-km range enables ambulances to arrive at the accident location well within 10 min of the accident, making the solution adequate and highly responsive.
3.2 Accident Detection An accident causes a sudden reduction in speed. The severity of the accident depends on the direction, orientation, and speed of both colliding objects. If the vehicles' directions and orientations are not the same, the resulting accident will be more violent than when they move in the same direction and orientation; that is, when the relative velocity between the colliding objects increases, the accident will be more severe. This variation of velocity over time (∂v/∂t) is called acceleration, and since the velocity drops suddenly here, it is called deceleration. The deceleration generated during a vehicle crash is a vital factor to be considered in accident detection systems. Authors such as Thompson et al. [11] and Kumar et al. [12], in their work on accident detection systems, used a 4 g (g = 9.8 m/s²) threshold for detecting accidents. The same threshold is considered in our system as well. Thompson et al. also show that phone falls and harsh car braking are unlikely to exceed the 4 g threshold, which indicates that this threshold guards against false detections.
Fig. 1 Image showing the OBD-II device that is connected to the OBD port of the vehicle
In order to get the vehicle velocity data, our system uses the onboard diagnostics (OBD) port present in the vehicle, usually used for fault analysis during vehicle service, as used by Zaldivar et al. [13], with the difference that the OBD is directly connected to the controller. OBD is the standard connector that has been mandated in the USA since 1996. OBD-II is the onboard system that is responsible for monitoring the vehicle's engine, transmission, and emissions control components. Vehicles that comply with the OBD-II standards have a connector within 2 feet of the steering wheel. SAE International refers to the OBD-II connector as the SAE J1962 Diagnostic Connector [9]. The controller connects to the OBD-II port using the open-source pyOBD, which is designed to interface with low-cost ELM 32x OBD-II adapters (Fig. 1) in real time. From the OBD port, the vehicle velocity data is extracted in real time as the input trigger for the accident mitigation system. The system is triggered only if the deceleration is greater than 4 g.
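The triggering logic can be summarized in the short sketch below; read_speed_kmh is a placeholder for the pyOBD/ELM327 speed query rather than the authors' actual interface, and the polling interval is an assumption.

```python
import time

G = 9.8                    # m/s^2
TRIGGER_THRESHOLD = 4 * G  # 4 g deceleration threshold used for accident detection

def monitor(read_speed_kmh, poll_interval=0.1):
    """Poll the vehicle speed over OBD-II and return once a deceleration above 4 g
    is observed, i.e. once an accident is suspected and HMCD should start."""
    prev_speed = read_speed_kmh() / 3.6  # km/h -> m/s
    prev_time = time.time()
    while True:
        time.sleep(poll_interval)
        speed = read_speed_kmh() / 3.6
        now = time.time()
        decel = (prev_speed - speed) / (now - prev_time)  # positive while slowing down
        if decel > TRIGGER_THRESHOLD:
            return True
        prev_speed, prev_time = speed, now
```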
3.3 Sensor Module—Design and Position Design The sensor module, shown in Fig. 2, is a cuboid-shaped stainless steel enclosure that is mounted between the A pillars of the vehicle as shown in Fig. 3. The module has two side panels fastened by screws. A 360-degree camera is present on the underside of the module for a complete view of the passengers. Two LED lights
Fig. 2 Image showing the sensor module
Fig. 3 Image showing the position of the module with least impact in the vehicle
are present alongside the camera for lighting the passenger and driver area. This ensures that the camera stream is clear and not dark in bad lighting conditions. Since the system is triggered only during an accident, the lights will not be a disturbance for the driver. The sensor module also contains a microphone and a speaker for getting information from the user to check for consciousness. The module has cooling vents on one side so the controller does not fail due to overheating. Position For the system to work effectively, the sensors need to be placed in the right place in the vehicle. The correct position enables a clear view of the passengers and the driver, as shown in Fig. 4, which improves the performance of the sensor-based algorithms in the system. After experimentation, the ideal sensor position is found to be in the column between the two A pillars of the vehicle, as shown in Fig. 3, which is the last place of impact in any type of accident.
3.4 HMCD (Human–Machine Cooperative Decision) Human–machine cooperative decision (HMCD) is a hybrid decision approach that determines whether the accident victim needs medical help or is safe after an accident. To make a cooperative decision, the system uses the camera and microphone as sensors to receive valuable information regarding the accident and the victim.
Fig. 4 Image showing the camera visual in the position that captures the entire face of the driver to ensure detection algorithms work effectively
The signal architecture of the system (Fig. 5) includes getting visual data from the camera and voice responses from the passengers of the vehicle to various questions prompted through a speaker on the physical hardware. The location of the vehicle during the accident is tracked by the GPS module that is part of the system. The system uses the data from the input sensors and the GPS location to run the accident mitigation algorithm, and the information collected from the user, together with the decision of the algorithm, is sent to a common web portal that can be accessed by all hospitals in the specified range of 5 km. Blood Detection The data from the camera mounted inside the vehicle is used to get visual data of the vehicle passengers after an accident. The visual data of the accident victims is run
Fig. 5 Flowchart showing the signal flow in the HMCD process
through a blood detection algorithm that includes a deep learning model. The blood detection DL model used in the system was trained on a dataset containing 1000 images of accident scenarios showing blood loss in the victim and of normal passengers without any blood. Data Preprocessing The dataset is split into train and test data, as is done for all machine learning models. The datasets contain both the images and the label map that holds information about the classification classes—Blood/NoBlood. This information is essential for the deep learning model to run. Before the blood detection algorithm runs on the images from the camera, the images are cropped to the face to achieve higher accuracy. Feature Extraction Once the images are cropped to the required size, as shown in Fig. 7a, they are passed to a BloodFilter that extracts the blood feature from the image. To extract the blood feature, we first convert the image from RGB to HSV format; this makes it possible to filter colors largely independently of illumination. Blood in images has a hue range of 170–180, as shown in Fig. 6.
Fig. 6 Image depicting the relationship between hue, chroma and value in HSV format images
Fig. 7 a Original image from camera. b Image after passing through BloodFilter
The upper HSV value of blood is [180, 255, 255] and the lower HSV value is [170, 50, 50]. Using these HSV values, we can extract the blood feature from the original image using cv2.inRange. The inRange() function returns an array whose elements are 255 where the elements of the source array lie between the corresponding elements of the two arrays representing the lower and upper bounds, and 0 elsewhere. Using this, we can extract blood features from the image (Fig. 7b). We set a threshold value of 35% for best results. Deep Learning Model The filtered images from the BloodFilter, as shown in Fig. 7b, are then passed as input to the configured deep learning model. By doing this, we found that our deep learning SSD model, called BloodFilter-SSD, gives predictions with higher accuracy than the SSD model [14], as the blood features are enhanced. The MobileNetV2 architecture is a convolutional neural network available as a Keras image classification model, loaded with weights pre-trained on ImageNet as a basis for transfer learning. Since the model is used for binary classification, binary cross-entropy is chosen as the loss function. Additional layers with 'ReLU' and 'SoftMax' activation functions are used to improve the accuracy further. The improved accuracy of the BloodFilter-SSD model is shown in Fig. 8, and Fig. 9 shows the training and validation loss and the training and validation accuracy of the BloodFilter-SSD model.
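Since the HSV bounds and the 35% threshold are given above, the BloodFilter step can be sketched directly with OpenCV. Interpreting the 35% value as the fraction of face pixels falling inside the blood hue range is our assumption.

```python
import cv2
import numpy as np

LOWER_BLOOD = np.array([170, 50, 50])    # lower HSV bound for blood (from the text)
UPPER_BLOOD = np.array([180, 255, 255])  # upper HSV bound for blood
BLOOD_RATIO_THRESHOLD = 0.35             # 35% threshold stated above (assumed meaning)

def blood_filter(face_bgr):
    """Return the binary blood mask and whether the blood-pixel ratio exceeds 35%."""
    hsv = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER_BLOOD, UPPER_BLOOD)  # 255 where the pixel lies in range
    ratio = cv2.countNonZero(mask) / mask.size
    return mask, ratio >= BLOOD_RATIO_THRESHOLD

# The masked (blood-features-only) image that feeds BloodFilter-SSD can be obtained with
# cv2.bitwise_and(face_bgr, face_bgr, mask=mask).
```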
Fig. 8 Chart showing the accuracy (mAP) comparison of the SSD model and BloodFilter-SSD model
Fig. 9 Chart showing the training and validation loss, training and validation accuracy of the BloodFilter-SSD model
Consciousness Detection The second condition the system checks to confirm its decision is consciousness. The consciousness of the victim is decided based on both visual data and voice responses from the user. These inputs are converted into a consciousness score that has a maximum value of 6 and a minimum value of 2. A score of 6 means that the victim is fully conscious, while a score of 2 means that the victim is unconscious. A study on coma and impaired consciousness [7], carried out in 1974, gave clear criteria for scaling consciousness. We use this scale as a reference to create the consciousness score used to decide whether the victim is conscious or unconscious after the accident. The consciousness score is evaluated by the system as per Table 1. The system takes the visual data from the camera and runs the eye blink detection algorithm [8] proposed by Soukupová et al. We use their work as a reference to count the number of blinks the person makes in 1 min. The algorithm uses the eye aspect ratio (EAR), based on the 2D eye landmarks shown in Fig. 10, to compute the blink count:

EAR = (‖p2 − p6‖ + ‖p3 − p5‖) / (2 ‖p1 − p4‖)
Table 1 Consciousness score

Behavior                 Response        Score
Eye-opening response     Spontaneous     3
                         To speech       2
                         No response     1
Verbal response          Oriented        3
                         Inappropriate   2
                         No response     1
Fig. 10 Image showing the 2D eye landmarks p1–p6
where p1, …, p6 are the 2D landmark locations on the eye as in Fig. 10. This count is used to evaluate the blink rate. Based on the blink rate, the visual response score is evaluated out of 3. The system also takes audio input from the victim to assess the verbal response, which is used to evaluate the verbal response score out of 3. The visual response score and the verbal response score are added to give the overall consciousness score. The system decides that the victim is unconscious if the consciousness score is less than 4; otherwise, the system decides that the person is conscious. Medical Help Requirement Evaluation The decisions from the blood detection model and the consciousness detection algorithm are used collaboratively to decide whether the victim needs medical help or is safe, as shown in Fig. 11. The passenger is considered safe if there is no blood and the passenger is conscious, while the victim is ruled to need medical help under all other conditions.
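Putting the two checks together, a minimal sketch of the scoring and the final decision rule is shown below. The EAR formula and the score-below-4 rule come from the text; the blink-rate cut-offs used to map the visual response onto Table 1 are placeholders, since the paper does not specify them.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR = (||p2 - p6|| + ||p3 - p5||) / (2 ||p1 - p4||) over the 2D eye landmarks."""
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))

def consciousness_score(blink_rate_per_min, verbal_response):
    """Visual score (from the blink rate) plus verbal score (Table 1), each out of 3."""
    if blink_rate_per_min >= 10:   # placeholder cut-off: spontaneous eye response
        visual = 3
    elif blink_rate_per_min > 0:   # placeholder cut-off: weak eye response
        visual = 2
    else:
        visual = 1                 # no eye response
    verbal = {"oriented": 3, "inappropriate": 2, "none": 1}[verbal_response]
    return visual + verbal

def needs_medical_help(blood_detected, score):
    """HMCD rule: safe only if no blood is detected and the victim is conscious (score >= 4)."""
    conscious = score >= 4
    return not (conscious and not blood_detected)
```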
Fig. 11 Flowchart showing the HMCD decision process to evaluate the medical help requirement for the accident victim
3.5 Intimation System Once the system decides whether the victim needs medical help or is safe, the decision is sent to the hospitals in the 5-km range so they can take quick emergency actions, which include sending an ambulance to the accident zone and informing the police about the accident. To achieve this, the control system shown in Fig. 12 is used. The intimation system has several stages of sending the information before it reaches the hospitals in a refined format that can be used effectively, as shown in Fig. 13. This includes the signals transmitted from the GSM/GPRS module, which
Fig. 12 Image showing the control system to send the accidental information to the hospitals
Fig. 13 Image showing the flow of information from the vehicle to the hospital for emergency response
reach the cloud database, from which the information is used to run a hospital finder program within the desired range. Finally, the information is displayed in a common portal to all hospitals in the 5-km range so they can take immediate action. Location Gathering Global Positioning System (GPS) signals are collected using the NEO-6M module. The Raspberry Pi controller collects the coordinates of the accident zone from the GPS module in latitude and longitude format. The wiring and layout of the modules are shown in Fig. 12. Once the location from the GPS module is received, the controller generates a Google Maps link using the coordinates received from the GPS module. Transferring the Information GPRS, or General Packet Radio Service, is an extension of the GSM network. GPRS is an integrated part of the GSM network that provides an efficient way to transfer data with the same resources as the GSM network. Originally, the data services (like Internet and multimedia messaging) in the GSM network used a circuit-switched connection. A GSM/GPRS module is an IC or chip that connects to the GSM network using a Subscriber Identity Module (SIM) and radio waves. The information on the need for medical help, the consciousness score, and the location coordinates are transferred with the help of the GSM/GPRS module SIM 800L. Using the GSM/GPRS module, the
Fig. 14 Image showing the connection between database, web portal and microcontroller
accident-related information is transferred to a cloud database that stores the data with a unique ID for each recorded accident. Cloud Database The cloud database stores the accident information referenced with a unique ID. The database also holds the details of the hospitals registered in the portal designed for emergency response. The database receives the information from the GSM/GPRS module in real time. The cloud database is linked with the portal to enable real-time updating of accidents with reference to their ID, as shown in Fig. 14. Hospital Finder Program The hospital finder program runs locally on the microcontroller. The program gets the list of coordinates of all the hospitals in the database using the GSM/GPRS module. The coordinates are associated with the hospital name and a unique ID. The coordinates of the accident are taken and compared with each hospital in the list. This is done using the Haversine distance formula for finding the distance between two points:
dlat = lat2 − lat1, dlon = lon2 − lon1
a = sin²(dlat/2) + cos(lat1) · cos(lat2) · sin²(dlon/2)
c = 2 · asin(√a)
r = 6371 km (radius of the Earth)
distance = c · r

The result is compared with 5, all hospitals whose distance from the accident zone is below 5 km are listed, and the accident information is sent to those hospitals alone. The flow of information is shown in Fig. 14. Portal The portal is designed to show the accident information to hospitals with reference to the unique ID. Every hospital is registered in the portal with login credentials, as shown in Fig. 15, to ensure the accident information is not misused. While registering, the hospital's location is collected along with other credentials such as the name, email address and password. The portal gives hospitals all the vital information, such as the need for medical help, the consciousness score and the victim's responses to predefined questions that give further clarity on the victim's state. Once an accident happens, the location and other accident information are updated in the cloud database. This information is then available to hospitals in the portal in real time. The hospitals are expected to respond to the information in the portal so that no resources (ambulances) are wasted on the same accident. The portal is updated in real time and will indicate to all hospitals in the 5-km range when a hospital has responded to the accident.
Fig. 15 a Image showing the hospital registration page in the web portal. b Image showing the hospital login page in the web portal. c Image showing the web portal with accident data
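Returning to the hospital finder step, the following sketch implements the Haversine filter described above; note that the trigonometric functions expect the coordinates in radians. The (id, name, lat, lon) layout of the hospital records is an assumption made for illustration.

from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371
RANGE_KM = 5

def haversine_km(lat1, lon1, lat2, lon2):
    # Haversine distance in kilometres between two (latitude, longitude) points
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * asin(sqrt(a)) * EARTH_RADIUS_KM

def hospitals_in_range(accident, hospitals, limit_km=RANGE_KM):
    # hospitals: list of (hospital_id, name, lat, lon) tuples fetched from the cloud database
    return [h for h in hospitals
            if haversine_km(accident[0], accident[1], h[2], h[3]) <= limit_km]

nearby = hospitals_in_range((12.9723, 77.5933),
                            [("H1", "City Hospital", 12.9680, 77.6100),
                             ("H2", "Lakeside Clinic", 13.1000, 77.5946)])
print(nearby)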
4 Results
4.1 Post-accident Processing Results
The post-accident processing includes the blood detection model and the consciousness detection algorithm that enable the HMCD process to decide whether the accident victim needs medical help or is safe. The results of the blood detection and consciousness detection algorithms are shown in Figs. 16, 17 and 18. The BloodFilter-SSD model is used to identify blood on the accident victim and shows higher accuracy than the conventional SSD [14] object detection model due to the presence of the BloodFilter that extracts blood features from the image.
Fig. 16 Output of the blood detection model with high accuracy
Fig. 17 Output of the consciousness detection model with high accuracy
Fig. 18 Chart showing the accuracy of the BloodFilter-SSD model
[Fig. 18 chart: mAP (70–85) compared for SSD and BloodFilter-SSD]
5 Conclusion
The outcome of this research is to show the possibility of reducing the death rate due to road accidents by at least one-third by cutting the accident response time by 10 min. The proposed accident mitigation system, which incorporates a hybrid decision through human–machine cooperation, enables more accurate decisions that do not rely solely on visual data but also on data from the accident victim. This increases the reliability of the system and the ability of hospitals to trust its decisions in the real world. If effectively implemented in all vehicles, the system can save many lives, thereby increasing the quality of life in the implementation zone.
References 1. Parveen N, Ali A, Ali A (2020) IOT based automatic vehicle accident alert system. In: 2020 IEEE 5th international conference on computing communication and automation (ICCCA), New Delhi, India 2. Durga Devi GY, Sowmya S, Preena D, Nanda PV (2020) Accident alert system. Int J Adv Res Eng Technol (IJARET) 11(7):560–567 3. Sadaphal I (2019) Accident detection and alert system using Android application. Int J Res Appl Sci Eng Technol 7(5):3466–3469 4. Watthanawisuth N, Lomas T, Tuantranont A (2012) Wireless black box using MEMS accelerometer and GPS tracking for accidental monitoring of vehicles. In: Proceedings of 2012 IEEE-EMBS international conference on biomedical and health informatics. IEEE, Hong Kong, China, pp 847–850 5. Accidental deaths & suicides in India 2016. NCRB. https://ncrb.gov.in/sites/default/files/ADSI2016-FULL-REPORT-2016.pdf 6. Sánchez-Mangas R, García-Ferrrer A, de Juan A, Arroyo AM (2010) The probability of death in road traffic accidents. How important is a quick medical response? Accid Anal Prev 42(4):1048– 1056 7. Teasdale GM, Jennett B (1974) Assessment of coma and impaired consciousness. A practical scale. Lancet 2(7872):81–84 8. Soukupová T, Cech J (2016) Real-time eye blink detection using facial landmarks
9. Diagnostic Connector J1962_201607. SAE International. https://www.sae.org/standards/content/j1962_201607 10. Fernandes B, Alam M, Gomes V, Ferreira J, Oliveira AS (2016) Automatic accident detection with multi-modal alert system implementation for ITS. Veh Commun 3:1–11 11. Thompson C, White J, Dougherty B, Albright A, Schmidt DC (2010) Using smartphones to detect car accidents and provide situational awareness to emergency responders. Lecture notes of the institute for computer sciences, social informatics and telecommunications engineering, pp 29–42 12. Punetha D, Kumar D, Mehta V (2012) Design and realization of the accelerometer based transportation system (ATS). Int J Comput Appl 49:17–20 13. Zaldivar J, Calafate CM, Cano J, Manzoni P (2011) Providing accident detection in vehicular networks through OBD-II devices and Android-based smartphones. In: 2011 IEEE 36th conference on local computer networks. LCN. IEEE, Bonn, Germany, pp 813–819 14. Liu W, Anguelov D, Erhan D, Szegedy C, Reed SE, Fu C, Berg AC (2016) SSD: single shot multibox detector. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer vision—ECCV 2016. Lecture notes in computer science, vol 9905. Springer, Cham
Android Malware Detection Against String Encryption Based Obfuscation Dip Bhakta, Mohammad Abu Yousuf, and Md. Sohel Rana
Abstract The Android operating system is one of the most prominent operating systems among mobile device users worldwide, but it is often the most targeted platform for malicious activities. Many researchers have studied android malware detection systems over the previous years. However, android malware detection systems face many challenges, and obfuscation is one of them. String encryption is one such obfuscation technique, which helps android malwares evade malware detection systems. To address this challenge in android malware detection systems, a novel approach is proposed in this study in which crypto-detector, an open-source cryptography detection tool, is applied to decompiled application code to extract encrypted strings and encryption methods as features. Accuracy of 0.9880 and F1-score of 0.9843 have been achieved during performance evaluation. The importance of the newly proposed crypto features is discussed. The performance of our framework has been compared to that of other similar existing works, and our work has outperformed all of them. Keywords Android · Malware detection · Machine learning · Obfuscation
1 Introduction The active number of mobile device users is growing everyday. Android operating system is immensely popular among these mobile device users, and it is gaining D. Bhakta (B) Bangladesh University of Professionals (BUP), Dhaka, Bangladesh e-mail: [email protected] M. A. Yousuf Jahangirnagar University, Savar, Dhaka, Bangladesh e-mail: [email protected] URL: https://www.juniv.edu/teachers/yousuf Md. S. Rana University of Alabama at Birmingham (UAB), Birmingham, USA e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_39
more popularity day by day. But primarily because of its vast popularity, android users are often targeted by malicious attacks. Nowadays, much crucial information is stored on the android platform. Therefore, it is very important to study android malware detection systems. Many studies have been performed on android malware detection systems over the past few years. Most of them have used machine learning or deep learning to detect android malwares [1–3]. But machine learning and deep learning-based android malware detection systems face a challenge, which is obfuscation [1–3]. Various obfuscation techniques are applied to evade android malware detection systems [4]. Obfuscation detection and deobfuscation techniques have also been investigated over the past few years [4, 5]. Obfuscation techniques are applied to android applications for both legitimate and malevolent purposes. These obfuscation techniques can negatively impact the performance of an android malware detection model. Therefore, obfuscation resilience is very important to improve the performance of an android malware detection system. String encryption, identifier renaming and control flow obfuscation are some popular obfuscation techniques in android applications. String encryption helps android malwares evade malware detection systems by encrypting malicious API calls and other strings. To address this challenge, a novel approach to feature extraction for android malware detection using machine learning is proposed in this paper. We have used an open-source cryptography detection tool, crypto-detector [6], which has been developed by Wind River [7]. This tool helps detect cryptography in source code. It has been used to detect the application of cryptography in the decompiled source code of android applications. Then, along with features from AndroidManifest.xml, dangerous API calls and bytecodes, we have considered encrypted strings and encryption algorithms as features for the machine learning model to detect android malwares. This has allowed us to consider encrypted API calls and encrypted strings as features for our machine learning algorithm. Later, a random forest classifier has been applied to evaluate the performance of our approach to detect android malwares using machine learning. A dataset of 21,909 android applications has been used to evaluate our approach, of which 8197 applications are malware and 13,712 are benign. Accuracy of 0.9880, ROC AUC of 0.9996 and F1-score of 0.9843 have been achieved. Fivefold cross-validation has been used to validate our results. The features which contributed most to the result have been analyzed, and the encrypted-string features have been found to play a vital role. Our work has also been compared with other similar works, and it outperformed all of them. Our key contributions are as follows. • To the best of our knowledge, we are the first to use crypto-detector [6] to extract crypto features in android malware detection systems. • We are addressing the challenge of string encryption in android malware detection systems using machine learning. We have used crypto features along with other features, and we have achieved an excellent result.
• We have analyzed the importance of crypto features in detecting android malwares and are presenting a novel approach to build an obfuscation-resilient android malware detection system. The remainder of this work is organized as follows. In Sect. 2, previous studies similar to our work are summarized. In Sect. 3, the methodology of our work is discussed. In Sect. 4, the features used in our work are defined. In Sect. 5, performance evaluation of our proposed framework is presented, and Sect. 6 concludes the paper.
2 Related Works
Chen et al. [5] applied deobfuscation to the obfuscated API calls extracted from the .smali files of the apk. They used deobfuscated API calls as an essential feature for android malware detection. But they used DeGuard [8], which, according to them, only deobfuscates applications obfuscated by a tool named ProGuard. Our work does not depend on any specific obfuscation tool; rather, crypto-detector detects the presence of cryptographic algorithms applied by any obfuscation tool. Sihag et al. [9] used an Opcode Segment Document (OSD) to classify an app as malware or goodware. But they did not consider other important features for android malware detection such as permissions, API calls, etc. [1]. We have considered the most important features for android malware detection along with our proposed crypto features, e.g., permissions and dangerous API calls. Roy et al. [10] used API calls to reverse map the permissions and intents. They also considered the API type and the hardware permissions mapped by the API calls to construct the feature set. But they did not address the challenge of encrypted API calls, whereas we have been able to cover encrypted API calls with crypto features. We also note that they used a relatively small dataset; small datasets may contain outliers and skew the findings. Moreover, Roy et al. used a dataset where the number of malwares is equal to or greater than the number of benign apps, which is not the real-world scenario and may cause class imbalance and bias the result. Therefore, we have used a dataset which resembles the real-world proportions of malwares and benign applications. Garcia et al. [11] used package-level android API usage, method-level android API usage, reflection and native code. But obfuscated API calls were not considered in their feature extraction process, whereas they have been considered in our work. Aghamohammadi and Faghih [12] used the same type of features, but made use of important n-gram features from the sequence of opcodes processed from .dex. They utilized word embeddings instead of a large sparse one-hot vector. LSTM and GRU algorithms were applied then. Cai et al. [13] computed features from app execution traces. The authors conducted a systematic dynamic characterization study by defining and measuring 122 metrics. Kim et al. [14] used string features, method opcodes, method APIs, shared library functions, permissions, components and environment features and applied multimodal deep learning. Aghamohammadi and Faghih [12], Cai et al. [13] and Kim et al. [14] did not address the challenge of string encryption for API calls or other
string features explicitly. It can be observed that though other techniques have been applied to address the challenge of obfuscation earlier, encrypted strings and API calls have not been addressed with detection of application of cryptographic algorithms in any study discussed in this section. To the best of our knowledge, we are the first to apply such features in android malware detection system. We have studied some works on security in other fields as well for solutions. Al Asad et al. [15] proposed a proof of authority-based permissioned blockchain for secured sharing of healthcare records. Newaz et al. [16] suggested a novel model for better security in IoT systems. Nirjhor et al. [17] also worked on electronic medical records. They proposed a model with IPFS, IDS and two-way authentication for secure sharing of medical records.
3 Methodology
Figure 1 presents an architectural overview of our work. From the figure, it can be observed that we have divided our proposed framework into four major parts: raw data extraction, feature extraction, feature vector generation and detection. Raw data extraction comprises decoding manifest files, decompiling dex and generating crypto files from the applications in the dataset. Then permissions/components/environmental features from the manifest, dangerous API and dalvik bytecode features, and crypto features from the crypto files are extracted. The feature vector is then generated, and the detection algorithm is subsequently applied to it. In the following part of this section, a synopsis of our dataset is presented, and following that, the parts of our proposed framework are described briefly.
Fig. 1 Architectural overview of the proposed framework
Table 1 Summary of dataset
Source   | Malware | Benign | Total
Drebin   | 5564    | 0      | 5564
AndroZoo | 2633    | 13,712 | 16,345
Total    | 8197    | 13,712 | 21,909
3.1 Dataset
In total, 21,909 samples have been used to evaluate our approach. Among them, 8197 were malware samples from Drebin [18] and AndroZoo [19]: 5564 malware samples were collected from Drebin and 2633 from AndroZoo. The 13,712 benign samples were all collected from AndroZoo. While collecting benign samples, we have ensured that none of the anti-virus scanners in VirusTotal [20] classifies the application as malware. For malwares from Drebin, at least two anti-virus scanners in VirusTotal classify the application as malware, and for malwares from AndroZoo, we have ensured that at least eight anti-virus scanners in VirusTotal classify the application as malware. The information about our dataset is summarized in Table 1. In the real world, the percentage of benign applications is far greater than the percentage of malicious applications. To resemble that, our dataset contains 37.41% malwares and 62.59% benign applications.
3.2 Raw Data Extraction Process
ApkTool [21], a tool for reverse engineering android applications, has been used to disassemble the resources of the apk files to their original form. We have recovered the AndroidManifest.xml file from the disassembled resources of the apk. Smali is the assembly language used by the android dalvik virtual machine; smali files have the extension .smali. These files can be used for a low-level analysis of an android app and can be decompiled from the .dex file. The .smali files have been decompiled from the .dex files of the applications using ApkTool. Crypto-detector [6], developed by Wind River [7], has been used to extract .crypto files from the decompiled applications, which contain the cryptographic information of the applications in the dataset.
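A minimal sketch of this raw data extraction step is shown below. The apktool d call is the tool's standard decode command; the crypto-detector script name and arguments are assumptions and should be checked against the tool's documentation [6].

import subprocess
from pathlib import Path

def extract_raw_data(apk_path, out_root):
    out_dir = Path(out_root) / Path(apk_path).stem
    # Decode AndroidManifest.xml and decompile classes.dex into .smali files
    subprocess.run(["apktool", "d", apk_path, "-o", str(out_dir), "-f"], check=True)
    # Run Wind River's crypto-detector over the decompiled sources to produce a
    # .crypto report; the entry point and options below are assumed, not verified.
    subprocess.run(["python3", "scan-for-crypto.py", str(out_dir)], check=True)
    return out_dir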
3.3 Feature Extraction Process
In this process, the AndroidManifest.xml file that we recovered has been parsed first. The xml file has been traversed to find and extract the necessary information
by inspecting the xml tags. We have used the ‘uses-permission’ tag for permissions, ‘uses-feature’ for features, ‘uses-sdk’ for sdks, ‘activity’ for activities, ‘service’ for services, ‘provider’ for providers and ‘receiver’ for receivers. Details about these features can be found in the next section. A list of dalvik bytecodes and a list of dangerous APIs that can be invoked for malicious activities have been prepared. The list of dalvik bytecodes has been prepared from the Android Open Source Project [22], and the list of dangerous APIs has been prepared from [23] and from a manual investigation of the Android Developer Reference pages [24]. Then, the .crypto files extracted from the applications have been parsed as JSON objects. We have traversed the files and extracted the cryptographic evidence types and the line texts where cryptography had been applied; this information is stored under the ‘evidence_type’ and ‘line_text’ keys in the .crypto files. These two pieces of information have been joined for each cryptographic evidence to create a single crypto feature. Then, the frequencies of crypto features in the whole dataset have been calculated. Thus, 7,120,238 crypto features have been collected. Crypto features with a minimum frequency of 200 in the whole dataset have been retained, which gives 20,887 crypto features.
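The sketch below illustrates how the crypto features described above could be built from a parsed .crypto report: each hit's 'evidence_type' and 'line_text' are joined into one feature, and only features occurring at least 200 times across the whole dataset are kept. The exact nesting of the report is an assumption here; the traversal simply walks the JSON structure looking for those two keys.

import json
from collections import Counter

MIN_FREQUENCY = 200   # dataset-wide cut-off used above

def iter_hits(node):
    # Walk the parsed report and yield every dict that carries the two keys of interest.
    if isinstance(node, dict):
        if "evidence_type" in node and "line_text" in node:
            yield node
        for v in node.values():
            yield from iter_hits(v)
    elif isinstance(node, list):
        for v in node:
            yield from iter_hits(v)

def crypto_features_of_app(crypto_file):
    # Each hit contributes one feature of the form "<evidence_type>::<line_text>".
    with open(crypto_file) as f:
        report = json.load(f)
    feats = Counter()
    for hit in iter_hits(report):
        feats[f"{hit['evidence_type']}::{hit['line_text']}"] += 1
    return feats

def select_crypto_vocabulary(per_app_features):
    # Keep only crypto features seen at least MIN_FREQUENCY times in the whole dataset.
    total = Counter()
    for feats in per_app_features:
        total.update(feats)
    return sorted(f for f, n in total.items() if n >= MIN_FREQUENCY)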
3.4 Feature Vector Generation Process
From Fig. 1, it can be seen that our final feature vector can logically be divided into four parts. First, we have generated the feature vector from the manifest features. Here, we have followed an existence-based approach: each permission/component/environmental feature is represented as 0 or 1 based on its existence in the particular application. The methods in the .smali files have been traversed, and the frequencies of dalvik bytecodes and the frequencies corresponding to invocations of any dangerous API from the list have been extracted. The frequencies of crypto features in each apk have also been calculated by inspecting the .crypto file associated with the apk. For the dangerous API invocation feature vector, the dalvik bytecode feature vector and the crypto feature vector, we have followed a frequency-based approach: the frequency of each feature in a particular application represents the feature for that application. Finally, we have obtained the final feature vector representation of the application by appending these feature vectors together. Thus, the final size of the feature vector representing an application is 29,337.
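A compact sketch of this assembly step could look as follows, with an existence-based manifest part and frequency-based parts for dangerous APIs, dalvik bytecodes and crypto features; the dictionary keys used for the per-application data are illustrative assumptions.

import numpy as np

def build_feature_vector(app, manifest_vocab, api_vocab, bytecode_vocab, crypto_vocab):
    # Manifest part is existence-based (0/1); the remaining parts are frequency-based.
    manifest = [1 if f in app["manifest_features"] else 0 for f in manifest_vocab]
    api = [app["api_counts"].get(f, 0) for f in api_vocab]
    bytecode = [app["bytecode_counts"].get(f, 0) for f in bytecode_vocab]
    crypto = [app["crypto_counts"].get(f, 0) for f in crypto_vocab]
    return np.array(manifest + api + bytecode + crypto)  # 29,337 entries with the paper's vocabularies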
3.5 Detection Process
A random forest classifier has been used to classify an application as malicious or benign. Random forest is a supervised machine learning algorithm which uses
decision trees on various subsamples of the dataset to classify a sample. Scikit-learn's random forest classifier library [25] has been utilized to apply random forest to our dataset. The ‘max_features’ and ‘n_estimators’ parameters have been adjusted while applying the random forest classifier. Fivefold cross-validation has been applied to evaluate the random forest classifier for our proposed framework. In the process, 80% of the dataset has been used as the training set and 20% as the validation set.
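A hedged sketch of this detection step with scikit-learn is shown below; the placeholder data, the n_estimators value and the random seed are assumptions, while the max_features settings and fivefold cross-validation mirror the setup described above.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(200, 50))   # placeholder features; the real vectors have 29,337 columns
y = rng.integers(0, 2, size=200)         # placeholder labels: 1 = malware, 0 = benign

def evaluate(X, y, max_features, n_estimators=100):
    clf = RandomForestClassifier(max_features=max_features, n_estimators=n_estimators,
                                 n_jobs=-1, random_state=0)
    scores = cross_validate(clf, X, y, cv=5, scoring=("accuracy", "f1", "roc_auc"))
    return {m: scores[f"test_{m}"].mean() for m in ("accuracy", "f1", "roc_auc")}

for mf in ("sqrt", "log2", None):        # the three max_features settings compared in Sect. 5
    print(mf, evaluate(X, y, max_features=mf))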
4 The Definition of Features
We have introduced a new type of features called ‘crypto features’ for android malware detection systems. Besides this, we have also used the most crucial features for android malware detection systems, which have helped us to reflect the characteristics of the application. The features for our proposed system can be divided into four types. They are as follows:
• Permissions/components/environmental information
• Dangerous API invocation
• Dalvik bytecode
• Crypto features.
4.1 Permissions/Components/Environmental Information Every android application source code contains an AndroidManifest.xml file. We can get permissions, components and environmental information from this file. Permissions are techniques to make sensitive data of users available for the android applications. A unique label is assigned for each permission. Features are elements which are used to access hardware and software features of the device from the android application. Sdk feature indicates the minimum version of platform with which the android application is compatible. An activity feature means an activity component of the android application. In android application development, activity means a window on which the functionalities are developed. A provider feature means a content provider of the android application. A service feature means a service component, and a receiver feature means a broadcast receiver component of the android application.
4.2 Dangerous API Invocation
API calls are invoked to avail of different services. Among these API calls, there are some that collect or process sensitive user data and can be used to perform malicious activities. Therefore, such dangerous API invocations are considered important features in our work.
4.3 Dalvik Bytecode
Dalvik is a virtual machine that runs Java applications and code. A normal Java compiler converts text files which contain source code into bytecode, which is subsequently compiled into a .dex file. The dalvik VM can read and utilize these .dex files. Class files are transformed into .dex files, which are then read and run by the dalvik virtual machine. The list of dalvik bytecodes can be accessed from the Android Open Source Project [22].
4.4 Crypto Features Crypto-detector is an open-source tool for detecting the presence of cryptographic algorithms in a source code package. At first, it tries to identify keywords and then it tries to find API calls to cryptographic repositories. It then stores the result of cryptographic evidences in a .crypto file. This follows an output specification which can be found at [6], and this .crypto file can be parsed into a JSON object. In the .crypto file, ‘evidence_type’ and ‘line_text’ keys for each hit have been used to build the crypto features for our proposed android malware detection system. ‘evidence_type’ indicates the type of the cryptographic evidence found, and ‘line_text’ indicates the line on which cryptography has been applied.
5 Performance Evaluation
5.1 Effectiveness of Our Approach
Accuracy, F1-score and ROC area under curve have been used as evaluation metrics to evaluate the performance of our framework. The best results we have achieved for the random forest classifier (max features = sqrt) are accuracy of 0.9848, F1-score of 0.9804 and ROC AUC of 0.9996. The best results for the random forest classifier (max features = log2) are accuracy of 0.9845, F1-score of 0.98 and ROC AUC of 0.9993. The best results for the random forest classifier (max features = None) are accuracy of 0.9880, F1-score of 0.9843 and ROC AUC of 0.9984.
Table 2 Summary of results of the experiment
Random forest max features | Accuracy | F1-score | ROC AUC
sqrt                       | 0.9848   | 0.9804   | 0.9996
log2                       | 0.9845   | 0.98     | 0.9993
None                       | 0.9880   | 0.9843   | 0.9984
Fig. 2 Results for different metrics, different numbers of trees and different max features; fivefold cross-validation has been applied in this experiment; training set—80%, validation set—20%
The summary of the results of our experiment is presented in Table 2. Accuracy, F1-score and ROC AUC for different numbers of trees and different maximum features in the random forest classifier are presented in Fig. 2. All the results are means over fivefold cross-validation. The results of our experiment show that we have obtained excellent results in terms of accuracy and F1-score while using the random forest classifier (max features = None). In terms of ROC AUC, we have observed excellent results while using the random forest classifier (max features = sqrt). Moreover, our results are quite stable across the numbers of trees in the random forest classifier. The results show that using crypto features along with other features has helped us to obtain excellent results according to the different metrics by mitigating the impact of string encryption. From the following subsections, it can be observed that the newly proposed crypto features are effective against string encryption, and our proposed framework performed better than other similar works by addressing the challenge of string encryption-based obfuscation.
Fig. 3 Percentage of crypto features in list of most important features that had the most vital role to classify the android malwares using random forest classifier
5.2 Effectiveness of Crypto Features
We have used crypto features along with other features to classify android applications as malware or benign. To the best of our knowledge, we are the first to use crypto-detector to extract crypto features and use them to detect android malwares. Therefore, we have performed an experiment to evaluate the effectiveness of the crypto features for detecting android malwares. First, the random forest classifier (max features = None) has been applied to the whole dataset. Then, the features have been sorted according to their importance to the random forest classifier. Then, the percentage of crypto features among the first n most important features, i.e., those that had the most vital role in classifying the android malwares using the random forest classifier, has been calculated for n = 1 to 29,337. The result of this experiment is presented in Fig. 3. Here, it can be observed that although the percentage of crypto features is quite low compared to other features at first, the percentage rises very quickly.
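The experiment above can be outlined as follows: fit the classifier on the whole dataset, rank the features by impurity-based importance, and compute the share of crypto features among the top-n features for every n. This is only a sketch under the assumption that X, y and the boolean is_crypto mask come from the feature extraction pipeline of Sect. 3.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def crypto_share_curve(X, y, is_crypto):
    # is_crypto is a boolean array aligned with the columns of X (True for crypto features).
    clf = RandomForestClassifier(max_features=None, n_estimators=100,
                                 n_jobs=-1, random_state=0).fit(X, y)
    order = np.argsort(clf.feature_importances_)[::-1]    # most important feature first
    flags = np.asarray(is_crypto)[order].astype(int)
    top_n = np.arange(1, len(order) + 1)
    return 100.0 * np.cumsum(flags) / top_n               # % of crypto features among the top n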
5.3 Comparison to Other Similar Works
We have compared the performance of our work with other related works on android malware detection systems which have addressed the challenge of obfuscation. The result of the investigation is presented in Table 3. Table 3 aligns the accuracy score, F1-score, number of malwares in the dataset, number of benign apps in the dataset and the algorithm used, for our work and the other similar works, so that these attributes can be compared.
Table 3 Performance comparison with other similar works
Study              | Accuracy | F1-score | Malwares | Benign  | Algorithms used
Ours               | 0.9880   | 0.9843   | 8197     | 13,712  | Random forest
Chen et al. [5]    | 0.9874   | 0.8125   | 7254     | 162,985 | LightGBM, CatBoost, Linear SVM, random forest, extra trees
Sihag et al. [9]   | 0.9878   | NA       | 34,506   | 160,372 | J48, K-NN, random forest, sequential minimal optimization
Roy et al. [10]    | 0.9377   | 0.9173   | 1100     | 1100    | SVM, logistic regression, random forest, K-NN
Garcia et al. [11] | NA       | 0.98     | 30,000   | 24,000  | SVM
Cai et al. [13]    | NA       | 0.9739   | 16,942   | 17,365  | Random forest
Kim et al. [14]    | 0.98     | 0.99     | 21,260   | 20,000  | Multimodal neural network
From the table, it can be observed that our accuracy is at least 0.0002 higher than that of all other works, and our F1-score is at least 0.0043 higher than that of all other works except one. It should be noted that some other works have used much larger datasets than ours. However, Roy et al. [10], Garcia et al. [11] and Kim et al. [14] used datasets where the number of malwares is larger than or equal to the number of benign samples, which is not the case in the real world, so those datasets do not represent the real-world scenario. None of the other works addressed crypto features as our work has.
6 Conclusion and Future Works
A novel approach to address the challenge of string encryption-based obfuscation in android malware detection has been proposed in this study. Encrypted API calls and encrypted strings have been considered as features for our machine learning algorithm. We have also used features from the android manifest file and the dex file. Then, a random forest classifier has been applied, and fivefold cross-validation has been used to validate our results. The importance of our proposed crypto features has been discussed, and our performance has been compared to that of other related works
during performance evaluation. From this discussion, it is quite apparent that our proposed framework has shown excellent performance, and since the crypto features built with crypto-detector have played a vital role in android malware detection, it is evident that the features proposed by us have a significant role against string encryption-based obfuscation. Further studies based on this approach can be carried out on larger datasets in the future, and other important features for android malware detection can be considered to improve the performance further.
References 1. Liu K, Xu S, Xu G, Zhang M, Sun D, Liu H (2020) A review of Android malware detection approaches based on machine learning. IEEE Access 8:124579–124607 2. Qiu J, Zhang J, Luo W, Pan L, Nepal S, Xiang Y (2021) A survey of Android malware detection with deep neural models. ACM Comput Surv 53:1–36 3. Pan Y, Ge X, Fang C, Fan Y (2020) A systematic literature review of Android malware detection using static analysis. IEEE Access 8:116363–116379 4. Zhang X, Breitinger F, Luechinger E, O’Shaughnessy S (2021) Android application forensics: a survey of obfuscation, obfuscation detection and deobfuscation techniques and their impact on investigations. Forens Sci Int: Digit Invest 39:301285 5. Chen Y, Chen H, Takahashi T, Sun B, Lin T (2021) Impact of code deobfuscation and feature interaction in Android malware detection. IEEE Access 9:123208–123219 6. GitHub–Wind-River/crypto-detector. Cryptography detection tool. https://github.com/WindRiver/crypto-detector. Accessed 9 May 2022 7. Wind River. https://www.windriver.com/. Accessed 9 May 2022 8. DeGuard. Statistical deobfuscation for Android. http://apk-deguard.com/. Accessed 9 May 2022 9. Sihag V, Vardhan M, Singh P (2021) BLADE: robust malware detection against obfuscation in android. Forens Sci Int: Digit Invest 38:301176 10. Roy A, Jas D, Jaggi G, Sharma K (2020) Android malware detection based on vulnerable feature aggregation. Procedia Comput Sci 173:345–353 11. Garcia J, Hammad M, Malek S (2018) Lightweight, obfuscation-resilient detection and family identification of Android malware. ACM Trans Softw Eng Methodol 26:1–29 12. Aghamohammadi A, Faghih F (2019) Lightweight versus obfuscation-resilient malware detection in Android applications. J Comput Virol Hack Tech 16:125–139 13. Cai H, Meng N, Ryder B, Yao D (2019) DroidCat: effective Android malware detection and categorization via app-level profiling. IEEE Trans Inf Forens Secur 14:1455–1470 14. Kim T, Kang B, Rho M, Sezer S, Im E (2019) A multimodal deep learning method for Android malware detection using various features. IEEE Trans Inf Forens Secur 14:773–788 15. Al Asad N, Elahi MT, Al Hasan A, Yousuf MA (2020) Permission-based blockchain with proof of authority for secured healthcare data sharing. In: 2020 2nd international conference on advanced information and communication technology (ICAICT). IEEE, pp 35–40 16. Newaz NT, Haque MR, Akhund TMNU, Khatun T, Biswas M, Yousuf MA (2021) IoT security perspectives and probable solution. In: 2021 fifth world conference on smart trends in systems security and sustainability (WorldS4). IEEE, pp 81–86 17. Nirjhor MKI, Yousuf MA, Mhaboob MS (2021) Electronic medical record data sharing through authentication and integrity management. In: 2021 2nd international conference on robotics, electrical and signal processing techniques (ICREST). IEEE, pp 308–313 18. Arp D, Spreitzenbarth M, Hubner M, Gascon H, Rieck K, Siemens CERT (2014) Drebin: effective and explainable detection of android malware in your pocket. Ndss 14:23–26
19. Allix K, Bissyandé TF, Klein J, Le Traon Y (2016) Androzoo: collecting millions of android apps for the research community. In: 2016 IEEE/ACM 13th working conference on mining software repositories (MSR). IEEE, pp 468–471 20. VirusTotal. https://www.virustotal.com/gui/home/upload. Accessed 12 May 2022 21. Apktool—a tool for reverse engineering 3rd party, closed, binary Android apps. https://ibotpeaches.github.io/Apktool/. Accessed 12 May 2022 22. Android Open Source Project. https://source.android.com/. Accessed 12 May 2022 23. Aafer Y, Du W, Yin H (2013) Droidapiminer: mining API-level features for robust malware detection in android. In: International conference on security and privacy in communication systems. Springer, Cham, pp 86–103 24. Android Developers. https://developer.android.com/. Accessed 12 May 2022 25. sklearn.ensemble.RandomForestClassifier. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html. Accessed 12 May 2022
Machine Learning Techniques for Resource-Constrained Devices in IoT Applications with CP-ABE Scheme P. R. Ancy
and Addapalli V. N. Krishna
Abstract Ciphertext-policy attribute-based encryption (CP-ABE) is one of the promising schemes which provide security and fine-grained access control for outsourced data. The emergence of cloud computing allows many organizations to store their data, even sensitive data, in cloud storage. This raises concerns about the security and access control of data stored with a third-party service provider. To solve this problem, CP-ABE can be used. CP-ABE can be used not only in cloud computing but also in other areas such as machine learning (ML) and the Internet of things (IoT). In this paper, the main focus is on discussing the use of the CP-ABE scheme in different areas, mainly ML and IoT. In ML, data sets are trained, and they can be used for decision-making in the CP-ABE scheme in several scenarios. IoT devices are mostly resource-constrained and have to process huge amounts of data, so these kinds of resource-constrained devices cannot use the CP-ABE scheme. Therefore, some solutions for these problems are discussed in this paper, and two security schemes used in resource-constrained devices are discussed. Keywords Machine learning · IoT · CP-ABE · Encryption · Security
1 Introduction
ABE is a type of public-key encryption system. The secret key and ciphertext are generated based on attributes, which form the identity of the user. A ciphertext can be decrypted only when the attributes of the user match the attributes of the access policy. There are mainly two types of ABE: key-policy ABE (KP-ABE) and ciphertext-policy ABE (CP-ABE), as in Fig. 1. In KP-ABE, the user's private key is associated with policies and the ciphertext with a set of attributes, whereas in CP-ABE, the ciphertext is associated with policies and the private key of the user is associated with attributes. Again, based on the number of attribute authorities, ABE can be further classified into single-authority ABE (SA-ABE), where only one attribute authority is involved, and multi-authority ABE (MA-ABE), where multiple independent attribute authorities are involved.
P. R. Ancy (B) · A. V. N. Krishna Computer Science and Engineering Department, School of Engineering and Technology, CHRIST (Deemed to be University), Bangalore, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_40
Fig. 1 Types of ABE
As we know, the field of IoT is growing, which increases the need to protect sensitive data. IoT is seen as the consequence of machine-to-machine (M2M) architecture and its connectivity [1]. Thus, cryptographic models become an important part of it. ML systems improve their performance with experience. ML is mainly used in different areas such as NLP, image recognition, and robotics. The main goal of most ML problems is to find a model that gives the distribution of the input data set [2]. By gaining knowledge about patterns and cybersecurity data and by developing a data-driven model, we can make a security system automatic and intelligent [3]. This makes the computing process more actionable and intelligent than the traditional one. ML models consist of a set of rules, methods, or transfer functions to find data patterns or predict behavior. ML techniques can even be used to solve complicated challenges in network scenarios [4]. IoT is common in our society, and interaction between embedded devices and the cloud server is common in IoT deployments. IoT clouds can store and manage massive IoT data. Security issues arise in this architecture in the context of an untrusted third-party service provider [5]. As a scenario, users of electronic health records (EHRs) share health records that contain sensitive data with third-party cloud service providers, which leads to security and privacy concerns [6]. To solve this problem, CP-ABE is introduced. Traditional CP-ABE methods have several benefits, such as confidentiality, authentication, and access control, although certain issues remain with access policy privacy, information security, malicious insiders, storage complexity, and interoperability with IoT-enabled infrastructures [7]. Different architectures of IoT devices to achieve security are discussed in [8].
1.1 Related Work
In mobile computing, users access cloud-based functions through mobile devices [9]. This can increase productivity but has introduced security issues. The paper reviews different ABE schemes used in mobile cloud computing. As we know, mobile devices have limited resources such as storage, battery, and processing capacity. To overcome these limitations, mobile cloud computing is introduced. But data confidentiality is the main concern in this kind of environment: whether the third party will get the user data. To overcome this problem, researchers have introduced different schemes to securely store encrypted data with the third-party service provider; one of these schemes is ABE. Their study compares different schemes in terms of computation and communication. It is found that more studies have to be carried out in the mobile cloud to improve encryption and decryption speed and also to reduce complexity. Machine learning is commonly used for prediction and classification. Security issues arise when the training and testing processes include sensitive data [10]. The authors introduced a system that protects machine learning engines in IoT without changing their internal structure, and the proposed structure reduced time consumption. Machine learning is used in many applications such as detecting and preventing malicious activity. Machine learning engines are prone to security vulnerabilities: they perform computations based on input data, so if malicious training data are fed as input, it will lead to misclassification. For this, they used the CP-ABE module for encryption and decryption of client data and, further, on machine learning engines. The ML algorithm uses a multilayer perceptron (MLP) consisting of three layers. Numerous amounts of data are generated in the smart city [11]. These data contain both private and sensitive information that has to be secured from unauthorized users. For this, CP-ABE can be used to encrypt data, which allows the encryptor to define the access policy. But this has two limitations: the first is that the access policy is exposed, and the second is the decryption time. For solving the issue related to the access policy, they proposed a security model called chosen sensitive policy security. Moreover, this scheme applies to resource-constrained devices. A huge amount of data is generated in industrial production every day [12]. Using cloud technology is the best solution for managing and storing these large amounts of data. But there are some concerns about cloud technology, such as privacy and security. The concept of outsourcing emerged as a solution to address this issue; in outsourcing, data are encrypted before uploading. To achieve this, the ABE scheme is introduced, and it is one of the best available schemes. The main issue the authors addressed is that the key generation center (KGC) holds the keys and attributes of all users, which raises the concern of whether the KGC could access sensitive data. To solve this issue, the authors introduced an attribute auditing center (AAC) along with the KGC, in which the KGC deals with private keys and the AAC stores attributes. Privacy preservation is the main concern in data mining and machine learning [13]. For achieving this, encrypted data are used to train machine learning models. This paper discussed the implementation of ML algorithms using encrypted data. To address the poor performance of operations in IoT, the author proposed a security method
that is based on Hadoop and double-secret key encryption [14]. The main issue with the existing CP-ABE scheme is that it may leak user information [7]; to solve this problem, the authors introduced a new technique that hides the access policy using a hashing algorithm. A delegation-based scheme is compared with a full encryption technique [15] for data privacy in cloud services. To improve security in IoT devices, different techniques based on ML and DL are discussed [16]. The authors of [17] conducted a survey of different ABE schemes, checking the feasibility of applying them in mobile devices to improve computation and reduce complexity. A scheme used in embedded devices to achieve fast encryption and memory efficiency is discussed in [18]. Design constraints for IoT devices to achieve security are discussed in [19]. For securing different IoT devices, different ML techniques are discussed in [20].
1.2 Our Contribution
In this paper, we have discussed the CP-ABE scheme and its use in two main areas, machine learning and the Internet of things, focusing mainly on the security aspects.
1. We discussed general aspects of the CP-ABE scheme, mainly the model and framework.
2. Considering the security issues in the field of ML, we discussed the use of CP-ABE schemes.
3. We explained how the CP-ABE scheme can be used in IoT for providing security and fine-grained access control.
1.3 Organization
The paper is organized as follows: Sect. 2 presents some definitions of terminologies; Sect. 3 describes the use of the CP-ABE scheme in the field of ML; Sect. 4 describes the CP-ABE scheme in the field of IoT; finally, Sect. 5 discusses some open problems.
2 Preliminaries
2.1 Bilinear Mapping
Let G and G_T be two multiplicative cyclic groups of prime order p. A bilinear mapping on these groups is a map e : G × G → G_T with the following properties. (1) Bilinearity: for all g ∈ G and any α, β ∈ Z_p, we have e(g^α, g^β) = e(g, g)^(αβ). (2) Nondegeneracy: e(g, g) ≠ 1, i.e., e(g, g) is not the identity element of G_T. (3) Computability: e(g1, g2) can be computed efficiently for all g1, g2 ∈ G.
2.2 Access Structures
Let the set {P_0, P_1, …, P_{n−1}} denote the parties. A collection A ⊆ 2^{P_0, P_1, …, P_{n−1}} is called monotone if X ∈ A and X ⊆ Y imply Y ∈ A. An access structure (respectively, a monotone access structure) is a (monotone) collection A of non-empty subsets of {P_0, P_1, …, P_{n−1}}. The sets in A are called the authorized sets, and the sets not in A are the unauthorized sets.
2.3 CP-ABE
A ciphertext-policy attribute-based encryption scheme encrypts messages under an access policy. To decrypt, a user must own a private key based on a set of attributes satisfying the access policy. The system model is shown in Fig. 2.
1. Attribute authority: The AA is the entity that controls the attribute universe; it generates the master secret key and the public parameters.
Fig. 2 System model
2. User: The one who tries to access a message.
3. Data owner: The owner is the one who has a resource-constrained device and stores data.
4. Semi-trusted storage: The place that stores the ciphertext, where users keep data; it may not be trusted in the system.
CP-ABE consists of four algorithms: Setup, Encrypt, KeyGen, and Decrypt.
Setup(λ, U). This algorithm takes as input a security parameter λ and the attribute universe U, and outputs the public parameters PK and the master key MK.
Encrypt(PK, A, M). The encryption algorithm takes as input the public parameters PK, a message M, and an access structure A. It encrypts the message and produces the ciphertext CT as output.
KeyGen(MK, S). This algorithm takes as input the master key MK and a set of attributes S, and generates the private key SK.
Decrypt(PK, CT, SK). The decryption algorithm takes as input the public parameters, the ciphertext with its access policy, and the private key SK; it decrypts the ciphertext and recovers the message.
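The interaction of the four algorithms is sketched below as a structural illustration only: the class performs no real cryptography (the access policy is reduced to a plain attribute-subset check), and in practice a pairing-based CP-ABE library would supply Setup, KeyGen, Encrypt and Decrypt.

class ToyCPABE:
    # Structural placeholder that mirrors the Setup/KeyGen/Encrypt/Decrypt interfaces above.
    def setup(self, security_parameter, attribute_universe):
        return {"universe": set(attribute_universe)}, {"master": "MK"}

    def keygen(self, pk, mk, attributes):
        return {"attributes": set(attributes)}

    def encrypt(self, pk, message, policy):
        # policy is modelled here as a set of attributes that must all be present (an AND policy)
        return {"policy": set(policy), "payload": message}

    def decrypt(self, pk, ct, sk):
        # Decryption succeeds only when the key's attributes satisfy the ciphertext policy.
        return ct["payload"] if ct["policy"].issubset(sk["attributes"]) else None

cpabe = ToyCPABE()
pk, mk = cpabe.setup(128, {"doctor", "cardiology", "nurse"})
sk = cpabe.keygen(pk, mk, {"doctor", "cardiology"})
ct = cpabe.encrypt(pk, "patient record", {"doctor", "cardiology"})
assert cpabe.decrypt(pk, ct, sk) == "patient record"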
3 CP-ABE and Machine Learning
3.1 Smart Offloading Technique for CP-ABE Schemes
As we know, CP-ABE is a public-key encryption system that provides fine-grained access control to data stored with a third-party cloud service provider. It supports outsourcing data, which means encrypting the data before storing it with a third party. Here, encryption is done on the user device. One of the main issues is that performing such operations on resource-constrained devices is costly because of their limited resources. To solve this problem, an adaptive CP-ABE scheme is proposed which performs smart offloading from full encryption to partial encryption based on a decision strategy; a machine learning algorithm is used for making this decision [1].
3.2 Full Versus Partial Encryption
Through offloading, most of the encryption or decryption tasks can be assigned to a remote/proxy machine. Performing all ABE tasks on the same, resource-constrained device is called full encryption, whereas in partial encryption some tasks, which are based on dummy attributes, are offloaded to a remote server/proxy. Full encryption is done on-device when the available CPU, battery, and other resources are sufficient to perform the operation; otherwise, the system goes for partial encryption. This decision-making process is rule-based.
3.3 Decision Variable
Let X_Di denote the decision variable, defined as:
X_Di = 0 for full encryption, and X_Di = 1 for partial encryption.
To achieve another optimization objective, all factors that increase the time to generate the ciphertext are used. Based on these, the scheme decides whether to perform CP-ABE locally or to offload. This is called an adaptive encryption scheme, or adaptive CP-ABE. ML is used to select the appropriate encryption technique. The accuracy of a machine learning algorithm is defined by the following equation:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
where True Positive (TP) is a positive prediction that matches an actual positive, False Positive (FP) is a positive prediction for an actual negative, True Negative (TN) is a negative prediction that matches an actual negative, and False Negative (FN) is a negative prediction for an actual positive. The exactness and quality can be measured by the precision metric, Precision = TP / (TP + FP). The recall metric assesses the algorithm's completeness or quality, Recall = TP / (TP + FN). The F-measure is the harmonic mean of recall and precision, F-measure = 2 · (Precision · Recall) / (Precision + Recall). The best-fit machine learning algorithm is the decision tree algorithm. It is used to take the decision X_D, i.e., whether, based on the observed values, the algorithm performs complete or partial CP-ABE encryption.
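A hedged sketch of such an adaptive decision is given below. The text only states that resource levels and the factors that increase ciphertext generation time drive the decision; the specific feature columns, the toy training data and the tree depth used here are illustrative assumptions.

from sklearn.tree import DecisionTreeClassifier

# Columns: CPU load (%), battery level (%), number of policy attributes, plaintext size (kB).
X_train = [[20, 90, 5, 64], [85, 15, 20, 512], [50, 60, 10, 128], [95, 10, 30, 1024]]
y_train = [0, 1, 0, 1]   # X_D = 0: full on-device encryption, 1: partial (offloaded) encryption

decider = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

def choose_mode(cpu, battery, n_attrs, size_kb):
    x_d = decider.predict([[cpu, battery, n_attrs, size_kb]])[0]
    return "partial (offload to proxy)" if x_d == 1 else "full (on device)"

print(choose_mode(cpu=75, battery=25, n_attrs=18, size_kb=256))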
4 CP-ABE and Internet of Things
4.1 IoT-Fog-Cloud Architecture
Existing cloud service providers have low computing capability, slow responses, and limited resources. To solve this problem, an IoT-Fog-Cloud architecture is introduced. Already existing ABE schemes are not suitable for this architecture, so a new CP-ABE scheme is introduced for this architecture to solve the above
problem. Fog is used in this instance to handle the costly offline encryption by creating an interim ciphertext pool with the aid of the Chameleon hash function. This scheme is mainly useful for resource-constrained IoT devices.
4.2 Chameleon Hash Functions
A chameleon hash function is a hash function equipped with a key pair (pk_ch, sk_ch). Anyone who has the public key pk_ch can easily compute the hash value for any input. It consists of three polynomial-time algorithms:
• KeyGen_ch(1^λ) → (sk_ch, pk_ch): takes a security parameter λ ∈ N as input and outputs the key pair (sk_ch, pk_ch).
• Hash_ch(pk_ch, m, r_ch) → H_m: outputs a hash value H_m from the public key, a message m, and a random parameter r_ch.
• Forge_ch(sk_ch, m, r_ch, m′) → r′_ch: given the trapdoor sk_ch, the original input (m, r_ch) and another message m′ ≠ m, it outputs another random parameter r′_ch such that Hash_ch(pk_ch, m, r_ch) = Hash_ch(pk_ch, m′, r′_ch) = H_m.
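To illustrate the trapdoor-collision property that the fog node relies on, the toy sketch below implements a Krawczyk-Rabin style chameleon hash over a deliberately tiny group; the parameters are far too small for real use and are chosen only so that the arithmetic is easy to follow (the modular inverse via pow(x, -1, q) needs Python 3.8+).

# Toy chameleon hash over the order-q subgroup of Z_p* with p = 23, q = 11, g = 2.
p, q, g = 23, 11, 2

def keygen(x):
    return x % q, pow(g, x, p)                 # (sk_ch, pk_ch) with pk_ch = g^x mod p

def ch_hash(pk, m, r):
    # Hash_ch(pk, m, r) = g^m * pk^r mod p
    return (pow(g, m % q, p) * pow(pk, r % q, p)) % p

def forge(sk, m, r, m_new):
    # Find r' with m + sk*r = m_new + sk*r' (mod q), so both inputs hash to the same value.
    return (r + (m - m_new) * pow(sk, -1, q)) % q

sk, pk = keygen(7)
h = ch_hash(pk, m=5, r=3)
r_new = forge(sk, 5, 3, m_new=9)
assert ch_hash(pk, 9, r_new) == h              # collision found using the trapdoor sk_ch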
4.3 System Model
The system model consists of the following functions and phases:
• It should have a quick encryption phase for a resource-constrained owner.
• Almost all computation work is transferred to the fog device (FD).
• A filter phase is carried out by the FD without any extra secret data, i.e., it checks whether a ciphertext is valid or not; if it is invalid, it is thrown away.
The system's foundation is a straightforward online/offline CP-ABE method. There are five algorithms:
• Setup(1^λ) → (pp, msk): The AA executes this algorithm. It accepts the system security parameter and outputs the master key msk and the system public parameters pp.
• KeyGen(pp, msk, S) → sk: The AA also executes this algorithm. The user's attribute set S, the master key msk, and the system public parameters pp are the inputs, and the output is the private key sk.
• Encrypt_off(pp) → CT_off: The FD uses this algorithm. The input is the system public parameters pp, and the output is the corresponding offline ciphertext, denoted CT_off.
• Encrypt_on(pp, m, CT_off, (M, ρ)) → (K, CT): This algorithm is executed by the DO. The inputs are pp, m, (M, ρ), and CT_off. The output is the online ciphertext CT, embedded with a session key K.
• Decrypt_on(CT, pp, sk) → m or ⊥: Both the FD and the DO execute this algorithm. The inputs are pp, CT, and the user's sk. The message m or ⊥ is generated as output.
5 Discussion and Conclusion
In the first instance, it can be seen that full encryption is always faster than partial encryption. For the full encryption technique, the total time is the amount of time needed to complete the CP-ABE procedure prior to uploading the data. In partial encryption, the total time is measured by adding the transmission time between the user device and the proxy machine to the total time of the CP-ABE operations on the user device. If we measure CPU usage, we find that the full encryption scheme uses more CPU than partial encryption, which is a big concern for resource-constrained devices. To solve this problem, the proposed solution is to use the IoT-Fog-Cloud architecture for secure data sharing with the CP-ABE scheme discussed above, because online/offline encryption reduces CPU usage and public verification reduces the user's computational burden. This improves the overall performance of resource-constrained devices. This paper discussed two scenarios in which CP-ABE can be used to improve security.
References 1. Bany Taha M, Ould-Slimane H, Talhi C (2020) Smart offloading technique for CP-ABE encryption schemes in constrained devices. SN Appl Sci 2(2):1–19 2. Hettwer B, Gehrer S, Güneysu T (2020) Applications of machine learning techniques in sidechannel attacks: a survey. J Cryptogr Eng 10(2):135–162 3. Sarker IH, Kayes ASM, Badsha S, Alqahtani H, Watters P, Ng A (2020) Cybersecurity data science: an overview from a machine learning perspective. J Big data 7(1):1–29 4. Cheng Y, Geng J, Wang Y, Li J, Li D, Wu J (2019) Bridging machine learning and computer network research: a survey. CCF Trans Netw 1(1):1–15 5. Chi BC, Bica I, Patriciu VV, Pop F (2018) A security authorization scheme for smart home Internet of Things devices. Future Gener Comput Syst 86:740–749 6. Liu X, Xia Y, Yang W, Yang F (2018) Secure and efficient querying over personal health records in cloud computing. Neurocomputing 274:99–105 7. Chinnasamy P, Deepalakshmi P, Dutta AK, You J, Joshi GP (2021) Ciphertext-policy attributebased encryption for cloud storage: toward data privacy and authentication in AI-enabled IoT system. Mathematics 10(1):68 8. Mrabet H et al (2020) A survey of IoT security based on a layered architecture of sensing and data analysis. Sensors 20(13):3625 9. Sujatha U, Saranya U, Boopathy CP (2019) Use of attribute-based encryption for secure data access control in mobile cloud computing—a case study 10. Kurniawan A, Kyas M (2019) Securing machine learning engines in IoT applications with attribute-based encryption. In: 2019 IEEE international conference on intelligence and security informatics (ISI). IEEE, pp 30–34 11. Meng F, Cheng L, Wang M (2021) Ciphertext-policy attribute-based encryption with the hidden sensitive policy from keyword search techniques in the smart city. EURASIP J Wirel Commun Netw 1:1–22
12. Song Y, Wang H, Wei X, Wu L (2019) Efficient attribute-based encryption with privacypreserving key generation and its application in the industrial cloud. Security and communication networks 13. González-Serrano FJ, Amor-Martín A, Casamayón-Antón J (2018) Supervised machine learning using encrypted training data. Int J Inf Secur 17(4):365–377 14. Duan Y, Li J, Srivastava G, Yeh JH (2020) Data storage security for the internet of things. J Supercomput 76(11):8529–8547 15. Taha MB, Talhi C, Ould-Slimane H (2019) Performance evaluation of CP-ABE schemes under constrained devices. Procedia Comput Sci 155:425–432 16. Hussain F et al (2020) Machine learning in IoT security: current solutions and future challenges. IEEE Commun Surv Tutor 22(3):1686–1721 17. Moffat S, Hammoudeh M, Hegarty R (2017) A survey on ciphertext-policy attribute-based encryption (CP-ABE) approaches to data security on mobile devices and its application to IoT. In: Proceedings of the international conference on future networks and distributed systems 18. Venema M, Alpár G (2022) TinyABE: unrestricted ciphertext-policy attribute-based encryption for embedded devices and low-quality networks. Cryptology ePrint archive 19. Javed B, Iqbal MW, Abbas H (2017) Internet of things (IoT) design considerations for developers and manufacturers. In: 2017 IEEE international conference on communications workshops (ICC Workshops). IEEE 20. Sagu A, Gill NS (2020) Machine learning techniques for securing IoT environment. Int J Innov Technol Expl Eng (IJITEE) 9
Safely Sending School Grades Using Quick Response Code Roxana Flores-Quispe and Yuber Velazco-Paredes
Abstract In recent years, the use of technology has increased rapidly: many people send large amounts of text or image information every day, and in schools or other study centers teachers sometimes need to send private information, such as grades, to their students or their students' parents. Because this information is generally intended for a particular student, it is necessary to implement a tool that helps send it securely over the Internet, and in these cases encryption plays a fundamental role in security. On the other hand, the quick response (QR) code is widely used to provide easy access to information, and it can also be applied to secure communications using the different devices people use to access the Internet. For that reason, this paper proposes a new method to send secure information about students' grades using a QR code. In order to keep the information safe, we propose using 3 × 3 patterns to represent each digit, and the information is distributed at random positions in the QR code, whose average variation percentage was 1.142857143, which is within the margin of error allowed in QR codes. Finally, the experiments have demonstrated that the proposed method achieves successful results, and each student with the correct password is able to retrieve only their own grades. Keywords QR code · Encryption · Patterns · Data hiding
1 Introduction The use of Information and Communication Technology (ICT) in the education realm has increased quite rapidly. This concept leads to the improvement of learning quality. The era of industrial revolution 4.0 demands efficiency, digitalization, and automation.
R. Flores-Quispe (B) · Y. Velazco-Paredes School of Computer Science, Universidad Nacional de San Agustín de Arequipa, Arequipa, Peru e-mail: [email protected] Y. Velazco-Paredes e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_41
It provides a golden opportunity for those who are able to apply information and communication technology in various fields, especially in education [1]. In addition, nowadays people need to send and receive reliable text or graphic information, whether for work, study, or many other activities, through the Internet, which has provided numerous benefits, including increased social support, academic enrichment, and worldwide cross-cultural interactions, but there are concomitant risks to Internet use [2], for example when confidential information must be sent. This is where encryption plays an essential role in securing information [3].
On the other hand, different kinds of codes are used to store, retrieve, and manage information. Mark Weiser's vision of ubiquitous computing underlines the need to seamlessly unify computers and humans around the notion of a rich environment. He explained: "The most profound technologies are those that disappear" [4]. One of these codes is the quick response (QR) code, a two-dimensional code that consists of light and dark squares, referred to as modules [5]. It is considered the most common two-dimensional bar code and has the advantages of low cost, easy production, durability, and so on [6, 7]. QR codes have been approved by the International Organization for Standardization (ISO) and are freely available [8] to generate and access data quickly. QR codes are also used in modern commercial applications for brand promotion, enriching consumer usage experience, and interactive labeling for sharing product information, including promotional videos, web links, etc. In addition, QR codes are integrated with government service platforms for the effective delivery of utility and administrative services to the public. The simplicity of QR code generation and scanning with cheap smartphones and IoT devices has led to their extensive adoption by commercial and nonprofit organizations [9]. They help to obtain valuable information regarding consumer behavior, demographic information, and response rates. QR codes also help to collect customer reviews via websites; after collecting online reviews, marketers can understand the behavior of consumers [10].
Also in schools, QR codes can connect contextual information on smartphones to the physical environment, working as a learning tool that may be used for orienting students in their interaction with the physical environment [11]. Thus, the use of QR codes in classrooms has been identified as an important tool in promoting active as well as distributed learning [12]. However, QR codes can also be misleading due to the difficulty of differentiating a genuine QR code from a malicious one [13]. For that reason, it is important to create a method that provides security when the information is sent.
2 Literature Review In recent years, many studies have been developed on the use of QR codes as tools in the learning process, in commercial applications, or to share information.
In the educational field, Syarifuddin et al. [1] proposed a pre-experiment study with a one-group pretest-posttest design to use QR codes in learning during the COVID-19 pandemic. The study was conducted by giving a treatment to one particular class and then comparing the circumstances before and after the treatment. Learning motivation showed a positive result, and the response of the students was also positive; hence, there was an increase in students' learning achievement after QR codes had been used in the learning process. Uçak [14] proposed a study to investigate the opinions of prospective science teachers about the use of QR codes in the teaching materials that they prepared in the Instructional Technologies and Material Development course. In this case study, the data were collected through semi-structured interviews. The study revealed the perspectives of prospective science teachers on the use of QR codes in the teaching materials they prepared, in the learning process, and on the advantages, disadvantages, and effects of QR codes on the prepared materials. The majority of the prospective teachers stated that QR codes in the teaching materials aroused interest in students, and that the use of QR codes in games entertained the students. They concluded that QR codes form a bridge between teachers or students and information: students can access content on mobile web pages directly and quickly, and thus, as the prospective teachers indicated, science and technology are integrated. Liu et al. [15] proposed a method to implement mobile learning of the English language, where each student follows a guide map displayed on the phone screen to visit learning zones and decrypt QR codes. The detected information is then sent to the learning server to request and receive context-aware learning material wirelessly. A case study and a survey conducted at the university demonstrate the effectiveness of the proposed m-learning system. In addition, Weng et al. [16] propose a method to embed the QR code into the lowest bit of the pixels of a background image in order to hide the information; the method does not reduce the picture quality, and the information can be accurately transmitted and identified. Ajini and Arun [17] developed research to transmit data between mobile devices using AES encryption, where the original message is obtained by applying the Fast Fourier Transform (FFT) to the QR code captured by the receiver, followed by demodulation. Since the data is stored in the phase difference, adjacent elements are less affected by motion blur distortions, and the data transfer rate can be increased by raising the bits per symbol from the current 2-bits-per-symbol constellation. Finally, the captured image was successfully decoded and decrypted to obtain the actual data. Other papers propose encoding information into the QR codes themselves, such as Chou and Wang [18], whose scheme comprises two QR codes with individual messages on a shared square image. The two QR codes can be read separately by slightly adjusting the distance and angle at which images are obtained using standard QR code readers. Construction methods for generating the proposed nested QR code with high decoding robustness are presented, and the experimental results verify the feasibility of the method. Most research papers show the use of QR codes in the educational area as an interesting tool to learn any subject, but there are very few applications to send
confidential information to an individual person through a QR code. In this sense, only the student with the correct password will be able to receive the information in the QR code.
3 Methodology—Proposed Method Figure 1 shows the graphical representation of the proposed method to send secure information about the students' grades using QR codes.
Fig. 1 Flowchart of proposed method to send students’ grades using QR codes
3.1 Generation of a QR Code for Each Student At first, the professor needs a report with three important pieces of information for each student: a CUI (unique student code), a grade, and a password. With this information, a QR code is generated for each student in order to encrypt the student's grade.
3.1.1 QR Code Structure
QR codes were introduced by the Japanese company Denso-Wave in 1994 as a kind of two-dimensional (matrix) code; initially they were used to track inventory in vehicle parts manufacturing and are now used in a variety of industries [4]. From that moment on, they came into general use as identification marks for all kinds of commercial products, advertisements, and other public announcements [19]. The QR code itself is an array of bits read by a scanner, which must be able to identify and orient the image as well as read the version and format information [20]. Figure 2 shows the basic structure of the QR code: quiet zone, position detection patterns, separators for position detection patterns, timing patterns, alignment patterns, format information, version information, data, and error correction codewords [21]. The remaining bits are used to encode the message, and the specific amount of available space left over depends on the version of the QR code, which determines the number of bits per row/column, and on the level of error correction, which introduces redundancy [20]. In our research, we have used these remaining bits, located in the lower right area below the detection pattern, and we have 144 bits to store the CUI of the student. For example, for the student code 20212024, Fig. 3 shows the corresponding QR code.
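As an illustration of this step, the short sketch below generates a QR code from a student's CUI using the widely available Python qrcode package; the choice of library is an assumption for illustration (the paper only states that error correction level L is used), and the module matrix it exposes is what the embedding steps in the following subsections operate on.

```python
# Hedged sketch: generate the per-student QR code from the CUI with the
# `qrcode` package (an assumed tool, not necessarily the one used here).
import qrcode

cui = "20212024"
qr = qrcode.QRCode(error_correction=qrcode.constants.ERROR_CORRECT_L)
qr.add_data(cui)
qr.make(fit=True)
qr.make_image(fill_color="black", back_color="white").save("student_20212024.png")

# Module matrix (True = dark module); the embedding steps below flip a few
# of these modules to hide the grade and the password.
matrix = qr.get_matrix()
```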
Fig. 2 QR code structure [21]
Fig. 3 QR code generated to the CUI 20212024
Fig. 4 Sort patterns to the numbers 0 to 9
Fig. 5 Mixing the grade 17 and the password 2451
3.1.2 Generation of Patterns
In this stage, ten different patterns will be used in order to encrypt each number corresponding to the password or grade. In this case, we propose to use a 3 × 3 binary grid. Figure 4 shows the binary grids for each number.
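A minimal sketch of this mapping is shown below. The actual 3 × 3 grids are those of Fig. 4 and are not reproduced here; the placeholder grids in the code are hypothetical and only illustrate the data structure, in which each digit becomes a 9-character bit string read row by row.

```python
# Placeholder 3 x 3 grids: each digit d is mapped to the 9-bit binary form
# of d purely for illustration; the real grids are defined in Fig. 4.
PATTERNS = {
    str(d): [[(d >> (8 - (3 * r + c))) & 1 for c in range(3)] for r in range(3)]
    for d in range(10)
}

def digit_to_bits(d: str) -> str:
    """Flatten a 3 x 3 grid row by row into a 9-character bit string."""
    return "".join(str(bit) for row in PATTERNS[d] for bit in row)

print(digit_to_bits("7"))   # "000000111" with the placeholder grids
```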
3.1.3 Encrypt the Information in the QR Code
Then the digits of the password and the digits of the grade are mixed. The first value is the first character of the password, the second value is the first character of the grade, the third and fourth characters are the second and third characters of the password, the fifth character is the second character of the grade, and finally, the last character is the last character of the password. An example is shown in Fig. 5. After that, according to the patterns in Fig. 4, each grid has 9 values; since the grade is 2 characters long and the password is 4 characters long, 54 values are necessary to represent the information of the password and the grade. Each pattern is converted into a string of length 9, where the first 3 values correspond to the first row of the pattern, the next values correspond to the second row, and the last values correspond to the third row. Figure 6 shows the inputs of the encryption process when the grade of the student is 17 and the password is 2451.
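The sketch below follows this mixing order and builds the 54-character string, reusing the hypothetical digit_to_bits helper from the previous sketch; it is only an illustration of the described procedure.

```python
# Mix the grade and password digits (password, grade, password, password,
# grade, password) and expand each digit into its 9-bit pattern string.
def build_mask(grade: str, password: str) -> str:
    mixed = password[0] + grade[0] + password[1] + password[2] + grade[1] + password[3]
    return "".join(digit_to_bits(d) for d in mixed)

mask = build_mask("17", "2451")   # mixed digits "214571"
assert len(mask) == 54
```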
Fig. 6 Code of 54 characters to represent the grade 17 and the password 2451
Fig. 7 QR code with 54 characters chosen randomly; the yellow blocks represent the 6 positions with value one, which could change in their values
Using the CUI as a seed, 54 different random positions are calculated to store the 54 characters of the code.
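A possible realization of this step is sketched below: seeding a pseudo-random generator with the CUI makes the 54 positions reproducible on the receiving side. The use of Python's random module and the way the module count is obtained are illustrative assumptions.

```python
import random

def choose_positions(cui: str, module_count: int, k: int = 54) -> list:
    rng = random.Random(cui)                     # the CUI acts as a reproducible seed
    return rng.sample(range(module_count), k)    # k distinct random positions

# Reusing the matrix from the generation sketch above.
positions = choose_positions("20212024", len(matrix) * len(matrix[0]))
```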
3.1.4 XOR Operation
Then, an XOR operation is applied at each random position between the original information of the generated QR code and the 54-character code, which is treated as a mask. Following the rules of the XOR operator, the value of the QR code is changed only if the value of the mask is 1. For that reason, at most 6 values will be changed in the QR code. Figure 7 shows the characters that could be changed.
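The sketch below shows one way to apply this XOR step on the flattened module matrix, reusing matrix, positions, and mask from the previous sketches; it is an illustration of the described operation, not the authors' implementation.

```python
# XOR embedding: a module is flipped only where the corresponding mask bit
# is 1, so at most six modules of the QR code change.
def embed(matrix, positions, mask):
    size = len(matrix[0])
    flat = [1 if m else 0 for row in matrix for m in row]   # row-by-row flatten
    for pos, bit in zip(positions, mask):
        flat[pos] ^= int(bit)
    return [flat[i * size:(i + 1) * size] for i in range(len(matrix))]

stego_matrix = embed(matrix, positions, mask)
```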
3.1.5 Retrieval of the Grade of the Student
When the students receive the QR code, they can obtain their grade in a safe way using their password and their student code. Figure 8 shows the retrieval process: the student needs to enter the resulting QR code and the correct password.
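Because XOR is its own inverse, a receiver who regenerates the clean QR code from the CUI and recomputes the seeded positions can read the 54-bit mask back out, decode the digits, and use the password digits as a check. The sketch below illustrates one such retrieval under those assumptions, reusing the hypothetical helpers above; the paper does not spell out this exact decoding procedure, and the pattern decoding here relies on the placeholder grids rather than the real ones of Fig. 4.

```python
# Inverse lookup for the (placeholder) digit patterns.
BITS_TO_DIGIT = {digit_to_bits(str(d)): str(d) for d in range(10)}

def retrieve_grade(received, clean, positions, password):
    flat_r = [1 if m else 0 for row in received for m in row]
    flat_c = [1 if m else 0 for row in clean for m in row]
    # XOR of received and clean modules at the seeded positions gives the mask.
    bits = "".join(str(flat_r[p] ^ flat_c[p]) for p in positions)
    digits = [BITS_TO_DIGIT[bits[i:i + 9]] for i in range(0, 54, 9)]
    # Undo the mixing order: password, grade, password, password, grade, password.
    recovered_password = digits[0] + digits[2] + digits[3] + digits[5]
    grade = digits[1] + digits[4]
    return grade if recovered_password == password else None

print(retrieve_grade(stego_matrix, matrix, positions, "2451"))   # "17"
```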
Fig. 8 Process to retrieve the grade of the student
4 Experiments and Results Table 1 shows the QR codes generated with the same password for four different student codes and four different grades. Each row of Table 1 shows the QR codes corresponding to one student using the same password but with different grades. For each student, five different codes have been generated, and the difference between them is very small; this is a key point because users will not be able to recognize the grade sent in each QR code. Using the Hamming distance, which computes the number of positions at which two strings of equal length differ (for example, the distance between the strings 010 and 101 is D = 3, and the distance between the strings week and weak is D = 1 [22]), it is possible to find the percentage of variation that affects a QR code when only the grade is changed. According to the values in Table 2, the average of the percentages of variation reaches 1.142857143, which is within the margin of error allowed in QR codes [23].
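A small sketch of the Hamming-distance check is given below; the 441-module count used for the percentage is an assumption that merely appears consistent with the values reported in Table 2 (for instance, a distance of 5 over 441 modules gives about 1.13%).

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

print(hamming("010", "101"))     # 3
print(hamming("week", "weak"))   # 1

# Variation percentage = 100 * distance / total number of modules.
print(round(100 * 5 / 441, 2))   # 1.13, matching a typical row of Table 2
```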
Table 1 QR codes generated for different students (rows: student codes 20191678, 20202458, 20213452, and 20221693; columns: without grade, and with grades 6, 9, 14, and 18; each cell is the corresponding generated QR code image)
Table 2 Calculating the Hamming distance (CUI 20211724 and password 2451 in every row)
Grade | Hamming distance | Percentage of variation
00 | 4 | 0.91
01 | 5 | 1.13
02 | 1 | 0.23
03 | 2 | 0.45
04 | 5 | 1.13
05 | 5 | 1.13
06 | 5 | 1.13
07 | 5 | 1.13
08 | 5 | 1.13
09 | 5 | 1.13
10 | 5 | 1.13
11 | 6 | 1.36
12 | 6 | 1.36
13 | 6 | 1.36
14 | 6 | 1.36
15 | 6 | 1.36
16 | 6 | 1.36
17 | 6 | 1.36
18 | 6 | 1.36
19 | 6 | 1.36
20 | 5 | 1.13
Fig. 9 Comparative between the cumulative values from the 9th to 20th columns for four different QR codes (y-axis: cumulative values; x-axis: number of columns in the QR code; series: grades 6, 9, 14, and 18)
Fig. 10 Comparative between QR code histograms including grades and without grade (y-axis: number of values one in the QR code; x-axis: number of columns in the QR code; series: original, grade 8, and grade 20)
In addition, Fig. 9 shows the variability of the values from the 9th to 20th columns of the QR codes generated for the four different grades. In this case, we can see that there is no relationship between these values, so the efficiency of the proposed encryption method is guaranteed. Figure 10 shows the histograms of two QR codes including the student code and the student grade, together with the histogram of the QR code with the student code but without a grade. The key point is that the difference between them is not perceptible, that is, the students' grades cannot be seen with the naked eye. Figure 11 shows the number of bits that change in each QR code for grades 0 to 20. This represents an average variation percentage of 1.142857 with respect to the original image for each grade.
Fig. 11 Number of bits that change value for each grade in the QR code (y-axis: number of bits changed in the QR code; x-axis: different grades of the student)

Table 3 Comparison between different information hiding methods [24]
Method | Hash function [25] | Symmetric key [26] | SD-EQR [27] | Reversible data hiding [28] | Proposed method
Basic application | Secret hiding | Secret hiding | Secret hiding | Image hiding | Secret hiding
Computational complexity | Low | Low | Low | High | Low
Processing on QR code | No | No | No | No | No
Utilizing the error correction capability | Yes | Yes | Yes | No | Yes
Encryption on data before embedding | Yes | Yes | Yes | No | Yes
Hiding mechanism | Encrypted data embedded into QR code | Encrypted data embedded into QR code | Encrypted data embedded into QR code | QR barcode of data embedded into cover image | Encrypted data embedded into QR code
Random selection capability to hide data | No | No | No | No | Yes
Based on [24], Table 3 shows a comparison between our proposed method and the methods using a hash function [25], the TTJSA symmetric key algorithm [26], SD-EQR [27], and reversible data hiding [28]. According to Table 3, one of the main advantages of our method is the random distribution used to store the hidden data within the QR code, which makes it much harder for unwanted people to break the keys and retrieve the information.
5 Conclusions In this paper, a new method has been proposed that uses the QR code to encrypt the students' grades, and the experimental results show that the method produces satisfactory results because it is not easy to find the grade in the QR code sent by the teacher if the student does not have the correct password. In addition, the percentage of bits changed in the generated QR code is quite low, reaching 1.142857, which is within the margin of error allowed in QR codes; this value also lies within level L, the lowest error correction level, which allows recovery of up to 7% damage. Likewise, these QR codes could be sent publicly, but only students who know their password and their student code can retrieve their grades, because our proposed method uses a random selection to hide the information, which demonstrates the efficiency of sending encrypted information using QR codes.
References 1. Syarifuddin Syarifuddin MA, Takdir T, Mirna M (2021) Effectiveness of QR-code in learning during covid-19 pandemic. In: EAI 2. Moreno MA, Egan KG, Bare K, Young HN, Cox ED (2013) Internet safety education for youth: stakeholder perspectives. BMC Public Health 13 3. Quenaya MR, Villa-Herrera AA, Ytusaca SF, Ituccayasi JE, Velazco-Paredes Y, Flores-Quispe R (2021) Image encryption using an image pattern based on advanced encryption standard. In: 2021 IEEE Colombian conference on communications and computing (COLCOM), pp 1–6 4. Rouillard J (2008) Contextual QR codes. In: Third international multi-conference on computing in the global information technology, pp 50–55 5. Chow Y-W, Susilo W, Tonien J, Vlahu-Gjorgievska E, Yang G (2018) Cooperative secret sharing using QR codes and symmetric keys. Symmetry 10(4) 6. Chen C (2017) QR code authentication with embedded message authentication code. Mob Netw Appl 22:06 7. Focardi R, Luccio F, Wahsheh H (2019) Usable security for QR code. J Inf Secur Appl 48:102369 8. Pulliam B, Landry C (2010) Tag, you’re it! using QR codes to promote library services. Ref Libr 52:68–74 9. Velumani R, Sudalaimuthu H, Choudhary G, Bama S, Jose MV, Dragoni N (2022) Secured secret sharing of QR codes based on nonnegative matrix factorization and regularized super resolution convolutional neural network. Sensors (Basel, Switzerland) 22
10. Wang H, Guo K (2016) The impact of online reviews on exhibitor behaviour: evidence from movie industry. Enterprise Inf Syst 11:1–17 11. Eliasson J, Knutsson O, Ramberg R, Cerratto-Pargman T (2013) Using smartphones and QR codes for supporting students in exploring tree species. In: Scaling up learning for sustained impact, pp 436–441 12. Abdul Rabu SN, Hussin H, Bervell B (2019) QR code utilization in a large classroom: higher education students initial perceptions. Education and information technologies 24:359–384 13. Mavroeidis V, Nicho M (2017) Quick response code secure: a cryptographically secure antiphishing tool for QR code attacks. In: Computer network security, pp 313–324 14. Uçak E (2019) Teaching materials developed using QR code technology in science classes. Int J Prog Educ 15:09 15. Liu T-Y, Tan T-H, Chu Y-L (2010) QR code and augmented reality-supported mobile English learning system. In: Mobile multimedia processing: fundamentals, methods, and applications, pp 37–52 16. Weng Z, Zhang J, Qin C, Zhang Y (2021) Quick response code based on least significant bit. In: Tian Y, Ma T, Khan MK (eds) Big data and security. Singapore, pp 122–132 17. Asok A, Arun G (2016) QR code based data transmission in mobile devices using AES encryption. Int J Sci Res (IJSR) 18. Chou G-J, Wang R-Z (2020) The nested QR code. IEEE Signal Process Lett 27:1230–1234 19. Chang J (2014) An introduction to using QR codes in scholarly journals. Sci Ed 1:113–117 20. Kieseberg P, Leithner M, Mulazzani M, Munroe L, Schrittwieser S, Sinha M, Weippl E (2010) QR code security. In: Proceedings of the 8th international conference on advances in mobile computing and multimedia, vol 1, pp 430–435 21. Jun-Chou C, Hu Y-C, Hsien-Ju K (2010) A novel secret sharing technique using QR code. Int J Image Process 4:12 22. Hamming RW (1950) Error detecting and error correcting codes. Bell Syst Tech J 29(2):147– 160 23. Chow Y-W, Susilo W, Yang G, Phillips JG, Pranata I, Barmawi AM (2016) Exploiting the error correction mechanism in QR codes for secret sharing. In: Information security and privacy, pp 409–425 24. Rewatkar M, Raut S (2014) Survey on information hiding techniques using QR barcode. Int J Cryptogr Inf Secur 4:243–249 25. Lin P-Y, Chen Y-H, Lu EJ-L, Chen P-J (2013) Secret hiding mechanism using QR barcode. In: 2013 international conference on signal-image technology & internet-based systems, pp 22–25 26. Dey S, Agarwal S, Nath A (2013) Confidential encrypted data hiding and retrieval using QR authentication system. In: 2013 international conference on communication systems and network technologies, pp 512–517 27. Dey S (2012) SD-EQR: a new technique to use QR codes in cryptography. In: Use of QR codes in data hiding and securing 28. Huang H-C, Chang F-C, Fang W-C (2011) Reversible data hiding with histogram-based difference expansion for QR code applications. IEEE Trans Consum Electron 57(2):779–787
Abstractive Text Summarization of Biomedical Documents Tanya Mital, Sheba Selvam, V. Tanisha, Rajdeep Chauhan, and Dewang Goplani
Abstract Huge amounts of data have to be processed and toned down to coherent, simpler forms in order to be made sense of. Abstractive summarizers generate summaries similar to how humans would summarize a document, consisting of new words and phrases in a simple, readable format. With vast quantities of information being generated in the medical domain every day, there is a huge demand to summarize complex medical information for the benefit of both doctors and patients alike. This work includes pre-processing biomedical documents and building an abstractive text summarizer using an LSTM encoder-decoder model with an attention layer. The LSTM encoder reads the entire input sequence, with one word fed into the encoder at each timestep. The decoder is likewise an LSTM network that predicts the same sequence with a one-timestep offset after reading the complete target sequence word-by-word. Using fresh source sequences for which the target sequence is unknown, the generated summary can be evaluated for its accuracy. Extractive summarization of medical documents has been explored, but there has not been much work done on abstractive summarization for the same. We aim to provide concise summaries of medical documents with a nearly 2% higher accuracy as compared to the existing methods. Keywords Abstractive · Text summarization · LSTM · Encoder-decoder · Biomedical · Attention layer
1 Introduction In the world of information explosion, capturing the essential information from a large number of documents is a very daunting task. No one has the time today to go through the entire length of documents but instead skim through to decipher relevant information quickly. Problems are encountered because it’s very tedious for humans to manually summarize a large quantity of relevant information. T. Mital (B) · S. Selvam · V. Tanisha · R. Chauhan · D. Goplani Department of CSE, BNMIT, 12th Main Road, 27th Cross, Banashankari Stage II, Bengaluru, Karnataka, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_42
In order to make humans' tasks easier, it is required to efficiently summarize huge documents. Summarization is the process of shortening text while keeping the relevant information and without losing the overall meaning of the source document. There are mainly two approaches to summarization: abstractive and extractive. Abstractive summarization usually uses natural language generation and semantic methods to create a precise summary similar to the way humans create one [1]. Extractive summarization is the traditional approach because it generates summaries by selecting significant segments of the original text and combining them to form a useful summary [2]. An alternate way of classifying summarization techniques is based on quantity: single- and multi-document summarization. Single-document summarization involves summarizing a single document, whereas multi-document summarization involves summarizing two or more documents belonging to the same topic [3].
The aim of the biomedical community is to increase health literacy by making scientific concepts, content, and research understandable to the common public and also to help doctors, clinicians, and researchers make crucial health decisions, so as to result in better treatment outcomes. PubMed, a biomedical literature database, indexes about 1.5 papers per minute. Scientific articles, medical records, web documents, and clinical reports need to be summarized to give insights to doctors, clinicians, and surgeons. Since the pandemic, it has been necessary to fill the gap between the general public and biomedical texts so that their complex semantics can be comprehensible.
Some of the benefits of text summarization are saving huge amounts of time and effort compared with doing the process manually. Summarization reduces the user's reading time while extracting only the relevant information. It also ensures all important facts are covered and the reader is able to distinguish beneficial information from the rest of the content. Another benefit is that it decreases the user's workload, thereby increasing productivity. Traditional graph-based extractive summarizers consider a document to contain a group of sentences, thereby disregarding the semantic similarity in the document [4]. Computers find it difficult to understand the essence of the input document and lack the capability of picking out its main points [5]. Some of the existing tools for summarization are the Microsoft AutoSummarize option, the Text Miner tool by IBM, Context by Oracle, and Insights [6]. All of these are popular, yet come with their own limitations. Apart from these, there are various simple machine learning techniques and neural models like RNN, CNN, and seq-to-seq. The fields where summarization makes a huge impact are newsletters, automated content creation, report generation, financial analysis, medical cases, automatic chat bots, search marketing, and many more.
Figure 1 shows the basic working of a text summarizer. The input can be huge, voluminous data from various sources such as text documents, PDFs, news articles, medical literature, and so on. The text summarizer takes the input document and condenses it to form a short, readable summary. Therefore, the proposed work builds an abstractive text summarizer using the LSTM encoder-decoder model with an attention layer. The rest of the paper is as follows: Sect. 2 describes the previous research and methodologies implemented in the field of text summarization. Section 3 gives the implementation and methodology.
Fig. 1 A text summarizer model
Section 4 tabulates the results obtained for the proposed work with the help of various evaluation measures. Section 5 concludes with the future scope of the proposed work.
2 Related Work There are two main approaches to summarization. Text summarization historically has always been done in the extractive approach. Summarization of biomedical documents also has mostly been extractive. Moradi et al. [7] BioBERT is an NLP model trained on PubMed extracts and PubMed Central (PMC). BERT generated contextualized embeddings is used on Biomedical text to generate summaries. The drawback of this approach was that it was only trained on a single document. Uçkan and Karcı [8] stated that extractive multidocument text summarization based on graph approaches was implemented a little while later using Maximum Independent Set along with the KUSH model. But in the suggested technique, no memory issues result from the dataset dimensions utilized throughout experimentation (on average of 200 sentences) Steinberger and Jezek presented [3] reinforcement learning has also been used in the extractive approach by designing a rewards scheme that guides the learning and uses keyword level semantics to produce a summary. They use a reward model that combines sentencekey word level and lexical semantics to definite reward function thus giving a good summary. The results show that this is better than the strong baselines on various legal datasets. The future enhancements are to find precedence to achieve better results. Davoodijam et al. [9] stated that extractive-based approaches for summarization focused on a multi-layer graph approach using Unified Medical Language System (UMLS) and multi-rank algorithms. This required more focus on similarity measures in the document with scope for improved efficiency in summarization. Depending on the context of the papers, it does not concentrate on additional similarity measurements. Along with the multi-layer graph model, the user needs may also be stated. Du et al. [4] proposed another interesting work utilizing the BioBertSum model with a sentence position embedding mechanism and a decoder based on a transformer structure to summarize medical text. This work used the BioASQ Task data as well. The disadvantage here was that more specialized knowledge should have been applied during the renewing process to check the generation process and enhance the quality of the summary.
Alami et al. [10] stated that work on abstractive summarization of biomedical documents is not very profound; most summarization in this approach has dealt with different domains and datasets, such as news datasets. Abstractive summarization has the obvious benefit of producing a summary similar to the way humans comprehend a document, thereby automating our task of quickly skimming through a lengthy document. Yang et al. [5] presented recent work in abstractive summarization that uses a hierarchical, human-like deep neural network which provides the capability of text categorization and syntax annotation. The further scope for the HH-ATS model is to explore graphs and incorporate a way to perform automatic summarization. Khan et al. [1] proposed the Sem-Graph-Both-Rel approach, used in later works to overcome the above shortcomings for multi-document summarization, which also significantly improved the ranking algorithm; they suggest the formal concept analysis (FCA) method and concept hierarchies as future implementations for their work. Yao et al. [11] proposed implementations of the Dual Encoding for Abstractive Text Summarization (DEATS) model in several works. They propose a primary and secondary encoder with a decoder-attention mechanism, which is efficient in dealing with rare words, and they wish to adopt a reinforcement learning approach toward training. Song et al. [2] presented a work using a recurrent encoder-decoder mechanism for LSTM-CNN-based deep learning implementations, involving the procedures of phrase extraction, phrase collocation, and phrase generation. Following training, the model produces a phrase sequence that satisfies the syntactic structure criteria. Additionally, they leverage phrase location data to address the issue of unusual phrases that practically all ATS models run into. Finally, they undertake extensive experiments on two different datasets, and the outcome demonstrates that the model performs better in terms of semantics and syntactic structure than state-of-the-art techniques. In the work of Gulden et al., the word embedding approach is used to improve the quality of abstractive text summarization by using deep neural network techniques as well; they then show that the word2vec representation enhances the results. The first part combines BOW and word2vec, the second part combines the information from the BOW approach and neural networks, and the third part combines the information provided by word2vec and neural networks, thus showing that word2vec gives better results. The future scope of this work is to explore other deep learning models such as the attention encoder-decoder and unsupervised convolutional neural networks; it can also provide scope for Arabic text summarization. See et al. [12] proposed a question-based salient span approach that uses a policy gradient at the sentence level to combine non-differentiable computations in two networks. The results obtained are four times faster than other encoder-decoder models, and they achieved greater improvements on abstractive and extractive summarization for the CNN dataset. By using the RL model, they make the model aware of the sentence-word hierarchy. Although the model demonstrates many abstractive abilities, how to achieve further levels of abstraction is still an unresolved research issue. Gigioli [13] proposed that the baseline sequence-to-sequence with attention model is outperformed by the pointer-generator model. Due to its capacity to incorporate OOV terms in the generated summaries, the pointer technique performs better, supporting the prediction that OOV words are frequently the
most crucial words to include in a summary. Anh and Trang [14] stated that their study uses a pre-trained word embedding layer to enhance the quality of pointer-generator network-based abstractive text summarization. With this method, the meaning of the input words is captured more accurately and with greater context, derived from pre-trained embedding models. Gambhir and Gupta [15] proposed that the Modified Corpus-Based Approach (MCBA) and the Latent Semantic Analysis-based TRM methodology (LSA + TRM) are two novel methods for automatically summarizing text. Being a trainable summarizer, MCBA relies on a score function and examines crucial variables including position (Pos), +ve keyword, -ve keyword, resemblance to the title (R2T), and centrality (Cen) for producing summaries. Among these characteristics, the best combination is Cen and R2T, and during the training phase a GA offers an appropriate mixture of feature weights. Both at the level of a single document and at the corpus level, LSA + TRM outperforms keyword-based text summarization approaches. Alami et al. [10] proposed a number of neural network-based unsupervised learning techniques for autonomous text summarization. These algorithms are run on word vector representations, using Arabic lexicons, ontologies, man-made knowledge, and language models as examples of such tools. Lee et al. [16] showed the importance of pre-training BERT on biomedical corpora before using it in the biomedical domain: on biomedical text mining tasks like NER, RE, and QA, BioBERT outperforms earlier models with the least amount of task-specific architectural adjustment. Abujar et al. [17] proposed a text generation approach in the context of abstractive text summarization in Bengali. Text generation enables a machine to recognize the structure of human-written text and to produce output that looks and reads like that of a person; this method of text synthesis uses a base recurrent neural network (RNN). Balipa et al. [18] noted that many text analysis frameworks and approaches have been developed to extract knowledge from medical texts and discourses, which are largely written in natural language, but these methods do not create a thorough summary of the information regarding an illness from web sources; in order to collect all information about an illness from Internet healthcare forums, they suggest text summarization based on machine learning and natural language processing methods. Kumar et al. [19] use a tiny dataset in their first experiment and a somewhat larger dataset with an application of LSTM in their second experiment. Building on this, we decided to improve the model by selecting a sizable dataset and adding an attention layer to the proposed LSTM model, comparing the accuracy of all the models with the existing model; the result will be assessed using ROUGE metrics, and the proposed solution will be demonstrated. Kovačević et al. [20] used the database of food and product reviews on Amazon as the dataset for their study. The study uses an encoder-decoder architecture with an attention layer mechanism and layered LSTMs in the encoder phase. In addition, the study tests employing bidirectional LSTMs as opposed to the conventional unidirectional method, which results in improved sequence representation and summarization performance when paired with stacking.
3 Methodology The goal is to develop a text summarizer that produces a brief summary, itself a sequence of words, from the lengthy sequence of words in the input text body. The dataset used in the proposed work, "Multi-Document Summarization of Medical Studies", is a dataset of over 470k documents and 20k summaries derived from the scientific literature. The task can thus be modeled as a many-to-many Seq2Seq problem. The work includes text pre-processing and model definition and creation, followed by model training and inference. The proposed abstractive text summarization, as given in Fig. 2, is implemented using the LSTM encoder-decoder module with an attention mechanism. The next steps in building the proposed system are explained here. Long short-term memory (LSTM) is an artificial neural network used in the field of deep learning and a popular variant of the recurrent neural network (RNN). LSTM is used in various applications such as summarization, speech recognition, and so on. Figure 2 explains the flow of steps for the encoder-decoder LSTM model: it takes an input text which goes through encoder and decoder steps to produce the output summary.
Fig. 2 Encoder-decoder LSTM model
3.1 Text Pre-processing In text pre-processing, the initial task is loading our dataset using the Pandas library, after which the input text is split into highlights and body text and pre-processing is applied to both. The dataset contains medical documents which have been transformed into a pickled file; it has two major columns, highlights and body. The first step of pre-processing is to convert contractions into their longer form: for example, words such as "haven't" and "can't" are converted into "have not" and "cannot". Certain Python libraries are used to achieve this. After that, the entire text is converted into lower case, which makes it easier to process. This is followed by stop word removal, which discards basic and common words such as "the", "an", "a", and so on; this step is done to reduce the space occupied. The final step of pre-processing is to tokenize the cleaned data: a tokenizer builds the vocabulary and converts the sequence of words to integers, and the Keras Tokenizer is used to perform this step. A minimal sketch of this pipeline is shown below.
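The sketch assumes the contractions package, NLTK stop words, and Keras utilities are available; the file name and maximum sequence length are illustrative placeholders, while the column names (body, highlights) follow the dataset description above.

```python
# Hedged sketch of the pre-processing pipeline described above.
import pandas as pd
import contractions                    # expands "can't" -> "cannot"
from nltk.corpus import stopwords      # assumes the NLTK stop word list is downloaded
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

stop_words = set(stopwords.words("english"))

def clean(text: str) -> str:
    text = contractions.fix(text)                       # expand contractions
    words = text.lower().split()                        # lower-case and split
    return " ".join(w for w in words if w not in stop_words)

df = pd.read_pickle("medical_studies.pkl")              # assumed pickled dataset
df["clean_body"] = df["body"].astype(str).map(clean)
df["clean_highlights"] = df["highlights"].astype(str).map(clean)

# The Keras tokenizer builds the vocabulary and maps words to integer ids.
x_tokenizer = Tokenizer()
x_tokenizer.fit_on_texts(df["clean_body"])
x_seq = pad_sequences(x_tokenizer.texts_to_sequences(df["clean_body"]),
                      maxlen=800, padding="post")       # illustrative max length
```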
3.2 Encoder-Decoder Architecture For the LSTM to generate new sentences, the model has to be trained on sentences word by word. The encoder-decoder architecture has two phases: the training phase and the inference phase. The training phase is where the model is trained to generate the target sentence offset by one timestep, by setting up the encoder and decoder. The encoder is an LSTM model that takes in the full text and learns how to transform the context of the text into neural network parameters. It reads the text sequence by taking one word of the sequence at every timestep. The encoder unit is supplied with our input; it reads the input sequence and compiles the data into what are known as internal state vectors (in the case of LSTM, these are called the hidden state and cell state vectors). The information is processed at every timestep until the encoder finally captures the contextual information of the input text. Figure 3 shows an LSTM encoder with hidden states (h_i) and cell states (c_i). h_0 and c_0 are the initial hidden and cell states of the first LSTM encoder block, h_1 is the output for the first input x_1, the subsequent states follow in the same way, and the hidden and cell states of the last timestep are used to initialize the decoder.
h_t = f(W^{(hh)} h_{t-1} + W^{(hx)} x_t)    (1)
The hidden state of a typical recurrent neural network is represented by this straightforward formula: in (1), the appropriate weights are applied to the input vector x_t and the prior hidden state h_{t-1}. Here every word is fed into the encoder at one timestep; h_0 and c_0 are the initial states, initialized with a zero vector or at random, the other hidden and cell states are intermediate, and finally the hidden and cell states h_4 and c_4 of the final timestep are used to initialize the decoder.
Fig. 3 Encoder architecture
The context vector produced by the encoder is supplied as input to the decoder unit. Only the context vector is sent to the decoder, and the encoder's other outputs are disregarded. The decoder unit generates an output sequence based on the context vector. The decoder is an LSTM model that takes the learned neural network parameters from the encoder. As shown in Fig. 4, it reads the entire target sequence and predicts the same sequence offset by one timestep, i.e., it predicts the next word when the previous word is given. Before the target sequence is fed to the decoder, start and end tokens are attached to it: a start token indicates the start of the target sequence and an end token indicates its end. s_0 and c_0 are the initial hidden and cell states of the first LSTM decoder block.
h_t = f(W^{(hh)} h_{t-1})    (2)
Any hidden state h_i is computed using Eq. (2). The output is computed as
y_t = softmax(W^S h_t)    (3)
Fig. 4 Decoder architecture
The output y_t at timestep t is computed using Eq. (3): the weight matrix W^S is applied to the hidden state h_t, and a softmax produces the probabilities of the output words.
The attention layer helps in finding the important and essential parts of the question or text by adjusting the attention. For example, for the question "Which animal do you like?", it simply focuses on the main words such as "animal" and "like" to get the answer straight; this helps in blurring out the non-essential parts of the question. The steps involved in the attention mechanism are as follows: encode the whole input sequence, then use the encoder's internal states to initialize the decoder; pass the start token to the decoder as input; run the decoder using the internal states for a single step; the output is the likelihood of the following word; the sampled word is sent as input to the decoder in the following timestep, and the internal states are updated with the current timestep; the word with the highest probability is chosen; continue until the final word is generated. To derive the attended context vector, only a small number of the encoder's hidden states are taken into account.
Inference Phase: once the model is trained, an inference architecture is used to decode a test sequence for which the target sequence is unknown. The decoder is initialized with the internal states of the encoder, and the start token is given to the decoder. The decoder is fed with the internal states at every timestep; the output is the probability distribution over the next word, the word with the highest probability is selected, and this word is fed to the decoder at the next timestep. This continues until the end of the target sequence is reached. Figure 5 shows the architecture of the text summarizer in which the input is fed to the LSTM encoder, whose output is given to the decoder along with the attention layer, which tells the decoder to pay attention to certain words in the text; together they produce the summary.
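For concreteness, a minimal training-phase sketch of such an encoder-decoder with attention is given below in Keras. The vocabulary sizes, sequence length, and dimensions are illustrative placeholders, and AdditiveAttention is used as one possible realization of the attention layer; this is a sketch of the described architecture, not the authors' exact implementation.

```python
# Hedged Keras sketch of the training-phase LSTM encoder-decoder with attention.
from tensorflow.keras import layers, Model

x_vocab, y_vocab = 50000, 15000                    # assumed vocabulary sizes
max_text_len, latent_dim, emb_dim = 800, 256, 128  # assumed dimensions

# Encoder: embed the body text and run it through an LSTM, keeping the full
# sequence of hidden states plus the final hidden and cell states.
encoder_inputs = layers.Input(shape=(max_text_len,))
enc_emb = layers.Embedding(x_vocab, emb_dim)(encoder_inputs)
encoder_outputs, state_h, state_c = layers.LSTM(
    latent_dim, return_sequences=True, return_state=True)(enc_emb)

# Decoder: embed the offset target summary and initialize the LSTM with the
# encoder's final states.
decoder_inputs = layers.Input(shape=(None,))
dec_emb = layers.Embedding(y_vocab, emb_dim)(decoder_inputs)
decoder_outputs, _, _ = layers.LSTM(
    latent_dim, return_sequences=True, return_state=True)(
        dec_emb, initial_state=[state_h, state_c])

# Attention over the encoder states; the attended context is concatenated
# with the decoder states before the softmax projection over the vocabulary.
context = layers.AdditiveAttention()([decoder_outputs, encoder_outputs])
concat = layers.Concatenate()([decoder_outputs, context])
output_probs = layers.TimeDistributed(
    layers.Dense(y_vocab, activation="softmax"))(concat)

model = Model([encoder_inputs, decoder_inputs], output_probs)
model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")
model.summary()
```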
Fig. 5 Architecture of the abstractive text summarizer
4 Evaluation and Results This section discusses the results produced and the evaluation performed for the proposed methodology. Figure 6 shows a histogram representing the attributes of the dataset, namely body and highlights. Figure 7 shows the comparison between the data used for training and the data used for testing; it shows that as the number of epochs the data is trained on increases, the lines in the graph tend to overlap.
Fig. 6 Histogram representing the attributes of the dataset
Fig. 7 Comparison between the quantities of data trained versus tested
4.1 Evaluation In most previous implementations, abstractive and extractive summarization is evaluated using the ROUGE metric. It allows us to evaluate the quality of the summarized text by calculating the number of overlapping n-grams between the summary created and the reference summary provided by the summaries attribute of the dataset. The dataset used in this evaluation, "Multi-Document Summarization of Medical Studies", is a dataset of over 470k documents and 20k summaries derived from the scientific literature. ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation; it is essentially a set of metrics for evaluating automatic summarization of texts as well as machine translations. ROUGE-1 measures the overlap of unigrams between the reference summary and the system summary, ROUGE-2 the overlap of bigrams, and ROUGE-3 the overlap of trigrams. Recall divides the number of overlapping n-grams found in both the model output and the reference by the total number of n-grams in the reference. Precision is derived nearly the same way, except that we divide by the model n-gram count rather than the reference n-gram count. An evaluation based on this metric is shown in Table 1, from which we can observe that the number of overlapping unigrams is the highest between the reference summary and the summary generated by the proposed model. In Table 2, the results of Table 1 are compared against state-of-the-art existing methods for text summarization, and it is observed that our LSTM encoder-decoder model outperforms the methods mentioned. The existing approaches to abstractive text summarization that were evaluated were words-lvt-2k-temp-att, which had a ROUGE score of 35%, followed by the pointer-generator method at 36% and the RL + ML method at 39%. The method used in this study, LSTM with attention mechanism, outperforms the methods used in existing work by 2%, having a ROUGE score of 41.7%.

Table 1 Experimental results of ROUGE evaluation performed on the summarization model
Metric | F-measure | Precision | Recall
ROUGE-1 | 0.40600253672846719 | 0.3952794651584974 | 0.41197259022469401
ROUGE-2 | 0.186917630697512925 | 0.17504913982889802 | 0.1987985824846648
ROUGE-3 | 0.377985824846648 | 0.3653778569101151 | 0.36902726456443148
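A hedged sketch of how such ROUGE precision, recall, and F-measure values can be computed with the open-source rouge-score package is shown below; this is one possible tool, not necessarily the one used in this study, and the example strings are placeholders.

```python
# Hedged sketch: ROUGE-1 and ROUGE-2 scoring with the rouge-score package.
from rouge_score import rouge_scorer

reference = "placeholder reference summary taken from the dataset"
generated = "placeholder summary produced by the lstm encoder decoder model"

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2"], use_stemmer=True)
scores = scorer.score(reference, generated)
for name, s in scores.items():
    print(f"{name}: P={s.precision:.3f} R={s.recall:.3f} F={s.fmeasure:.3f}")
```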
Table 2 Performance comparison of existing text summarization methods vs the summarization model proposed in this study (normalized to %)
Method | ROUGE-1 | ROUGE-2 | ROUGE-3
words-lvt-2k-temp-att | 35.46 | 13.30 | 32.65
Pointer-generator | 36.44 | 15.66 | 33.42
RL + ML | 39.87 | 15.82 | 36.9
LSTM + Attn | 41.70 | 17.50 | 37.50
5 Conclusion and Future Scope The main aim of this research was to show how biomedical documents can be summarized in an abstractive manner to automatically create text summaries, cutting down on the time it takes to manually type and summarize lengthy text documents. Text pre-processing, often known as data pre-processing, is a method of preparing text in a form that is analyzable and predictable for our task, after which the work implements the LSTM encoder-decoder model with an attention layer. Previously, abstractive text summarization has mainly been done on news datasets. Extractive summarization has been performed on biomedical datasets, but patients and the medical community would benefit much more if documents were summarized in a human-readable manner. The future scope of this research includes improving the accuracy of abstractive summarization using other machine learning algorithms. As of yet, abstractive text summarization is the best way to summarize documents.
References 1. Khan A, Salim N, Farman H, Khan M, Jan B, Ahmad A, Ahmed I, Paul A (2018) Abstractive text summarization based on improved semantic graph approach. Int J Parallel Prog 46(5):992–1016 2. Song S, Huang H, Ruan T (2019) Abstractive text summarization using LSTM-CNN based deep learning. Multimed Tools Appl 78(1):857–875 3. Steinberger J, Jezek K (2004) Using latent semantic analysis in text summarization and summary evaluation. Proc ISIM 4(93–100):8 4. Du Y, Li Q, Wang L, He Y (2020) Biomedical-domain pre-trained language model for extractive summarization. Knowl-Based Syst 199:105964 5. Yang M, Li C, Shen Y, Wu Q, Zhao Z, Chen X (2020) Hierarchical human-like deep neural networks for abstractive text summarization. IEEE Trans Neural Netw Learn Syst 32(6):2744– 2757 6. Gulden C, Kirchner M, Schüttler C, Hinderer M, Kampf M, Prokosch HU, Toddenroth D (2019) Extractive summarization of clinical trial descriptions. Int J Med Inform 129:114–121 7. Moradi M, Dorffner G, Samwald M (2020) Deep contextualized embeddings for quantifying the informative content in biomedical text summarization. Comput Methods Programs Biomed 184:105117 8. Uçkan T, Karcı A (2020) Extractive multi-document text summarization based on graph independent sets. Egypt Inform J 21(3):145–157 9. Davoodijam E, Ghadiri N, Shahreza ML, Rinaldi F (2021) MultiGBS: a multi-layer graph approach to biomedical summarization. J Biomed Inform 116:103706
10. Alami N, Meknassi M, En-nahnahi N (2019) Enhancing unsupervised neural networks-based text summarization with word embedding and ensemble learning. Expert Syst Appl 123:195– 211 11. Yao K, Zhang L, Du D, Luo T, Tao L, Wu Y (2018) Dual encoding for abstractive text summarization. IEEE Trans Cybern 50(3):985–996 12. See A, Liu PJ, Manning CD (2017) Get to the point: summarization with pointer-generator networks. arXiv preprint arXiv:1704.04368 13. Gigioli P, Sagar N, Rao A, Voyles J (2018) Domain-aware abstractive text summarization for medical documents. In: 2018 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE, pp 2338–2343 14. Anh DT, Trang NTT (2019) Abstractive text summarization using pointer-generator networks with pre-trained word embedding. In: Proceedings of the tenth international symposium on information and communication technology, pp 473–478 15. Gambhir M, Gupta V (2017) Recent automatic text summarization techniques: a survey. Artif Intell Rev 47(1):1–66 16. Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J (2020) BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36(4):1234–1240 17. Abujar S, Masum AK, Islam S, Faisal F, Hossain SA (2020) A Bengali text generation approach in context of abstractive text summarization using RNN. Innovations in computer science and engineering. Springer, Singapore, pp 509–518 18. Balipa M, Yashvanth S, Prakash S (2023) Extraction and summarization of disease details using text summarization techniques. Intelligent communication technologies and virtual mobile networks. Springer, Singapore, pp 639–647 19. Kumar H, Kumar G, Singh S, Paul S (2022) Text summarization of articles using LSTM and attention-based LSTM. Machine learning and autonomous systems. Springer, Singapore, pp 133–145 20. Kovaˇcevi´c A, Keˇco D (2021) Bidirectional LSTM networks for abstractive text summarization. International symposium on innovative and interdisciplinary applications of advanced technologies. Springer, Cham, pp 281–293
NLP-Based Sentiment Analysis with Machine Learning Model for Election Campaign—A Survey Shailesh S. Sangle and Raghavendra R. Sedamkar
Abstract An election campaign reflects the evaluation and experience of the voters. The analysis of an election campaign involves the different twists and turns needed to monitor and evaluate the situation in the elections. India is one of the biggest democratic countries, with many different languages, races, and policies. Through manual processing of the election campaign, the government can monitor the situation, but the opinion of the voters is the key factor in determining the election results. Hence, it is necessary to process the opinions of the voters to get a clear view of the election. To gain knowledge from voters' opinions, machine learning (ML)-based techniques are implemented to classify the voters' opinions about political parties and the candidates of those parties. In ML, sentiment analysis is the key technique for identifying opinions about parties and estimating the positive and negative opinions of voters. This paper presents a survey of ML and classification techniques in the NLP-based election campaign process. Natural language processing (NLP) is effective for processing the opinions of the people, and within NLP, sentiment analysis is a key factor in identifying the opinions of voters about political parties and candidates. The estimation is based on the evaluation of the election campaign for the computation of the voters' opinions, in which the opinions about candidates and the views of the candidates are evaluated. Keywords Election campaign · Machine learning · Natural language processing · Sentiment analysis
S. S. Sangle (B) Thadomal Shahani Engineering College, Bandra, Mumbai, India e-mail: [email protected] R. R. Sedamkar Computer Engineering Department, Thakur College of Engineering and Technology, Mumbai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_43
1 Introduction In the past two decades, innovation and development have led to drastic technological change, along with the provision of freedom of speech [1]. According to a 2017 report by the Internet and Mobile Association of India and the Indian Market Research Bureau, around 78% of the people are active Internet users. India is one of the biggest democratic countries and has adopted technological development effectively. Democracy offers fundamental rights to citizens in terms of speech and expression, with these rights guaranteed to every citizen [2]. With the effective utilization of the Internet, significant communication is possible between democratic governments and citizens. The nature of democracy and its interface with the political system with which it is associated has always been a subject of much concern and research. Elections, as well as the processes that govern them, hold the key to the success of democratic institutions or their downfall. In the famous words of James Bryce, "the excellence of popular government lies not so much in its wisdom—for it is apt to err as other kinds of government—as its strength". Democracy is a reflection of a nation's voice. There are no straightforward solutions to the question of whether elections fulfill their designated duty, i.e., whether voters set the agenda or someone else does [3].
A political campaign is a concerted attempt by politicians to persuade citizens, particularly voters, to vote for them in elections [4]. A campaigning team might be a small group of motivated individuals or a huge group of people with high-quality resources. Traditional political campaigning strategies depend on the number of volunteers in the campaign team and the geographic location of the election [5]. In the 1980s, the 1990s, and before that, only door-to-door campaigns were popular, where political officials individually contacted voters, held public meetings in villages and towns, or spoke at large rallies attended by thousands of people. Political parties used printed materials such as posters, banners, flyers, printed T-shirts, badges, and wristbands to promote their candidates [6]. To disseminate their agendas, beliefs, and future growth plans in society, all political parties deploy mass communication apparatus [7]. They even employ paid media such as newspapers, television, radio, and other forms of communication to influence the decision-making of voters or groups of voters. Politicians used to employ oral communication, such as public speeches and radio and television addresses, and textual communication, such as newspapers, posters, and pamphlets [8]. Later on, they progressed to sophisticated use of television for the same purpose. Political communication offers interactive and effective transmission of information between political leaders and members [9]. During election time, the trend in political communication driven by the parties is upward, from the public to the government. Through political communication, appropriate campaigns are framed by party members to represent the parties. Based on political communication, a machine learning-based model can be developed for appropriate classification and detection of the opinions of the voters [10]. Natural language processing (NLP) is a key factor in identifying and evaluating voters' opinions about representatives or parties.
This paper presents a review of machine learning (ML) techniques applied to election campaigns, with the analysis focusing on sentiment analysis combined with the NLP process. The paper is organized as follows. Section 2 presents the evolution of policies and media in election campaigns. Machine learning techniques and sentiment analysis are reviewed in Sects. 3 and 4, respectively. An overview of NLP and its use in election campaigns is given in Sect. 5. The applications and tasks of NLP with machine learning are presented in Sect. 6. Finally, the integration of sentiment analysis with NLP in election campaigns is presented in Sect. 7.
2 Overview of Election Campaign

An election campaign is an organized effort by political parties and candidates to present their opinions, issues, and views about the election to voters. This section presents the issues and processes involved in election campaigns.
2.1 Issues in Election Campaign

The Indian Parliament comprises two houses, the Lok Sabha and the Rajya Sabha. Members of the Lok Sabha serve a term of 5 years and members of the Rajya Sabha serve 6 years, while state legislative assemblies are elected for 5-year terms. The electorate spans a large set of religions and languages [2]. According to the 2011 census, Hindus constitute about 79.8% of the population, Muslims 14.2%, Christians 2.3%, Sikhs 1.7%, and Buddhists 0.7%. Approximately 850 languages with around 1600 dialects are in use in India; English is considered a link language between people of different languages and between political parties. The election is a complicated process for the large share of the Indian population living in rural, remote, and other hard-to-reach areas of the country, as it is difficult to provide adequate polling facilities for all eligible voters there [3]. One of the biggest challenges in reaching voters is illiteracy: as per the 2011 census, literacy is 68% in rural areas and 84% in urban areas, with males more literate than females. An estimated 3 in 10 Indian voters cannot read or write, which undermines the effectiveness of modern, media-based campaigns [10].

Evaluating election propaganda through sentiment analysis allows political parties to shape election policies [11]. Through such evaluation, propaganda slogans can be framed so that they are easily understandable by all people, and the impact of propaganda can be measured with sentiment analysis, which captures the decisions people make in a series of subsequent choices. Artificial intelligence (AI)-based sentiment analysis increases citizens' understanding of a political party's views [12]. In this scenario,
natural language processing (NLP)-based sentiment analysis is effective for examining election propaganda. It allows researchers to view the thoughts and opinions of voters, and sentiment analysis with AI is therefore considered an effective tool for this analysis.
2.2 Process in Campaign

Election campaigns are generally long in democratic countries such as India and are initiated several months before Election Day. Campaigns involve volunteers in their local communities who meet voters, gauge their support, and collect their opinions about the candidate. The volunteers are responsible for identifying supporters, recruiting further volunteers, and registering voters alongside those already registered; supporters, in turn, help the campaign persuade other voters to cast their votes. Campaigns also undertake expensive outreach through mail, television, and radio to present their candidates to voters, and they coordinate grassroots volunteers in order to win. An election campaign therefore has to balance message communication, volunteer recruitment, and money. With advertising, campaign techniques have evolved toward commercial-style propaganda and promotion.
2.3 New Media and Politics

The era of political campaigning with new-age technology began with the 1992 American presidential election, when campaign agendas were packaged as entertainment and voters, surprisingly, proved highly engaged with radio and television programs [11]. Barack Obama's campaigns later demonstrated this process in the election of the US president: through Facebook, Twitter, and YouTube, his opinions, views, and political agenda were circulated across media platforms, attracting followers and supporters [12]. In the US election, former President Obama achieved a huge victory through the use of social media, which provided a strong connection between him and his followers. In [12], Facebook is identified as a factor that effectively shapes the opinion of voters. Similarly, [13] shows that social media establishes person-to-person communication through which political information can be identified, and [14] explains that, based on person-to-person communication, data can be assembled about political candidates and associations for assessment. In [15], it is argued that when political media are held captive, messages are bundled with consistent news and framing about public information. In conventional voting studies, the electorate's normal behavior when casting votes is evaluated. In [16], a model is constructed to convince individuals through the
political campaign. In [17], a model is developed to evaluate voters' opinions of the political parties' candidates, with an estimate of non-zero coverage in the final election results. In [18], the extreme policy positions of participants in political parties are evaluated; the analysis indicates that participation in elections expands when the relevant variables are considered. Five choices of the individual elector are estimated in [19], where political candidates are evaluated with respect to the parties nominating them, and turnout levels are estimated and computed using statistical theories.
3 Survey of Machine Learning Techniques for Election Campaigns

In supervised techniques, training data are employed to establish an automatic classifier that learns the varied characteristics of documents, whereas test data are employed to verify the classifier's implementation [20]. The prime aim of machine learning (ML) is to form rules that increase the performance of machines with the help of samples, implicit data, or experience. Classification approaches can be divided into the following subcategories [21].

Supervised Learning—A supervised machine learning model derives a function from an identified training dataset, which is built by considering different training cases. In a supervised learning scheme, feature vectors are mapped to classes [22]. In [17], a supervised learning scheme is developed using statistical computation: the prepared data are split so that functions can be mapped under different scenarios, and the resulting model is evaluated against ideal statistics to resolve the occurrence of identities.

Unsupervised Learning—An unsupervised machine learning model concentrates on the hidden structure of unlabeled data. To improve performance, unsupervised learning computes an error value over the unlabeled data [2]; it can be used reliably to measure opinions about candidates among voters.

Reinforcement Learning—Finally, reinforcement learning can also be used for classification. It differs from supervised learning in that correct input–output pairs are never revealed and suboptimal actions are not explicitly corrected; moreover, a balance must be maintained between exploring unknown territory and exploiting current data [23].
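As a concrete illustration of the supervised setting described above, the following minimal sketch trains a TF-IDF plus linear SVM text classifier with scikit-learn. The tiny labeled corpus is hypothetical and serves only to show the workflow; it is not drawn from any cited study.

```python
# Minimal supervised sentiment classifier: TF-IDF features + linear SVM.
# Illustrative sketch only; the tiny labeled corpus below is hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = [
    "Great rally today, the candidate spoke about real issues",
    "The new manifesto promises jobs and better roads",
    "Empty slogans again, no concrete plan for farmers",
    "Disappointed by the candidate's silence on education",
]
train_labels = ["positive", "positive", "negative", "negative"]

# Pipeline: convert text to TF-IDF vectors, then fit a linear SVM.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(train_texts, train_labels)

print(model.predict(["The candidate has a clear plan for jobs"]))
```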
4 Review of Sentiment Analysis

Sentiment analysis is performed on data drawn from different social media platforms such as Facebook, LinkedIn, and Pinterest. These data arrive in different formats, including XML, JSON, XLS, HTML, proprietary formats, and spreadsheets [24]. Once the data have been collected from voters during the election campaign, the analysis proceeds as follows.
4.1 Pre-processing in Sentiment Analysis

Sentiment analysis involves detecting and classifying the collected data, which are in the form of text, to determine voters' opinions about election parties. It proceeds through a series of steps that estimate the opinion of each voter as positive, negative, or neutral [25]. To evaluate the opinion or view of respondents, association rules are formulated over the different categories and opinions expressed about the entity of interest.
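To make this pre-processing step concrete, the sketch below shows a typical cleaning pipeline for collected posts before polarity classification: lowercasing, stripping URLs and mentions, and removing stop words. The example post and the exact cleaning rules are our own illustrative assumptions, not those of the cited works.

```python
# Illustrative cleaning of collected posts before polarity classification.
import re
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
STOP_WORDS = set(stopwords.words("english"))

def preprocess(post: str) -> list:
    """Lowercase, drop URLs/mentions/punctuation, split into tokens, remove stop words."""
    post = post.lower()
    post = re.sub(r"http\S+|@\w+", " ", post)   # strip URLs and @mentions
    post = re.sub(r"[^a-z\s]", " ", post)       # keep letters only
    return [t for t in post.split() if t not in STOP_WORDS]

print(preprocess("Loved the candidate's speech today! https://example.com @party"))
```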
4.2 Sentence Extraction in Sentiment Analysis

In [26], a performance-level estimator model is examined that analyzes sentiment based on word polarity. Its subjective/objective technique is not effective as a cross-domain approach, because of the limitations of restricting polarity to a positive or negative domain. In [7], it is noted that the overall polarity cannot be computed and answered in compressed form without an opinion mining concept for the document. Sentence-level classification is performed with different classes and variants of machine learning (ML) models. In [27], issues related to sentiment analysis are evaluated, concentrating on context dependency and opinion-oriented phrases; the work shows how opinion-oriented phrases can be handled with advances in technology. In [28], different social platforms are evaluated on the basis of user posts.
4.3 Machine Learning-Based Sentiment Analysis

Sentiment analysis integrated with ML models can be grouped into the following classes.
Supervised Sentiment Analysis—Supervised sentiment analysis relies on a labeled training database. Preferred supervised classifiers include the support vector machine (SVM) and the hidden Markov model (HMM) [29]; to classify data as positive or negative, maximum entropy, SVM, and Naïve Bayes are commonly used. In [30], a classification algorithm for sentiment analysis with text processing is constructed. In [31], entities are labeled and extracted for RRM and HMM classification; with an appropriate feature extraction technique, the RRM classifier exhibits superior performance compared with the others.

Semi-supervised Sentiment Analysis—Semi-supervised learning trains the system on a database containing both labeled and unlabeled data. In [32], bootstrapping is integrated with a semi-supervised learning scheme that uses linguistic cues to identify subjective text. In [33], a graph-based model using graph cuts is developed to minimize the manual labeling of data required for bootstrapping.

Unsupervised Sentiment Analysis—Unsupervised learning estimates structure in unlabeled data; the training database can be processed without annotation [3]. In [34], machine learning algorithms such as logistic regression, Naïve Bayes, and SVM are developed to detect the sentiment polarity of text based on bag-of-words models. In [35], an agglomerative classification model is constructed to classify combinations of features. Similarly, [36] develops a lexicon-based unsupervised scheme for classification, and the evaluation shows that its performance is effective on the World Wide Web. In [37], pysentimiento, a multilingual toolkit for sentiment and emotion analysis, is implemented. Table 1 summarizes these sentiment analysis approaches.
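As a simple illustration of the lexicon-based, unsupervised style of analysis surveyed here, the following sketch scores a post by counting matches against small positive and negative word lists. The lexicon and the scoring rule are deliberately tiny, hypothetical simplifications rather than the method of [36].

```python
# Tiny lexicon-based polarity scorer (illustrative; real lexicons such as
# SentiWordNet or VADER are far larger and weighted).
POSITIVE = {"good", "great", "honest", "development", "support", "win"}
NEGATIVE = {"bad", "corrupt", "failure", "scam", "unemployment", "lose"}

def polarity(text: str) -> str:
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("Great development work and honest leadership"))   # positive
print(polarity("Another scam and rising unemployment"))           # negative
```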
5 Review of NLP in Election Campaign The progress of the semantic Web due to modern technologies has resulted in the extensive use of ontologies in various domains.
5.1 Elements in NLP

According to [38], elements of differing expressiveness can be extracted from a text corpus by a set of methods and algorithms and are either included in the proposed ontology or handed to the user for validation. This field of "ontology learning" is split into a so-called "layer cake", in which the layers are viewed as element extraction tasks.
Table 1 Summary of sentiment analysis

References | Objective | ML technique | Limitations
[26] | Examined negative or positive opinion of respondents | Sentiment analysis | Polarity is not defined clearly
[27] | Examined issues in sentiment analysis | Sentiment analysis | Complexity is higher
[30] | Text processing based on the opinion of people | Supervised sentiment analysis | Performance is not explained clearly
[33] | Graph-based model for identification of people's opinion | Semi-supervised sentiment analysis | Labeling is not clear
[35] | Classification model for estimation of people's opinion | Unsupervised | Classification accuracy is not defined
[36] | Automatic classification of text into positive or negative to determine the opinion of the mass toward the subject of interest | Supervised sentiment analysis, in the domain of micro-blogging | Focused on specific types of keywords
Some researchers refer to this as the "ontology learning layer cake" [39]. The tasks include term extraction, synonym identification, concept formation, taxonomy extraction, relation extraction, and axiom extraction; finally, generalized relations are used to form axioms. Over the past decade, many proven techniques from IR, DM, ML, and NLP have contributed to progress in ontology learning [40]. The techniques are broadly categorized as linguistic-based, statistics-based, logic-based, or hybrid. In [41], a new acquisition process, the LocalMaxs algorithm, is proposed to automatically extract contiguous as well as non-contiguous multi-word lexical units, with the HITS algorithm used to induce domain terms as unigrams and multi-grams. In [39], the RENT algorithm is used for term extraction; it applies regular expressions as domain-specific patterns to extract single words and composite terms from agricultural text. In [42], a general architecture, "Ontogain", is presented for automatic acquisition of ontologies from plain text documents. [43] describes a rule-based approach to MWT extraction and lemmatization from Serbian texts that depends on lexical resources such as e-dictionaries and finite-state transducers. [44] suggests a method for screening ontology concepts in same-type texts that involves Chinese word segmentation, an n-gram algorithm for multi-word extraction, statistics, and rules. To replace a contextual n-gram with a single-word term, [11] establishes a uniqueness score based on a distributional thesaurus. [45] presents a hybrid technique in which a linguistic filter is used to extract candidate Arabic MWTs and a statistical filter is used to incorporate additional association measures based on
termhood and unithood estimates. [45] also searches for candidate MWTs in automatically POS-tagged and lemmatized text and then extracts MWTs using the C-NC value weighting approach. In [46], TF-IDF, word distance, word position within the sentence and the whole text, and probability features from a Naïve Bayes classifier are used to classify term relevance.
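To illustrate the kind of n-gram and TF-IDF machinery on which these term extraction methods build, the short sketch below ranks unigram and bigram candidate terms from a toy corpus by their average TF-IDF weight using scikit-learn. The corpus and ranking rule are illustrative assumptions, not a reimplementation of any cited algorithm.

```python
# Rank unigram/bigram candidate terms by average TF-IDF weight (toy corpus).
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the election commission announced the polling schedule",
    "the polling schedule covers every assembly constituency",
    "voter turnout in the assembly constituency was high",
]

vec = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
tfidf = vec.fit_transform(docs)

# Average TF-IDF weight of each candidate term across the corpus.
weights = tfidf.mean(axis=0).A1
candidates = sorted(zip(vec.get_feature_names_out(), weights),
                    key=lambda kv: kv[1], reverse=True)
for term, w in candidates[:5]:
    print(f"{term:25s} {w:.3f}")
```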
5.2 Synonym Extraction in NLP

True synonyms are rarely found, as terms usually differ slightly in meaning. Hence, rather than representing true synonyms, ontology learning groups a few commonly used terms to represent a concept in certain contexts [47]; for example, "school" and "college" may both be used to represent "Institute". The discovery of synonyms implies acquiring linguistic knowledge about the possible symbols that refer to an SWT or MWT within the text. WordNet is widely used for finding synonyms of general terms, but it is of limited use for a particular domain. In [48], automated techniques are applied to discover synonyms from a phrase lexicon, based on heuristics decomposed into 14 reusable semantic rules.
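As an illustration of the WordNet lookup mentioned above, the following snippet retrieves synonyms of a general term through NLTK's WordNet interface; the chosen term is only an example.

```python
# Looking up synonyms of a general term with WordNet through NLTK (illustrative).
import nltk
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)
from nltk.corpus import wordnet as wn

# Collect lemma names from all synsets of the word "school".
synonyms = {lemma.name() for syn in wn.synsets("school") for lemma in syn.lemmas()}
print(sorted(synonyms))
```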
5.3 Relation Between Sentiment Analysis and Machine Learning

Sentiments are pivotal to almost all human endeavors, as they are an important indicator of behavior. Whenever we need to make decisions, we want to know other people's beliefs or opinions. Artificial intelligence (AI) is the simulation of the human mind, or human intelligence, with machines or computers; the simulation procedure includes learning, reasoning, and self-correction.
5.4 Role of Machine Learning with NLP for Election Campaign

Natural language processing (NLP) is the branch of AI concerned with the interaction between human language and computers. It is actively involved in preprocessing and analyzing vast amounts of natural language data to train computers, converting text into a machine-readable format [19]. Conventional ML methods determine, from large amounts of data, features that affect the NLP outcome either positively or negatively, and several techniques exist for examining subjectivity in the data [20]. Because sentiment analysis concentrates on individual behavior, it is actively used
in elections in India to evaluate voters' opinions about political parties and candidates [21]. In the election campaign setting, voters' opinions are classified with either supervised or unsupervised learning techniques. In [22], a sentiment classification approach is constructed with binary and multiclass ranking, where sentiments are ranked by estimating their polarity and assigning intensity. In [17], sentiment tasks are evaluated by computing frequent features such as part of speech (PoS), negation, opinion words and phrases, and term frequency. For opinion holder extraction, [49] performs classification by estimating opinions, objects, and sources; the classification considers eight objects with direct or indirect sources to acknowledge. According to [23], feature extraction is evaluated by recognizing the target entities; the sentiment analysis attributes focus on extracting features from the recognized data entities.
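The feature computation mentioned here (negation handling, opinion terms, and frequency) can be sketched as follows. This is an illustrative, hand-rolled example; a real system would also add part-of-speech tags and opinion lexicons, and it is not the feature set of any cited study.

```python
# Illustrative hand-rolled features: negation marking + term frequencies.
from collections import Counter

NEGATIONS = {"not", "no", "never", "n't"}

def extract_features(text: str) -> Counter:
    """Mark tokens that follow a negation word, then count term frequencies."""
    tokens = text.lower().split()
    features, negate = [], False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        features.append(f"NOT_{tok}" if negate else tok)
        negate = False          # negation scope of one token, for simplicity
    return Counter(features)

print(extract_features("The candidate did not deliver on jobs"))
# Counter({'the': 1, 'candidate': 1, 'did': 1, 'NOT_deliver': 1, 'on': 1, 'jobs': 1})
```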
5.5 Levels of Sentiment Analysis

Depending on granularity, sentiment analysis is conducted at three levels.

Document-Level Sentiment Analysis—As stated in [23], the document-level category treats the complete document as a single statement expressing an opinion about a subject. For document-level classification with a supervised machine learning model, a finite labeled dataset is required; with an unsupervised approach, a feature-oriented model is implemented to classify positive and negative orientations. If the overall orientation is closer to positive, the document is considered a positive sentiment; otherwise, it is declared a negative sentiment [24].

Sentence-Level Sentiment Analysis—Sentence-level classification first evaluates the subjectivity of the individual sentences within the data, and each sentence is then treated according to the voter's feelings or opinion it expresses [25]. In [26], the sentence level integrates two tasks, subjectivity classification and sentiment classification: subjectivity classification separates subjective from objective sentences, whereas the sentiment categories, as at the document level, are positive and negative.

Feature-Level Sentiment Analysis—In feature-level analysis, voters' opinions are evaluated through the extracted features.

Table 2 summarizes the existing literature related to election campaigns. Six studies were examined, and they show that conventional election campaign work uses multi-gram SVMs, N-grams, statistical filters, and Naïve Bayes classifiers.
However, the automated technique in [48] uses an ontology-based classifier for the election campaign model; the ontology-based classification model comprises attributes, things, events, and agents. The other techniques are not sufficient for processing multi-word terms together with their semantic relations; their evaluation considers multi-word terms using NC-values in the classification model. Recent trends and developments in NLP include the deep learning model of [50], which examines electronic health records (EHRs) for redundancies. In [51], an agile requirements engineering (RE) approach with automated detection of privacy information is presented; a deep learning model exploits syntactic structure to extract information. In [52], an NLP-based supervised classification model is developed for classifying health information data; it exhibits significant performance for health strategy with fast information transfer. NLP-based named entity recognition (NER) for data privacy is handled with a neural network model for evaluation. In [53], textual data are classified with a data-driven model termed bidirectional encoder representations from transformers (BERT). The NLP technique in [54] performs text classification through latent semantic analysis (LSA) and latent Dirichlet allocation (LDA) combined with a classification model. In [55], a neuro-fuzzy technique integrated with an optimization model is presented, in which a real-time stream of information is examined with a deep feed-forward architecture.

Table 2 Summary of survey on election campaign

Reference | Technique | Classifier | Analysis
[41] | LocalMaxs algorithm | Multi-grams SVM | Semantic relation with high hub scores; unigrams are considered for nouns with high authority scores
[42] | Ontogain | C/NC value | Multi-word terms (compound terms) based on multi-words
[43] | e-dictionaries and finite-state transducers | N-gram | C-value, T-Score, LLR, and Keyness
[45] | Hybrid technique | Statistical filter | Unithood estimates
[46] | TF-IDF | Naïve Bayes classifier | Word distance, word position
[48] | Automated techniques | Ontologies | Attribute (A), thing (T), event (E), agent (G), and α
6 Practical Analysis of Election Campaign

To ground the review, data related to the election campaign were collected from a sample of 50 respondents to evaluate voters' opinions of election campaigns. The demographic profile of the sample is presented in Table 3. The majority of respondents are male (84%), with females at 16%. In terms of educational qualification, most of the 50 respondents are postgraduates (42%), followed by graduates (28%); the sample is therefore highly educated, which supports the reliability of the responses. For annual income, the above-8-lakh group is the largest (46%), while the 2–4 lakh group accounts for 20%. For voting count, the highest frequency is obtained for 4–10 times, implying that the respondents have adequate knowledge of election campaigns and their contribution to elections.

Table 4 presents the analysis of candidate and political party views. The data show that slogans contribute effectively to voters' decision making about candidates and political parties, and that the decision-making process is strongly tied to the success rate of candidates. The analysis also indicates that technological advancement plays an active role in the decision-making process. In addition, responses to the open-ended questionnaire show that voters' decisions are influenced by slogans, and that the majority of respondents would prefer election campaigns to be conducted in the local language and to concentrate on development.
7 Concluding Remarks

Sentiment analysis is actively involved in decision-making processes that shape people's lives and the subsequent choices they make; human tendencies are reflected in viewpoints on different aspects of human life [13]. At present, artificial intelligence (AI)-based sentiment analysis over text is used to extract effective information about attitudes and thoughts for decision making. Sentiment analysis remains a challenging task because the AI model must automatically parse text and determine whether a review is positive, negative, or neutral, and the machine strives to understand human emotion [14]. Meanwhile, sentiment analysis benefits the wider NLP stream by supplying features for downstream tasks. Current research focuses on performing sentiment analysis in an automated manner.
Table 3 Demographic profile

Category | Frequency | Percentage
Gender: Male | 42 | 84
Gender: Female | 8 | 16
Age: Below 18 | 0 | 0
Age: 19–25 | 10 | 20
Age: 25–30 | 6 | 12
Age: 30–40 | 16 | 32
Age: 40–50 | 9 | 18
Age: Above 50 | 9 | 18
Educational qualification: Graduate | 14 | 28
Educational qualification: Diploma | 2 | 4
Educational qualification: Undergraduate | 6 | 12
Educational qualification: Postgraduate | 21 | 42
Educational qualification: PhD | 7 | 14
Annual income: Below 1 lakh | 1 | 2
Annual income: 1–2 lakhs | 1 | 2
Annual income: 2–4 lakhs | 10 | 20
Annual income: 4–6 lakhs | 9 | 18
Annual income: 6–8 lakhs | 6 | 12
Annual income: Above 8 lakhs | 23 | 46
Profession: Education sector | 31 | 62
Profession: IT sector | 11 | 22
Profession: Non-IT sector | 3 | 6
Profession: Student | 2 | 4
Profession: Other | 3 | 6
Voting count: 0 | 4 | 8
Voting count: 1–3 times | 15 | 30
Voting count: 4–10 times | 16 | 32
Voting count: More than 10 times | 15 | 30
Table 4 Analysis of candidate

Candidate analysis | 1 | 2 | 3 | 4 | 5
Do you know adequate background information about all candidates? | 26 | 18 | 3 | 1 | –
Do you vote for any candidate based on the slogan of political parties? | 18 | 16 | 11 | 2 | 3
Do you feel the agenda of political parties is engaged in the decision-making process? | 8 | 19 | 10 | 12 | 1
Do you cast your vote for a candidate based on their framed slogan? | 11 | 13 | 19 | 4 | 3
Does the symbol of the political parties/candidate motivate you to vote? | 3 | 8 | 12 | 16 | 11
Do you feel the slogan of political parties is unwanted? | 2 | 5 | 9 | 19 | 15
Does the use of technology impact the election campaign? | 11 | 16 | 10 | 5 | 8
Is the slogan of the political parties easy to understand? | 26 | 21 | 2 | 1 | –
Is the slogan of the political parties based on recent issues and trends? | 26 | 22 | 2 | – | –
I feel slogans strongly influence the success rate of the candidate | 19 | 17 | 11 | – | 3
The constructed models comprise structures such as the convolutional neural network (CNN) [9], recursive auto-encoders [10], and the long short-term memory network (LSTM) [11], together with advances in model training strategies. Even though these existing models show promising results, they suffer from a lack of interpretability: conventional techniques cannot indicate which salient words or phrases drive the estimated sentiment polarity in a way that reflects human judgment [15]. In addition, automated learning over large-scale raw data tends to devalue external resources such as sentiment lexicons, linguistic knowledge, and cognition-grounded data. Implementing AI-based sentiment analysis increases understanding of the election parties, but existing machine and deep learning models rely on a variety of processing techniques for automated language processing. To improve the performance of the learning process, techniques based on auto-encoders, LSTMs, and CNNs have evolved. Although these techniques improve the processing of text data, they do not support effective analysis of people's views in real-time scenarios, nor do they provide fully automated processing of text information. Hence, this research intends to develop an automated text-processing deep learning model. Table 5 summarizes the number of papers referred to for this analysis.

Table 5 Summary of papers referred
Topic | Number of papers | Classifier
Machine learning | 19 | Multi-gram and N-gram
Machine learning for election campaign | 8 | Unsupervised learning
Sentiment analysis | 26 | Multi-gram
This review of election campaign evaluation positions sentiment analysis within an AI-coordinated system. To overcome the limitations associated with sentiment analysis, NLP is used to convert unstructured text into structured blocks as a common representation, and ML techniques transform informally structured content into structured data to raise the standard of the analysis. Ideally, sentiment analysis should approach human cognitive abilities; research effort in this direction therefore needs to concentrate on these abilities and on constructing an appropriate model that brings sentiment analysis closer to such an ideal system.
8 Conclusion

This paper presented a review of election campaign requirements for examining the opinions of voters. The analysis shows that machine learning is an effective approach for computing voters' opinions about parties and candidates: conventionally, classification techniques built on ML models are applied to examine opinions about a particular party or candidate. The review further shows that NLP is an effective tool for processing and evaluating voters' opinions of election campaigns, and that sentiment analysis is widely adopted within ML models to identify those opinions. NLP integrated with sentiment analysis is therefore effective for processing voters' opinions, and applying NLP-based sentiment analysis to election campaigns is an effective way to process the collected data with machine learning.
References 1. Tameryan TY, Zheltukhina MR, Slyshkin GG, Zelenskaya LL, Ryabko OP, Bodony MA (2019) Political media communication: bilingual strategies in the pre-election campaign speeches. Online J Commun Media Technol 9(4):e201921 2. Marchal N, Neudert L-M, Kollanyi B, Howard PN (2021) Investigating visual content shared over Twitter during the 2019 EU parliamentary election campaign. Media Commun 9(1):158– 170 3. Unkel J, Haim M (2021) Googling politics: parties, sources, and issue ownerships on Google in the 2017 German federal election campaign. Soc Sci Comput Rev 39(5):844–861 4. Blumler JG, Esser F (2019) Mediatization as a combination of push and pull forces: examples during the 2015 UK general election campaign. Journalism 20(7):855–872 5. Chen Y, Wang L (2022) Misleading political advertising fuels incivility online: a social network analysis of 2020 US presidential election campaign video comments on YouTube. Comput Hum Behav 131:107202
6. Siegel AA, Nikitin E, Barberá P, Sterling J, Pullen B, Bonneau R, Nagler J, Tucker JA (2021) Trumping hate on Twitter? online hate speech in the 2016 US election campaign and its aftermath. Q J Polit Sci 16(1):71–104 7. Ibrishimova MD, Li KF (2019) A machine learning approach to fake news detection using knowledge verification and natural language processing. In: International conference on intelligent networking and collaborative systems, Springer, Cham, pp 223–234 8. Jensen MJ (2017) Social media and political campaigning: changing terms of engagement? Int J Press/Polit 22(1):23–42 9. Koli AM, Ahmed M (2021) Machine learning based parametric estimation approach for poll prediction. Recent Adv Comput Sci Commun Formerly: Recent Patents Comput Sci 14(4):1287–1299 10. Looijenga MS (2018) The detection of fake messages using machine learning. Bachelor’s thesis, University of Twente 11. Miranda E, Aryuni M, Hariyanto R, Surya ES (2019) Sentiment analysis using sentiwordnet and machine learning approach (Indonesia general election opinion from the twitter content). In: 2019 International conference on information management and technology (ICIMTech), vol 1, IEEE, pp 62–67 12. Buntoro GA, Arifin R, Syaifuddiin GN, Selamat A, Krejcar O, Hamido F (2021) The implementation of the machine learning algorithm for the sentiment analysis of Indonesia’s 2019 Presidential election. IIUM Eng J 22(1):78–92 13. Alashri S, Alalola T (2020) Functional analysis of the 2020 US elections on Twitter and Facebook using machine learning. In: 2020 IEEE/ACM International conference on advances in social networks analysis and mining (ASONAM), IEEE, pp 586–589 14. Sharma A, Ghose U (2020) Sentimental analysis of twitter data with respect to general elections in India. Proc Comput Sci 173:325–334 15. Sandoval-Almazan R, Valle-Cruz D (2018) Facebook impact and sentiment analysis on political campaigns. In: Proceedings of the 19th annual international conference on digital government research: governance in the data age, pp 1–7 16. Sandoval- R, Valle-Cruz D (2020) Sentiment analysis of Facebook users reacting to political campaign posts. Digit Govern: Res Pract 1(2):1–13 17. Budiharto W, Meiliana M (2018) Prediction and analysis of Indonesia Presidential election from Twitter using sentiment analysis. J Big Data 5(1):1–10 18. Wongkar M, Angdresey A (2019) Sentiment analysis using Naive Bayes algorithm of the data crawler: Twitter. In: 2019 Fourth international conference on informatics and computing (ICIC), IEEE, pp 1–5 19. Ansari MZ, Aziz MB, Siddiqui MO, Mehra H, Singh KP (2020) Analysis of political sentiment orientations on twitter. Proc Comput Sci 167:1821–1828 20. Yaqub U, Sharma N, Pabreja R, Chun SA, Atluri V, Vaidya J (2020) Location-based sentiment analyses and visualization of Twitter election data. Digit Govern Res Pract 1(2):1–19 21. Chauhan P, Sharma N, Sikka G (2021) The emergence of social media data and sentiment analysis in election prediction. J Ambient Intell Humaniz Comput 12(2):2601–2627 22. Alvarez G, Choi J, Strover S (2020) Good news, bad news: a sentiment analysis of the 2016 Election Russian Facebook Ads. Good Syst Published Res 23. Diaz-Garcia JA, Ruiz MD, Martin-Bautista MJ (2020) Non-query-based pattern mining and sentiment analysis for massive microblogging online texts. IEEE Access 8:78166–78182 24. Yue L, Chen W, Li X, Zuo W, Yin M (2019) A survey of sentiment analysis in social media. Knowl Inf Syst 60(2):617–663 25. 
Dhaoui C, Webster CM, Tan LP (2017) Social media sentiment analysis: lexicon versus machine learning. J Consum Mark 26. Farzindar A, Inkpen D (2015) Natural language processing for social media. Synth Lect Hum Lang Technol 8(2):1–166 27. Parackal M, Mather D, Holdsworth D (2018) Value-based prediction of election results using natural language processing: a case of the new Zealand general election. Int J Mark Res 60(2):156–168
28. Farrell J (2019) The growth of climate change misinformation in US philanthropy: evidence from natural language processing. Environ Res Lett 14(3):034013 29. Albanese F, Pinto S, Semeshenko V, Balenzuela P (2020) Analyzing mass media influence using natural language processing and time series analysis. J Phys Complex 1(2):025005 30. Dimitrova DV, Matthes J (2018) Social media in political campaigning around the world: Theoretical and methodological challenges. J Mass Commun Q 95(2):333–342 31. Halpin D, Vromen A, Vaughan M, Raissi M (2018) Online petitioning and politics: the development of Change. org in Australia. Aust J Polit Sci 53(4):428–445 32. Hitesh MSR, Vaibhav V, Kalki YJA, Kamtam SH, Kumari S (2019) Real-time sentiment analysis of 2019 election tweets using word2vec and random forest model. In: 2019 2nd International conference on intelligent communication and computational techniques (ICCT), IEEE, pp 146–151 33. Aquino PA, López VF, Moreno MN, Muñoz MD, Rodríguez S (2020) Opinion mining system for twitter sentiment analysis. In: International conference on hybrid artificial intelligence systems, Springer, Cham, pp 465–476 34. Hasanli H, Rustamov S (2019) Sentiment analysis of Azerbaijani twits using logistic regression, Naive Bayes and SVM. In: 2019 IEEE 13th International conference on application of information and communication technologies (AICT), IEEE, pp 1–7 35. Kaur M, Verma R, Otoo FNK (2021) Emotions in leader’s crisis communication: Twitter sentiment analysis during COVID-19 outbreak. J Hum Behav Soc Environ 31(1–4):362–372 36. Anupama BS, Rakshith DB, Rahul KM, Navaneeth M (2020) Real time twitter sentiment analysis using natural language processing. Int J Eng Res Technol 9(7):1107–1112 37. Pérez JM, Giudici JC, Luque F (2021) Pysentimiento: a python toolkit for sentiment analysis and socialnlp tasks. arXiv preprint arXiv:2106.09462 38. Javed M, Kamal S (2018) Normalization of unstructured and informal text in sentiment analysis. Int J Adv Comput Sci Appl 9(10) 39. Antonakaki D, Fragopoulou P, Ioannidis S (2021) A survey of Twitter research: data model, graph structure, sentiment analysis and attacks. Expert Syst Appl 164:114006 40. Martins R, Almeida J, Henriques P, Novais P (2020) Predicting an election’s outcome using sentiment analysis. In: World conference on information systems and technologies, Springer, Cham, pp 134–143 41. Plaza-del-Arco FM, Martín-Valdivia MT, Ureña-López LA, Mitkov R (2020) Improved emotion recognition in Spanish social media through incorporation of lexical knowledge. Future Gener Comput Syst 110:1000–1008 42. Santos JS, Paes A, Bernardini F (2019) Combining labeled datasets for sentiment analysis from different domains based on dataset similarity to predict electors sentiment. In: 2019 8th Brazilian conference on intelligent systems (BRACIS), IEEE, pp 455–460 43. Valle-Cruz D, Fernandez-Cortez V, López A, Sandoval- R (2022) Does twitter affect stock market decisions? financial sentiment analysis during pandemics: a comparative study of the h1n1 and the covid-19 periods. Cogn Comput 14(1):372–387 44. Hassan SU, Saleem A, Soroya SH, Safder I, Iqbal S, Jamil S, Bukhari F, Aljohani NR, Nawaz R (2021) Sentiment analysis of tweets through Altmetrics: a machine learning approach. J Inf Sci 47(6): 712–726 45. Kristiyanti DA, Umam AH (2019) Prediction of Indonesia presidential election results for the 2019–2024 period using twitter sentiment analysis. In: 2019 5th International conference on new media studies (CONMEDIA), IEEE, pp 36–42 46. 
Sanders AC, White RC, Severson LS, Ma R, McQueen R, Paulo HCA, Zhang Y, Erickson JS, Bennett KP (2021) Unmasking the conversation on masks: natural language processing for topical sentiment analysis of COVID-19 Twitter discourse. AMIA Summits Transl Sci Proc 2021:555 47. Franco- JN, Bello-Garcia A, Ordieres-Meré J (2019) Indicator proposal for measuring regional political support for the electoral process on Twitter: the case of Spain’s 2015 and 2016 general elections. IEEE Access 7:62545–62560
48. Martin-Gutierrez S, Losada JC, Benito RM (2018) Semi-automatic training set construction for supervised sentiment analysis in political contexts. In: 2018 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM), IEEE, pp 715–720 49. Bansal B, Srivastava S (2018) On predicting elections with hybrid topic based sentiment analysis of tweets. Proc Comput Sci 135:346–353 50. Park S, Strover S, Choi J, Schnell MK (2021) Mind games: a temporal sentiment analysis of the political messages of the internet research agency on Facebook and Twitter. New Media Soc 14614448211014355 51. Liu J, Capurro D, Nguyen A, Verspoor K (2022) Note Bloat impacts deep learning-based NLP models for clinical prediction tasks. J Biomed Inf 104149 52. Casillo F, Deufemia V, Gravino C (2022) Detecting privacy requirements from user stories with NLP transfer learning models. Inf Softw Technol 146:106853 53. Li K, Zhou C, Luo XR, Benitez J, Liao Q (2022) Impact of information timeliness and richness on public engagement on social media during COVID-19 pandemic: an empirical investigation based on NLP and machine learning. Decis Support Syst 113752 54. Marulli F, Verde L, Campanile L (2021) Exploring data and model poisoning attacks to deep learning-based NLP systems. Proc Comput Sci 192:3570–3579 55. Xu S, Zhang C, Hong D (2022) BERT-based NLP techniques for classification and severity modeling in basic warranty data study. Insur: Math Econ
Heart Problem Detection from Electrocardiogram by One-Dimensional Convolutional Neural Network Prince Kumar, Deepak Kumar, Poulami Singha, Rakesh Ranjan, and Dipankar Dutta
Abstract Heart disease is a widespread global health concern and one of the major causes of fatalities. The electrocardiogram (ECG) is one of the devices used to check heart health. Nowadays, ECGs can be recorded in small towns and small hospitals, but only cardiologists can analyze them to detect heart problems. Cardiologists are few in number, not easily available, and their consulting fees are high. There is therefore a need for software systems that can analyze ECGs to detect heart problems automatically. Because such a system is very cheap, the condition of the heart can be monitored more frequently, and if the system detects any heart problem, the person can consult a cardiologist for further treatment. The number of ECGs that must be manually analyzed by cardiologists can be significantly reduced when no heart problem is found, making healthcare decision making easier and yielding significant time and financial savings. In this paper, we propose a classifier for categorizing ECGs based on a one-dimensional convolutional neural network (1D-CNN). We assessed the proposed methodology on the MIT-BIH and PTB Diagnostics datasets from PhysioNet, on which it predicts with 98.48% and 98.97% accuracy, respectively. We compared the proposed model's performance with that of previously published models, and it is either better than or comparable with them. Keywords ECG · Heart diseases · 1D-CNN · Classification
P. Kumar · D. Kumar · P. Singha · R. Ranjan · D. Dutta (B) University Institute of Technology, The University of Burdwan, Golapbag (North), Burdwan 713104, West Bengal, India e-mail: [email protected] URL: https://dipankarduttas.yolasite.com/ © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_44
Fig. 1 Electro-conduction system of heart [18]
1 Introduction

An ECG gives vital insight into the heart's primary functions and the cardiovascular system, and can therefore indicate susceptibility to severe heart conditions, strokes, or unexpected cardiac death. The ECG records bioelectric potential variations with time: it is a collection of waves and deflections that captures the heart's electrical activity from a particular "view". Voltage changes are observed between electrodes positioned at various locations on the body, and each such viewpoint is known as a lead. The electro-conduction system of the heart is depicted in Fig. 1.

Sodium (Na+), potassium (K+), and calcium (Ca2+) solutions surround and fill each heart cell. When a cell is at rest, the interior of the cell membrane is negative in relation to the exterior. When an electrical impulse is produced, the interior of the cell becomes positive in relation to the exterior; this shift in polarity is called depolarization. After depolarization, the cell returns to its initial condition, which is known as repolarization. The ECG captures the heart's electrical signal as the muscle cells contract and repolarize. An ECG waveform is shown in Fig. 2. The P, Q, R, S, and T deflections represent the heart's impulses; each deflection is described below.

• P denotes atrial depolarization or contraction.
• The PR interval measures the time a depolarization wave takes to travel from the atria to the ventricles.
• Q, R, and S are the three deflections following the P deflection: Q is the first negative deflection, R is a positive deflection, and S is the first negative deflection after R.
Fig. 2 An ECG waveform [20]
• The ST segment measures the time between ventricular depolarization and the start of repolarization.
• The letter T stands for ventricular repolarization.
• The QT interval is an indicator of overall ventricular activity.

Classification, a form of supervised learning, is one of the most studied areas of machine learning, and many types of classifiers have been invented by researchers [6, 26]. Several of these classifiers have been used for ECG classification [12, 23, 27], and among them the convolutional neural network (CNN) is very effective. A CNN can extract features and hidden patterns from the input data. Each layer of a CNN uses a small subset of neurons to process different parts of the incoming data, and with more layers the extracted features become more abstract, since each layer's output consists of numerous features derived from its input.
2 Literature Review Artificial intelligence has recently been used in the study of ECG signal analysis. Deep learning approaches in particular have demonstrated good performance in detecting abnormal ECG waveforms and events, increasing the accuracy of detecting a number of heart-related disorders. Depending on how many electrodes are used, there are various forms of ECG, including 1-lead, 2-lead, 6-lead, and 12-lead. Therefore, we can categorize works into two groups: single-lead-based works and multiple-lead-based works.
2.1 Works Based on Single Lead

In [7, 16], the researchers used the MIT-BIH dataset: different features were first extracted from the ECG signals, and a support vector machine (SVM) was then used for classification based on those features. The authors of [7] report 98.40% accuracy, whereas those of [16] report 94.40% accuracy. Yildirim et al. [24] proposed a model called deep bidirectional long short-term memory network-based wavelet sequences (DBLSTM-WS) for classifying ECG signals; experiments on the MIT-BIH dataset show 99.39% accuracy, and the wavelet-based layer recommended in the study significantly improves the network's ability to identify signals. A system for continuous cardiac monitoring on wearable devices with constrained computing power was developed by Saadatnejad et al. [19] using multiple LSTMs. Zhai et al. [25] used a 2D-CNN to classify heartbeats extracted from the MIT-BIH dataset; a dual-beat coupling matrix was created from the beats as the 2D input to the CNN classifier, and its performance was compared with other classifiers to show its superiority. The 1D-CNNs utilized by Kiranyaz et al. [11] and Li et al. [12] can be employed to monitor the ECG in real time and to provide an early warning system on a portable, lightweight device. Kiranyaz et al. show that, on the MIT-BIH dataset, their classifier achieves higher classification performance than the majority of state-of-the-art techniques for identifying ventricular ectopic beats (VEB) and supraventricular ectopic beats (SVEB). In addition to the input and output layers, the 1D-CNN suggested by Li et al. contains five layers: two convolution layers, two down-sampling layers, and one fully connected layer; on the MIT-BIH dataset they report 97.5% classification accuracy. Panganiban et al. used Google's Inception V3 model for classification and reported 97.33% classification accuracy [17]. Mohonta et al. utilized the continuous wavelet transform (CWT) and a 2D-CNN to classify ECG signals, reporting an accuracy of 99.65% [14]. Recently, Tripathi et al. used the superlet transform (SLT) to convert the 1D ECG signal into a 2D time-frequency (TF) spectrogram; a CNN called DenseNet-201 provides 96.2% accuracy [21]. In another recent paper, Li et al. [13] utilized the discrete wavelet transform (DWT) to denoise the signals and used an improved deep residual CNN for classification, with a highest reported accuracy of 88.99%.
2.2 Works Based on Multiple Leads

Dohare et al. [5] performed myocardial infarction (MI) detection with an SVM classifier using the 12-lead ECGs of the standard PTB dataset. With 220 parameters, the SVM classifier shows 98.33% accuracy; principal component analysis (PCA) extracted the top 14 orthogonal features from the 220, and with these 14 features the obtained accuracy was 96.66%.
Chang et al. [4] performed atrial fibrillation (AF) detection with an LSTM, using multi-lead ECG data from PhysioNet together with ECG data collected in their laboratories; the accuracy of AF detection reached 98.3%. A 1D densely connected CNN was utilized by Wang et al. [22] to detect nine types (one regular type and eight problematic types) from 12-lead ECG records. A number of wavelet-based shrinkage filtering techniques were employed to pre-process the data. The approach was tested on The First China ECG Intelligent Competition dataset and found effective, achieving final F1 scores of 0.873 and 0.863 on the validation and test sets, respectively. Panganiban et al. used Google's Inception V3 model for classification and reported 98.73% classification accuracy [17]. Hong et al. reviewed 191 papers that apply deep neural networks to ECG data [8].

In general, works on heart problem detection using classical classifiers such as SVM are less accurate. Deep CNNs have many layers and a large number of parameters, so they are computationally very expensive to train. 2D-CNNs use matrix multiplications internally, whereas 1D-CNNs use array multiplications, making 2D-CNNs computationally more expensive than 1D-CNNs. For these reasons, we have chosen shallow 1D-CNNs for our work, and they give high classification accuracy.
3 Datasets

We used the Massachusetts Institute of Technology—Boston's Beth Israel Hospital (MIT-BIH) Arrhythmia dataset [15] and the Physikalisch-Technische Bundesanstalt (PTB) Diagnostic ECG dataset [2] to test the proposed method. The MIT-BIH Arrhythmia dataset contains 48 half-hour recordings of two-channel ambulatory ECG from 47 subjects studied by the BIH Arrhythmia Laboratory. A total of 4000 24-hour ambulatory ECG recordings were collected at Boston's Beth Israel Hospital from a mixed group of inpatients (approximately 60%) and outpatients (about 40%). From this set, 23 recordings were chosen at random, and the remaining 25 were chosen to include clinically important but less frequent arrhythmias that would not be well represented in a small random sample. Each channel's recordings were digitized at 360 samples per second over a 10 mV range.

A total of 290 patients contributed 549 records to the PTB Diagnostic ECG dataset (aged 17–87, with a mean age of 57.2; 209 men with a mean age of 55.5 and 81 women with a mean age of 61.6; the ages of 1 female and 14 male subjects were not recorded). One to five recordings were made from each subject; no subjects numbered 124, 132, 134, or 161 exist. Each record contains 15 simultaneously measured signals: the conventional 12 leads (i, ii, iii, avr, avl, avf, v1, v2, v3, v4, v5, and v6) and the 3 Frank lead ECGs (vx, vy, and vz). Each signal is digitized at a sampling rate of 1000 samples per second with a resolution of 16 bits over a range of 16.384 mV.
Fig. 3 ECG heartbeat extraction [9]
4 Methodology

4.1 Pre-processing

The input to the proposed 1D-CNN is the ECG signal, but the ECG records must first be processed to extract individual beats, which are stored in a comma-separated value (CSV) file. Each row in the CSV file corresponds to a single heartbeat and is generated as follows:

• Divide the records at the R-peaks into separate heartbeat records.
• To include a complete QRS complex, append the first 40 values of the subsequent heartbeat record to each beat record.
• Downsample each heartbeat record from 360 to 125 Hz.
• Normalize the mV readings to the range 0 to 1.
• Ignore heartbeat records longer than 187 values.
• Pad shorter heartbeat records with zeroes at the end until they contain exactly 187 values.
• For MIT-BIH records, append the heartbeat class from the annotations to each record (0 for normal, 1 for ventricular ectopic beat, 2 for supraventricular ectopic beat, 3 for fusion beat, and 4 for unknown beat); for PTB records, 0 is normal and 1 is abnormal. Each row then has exactly 188 values.
• Delete heartbeat records without a classification.

Following this procedure, we obtained 109,446 heartbeats from the MIT-BIH Arrhythmia dataset and 14,552 heartbeats from the PTB Diagnostic ECG dataset. Figure 3 shows a single heartbeat record extracted from a series of heartbeats.
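A rough sketch of such a beat extraction step is shown below. It is an illustrative assumption: the R-peak positions, helper function, and SciPy-based resampling are ours and do not reproduce the authors' exact pipeline.

```python
# Illustrative beat extraction: segment at R-peaks, downsample 360 Hz -> 125 Hz,
# normalize to [0, 1], and zero-pad to 187 samples (hypothetical helper).
import numpy as np
from scipy.signal import resample

TARGET_LEN = 187

def extract_beats(signal: np.ndarray, r_peaks: np.ndarray) -> np.ndarray:
    beats = []
    for start, nxt in zip(r_peaks[:-1], r_peaks[1:]):
        segment = signal[start:nxt + 40]                 # include next beat's first 40 samples
        segment = resample(segment, int(len(segment) * 125 / 360))  # 360 Hz -> 125 Hz
        segment = (segment - segment.min()) / (segment.max() - segment.min() + 1e-8)
        if len(segment) > TARGET_LEN:                    # over-long beats are ignored
            continue
        padded = np.zeros(TARGET_LEN)
        padded[:len(segment)] = segment                  # zero-pad to exactly 187 values
        beats.append(padded)
    return np.array(beats)

# Toy usage with a synthetic signal and hypothetical R-peak locations.
sig = np.sin(np.linspace(0, 50 * np.pi, 18000)) + 0.05 * np.random.randn(18000)
peaks = np.arange(100, 17900, 300)
print(extract_beats(sig, peaks).shape)                   # (n_beats, 187)
```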
4.2 Architecture

As mentioned above, we used a 1D-CNN to build the heartbeat classifier. The proposed 1D-CNN architecture is shown in Fig. 4.
Fig. 4 Architecture of the proposed 1D-CNN
Extracted heartbeats are used as the inputs to this 1D-CNN during training, and the classifier is built by adjusting the network weights. The input layer is followed by convolutional layers whose filters stride one sample at a time, interleaved with max pooling layers that reduce the size of their input; the sizes of these layers are shown in Fig. 4. Two fully connected layers follow. The ReLU activation function [1] is used in these layers, and the output layer uses the softmax function [3]. The architecture is summarized as follows:

• The proposed model contains 8 convolutional layers, 3 max pooling layers, 1 global max pooling layer, and 3 fully connected layers.
• The first and second convolutional layers each contain 16 filters of size 5 with ReLU as the activation function and no padding.
• They are followed by a max pooling layer with stride 2 and dropout at a rate of 0.1.
• The third and fourth convolutional layers contain 32 filters of size 3 with ReLU activation and no padding.
• They are followed by a max pooling layer with stride 2 and dropout at a rate of 0.1.
• The fifth and sixth convolutional layers contain 32 filters of size 3 with ReLU activation and no padding.
• They are followed by a max pooling layer with stride 2 and dropout at a rate of 0.1.
• The seventh and eighth convolutional layers contain 256 filters of size 3 with ReLU activation and no padding.
• They are followed by a global max pooling operation and dropout at a rate of 0.1, producing a one-dimensional vector of 256 values.
• The first and second fully connected layers contain 64 nodes each with ReLU activation.
• The output layer contains 5 nodes (MIT-BIH) or 2 nodes (PTB) with a softmax activation function used for classification.
• We trained for 1000 epochs.
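For illustration, a minimal Keras sketch of the architecture described above is given below. It follows the stated layer counts, filter sizes, dropout rate, and activation functions, but details such as the optimizer, loss function, and padding handling are our assumptions rather than the authors' exact implementation.

```python
# Illustrative Keras version of the described 1D-CNN (assumed hyperparameters).
from tensorflow.keras import layers, models

def build_model(n_classes: int = 5) -> models.Model:
    m = models.Sequential([
        layers.Input(shape=(187, 1)),                     # one 187-sample heartbeat
        layers.Conv1D(16, 5, activation="relu"),
        layers.Conv1D(16, 5, activation="relu"),
        layers.MaxPooling1D(pool_size=2, strides=2),
        layers.Dropout(0.1),
        layers.Conv1D(32, 3, activation="relu"),
        layers.Conv1D(32, 3, activation="relu"),
        layers.MaxPooling1D(pool_size=2, strides=2),
        layers.Dropout(0.1),
        layers.Conv1D(32, 3, activation="relu"),
        layers.Conv1D(32, 3, activation="relu"),
        layers.MaxPooling1D(pool_size=2, strides=2),
        layers.Dropout(0.1),
        layers.Conv1D(256, 3, activation="relu"),
        layers.Conv1D(256, 3, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dropout(0.1),
        layers.Dense(64, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),    # 5 classes (MIT-BIH) or 2 (PTB)
    ])
    m.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
    return m

model = build_model(n_classes=5)
model.summary()
```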
5 Result and Comparison

As discussed, we used two datasets in our experiments. For the MIT-BIH dataset, heartbeat records from 39 persons are used for training the classifier and the remaining 9 for testing. The testing accuracy is 98.48% and the F1-score is 91.38%. Table 1 shows the testing accuracy obtained by other classifiers developed by different researchers at different times on the MIT-BIH dataset. In some cases [14, 24], the accuracies are higher than that of the proposed method, but these methods use different pre-processing techniques and 2D-CNNs, which are computationally expensive. Figure 5 shows the confusion matrix for heartbeat classification on the test dataset. The PTB dataset contains data from 290 patients; data from 228 patients are used for training and data from the remaining 58 patients for testing. The testing accuracy is 98.97% and the F1-score is 99.2%. Table 2 shows the testing accuracy obtained by other classifiers developed by different researchers at different times on the PTB dataset. Figure 6 shows the confusion matrix for heartbeat classification on the test dataset.
Table 1 Comparison of heartbeat classification results for the MIT-BIH dataset

Research work          | Methodology used           | Accuracy (%)
Ge et al. [7]          | SVM                        | 98.40
Pandey et al. [16]     | SVM                        | 94.40
Yildirim et al. [24]   | DBLSTM-WS                  | 99.39
Li et al. [12]         | 1D-CNN                     | 97.50
Panganiban et al. [17] | Google's Inception V3      | 97.33
Mohonta [14]           | CWT and 2D-CNN             | 99.65
Tripathi et al. [21]   | SLT and DenseNet-201       | 96.20
Li et al. [13]         | DWT and deep residual CNN  | 88.99
Proposed method        | 1D-CNN                     | 98.48

Fig. 5 Confusion matrix for heartbeat classification on the test dataset of MIT-BIH
Table 2 Comparison of heartbeat classification results for the PTB dataset

Research work          | Methodology used        | Accuracy (%)
Dohare et al. [5]      | SVM                     | 98.33
Dohare et al. [5]      | PCA and SVM             | 96.66
Chang et al. [4]       | LSTM                    | 98.30
Panganiban et al. [17] | Google's Inception V3   | 98.73
Proposed method        | 1D-CNN                  | 98.97
Fig. 6 Confusion matrix for heartbeat classification on the test dataset of PTB
Tables 1 and 2 show that the proposed method achieves better or comparable accuracy relative to the other methods proposed by researchers. In Figs. 5 and 6, the cells along the diagonals, highlighted in a darker color, show higher values than the other cells, which indicates that in most cases the classifiers predict class labels that match the actual class labels.
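For reference, the reported accuracy and F1-score can be reproduced from the predicted and true labels of the test set as in the generic sketch below (scikit-learn is used here for illustration; the macro averaging of the F1-score is an assumption, since the averaging mode is not stated).

```python
# Generic evaluation sketch: confusion matrix, accuracy and F1 from predictions.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_true = np.array([0, 0, 1, 2, 3, 4, 0, 1, 2, 0])   # placeholder true labels
y_pred = np.array([0, 0, 1, 2, 3, 0, 0, 1, 2, 0])   # placeholder predictions

print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
print("Accuracy: %.2f%%" % (100 * accuracy_score(y_true, y_pred)))
print("Macro F1: %.2f%%" % (100 * f1_score(y_true, y_pred, average="macro")))
```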
6 Conclusion

This paper demonstrates the automatic recognition and classification by a 1D-CNN of the various ECG heartbeat types that are crucial for identifying cardiac arrhythmia. We developed a 1D-CNN that can classify five or two different types of ECG heartbeats and can be incorporated into a coronary artery disease (CAD) ECG system to provide a quick and precise diagnosis. Cardiologists may utilize the proposed model as an adjunct tool in clinical settings to help them analyze ECG heartbeat data. Implementing such a model in polyclinics for online and offline review of large quantities of ECG recordings would speed up patient care, reduce the workload of cardiologists, and lower the cost of processing ECG signals in hospitals. The wavelet
technique can be integrated into the proposed model to achieve higher classification accuracy, as it has been observed to improve accuracy in previous experiments.
References 1. Agarap AF (2018) Deep learning using rectified linear units (ReLU). arXiv preprint arXiv:1803.08375 2. Bousseljot R, Kreiseler D, Schnabel A (1995) Nutzung der EKG-Signaldatenbank CARDIODAT der PTB über das Internet 3. Bridle J (1989) Training stochastic model recognition algorithms as networks can lead to maximum mutual information estimation of parameters. Adv Neural Inf Process Sys 2 4. Chang YC, Wu SH, Tseng LM, Chao HL, Ko CH (2018) AF detection by exploiting the spectral and temporal characteristics of ECG signals with the LSTM model. In: 2018 computing in cardiology conference (CinC), vol 45. IEEE, pp 1–4 5. Dohare AK, Kumar V, Kumar R (2018) Detection of myocardial infarction in 12 lead ECG using support vector machine. Appl Soft Comput 64:138–147 6. Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15(1):3133–3181 7. Ge Z, Zhu Z, Feng P, Zhang S, Wang J, Zhou B (2019) ECG-signal classification using SVM with multi-feature. In: 2019 8th international symposium on next generation electronics (ISNE). IEEE, pp 1–3 8. Hong S, Zhou Y, Shang J, Xiao C, Sun J (2020) Opportunities and challenges of deep learning methods for electrocardiogram data: a systematic review. Comput Biol Med 122:103801 9. Kachuee M, Fazeli S, Sarrafzadeh M (2018) ECG heartbeat classification: a deep transferable representation. In: 2018 IEEE international conference on healthcare informatics (ICHI). IEEE, pp 443–444 10. Kaur P, Sharma RK (2014) LabVIEW based design of heart disease detection system. In: International conference on recent advances and innovations in engineering (ICRAIE-2014). IEEE, pp 1–5 11. Kiranyaz S, Ince T, Gabbouj M (2015) Real-time patient-specific ECG classification by 1-D convolutional neural networks. IEEE Trans Biomed Eng 63(3):664–675 12. Li D, Zhang J, Zhang Q, Wei X (2017) Classification of ECG signals based on 1D convolution neural network. In: 2017 IEEE 19th international conference on e-health networking, applications and services (Healthcom). IEEE, pp 1–6 13. Li Y, Qian R, Li K (2022) Inter-patient arrhythmia classification with improved deep residual convolutional neural network. Comput Methods Programs Biomed 214:106582 14. Mohonta SC, Motin MA, Kumar DK (2022) Electrocardiogram based arrhythmia classification using wavelet transform with deep learning model. Sens Bio-Sens Res 100502 15. Moody GB, Mark RG (2001) The impact of the MIT-BIH arrhythmia database. IEEE Eng Med Biol Mag 20(3):45–50 16. Pandey SK, Janghel RR, Vani V (2020) Patient specific machine learning models for ECG signal classification. Procedia Comput Sci 167:2181–2190 17. Panganiban EB, Paglinawan AC, Chung WY, Paa GLS (2021) ECG diagnostic support system (EDSS): a deep learning neural network based classification system for detecting ECG abnormal rhythms from a low-powered wearable biosensors. Sens Bio-Sens Res 31:100398 18. Patel B, Shah D (2014) Evaluating ECG capturing using sound-card of PC/laptop. arXiv preprint arXiv:1402.3651 19. Saadatnejad S, Oveisi M, Hashemi M (2019) LSTM-based ECG classification for continuous monitoring on personal wearable devices. IEEE J Biomed Health Inform 24(2):515–523 20. Sinha R (2012) An approach for classifying ECG arrhythmia based on features extracted from EMD and wavelet packet domains
21. Tripathi PM, Kumar A, Kumar M, Komaragiri R (2022) Multi-level classification and detection of cardiac arrhythmias with high-resolution superlet transform and deep convolution neural network. IEEE Trans Instrum Meas 22. Wang C, Yang S, Tang X, Li B (2019) A 12-lead ECG arrhythmia classification method based on 1D densely connected CNN. In: Machine learning and medical engineering for cardiovascular health and intravascular imaging and computer assisted stenting. Springer, pp 72–79 23. Xu X, Liu H (2020) ECG heartbeat classification using convolutional neural networks. IEEE Access 8:8614–8619 24. Yildirim Ö (2018) A novel wavelet sequence based on deep bidirectional LSTM network model for ECG signal classification. Comput Biol Med 96:189–202 25. Zhai X, Tin C (2018) Automated ECG classification using dual heartbeat coupling based on convolutional neural network. IEEE Access 6:27465–27472 26. Zhang C, Liu C, Zhang X, Almpanidis G (2017) An up-to-date comparison of state-of-the-art classification algorithms. Expert Syst Appl 82:128–150 27. Zubair M, Kim J, Yoon C (2016) An automated ECG beat classification system using convolutional neural networks. In: 2016 6th international conference on IT convergence and security (ICITCS). IEEE, pp 1–5
Deep Monarch Butterfly Optimization-Based Attack Detection for Securing Virtualized Infrastructures of Cloud Bhavana Gupta and Nishchol Mishra
Abstract Due to the tremendous advancements in technology and the development of cloud computing, it has become common for businesses to implement virtualization in data centers to fully use their hardware resources. As a result, virtualization and security have recently undergone several modifications. Virtualization and its remarkable design provide various benefits and features over non-virtualized systems. Nonetheless, these new characteristics introduce new weaknesses and potential attacks in virtualized systems. This paper presents a novel two-stage secure virtualization paradigm in the cloud. At first, the "improved holoentropy" and "Exponential Moving Average" features are derived. The derived features are then classified using a deep belief network to identify attacks in the network. To improve the detection accuracy, the weights of the DBN are optimally tuned using the monarch butterfly algorithm. Further, the analysis is carried out based on varied metrics.

Keywords Virtualization · Optimization · Cloud computing · Monarch butterfly optimization · Deep belief network
1 Introduction

Cloud computing (CC) is presented as a solution that provides virtually unlimited on-demand IT supplies to third parties through the Internet with minimum interaction with service providers (SPs) [1, 2]. CC uses virtualization to provide custom IT services on demand that were previously only available in data centers. Third-party users employ appliance applications to access IT resources over the Internet. The ideas of "utility computing, grid computing, distributed computing, cluster computing, and virtualization" [3] comprise CC, and CC can also be seen as an extension of these computing paradigms. The CC virtualization approach is used in data centers to distribute workload to servers and to manage IT resources, monitoring, and resource usage.
CSPs virtualize resources in data centers and provide third-party services on a pay-per-use basis [1, 4, 5]. Several CSPs provide services to trading businesses to simplify maintenance and management procedures [6, 7]. Other services CC provides to third parties include computing resources, workload management, resource backup, and software and hardware services. CSPs charge third-party customers using cloud services according to the SLA [8]. CC offers several service models, including SaaS, XaaS, IaaS, and PaaS. The IaaS provided by CC confronts several challenges, including the privacy and security of hosted cloud databases, optimum power consumption, and fault tolerance [1, 9–11]. CC employs various ways to maximize the available IT infrastructure resources to provide better services [12–14]. Traditional solutions that rely on virtualization-oriented memory security may provide some amount of privacy to software [15]. They were, however, designed to protect only partial code privacy or do not address the critical difficulties of a true cloud marketplace environment, in which code privacy should be entirely maintained over the whole lifetime of the software after its compliance [16–19].
The following contributions are included in the work:
• A unique attack detection model is developed based on the improved holoentropy and Exponential Moving Average features.
• An improved DBN is used to identify attacks, with the DBN weights tuned using the MBO technique.
Section 2 reviews the related literature. Section 3 describes the proposed work. Section 4 explains how features are extracted from the input data. Section 5 presents the classifier: the optimized DBN. Sections 7 and 8 provide the findings and conclusions.
2 Literature Review

Attack on Virtualized Infrastructure. Asvija et al. [1] published Bayesian attack graphs (BAGs) resulting from a security risk assessment of cloud computing's virtualized network infrastructures. BAGs are important attack-modeling tools that capture the system's complexity. The authors mainly contribute reference conditional probability estimates for BAG nodes, using attacks reported in the CVE database together with virtualization-specific approaches to refine these estimates. The proposed method produces a BAG that device architects can use to reason about important security design issues and choose the right safety measures. Lastly, the suggested strategy and its results show that the infrastructure performs better. Compastié et al. [20] published an in-depth analysis of a framework based on server virtualization designs to find security holes in cloud virtualization technologies. The design of a fully virtualizable architecture makes it possible to monitor some of the most important instructions; the VMM controls this monitoring with the help of the trapping mechanism. Consequently, the technique
was fully virtualizable because all the private instructions were bundled in the privileged instructions. So, sensitive instructions were taken care of, instructions from an unprivileged mode could be intercepted, and traps could be used to control another privileged mode. Lastly, the results of the changed technique showed that the design was more secure. Patil et al. [16] suggested a good way to keep cloud computing virtual networks safe and find intrusions in 2019. A proposal was made to have each cloud server setup with its HLDNS. The node monitors all inbound and outbound traffics on the local area network, the Internet, and any other networks connected to the virtualized environment for signs of intrusion. Researchers made it more general to get useful cloud network traffic functions from a BBA by adding two more fitness functions. Lastly, the performance of the suggested system showed that the cloud security network made analysis better. Alaluna et al. [21] used embedding to figure out what the safe multi-cloud virtual network would look like in 2020. Safe virtual network routing can be found by shifting attention to the substrate network, previously ignored as a potential solution to the primary resource distribution problem in network virtualization and encoding. The MILP was made to solve this problem in the best way possible. The results of our tests show how putting security requirements into network virtualization has pros and cons. Lastly, the performance of the suggested method showed that virtual security networks could make good use of their resources. Agarwal and Duong [22] suggested in 2019 that virtual machines in cloud computing should be kept safe. This method showed a new way to place virtual machines (VMs) called “Previously Co-located Users First” so that harmful colocation in VMs would happen less often. On safe VM placement methods that a cloud service provider can use to automatically protect against some risks related to co-location. Here is a way to measure and evaluate the co-location security of public IaaS clouds with multiple tenants. Lastly, the suggested strategy did better than other well-known strategies in cloud co-location resistance. In 2018, Win et al. [23] published a big data-based security analytics solution that can detect targeted assaults on cloud computing services and infrastructure. The HDFS holds network and device program information from guest VMs. To detect potential attack vectors and extract security characteristics, the MapReduce parser and graph-based event correlation will be used. Finally, the results of the testing on the proposed strategy reveal that it outperforms other models. Joseph and Mukesh [24] reported malware attacks in cloud storage where virtual machines used VM snapshots to fix themselves. Snapshots were often used as backups in certain backup servers. Using the VMs’ ability to fix themselves in the local cloud network would reduce the load on the backup server without affecting any VMs in the backup server. Also, machine learning was used to tell the difference between attacked and unaffected pictures. Lastly, the technique has shown that it is better at finding and identifying different malware threats. Benkhelifa et al. [25] suggested a framework for monitoring a virtual environment in 2019, mainly for a cloud provider to keep virtual infrastructure and services safe. Because the testing method was not always certain, it was hard to determine how
to price the required service. This study aims to help businesses and universities that are building virtualized cloud-based applications and their ecosystems, and it examines the difficulties of building a software system for testing a virtual network delivered as a service. Lastly, the suggested strategy worked better than other methods in terms of the likely solutions.
3 Proposed Attack Detection Framework

This paper presents a new way to use security analytics to detect attacks in the virtualized infrastructure of cloud computing. The work focuses on extracting meaningful information from the network's properties to determine whether an attack is taking place in the virtualized infrastructure. The proposed model has two main steps: (i) feature extraction and (ii) classification. Given that the addressed task is a "big data" problem with competing requirements for speed, volume, and accuracy, it is intended to take advantage of Python's multiprocessing framework to partition the data sensibly and speed up the process. From each partitioned dataset, the proposed holoentropy-based features and EMA features are extracted. Lastly, the features from the different processes are classified to determine whether there is an attack in the cloud. An MBO algorithm is used to optimally adjust the weights of the DBN so that the model can predict attacks more accurately. Figure 1 shows the general structure of the work.

Feature Extraction. Here, we extract two distinct sorts of features: (i) holoentropy features and (ii) EMA features.

1. Holoentropy feature

Conventional holoentropy: "The holoentropy is defined as the sum of entropy, and the overall correlation of the random vector is shown as the sum of the entropies of all characteristics" [26]:

HE_X(Y) = E_X(Y) + C_X(Y) = \sum_{i=1}^{m} HE_X(y_i)    (1)

In Eq. (1), E_X(Y) denotes the entropy, C_X(Y) the correlation, X the dataset, and Y the random vector.

Weighted holoentropy: The weighted holoentropy is the sum of the weighted entropies of all the attributes [26], as given in Eq. (2):

WHE_X(Y) = \sum_{i=1}^{m} w_X(y_i) \, HE_X(y_i)    (2)
Fig. 1 Architecture of proposed model
The reverse sigmoid function is applied as the weighting factor, as given in Eq. (3):

w_X(y_i) = 2\left(1 - \frac{1}{1 + \exp(-HE_X(y_i))}\right)    (3)

In Eq. (3), the reverse sigmoid is a generating function whose range is (0, 2); moreover, the weight coefficients lie between 0 and 1, and the entropies are positive.

Weighted Probability Proposed Holoentropy: We propose the following formulation for determining the holoentropy using the weighted probability it introduces:

HL_E(X_i) = w \times E(X_i)    (4)

w = 2 \times \left(1 - \frac{1}{1 + \exp(-E(X_i))}\right)    (5)

E(X_i) = \sum_{t=1}^{\mu(X_i)} p_t \log p_t    (6)
In Eq. (6), p_t denotes the probability of the attribute values of feature X_i and \mu(X_i) is the number of unique values of the feature attribute. Similarly, the proposed holoentropy for the feature and label characteristics together is given by Eqs. (7)-(9):

HL_E(X_i, y_i) = w \times E(X_i, y_i)    (7)

w = 2 \times \left(1 - \frac{1}{1 + e^{-E(X_i, y_i)}}\right)    (8)

E(X_i, y_i) = \sum_{t=1}^{\mu(X_i)} p_t    (9)
Differential holoentropy: For an object x_0 of X, the differential holoentropy [26] of x_0, as per Eq. (10), is the difference between the weighted holoentropy of X and the weighted holoentropy of X/\{x_0\}:

h_X(x_0) = WHE_X(Y) - WHE_{X/\{x_0\}}(Y)    (10)

h_X(x_0) = \sum_{i=1}^{m} \left[ w_X(y_i)\, HE_X(y_i) - w_{X/\{x_0\}}(y_i)\, HE_{X/\{x_0\}}(y_i) \right]    (11)

In Eq. (11), w_X(y_i) is the reverse sigmoid function of the entropy HE_X(y_i). The difference between w_X(y_i) and w_{X/\{x_0\}}(y_i) is much smaller than the difference between the corresponding entropies, so the differential holoentropy can be written more simply as in Eq. (12):

\hat{h}_X(x_0) = \sum_{i=1}^{m} w_X(y_i) \left[ HE_X(y_i) - HE_{X/\{x_0\}}(y_i) \right]    (12)
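To make the holoentropy formulas concrete, the sketch below computes the entropy of a categorical attribute, its reverse-sigmoid weight, and the proposed and weighted holoentropies of Eqs. (2) and (4)-(6). It is a minimal illustration of the reconstructed formulas, not the authors' implementation; in particular, taking the magnitude of Eq. (6) when accumulating Eq. (2) is an assumption.

```python
import numpy as np
from collections import Counter


def entropy(col):
    """E(X_i) = sum_t p_t * log(p_t) over the unique values of the attribute (Eq. 6, sign as written)."""
    counts = np.array(list(Counter(col).values()), dtype=float)
    p = counts / counts.sum()
    return float(np.sum(p * np.log(p)))


def reverse_sigmoid_weight(e):
    """w = 2 * (1 - 1 / (1 + exp(-e))), the reverse-sigmoid weighting of Eqs. (3) and (5)."""
    return 2.0 * (1.0 - 1.0 / (1.0 + np.exp(-e)))


def proposed_holoentropy(col):
    """HL_E(X_i) = w * E(X_i), Eq. (4)."""
    e = entropy(col)
    return reverse_sigmoid_weight(e) * e


def weighted_holoentropy(columns):
    """WHE_X(Y) = sum_i w_X(y_i) * HE_X(y_i) over all attributes (Eq. 2), using |E| as the entropy."""
    total = 0.0
    for col in columns:
        he = abs(entropy(col))
        total += reverse_sigmoid_weight(he) * he
    return total


# Toy example: two categorical attributes of a small dataset.
cols = [["tcp", "tcp", "udp", "icmp"], ["low", "low", "low", "high"]]
print(proposed_holoentropy(cols[0]), weighted_holoentropy(cols))
```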
4 Exponential Moving Average Features

An EMA [27] is a form of moving average that assigns the greatest weight and significance to the most recent data points, so a moving average with exponential weighting responds more quickly to changes in each observation period. The EMA \{\mu_t ;\, t = 0, \ldots, T\} of the asset returns y = \{y_t ;\, t = 0, \ldots, T\} is defined in Eq. (13):

\mu_t = \beta y_{t-1} + (1 - \beta)\mu_{t-1}, \quad t \ge 1, \qquad \mu_0 = \alpha \in \mathbb{R}    (13)
Deep Monarch Butterfly Optimization-Based Attack Detection … Fig. 2 Solution encoding
M1
M2
M3
……
631
MN
A
In Eq. (13), \alpha is a constant (the mean of the asset returns \{y_t\}) used as the initial value, and the constant smoothing factor \beta controls the degree of weighting. The architectural diagram of multiprocessing is depicted in Fig. 2 [28].
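A direct transcription of the recursion in Eq. (13) is shown below; the value of the smoothing factor β is an illustrative choice, and μ_0 defaults to the mean of the series as described above.

```python
import numpy as np


def ema_features(y, beta=0.3, alpha=None):
    """Exponential moving average per Eq. (13): mu_t = beta*y_{t-1} + (1-beta)*mu_{t-1}."""
    mu = np.empty(len(y))
    mu[0] = np.mean(y) if alpha is None else alpha   # mu_0 = alpha (mean of the series by default)
    for t in range(1, len(y)):
        mu[t] = beta * y[t - 1] + (1.0 - beta) * mu[t - 1]
    return mu


series = np.array([0.2, 0.4, 0.1, 0.5, 0.3, 0.7])
print(ema_features(series))
```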
5 Optimized Attack Detection Using Deep Belief Network

The DBN is trained to detect an attacker's presence. The DBN [29] structure comprises visible neurons, hidden neurons, and the layers that make up the output layer. There are dense connections between hidden and visible (input) neurons, but no connections among the visible neurons themselves and none among the hidden neurons [30]. The connections between visible and hidden neurons are symmetric and unique. The output of a neuron in a Boltzmann network is stochastic: the result O depends on the probability function Q_p(\delta) of Eq. (14), as shown in Eq. (15), and its limiting behavior is given by Eq. (16), where q is the "fake temperature":

Q_p(\delta) = \frac{1}{1 + e^{-\delta/q}}    (14)

O = \begin{cases} 1 & \text{with probability } 1 - Q_p(\delta) \\ 0 & \text{with probability } Q_p(\delta) \end{cases}    (15)

\lim_{q \to 0^+} Q_p(\delta) = \lim_{q \to 0^+} \frac{1}{1 + e^{-\delta/q}} = \begin{cases} 0 & \text{for } \delta < 0 \\ 1/2 & \text{for } \delta = 0 \\ 1 & \text{for } \delta > 0 \end{cases}    (16)
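The behavior described by Eqs. (14)-(16) can be checked with a few lines of code: the unit's output is sampled from the temperature-controlled sigmoid, and as q approaches 0+ the sigmoid approaches a hard threshold on δ. This is only a sketch of the sampling rule, not a full DBN training routine.

```python
import numpy as np

rng = np.random.default_rng(0)


def q_p(delta, q=1.0):
    """Temperature-controlled sigmoid of Eq. (14): 1 / (1 + exp(-delta / q))."""
    return 1.0 / (1.0 + np.exp(-delta / q))


def stochastic_output(delta, q=1.0):
    """Binary output of Eq. (15): O = 1 with probability 1 - Q_p(delta), else 0."""
    return 1 if rng.random() < 1.0 - q_p(delta, q) else 0


# As q -> 0+, Q_p(delta) tends to 0, 1/2 or 1 depending on the sign of delta (Eq. 16).
for q in (1.0, 0.2, 0.05):
    print(q, [round(q_p(d, q), 4) for d in (-2.0, 0.0, 2.0)], stochastic_output(0.5, q))
```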
The feature processing in the DBN architecture occurs via a series of RBM layers, and the classification step is performed using an MLP. The mathematical design expresses the Boltzmann machine energy as a function of the binary neuron states b in Eqs. (17) and (18), where M_{c,l} denotes the weights among the neurons, which are optimally adjusted here by the monarch butterfly algorithm, and \theta_a defines the biases. The monarch butterfly optimization (MBO) migration operator updates a butterfly position as per Eq. (35):

A^{MBO(t+1)}_{i,k} = A^{MBO(t)}_{s2,k}    (35)
In Eq. (35), A^{MBO(t)}_{s2,k} specifies the kth element of A^{MBO}_{s2}, the new position of the generated monarch butterfly s2; the monarch butterfly s2 is randomly selected from Subpopulation 2 [34].

Butterfly adjusting operator: The monarch butterfly positions are also updated by the butterfly adjusting operator. For every element k of monarch butterfly j, if the generated random number r is lower than or equal to p, the position is updated as expressed in Eq. (36) [34]:

A^{MBO(t+1)}_{j,k} = A^{MBO(t)}_{b,k}    (36)

In Eq. (36), A^{MBO(t+1)}_{j,k} specifies the kth element of A^{MBO}_{j} at generation t + 1, which represents the position of monarch butterfly j. Likewise, A^{MBO(t)}_{b,k} denotes the kth element of the best monarch butterfly A^{MBO}_{b} in L1 and L2. Conversely, if r is
greater than p, A^{MBO(t+1)}_{j,k} is updated as per Eq. (37):

A^{MBO(t+1)}_{j,k} = A^{MBO(t)}_{s3,k}    (37)

In Eq. (37), A^{MBO(t)}_{s3,k} refers to the kth element of a randomly selected monarch butterfly in L2. Furthermore, s3 \in \{1, 2, \ldots, TP_2\}, where TP indicates the total population. If, in addition, r > BAR, A^{MBO(t+1)}_{j,k} is updated as per Eq. (38):

A^{MBO(t+1)}_{j,k} = A^{MBO(t+1)}_{j,k} + \alpha \times (dx_k - 0.5)    (38)
In Eq. (38), BAR denotes the butterfly adjusting rate, \alpha is the weighting factor computed by Eq. (40), and dx is the walk step of monarch butterfly j, obtained from the Levy flight in Eq. (39). In Eq. (40), S_m denotes the maximum distance a monarch butterfly can move in a single step [34]:

dx = \text{Levy}(x^t_j)    (39)

\alpha = S_m / t^2    (40)
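The two MBO operators can be sketched as below for a population of candidate weight vectors. This is a simplified, self-contained illustration of Eqs. (35)-(40) rather than the authors' implementation: the fitness function is a placeholder for the DBN detection error, and the constants (p, BAR, period, S_m) and the Levy-flight form are common choices assumed here.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM, POP, T_MAX = 10, 20, 30
P, PERIOD, BAR, S_MAX = 5.0 / 12.0, 1.2, 5.0 / 12.0, 1.0


def fitness(x):                      # placeholder objective (stands in for the DBN detection error)
    return np.sum(x ** 2)


def levy_step(dim):                  # simple heavy-tailed walk standing in for the Levy flight of Eq. (39)
    return np.sum(np.tan(np.pi * (rng.random((10, dim)) - 0.5)), axis=0)


pop = rng.uniform(-1, 1, (POP, DIM))
for t in range(1, T_MAX + 1):
    pop = pop[np.argsort([fitness(x) for x in pop])]          # sort: best butterfly first
    n1 = int(np.ceil(P * POP))
    sub1, sub2 = pop[:n1], pop[n1:]                           # Subpopulation 1 (L1) and 2 (L2)
    best = pop[0].copy()
    alpha = S_MAX / t ** 2                                    # weighting factor of Eq. (40)
    # Migration operator: elements copied from a butterfly of L1 or L2 (Eq. 35 and its companion)
    for i in range(len(sub1)):
        for k in range(DIM):
            r = rng.random() * PERIOD
            src = sub1[rng.integers(len(sub1))] if r <= P else sub2[rng.integers(len(sub2))]
            sub1[i, k] = src[k]
    # Butterfly adjusting operator: Eqs. (36)-(38)
    for j in range(len(sub2)):
        dx = levy_step(DIM)
        for k in range(DIM):
            if rng.random() <= P:
                sub2[j, k] = best[k]                          # Eq. (36): copy from the best butterfly
            else:
                sub2[j, k] = sub1[rng.integers(len(sub1)), k]  # Eq. (37): copy from a random L1 butterfly
                if rng.random() > BAR:
                    sub2[j, k] += alpha * (dx[k] - 0.5)       # Eq. (38): Levy-flight perturbation
    pop = np.vstack([sub1, sub2])

print("best fitness:", min(fitness(x) for x in pop))
```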
7 Results and Discussion

A. Simulation Procedure
The Python-based DBN + MBO (DMBO) threat detection model was executed on AWS cloud infrastructure and the outcomes were documented. Furthermore, the proposed DBN + MBO-based attack detection system was compared to other conventional models such as SVM, KNN, and Naive Bayes on the following metrics: accuracy, sensitivity, specificity, precision, FPR, F-measure, recall, and MCC. The performance analysis was carried out for learning rates ranging from 60 to 90.

B. Performance Analysis
See Fig. 3.

C. Performance Analysis of the CSE-CIC-IDS2018 Dataset
Figure 3 shows that the DBN + MBO method is 24.14%, 22.35%, 8.7%, and 17.3% more accurate than the RNN, SVM, KNN, and Naive Bayes models, respectively, for the split ratio of 70. Also, for a split ratio of 80, the DBN + MBO method is 21.5%, 38.8%, 39.7%, and 39.4% better than RNN [28], SVM [6], KNN [35], and Naive Bayes [24], respectively, in terms of precision.
Fig. 3 Performance analysis on learning rate 70 and 80
The CSE-CIC-IDS2018 datasets can be found at https://www.unb.ca/cic/datasets/ids-2018.html and contain the data that support the results of this study.
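For completeness, a minimal sketch of loading one of the published CSV files with pandas and separating the label column is given below; the file name is a placeholder, and the benign label value is assumed to be "Benign" as in the published CSVs.

```python
import pandas as pd

# Placeholder file name: the dataset is distributed as several per-day CSV files.
df = pd.read_csv("cse-cic-ids2018_day.csv", low_memory=False)

X = df.drop(columns=["Label"])                  # network flow features
y = (df["Label"] != "Benign").astype(int)       # 1 = attack, 0 = benign (label value assumed)
print(X.shape, y.value_counts().to_dict())
```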
8 Conclusion

This study presents a novel paradigm for employing security analytics in virtualized infrastructure to detect cloud-based threats. Since the study is focused on large-data issues and the behavior of networks, the detection phase was divided into two major steps: (1) feature extraction and (2) classification. During the feature extraction step, the suggested holoentropy and EMA features were extracted. The retrieved characteristics were then subjected to the classification process, and the optimized DBN was utilized to identify network threats. The weights were adjusted using the "Monarch Butterfly" technique, which assisted the DBN in learning and increased the detection precision. In addition, the MBO technique was 24.14%, 22.35%, 8.7%, and 17.3% more accurate than the conventional RNN [28], SVM [6], KNN [35], and Naive Bayes models, respectively. The MCC of the proposed DBN + MBO model for split ratio 70 is also higher than that of RNN, SVM, KNN, and Naive Bayes. This work may be extended in the future to incorporate the categorization of many classes and to be utilized in real time.
References 1. Asvija B, Eswari R, Bijoy MB (2020) Bayesian attack graphs for platform virtualized infrastructures in clouds. J Inf Secur Appl 51:102455 2. McDonnell N, Howley E, Duggan J (2020) Dynamic virtual machine consolidation using a multi-agent system to optimize energy efficiency in cloud computing. Future Gener Comput Syst 108:288–301 3. Hussain SA, Fatima M, Saeed A, Raza I, Shahza RH (2017) Multilevel classification of security concerns in cloud computing. Appl Comput Inform 13(1):57–65
4. Jeddi S, Sharifian S (2020) A hybrid wavelet decomposer and GMDH-ELM ensemble model for network function virtualization workload forecasting in cloud computing. Appl Soft Comput 88:Art.no. 105940 5. Alsmadi D, Prybutok V (2018) Sharing and storage behaviour via cloud computing: security and privacy in research and practice. Comput Hum Behav 85:218–226 6. Xu X, Zhang Q, Maneas S, Sotiriadis S, Bessis N (2019) VMSAGE: a virtual machine scheduling algorithm based on the gravitational effect for green Cloud computing. Simul Model Pract Theory 93:87–103 7. Mavridis I, Karatza H (2019) Combining containers and virtual machines to enhance isolation and extend functionality on cloud computing. Future Gener Comput Syst 94:674–696 8. Hudic A, Smith P, Weippl ER (2017) Security assurance assessment methodology for hybrid clouds. Comput Secur 70:723–743 9. Fernández-Cerero D, Jakóbik A, Grzonka D, Kołodziej J, Fernández-Montes A (2018) Security supportive energy-aware scheduling and energy policies for cloud environments. J Parallel Distrib Comput 119:191–202 10. Namasudra S, Devi D, Kadry S, Sundarasekar R, Shanthini A (2020) Towards DNA-based data security in the cloud computing environment. Comput Commun 151:539–547 11. Kumar V, Ahmad M, Mishra D, Kumari S, Khan MK (2020) RSEAP: RFID-based secure and efficient authentication protocol for vehicular cloud computing. Veh Commun 22:Art.no. 100213 12. Luo L, Xing L, Levitin G (2019) Optimizing dynamic survivability and security of replicated data in cloud systems under co-residence attacks. Reliab Eng Syst Saf 192:Art.no. 106265 13. Paulraj GJL, Francis SAJ, Dinesh Peter J, Jebadurai IJ (2018) Resource-aware virtual machine migration in IoT cloud. Future Gener Comput Syst 85:173–183 14. Zhu W, Zhuang Y, Zhang L (2017) A three-dimensional virtual resource scheduling method for energy saving in cloud computing. Future Gener Comput Syst 69:66–74 15. Shrestha M, Johansen C, Noll J, Roverso D (2020) A methodology for security classification applied to smart grid infrastructures. Int J Crit Infrastruct Prot 28:Art.no. 100342 16. Patil R, Dudeja H, Modi C (2019) Designing an efficient security framework for detecting intrusions in the virtual network of cloud computing. Comput Secur 85:402–422 17. Ficco M, Chora´s M, Kozik R (2017) Simulation platform for cyber-security and vulnerability analysis of critical infrastructures. J Comput Sci 22:179–186 18. Bazm M-M, Lacoste M, Südholt M, Menaud J-M (2019) Isolation in cloud computing infrastructures: new security challenges. Ann Telecommun 74(3–4):197–209 19. Nowakowski P, Bubak M, Barty´nski T, Gubała T, Meizner J (2018) Cloud computing infrastructure for the VPH community. J Comput Sci 24:169–179 20. Compastié M, Badonnel R, Festor O, He R (2020) From virtualization security issues to cloud protection opportunities: an in-depth analysis of system virtualization models. Comput Secur 97:Art.no. 101905 21. Alaluna M, Ferrolho L, Figueira JR, Neves N, Ramos FMV (2020) Secure multi-cloud virtual network embedding. Comput Commun 155:252–265 22. Agarwal A, Duong TNB (2019) Secure virtual machine placement in cloud data centres. Future Gener Comput Syst 100:210–222 23. Win TY, Tianfield H, Mair Q (2018) Big data based security analytics for protecting virtualized infrastructures in cloud computing. IEEE Trans Big Data 4(1):11–25, 2715335 24. Joseph L, Mukesh R (2018) Detection of malware attacks on virtual machines for a self-heal approach in cloud computing using VM snapshots. J Commun Softw Syst 249–257 25. 
Benkhelifa E, Bani Hani A, Welsh T, Mthunzi S, Ghedira Guegan C (2019) Virtual environments testing as a cloud service: a methodology for protecting and securing virtual infrastructures. IEEE Access 7:108660–108676 26. Wu S, Wang S (2013) Information-theoretic outlier detection for large-scale categorical data. IEEE Trans Knowl Data Eng 25(3):589–602 27. Nakano M, Takahashi A, Takahashi S (2016) Generalized exponential moving average (EMA) model with particle filtering and anomaly detection. Expert Syst Appl
28. Manickam M, Ramaraj N, Chellappan C (2019) A combined PFCM and recurrent neural network-based intrusion detection system for cloud environment. Int J Bus Intell Data Min 14(4):504–527 29. Wang HZ, Wang GB, Li GQ, Peng JC, Liu YT (2016) Deep belief network-based deterministic and probabilistic wind speed forecasting approach. Appl Energy 182:80–93 30. Thomas R, Rangachar MJS (2018) Hybrid optimization based DBN for face recognition using low-resolution images. Multimed Res 1(1):33–43 31. Wang G-G, Deb S, Cui Z (2019) Monarch butterfly optimization. Neural Comput Appl 31:1995–2014 32. Wang, G-G, Deb S, Coelh L (2015) Elephant herding optimization. In: International symposium on computational and business intelligence (ISCBI) 33. Marsaline Beno M, Valarmathi IR, Swamy SM, Rajakumar BR (2014) Threshold prediction for segmenting tumour from brain MRI scans. Int J Imaging Syst Technol 24(2):129–137 34. Gupta B, Mishra N (2022) Optimized deep learning-based attack detection framework for secure virtualized infrastructures in the cloud. Int J Numer Model Electron Netw Devices Fields 35(1):e2945 35. Zhang W, Chen X, Liu Y, Xi Q (2020) A distributed storage and computation k-nearest neighbour algorithm based cloud-edge computing for cyber-physical-social systems. IEEE Access 8:50118–50130
Artificial Intelligence Technologies Applied to Asset Management: Methods, Opportunities and Risks Saad Kabak and Ahmed Benjelloun
Abstract The use of artificial intelligence in asset management is an emergent trend that is strongly energizing the environment of asset management companies, prompting and even imposing the adaptation and digital transformation of asset management companies and their management in order to maintain and strengthen their competitive advantages in the market. In this paper, we seek to explore, on the basis of a literature review, some use cases of artificial intelligence within the life of an asset manager. It does not claim to be exhaustive or technical, as the financial markets are very different and diverse, whether in terms of asset classes or in terms of activities. The literature review conducted suggests that the current advances in artificial intelligence provide asset managers with tools that allow them to create value from the flood of data they are submerged in every day. Artificial intelligence helps the manager make investment choices by anticipating market trends and better understanding market sentiment. It also helps him to be more relevant in his proposals thanks to his knowledge of investors. Nevertheless, there are still challenging obstacles hindering the adoption of artificial intelligence among asset management companies, such as access to data, transparency and the heavy investments in infrastructure, to name a few.

Keywords Artificial intelligence · Asset management · Financial market
1 Introduction

The field of finance has witnessed constant technological evolution. A report by McKinsey [1] shows that finance is leading the way in the appropriation of artificial intelligence technologies. The asset management business is no exception to this technological trend, which is reconfiguring and even upending it. According to a study conducted by the CFA Institute [2], asset management is considered by investment professionals to be the finance sector most likely to be impacted by artificial intelligence innovations, far ahead of banking, insurance or financial consulting. In the United States, England and France, several Fintech companies have emerged that have developed robots specialized in asset management. Faced with these rapid technological innovations, we seek to answer the following question: What are the implications of using artificial intelligence in asset management? In order to answer this research question, using a methodology based on documentary research, we will conduct a literature review related to the key notions of our question, namely artificial intelligence and asset management. We will then present the results of our literature review on some cases in which artificial intelligence technologies could constitute opportunities and threats for asset managers.
2 Literature Review Before starting the discussion on the topic of the implications of artificial intelligence on asset management, we will conduct a literature review about the key concepts of our research, namely asset management and artificial intelligence.
2.1 Asset Management and Asset Managers

Asset management is a profession in which the asset manager grows his clients' assets by complying with regulatory and contractual obligations and applying investment strategies to achieve the best possible return for the chosen level of risk [3]. Asset managers operate in asset management companies, which are responsible for the financial, administrative and accounting management of products managed on behalf of third parties and which are approved for this purpose by the regulator. Three types of activities can be distinguished in portfolio management companies: front-office, middle-office and back-office activities. Regarding the functioning of asset management companies, it should be noted that there is no single organizational structure (business model) for these companies, but rather a multitude of frameworks that companies choose according to different parameters such as specialization, size, degree of investment expertise, commercial strategy, shareholding structure, partnerships and distribution methods [4]. Historically, one name
Fig. 1 Markowitz efficient frontier curve
unquestionably associated with asset management is that of the 1990 Nobel laureate in Economics, Markowitz. In 1952, Markowitz proposed a portfolio optimization model (the mean-variance model) that defines the set of efficient portfolios. On the (P, σ²) plane, Markowitz shows that the set of efficient securities and portfolios is bounded by a curve resembling a half-hyperbola called the "efficient frontier" (see Fig. 1).
• The curve represents all possible portfolio combinations offered by the market; however, not all points on it are optimal.
• Only the upper branch of the hyperbola is the efficient frontier.
• The efficient frontier is necessarily increasing, because the more risk the agent accepts, the higher the expected return.
• The dominated portfolios, located in the lower part, will not attract the interest of a rational agent.
• The region beyond the curve consists of unattainable portfolios.
According to Markowitz, a rational agent positions himself on a point of the efficient frontier, since outside it the risk would be higher for a given return or, conversely, the return would be lower for a given risk (see the sketch after this paragraph). It should be noted that, building on Markowitz's work, many variants have been developed, among which we find expected utility theory, the capital asset pricing model (CAPM) and arbitrage pricing theory. In the field of asset management, other works developed in parallel with those of Markowitz, such as Tobin's separation theorem (1958) and Shefrin and Statman's behavioral portfolio theory (2000). These developments took place alongside the globalization of finance and the evolution of technological means and the regulatory environment, which have transformed asset management into a global activity.
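To illustrate the mean-variance idea behind the efficient frontier, the sketch below solves the minimum-variance problem for a grid of target returns using the equality-constrained closed form (a Lagrangian/KKT linear system). The expected returns and covariance matrix are invented for the example, and short selling is allowed for simplicity.

```python
import numpy as np

# Invented example data: expected returns and covariance of four assets.
mu = np.array([0.05, 0.08, 0.12, 0.15])
cov = np.array([[0.04, 0.01, 0.00, 0.00],
                [0.01, 0.09, 0.02, 0.01],
                [0.00, 0.02, 0.16, 0.03],
                [0.00, 0.01, 0.03, 0.25]])


def min_variance_weights(target):
    """Minimize w' cov w subject to sum(w) = 1 and w'mu = target (KKT linear system)."""
    n = len(mu)
    kkt = np.zeros((n + 2, n + 2))
    kkt[:n, :n] = 2 * cov
    kkt[:n, n] = kkt[n, :n] = 1.0
    kkt[:n, n + 1] = kkt[n + 1, :n] = mu
    rhs = np.zeros(n + 2)
    rhs[n], rhs[n + 1] = 1.0, target
    return np.linalg.solve(kkt, rhs)[:n]


# Trace the frontier: expected return vs. standard deviation of the optimal portfolio.
for target in np.linspace(mu.min(), mu.max(), 6):
    w = min_variance_weights(target)
    sigma = float(np.sqrt(w @ cov @ w))
    print(f"target return {target:.3f}  ->  risk (std) {sigma:.3f}")
```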
2.2 Financial Theory and Asset Managers There is no consensus on the status of asset managers in the financial market. They have evolved with the evolution of different financial theories. In what follows, we try to reconcile the different theoretical controversies on the profile of asset managers. Classical finance theory and asset managers Based on the classical financial theory, asset managers are assimilated to homo economicus, i.e., perfectly rational agents. Their place in the financial market is important because they are at the center of the efficiency mechanism. Indeed, their almost total dedication to the search for new information makes them true sources of information, which contributes to the dissemination of information and experts that counteract irrational agents and eliminates the differences between the fundamental and market values of assets. In terms of investment decision making of asset managers, the classical finance approach advocates a normative approach based on two duties, namely the calculative duty1 and the fundamentalist duty [5]. Behavioral finance theory and asset managers The status of asset managers in the financial market changes with the theory of behavioral finance [6]. In financial markets, the investment decisions of agents are constantly biased by emotions and faulty reasoning. Agents are not devoid of the psychological dimension as understood by the theory of classical finance, and the financial market is not a neutral space where agents exercise uniform rationality with a view to maximizing their utility. For the theory of behavioral finance, the decisions of agents cannot be normative in situations of complex calculations and with limited cognitive capacities. For this reason, behavioral finance theory proposes a behavioral hypothesis borrowed from Simon. This is a more modest behavioral hypothesis, of the bounded rationality (or procedural) type, which presents individuals as rational agents but with cognitive limits. In terms of financial asset pricing, behavioral finance theory criticizes fundamental analysis and emphasizes technical or chartist analysis.2
2.3 Artificial Intelligence Applied to Finance Definition of the concept of artificial intelligence Artificial intelligence is a science that emerged after the Second World War with the invention of the first computers and especially after the landmark discoveries of the Turing test that establishes a link between human intelligence and the machine that 1
Asset managers are presented in classical finance theory as geniuses endowed with numerous cognitive faculties and owners of elements allowing them to make the optimal decision [7]. 2 Technical analysis is the study of financial market trends, mainly on the basis of graphs, with the aim of predicting future trends in stock prices.
qualifies as conscious [8]. According to Minsky [9], artificial intelligence is defined as the construction of computer programs that perform tasks that are, for the moment, more satisfactorily accomplished by human beings because they require high-level mental processes. Russell and Norvig [10] propose four dimensions to define artificial intelligence: acting humanly, thinking humanly, acting rationally and thinking rationally. The concept of artificial intelligence refers to computers that mimic aspects of human thinking through the application of advanced analytical and logical techniques, including machine learning, to interpret events and to support and automate decisions [11]. Artificial intelligence is a science with the dual purpose of simulating human behavior in reasoning activities and replacing humans in certain automatic and repetitive tasks [12]. It should be noted that although the first models of artificial intelligence were established as early as the 1940s, they have only developed on a large scale very recently, due to a number of factors: software and cloud services, Big Data, processor speed and computing power [13].

Applications of artificial intelligence to asset management. Table 1 lists some of the applications of artificial intelligence to asset management. Several artificial intelligence solutions and applications are currently applied to the activities of asset management companies, in particular machine learning, robo-advisors and robo-traders, Big Data and blockchain. Different asset management companies use these technologies for their advantages: reduction of financial market complexity, transparency, cost reduction, faster decision making, attraction of new clients, portfolio diversification, stock picking optimization, asset allocation, etc.
3 Methods

In order to answer our research question about the implications of using artificial intelligence in the context of asset management, we relied on a documentary research methodology. We carried out a review of the theoretical literature on previous work that has studied the implications of artificial intelligence tools for asset management. Only recent contributions in the field have been reported. In the next section, we faithfully present the conclusions proposed by these studies.
4 Result Analysis After defining the key concepts of our research, we will present the implications of artificial intelligence on the asset manager’s job. Indeed, solutions based on artificial intelligence bring both opportunities and risks for the asset manager.
Table 1 Summary of the applications of artificial intelligence applied to asset management

Application | Definition and application in the field of asset management
Machine learning or automatic learning
Machine learning is the implementation of algorithms to obtain predictive analysis from massive data. It is carried out in three stages: representation, evaluation and optimization. The representation phase consists in finding the most suitable mathematical model, the evaluation phase measures the gap between the model and the reality of the test data, and the optimization phase aims at reducing this gap [14]. In asset management, the use of machine learning techniques is not new [15, 16]. What is rather innovative is the design of methods that allow a machine to adapt and evolve through systematic processes that are sources of learning and make it possible to perform complex tasks via algorithmic operations [17, 18].
Automated tools: The case of robo-advisors and robo-traders
Robotization is the process of assigning tasks usually performed by humans to robots [19]. A robot is a machine, located in the physical world, that acquires data through sensors detecting and recording physical signals, interprets the acquired data to make decisions about actions, and carries out those actions in the physical world [20]. For asset management companies, there can be several modalities of robotization [21]. Several asset management firms have developed robo-advisors or robo-traders that complement the human model of the financial analyst and asset manager by providing services at a lower cost [22, 23].
Analysis of collected mega-data (Big Data)
Popularized by John Mashey during the 1990s, the notion of Big Data refers to data in growing volumes, with great variety and at great speed. What matters about Big Data is not so much the size as the possibility of exploiting all this data using powerful data tools. Several tools are used as computing infrastructure to analyze massive data: MySQL, Hadoop HDFS, etc. [24]. For asset management companies, the implementation of massive data analysis tools is likely to bring real added value, particularly in terms of data aggregation and filtering, portfolio diversification, stock picking optimization, risk alerts, asset allocation, behavioral analysis to build client loyalty, and predictive analysis to attract new clients [25].
Blockchain
Blockchain is one of the technological developments brought forward by artificial intelligence. Blockchain is a “system open to both professionals and individuals, allowing them to conduct transactions at any time anywhere in the world in an efficient, public, inexpensive, secure and fast manner” [26]. Blockchain is a technology that turns the classical paradigm on which the organization and functioning of financial activities are built upside down. Indeed, in a traditional financial system, banks and financial institutions act as trusted third parties for financial transactions (payments, exchanges of financial securities, registration of guarantees). They create value in the markets by collecting information in a secure, confidential and organized way. However, in a decentralized network, i.e., the blockchain network, the various participants in the network compete with each other to carry out the tasks usually performed by the trusted third party: collecting information, verifying data, approving exchanges and recording property transfers. According to the blockchain’s mode of governance, these rules are defined in a decentralized way by all the members of the network, who approve the updates of the protocol. Blockchain technology applied to asset management allows, among other things, the securing of data and values, the automatic execution of instructions when all conditions are met thanks to smart contracts technology, the disintermediated transmission of information or values, the preservation and sharing of information, the speed of execution, the reduction of costs and the transparency [27].
4.1 Opportunities of Artificial Intelligence for the Asset Manager Several studies have analyzed the opportunities of the appropriation of artificial intelligence technologies on the asset manager’s job. A recent study conducted by PricewaterhouseCoopers (PwC) among asset managers in the United States (2021) highlights the opportunities of the adoption of artificial intelligence. The study states that the activities of asset management companies most affected by the appropriation of artificial intelligence technologies are those of the back and middle office
(cross-checking information, financial analysis, calculating net asset value, calculating ratios, etc.). In what follows, we review the opportunities of the appropriation of artificial intelligence technologies on the asset manager’s job. Firstly, artificial intelligence technologies enable Big Data analysis [28]. In their daily work, companies have a large volume of data. But what they do with this mass of information? Almost nothing, as revealed by a study conducted by Dell, which notes that about 90% of the data is not analyzed and evaluated by companies. By integrating artificial intelligence technologies, asset management firms can make the most of the information available to them and refine their forecasts, as shown in the study conducted by PwC (2021) among US asset managers. Through the robotization of data collection and analysis processes, it is now possible for algorithms to monitor company sites and retrieve resolutions from general meetings. This saves time and allows for a better selection of assets without increasing costs. For several years, several asset management companies have been using algorithms to assist them in their investment choices. For example, Triumph Asset Management uses artificial intelligence technologies to read hundreds of thousands of articles per hour to identify sentiment and opinions about companies. Artificial intelligence, in addition to refining and enriching the data, will automate this time-consuming, lowvalue-added process [29]. The asset manager gains productivity and quality in his work and can focus where he has the most value. He will have access to much more accurate and global information at a lower cost and in less time. Secondly, if the processing of financial data is relatively well defined by many solutions, this is not yet the case for extra-financial data and it is on this last point that artificial intelligence technologies such as Natural Language Processing or Object Character Recognition have a role to play. These two technologies, which appeared in 2010, have assets in finance, rich in graphic and textual data [30]. They seem to be able to help asset management companies to process large amounts of extrafinancial data in a qualitative way. Artificial intelligence technologies make it possible to supplement financial data with extra-financial data. Several asset management companies use artificial intelligence technologies to analyze extra-financial data [31]. For example, Nn Investment Partners combine fundamental analysis with sentiment analysis, provided by Marketpsych, to integrate investors’ emotions and behavior into its analysis. Third, the use of artificial intelligence technologies allows the asset manager to make unbiased investment decisions. Indeed, several studies have mentioned that even financial market professionals such as asset managers are often victims of behavioral biases and that their rationality is limited [32], not only by the insufficiency or quality of informational resources, but also by the time required to make a decision, which differs according to the cognitive resources of each decision maker. A good investment decision requires essential informational, analytical and technological skills, high quality useful information, time to make the decision and cognitive resources to ensure its rationality. 
PwC’s 2021 survey of US asset managers [33] notes that a large majority of respondents indicate that the use of artificial intelligence technologies leads to better investment decisions with less behavioral bias.
Indeed, artificial intelligence derives significant benefits from increased knowledge, lowered emotional instability and enhanced managerial capabilities. Fourth, the use of artificial intelligence technologies allows asset management companies to optimize client relationships, as shown in the PwC study (2021). Indeed, artificial intelligence technologies allow a better anticipation of clients’ needs and reactions thanks to an in-depth knowledge of their behaviors as well as to offer clients the financial products that are best suited to their needs. Several companies offer asset management companies behavioral analysis tools based on artificial intelligence tools. For example, the Sequential Quantum Reduction and Extraction Model (SQREEM) algorithm analyzes heterogeneous datasets to cluster client interests and predict products of interest. This algorithm is used by investment management firms such as Ubs, Deutsche Bank and Blackrock. Fifth, the use of artificial intelligence technologies allows asset management firms to reduce their transaction costs. Indeed, thanks to automated functions and machine learning, calculation and recommendation times are reduced [34, 35]. The asset manager’s time is focused on his expertise and on high value-added missions. Sixth, artificial intelligence allows asset management companies to differentiate themselves from the competition. The artificial intelligence Global Equity Fund is an example, in which the analysis of market data and identification of trade-offs is performed by an algorithm. This ability to develop predictive models to generate investment ideas, based on artificial intelligence technologies, is a key differentiator. We can anticipate that several spheres of asset management are concerned by the integration of artificial intelligence technologies: the collection and analysis of information, asset selection, prediction of future stock price trends, detection of market sentiment, client profiling, etc. Artificial intelligence technologies have concerned both the quantitative and qualitative aspects of asset management, with the automation of the quantitative aspect and the enrichment of the qualitative aspect.
4.2 Risks of Artificial Intelligence on the Asset manager’s Job After highlighting the opportunities that artificial intelligence technologies offer to asset managers, we would like to emphasize their risks. To do so, we rely mainly on the study conducted by PwC in 2021 among US asset managers and the survey conducted by the CFA Institute in 2016 among its 3800 members of the financial professional world. The first risk associated with the integration of artificial intelligence technologies in asset management is due to the costs of implementing these technologies. In the short term, the integration of artificial intelligence technologies could come up against investment costs, the difficulty of integrating them into asset management companies’ applications and information systems, and the still limited number of experts to operate them. Not to mention the ongoing training efforts of employees. Secondly, the risk posed by artificial intelligence is that of misuse of data. While it is true that the asset management business lends itself well to innovative artificial
intelligence solutions that use automata, care must be taken to ensure that the mixing of data does not lead to the use of privileged or confidential information. It will be necessary to ensure that the shuffling of data does not lead to the use of either privileged or confidential information, or data that is false or likely to mislead the market. Third, the deployment of artificial intelligence in the field of asset management will raise cybersecurity issues. Financial markets could be subject to a cyberattack. Hence, the need for regulation and close monitoring gives the magnitude of the risks. Asset management companies and their supervisors should be particularly demanding on the quality of artificial intelligence algorithms. It is necessary to ensure the explicability of the results and the auditability of the different stages of algorithm design. Fourth, another risk associated with artificial intelligence technologies lies in the qualitative characteristics of the data. Some data may not be meaningful, or may even be rapidly obsolete. The difficulty will then be to develop tools to detect and discard them [36]. Fifth, another risk attributed to the integration of artificial intelligence technologies in asset management is that of algorithmic failure. Like any computer program, artificial intelligence technologies are subjected to bugs, hacks and implementation errors. Indeed, it is attributed to asset management or automatic trading programs based on artificial intelligence, the ability to disrupt financial markets. The auditability of the system is essential in this respect. Sixth, another risk associated with the integration of artificial intelligence technologies in asset management is related to the problem of the difficulty of auditing algorithmic operations. Indeed, understanding an algorithmic decision is sometimes impossible. The question of auditing software and algorithms will become increasingly acute as artificial intelligence develops. Security will become increasingly important for financial institutions and their regulators. In particular, data protection must be ensured at all times and customers must be protected from cybercrime. Finally, the risk attributed to the integration of artificial intelligence technologies in asset management is related to the responsibility of asset management companies toward their clients. The regulatory framework for asset management must adjust to the new requirements of artificial intelligence. As more and more tasks and decisions are delegated to algorithms, the notion of responsibility will become blurred. Indeed, by integrating artificial intelligence in asset management, the question of the responsibility of the asset management company toward its clients arises. All of these risks could lead to the reluctance of asset management companies toward artificial intelligence technologies as mentioned in the study conducted by PwC among US asset managers in 2021, which notes that the majority of participants express a lack of confidence in artificial intelligence technologies, or the study conducted by the CFA Institute, which states that the majority of respondents are concerned about possible flaws in the algorithms and risks in data protection.
5 Conclusion Artificial intelligence has changed the whole financial system, providing faster, customer-centric services that the traditional financial system could not offer. This holds for asset management as well, since advances in artificial intelligence have changed the way asset managers work, through specific techniques and more sophisticated algorithms. The literature review conducted suggests that the areas of collaboration between asset management and artificial intelligence are vast. While innovations in artificial intelligence have made it possible to better understand and anticipate the evolution of a financial system whose structure is increasingly complex, obstacles that hinder the adoption of artificial intelligence remain.
References 1. Dietz M, Harle P, Khanna S, Mazingo C (2015) The fight for the customer. McKinsey Global Banking Annual Review, McKinsey & Company 2. CFA institue (2016) CFA Institute survey indicates substantial impact of Robo-Advisers on investment management 3. Mattioli J, Lamoudi S, Robic PO (2019) La gestion d’actifs augmentée par l’intelligence artificielle. In: Conférence nationale d’intelligence artificielle 4. Larminat P (2003) Entre quantitatif et qualitatif. Comment les investisseurs professionnels évaluent les gérants d’actifs financiers. Troisième série, volume 63. Numéro 1. L’argent, circuits et circulation 5. Tadjeddine Y (2018) La décision financière au prisme de la théorie économique, de la finance comportementale et des sciences sociales. Regards croisés sur l’économie 1:100–112 6. Tversky A, Kahneman D (1974) Judgment under uncertainty: heuristics and biases. Science 1974(185):1124–1131 7. Kabbaj T (2015) L’art du trading. Editions Eyrolles 8. Harnad S (2003) Can a machine be conscious? How? J Conscious Stud 10(4–5):69–75 9. Minsky M (1956) Heuristic aspects of the artificial intelligence problem, edn. Services Technical Information agency 10. Russell S, Norvig P (2005) AI a modern approach. Learning 2(3):4 11. Cassou P-H (2019) Quelle finance en 2030? 40 points de vue d’experts. RB édition, Paris 12. Mattioli J, Robic PO, Reydellet T (2018) L’intelligence artificielle au service de la maintenance prévisionnelle. Quatrième conférence sur les Applications Pratiques de l’Intelligence Artificielle 13. Hadjitchoneva J (2020) L’intelligence artificielle au service de la prise de décisions plus efficace. Pour une recherche économique efficace 149 14. Kara Y, Boyacioglu M, Baykan O (2011) Predicting direction of stock price index movement using artificial neural networks and support vector machines: the sample of the Istanbul stock exchange. Expert Syst Appl 38(5):5311–5319 15. López de Prado M (2020) Machine learning for asset managers (elements in quantitative finance). Cambridge University Press, Cambridge. https://doi.org/10.1017/9781108883658 16. López de Prado M (2018a) Advances in financial machine learning, 1st edn.Wiley 17. López de Prado M (2018) The 10 reasons most machine learning funds fail. J Portfolio Manage 44(6):120–133
18. Patel J, Sha S, Thakkar P, Kotecha K (2015) Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Syst Appl 42(1):259–268 19. Feuerriegel S, Prendinger H (2016) News-based trading strategies. Decis Support Syst 90:65–74 20. Bekey, G. A. (2012). Current trends in robotics: Technology and ethics. Robot ethics: The ethical and social implications of robotics, 17–34. 21. Scholz P, Tertilt M (2021) Robo-Advisory: the rise of the investment machines. In: RoboAdvisory, Palgrave Macmillan, Cham, pp 3–19 22. Scholz P, Grossmann D, Goldberg J (2021) Robo economicus? the impact of behavioral biases on Robo-Advisory. In: Robo-Advisory, Palgrave Macmillan, Cham, pp 53–69 23. Chan J (2019) Automation of trading machine for traders: how to develop trading models, Springer Nature 24. Einav L, Levin J (2014) Economics in the age of big data. Sci 346(6210) 25. Dixon M, Klabjan D, Bang J (2017) Classification-based financial markets prediction using deep neural networks. Algorithmic Financ 6(3):67–77 26. Cohen-Hadria Y (2016) Blockchain: révolution ou évolution? La pratique qui bouscule les habitudes et l’univers juridique. Dalloz IT/IP 11:537 27. Kolanovic M, Krishnamachari R (2017) Big data and AI strategies: machine learning and alternative data approach to investing. JP Morgan Quant Deriv Strategy, May 28. López de Prado M (2019c) Ten applications of financial machine learning. Working Paper 29. Verdier M (2018) Blockchain and financial intermediation. Revue d’economie Financiere 129(1):67–87 30. Zhengyao J, Jinjun L (2017) Cryptocurrency portfolio management with deep reinforcement learning. In: 2017 Intelligent systems conference (IntelliSys), pp 905–913 31. Booth A, Gerding E, McGroarty F (2014) Automated trading with performance weighted random forests and seasonality. Expert Syst Appl 41(8):3651–3661 32. Simon HA (1977) Artificial intelligence systems that understand. In: IJCAI, pp 1059–1073 33. Pwc (2021) AI for asset and wealth managers 2021: industry priorities and delivering benefits 34. Creamer G, Ren Y, Sakamoto Y, Nickerson J (2016) A textual analysis algorithm for the equity market: the European case. J Invest 25(3):105–116 35. Kahn R (2018) The future of investment management, 1st edn. CFA Institute Research Foundation 36. Bréhier B (2019) Intelligence artificielle: quels impacts pour les marchés financiers? Banque & Droit N HS-2019-2—Octobre 2019
Optimizing Reactive Power of IEEE-14 Bus System Using Artificial Electric Field Algorithm Indu Bala
and Anupam Yadav
Abstract Optimizing the reactive power dispatch (RPD) problem is considered one of the most challenging tasks in power systems because it concerns both the economics and the security of the system. RPD is a complex, nonlinear, constrained optimization problem whose main aim is to minimize the reactive power loss. Achieving this objective requires the fine-tuning of continuous and non-continuous constrained parameters. Given the complexity of the problem, we apply a recently developed heuristic, the artificial electric field algorithm (AEFA), to the RPD problem. The algorithm is inspired by Coulomb's law of electrostatic force, and the IEEE-14 bus system benchmark is taken for evaluating the performance of AEFA. The outcomes are compared with six other metaheuristics over dimensions 15 and 30; they are presented numerically and then validated statistically using the Wilcoxon signed-rank test. In all the considered evaluation methods, AEFA produced better results than all the other algorithms with the least computing time. Based on the analysis, it can be claimed that AEFA is an efficient optimizer for solving the IEEE-14 bus system. Keywords Real-life optimization problem · Nature-inspired algorithm · Reactive power dispatch problem · Artificial electric field algorithm · Global optimization
1 Introduction The reactive power dispatch problem was presented by Carpentier [1]. Ever since, it has been a complicated but inspiring task for researchers. The main goal of the problem is to decrease the power loss of the power system. Based on the daily requirements, the power system has been updated, and the changes lead to RPD being very complex from time to time. Consequently, we require more heuristic algorithms to I. Bala (B) The University of Adelaide, Adelaide, SA 5005, Australia e-mail: [email protected] A. Yadav Dr BR Ambedkar National Institute of Technology, Jalandhar, Punjab 144011, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_47
solve. The RPD problem has an objective function designed for reducing power loss, along with some constraints. For achieving the goal, fine-tuning between the control variables like transformer tab setting, adjustment of the capacitor, and inductor and capacity of voltage are primitive. Many traditional and advanced techniques have been applied since the beginning. In general, some popular methods for solving problems are arithmetic or traditional methods [2, 3] and advanced or heuristics optimization methods. Newton–Raphson method [4], central point method [5], and branch method [6] are a few examples of arithmetic methods. The genetic algorithm [7], differential algorithm [8], comprehensive learning gravitational search algorithm [9], and seeker optimization algorithm [10] are a few popular heuristic algorithms. Initially, traditional methods [2–4] were used to find the optimality of the problem. Since the problem was growing complex with time, the resultant methods were also updated to meet the requirements of the problem. Later on, researchers found the heuristic methodology which was easy to implement and had a low computation cost. Consequently, due to the stochastic nature and easy implementation, heuristic optimization techniques have become more popular and feasible to solve the RPD problem. The heuristic algorithms have fast convergence speeds, but still a wellversed variable handling strategy is a basic necessity to handle the RPD. Moreover, researchers have kept developing and updating heuristic algorithms to solve reallife optimization problems. For instance, Zhao et al. [11] updated the famous PSO algorithm into multi-agent PSO to solve the optimal power reactive problem. A multi-agent search strategy was used to find the global solution. Similarly, Wen and Yutian [12] define multiple objectives to solve RPD through PSO, while Mahadevan and Kannan [13] improvised PSO in terms of comprehensive search techniques to solve the problem. Later on, Mao and Li [14] proposed the hybridization of simulated annealing with PSO and presented a framework of static voltage stability in power systems. In fact, some algorithms that were designed to solve global optimization problems are also implemented on the reactive power system to solve RPD. For instance, Khazali and Alanter [15] implemented the harmony search optimization (HS) to RPD, which reduces the power loss along with controlling the voltage gains. Bala and Yadav [16] employed gravitational search algorithm (GSA) to resolve the problem and presented good results. Roy et al. [17] used biogeography-based optimization (BBO) to solve the multi-constrained RPD problem. Dao et al. [18] applied an improved Bat algorithm to optimize the RPD network. Khonsandi et al. [19] designed the shuffled frog leaping and Nelder–Mead algorithm to handle the multivariate constraints and successfully solve the RPD with good accuracy. Similarly, due to good convergence speed and easy implementation, heuristic algorithms have been applied to solve the RPD problem very often. Continuing the research, we applied a very recently developed artificial electric field algorithm (AEFA) to RPD for decreasing the power loss. Originally, AEFA is designed to solve continuous constrained optimization problems and has proven a good convergence record. The motive behind choosing the algorithm was its popularity and good convergence speed. The algorithm has already solved continuous as well as discrete optimization problems [20, 21]. The RPD problem involves mixed
parameters setting of continuous and discrete variables. So, the motive was to evaluate the efficiency of the algorithm over the mixed-parameter problem. A detailed description of the algorithm is presented in the next section. The paper is summarized as follows: First, the AEFA algorithm is described in detail, then the formulated reactive power dispatch problem is presented. After that, the implementation of AEFA over RPD is explained, then the experimental results are demonstrated, and finally, the conclusion is drawn with future scope.
2 Artificial Electric Field Algorithm AEFA [20] is motivated by a law of physics: Coulomb's law of electrostatic force. The solutions are represented as charged particles holding positions in the search region with respect to the designed objective function. According to Coulomb's law, every charged particle can attract or repel other particles; the force is directly proportional to the product of the charges and inversely proportional to the square of the distance between them. The procedure of AEFA is described in the following steps: Initialization: Random candidate solutions are generated and considered as charged particles at any time $t$, such that $x_i = (x_i^1, x_i^2, \ldots, x_i^D)$ for $i = 1, 2, \ldots, ps$, where $ps$ represents the number of charged particles and $D$ the dimension. A notable feature of the AEFA algorithm is that it stores previously found solutions (positions) and updates the new ones accordingly. The personal best position of the charges is updated as follows:

$$P_i(t+1) = \begin{cases} P_i(t) & \text{if } \operatorname{fit}(P_i(t)) < \operatorname{fit}(x_i(t+1)) \\ x_i(t+1) & \text{if } \operatorname{fit}(x_i(t+1)) \le \operatorname{fit}(P_i(t)) \end{cases} \tag{1}$$

where $P_i(t)$ is the personal best position with respect to time $t$. Calculate Force: As per Coulomb's law, every charged particle attracts or repels the others; thus a force can be expected between the particles. Also, each charged particle has its own fitness value as it moves, that is
$$f_i(t) = \exp\!\left(\frac{\operatorname{fit}(P_i(t)) - \operatorname{worst}(t)}{\operatorname{best}(t) - \operatorname{worst}(t)}\right) \tag{2}$$

such that

$$Q_i(t) = \frac{f_i(t)}{\sum_{i=1}^{ps} f_i(t)} \tag{3}$$

where $\operatorname{fit}(P_i(t))$ represents the fitness of charged particle $P_i$. Also, for a minimization case, the best (minimum) and worst (maximum) values of the particles can be calculated as follows:

$$\operatorname{best}(t) = \min\{\operatorname{fit}_j(t)\}, \quad j \in \{1, 2, 3, \ldots, ps\} \tag{4}$$
$$\operatorname{worst}(t) = \max\{\operatorname{fit}_j(t)\}, \quad j \in \{1, 2, 3, \ldots, ps\} \tag{5}$$
Finally, the force is calculated as follows:

$$F_{ij}^{D}(t) = \frac{k(t)\,Q_i(t)\,Q_j(t)\,\bigl(P_j^{D}(t) - x_i^{D}(t)\bigr)}{R_{ij} + \varepsilon} \tag{6}$$

$F_{ij}^{D}(t)$ represents the force between the charged particles $i$ and $j$ in dimension $D$ at time $t$, $k(t)$ represents Coulomb's proportionality constant, and $Q_i$ and $Q_j$ are the charges of particles $i$ and $j$.

$$R_{ij} = \lVert x_i(t), x_j(t) \rVert_2 \tag{7}$$

$$k(t) = k_0 \times \exp\!\left(-\alpha\,\frac{\mathrm{itr}}{\mathrm{maxitr}}\right) \tag{8}$$
where $k_0$ and $\alpha$ are control parameters, and itr and maxitr represent the current iteration number and the maximum number of iterations. Thus, the total force can be calculated as follows:

$$F_i^{D}(t) = \sum_{j=1,\; j \ne i}^{ps} \operatorname{rand}()\, F_{ij}^{D}(t) \tag{9}$$

$F_i^{D}$ is the total force exerted on a charged particle in dimension $D$, and $\operatorname{rand}()$ is a uniformly distributed random number in $[0, 1]$. Now Newton's law is employed between the charged particles such that

$$a_i^{D}(t) = \frac{Q_i(t)\,E_i^{D}(t)}{m_i(t)} \tag{10}$$

$m_i$ is the mass of the unit charged particle. $E_i^{D}(t)$ denotes the electric field acting on any charged particle $i$ and can be evaluated as follows:

$$E_i^{D}(t) = \frac{F_i^{D}(t)}{Q_i(t)} \tag{11}$$
Velocity and Position Update: Due to the force between the particles, the charged particles keep changing their positions and updating them toward the best one. The velocity and the position of the charged particles are updated as follows:

$$V_i^{D}(t+1) = \operatorname{rand}() \times V_i^{D}(t) + a_i^{D}(t) \tag{12}$$

$$x_i^{D}(t+1) = x_i^{D}(t) + V_i^{D}(t+1) \tag{13}$$
The velocity and position are updated by Eqs. (12) and (13). At the end of the iterations, as time elapses, the charged particles tend toward the best solution; hence, the algorithm converges to the best value, which is taken as the global value of the problem. The detailed pseudo-code is given in Algorithm 1.

Algorithm 1. Pseudo-code of the Artificial Electric Field Algorithm

Initialization
    Randomly generate charged particles of size N = ps, x_1^t, x_2^t, ..., x_N^t, on the interval [x_min, x_max]
    Initialize the velocities V = (V_1, V_2, ..., V_N) on the interval [V_min, V_max]
    Evaluate the fitness values fit_1^t, fit_2^t, ..., fit_ps^t
    Evaluate each particle's personal best value
    Set t = 0
    Arrange the particles according to Wu et al. [22]:
        Place feasible particles before infeasible ones
        Order the feasible particles as per their fitness values
        Order the infeasible solutions as per the mean value of the constraint-violation penalty
Calculating and updating the particles' parameters
while the defined termination threshold is not reached do
    Calculate k(t), best(t) and worst(t)
    for i = 1 : ps do
        Calculate the fitness value f_i(t)
        Compute the total force F_i(t)
        Compute the acceleration a_i^D(t)
        V_i^D(t+1) = rand() × V_i^D(t) + a_i^D(t)
        x_i^D(t+1) = x_i^D(t) + V_i^D(t+1)
        Evaluate the velocity and position bounds:
            if the velocity of a particle crosses the lower velocity bound then V_i(t+1) = max(V_i(t+1), V_min)
            if the velocity of a particle crosses the upper velocity bound then V_i(t+1) = min(V_i(t+1), V_max)
            if the position of a particle crosses the lower bound then x_i(t+1) = max(x_i(t+1), x_min)
            if the position of a particle crosses the upper bound then x_i(t+1) = min(x_i(t+1), x_max)
        Calculate the fitness value f[x_i(t+1)]
        if f[x_i(t)] < f[x_i(t+1)]
            x_i(t+1) = x_i(t)
        end if
    end for
end while
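To make the motion rules in Eqs. (1)–(13) concrete, the following is a minimal NumPy sketch of AEFA on a generic bound-constrained minimization problem. It is an illustration only: the function name, the sphere test objective, the bound-clipping step and the parameter values (k0 = 500, alpha = 30, unit particle mass) are assumptions and do not reproduce the authors' MATLAB implementation or the RPD-specific handling described in Sect. 4.

```python
import numpy as np

def aefa_minimize(objective, lb, ub, ps=50, max_itr=250, k0=500.0, alpha=30.0, seed=0):
    """Minimal AEFA sketch following Eqs. (1)-(13); parameter values are illustrative."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = lb.size
    x = rng.uniform(lb, ub, (ps, dim))                # charged-particle positions
    v = np.zeros((ps, dim))                           # velocities
    pbest = x.copy()                                  # personal best positions, Eq. (1)
    pbest_fit = np.array([objective(p) for p in x])
    for itr in range(max_itr):
        fit = np.array([objective(p) for p in x])
        better = fit < pbest_fit                      # Eq. (1): keep improved personal bests
        pbest[better], pbest_fit[better] = x[better], fit[better]
        best, worst = pbest_fit.min(), pbest_fit.max()            # Eqs. (4)-(5)
        q = np.exp((pbest_fit - worst) / (best - worst + 1e-12))  # Eq. (2)
        q = q / (q.sum() + 1e-12)                     # Eq. (3): normalized charges
        k = k0 * np.exp(-alpha * itr / max_itr)       # Eq. (8): decaying Coulomb constant
        force = np.zeros((ps, dim))
        for i in range(ps):
            for j in range(ps):
                if i == j:
                    continue
                r = np.linalg.norm(x[i] - x[j])       # Eq. (7): Euclidean distance
                f_ij = k * q[i] * q[j] * (pbest[j] - x[i]) / (r + 1e-12)  # Eq. (6)
                force[i] += rng.random() * f_ij       # Eq. (9): randomly weighted sum
        accel = force                                 # Eqs. (10)-(11) with unit particle mass
        v = rng.random((ps, dim)) * v + accel         # Eq. (12)
        x = np.clip(x + v, lb, ub)                    # Eq. (13) plus simple bound clipping
    i_best = int(np.argmin(pbest_fit))
    return pbest[i_best], pbest_fit[i_best]

# Illustrative usage on a sphere function (not the RPD objective)
x_best, f_best = aefa_minimize(lambda z: float(np.sum(z ** 2)),
                               lb=np.full(15, -5.0), ub=np.full(15, 5.0), ps=20, max_itr=100)
print(f_best)
```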
3 Problem Formation of Reactive Power Dispatch Problem The RPD problem is a complex, constrained, nonlinear optimization problem. Our objective is to solve the RPD problem so as to decrease the transmission power loss subject to the constraints. The goal can be achieved by proper tuning of the control parameters. The control variables considered in this study are the transformer tap settings, the adjustment of capacitors or inductors, and the voltage magnitudes. Based on these variables, we can formulate the RPD problem as follows:
3.1 Objective Function The least possible power loss can be calculated as follows:

$$h(x, u) = \min \operatorname{Pow}_{\text{loss}} = \sum_{\substack{k \in N_E \\ k=(i,j)}} b_k \left( U_i^2 + U_j^2 - 2 U_i U_j \cos \alpha_{ij} \right), \qquad i \in N_i,\; j \in N_E \tag{14}$$

where $\operatorname{Pow}_{\text{loss}}$ denotes the power loss of the network, $b_k$ and $\alpha_{ij}$ are the conductance of branch $k$ and the voltage angle difference, and $N_E$ is the number of network branches. The three modules of $x$ are as follows: $U_i$ and $U_j$ are the bus voltages at buses $i$ and $j$, $G_g$ is the power generation, and $S_l$ is the power flow. The three modules of $u$ are as follows: $U_g$ is the generation bus voltage, $G_c$ are the inductors, and $T$ is the transformer setting. Consequently, the $x$ and $u$ parameters are described as follows:

$$x^{T} = \bigl[\,U_{L1} \ldots U_{L N_{PQ}},\; G_{g1} \ldots G_{g N_g},\; S_{l1} \ldots S_{l N_l}\,\bigr] \tag{15}$$

$$u^{T} = \bigl[\,U_{g1} \ldots U_{g N_g},\; T_1 \ldots T_{N_T},\; G_{c1} \ldots G_{c N_c}\,\bigr] \tag{16}$$

$N_{PQ}$ are the load buses with constant installations $P$ and $Q$, $N_T$ and $N_g$ are the numbers of tap settings and generator buses, and $N_i$ and $N_c$ are the $i$th adjacent bus and the capacitor buses, respectively.
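As a small illustration of Eq. (14), the sketch below evaluates the transmission loss for given bus voltage magnitudes and angle differences; the branch list and numerical values are hypothetical placeholders, not the IEEE-14 data.

```python
import math

def power_loss(branches, U, alpha):
    """Eq. (14): Pow_loss = sum_k b_k (U_i^2 + U_j^2 - 2 U_i U_j cos(alpha_ij)).

    branches: list of (i, j, b_k) tuples; U: bus voltage magnitudes (p.u.);
    alpha: dict mapping (i, j) to the voltage angle difference in radians.
    All numbers used here are illustrative only."""
    loss = 0.0
    for i, j, b_k in branches:
        loss += b_k * (U[i] ** 2 + U[j] ** 2 - 2.0 * U[i] * U[j] * math.cos(alpha[(i, j)]))
    return loss

# Tiny hypothetical 3-bus example
branches = [(0, 1, 0.05), (1, 2, 0.04)]
U = [1.02, 1.00, 0.98]
alpha = {(0, 1): 0.02, (1, 2): -0.015}
print(power_loss(branches, U, alpha))
```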
3.2 Problem Constraints The objective function is associated with some constraints, which are discussed in detail below. Equality constraints: The RPD problem comprises several constraints whose proper tuning with the objective function leads to good results. These constraints control the system through the power flow equations. They are mathematically expressed as follows:

$$\operatorname{Pow}_{gi} - U_i \sum_{j \in N_i} U_j \left( H_{ij} \cos \alpha_{ij} + B_{ij} \sin \alpha_{ij} \right) = 0, \quad i \in N_{B-1} \tag{17}$$

$$G_{gi} - U_i \sum_{j \in N_i} U_j \left( H_{ij} \sin \alpha_{ij} - B_{ij} \cos \alpha_{ij} \right) = 0, \quad i \in N_{PQ} \tag{18}$$

where $\operatorname{Pow}_{gi}$ and $G_{gi}$ are the active and reactive power injected into the system at bus $i$, and $H_{ij}$, $B_{ij}$ and $H_{ii}$, $B_{ii}$ are the transfer and self conductance and susceptance between buses $i$ and $j$, respectively. Inequality constraints: These are designed for security and for controlling the parameters. They can be written mathematically as follows:
• Load bus voltages

$$U_{Li}^{\min} \le U_{Li} \le U_{Li}^{\max}, \quad i \in N_B \tag{19}$$
• Generated reactive power

$$G_{gj}^{\min} \le G_{gj} \le G_{gj}^{\max}, \quad j \in N_g \tag{20}$$

• Active power generated at the slack bus

$$\operatorname{Pow}_{g,\text{slack}}^{\min} \le \operatorname{Pow}_{g,\text{slack}} \le \operatorname{Pow}_{g,\text{slack}}^{\max} \tag{21}$$

• Thermal limits

$$S_l^{\min} \le S_l \le S_l^{\max}, \quad l \in N_l \tag{22}$$

• Shunt capacitor limit

$$G_c^{\min} \le G_c \le G_c^{\max}, \quad c \in N_c \tag{23}$$

• Transformer tap limit

$$T_k^{\min} \le T_k \le T_k^{\max}, \quad k \in N_T \tag{24}$$

where $\operatorname{Pow}_{g,\text{slack}}$ represents the power generation at the slack bus.
3.3 Penalty Function For verifying the competence of AEFA, penalty factors have been added. The job of this function is to penalize infeasible solutions that violate the constraints [23]. The objective function (Eq. 14) can therefore be rewritten as follows:

$$\min f(x, u) = \operatorname{Pow}_{\text{loss}} + \lambda_u \sum_{N_U^{\lim}} \Delta U_L^2 + \lambda_\theta \sum_{N_\theta^{\lim}} \Delta G_{gi}^2 \tag{25}$$

$\lambda_u$ and $\lambda_\theta$ are the penalties, and $\Delta U_L$ and $\Delta G_{gj}$ are calculated as follows:

$$\Delta U_L = \begin{cases} U_L^{\min} - U_L & \text{if } U_L < U_L^{\min} \\ U_L - U_L^{\max} & \text{if } U_L > U_L^{\max} \end{cases} \tag{26}$$

$$\Delta G_{gj} = \begin{cases} G_{gj}^{\min} - G_{gj} & \text{if } G_{gj} < G_{gj}^{\min} \\ G_{gj} - G_{gj}^{\max} & \text{if } G_{gj} > G_{gj}^{\max} \end{cases} \tag{27}$$

where $U_{\text{load}}$ is the voltage of the load buses, and $N_U^{\lim}$ and $N_\theta^{\lim}$ are the buses whose voltage and generated reactive power lie outside the limits, respectively.
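The penalty mechanism of Eqs. (25)–(27) can be sketched as follows; the helper names and the limit values in the example are illustrative assumptions (the experiments in Sect. 5 use 500 for both penalty weights).

```python
def bound_violation(value, lo, hi):
    """Eqs. (26)-(27): amount by which a quantity leaves its [lo, hi] band, 0 if inside."""
    if value < lo:
        return lo - value
    if value > hi:
        return value - hi
    return 0.0

def penalized_objective(p_loss, load_voltages, gen_reactive, v_limits, q_limits,
                        lam_u=500.0, lam_theta=500.0):
    """Eq. (25): add squared voltage and reactive-power violations to the power loss."""
    v_pen = sum(bound_violation(v, *v_limits) ** 2 for v in load_voltages)
    q_pen = sum(bound_violation(q, *q_limits) ** 2 for q in gen_reactive)
    return p_loss + lam_u * v_pen + lam_theta * q_pen

# Hypothetical check: one load bus slightly above a 1.06 p.u. ceiling
print(penalized_objective(12.3, [1.08, 1.01], [0.3],
                          v_limits=(0.938, 1.06), q_limits=(-0.4, 0.5)))
```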
4 AEFA Implementation in the Reactive Power Dispatch Problem The AEFA algorithm is designed to solve continuous, constrained optimization problems, whereas the RPD problem comprises parameters such as the transformer tap arrangements, the transmission voltages, and the installed power sources that are discrete in nature and must be handled carefully. It is also important to tune these parameters without disturbing the working of AEFA. Hence, the charged particles first probe the search space continuously, and the particles' real values are later truncated to discrete positions through the method of Zhao [24] and Wu [25]. The procedure for implementing AEFA is described as follows:
Step 1. Initialization of the Candidate Solution Initialize random solutions and velocities in the search space for any $j$th charged particle as follows: $x_j = (x_j^1, \ldots, x_j^d, \ldots, x_j^N)$ and $V_j = (V_j^1, \ldots, V_j^d, \ldots, V_j^N)$ for $j = 1, \ldots, ps$, where $x_j$ and $V_j$ are random control variables mapped to $U_{PV}$, $G_c$ and $T_j$, $j = 1, 2, \ldots, ps$. The population of size $ps$ is mapped uniformly onto $[U^{\min}, U^{\max}]$, $[G_c^{\min}, G_c^{\max}]$ and $[T^{\min}, T^{\max}]$. Set $t = 0$. Step 2. Calculate the Fitness Value Calculate the fitness value of each charged particle as described in Algorithm 1; if all constraints are satisfied, calculate the objective function by Eq. (14). However, if any constraint violates the defined threshold, add a penalty and evaluate the objective function by Eq. (25). Step 3. Update Velocity Update the velocity of all charged particles and check the velocity bound criteria for the candidate solution as presented in Algorithm 1. Step 4. Update Position The positions of the charged particles and their bounds are updated, and the new position is stored in the inbuilt memory carried by the algorithm. Step 5. Update Particle Best and Global Best By a greedy search technique, if a new fitness value is superior to the earlier particle best value, replace the particle best value with the new fitness. Similarly, if the global best value is not superior to the particle best, replace the global best with the best value found so far. Step 6. Termination Criteria The process terminates when either the desired result has been achieved or the iterations have reached the defined termination threshold.
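Because tap settings and shunt sources are discrete, the continuous positions produced by AEFA must be snapped onto the admissible grid before the objective is evaluated (a 0.01 p.u. step is used in the experiments of Sect. 5). The helper below is a sketch of that truncation idea, not the exact procedure of Zhao [24] or Wu [25].

```python
def snap_to_step(value, lo, hi, step=0.01):
    """Clamp a continuous AEFA position into [lo, hi] and round it to the nearest
    discrete step, e.g. a transformer tap ratio on a 0.01 p.u. grid."""
    clamped = min(max(value, lo), hi)
    return lo + round((clamped - lo) / step) * step

# Example: a raw particle coordinate of 1.0437 becomes an admissible tap of 1.04
print(snap_to_step(1.0437, 0.90, 1.10))
```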
5 Experimental Setup We evaluated the performance of AEFA on the standard IEEE-14 bus system benchmark. The platform for the implementation of AEFA is MATLAB 2016. The experiment of solving the IEEE-14 bus system was carried out for 250 iterations, and a total of 50 runs were considered. The population size is 300. The discrete variables, such as the shunt capacitors, are taken with a step size of 0.01 per unit (p.u.), and the penalties in Eq. (25) are taken as 500 for this experiment. To evaluate the competitiveness of the algorithm, the outcomes are compared with other popular heuristics such as the differential evolution algorithm (DE) [8], particle swarm optimization (PSO) [11], the canonical genetic algorithm (CGA) [26], the seeker optimization algorithm (SOA) [10], the comprehensive learning gravitational search algorithm (CLGSA) [27], and comprehensive learning particle swarm optimization (CLPSO) [13].
5.1 IEEE-14 Bus System The IEEE-14 benchmark comprises five generator buses and two shunt power source buses. The generator buses are 1, 2, 3, 6 and 8, and the shunt buses are 9 and 14. The benchmark also comprises 20 transmission lines. Table 1 includes the limiting values of each power generation variable [28]. The system load is initially taken as follows: P_load = 258 megawatt (MW), Q_load = 73.2 per unit.

Table 1 Limiting values of parameters

Limiting values of the power parameters
Buses      1      2      3       6       8
Q_g^max    1.10   0.50   0.40    0.24    0.24
Q_g^min    0.0    −0.4   0.0     −0.06   −0.06

Limiting values of the tap settings and voltages
T_k^max   T_k^min   U_g^max   U_g^min   U_load^max   U_load^min
1.10      0.90      1.11      0.949     1.06         0.938

Limiting values of the reactive power sources
Buses      18     25     53
Q_c^max    0.10   0.05   0.05
Q_c^min    0.00   0.00   0.00
5.2 Results and Discussion For a fair evaluation, the outcomes of all considered methods are compared under the same experimental conditions. The heuristics considered in this study are PSO, CLPSO, CLGSA, DE, CGA, SOA and AEFA. The performance measures are the best, worst and average power loss, reported as Best, Worst and Mean, respectively, in Table 2. The standard deviation (std. dev) is also calculated for each algorithm. The outcomes are recorded for dimensions 15 and 30, and the performance comparison is presented in Table 2. The results show that the performance of AEFA is very good on the benchmark taken. To establish statistical significance, the Wilcoxon signed-rank test with a 95% confidence level is applied. The performance is marked with '+', '−' and '=' signs, where '+' indicates that the performance of AEFA is better, '−' indicates that AEFA is not better, and '=' indicates that AEFA is statistically equal to the paired algorithm. Table 2 shows that the performance of AEFA is the best among all the considered algorithms for N = 15. For N = 30, CLGSA performs equivalently to AEFA, but the computational time of AEFA is less than that of CLGSA. Table 3 presents the power-saving percentage of all considered algorithms. It shows that the power loss of the AEFA algorithm is approximately 10 MW, the lowest among all, and its power-saving percentage is the highest, which is a further advantage of the algorithm. The same efficiency can be seen in the convergence behaviour in Figs. 1 and 2: AEFA converges fast and remains stable after a few iterations. The CLGSA algorithm also presents good convergence in Figs. 1 and 2, while SOA and DE fluctuate more and do not stabilize until the end. Consequently, the overall results show that the AEFA algorithm can be considered a good optimizer for solving the RPD problem.
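The pairwise statistical comparison reported in Table 2 can be carried out in principle with a Wilcoxon signed-rank test over paired run results, as sketched below with SciPy; the two result vectors here are made-up numbers, not the actual 50-run data.

```python
from scipy.stats import wilcoxon

# Hypothetical best-power-loss values (MW) from paired runs of AEFA and a competitor
aefa_runs = [12.13, 12.20, 12.31, 12.18, 12.25, 12.40, 12.22, 12.19]
other_runs = [12.45, 12.60, 12.55, 12.48, 12.70, 12.52, 12.61, 12.49]

stat, p_value = wilcoxon(aefa_runs, other_runs)
# Mark '+' (AEFA better), '-' (worse) or '=' (no significant difference) at alpha = 0.05,
# using the mean as a simple direction indicator
if p_value >= 0.05:
    mark = "="
else:
    mark = "+" if sum(aefa_runs) < sum(other_runs) else "-"
print(stat, p_value, mark)
```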
6 Conclusion AEFA is a stochastic optimization algorithm inspired by the laws of physics. The algorithm is very simple and easy to understand, which is considered a further advantage. In this paper, AEFA is applied to a constrained and complex optimization problem of the power system: the IEEE-14 benchmark was considered for minimizing the power loss of the power system. The results are presented in a variety of ways, namely graphically, numerically and statistically, and in all of them the AEFA algorithm is found to be an efficient optimizer for the reactive power dispatch problem. Hence, the algorithm can be applied to solve other power system benchmarks, and in the future it can be enhanced to solve multi-objective optimization problems of the power system.
Table 2 Performance evaluation of the IEEE-14 benchmark

Generations  Algorithm  Best (MW)  Worst (MW)  Mean (MW)  Std. dev       Rank
N = 15       PSO        12.301     12.897      12.599     6.23 × 10^−3   +
             CLPSO      12.432     13.012      12.722     1.42 × 10^−3   +
             DE         13.089     13.576      13.334     8.91 × 10^−2   +
             CGA        13.096     13.687      13.374     0.23 × 10^−2   +
             SOA        12.897     12.998      12.948     2.34 × 10^−4   +
             CLGSA      12.423     12.879      12.651     3.12 × 10^−5   +
             AEFA       12.123     12.450      12.279     3.00 × 10^−5
N = 30       PSO        12.235     12.456      12.345     5.34 × 10^−3   +
             CLPSO      12.011     12.675      12.344     9.34 × 10^−3   +
             DE         13.234     13.800      13.516     2.56 × 10^−3   +
             CGA        12.990     13.231      13.111     8.19 × 10^−3   +
             SOA        12.922     13.023      12.972     2.13 × 10^−4   +
             CLGSA      12.321     12.723      12.524     3.12 × 10^−5   =
             AEFA       12.234     12.354      12.294     3.10 × 10^−5
Table 3 Results of all methods over the IEEE-14 benchmark

Generations  Algorithm  Pg (MW)  Qg (MW)  Ploss (MW)  Qloss (MW)  Psave %  Mean CPU time (s)
N = 15       PSO        280.11   84.15    20.10       −51.12      3.29     84
             CLPSO      275.56   82.12    17.13       −42.67      6.00     103
             DE         279.32   90.12    19.12       −49.54      3.99     89
             CGA        274.43   87.12    21.78       −51.23      5.72     65
             SOA        277.32   88.20    17.34       −51.87      6.19     48
             CLGSA      270.10   81.14    10.99       −54.23      1.120    28
             AEFA       268.00   76.12    10.09       −54.11      13.72    13
N = 30       PSO        284.12   85.12    20.34       −48.12      3.67     96
             CLPSO      282.10   82.45    17.12       −40.78      5.45     99
             DE         283.45   90.23    20.13       −38.12      3.12     67
             CGA        283.26   90.12    21.12       −42.09      4.34     65
             SOA        279.10   86.23    17.34       −50.12      6.45     34
             CLGSA      272.96   81.67    10.67       −53.23      12.15    40
             AEFA       270.05   77.67    10.34       −53.99      12.89    12
Fig. 1 Power loss comparison of all algorithms when N = 15 (power loss versus iterations; curves for PSO, CLPSO, DE, CGA, SOA, CLGSA and AEFA)
Fig. 2 Power loss comparison of all algorithms when N = 30 (power loss versus iterations; curves for PSO, CLPSO, DE, CGA, SOA, CLGSA and AEFA)
References 1. Carpentier J (1979) Optimal power flows. Int J Electr Power Energy Syst 1(1):3–15 2. Chattopadhyay D, Bhattacharya K, Parikh J (1995) Optimal reactive power planning and spot pricing: an integrated approach. IEEE Trans Power Syst 10(4):2010–2014 3. Li F, Zhang W, Tolbert LM, Kuchk JD, Rizy D (2008) A frame work to qualify the economic benefit from local var compensation. Int Rev Electr Eng 3(6):989–998 4. Lin C, Lin S, Horng S (2012) Iterative simulation optimization approach for optimal voltampere reactive sources planning. Int J Electr Power Energy Syst 43(1):984–991 5. Mahmoudabadi A, Rashidinejal M (2013) An application of hybrid heuristic method to solve concurrent transmission network expansion and reactive power. Int J Electr Power Energy Syst 45(1):71–77
6. Liu H, Jin L, Mccalley JD, Kumar R, Ajjarga V, Elia N (2009) Planning reconfigurable reactive control for voltage stability in limited power system. IEEE Trans Power Syst 24(2):1029–1038 7. Bakare GA, Venayagamoorthy G, Aliyu UO (2005) Reactive power and voltage control of the Nigerian grid system using micro genetic algorithm. In Proceedings of IEEE power engineering society general meeting, San Francisco, CA 2:1916–1922 8. Bakare GA, Krost G, Venayagamoorthy GK, Aliyu UO (2007) Differential evolution approach for reactive power optimization of Nigerian grid system. In: Proceedings of IEEE power engineering society, general meeting FR temba, pp 1–6 9. Bala I, Yadav A (2019) Comprehensive learning gravitational search algorithm for global optimization of multimodal functions. Neural Comput Appl 1–36 10. Dai C, Chen W, Zhu Y, Zhang X (2012) Seeker optimization algorithm for optimal reactive power dispatch. IEEE Trans Power Syst 24(3):1218–1231 11. Zhao B, Guo C, Cao YJ (2012) A multi-agents based particle swarm optimization approach for optimal reactive power dispatch. IEEE Trans Power Syst 20(2):1070–1078 12. Wen Z, Yutian L (2014) Multi objective reactive power and voltage control based on fuzzy optimization strategy and fuzzy adaptive particle search optimization. Electr Power Energy Syst 30(9):525–532 13. Mahadeven K, Kannan PS (2010) Comprehensive particle swarm optimization for reactive power dispatch. Appl Soft Comput 10:641–652 14. Mao Y, Li M (2008) Optimal reactive power planning based on simulated annealing particle swarm algorithm considering static voltage stability. In: Proceeding of the international conference on intelligent computation technology and automation (ICICTA), pp 106–110 15. Khazali AH, Alanter M (2014) Optimal reactive power dispatch based on harmony search algorithm. Electr Power Syst 33(3):684–692 16. Bala I, Yadav A (2020) Optimal reactive power dispatch using gravitational search algorithm to solve IEEE-14 bus system. Lecture notes in networks and systems vol 120. Springer, pp 463–473 17. Roy PK, Goshal SP, Thakur SS (2012) Optimal power reactive dispatch considering FACTS device using biogeography based optimization. Electr Power Compon Syst 40(9):956–976 18. Dao TK, Nguyen TD, Nguyen TT, Thandapani J (2022) An optimization reconfiguration reactive power distribution network based on improved bat algorithm. In: Saraswat M, Sharma H, Balachandran K, Kim JH, Bansal JC (Eds) Congress on intelligent systems (Lecture notes on data engineering and communications technologies), vol 114. Springer, Singapore 19. Khonsandi A, Alimardani A, Vahidi B, Hosseinian SH (2011) Hybrid shuffled frog leaping algorithm and Nelder-Mead simplex search for optimal power dispatch. IET Gener Trans Distrib 5(2):249–256 20. Anita, Yadav A, Kumar N (2020) Artificial electric field algorithm for engineering optimization problems. Expert Syst Appl 149:113308 21. Anita, Yadav A (2020) Discrete artificial electric field algorithm for high-order graph matching. Appl Soft Comput 92:06260. ISSN 1568-4946 22. Wu G, Mallipeddi R, Suganthan PN (2017) Problem definitions and evaluation criteria for the CEC 2017 competition on constrained real-parameter optimization. National University of Defense Technology, Changsha, Hunan, PR China and Kyungpook National University, Daegu, South Korea and Nanyang Technological University, Singapore, Technical Report 23. Bouchekara HR, Abido MA, Boucherma M (2014) Optimal power flow using teachinglearning-based optimization technique. 
Electr Power Syst 114:49–59 24. Zhao B, Guo CX, Cao YJ (2005) A multi-agent based particle swarm optimization approach for reactive power dispatch. IEEE Trans Power Syst 20(2):1070–1078 25. Wu QH, Cao YJ, Wen JY (1998) Optimal reactive power dispatch using an adaptive genetic algorithm. Int J Elect Power Energy Syst 20:563–569 26. Oppacher F, Wineberg M (1998) A canonical genetic algorithm based approach to genetic programming. Artif Neural Netw Genet Algorithms 2:401–404 27. Bala I, Yadav A (2022) Niching comprehensive learning gravitational search algorithm for multimodal optimization problems. Evol Intel 15:695–721
28. Biswas S, Chakraborty N (2012) Tuned reactive power dispatch through modified differential evolution technique. Energy 6(2):138–147
IoT-Based Automotive Collision Avoidance and Safety System for Vehicles Dipali Ramdasi, Lokita Bhoge, Binita Jiby, Hrithika Pembarti, and Sakshi Phadatare
Abstract Road accidents claim over a million lives every year. Some of the causes of these accidents are poor visibility of roads, false estimation of nearby vehicles and delay of driver to hit the brake. The developed IoT-based system focuses on reduction of accidents by addressing these causes. It alerts the driver about the presence of humps and potholes on the road by detecting it in advance. The visual alerts are provided by various coloured LEDs and the audio alert is provided using voice communication. The system also measures distance between the host vehicle and the vehicle ahead to maintain a safe distance of 400 m and warns the driver if safe distance is not maintained. This feature also helps in avoiding collisions with other vehicles and unidentified objects. In the worst-case scenario, if an accident occurs, this system tracks the vehicle’s geographical location and provides a message alert to the registered emergency contacts. The system is equipped with vehicle-to-vehicle communication for data transmission amongst vehicles using Li-Fi technology. The range of this V2V communication is up to 2–3 m. With this feature, the host vehicle can transmit information about emergency brake situations and presence of emergency service vehicles in the vicinity for clearing the driving path. Audio and visual mechanisms are employed for alerting the driver. The system is developed by incorporating ultrasonic sensors with Arduino for testing purposes. Every feature of the system is tested with real vehicles in simulated circumstances. The performance of the system is satisfactory in all test environments. Keywords Pothole and hump detection · Vehicle location tracker · Collision avoidance · Alerting system · Vehicle-to-vehicle communication
D. Ramdasi (B) · L. Bhoge · B. Jiby · H. Pembarti · S. Phadatare MKSSS’s Cummins College of Engineering for Women, Pune, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_48
1 Introduction Road networks are an important means of transport which carries 90% of the country’s traffic [1]. The number of motor vehicles is growing at a faster rate than the population and economic growth [2]. In India, most roads are narrow and congested with poor surface quality and maintenance of the roads is not satisfactory [3]. The poor maintenance and servicing of the roads have led to the creation of potholes [4]. Accidents and death rates caused due to the road accidents are also increasing at an alarming rate [5]. According to a survey done by the automation association, some of the major reasons for road accidents are the presence of potholes and humps on roads and also collision amongst vehicles [6]. Recently, there has been a global increase in the annual number of road accidents occurring even in developed nations with good road safety measures [7]. Also due to delayed communication, many people lose their lives every day around the world [8]. To address this issue, a number of collision avoidance, accident detection, monitoring, alerting and vehicle tracking systems have been developed in the recent years. A review of some of these systems is presented in the next section.
2 Literature Review Several researchers have presented their work on vehicle collision avoidance and messaging systems [9]. Some systems comprise database and mobile applications along with the microcontroller and sensing systems. Flash messages and audio alerts are included in some systems. Some systems use Bluetooth communication while some use Li-Fi or equivalent. Some systems store the event credentials in the black box of the vehicle for future use. Some systems use vision-based technologies for vehicle tracking but use of vision sensors leads to the costly system [10]. While some systems use CNN and deep learning convolution network method [11, 12]. All relevant work on this topic is studied by considering some parameters and represented effectively in a tabular manner in Tables 1 and 2 for better comparison. The systems studied used different sensing techniques for pothole, hump or object detection. A summary of these techniques is presented in Table 2. This detailed study reveals that having all the features in one system is challenging. Alquhali et al. [21] used ThingSpeck and Freeboard software to display the data in the form of longitude and latitude charts, and a map that is accessible to all. It makes use of mobile phones with Internet, however, this solution is not cost effective. Gomes et al. [22] designed an IoT-based collision detection system on guardrails where a theoretical analysis for the energy consumption is simulated and Markov chain model is used for analysing the performance of this system. Alzahri and Sabudin [23] discuss the process of developing a vehicle tracking device using microcontroller, sim-card slot, voice-alarm module, signal antenna, battery and mobile phone with a programme controller interface and integrating it with Waze or Google Maps.
Table 1 Inferences of existing methods for accident detection, tracking and alerting systems

S. No. | Sensing technology used | Allied technology incorporated | Microcontroller used | Presence of alerting system | Cost effectiveness
1 [13] | Ultrasonic sensors | Cloud, GPS and GSM | PIC16F877A | Yes | Yes
2 [14] | Vibration sensor | GSM | AT89S52 | Yes | Yes
3 [15] | Ultrasonic sensor, U slot sensor, collision sensor | Visible light (Li-Fi) and Wi-Fi communication, cloud, GPS, GSM, Bluetooth, SD card module | ATMEGA328P, ATMEGA 2650 | Yes | No
4 [16] | MEMS sensor, temperature sensor, gas sensor, alcohol sensor | GPS, GSM, MEMS, serial communication | ARM7 LPC2148 | Yes | Yes
5 [17] | Eye blink sensor, vibration sensor, alcohol sensor | GPS and GSM | PIC16F877A | Yes | No
6 [18] | Ultrasonic sensors | Autonomous braking using android app and Bluetooth | Arduino Uno | No | Yes
7 [19] | Imaging technique | Deep learning algorithms, convolutional neural networks, residual neural networks | Computer | No | Yes
8 [20] | Ultrasonic sensor, LDR sensor | Li-Fi-based V2V communication | Arduino Uno | No | Yes
The GPS and GSM modules interfaced with high-end processor boards for accident detection and alerting are presented in published literature [24, 25]. Damani et al. [26] discussed the operation of the GPS tracking system and the environments in which it is practical. A method for automatically detecting and reporting accidents is designed by Wakure et al. [27] using an accelerometer for accident detection. A brief message with the accident’s GPS location is delivered over the GSM network. Fogue et al. [28] discussed the e-NOTIFY system that enables quick detection of traffic accidents and enhances aid to injured passengers by speeding up emergency services reaction times through the effective transmission of pertinent accident information utilizing a combination of V2V and V2I communications. In order to address limitations of
Table 2 Inferences of different sensors used for detection in existing systems

S. No. | Sensors used: Collision detection | Pothole detection | Hump detection | For safety
1 [13] | X | Ultrasonic sensor | Ultrasonic sensor | X
2 [14] | Vibration sensor | X | X | X
3 [15] | Collision/piezoelectric sensor | X | X | X
4 [16] | MEMS accelerometer | X | X | Gas sensor, alcohol sensor
5 [17] | Vibration sensor | X | X | Eye blink sensor, alcohol sensor
6 [18] | Ultrasonic sensor | X | X | X
7 [19] | Imaging technique | Convolutional neural network | X | X
8 [20] | X | X | X | Ultrasonic sensor
X = Sensing method not employed for the said feature.
literature reviewed, an automotive collision avoidance and safety system with the help of IoT technology is developed.
3 System Details The block diagram of the developed system is illustrated in Fig. 1. A smart measuring system developed using the Arduino is used to detect potholes, humps, collisions and crashes. The sensing devices measure the distance of the vehicle from the object and check if the value obtained is within the normal conditions. If they exceed the preset limits, then alerts are set, and warnings are displayed. The system notifies the driver to slow down upon approaching a pothole or hump. The GPS is used for exact and accurate location tracking of the vehicle. The obtained geographical location of the vehicle is sent using the Global System for Mobile Communication (GSM) module using SMS protocol. In case of accidents, emergency alert messages are sent to the nearest hospital and police station for immediate action. The system also has provision for vehicle-to-vehicle communication (V2V) for data transmission amongst vehicles.
Fig. 1 Block diagram
3.1 Sensing and Detection The system consists of two HC-SR04 ultrasonic sensors, of which one is used for detecting objects such as another vehicle, a wall or a barricade, and the other is used for pothole and hump detection. These ultrasonic sensors are interfaced with a smart measuring system developed using the Arduino. Arduino is an open-source electronics platform with easily operable hardware and software. The microcontroller board runs a set of instructions written in the Arduino programming language, processes the data obtained from the sensors and provides visual and audio alerts.
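The HC-SR04 measures distance through the round-trip time of an ultrasonic pulse, so the controller converts the echo duration to centimetres using the speed of sound. The conversion is sketched below as host-side Python for illustration, whereas the prototype itself runs Arduino firmware.

```python
SPEED_OF_SOUND_CM_PER_US = 0.0343  # at roughly 20 degrees Celsius

def echo_to_distance_cm(echo_duration_us):
    """Convert an HC-SR04 echo pulse width (microseconds) to a one-way distance in cm.
    The pulse covers the path twice (out and back), hence the division by 2."""
    return (echo_duration_us * SPEED_OF_SOUND_CM_PER_US) / 2.0

# Example: an echo of about 2915 microseconds corresponds to roughly 50 cm
print(round(echo_to_distance_cm(2915), 1))
```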
3.2 Alerting System When a pothole/hump is detected, visual alerts are given by turning the LED ON. Audio alert is provided by the buzzer, while an LCD screen mounted on the dashboard of the vehicle displays the distance of the object. Visual Alerts • LED: According to the distance between vehicle and object, the corresponding coloured LED turns ON to alert the driver.
• Display: The distance between the vehicle and the object is indicated on the LCD, alerting the driver when the distance is beyond safe distance. A 20 × 4 LCD is used to display the value. Audio Alerts • Buzzer: When the distance exceeds the safe distance, the driver is alerted by sounding a buzzer. • Voice messages by Li-Fi: Whenever the vehicle slows down due to detection of pothole, hump or object, the following vehicle may crash. Voice messages can be transmitted to the following vehicles and notified about the situation. SMS Alerts • In situations where an accident or crash could not be avoided, the emergency contact numbers are alerted via SMS. The exact location of the emergency is conveyed through the SMS.
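The alert logic amounts to mapping the measured distance onto an LED colour, a buzzer state and an LCD message. One possible mapping is sketched below; the numeric thresholds are assumptions for illustration, since the chapter defines the alert levels but not the exact cut-off values used in the firmware.

```python
def classify_distance(distance_cm, warn_cm=200.0, danger_cm=80.0):
    """Map a measured object distance to (LED colour, buzzer on/off, LCD text).
    Threshold values are illustrative assumptions, not the prototype's settings."""
    if distance_cm <= danger_cm:
        return "red", True, "Caution!! red alert"
    if distance_cm <= warn_cm:
        return "orange", True, "Orange alert"
    return "green", False, "Safe distance"

# Example sweep over a few readings
for d in (350, 150, 60):
    print(d, classify_distance(d))
```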
3.3 V2V Communication Using Li-Fi The objective of this communication is: • To warn the preceding vehicle about emergency application of brakes. • To inform the preceding vehicles about the presence of emergency service vehicles like fire brigade or ambulance in the vicinity. Serial communication on Arduino is used to implement the same. The audio message that the user wants to send is considered as data. This data is transmitted through LED which acts as a data transmitter. The transmitted data is received by the solar panel which acts as a receiver. The data is then sent to the microcontroller and further we get the output in the form of audio, i.e. by using speaker or any other audio device. This audio transmission of data from one vehicle to another vehicle is done by using Li-Fi technology and thus the vehicle-to-vehicle communication is established wirelessly. Figure 2 shows the block diagram of this communication.
Fig. 2 Block diagram for V2V communication
Fig. 3 Sensor positioning
3.4 Sensor and System Positioning The sensors US1 (Horizontal Mounted Ultrasonic Sensor) and US2 (Inclined Ultrasonic Sensor) are used for object detection and pothole/hump detection, respectively, and are mounted as shown in Fig. 3. The placement of these sensors is decided considering the task they perform. The hardware components placement is shown in Fig. 4. The Arduino, GPS and GSM module are placed inside the dashboard. The LED, LCD and buzzer are placed on the dashboard as shown in the figure for providing visual and audio alerts to the driver along with an ON/OFF switch on the steering wheel to acknowledge the given alerts. Taking into consideration the universality of Arduino microcontroller and its incorporation with other peripheral devices, it is most suitable for prototype versions. However, any core microcontroller with small packaging can be used in the final version of the product.
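Since US2 views the road at a fixed downward tilt, its reading over a flat surface is a known baseline, and deviations from that baseline indicate a pothole (longer reading) or a hump (shorter reading). The sketch below illustrates this geometric idea; the mounting height, tilt angle and tolerance are assumed values, not the prototype's measurements.

```python
import math

def road_condition(reading_cm, mount_height_cm=50.0, tilt_deg=30.0, tol_cm=6.0):
    """Classify the inclined sensor reading as 'pothole', 'hump' or 'flat road'.
    Baseline = expected slant distance to a flat road for a sensor mounted
    mount_height_cm above the ground and tilted tilt_deg below the horizontal."""
    baseline = mount_height_cm / math.sin(math.radians(tilt_deg))
    if reading_cm > baseline + tol_cm:
        return "pothole"       # surface is farther away than a flat road
    if reading_cm < baseline - tol_cm:
        return "hump"          # surface is closer than a flat road
    return "flat road"

# Baseline here is 100 cm; readings of 112 cm and 90 cm flag a pothole and a hump
print(road_condition(112), road_condition(90))
```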
3.5 System Implementation The function blocks discussed in earlier sections are integrated together and mounted in the test vehicle. The sensing system is mounted on the front of the vehicle and the microcontroller board is mounted near the dashboard of the test vehicle. Suitable
Fig. 4 System positioning
converters are used to power up the system from the battery of the car. The V2V communication using Li-Fi is implemented by mounting the LED on the rear side of the test vehicle. Testing of the system is done for various conditions and the results are discussed in the next section.
4 Results and Analysis The developed system is tested in various stages, right from acquiring each sensor response, to testing the complete system. Though the positioning is as shown in Figs. 3 and 4, for testing purposes, the set-up was fixed on the bonnet temporarily. The LCD was mounted inside the car above the vehicle audio for driver’s convenience. The main task of the ultrasonic sensor is to detect potholes, humps and objects at the desired range. The performance of the system for various simulated conditions and generation of relevant alerts is tested. The results for various conditions are illustrated in Table 3. The alerting system is tested separately before mounting it on the vehicle. Different conditions are simulated to observe the performance of the system. Figures 5, 6 and 7 show the system testing results before mounting on the test vehicle. A moving
Table 3 Results of condition testing

Condition | Visual alarm (LED) | Audio alarm (Buzzer) | LCD alert display | SMS alert
Pothole | Aqua | Yes | Pothole ahead __cm | X
Hump | Aqua | Yes | Hump ahead __cm | X
Safe zone | Green | No | Safe distance | X
Warning | Orange | Yes | Orange alert | X
Action required | Red | Yes | Caution!! red alert | X
Accident | Red | Yes | Accident at Long:__, Lat:__ | Accident at Long:__, Lat:__
trolley with height adjustment facility is used to mount the system and simulate the vehicle. Structures similar to potholes and humps are included for testing purposes. Figure 5a shows the green alert which represents the safe zone. This condition is achieved when the car and the object are at a safe distance. Figure 5b shows the orange alert which represents the warning zone. This condition is achieved when the distance between the car and the object is less than the safe limit. Figure 5c shows the red alert. In this condition, the car needs to take immediate action to avoid collision with the object. Figure 6a and b shows hump detection, wherein the driver is alerted to take appropriate action such as reducing the vehicle's speed. Figure 7 shows pothole detection; in this condition, the driver is alerted to take appropriate action to avoid a mishap. Figures 8, 9, 10 and 11 show the various alerts, pothole, hump, clear road and accident detection when the system is actually mounted in the test vehicle. The LCD
Fig. 5 a–c. Vehicle/obstacle detection
Fig. 6 a and b Hump detection
Fig. 7 Pothole detection
screen displays the distance between the vehicle and the object. In case of accident/hump/pothole detection, the location coordinates of the vehicle are displayed on the screen. Figure 12a shows the mounting of hardware components for testing purposes. Figure 12b shows the distance from ground where the sensors are to be placed for object and pothole/hump detection. Figure 13a shows accident detection between the vehicles. Figure 13b shows the message alert sent to the registered mobile number with the location coordinates of the accident. The range of detection of potholes, humps or objects is 500 cm; however, the range can be increased by using sensors with higher detection range. Figures 14 and 15 show the results of Li-Fi communication. In this, the data transmission in the form of audio signals is established using Li-Fi technology. Data is transferred through light waves on the solar panel and converted into audio
Fig. 8 Accident and hump detection
Fig. 9 Red alert and clear road detection
signals which is then amplified using a speaker. All the results show successful implementation of the system in the test vehicle. Though the developed system uses sensors with short range measurements, sensors and modules with higher capacity can be used for commercial systems.
5 Conclusion An IoT-based Automotive Collision Avoidance and Safety System for vehicles is developed and tested in the simulated environment. The earlier published literature
Fig. 10 Orange alert and pothole detection
Fig. 11 Safe distance and hump detection
Fig. 12 a and b. Sensor positioning and distance from ground
Fig. 13 a and b. Accident detection
Fig. 14 Audio transmission using Li-Fi communication in dark environment
had limited parameters considered and did not specify a safe distance. However, this issue is handled in the developed system which uses an IoT-based sensor network to detect potholes, humps and alert the user to avoid collisions. It helps to avoid accidents by sensing the object distance and pothole distance using ultrasonic sensors and checking if the values obtained are within the normal conditions in real time. The range of detection of potholes, humps or objects is 500 cm. If they exceed the specified limits, then alerts are set, and warnings are displayed. In case an accident occurs, the system informs the registered mobile number with the GPS location of the accident and can mobilize the rescue of the victim with the use of a robust sensor
Fig. 15 Audio transmission using Li-Fi communication in a well-lit environment
network. Audio transmission by using Li-Fi communication technology is established to implement Vehicle-to-vehicle communication thus ensuring total awareness of the surroundings. The vehicle-to-vehicle communication is used to alert the following cars about emergency application of brakes or emergency service vehicles in the vicinity that need a clear passage within the range of 2–3 m. The results clearly indicate the successful implementation of the features. However, the range can be increased by using sensors with higher detection range. Therefore, the system with minor modifications can be included in the existing vehicles for collision avoidance and safety.
References 1. Road Transport Year Book 2016–2017, Ministry of Road Transport and Highways. https:// morth.nic.in/sites/default/files/Road%20Transport%20Year%20Book%202016-17.pdf. Last accessed 23 Sept 2021 2. Car fleet growing faster than population, CBS home page. https://www.cbs.nl/en-gb/news/ 2020/10/car-fleet-growing-faster-than-population. Last accessed 09 Sept 2021 3. Kumar GA, Kumar AS, Kumar AA, Maharajothi T (2017) Road quality management system using mobile sensors. In: 2017 International conference on innovations in information, embedded and communication systems (ICIIECS), India, pp 1–6. https://doi.org/10.1109/ICI IECS.2017.8276014 4. Jo Y, Ryu S (2015) Pothole detection system using a black-box camera. Sensors 15(11):29316– 29331 5. Road traffic injuries. World Health Organization. https://www.who.int/news-room/fact-sheets/ detail/road-traffic-injuries. Last accessed 19 June 2022 6. More than 3500 accidents took place in India due to potholes in 2020. HT Auto. https://auto.hindustantimes.com/auto/news/more-than-3-500-accidents-took-place-inindia-due-to-potholes-in-2020-41628571000162.html. Last accessed 10 Sept 2021
7. Road Safety, Ministry of Road Transport & Highways, Government of India. https://morth.nic. in/road-safety. Last accessed 10 Oct 2021 8. Khaliq KA, Chughtai O, Shahwani A, Qayyum A, Pannek J (2019) Road accidents detection, data collection and data analysis using V2X communication and edge/cloud computing. Electronics 8(8):896. https://doi.org/10.3390/electronics8080896 9. Tachwali Y, Refai HH (2009) System prototype for vehicle collision avoidance using wireless sensors embedded at intersections. J Franklin Inst 346(5):488–499 10. Javed Mehedi Shamrat FM, Chakraborty S, Afrin S, Moharram M, Amina M, Roy T (2022) A model based on convolutional neural network (CNN) for vehicle classification. In: Congress on intelligent systems, Springer, Singapore, pp 519–530 11. Kavitha N, Chandrappa DN (2020) Vision-based vehicle detection and tracking system. In: Congress on intelligent systems, Springer, Singapore, pp 353–364 12. Seegolam S, Pudaruth S (2022) A real-time traffic jam detection and notification system using deep learning convolutional networks. In: Congress on intelligent systems, Springer, Singapore, pp 461–475 13. Madli R, Hebbar S, Pattar P, Golla V (2015) Automatic detection and notification of potholes and humps on roads to aid drivers. IEEE Sens J 15(8):4313–4318 14. Tushara DB, Vardhini PH (2016) Wireless vehicle alert and collision prevention system design using Atmel microcontroller. In: 2016 International conference on electrical, electronics, and optimization techniques (ICEEOT), India, pp 2784–2787 15. Khan RA, Gogoi A, Srivastava R, Tripathy SK, Manikandaswamy S (2019) Automobile collision warning and identification system using visible light and Wi-Fi communication. Int J Eng Adv Technol 8(583):72–77 16. Bhavthankar S, Sayyed HG (2015) Wireless system for vehicle accident detection and reporting using accelerometer and GPS. Int J Sci Eng Res 6(8):1068–1071 17. Shaik A, Bowen N, Bole J, Kunzi G, Bruce D, Abdelgawad A, Yelamarthi K (2018) Smart car: An IoT based accident detection system. In: 2018 IEEE global conference on internet of things (GCIoT), Egypt, pp 1–5 18. Vairavan R, Kumar SA, Ashiff LS, Jose CG (2018) Obstacle avoidance robotic vehicle using ultrasonic sensor, Arduino controller. Int Res J Eng Technol (IRJET) 5(02) 19. Arjapure S, Kalbande DR (2021) Deep learning model for pothole detection and area computation. In: 2021 IEEE International conference on communication information and computing technology (ICCICT), India, pp 1–6 20. Ravikumar DNS, Nagarajan G (2018) Vehicle to vehicle communication using Li-Fi technology. Int J Pure Appl Math 119(7):519–522 21. Alquhali AH, Roslee M, Alias MY, Mohamed KS (2019) Iot based real-time vehicle tracking system. In: 2019 IEEE Conference on sustainable utilization and development in engineering and technologies (CSUDET), Malaysia, IEEE, pp 265–270 22. Gomes T, Fernandes D, Ekpanyapong M, Cabral J (2016) An IoT-based system for collision detection on guardrails. In: 2016 IEEE International conference on industrial technology (ICIT), Taiwan, pp 1926–1931 23. Alzahri FBB, Sabudin M (2016) Vehicle tracking device. In: 2016 International conference on advanced informatics: concepts, theory and application (ICAICTA), Malaysia, pp 1–6 24. Hussein LF, Aissa AB, Mohamed IA, Alruwaili S, Alanzi A (2021) Development of a secured vehicle spot detection system using GSM. Int J Interact Mobile Technol 15(4) 25. Chaturvedi N, Srivastava P (2018) Automatic vehicle accident detection and messaging system using GSM and GPS modem. 
Int Res J Eng Technol 05(03):252–254 26. Damani A, Shah H, Shah K, Vala M (2015) Global positioning system for object tracking. Int J Comput Appl 109(8):3977–3984 27. Wakure AR, Patkar AR, Dagale MV, Solanki PP (2014) Vehicle accident detection and reporting system using GPS and GSM. Int J Eng Res Develop 10(4):25–28 28. Fogue M, Garrido P, Martinez FJ, Cano JC, Calafate CT, Manzoni P (2012) Automatic accident detection: aAssistance through communication technologies and vehicles. IEEE Veh Technol Mag 7(3):90–100
Computer Vision-Based Electrical Equipment Condition Monitoring and Component Identification R. Vidhya, P. Vanaja Ranjan, R. Prarthna Grace Jemima, J. Reena, R. Vignesh, and J. Snegha
Abstract The constant use of electrical equipment over years makes it necessary to maintain it in good condition for proper working, and this is mostly done at the risk of the people in charge of maintenance. The key factor behind electrical accidents is the lack of maintenance of electrical equipment, which eventually results in injuries and fatalities to people and in monetary loss. The proposed solution aims to reduce such hazards to human life and property. In this work, the electrical equipment is continuously monitored, internal faults are identified using thermal images, and the changes in the heat spread are analyzed. The maximum prediction accuracy of the proposed methodology is 97.47%. The components of the equipment are labeled and significant physical defects are detected in images captured using a normal camera through image processing with You Only Look Once-version 3 (YOLOV3). In this way, computer vision and image processing can be utilized to make the preventive/maintenance mechanisms smarter. Keywords YOLO · Thermal image · Heat spread · Pixel analysis
1 Introduction To enhance the safety of personnel monitoring the electrical equipment, the proposed solution identifies the physical defects in various components of electrical equipment installed in remote locations using normal camera images and preventive maintenance could be achieved by monitoring the operating condition using thermal images. R. Vidhya (B) Department of Electronics and Communication Engineering, Loyola-ICAM College of Engineering and Technology, Chennai, India e-mail: [email protected] P. V. Ranjan Embedded System Technologies, Department of Electrical and Electronics Engineering, College of Engineering - Guindy, Chennai, India R. P. G. Jemima · J. Reena · R. Vignesh · J. Snegha Loyola-ICAM College of Engineering and Technology, Chennai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_49
Continuous monitoring of electrical equipment is required to ensure health and safety, especially during sudden breakdowns; failing to do so may result in serious hazards, time-consuming fault conditions, or even the need for replacement, leading to high repair costs and manpower. According to a survey conducted by the Institute of Electrical and Electronics Engineers (IEEE), the yearly failure rate of an oil-immersed transformer is around 0.00625, as reported by Rigatos and Siano [1]; roughly, 10 in 100 transformers may become faulty over the next 16 years. A survey conducted by the International Council on Large Electric Systems (CIGRE) predicts that the failure rate of power transformers ranges from 1 to 2% per year. A solution to these transformer faults and failures is therefore an utmost necessity. Condition monitoring is performed to monitor the operating characteristics of electrical equipment. Pre-trained data can be utilized to predict the need for repair and service before the occurrence of hazards or unexpected breakdowns. Earlier, time-based maintenance (TBM) was used to examine and repair electrical equipment on a regular basis, usually after a particular time interval or number of running hours, as described by Han and Song [2]. According to Huda and Taib [3], maintenance of electrical equipment based on actual operating conditions, by regularly collecting measurements instead of depending on statistical data, is called predictive maintenance. The main parameters that influence electrical equipment are the temperature profile and the weather conditions. Pal et al. [4] applied median filtering to extract regions of interest (ROI) with a higher probability of fault occurrence and to distinguish critical faults. Shiravand et al. [5] identified faults in the cooling system using probabilistic neural networks (PNN) to construct the fault diagnosis model and the bat algorithm to optimize it. Zou and Huang [6] clustered the infrared image using K-means and extracted statistical characteristics containing temperature and area information. Junos et al. [7] used YOLO-V3 to detect and localize different classes of objects. Dynamic transformer rating allows for the prediction of maximum transformer capacity at a thermally steady state, according to Alvarez et al. [8]. Djamali and Tenbohlen [9] used an alarm to indicate failure when the standardized error exceeds the threshold band. Liang and Parlikad [10] maintained the transformer using information obtained from periodic inspection. Murugan and Ramasamy [11] identified the causes of failures using root cause analysis (RCA) and developed a condition-based maintenance (CBM) model. Raja et al. [12] state that condition monitoring involves early detection and the maintenance of machining accuracy. Djamali and Tenbohlen [13] suggest that failures in transformers are identified based on changes in the estimated parameters of a thermal model. Wang et al. [14] determined the fault condition of a motor by calculating an envelope order spectrum followed by determination of the bearing fault pattern. Tian et al. [15] augmented the images for detection. Ammar et al. [16] detected faults based on the HSV color model and different image segmentation methods.
The proposed work identifies the components present in the electrical equipment through image processing, monitors the operating condition of the equipment using thermal images acquired through computer vision, and detects significant physical defects in the electrical equipment.
2 Methodology The proposed work helps in identifying the components, detecting the significant physical faults and the internal faults of an electrical equipment by using normal and thermal camera images. Figure 1 depicts the block diagram of the proposed methodology.
2.1 Methods of Denoising Depending on the noise model, various denoising methods are incorporated to remove unscaled white noise from the image. They include block matching and 3D filtering (BM3D), non-local means (NLM), the Gaussian filter, the median filter and the pillow filter [17–24]. Tian et al. [17] suggest enlarging the receptive field to obtain more contextual information from the image. Buades et al. [18] proposed a methodology to classify image denoising algorithms. BM4D implements the grouping and collaborative filtering paradigm (Maggioni et al. [19]). The non-local means approach is based on the principle of replacing a pixel color with an average of the colors of similar pixels, as proposed by Buades et al. [20]. The Wiener filter center weight (WFCW) is calculated by introducing a Wiener filter into the center-weight computation, as per Zhang [21]. Huang et al. [22] described storing and updating the gray-level histogram for fast median filtering. Oil level detection of the oil pillow in power transformers based on visible and infrared imaging is proposed by Ma et al. [23]. According to Wang et al. [24], image denoising can learn from the weighted-average idea of the particle filter. Table 1 shows that the BM3D filter produces better accuracy and is found to be suitable for processing the images.
Fig. 1 Block diagram of the proposed work
Table 1 Outputs of various denoising methods (DC motor–rotor): input image and the corresponding BM3D, NLM, Gaussian filter, median filter and pillow filter outputs.
Table 2 Comparison of various denoising methods
| Denoising method | Mean before denoising | Mean after denoising | Standard deviation before denoising | Standard deviation after denoising | Entropy before denoising | Entropy after denoising |
| Gaussian filter | 156.36 | 161.23 | 84.09 | 81.95 | 7.8277 | 7.7919 |
| Median filter | 175.71 | 177.45 | 68.15 | 64.59 | 7.7739 | 7.6184 |
| Pillow filter | 55.87 | 60.46 | 41.28 | 43.77 | 6.7981 | 6.5491 |
| NLM | 156.826 | 161.303 | 75.513 | 71.542 | 7.7125 | 7.5200 |
| BM3D filter | 167.44 | 171.60 | 79.43 | 75.37 | 7.8198 | 6.9644 |
Table 2 reveals that the mean is increased as the pixel values are enhanced, the standard deviation (SD) is decreased as the smoothness between neighboring pixels is improved, and the entropy is reduced, indicating the reduction in noise.
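The mean, standard deviation and entropy comparison in Table 2 can be reproduced with a few lines of Python. The sketch below assumes OpenCV and an illustrative file name; it applies a median filter as one of the candidate denoisers and reports the three statistics before and after denoising.

```python
import cv2
import numpy as np

def image_stats(img_gray):
    """Return mean, standard deviation and Shannon entropy of a grayscale image."""
    mean = float(np.mean(img_gray))
    sd = float(np.std(img_gray))
    hist = np.bincount(img_gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = float(-np.sum(p * np.log2(p)))
    return mean, sd, entropy

# 'rotor.png' is an illustrative file name, not from the paper's dataset.
noisy = cv2.imread("rotor.png", cv2.IMREAD_GRAYSCALE)
denoised = cv2.medianBlur(noisy, 5)   # median filter; swap in other filters to fill Table 2
print("before:", image_stats(noisy))
print("after :", image_stats(denoised))
```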
2.2 Physical Fault Detection The visible physical faults that occur in the electrical equipment are identified using image processing and compared with the pre-trained data. YOLO-V3, a single-stage deep convolutional neural network (CNN), is used; it works directly on the images to obtain the coordinates of the bounding boxes and estimates the probability of each class through regression. Hence, as Ge et al. [25] note, the calculation speed of this model is improved compared with other object detection models. The equipment is labeled using the LabelImg tool, and bounding boxes are drawn to label the classes. The accuracy and correctness of the class predictions after training can be verified using test images of transformers. In this way, the images are processed to detect the visible physical faults. As per Fig. 2, each label refers to a separate class in YOLO, and each class is identified depending on the detected coordinates.
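A minimal sketch of this detection step is shown below, assuming custom YOLO-V3 configuration, weight and class-name files produced after LabelImg annotation and training; the file names and thresholds are placeholders, not the authors' artifacts. OpenCV's DNN module runs the network, and non-maximum suppression keeps the strongest boxes.

```python
import cv2
import numpy as np

# Placeholder file names for the custom-trained model.
net = cv2.dnn.readNetFromDarknet("yolov3-custom.cfg", "yolov3-custom.weights")
classes = open("classes.names").read().strip().split("\n")

img = cv2.imread("transformer.jpg")
h, w = img.shape[:2]
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

boxes, confidences, class_ids = [], [], []
for out in outputs:
    for det in out:
        scores = det[5:]
        cid = int(np.argmax(scores))
        conf = float(scores[cid])
        if conf > 0.5:                                   # assumed confidence threshold
            cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(conf)
            class_ids.append(cid)

# Non-maximum suppression removes overlapping boxes for the same component.
keep = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
for i in np.array(keep).flatten():
    x, y, bw, bh = boxes[i]
    print(classes[class_ids[i]], confidences[i], (x, y, bw, bh))
```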
Fig. 2 Labeling image using YOLO
Fig. 3 Thermal image of (a) 1 kVA transformer and (b) DC motor
2.3 Internal Fault Detection A thermal camera is used to monitor the amount of infrared radiation emitted from the transformer and DC motor, as shown in Fig. 3, which differs based on their temperature. When the health condition is not monitored properly, the deterioration of components leads to an increase in electrical resistance and therefore in the heat generated, as noted by Garavandad et al. [26] and CIGRE WG [27].
2.4 Pixel Analysis The heat spread is calculated by counting the number of pixels that each color occupies; yellow represents the hotter region and green represents the hottest region, as shown in Fig. 4.
Fig. 4 (a) Thermal image of 1 kVA transformer, (b) yellow color mask indicating the hotter region and (c) green color mask indicating the hottest region
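A minimal sketch of this pixel analysis is given below, assuming OpenCV and placeholder HSV ranges for the yellow and green palette colors (the exact ranges depend on the thermal camera's color map), together with the fixed-threshold check used later for the fault decision; the threshold value is an illustrative assumption.

```python
import cv2

thermal = cv2.imread("transformer_thermal.png")        # illustrative file name
hsv = cv2.cvtColor(thermal, cv2.COLOR_BGR2HSV)

# Approximate HSV ranges for the yellow (hotter) and green (hottest) colors;
# tune these to the color map of the thermal camera actually used.
yellow_mask = cv2.inRange(hsv, (20, 80, 80), (35, 255, 255))
green_mask = cv2.inRange(hsv, (36, 80, 80), (85, 255, 255))

yellow_pixels = cv2.countNonZero(yellow_mask)
green_pixels = cv2.countNonZero(green_mask)
heat_spread = yellow_pixels + green_pixels
print("heat spread pixel count:", heat_spread)

THRESHOLD = 60000   # assumed fixed threshold derived from normal-condition runs
print("overload/faulty" if heat_spread > THRESHOLD else "good condition")
```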
2.5 Multiple Linear Regression (MLR) MLR is used to determine a mathematical relationship between several variables. According to Nimon and Oswald [28], R-squared (the coefficient of determination) measures how much of the variation in the output is explained by the variation in the independent variables. The p-value indicates how likely the calculated t-statistic would have occurred by chance if the null hypothesis were true.
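As an illustration, regression coefficients, t-statistics, p-values and R-squared values of the kind reported in Tables 7, 8, 9 and 10 can be obtained with an ordinary least squares fit. The sketch below uses the statsmodels library with a few placeholder rows taken from the image statistics in Tables 3, 4 and 5, not the authors' full dataset.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder rows: columns = mean, SD, entropy of a thermal image.
X = np.array([[137.28, 75.44, 6.052],
              [146.60, 82.52, 5.811],
              [161.86, 69.06, 5.413],
              [119.45, 65.54, 6.967],
              [164.64, 80.95, 5.379],
              [143.22, 84.58, 5.911],
              [125.91, 77.66, 6.978],
              [148.54, 83.04, 5.924]])
y = np.array([1, 1, 1, 1, 2, 2, 3, 3])   # coded operating condition (1 normal, 2 overload, 3 short circuit)

X = sm.add_constant(X)                   # adds the intercept term
model = sm.OLS(y, X).fit()
print(model.params)                      # coefficient estimates
print(model.pvalues)                     # p-values per predictor
print(model.rsquared, model.rsquared_adj)
```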
3 Equipment Condition Monitoring The operating condition of a 1 kVA single-phase transformer and DC motor is monitored using the experimental test bed in the laboratory by employing a non-invasive method.
3.1 Transformer A 1 kVA transformer is considered as shown in Fig. 5, for study and analysis. This transformer operates on a single-phase power supply of 230 V with a full load current of 4.3 A as described by Rigatos and Siano [29]. Fault Condition. One of the major faults that are prone to occur is the insulation problem. It is indicated by the rise in the temperature. Under overloading conditions, temperature increases and has an adverse effect on the life of the transformer. Insulation breakdown occurs in the windings of the transformer due to the aging factor of the transformer. Bushing failures occur due to prolonged exposure to extreme electrical, thermal, mechanical and environmental stresses. The on-load tap changer (OLTC) failures occur due to aging, according to Zarshenas and Suzuki [30].
Fig. 5 1 kVA transformer
3.2 Motor A direct current (DC) motor is used for study that operates on 230 V under various load conditions. Fault Condition. Heat is the most common cause of motor failure. Every increase of 10° centigrade of a motor’s windings above its designated operating temperature cuts the life of the motor’s windings insulation by 50%.
3.3 Experimental Setup The behavior of the 1 kVA transformer was studied under various load conditions from 0 to 4.3 A (normal) and 4.7 A (overload) for different time durations using the test setup shown in Fig. 6. Images of the transformer were captured and processed. The images were then denoised. The same process was repeated with a DC motor.
Fig. 6 Experimental setup of the transformer under study
4 Results and Discussion 4.1 Transformer The YOLO model is tested for custom objects. As per Figs. 7 and 8, the equipment and its components are labeled. An output window is obtained as shown in Fig. 9.
Fig. 7 (a) Input image, (b) transformer identification and (c) component identification and no-fault detection
Fig. 8 (a) Input image, (b) transformer identification and (c) component identification and fault detection
Fig. 9 Output window
Physical Fault Detection and Thermal Analysis. The heat spread is compared with a fixed threshold. If the number of pixels indicating heat spread exceeds the threshold value, the equipment is said to operate in an overload/faulty condition; otherwise, the equipment is working under good condition. From Fig. 10, the longer the time period, the greater the temperature. It is also noted that the temperature increase is linear with respect to load. The values measured before processing the thermal images are given in Table 3 (normal condition), Table 4 (overloading condition) and Table 5 (short circuit condition).
Fig. 10 (a) Heat spread in the transformer over time and (b) number of pixels of heat spread in the transformer with respect to load
Table 3 Analysis before processing the thermal images (normal working condition)
| Time duration | Load current (A) | Input voltage (V) | Heat spread pixel count | Mean | SD | Entropy | Error % | Accuracy |
| Initial time | 0 | 230 | 0 | 137.28 | 75.44 | 6.052 | 41.510 | 58.489 |
| 15 min | 1 | 220 | 2400 | 146.6 | 82.52 | 5.811 | 94.064 | 5.930 |
| 30 min | 1 | 220 | 8419 | 161.86 | 69.06 | 5.413 | 2.076 | 97.923 |
| 45 min | 2 | 218 | 11,231 | 119.45 | 65.54 | 6.967 | 3.066 | 96.933 |
| 1 h | 2 | 218 | 12,149 | 161.33 | 68.96 | 5.469 | 0.120 | 99.879 |
| 1 h 15 min | 2.8 | 214 | 19,867 | 127.89 | 69.76 | 6.832 | 41.822 | 58.177 |
| 1 h 30 min | 2.8 | 214 | 32,334 | 134.67 | 59.51 | 6.278 | 62.479 | 37.520 |
| 1 h 45 min | 3.8 | 210 | 41,292 | 123.76 | 68.37 | 7.017 | 36.616 | 63.383 |
| 2 h | 3.8 | 210 | 48,481 | 138.38 | 72.1 | 6.545 | 58.537 | 41.462 |
Table 4 Analysis before processing the thermal images (overload condition)
| Time duration | Load current (A) | Input voltage (V) | Heat spread pixel count | Mean | SD | Entropy | Error % | Accuracy |
| Initial time | 4.7 | 208 | 65,470 | 164.64 | 80.95 | 5.379 | 6.161 | 93.832 |
| 5 min | 4.7 | 208 | 68,238 | 143.22 | 84.58 | 5.911 | 4.797 | 95.202 |
| 10 min | 4.7 | 208 | 82,149 | 129.2 | 78.92 | 6.7767 | 3.104 | 96.895 |
| 15 min | 4.7 | 208 | 85,208 | 134.78 | 77.68 | 6.107 | 21.616 | 78.384 |
Table 5 Analysis before processing the thermal images (short circuit condition)
| Time duration | Load current (A) | Input voltage (V) | Heat spread pixel count | Mean | SD | Entropy | Error % | Accuracy |
| Initial time | 4.3 | 10 | 82,024 | 125.91 | 77.66 | 6.978 | 31.470 | 68.521 |
| 10 min | 4.3 | 10 | 79,406 | 148.54 | 83.04 | 5.924 | 29.720 | 70.277 |
4.2 Motor Images of the motor with and without fault conditions were collected and used to train the YOLO real-time object detection system. The results obtained are shown in Fig. 11.
Fig. 11 (a) Motor identification, (b) component identification and (c) no-fault detection
Table 6 Thermal images and their respective heat spread analysis of motor
| Line current (A) | Source voltage (V) | Field current (A) | Speed (RPM) | Load | Heat spread pixel count | Mean | SD | Entropy |
| 4.4 | 210 | 0.65 | 1492 | 0 | 18,431 | 103.73 | 61.23 | 6.520 |
| 4.4 | 210 | 0.65 | 1496 | 5 | 20,348 | 106.44 | 63.5 | 6.611 |
| 11 | 206 | 0.5 | 1496 | 7.8 | 22,312 | 102.15 | 61.34 | 6.505 |
| 11 | 206 | 0.5 | 1496 | 7.8 | 27,948 | 101.33 | 59.04 | 6.689 |
| 15.4 | 202 | 0.5 | 1820 | 7.8 | 28,250 | 102.06 | 59.8 | 6.652 |
| 16.8 | 200 | 0.5 | 1516 | 9.8 | 30,913 | 102.31 | 60.62 | 6.774 |
Fig. 12 Heat spread with respect to (a) line current and (b) load
Thermal Analysis. The same procedure was repeated for the motor. The following observations were obtained as in Table 6, and the extent of heat spread is plotted as shown in Fig. 12.
4.3 Feasibility Analysis Machine learning techniques could be adapted for predictive and preventive maintenance of electrical equipment, as shown by Chilukuri et al. [31] and Elsisi et al. [32]. MLR before processing the thermal image of the 1 kVA transformer. Using Eq. (1), the operating condition of the system is predicted, and the MLR predictor variables before processing the thermal images were calculated as in Table 7. The histogram and normal probability plot of the residues are illustrated in Fig. 13. Operation condition = − 11.7864 + (0.0216 × Mean) + (0.0726 × Standard deviation) + (0.7863 × Entropy)
(1)
(OP*: 1-normal, 2-overload, 3-short circuit condition). MLR calculation after processing the thermal image of the 1 kVA transformer. Using Eq. (2), the operating condition of the system is predicted, and the MLR predictor variables after processing the thermal images were calculated as in Table 9. The histogram and normal probability plot of the residues are illustrated in Fig. 14. Operation condition (OP*) = 0.838 + (0.0134 × Mean) + (0.0003 × Standard deviation) + (0.2063 × Entropy)
(2)
(OP*: 1-normal, 2-overload, 3-short circuit condition).
Table 7 MLR calculation before processing thermal images
| Predictor | Coefficient | Estimate | Standard error | t-statistic | p-value |
| Constant | β0 | − 11.786 | 10.846 | − 1.086 | 0.300 |
| Mean | β1 | 0.021 | 0.036 | 0.589 | 0.567 |
| Standard deviation | β2 | 0.072 | 0.023 | 3.159 | 0.009 |
| Entropy | β3 | 0.786 | 0.897 | 0.876 | 0.399 |
Fig. 13 (a) Histogram of the residues and (b) normal probability plot of residues
Fig. 14 (a) Histogram of the residues and (b) normal probability plot of residues
Table 8 Summary of overall fit before processing the thermal image
| R-squared | r² = 0.482 |
| Adjusted R-squared | r²_adj = 0.341 |
| Residual standard error | 0.6031 on 11 degrees of freedom |
| Overall F-statistic | 3.4194 on 3 and 11 degrees of freedom |
| Overall p-value | 0.056 |
Table 9 MLR calculation after processing the thermal image
| Predictor | Coefficient | Estimate | Standard error | t-statistic | p-value |
| Constant | β0 | 0.838 | 0.306 | 2.735 | 0.019 |
| Mean | β1 | 0.013 | 0.012 | 1.038 | 0.321 |
| Standard deviation | β2 | 0.000 | 0.008 | 0.035 | 0.972 |
| Entropy | β3 | 0.206 | 0.572 | 0.360 | 0.725 |
Table 10 Summary of overall fit after processing the thermal image
| R-squared | r² = 0.83 |
| Adjusted R-squared | r²_adj = 0.783 |
| Residual standard error | 0.3457 on 11 degrees of freedom |
| Overall F-statistic | 17.901 on 3 and 11 degrees of freedom |
| Overall p-value | 0.0002 |
Inference. The overall p-value decreases from 0.056 before processing to 0.0002 after processing the thermal images of the 1 kVA transformer, as shown in Tables 8, 9 and 10, which indicates that the operating condition can be predicted accurately after processing the images. The results obtained for the motor were the same as those for the transformer.
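For illustration, Eq. (2) can be applied directly to the statistics of a processed thermal image; rounding the continuous output to the nearest condition code is an assumption made for this sketch, not a rule stated in the text.

```python
def predict_condition(mean, sd, entropy):
    """Apply Eq. (2) (coefficients from Table 9) to the statistics of a
    processed thermal image and round to the nearest condition code."""
    op = 0.838 + 0.0134 * mean + 0.0003 * sd + 0.2063 * entropy
    return op, min(3, max(1, round(op)))   # 1 normal, 2 overload, 3 short circuit

# Example using the first overload row of Table 11 (mean, SD, entropy).
print(predict_condition(46.973, 96.414, 0.777))   # continuous value and rounded code 2 (overload)
```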
4.4 Accuracy of the Output The accuracy of the output was derived by calculating the error of the system. Accuracy and error are complements of each other, i.e., Accuracy (%) = 100 − Error (%). If the error is high, the system has low accuracy and does not give the desired output, and vice versa. (OP*: 1-normal, 2-overload, 3-short circuit condition.) Comparing Tables 3, 4 and 5 (before processing the thermal images) with Table 11 (after processing the thermal images), it is found that the error is lower and the prediction accuracy is higher when the thermal images are processed; the maximum prediction accuracy of the proposed method is compared with similar fault identification strategies in Table 12.
Table 11 Analysis of thermal images after image processing
| Time duration | Load current (A) | Input voltage (V) | Heat spread pixel count | Mean | SD | Entropy | OP* | Error | Prediction accuracy |
| Initial time | 0 | 230 | 0 | 1.559 | 13.675 | 0.533 | 1 | 2.689 | 97.310 |
| 15 min | 1 | 220 | 2400 | 1.559 | 13.675 | 0.533 | 1 | 2.689 | 97.310 |
| 30 min | 1 | 220 | 8419 | 1.559 | 13.675 | 0.533 | 1 | 2.689 | 97.310 |
| 45 min | 2 | 218 | 11,231 | 1.559 | 13.675 | 0.533 | 1 | 2.689 | 97.310 |
| 1 h | 2 | 218 | 12,149 | 1.283 | 15.463 | 0.298 | 1 | 7.859 | 92.140 |
| 1 h 15 min | 2.8 | 214 | 19,867 | 14.92 | 58.777 | 0.525 | 1 | 16.39 | 83.608 |
| 1 h 30 min | 2.8 | 214 | 32,334 | 6.553 | 38.866 | 0.425 | 1 | 2.535 | 97.469 |
| 1 h 45 min | 3.8 | 210 | 41,292 | 5.798 | 35.625 | 0.575 | 1 | 4.513 | 95.486 |
| 2 h | 3.8 | 210 | 48,481 | 7.445 | 41.084 | 0.777 | 1 | 11.08 | 88.951 |
| Initial time | 4.7 | 208 | 65,470 | 46.973 | 96.414 | 0.777 | 2 | 7.954 | 92.045 |
| 5 min | 4.7 | 208 | 68,238 | 66.382 | 109.143 | 1.670 | 2 | 9.685 | 90.312 |
| 10 min | 4.7 | 208 | 82,149 | 92.423 | 119.773 | 2.101 | 2 | 27.782 | 82.210 |
| 15 min | 4.7 | 208 | 85,208 | 84.720 | 117.732 | 2.149 | 2 | 21.29 | 81.701 |
| Initial time | 4.3 | 10 | 82,024 | 93.994 | 120.696 | 2.168 | 3 | 13.964 | 86.035 |
| 10 min | 4.3 | 10 | 79,406 | 75.335 | 14.021 | 1.874 | 3 | 24.387 | 75.612 |
Table 12 Comparison of prediction accuracy
| Technique | Prediction accuracy |
| Thermal modeling | 96.4 |
| Deep one-dimensional convolutional neural network (1D-CNN) | 94.8 |
| Proposed method | 97.47 |
5 Conclusion The components present in the electrical equipment are identified through image processing, the operating condition is monitored using thermal images, and significant physical defects in the equipment are detected. The work included the exploration of different denoising methods for obtaining enhanced equipment images and training the YOLO model with a few normal/faulty equipment images. The proposed work helps increase the lifetime of electrical equipment by detecting faults earlier. In this way, computer vision and image processing can make preventive maintenance mechanisms smarter. This work could be extended to replace the need for humans to monitor and maintain defect conditions over large substations, thereby increasing human safety. Acknowledgements We thank Dr. N.R. Shanker, Professor, Department of Computer Science and Engineering, Aalim Muhammed Salegh College of Engineering, Chennai, for his continuous support.
References 1. Rigatos G, Siano P (2016) Power transformers condition monitoring using neural modeling and the local statistical approach to fault diagnosis. Electr Power Energy Syst 2. Han Y, Song YH (2003) Condition monitoring techniques for electrical equipment-a literature survey. IEEE Trans Power Deliv 18(1) 3. Huda ASN, Taib S (2013) Application of infrared thermography for predictive/preventive maintenance of thermal defects in electrical equipment. Appl Therm Eng 61 4. Pal D, Meyur R, Menon S, Reddy MJB, Mohanta DK (2018) Real-time condition monitoring of substation equipment using thermal cameras. IET Gener Trans Distrib 5. Shiravand V, Faiz J, Samimi MH, Kerman MM (2020) Prediction of transformer fault in cooling system using combined advanced thermal model and thermography. IET Gener Transm Distrib 6. Zou H, Huang F (2015) A novel intelligent fault diagnosis method for electrical equipment using infrared thermography. Infrared Phys Technol 7. Junos MH, Khairuddin ASM, Thannirmalai S, Dahari M (2021) An optimized YOLO based object detection model for crop harvesting system. IET Image Process 8. Alvarez D, Rivera S, Mombello E (2019) Transformer thermal capacity estimation and prediction using dynamic rating monitoring. IEEE Trans Power Deliv 9. Djamali M, Tenbohlen S (2017) Malfunction detection of the cooling system in air-forced power transformers using online thermal monitoring. IEEE Trans Power Deliv 10. Liang Z, Parlikad A (2018) A Markovian model for power transformer maintenance. Int J Electr Power Energy Syst
11. Murugan R, Ramasamy R (2015) Failure analysis of power transformers for effective maintenance planning in electric utilities. Eng Fail Anal 12. Raja TM, Shankarb S, Rajasekar R, Sakthivela NR, Pramanik A (2020) Tool condition monitoring system: a review. J Mater Res Technol 13. Djamali M, Tenbohlen S (2017) A validated online algorithm for detection of fan failures in oil-immersed power transformers. Int J Therm Sci 116:224–233 14. Wang X, Guo J, Lu S, Shen C, He Q (2017) A computer-vision-based rotating speed estimation method for motor bearing fault diagnosis. Meas Sci Technol 15. Tian YN, Yang GD, Wang Z, Wang H, Li E, Liang ZZ (2019) Apple detection during different growth stages in orchards using the improved YOLO-v3 model. Comput Electron Agric 16. Ammar K, Al-Musawiab, Anayib F, Packianather M (2020) Three-phase induction motor fault detection based on thermal image segmentation. Infrared Phys Technol 17. Tiana C, Feic L, Zhengd W, Xua Y, Zuof W, Lin C-W (2020) Deep learning on image denoising: an overview. Neural Netw 18. Buades A, Coll B, Morel JM (2005) A review of image denoising algorithms, with a new one. Soc Ind Appl Math 19. Maggioni M, Katkovnik V, Egiazarian K, Foi A (2013) Nonlocal transform-domain filter for volumetric data denoising and reconstruction. IEEE Trans Image Process 20. Buades A, Coll B, Morel JM (2011) Non-local means denoising. Image Process Line 21. Zhang X (2021) Center pixel weight based on wiener filter for non-local means image denoising. Optik Int J Electr Optics 244 22. Huang TS, Yang GJ, Tang GY (1979) A fast two-dimensional median filtering algorithm. IEEE Trans Acoustics. Speech Signal Process 27(1):13–18 23. Ma Y, Shao X, Wang Z (2019) Research on image-based oil level monitoring technology for power transformer pillow. Int J Sci 6(4) 24. Wang M, Zheng S, Li X, Qin X (2014) A new image denoising method based on Gaussian filter. Int Conf Inf Sci Electr Electric Eng 25. Ge L, Dan D, Li H (2020) An accurate and robust monitoring method of full-bridge traffic load distribution based on YOLO-v3 machine vision, Wiley 26. Garavandad AT, Ahmadi H, Omid M, Mohtasebi SS, Mollazad K, Smith AJR, Carlomagno GM (2015) An intelligent approach for cooling radiator fault diagnosis based on infrared thermal image processing technique. Appl Thermal Eng 87:434e443 27. CIGREWG A2.38 (2016) Transformer thermal modeling, Technical Brochure 659 28. Nimon KF, Oswald FL (2013) Understanding the results of multiple linear regression: beyond standardized regression coefficients, University of North Texas, Department of Learning Technologies, Denton, TX, USA 29. Rigatos.G, Siano P (2016) Power transformers’ condition monitoring using neural modeling and the local statistical approach to fault diagnosis. Electr Power Energy Syst 30. Zarshenas A, Suzuki K (2018) Deep neural network convolution for natural image denoising. In: IEEE International conference on systems, man, and cybernetics (SMC), IEEE, pp 2534–2539 31. Chilukuri SK, Mohan JSS, Gokulakrishnan S, Mehta RVK, Suchita AP (2020) Effective predictive maintenance to overcome system failures-a machine learning approach—proceedings of CIS 1:341–357 32. Elsisi M, Tran MQ, Mahmoud K, Mansourf DA, Lehtonen M, Darwish MMF (2022) Effective IoT-based deep learning platform for online fault diagnosis of power transformers against cyberattacks and data uncertainties. Measurement 190:150–167
Deep CNN Model with Enhanced Inception Layers for Lung Cancer Identification Jaya Sharma and D. Franklin Vinod
Abstract Worldwide statistical data show that the timely diagnosis of pulmonary cancer uncovers the complexities of the disease and increases the possibility of a cure at an early stage. In computed tomography, computerized diagnosis of pulmonary nodules with a broad gamut of impressions poses a great challenge that still needs to be addressed. Deep learning techniques, however, have proven to be leading and robust in a variety of image diagnostic fields. Deep learning plays an important role because it can detect nodules of different types, sizes and locations in the lung from three-dimensional CT images. In this research, we conceived a CNN-based multilayer deep learning model to identify benign and malignant pulmonary nodules. The proposed arrangement is validated on the LIDC-IDRI database and achieved an efficient outcome with 96.70% accuracy, which helped in boosting the performance by minimizing the error rate. The automated detection of epithelial cells/cancer regions would help the pathologist to a great extent and speed up the whole process significantly. Keywords Computed tomography scan · Pulmonary nodule diagnosis · Deep learning · Convolutional neural network (CNN)
1 Introduction As per the data examined by researchers, the big data plays a significant role in the computer vision field. Over the last few years, data has been generated extremely fast in the digitized world, so managing and analyzing the data have been a very challenging task. The big data is organized as Five V’s, the “volume” speaks how much amount of data is dealt with, “variety” is defined as how many different ways we can use the data, “velocity” refers to how fast that data is being generated and J. Sharma (B) · D. F. Vinod Department of Computer Science and Engineering, Faculty of Engineering and Technology, Delhi-NCR Campus, SRM Institute of Science and Technology, NCR Campus, Delhi-Meerut Road, Modinagar, Ghaziabad, UP, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_50
analyzed, “veracity” defines the quality of the data, and the last “V” of big data is a value that refers to the worth of the data. There are various challenges with big data that in those we focus on the classification problem. In this paper, we classify pulmonary disease (cancer versus non-cancer) and overcome the problem of classification with more accurate results. We know the fact that nodules or pulmonary cancer have become one of the most prevalent diseases throughout the world which is normally diagnosed radiologically using laboratory analysis like biopsy and imaging technologies like magnetic resonance imaging (MRI), computed tomography or computerized X-ray imaging procedure (CT) scan, positron emission tomography (PET) scan X-ray (chest radiograph), etc. The cancer nodules are formed due to the unusual development of tissues in the human anatomy. These abnormal tissues can be grown anywhere in the human body directly (spread) or by the lymphatic flowing or blood-borne extension [1]. If cancer root starts from the pulmonary, it is primary pulmonary cancer called carcinoma. Carcinoma is a malignant tumor that develops in cells that line the inner or outer surface of the body. The part of the body itself decides which type of cancer exists. The most common type of cancer that causes maximum death is pulmonary cancer or bronchial carcinoma [2]. We can classify pulmonary cancer in two ways. The first one is non-small cell pulmonary cancer (NSCLC); around 85% cases come into this category. We can further categorize the first type into three parts which are adenocarcinoma mostly discovered in smokers, epidermoid carcinoma mostly emerges in the wider skyways of the pulmonary, and large cell undifferentiated carcinoma can be diagnosed anywhere in the pulmonary. It extends and transmits fast and is hard to treat. This kind of cancer is the speedy-spreading pulmonary cancer which is categorized under type small cell pulmonary cancer (SCLC). The second one is SCLC; the rest 15% of cases come under this category. This type of cancer grows and transmits more quickly than NSCLC, and it is diagnosed late when it spreads in the maximum part of the body. Since it grows fast, therapies like chemotherapy and radiation are used to slow it down. But it was observed that in most cases, it will spread it all over the body. Some other types of pulmonary tumors are also available as pulmonary carcinoid tumors, adenoid cystic lymphomas, carcinomas, and sarcomas. Many cancers can start from other parts of the body and come into the pulmonary, but this type of cancer is not pulmonary cancer. It is very important to identify the root organ and severity of cancer before starting any treatment. The cancer stage or severity describes the location and size of the cyst or how much it had transmitted to the lymph nodes as well as in the other organs. There are four stages of NSCLC grouped as Stage# 1 through Stage# 4, lower side number in stages shows fast recovery, and as we go upper side, it will be hard to recover. If the tumor size is 3 cm or less, it comes into Stage 1 category. Stage 1 tumor is completely removable by surgery, and it does not spread to any lymph nodes. Stage 2 tumors can be further divided into two substages. In Stage 2A, tumor can be 4 cm but not more than 5 cm in size, and also this stage tumor does not spread to the nearby lymph nodes. 
In Stage 2B, tumor possibilities can be: (i) size can be 5 cm and can spread to the nearby lymph nodes; (ii) size can be more than 5 cm, but it will not spread
to the nearby lymph nodes. Stage 3 cancer can be further divided into 3A, 3B, and 3C categories. The 3A category possibilities are: (i) size can be maximized by 5 cm and contaminated to the nodes between the lymphatic vessels in the half of the chest (propagated to the lymphatic nodes in the center of the chest); (ii) can find more than one tumors in the same lobe of the pulmonary, and tumor size can vary from 5 to 7 cm; (iii) if tumor size goes more than 7 cm but has not transmitted till lymph nodes but the possibility can be spread in the muscle under the pulmonary, a spinal bone, and the main blood vessel. The 3B category cancer can transmit to the opposite side of the chest from the source pulmonary or the neck with less than 5 cm. In the last 3C category, tumor size can be exceeded by 5–7 cm and transmitted into the heart area. At Stage 4, the tumor is transmitted to many areas in one or more body parts. The main significance of the paper is to build an automatic diagnostic system for the problem of classification without nodule segmentation or detailing handcrafted features based on an improved deep learning model to classify lung cancer for early detection. Deep learning is a good example to build deep learning systems in computer vision and other pattern recognition areas [3–5]. The basic deep learning algorithm behind this work is the convolution neural network (CNN), which takes images as input and automatically learns features from different training images. The CNN model provides additional information and improves more accurate performance of the system.
2 Literature Survey Singh et al. [6] evaluated feature extraction to classify the stages of malignancy from pulmonary CT images. To obtain this, the first preprocessing step is done in the input image by image processing approach which can be done by histogram equalization and thresholding filtering. The thresholding is used to improve the accuracy, to extract the features from the modified image, and finally, the image classification is performed on the neural network. Preprocessing step removes all unnecessary information from the image and highlights only relevant features to find out the accurate result. This process comes under image enhancement which is described into two categories that are spatial domain and frequency domain. In the spatial domain, modification can be done directly on the pixel, whereas in the frequency domain, modification can be done by Fourier transformation of an image. In this work, histogram equalization is used for the spatial domain approach. Histogram equalization is a graphical representation of the frequency occurrence for each gray level in the image. To extract features, the author considered six parameters which are area, perimeters, shape complexity, etc., but accuracy is not up to the mark. Krishna et al. [7] proposed a computerized classification model to detect cancer from CT images with the help of CNN with watershed segmentation and Gaussian filtering for better results. The datasets with 500 CT images that consist of bone, pulmonary, brain, neck, and kidney are used to classify the cancer nodule. The Gaussian smoothing filtering is used to remove blue images, irrelevant data, and noise
from the image. The filtered image will be input for the segmentation process. Here, a region-based watershed segmentation approach is used with ridges and valleys. Texture feature extraction is used that will give color, edges from the image, and finally features given to the CNN model to classify cancer. The performance was measured in terms of accuracy and achieved 94.5% only. Bhat et al. [8] proposed a deep learning model to classify and recognize the pulmonary nodule using CT pulmonary images. The CNN is arranged or trained in such a way to classify and recognize the pulmonary nodules which consist of three convolution blocks. In these convolution blocks, 16, 32, and 64 numbers of kernels are applied individually subsequent to which a 2 × 2 max pooling tier with stride 1 is used. The input image size is 40 × 40. After three convolution blocks, they flatten the values into a one-dimensional vector. Fully connected blocks include two dense layers with ReLU activation function and a dropout layer. Each dense layer consists of 1024 neurons. Dropout is an important technique to reduce the overfitting problem. Finally, the softmax classifier classifies the binary prediction (tumor or not). In this study, 2948 datasets were taken from LIDC-IDRI and using CNN architecture got an overall accuracy of 96.6%. Mhaske et al. [9] designed a deep learning system that processes the CT images to predict and classify pulmonary cancer. The primary goal of this study is to design a system based on a deep learning algorithm that extracts features efficiently and gives absolute and speedy diagnosis results for pulmonary cancer. This work is categorized into three parts. In the first part, images are segmented by OTSU thresholding; in the second part, the CNN arrangement is used to extract the feature. At last for classification, RNN-LSTM is used to classify pulmonary cancer. In this work, the LIDC dataset is used that contains images in DICOM format. Before the images are fed into the segmentation method, it is converted from DICOM format to.png or.jpeg format. GoogleNet architecture is used as a CNN model which consists of 22 deep layers, in that 9 layers consist of inception modules and achieved 97% accuracy. Mukherjee and Bohra [10] proposed a machine learning approach to diagnose pulmonary cancer disease. The primary goal of this work is to build a framework based on AI to predict tumor growth and identify stages of tumor inefficient time with minimal human effort. This model predicts cancer stages which can be type 1 to type 4. Types of cancer classify the severity of cancer. This AI-based model includes the following phases: image processing which includes the function for enhancing the image, image filtering is used in a CNN model for feature extraction, edge detection is an essential tool in image processing to determine points at which the image brightness changes instantly, and additionally to detect the edge, canny edge detection operator is used. Abdul [11] proposed a CNN-based automatic detection and classification system to identify the tumor whether it is malignant or benign using deep learning. CNN includes convolutional operation, pooling operation, and a fully connected layer. This study is based on CNN architecture, which has 32 kernels followed by the ReLU activation function. The second convolutional tiers consist of a group of 16 kernel with ReLU activation functions in conjunction with a pooling layer of kernel size 2.
This proposed architecture returned a calculated accuracy, sensitivity, and specificity of 97.2%, 95.6%, and 96.1%, respectively. Zheng et al. [12] proposed a deep learning convolutional neural network based on maximum intensity projection that detects automatic pulmonary nodules using CT scans. The author focused on radiological evaluation in maximum intensity projection (MIP) images to enhance the result in terms of detecting pulmonary nodules using CT images through CNN architecture. To classify pulmonary nodules through their morphologies, the author developed a deep learning model with MIP images that use different slab dimensions as input related to thickness and axial section slices. The LIDC-IDRI datasets used that consist of slice thickness vary from 0.6 to 5.0 mm. Nodules are categorized in three ways: the first one is non-nodules, the second one is nodules with more than or equal to 3 mm in diameter, and the third one is nodules with less than 3 mm in diameter. The proposed system is a MIP-based CAD system that holds a low false positive rate. Furthermore, some improvements are still open for future work, can optimize the model, can be focused on false positive rate, and can be focused on small nodules for the significance of the performance.
3 Materials and Methods In the following section, we will see the main steps, i.e., data to be used and the decision of hyper-parameters needed to train a CNN model and the proposed deep CNN model.
3.1 Data Preprocessing In this study of pulmonary cancer classification, we consider CT images of lung impressions sourced from the LIDC-IDRI dataset [13]. Malignancy in the dataset is annotated at five levels: extremely unlikely, adequately unlikely, undefined, adequately suspicious, and extremely suspicious. From this dataset, 300 CT images are used and around 3500 pulmonary nodules are evaluated in this model. The input CT images are 512 × 512 pixels in DICOM format, and radiodensity is measured in Hounsfield Units (HU). Each voxel (3D pixel) in the CT images has an attenuation value, described by the scale in Table 1. The preprocessing step removes noise generated during image acquisition to increase the classification accuracy of the model and enhance the quality of the image. This work is implemented in TensorFlow 1.8, an open-source framework designed by Google for machine learning and deep learning. The TensorFlow framework is run on an Intel Core i5 processor with 8 GB RAM and a 64-bit operating system. To verify our work, we used the LIDC-IDRI dataset, which is available online.
Table 1 Malignancy marked with five levels
| Tissue/material | HU value |
| Air | − 1000 |
| Pulmonary | From − 400 to + 80 |
| Fat | From − 60 to − 100 |
| Water | 0 |
| Soft tissue | From + 40 to + 80 |
| Bone | From + 400 to + 1000 |
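A minimal sketch of reading a CT slice and converting it to Hounsfield Units is shown below; it assumes the pydicom library, an illustrative file name, and a clipping window chosen from the ranges in Table 1 rather than any value specified in this work.

```python
import numpy as np
import pydicom

def load_hu(dicom_path):
    """Read a CT slice and convert raw pixel values to Hounsfield Units (HU)."""
    ds = pydicom.dcmread(dicom_path)
    img = ds.pixel_array.astype(np.int16)
    slope = float(getattr(ds, "RescaleSlope", 1))
    intercept = float(getattr(ds, "RescaleIntercept", 0))
    return img * slope + intercept

hu = load_hu("slice_0001.dcm")               # illustrative file name
lung_window = np.clip(hu, -1000, 400)        # keep the range relevant to lung tissue (assumed window)
normalized = (lung_window + 1000) / 1400.0   # scale to [0, 1] before feeding the CNN
```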
3.2 Slice Fabrication and Image Enhancement The LIDC-IDRI dataset is used in this work. All relevant images are in DICOM format with a size of 512 × 512 pixels. Slice generation proceeds in two stages. In the first stage, we obtain the center coordinates of the nodules (positive samples) and non-nodules (negative samples) provided in the annotation file. In the second stage, we expand the extracted center coordinates to form square slices. With this strategy, a total of 19,250 slices of size 64 × 64 were extracted for benign and malignant tumors and labeled as class 0 and class 1, respectively.
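The slice-fabrication step can be sketched as a simple crop around each annotated center; the code below assumes an (x, y, label) annotation format and a placeholder slice array, so it illustrates the idea rather than the exact extraction pipeline.

```python
import numpy as np

def extract_patch(ct_slice, cx, cy, size=64):
    """Crop a size x size patch centered on (cx, cy), padding at the borders."""
    half = size // 2
    padded = np.pad(ct_slice, half, mode="constant", constant_values=ct_slice.min())
    cx, cy = cx + half, cy + half
    return padded[cy - half:cy + half, cx - half:cx + half]

# 'annotations' is an assumed list of (center_x, center_y, label) tuples read
# from the LIDC-IDRI annotation file; labels: 0 = benign, 1 = malignant.
annotations = [(250, 310, 1), (140, 200, 0)]
ct_slice = np.zeros((512, 512), dtype=np.float32)   # placeholder for a loaded HU slice
patches = [(extract_patch(ct_slice, x, y), lab) for x, y, lab in annotations]
```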
3.3 Proposed Deep CNN Model The proposed model works on a classical deep convolutional neural network that has the capabilities to detect feature patterns in all the layers. It starts from lower layers and encounters basic features by using the network weights and learned parameters. At the same layer, it provides input to the next higher layers and detects more complex features. The model of CNN consists of many learning tiers which are known as convolution tier, downsampling using max pooling tier, activation function, fully connected tier, and many more [14]. The first layer of CNN, which is the convolution layer, utilizes a mathematical operation that selects a filter mask and slides all over the input image from the left of the image to the right and top of the image to the bottom and constructs the feature maps. The basic aim of the convolution layer is to obtain the local characteristics of the image. The resulted product of the convolutional tier is given to the activation functions and produces nonlinearity and creates the model to learn respective features from each filter. The primary role of the pooling tier is to minimize the spatial measurable extent of the feature maps and supports reducing the computational resources required for the formation of the feature maps by minimization of dimensionality. The proposed architecture consists of several sequential convolutional layers succeeded by a downsampling max pooling tier and absolutely connected tiers as displayed in Table 2. The dimension of the input image is 64 × 64 which is given to
Table 2 Overview of the layers and the filter mask used in our proposed model (for the binary classification). Conv and MP represent the convolution operation and max pooling, respectively, in the architecture
| Type/stride | Filter mask | Input image |
| Conv/S-1 | 3 × 3 × 32 | 64 × 64 × 16 |
| MP | 2 × 2 | 64 × 64 × 32 |
| Conv/S-1 | 3 × 3 × 64 | 32 × 32 × 32 |
| Conv | 1 × 1 × 128 | 32 × 32 × 64 |
| MP | 2 × 2 | 32 × 32 × 128 |
| Inception layers | 3 × 3 Conv/2[3 × 3 Conv]/2 × 2 MP | 16 × 16 × 128 |
| Conv/S-1 | 3 × 3 × 1024 | 8 × 8 × 512 |
| MP | 2 × 2 | 8 × 8 × 1024 |
| FC | 1024 × 1000 | 4 × 4 × 1024 |
| FC | 1024 × 1000 | 1 × 1 × 1024 |
| Softmax | Classifier | 1 × 1 × 1000 |
the architecture. This model has a pointwise convolutional block that acts as a basic convolutional tier with kernel size 1 × 1. The main purpose of pointwise convolution is to reduce the slice depth or a number of feature maps [15]. For convolutional operation, we select a 3 × 3 kernel size mask and slide it all over the image. Single convolutional layers are followed by the subsampling layers to trim the entire dimension of the output image which is to be sent to the next layers. For the subsampling layer, a max pool is used that selects the maximum pixel element value from the window size 2 × 2 with stride 1. To get the one-dimensional vector of numerals, we flatten the feature maps after the convolution block and feed as input to the fully connected block. This fully connected block includes dense layers with activation and dropout layers. ReLU is utilized for activation function in all the tiers, and dropout is a way to regularize the neural network which supports minimizing interdependent learning of neurons. The ultimate part of the absolutely connected layer is a classifier termed “softmax” that gives a prediction of binary classification. Figure 1 represents the graphical structure of the architecture which consists of all operations of the model.
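A minimal Keras sketch of the layer sequence in Table 2 is given below; the filter counts, dropout rates and inception-style block are simplified assumptions based on the table and the text, not the authors' released implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def inception_block(x, filters):
    """Small inception-style block: parallel 3x3, stacked 3x3 and max-pool branches."""
    b1 = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(filters, 3, padding="same", activation="relu")(b2)
    b3 = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    return layers.Concatenate()([b1, b2, b3])

inputs = layers.Input(shape=(64, 64, 1))
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.Conv2D(128, 1, padding="same", activation="relu")(x)   # pointwise convolution
x = layers.MaxPooling2D(2)(x)
x = inception_block(x, 128)
x = layers.Conv2D(512, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D(2)(x)
x = layers.Flatten()(x)
x = layers.Dense(1024, activation="relu")(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(1024, activation="relu")(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(2, activation="softmax")(x)   # benign vs. malignant

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```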
4 Results and Discussion The performance of our system was evaluated and compared using statistical measures including specificity, sensitivity, accuracy, and the area under the curve (AUC) [16]. A total of 3500 pulmonary nodules were evaluated against the ground truth. The dataset included 300 CT images, split 75%/25% into training and testing sets, respectively, and a ten-fold cross-validation test was used for
Fig. 1 Outline of the projection convolution architecture
evaluating the robustness of the system. It is observed that a higher AUC value indicates that the model achieves better classification results. The dataset includes 85 images of pulmonary tumor type 1, 45 images of type 2, 75 images of type 3, and 95 images of type 4. The proposed model is evaluated against earlier models that classify cancer (benign versus malignant) using machine learning techniques. As illustrated in Table 3, Nascimento et al. [17] used the LIDC dataset with 73 CT images (47 benign and 26 malignant), classified the diseases using a support vector machine (SVM), and achieved 92.78% accuracy. Song et al. [18] classified lung nodules from 244,527 CT images of the LIDC-IDRI dataset using a deep neural network, achieving 84.15% accuracy. Da Silva et al. [19] used 8296 CT images from the LIDC-IDRI dataset with a convolutional neural network and obtained 82.3% accuracy. Hua et al. [20] classified lung nodules using deep learning techniques with 2545 CT images from the LIDC dataset, reporting 73.30% sensitivity and 78.70% specificity. Hence, our model produces more efficient results than the others, as shown in Fig. 2. Some of the works represented in Table 3 are slightly close to our model because of the small set of images used in our work.
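The accuracy, sensitivity, specificity and AUC values in Table 3 can be computed from model predictions as sketched below with scikit-learn; the label and probability arrays are placeholders.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# y_true: ground-truth labels (0 benign, 1 malignant); y_prob: predicted
# probability of the malignant class. Both arrays are placeholders here.
y_true = np.array([0, 0, 1, 1, 1, 0])
y_prob = np.array([0.1, 0.4, 0.8, 0.9, 0.6, 0.3])
y_pred = (y_prob >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)      # recall for the malignant class
specificity = tn / (tn + fp)
auc = roc_auc_score(y_true, y_prob)
print(accuracy, sensitivity, specificity, auc)
```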
Table 3 Comparison of the proposed model with other CNN models
| Work | Database | Accuracy (%) | Sensitivity (%) | Specificity (%) |
| Nascimento et al. [17] | LIDC | 92.78 | 85.64 | 97.89 |
| Song et al. [18] | LIDC | 84.15 | 83.96 | 84.32 |
| Da Silva [19] | LIDC-IDRI | 82.3 | 79.4 | 83.8 |
| Hua et al. [20] | LIDC | – | 73.30 | 78.70 |
| Proposed work | LIDC-IDRI | 87.75 | 89.80 | 96.70 |
Fig. 2 Performance analysis: comparison of accuracy (%), sensitivity (%) and specificity (%)
5 Discussion and Conclusion As is evident universally, the timely diagnosis of pulmonary cancer extends patient life expectancy. This motivated us to propose a computerized system to categorize pulmonary nodules without the need for clinical experts. For early diagnosis, a multilayer deep learning-based pulmonary nodule detection model is developed for pulmonary disease. The established system is validated and evaluated on the LIDC-IDRI database and produces an efficient outcome of 96.70%. The enhanced convolutional neural network architecture efficiently classifies pulmonary nodules from CT images. Analysis of the results obtained from the assessment of the dataset shows that the enhanced CNN outperforms the older methods by identifying pulmonary nodules efficiently.
References 1. Falk S, Williams C (2010) Pulmonary cancer—the facts. 3rd edn Oxford University Press 2. White V, Ruparelia P (2020) Kumar & Clark’s Clinical Medicine. Elsevier, p 975. ISBN 978-0-702078705 3. Ronneberger O, Fischer P, Brox T (2015) U-Net: Convolutional networks for biomedical image segmentation. In: Medical image computing and computer-assisted intervention_MICCAI (Lecture notes in computer science), vol 9351. Cham, Switzerland, Springer, pp 234–241 4. Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. Proceedings of the IEEE conference on computer vision and pattern recognition, Honolulu 21–26:2117–2125 5. Lin TY, Goyal P, Girshick R (2017) Focal loss for dense object detection. In: 2017 IEEE International conference on computer vision, Venice, Italy, 22–29 Oct 2017, pp 2980–2988 6. Singh S (2016) An evaluation of features extraction from Pulmonary CT images for the classification stage of malignancy. IOSR J Comput Eng 01:78–83 7. Krishna A Srinivasa Rao PC, Basha CMAKZ (2020) Computerized classification of CT pulmonary images using CNN with watershed segmentation. In: Second international conference on inventive research in computing applications (ICIRCA), pp 18–21. https://doi.org/10. 1109/ICIRCA48905.2020.9183203 8. Bhat S, Shashikala R, Kumar S, Gururaj K (2020) Convolutional neural network approach for the classification and recognition of pulmonary nodules. In: 4th International conference on electronics, communication and aerospace technology (ICECA), pp 1310–1314. https://doi. org/10.1109/ICECA49313.2020.9297626 9. Mhaske D, Rajeswari K, Tekade R (2019) Deep learning algorithm for classification and prediction of pulmonary cancer using CT scan images. In: 5th International conference on computing, communication, control and automation (ICCUBEA), pp 1–5. https://doi.org/10. 1109/ICCUBEA47591.2019.9128479 10. Mukherjee S, Bohra SU (2020) Pulmonary cancer disease diagnosis using machine learning approach. In: 3rd International conference on intelligent sustainable systems (ICISS), pp 207– 211. https://doi.org/10.1109/ICISS49785.2020.9315909 11. Abdul W (2020) An automatic pulmonary cancer detection and classification (ALCDC) system using convolutional neural network. In: 13th International conference on developments in esystems engineering (DeSE), pp 443–446. https://doi.org/10.1109/DeSE51703.2020.9450778 12. Zheng S, Guo J, Cui X, Veldhuis R, Oudkerk M, van Ooijen P (2020) Automatic pulmonary nodule detection in CT scans using convolutional neural networks based on maximum intensity projection. IEEE Trans Med Imaging 39(3):797–805. https://doi.org/10.1109/TMI.2019.293 5553 13. Armato III SG, McLennan G et al. (2011) The pulmonary image database consortium (lidc) and image database resource initiative (idri). a completed reference database of pulmonary nodules on ct scans. Medical physics 38(2):915–931 14. Heaton J, Goodfellow I, Yoshua B, Courville A (2016) Deep learning. Genet. Program Evolvable Mach 19. https://doi.org/10.1007/s10710-017-9314-z 15. Howard AG, Zhu M, Chen B, Kalenichenko et al. (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. ArXiv. abs/1704.04861 16. Metz C, Herman B, Shen J (1998) Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Stat Med 17:1033–1053 17. 
Nascimento LB, de Paiva AC, Silva AC (2012) Pulmonary nodules classification in CT images using Shannon and Simpson diversity indices and SVM. Mach Learn Data Min Pattern Recog 454–466
18. Song Q, Zhao L, Luo X, Dou X (2017) Using deep learning for classification of pulmonary nodules on computed tomography images. J Healthc Eng 2017(2–17):8314740 19. da Silva G, Silva AC, de Paiva A, Gattass M (2018) Classification of malignancy of lung nodules in CT images using convolutional neural network 20. Hua KP, Hsu CH, Hidayati et al. (2015) Computer-aided classification of pulmonary nodules on computed tomography images via deep learning technique. In: OncoTargets and therapy, vol 8, pp 2015–2022
Impact of Dimensionality Reduction on Membership Privacy of CNN Models Ashish Kumar Lal and S. Karthikeyan
Abstract Dimensionality reduction is an essential tool for exploratory data analysis, classification and clustering, manifold learning, and preprocessing in deep learning. The curse of dimensionality is a well-recognized serious problem in machine learning. Many researchers have attempted to improve machine learning accuracy and performance with dimensionality reduction. Nowadays, machine learning models’ privacy vulnerability is an essential quality measure. The effect of dimensionality reduction on the privacy leakage of deep learning models is an understudied research area. This work explored the effects of dimensionality reduction on the privacy leakage of the deep learning model. The experiments for image classification using CNN model were performed on the widely used Cifar10 dataset. The results show that although the PCA technique improves the classification task, it does not enhance the model’s privacy when tested against the membership inference attack. Keywords Privacy · Dimensionality reduction · Membership inference attack
1 Introduction With the advancement of data mining technology, classification models have become more accurate. However, this also puts the privacy of personal information at greater risk. In image classification, the algorithm not only extracts the features of the image but also extracts the features of the person in the image. These unique features could be misused to identify the person in the image, violating privacy. There are two main ways to protect the privacy of personal information in image classification: data anonymization and data de-identification. Data anonymization is to remove all personal information from the data. Data de-identification removes the information that can identify a specific individual from the data while retaining valuable information for image classification.
A. K. Lal (B) · S. Karthikeyan Department of Computer Science, Institute of Science, Banaras Hindu University, Varanasi, India e-mail: [email protected] S. Karthikeyan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_51
711
712
A. K. Lal and S. Karthikeyan
tion for image classification. Dimensionality reduction belongs to the method of data de-identification. It is the process of reducing the number of features in a dataset by selecting a subset or combining some features. It retains the information that is useful for image classification. Dimensionality reduction can impact privacy in several ways. First, it can reduce the amount of information available about each individual, making it more difficult to identify individuals in the dataset. Second, it can reduce the accuracy of information about individuals, leading to less precise predictions about individuals’ behavior. Finally, it can reduce the amount of information shared between individuals. This can lead to less information being available to adversaries who may be able to use it to infer sensitive information about individuals. There are trade-offs to be made when reducing dimensionality for privacy purposes. On the one hand, reducing dimensionality too much can make the data too abstract and difficult to use. On the other hand, reducing dimensionality too little can leave too much information about individuals in the dataset and decrease privacy. The first thing to consider is how dimensionality reduction will affect the accuracy of the classification models. In general, reducing the dimensionality of the data will decrease the accuracy of the models because it may inadvertently remove information from the data that the model could use to make better predictions. However, the decrease in accuracy is not always significant and, in some cases, may be negligible. The decrease in accuracy may be offset by the other benefits of dimensionality reduction, such as improved computational efficiency or improved interpretability of the results. Another consideration is how dimensionality reduction will affect the privacy of the data. In general, reducing the dimensionality of the data will increase privacy, as suggested by Liu et al. [1]. Because removing sensitive information from the data may prohibit identifying individuals. This article aims to explore privacy preservation in image classification with dimensionality reduction. In particular, we will discuss how membership privacy of the training set is affected by dimensionality reduction in CNN models. We will also explore how the amount of dimensionality reduction affects the privacy of classification. There are many methods of dimensionality reduction, such as principal component analysis (PCA), linear discriminant analysis (LDA), and uniform manifold approximation and projection (UMAP). In this article, we will use PCA to reduce the dimensionality of data.
2 Background Study 2.1 Membership Inference Attack Membership inference attacks are used to determine whether or not a specific data point was used to train a machine learning model. This type of attack can be used to reverse engineer a machine learning model and potentially learn sensitive information about the data that was used to train it. It can be used to infer sensitive information
about individuals, such as their medical condition or political affiliation. There are a few different ways to perform a membership inference attack. However, the most common is to train a second machine learning model, also known as the shadow model, to predict whether or not a given data point was used to train the first model. If the second model can accurately predict which data points were used to train the first model, then it is likely that the first model is leaking information about its training data. There are two types of membership inference attacks, white-box, and black-box. In a white-box attack, the attacker has full access to the target machine learning model. This attack is typically used to reverse engineer a machine learning model and learn sensitive information about the data that was used to train it. In a black-box attack, the attacker does not have access to the target machine learning model. This attack is typically used to determine whether or not a specific data point was used to train the machine learning model. A membership inference attack reveals the vulnerability of machine learning models that could lead to an imminent privacy breach. For example, a membership inference attack on a machine learning model diagnosing medical conditions could be used to learn information about the data that was used to train the model. This information could be used to diagnose patients without their consent.
2.2 Privacy Metrics

There are a few different privacy metrics used to measure the privacy of a machine learning model. The most common metric is the accuracy of the second machine learning model, which is used to predict whether or not a given data point was used to train the first model. Another metric is the number of false positives and false negatives that the second model produces. A false positive is when the second model predicts that a data point was used to train the first model, but it was not. A false negative is when the second model predicts that a data point was not used to train the first model, but it was. There is also a notion of privacy budget or privacy guarantee metric being used in the differential privacy (DP) model of privacy preservation. This metric is independent of the attack model and is used to bound the amount of information that can be learned about any individual data point. A privacy budget can be interpreted as a limit on the number of queries that can be made about a dataset. Below are some of the metrics used in this work:

Membership Advantage The membership advantage was proposed by Yeom et al. [2] as a quantitative metric to measure a membership inference attack's success. Given a model M and a dataset D, the membership advantage is defined as follows:

MA(M, D) = Pr[A(x, M, D) = 1 | x ∈ D] − Pr[A(x, M, D) = 1 | x ∉ D]

where A(x, M, D) is the membership inference attack and x is a data point. The membership advantage measures the difference in the probability that the attack will
correctly identify a data point as being in the training set, given that the data point is actually in the training set, and the probability that the attack will correctly identify a data point as being in the training set, given that the data point is not in the training set. A membership advantage of 0 means that the attack is no better than chance at identifying data points as being in the training set. In contrast, a membership advantage of 1 means that the attack can always correctly identify data points as being in the training set.

AUC The AUC is the area under the ROC curve, a metric used to measure the success of a binary classification model. Given a model M and a dataset D, the AUC of the attack can be written as

AUC(M, D) = Pr[s(x, M) > s(x′, M) | x ∈ D, x′ ∉ D]

where s(·, M) is the membership score produced by the attack A, x is a data point from the training set, and x′ is a data point from outside it. The AUC therefore measures the probability that the attack ranks a training member above a non-member. An AUC of 0.5 means that the attack is no better than chance at identifying data points as being in the training set, while an AUC of 1.0 means that the attack is always able to correctly identify data points as being in the training set.

Privacy Risk Score The privacy risk score was introduced by Song et al. [3] and defined as: "The privacy risk score of an input sample z = (x, y) for the target machine learning model F is defined as the posterior probability that it is from the training set Dtr after observing the target model's behavior over that sample denoted as O(F, z), i.e., r(z) = P(z ∈ Dtr | O(F, z))." This metric provides a better explanation of fine-grained privacy risks and represents an individual sample's likelihood of being a training member.
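For concreteness, the sketch below shows how the AUC and membership advantage can be computed from an attack's per-example scores with NumPy and scikit-learn. It is a minimal illustration under the assumption that higher scores mean "more likely a training member"; it is not the exact implementation used in this work.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def attack_metrics(scores_members, scores_nonmembers):
    """AUC and membership advantage from an attack's per-example scores.

    Higher scores are assumed to indicate 'more likely a training member'.
    """
    scores = np.concatenate([scores_members, scores_nonmembers])
    labels = np.concatenate([np.ones(len(scores_members)),
                             np.zeros(len(scores_nonmembers))])

    # AUC: probability that a random member is scored above a random non-member.
    auc = roc_auc_score(labels, scores)

    # Membership advantage (Yeom et al.): TPR - FPR; for a thresholded attack
    # it is commonly reported as the maximum over all thresholds.
    fpr, tpr, _ = roc_curve(labels, scores)
    advantage = np.max(tpr - fpr)
    return auc, advantage

# Illustrative usage with synthetic scores (members scored slightly higher).
rng = np.random.default_rng(0)
print(attack_metrics(rng.normal(0.6, 0.2, 5000), rng.normal(0.5, 0.2, 5000)))
```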
2.3 Dimensionality Reduction

Machine learning models face a challenge known as the "curse of dimensionality" when they are trained on high-dimensional datasets [4]. This poses several challenges. Firstly, the number of samples required to train a model increases exponentially with the number of dimensions. Secondly, the model's generalization ability decreases as the number of dimensions grows. Moreover, the explainability of the model decreases. To address these issues, dimensionality reduction can be performed to reduce the number of dimensions in the dataset while preserving as much information as possible. Common dimensionality reduction techniques include:
• Feature selection: select a subset of features most relevant to the task.
• Feature extraction: transform the data into a lower-dimensional space while preserving as much information as possible.
Some commonly used feature extraction techniques include principal component analysis (PCA), linear discriminant analysis (LDA), kernel PCA, autoencoders, uniform manifold approximation and projection (UMAP), etc.
3 Literature Review

A survey of the earlier attempts to preserve privacy using anonymity techniques was done by Wang et al. [5]. Anonymity techniques were shown to be insufficient in ensuring privacy if auxiliary data is used in conjunction with the published data, as shown by Narayanan and Shmatikov [6]. Empirical methods to measure privacy vulnerability are imperative to evaluate privacy measures systematically. Various privacy attack methods have been proposed to evaluate the security of privacy protection mechanisms. A detailed review of attacks on machine learning is done by Rigaki and Garcia [7]. This survey conducted a comprehensive study of the state-of-the-art privacy-related attacks and proposed a threat model and a unifying taxonomy of the different types of attacks based on their characteristics. Shafee and Awaad [8] also conducted a survey about the attacks on the shared model and the countermeasures that could be taken to preserve the privacy of sensitive information that is used for training purposes. Yuan and Zhang [9] investigated the privacy risk of neural network pruning. They performed a self-attention membership inference attack and proposed a new defense mechanism to protect the pruning process. One of the most popular privacy attacks on machine learning models is the membership inference attack. Shokri et al. [10] performed a black-box attack on models trained in the cloud. The article by Hu et al. [11] conducts a first-of-its-kind survey on membership inference attacks. Based on their characterizations, they provide taxonomies for both attacks and defenses and highlight some open research problems. A survey of defense mechanisms against membership inference attacks is done by Nasr et al. [12]. The authors proposed several defense mechanisms against membership inference attacks, including differential privacy, secure multi-party computation, and homomorphic encryption. They also conducted a comprehensive evaluation of the effectiveness of these defense mechanisms against membership inference attacks. Based on the literature review, there appears to be a lack of research on preserving privacy by using dimensionality reduction. These papers mostly focus on the technical aspects of the dimensionality reduction process and do not discuss the privacy implications of using dimensionality reduction. The work by Tai et al. [13] attempts to address this issue. They used an autoencoder to reduce the dimensionality and the concept of K-anonymity to measure privacy. However, not much research followed their work. The application of dimensionality reduction for private data publishing was proposed by Jiang et al. [14]. They proposed two novel differentially
private methods for publishing data. There is also a lack of empirical studies on the topic, which makes it difficult to assess the effectiveness of dimensionality reduction for privacy protection.
4 Methodology

4.1 Database

The CIFAR-10 dataset [15] is used for experimentation. It consists of 60,000 images, with 50,000 training samples and 10,000 test samples. The dataset has ten mutually exclusive classes, and each sample is a 32 × 32 color image. It is one of the most widely used datasets for benchmarking image classification models.
4.2 Image Classification

For the classification of images, a convolutional neural network (CNN) is trained on the CIFAR-10 dataset. The data are first preprocessed by normalizing the images and one-hot encoding the labels. Then, a CNN model with two or three convolutional layers and fully connected layers is built. Complex regularization is intentionally not introduced in the model, to enable a fair comparison with the dimensionally reduced data. Finally, the model is trained on the training set and evaluated on the test set.
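The paper does not spell out the exact layer configuration beyond "two or three convolutional and fully connected layers", so the following Keras sketch should be read as one plausible instantiation of that description; the filter counts, dense layer size, and optimizer are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(num_conv_blocks=3, num_classes=10):
    """Small CNN with two or three conv blocks followed by dense layers."""
    model = models.Sequential()
    model.add(tf.keras.Input(shape=(32, 32, 3)))
    filters = 32
    for _ in range(num_conv_blocks):
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.MaxPooling2D((2, 2)))   # spatial downsampling
        filters *= 2
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Preprocessing: normalize the images and one-hot encode the labels, as described above.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

model = build_cnn(num_conv_blocks=3)
model.fit(x_train, y_train, epochs=30, batch_size=128,
          validation_data=(x_test, y_test))
```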
4.3 Dimensionality Reduction

Principal component analysis (PCA) is performed on the dataset. As each image is a three-dimensional array, it first needs to be reshaped and flattened into a one-dimensional array. The data are then analyzed to decide how many components they should be reduced to. After experiments, it was found that 99% of the variability of the dataset could be explained by the first 658 principal components, so these 658 components were used in the further experiments with reduced dimensions.
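A sketch of this step with scikit-learn's PCA is given below; the flattening, the 99% variance target, and the inverse transform follow the description above, while the solver choice and scaling are incidental assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from tensorflow.keras.datasets import cifar10

(x_train, _), (x_test, _) = cifar10.load_data()
flat_train = x_train.reshape(len(x_train), -1).astype("float32") / 255.0
flat_test = x_test.reshape(len(x_test), -1).astype("float32") / 255.0

# Keep enough components to explain 99% of the variance
# (reported as the first 658 principal components in this work).
pca = PCA(n_components=0.99, svd_solver="full")
reduced_train = pca.fit_transform(flat_train)
reduced_test = pca.transform(flat_test)
print("components kept:", pca.n_components_)

# Inverse-transform and reshape back to 32 x 32 x 3 so the same CNN
# input pipeline can be reused on the dimensionally reduced data.
recon_train = pca.inverse_transform(reduced_train).reshape(-1, 32, 32, 3)
recon_test = pca.inverse_transform(reduced_test).reshape(-1, 32, 32, 3)
```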
4.4 Evaluation of Privacy Metrics

The evaluation of privacy metrics involves a quantifiable and consistent measure across models and datasets. For membership inference attacks, the performance of the attack model in predicting the membership of a sample in the dataset is an implied metric for privacy. The higher the performance metric of the attack model, the lower the privacy of the trained model. The AUC, membership advantage, and privacy risk score of the attack model are used to measure the efficacy of attack models in predicting the membership of a sample. To measure the vulnerability of a model during the training process, membership inference attacks are performed after certain intervals of epochs. Different attack types are used to infer any common trend. The AUC and Attacker Advantage metrics of the attack model for the entire dataset are evaluated for each epoch, and the results are plotted on a graph for analysis.

Table 1 Membership probability analysis results

Model                  Data slice                      AUC    Attacker advantage
3 Layer CNN            Entire dataset                  0.58   0.12
                       Correctly classified = True     0.52   0.04
                       Correctly classified = False    0.62   0.18
3 Layer CNN with PCA   Entire dataset                  0.54   0.06
                       Correctly classified = True     0.51   0.02
                       Correctly classified = False    0.57   0.10
2 Layer CNN            Entire dataset                  0.57   0.11
                       Correctly classified = True     0.52   0.03
                       Correctly classified = False    0.61   0.17
2 Layer CNN with PCA   Entire dataset                  0.64   0.21
                       Correctly classified = True     0.55   0.08
                       Correctly classified = False    0.73   0.35
5 Result and Discussion

To implement the CNN models, the Keras machine learning framework was used in the Google Colab TPU environment. The MaxPooling2D layer of Keras was used to perform the downsampling operation along the spatial dimensions. For dimensionality reduction, principal component analysis is done using the PCA module of the Sklearn framework. The library functions of TensorFlow Privacy [16] were used to evaluate the privacy metrics, and the graphs are plotted with the Matplotlib library. The experimental results showed that the first 658 principal components were able to explain 99% of the variation in the CIFAR-10 dataset. The data was transformed into these components. For evaluating the effect of the PCA transformation on the performance of the model, the data had to be in the same format consumed by the model. Therefore, the data was inverse-transformed after dimension reduction and reshaped into the 32 × 32 × 3 format. To further explore the effect of model complexity, two variants of CNN models were developed, with two hidden layers and with three hidden layers. So, in total, four CNN image classification models were trained:
1. Three layer CNN model with original data
2. Three layer CNN model with dimensionally reduced data
3. Two layer CNN model with original data
4. Two layer CNN model with dimensionally reduced data
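How the TensorFlow Privacy utilities can be wired into this evaluation is sketched below. It assumes the trained `model` and the preprocessed arrays from the classification sketch above, and the module paths and field names follow a recent tensorflow_privacy release, so they may need adjusting for other versions; treat it as an assumption-laden outline rather than the exact code used here.

```python
import numpy as np
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack import (
    membership_inference_attack as mia,
)
from tensorflow_privacy.privacy.privacy_tests.membership_inference_attack.data_structures import (
    AttackInputData, AttackType, SlicingSpec,
)

# Softmax outputs and integer labels for members (train) and non-members (test).
probs_train = model.predict(x_train)
probs_test = model.predict(x_test)
labels_train = np.argmax(y_train, axis=1)
labels_test = np.argmax(y_test, axis=1)

attack_input = AttackInputData(
    probs_train=probs_train, probs_test=probs_test,
    labels_train=labels_train, labels_test=labels_test,
)

# Slicing by classification correctness reproduces the data slices of Table 1.
results = mia.run_attacks(
    attack_input,
    SlicingSpec(entire_dataset=True, by_classification_correctness=True),
    attack_types=[AttackType.THRESHOLD_ATTACK, AttackType.LOGISTIC_REGRESSION],
)
print(results.summary(by_slices=True))  # AUC and attacker advantage per slice
```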
All the models were trained for 30 epochs. Table 1 lists the membership probability analysis of the membership inference attack. The Attacker Advantage for the data slice "Correctly classified = False" is significantly higher than for the other data slices, which means that these instances of data are more vulnerable to privacy attacks.
5.1 Vulnerability During Training

One of the main reasons attributed to the vulnerability of deep learning models is the overfitting of models to the training data. But Yeom et al. [2] point out that this may not always be the case. It is, therefore, important to monitor the privacy levels of a model during the training process. It also needs to be examined whether the PCA technique applied to the data gives any divergent results with respect to privacy vulnerability. The model needs to be evaluated at certain intervals of epochs during training. Keras callbacks were applied during the training phase to record the privacy metric scores (a sketch of such a callback is given below). As the models were trained for 30 epochs each, callbacks were evaluated after every two epochs. Membership inference attacks were carried out in each callback to calculate the AUC and Attacker Advantage of the attack model. Apart from the threshold attack, a logistic regression attack was performed using shadow models to check the trends in the privacy metrics. The results of this experiment are represented in Fig. 1. There are three main observations from these results:
• The PCA transformation of the data decreases the privacy vulnerability. This is observed in both privacy metrics, the AUC and the Attacker Advantage.
• The effect of the PCA transformation on privacy is more evident in the higher epochs of model training.
• The two attack types, the threshold attack and the logistic regression attack, show similar trends.
Fig. 1 Vulnerability per epoch
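The periodic evaluation can be realized with a Keras callback along the following lines; `run_membership_attack` is a hypothetical stand-in for whichever attack routine is used (for instance, the TensorFlow Privacy call sketched in the previous section), and the two-epoch interval matches the setup described above.

```python
import tensorflow as tf

class PrivacyAuditCallback(tf.keras.callbacks.Callback):
    """Runs a membership inference attack every `interval` training epochs."""

    def __init__(self, attack_fn, interval=2):
        super().__init__()
        self.attack_fn = attack_fn   # callable(model) -> (auc, attacker_advantage)
        self.interval = interval
        self.history = []            # [(epoch, auc, advantage), ...]

    def on_epoch_end(self, epoch, logs=None):
        if (epoch + 1) % self.interval == 0:
            auc, adv = self.attack_fn(self.model)
            self.history.append((epoch + 1, auc, adv))
            print(f"epoch {epoch + 1}: attack AUC={auc:.3f}, advantage={adv:.3f}")

# Usage (run_membership_attack is a hypothetical attack-evaluation helper):
# model.fit(x_train, y_train, epochs=30,
#           callbacks=[PrivacyAuditCallback(run_membership_attack, interval=2)])
```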
The logistic regression attack and the threshold attack gave similar indications, so for simplicity, we chose only the threshold attack for further experiments. The results indicate that the PCA transformation improves privacy, and this becomes more significant as the model fits the training samples more closely. As the number of training epochs increases, the model also memorizes private information about the samples, and therefore the vulnerability of the model increases.
5.2 Privacy and Utility

Researchers have suggested many noise-perturbation techniques to implement privacy-preserving machine learning. Most of these implementations report a trade-off between the privacy and the utility of models: a noise-perturbation technique improves a model's privacy by making it difficult for the attacker to infer private or sensitive information, but in the process it reduces the utility of the model. Dimensionality reduction techniques, on the other hand, have been reported to improve the utility of models by removing noise in the data. If dimensionality reduction improves privacy too, is it possible to have the best of both worlds? The answer to this question has to be investigated. The model's complexity, its fit to the data, and the chosen privacy metrics are expected to play an essential role. As many variables exist in the experiment, a few components have to be fixed while varying the rest. To study the role of complexity on privacy, we performed membership inference threshold attacks on a three-layer CNN and a two-layer CNN. The AUC and Attacker Advantage metrics were evaluated for various validation accuracies, which were calculated using the test data during the training phase. The result is plotted in Fig. 2. It is observed that for lower validation accuracies, the simpler model is less vulnerable to privacy attacks than the complex model. But for higher accuracies, the vulnerability of the simpler model increases.
Fig. 2 Privacy versus utility for comparing the effect of the complexity of models on the vulnerability
The elbow point in the privacy-utility graph for the two-layer CNN is at a lower level of accuracy than for the three-layer CNN. Therefore, it can be inferred that, in this case, the simpler model is more vulnerable to privacy attacks than the complex model at the same level of accuracy. The results in Figs. 2, 3, 4 and 5 suggest a common trend: the vulnerability of the model increases as the accuracy of the model increases.
Fig. 3 Privacy versus utility for 3 layer CNN—on the effect of PCA transformation
Fig. 4 Privacy versus utility for 2 layer CNN—on the effect of PCA transformation
Fig. 5 Privacy versus utility for all four models
6 Conclusion and Future Work

Dimensionality reduction has been widely used and accepted as a method to preprocess high-dimensional datasets for training machine learning models. Most of the research has focused on the impact of dimensionality reduction on the accuracy and speed of models, and a few studies have examined dimensionality reduction as a means to improve the interpretability of complex models. This research article explores a relatively understudied aspect of dimensionality reduction: privacy enhancement. This work provides an empirical analysis of the privacy vulnerability of deep learning models when principal component analysis is performed on the dataset. Membership inference attacks were carried out on the CNN models to assess the vulnerability of the trained models. Three privacy metrics were used to measure the vulnerability: AUC, Attacker Advantage, and privacy risk score/membership probability score. The results corroborate earlier reports of a trade-off between privacy and utility and conclude that PCA may improve the training process as a preprocessing step; however, it does not enhance privacy in CNN models. Future work may include exploring other dimensionality reduction techniques like autoencoders, LDA, or feature subset selection, which would also allow a broader comparison between such techniques. Researchers with access to higher computational power may extend this research to higher numbers of training epochs.
References 1. Liu K, Kargupta H, Ryan J (2005) Random projection-based multiplicative data perturbation for privacy preserving distributed data mining. IEEE Trans knowl Data Eng 18(1):92–106 2. Yeom S, Giacomelli I, Fredrikson M, Jha S (2018) Privacy risk in machine learning: analyzing the connection to overfitting. In: 2018 IEEE 31st computer security foundations symposium (CSF), IEEE, pp 268–282 3. Song L, Mittal P (2021) Systematic evaluation of privacy risks of machine learning models. In: 30th USENIX security symposium (USENIX security 21), pp 2615–2632 4. Köppen M (2000) The curse of dimensionality. In: 5th online world conference on soft computing in industrial applications (WSC5), vol 1, pp 4–8 5. Wang J, Luo Y, Jiang S, Le J (2009) A survey on anonymity-based privacy preserving. In: 2009 International conference on e-business and information system security, IEEE, pp 1–4 6. Narayanan A, Shmatikov V (May 2008) Robust de-anonymization of large sparse datasets. In: 2008 IEEE Symposium on security and privacy. IEEE, Oakland, CA, USA, 111–125. http:// ieeexplore.ieee.org/document/4531148/ 7. Rigaki M, Garcia S (Apr 2021) A survey of privacy attacks in machine learning (arXiv:2007.07646). http://arxiv.org/abs/2007.07646, arXiv:2007.07646 [cs] 8. Shafee A, Awaad TA (2021) Privacy attacks against deep learning models and their countermeasures. J Syst Architect 114:101940 9. Yuan X, Zhang L (Aug 2022) Membership inference attacks and defenses in neural network pruning. In: 31st USENIX security symposium (USENIX security 22), USENIX Association, Boston, MA. https://www.usenix.org/conference/usenixsecurity22/presentation/yuanxiaoyong
10. Shokri R, Stronati M, Song C, Shmatikov V (May 2017) Membership inference attacks against machine learning models. In: 2017 IEEE symposium on security and privacy (SP), pp 3–18 11. Hu H, Salcic Z, Sun L, Dobbie G, Yu PS, Zhang X (2021) Membership inference attacks on machine learning: a survey. ACM Comput Surv (CSUR) 12. Nasr M, Shokri R, Houmansadr A (May 2019) Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In: 2019 IEEE symposium on security and privacy (SP), pp 739–753 13. Tai BC, Li SC, Huang Y, Suri N, Wang PC (2018) Exploring the relationship between dimensionality reduction and private data release. In: 2018 IEEE 23rd Pacific rim international symposium on dependable computing (PRDC), IEEE, pp 25–33 14. Jiang X, Ji Z, Wang S, Mohammed N, Cheng S, Ohno-Machado L (2013) Differential-private data publishing through component analysis. Trans Data Priv 19 15. https://www.cs.toronto.edu/~kriz/cifar.html 16. https://www.tensorflow.org/responsible_ai/privacy/guide
Computational Modelling of Complex Systems for Democratizing Higher Education: A Tutorial on SAR Simulation P. Jai Govind and Naveen Kumar
Abstract Engineering systems like Synthetic Aperture Radar (SAR) are complex systems and require multi-domain knowledge to understand. Teaching and learning SAR processing is intensive in terms of time and resources. It also requires software tools and computational power for preprocessing and image analysis. Extensive literature exists on computational models of SAR in MATLAB and other commercial platforms. The availability of computational models on open-source reproducible platforms, such as a Python kernel in Jupyter notebooks running on Google Colaboratory, democratizes such difficult topics and facilitates student learning. The model discussed here generates SAR data for a point scatterer using the SAR geometry, antenna pattern, and range equation, and processes the data in range and azimuth with the aim of generating a SAR image. The model demonstrates the generation of the synthetic aperture and the qualities of the echo signal, as well as how the pulse-to-pulse fluctuating range of a target requires resampling to align the energy with a regular grid. The model allows parameters such as resolution, squint, geometry, and radar elements such as antenna dimensions to be changed. A successful learning outcome would be to understand where parameters need to be changed to affect the model in a specific way. Factors affecting Range Doppler processing are demonstrated. Use of the discussed model removes the need for commercial software and democratizes the SAR topic in higher education. Keywords SAR · Reproducible research · Python · Jupyter · Colaboratory
P. J. Govind (B) · N. Kumar
CHRIST (Deemed to be University), Kengeri, Bangalore, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_52

1 Introduction

World attention on the United Nations Sustainable Development Goals (UNSDG) and climate change has brought satellite-based geodesy and image analysis to the forefront. Satellite-based Earth observation (EO) requires complex multi-domain engineering knowledge, namely system design, launch systems, placement and orbit in space, communication, relevant data collection, etc. [1, 2]. Usually this requires advanced engineering education. The SAR parameters collected can be intensity, polarization, and phase. Based on these parameters, techniques like change detection, classification, and InSAR can be implemented. These techniques are being used in various applications ranging from the construction industry to hazard prevention. Computational models on open-source reproducible platforms like Jupyter notebooks are a good source of learning and teaching to democratize such difficult topics and facilitate learning even for non-engineering students [3]. Reports have also mentioned that only 46.21% of Indian graduates are employable [4]. Toward that end, the use of computational models on open-source reproducible research platforms will help students study complex topics by doing, and doing it repeatedly. In colleges, it is prudent to follow British/US universities like MIT and use the Python language to teach computational topics like data structures, programming, statistics, and web applications. Instructors can draw on the large number of freely available Jupyter notebooks. Such tools for reproducible research also assist in better data analysis [5].

In this paper, the advantage of open-source reproducible research platforms for teaching complex engineering subjects is discussed, and a computational model for SAR using open-source resources is implemented. Computational models on such platforms facilitate teaching and learning of complex systems like SAR. The use of simulation on open-source platforms to understand the SAR principle, the radar range equation, generating the SAR signal, understanding antenna patterns, reception of the returned signal, and signal processing is discussed.
2 Reproducible Research and Learning Platforms

Most of the research papers generated on a complex topic like SAR are from the USA and China; papers emanating from countries like India are fewer in number. One way to solve this problem is to emphasize the use of open-source tools to democratize technical education. Topics requiring larger computational resources can be taught using Jupyter notebooks in Google Colaboratory or Amazon Web Services. To use reproducible research in Jupyter notebooks, Colaboratory, etc., students first need to be taught how to handle various data formats and how to organize them in folders, followed by how to produce a notebook with markdown cells and code cells and how to navigate through a notebook. Thereafter, they can graduate to computationally understanding complex engineering topics [3]. Finally, they can learn methods of sharing and publishing their research work. The advantage of a notebook is that a single computational artefact holds the theory, the data simulation, and the results. One can include graphs and videos and change various parameters to study their effects. In addition, the theoretical concepts can be written in the same notebook with markdown cells. Notebooks which also include student exercises can be shared between teachers and students with tools like GitHub.
Understanding complex topics like SAR principles requires understanding the properties of electromagnetic waves. These can be explained in a shorter time frame with notebooks in Jupyter. Similarly, signal processing has to be covered. Processing of signals like SAR signals requires computational resources and is difficult on normal laptops. For such topics, cloud-based processing platforms like Google Colaboratory or Amazon Web Services can be used. As an example, background knowledge for SAR includes signal processing. Students can be shown how a complex signal differs in the time and frequency domains. A multi-frequency complex signal can be generated in a Jupyter notebook and an FFT then applied to it. Figure 1 shows the generation of one such three-frequency complex signal using Python libraries like NumPy and Matplotlib. Explaining the Fourier transform to analyse the above wave characteristics alongside the theory is easy, by generating the FFT of the above signal using simple Python libraries like NumPy. Students can easily appreciate that with the FFT, one can decompose any signal and measure parameters like amplitude, frequency, and phase. Figure 2 shows the decomposition of the three-frequency complex signal into its constituent frequencies. Similarly, SAR analysis also requires knowledge of linear algebra and matrix decompositions. Students find the concepts of eigenvalues and eigenvectors difficult to comprehend. By using Python libraries like numpy.linalg or SciPy, a matrix can be imported as an array and its eigenvalues and eigenvectors calculated. The concept can also be explained graphically, showing how some vectors are transformed in terms of rotation and scaling by a matrix, whereas special vectors like eigenvectors are only stretched and not rotated. Figure 3 shows an original vector in red, rotated and scaled as a blue vector. Figure 4 shows an eigenvector that is only scaled.
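A notebook cell of the kind described here might look as follows; the three frequencies, amplitudes, and the example matrix are arbitrary illustrations rather than the values behind the referenced figures.

```python
import numpy as np
import matplotlib.pyplot as plt

# Three-frequency composite signal (sampling rate and frequencies are illustrative).
fs = 1000
t = np.arange(0, 1, 1 / fs)
signal = (np.sin(2 * np.pi * 50 * t)
          + 0.5 * np.sin(2 * np.pi * 120 * t)
          + 0.25 * np.sin(2 * np.pi * 300 * t))

# FFT decomposes the signal into its constituent frequencies (cf. Figs. 1 and 2).
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(8, 5))
ax1.plot(t, signal)
ax1.set_xlabel("time [s]")
ax2.plot(freqs, np.abs(spectrum) / len(signal))
ax2.set_xlabel("frequency [Hz]")
plt.tight_layout()
plt.show()

# Eigenvalues and eigenvectors (cf. Figs. 3 and 4): a generic vector is rotated
# and scaled by the matrix, whereas an eigenvector is only scaled.
A = np.array([[2.0, 1.0], [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eig(A)
print("eigenvalues:", eigvals)
print("A @ v =", A @ eigvecs[:, 0], " equals ", eigvals[0] * eigvecs[:, 0])
```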
Fig. 1 Complex signal generated in a Jupyter notebook
Fig. 2 FFT of a complex signal generated in a Jupyter notebook
Fig. 3 Rotation and stretching of a vector by a matrix in a Jupyter notebook
In addition, other resources like GitHub repositories, Binder and Docker make online teaching and learning possible. Complete notebooks can be shared to explore and practise at your own pace. All this also results in a paradigm shift in research as reproducibility is seamlessly achieved.
Fig. 4 Scaling of an eigenvector in a Jupyter notebook
3 SAR

For solving societal challenges and problems, various EO techniques using panchromatic, multispectral, hyperspectral and SAR images are used [6]. Amongst these, students find EO using SAR the most challenging. SAR is emerging as an important source of information for measuring UNSDG parameters remotely and in GEOINT and geodesy [7]. The advantage of SAR lies in its active nature and independence from the amount of light, cloud cover, smoke and camouflage [8]. SAR application techniques like time series analysis [9], PolSAR [10, 11] and InSAR [12, 13] using machine learning and deep learning are widely used in geoscience for ecological applications and to avoid geohazard risks [14]. Recent advances in SAR [15] have facilitated many applications using CNN architectures for classification [16]. Also, remote sensing of dams [17], coherent change detection, monitoring glaciers [18], oil spills [19], volcanoes [20], pipeline surveillance [14] and measuring forest biomass [21] are emerging applications. Research on SAR can also benefit from research on hyperspectral imagery, where work on change detection [22] and object recognition [23–26] exists [27]. These topics are also relevant for the defence and security domain. But existing research in these fields is not carried out with open-source reproducible tools. The journey of SAR learning commences with the SAR geometry and processing a SAR image data set. SAR image processing is basically solving an inverse problem [28]. Many algorithms have been developed for classification and object recognition in remote sensing. Learning SAR also includes understanding techniques like PolSAR and InSAR. Later, students graduate to machine learning and deep learning techniques [29]. In deep learning with SAR, the use of CNN architectures is a well-studied problem [23].
However, SAR technology is usually proprietary, with software on powerful standalone machines, making learning and teaching very difficult and limited to a few. Thus, teaching complex topics like SAR requires many hours in the classroom, and maximum use can be made of freely available Jupyter notebooks on GitHub [30] to teach these topics in shorter time frames.
3.1 SAR Simulation

Extensive literature exists on SAR simulation. The book on synthetic aperture radar with MATLAB algorithms [31] establishes the constraints for acquiring SAR data and provides digital signal and image processing algorithms for implementing SAR wavefront reconstruction. In their paper on SAR simulation, Kim and Ka [32] describe the generation of SAR images for a realistic target model using the general-purpose EM simulator FEKO and demonstrate its validity by processing the simulated SAR raw data with the Range Doppler algorithm. A GPU-based computation is presented by Chiang et al. [33] for modelling the SAR image of a complicated target. They produced the SAR echo signal for both Stripmap and Spotlight modes while taking into account the multiple scattering field and antenna pattern tracking for a more realistic result. The backscattering field computation is the most computationally demanding part of the signal chain, so they parallelized the computation of the SAR echo signals to address the problem. But most of these algorithms are not available to the public for reasons of defence and security or because of licensed software issues. Towards that end, models developed with open-source tools can benefit SAR learning.
3.2 SAR Geometry

Discussion of EO-based SAR commences with the SAR geometry [34]. For simulation, typical values of the parameters of a SAR satellite are used: antenna length, antenna width, wavelength, boresight angles, velocity and altitude, radar bandwidth, PRF, pulse width, transmitted peak power, noise temperature, and dimensions of the target. Associated terms like along and across track, fast versus slow time, look angle, etc., can also be explained pictorially in Jupyter notebooks, as given below [8]. Figure 5 shows a SAR satellite moving from left to right. The direction of flight is called the along-track direction; the perpendicular direction is the across-track direction. Various angles and important associated terms are as shown.
Fig. 5 SAR geometry generated in Jupyter notebook
3.3 Radar Range Equation

This can be generated in the Jupyter notebook to explain how the received signal depends on various parameters, and the signal-to-noise ratio can be calculated. The radar range equation is as follows [35]:

PR = PT · GT · (1 / (4πR²)) · σ · (1 / (4πR²)) · (GR · λ² / 4π)
where the received power PR is a function of the transmitted power PT, the transmit antenna gain GT, the range to the radar R, the radar cross section σ, the receive antenna gain GR, and the wavelength λ [30]. Students can vary these parameters to understand how the received power depends on each of them. The relation between maximum range and minimum detectable signal strength can also be simulated [35].
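A notebook cell evaluating the range equation might look like the sketch below; all parameter values are illustrative placeholders, not the ones used in the model.

```python
import numpy as np

def received_power(p_t, g_t, g_r, wavelength, sigma, r):
    """Monostatic radar range equation: received power in watts."""
    return (p_t * g_t * g_r * wavelength**2 * sigma) / ((4 * np.pi)**3 * r**4)

# Illustrative parameter values only.
p_t = 2e3                    # transmitted peak power [W]
g_t = g_r = 10 ** (35 / 10)  # 35 dB transmit and receive antenna gains
wavelength = 0.056           # [m], roughly C-band
sigma = 1.0                  # radar cross section [m^2]

for r in (400e3, 600e3, 800e3):   # slant ranges [m]
    p_r = received_power(p_t, g_t, g_r, wavelength, sigma, r)
    print(f"R = {r / 1e3:.0f} km  ->  P_R = {p_r:.3e} W")
```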
Fig. 6 Antenna radiation pattern generated in Jupyter notebook
3.4 Antenna and Its Radiation

The antenna pattern is generated in the notebook using the sinc equation. Figure 6 shows the generation of the along-track and elevation half-power beamwidths. From this, the extent of the beam and the swath is also calculated and shown to students. Doppler shift, range resolution and azimuth resolution are also calculated, and using the above-mentioned parameters, a signal is generated and the return from the target is received [30]. Students can be taught the concept of half-power points and how the power drops away from boresight. The extent of the beamwidth in the two dimensions can be calculated, and the concepts of along-track range and swath can be explained.
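A sketch of such a cell is shown below, assuming a uniform aperture so that the one-way field pattern follows sinc(L·sinθ/λ); the antenna length and wavelength are illustrative values.

```python
import numpy as np
import matplotlib.pyplot as plt

wavelength = 0.056   # [m], illustrative C-band value
L = 10.0             # antenna length in the along-track dimension [m]

theta = np.radians(np.linspace(-2, 2, 2001))        # angle off boresight
field = np.sinc(L * np.sin(theta) / wavelength)     # np.sinc(x) = sin(pi x)/(pi x)
gain_db = 20 * np.log10(np.abs(field) + 1e-12)      # one-way power pattern in dB

# Half-power (-3 dB) beamwidth of a uniform aperture: about 0.886 * lambda / L rad.
hpbw_deg = np.degrees(0.886 * wavelength / L)
print(f"approximate half-power beamwidth: {hpbw_deg:.3f} deg")

plt.plot(np.degrees(theta), gain_db)
plt.axhline(-3, linestyle="--")
plt.xlabel("angle off boresight [deg]")
plt.ylabel("one-way gain [dB]")
plt.ylim(-40, 1)
plt.show()
```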
3.5 Focusing

Focusing is achieved using the Range Doppler algorithm. The received signal is invariably buried below the noise level and cannot be usefully displayed, thus requiring focusing in range, as shown in Fig. 7. For range compression, the model does an FFT on both the range reference function and the raw data to convert them from the time domain to the frequency domain. By complex multiplication of these, the model obtains the range-compressed data in the frequency domain. Subsequently, an inverse FFT is carried out on the range-compressed data to convert it back into the time domain. Before azimuth focusing, the model also applies a correction to account for the fact that the instantaneous slant range changes with azimuth time. Then, complex multiplication of this range-migration-corrected data in the frequency domain with the azimuth reference function in the frequency domain is carried out. Again, an inverse FFT is carried out to get the azimuth-compressed data in the time domain, and the final focused image is obtained, as shown in Fig. 8. The final range and azimuth resolution achieved can be calculated [30].
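The range-compression step described above is a matched filter applied in the frequency domain. The sketch below compresses a single simulated chirp echo and is only an outline of that step, with illustrative chirp parameters and without the range cell migration and azimuth stages.

```python
import numpy as np

# Chirp (transmitted pulse) parameters -- illustrative values.
fs = 50e6              # range sampling rate [Hz]
pulse_len = 10e-6      # pulse duration [s]
bandwidth = 30e6       # chirp bandwidth [Hz]
k = bandwidth / pulse_len                   # chirp rate [Hz/s]

t = np.arange(0, pulse_len, 1 / fs)
chirp = np.exp(1j * np.pi * k * t**2)       # reference (range) chirp

# Simulated raw echo: the chirp delayed by a point target, plus noise.
n = 4096
raw = np.zeros(n, dtype=complex)
delay = int(3e-6 * fs)
raw[delay:delay + len(chirp)] = chirp
raw += 0.05 * (np.random.randn(n) + 1j * np.random.randn(n))

# Range compression: FFT both, multiply by the conjugate of the reference
# spectrum (matched filter), then inverse FFT back to the time domain.
ref_spectrum = np.fft.fft(chirp, n)
compressed = np.fft.ifft(np.fft.fft(raw) * np.conj(ref_spectrum))

peak = int(np.argmax(np.abs(compressed)))
print(f"compressed peak at sample {peak} (target injected at sample {delay})")
```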
Fig. 7 Range compression in Jupyter notebook
Fig. 8 Azimuth compression in Jupyter notebook
4 Result Analysis

The modelling clearly demonstrates that Range Doppler processing is faster in the frequency domain than in the time domain by a factor of at least ten. The dependence of the azimuth reference function on range is also demonstrated. In addition, it was demonstrated that focusing after Range Doppler processing is better achieved by catering for range cell migration. Measurements of the range and azimuth resolution were also made.
5 Conclusion

Teaching complex engineering topics like SAR is facilitated by the use of freely available open-source Jupyter notebooks. Complex engineering topics like SAR require time and effort, and platforms for reproducible research like Jupyter notebooks greatly assist in learning and teaching. Instructors need to ingeniously combine conventional blackboard teaching with computational models in the open-source domain and encourage students to adopt this model for further research. In this article, the advantage of computational models on open-source reproducible platforms for teaching and learning complex engineering topics has been shown. Simulation to generate the radar signal, understand the radar range equation, study the antenna radiation pattern and focus the received signal is shown. The model discussed here generates SAR data for a point scatterer and processes the data. The SAR geometry, antenna pattern and range equation are used to generate the data. SAR data is generated by simulating the transmitted pulse and the reflected signal from a point scatterer and processing the returned echo in range and azimuth. The model demonstrates the generation of the synthetic aperture and the qualities of the echo signal, as well as how the pulse-to-pulse fluctuating range of a target requires resampling to align the energy with a regular grid. The model allows parameters such as resolution, squint, geometry, and radar elements such as antenna dimensions to be changed. A successful learning outcome would be to understand where parameters need to be changed to affect the model in a specific way. Use of the discussed model removes the need for commercial software and democratizes the SAR topic in higher education. A simulation of the above type clearly explains the intricacies of SAR to students. From here on, students can easily graduate to the applications of SAR in agriculture, geodesy, disaster management, security, etc., as per their chosen domain. This will increase the number of students solving societal problems for the betterment of humankind by using SAR data. It is hoped that in coming years many such endeavours will be made to truly democratize science.
References 1. Ridolfi G, Mooij E, Corpino S (2009) A system engineering tool for the design of satellite subsystems. https://doi.org/10.2514/6.2009-6037 2. Canty MJ (2012) Image analysis, classification, and change detection in remote sensing with algorithms for Python, 4th edn, vol 53, no 9 3. Beg M et al (2021) Using Jupyter for reproducible scientific workflows. Comput Sci Eng 23(2). https://doi.org/10.1109/MCSE.2021.3052101 4. CII (2019) India skill report 2019. BMC Public Health 5(1) 5. Peng RD, Hicks SC (2020) Reproducible research: a retrospective. Ann Rev Public Health 42. https://doi.org/10.1146/annurev-publhealth-012420-105110 6. Zhao Q, Yu L, Du Z et al (2022) An overview of the application of earth observation satellite data: impacts and future trends. Remote Sens MDPI 7. Secker J, Vachon PW (2007) Exploitation of multi-temporal SAR and EO satellite imagery for geospatial intelligence. https://doi.org/10.1109/ICIF.2007.4408199
8. Meyer F (2019) Spaceborne synthetic aperture radar: principles, data access, and basic processing techniques. In: The synthetic aperture radar (SAR) handbook: comprehensive methodologies for forest monitoring and biomass estimation 9. Wang J et al (2019) Demonstration of time-series InSAR processing in Beijing using a small stack of Gaofen-3 differential interferograms. J Sens 2019. https://doi.org/10.1155/2019/420 4580 10. Dey S, Bhattacharya A, Frery AC, Lopez-Martinez C, Rao YS (2021) A model-free four component scattering power decomposition for polarimetric SAR data. IEEE J Sel Top Appl Earth Observ Remote Sens 14. https://doi.org/10.1109/JSTARS.2021.3069299 11. Nascimento ADC, Frery AC, Cintra RJ (2019) Detecting changes in fully polarimetric SAR imagery with statistical information theory. IEEE Trans Geosci Remote Sens 57(3). https:// doi.org/10.1109/TGRS.2018.2866367 12. Bonì R et al (2017) Exploitation of satellite A-DInSAR time series for detection, characterization and modelling of land subsidence. Geosciences (Switzerland) 7(2). https://doi.org/10. 3390/geosciences7020025 13. Abbate C, di Folco R, Evangelista A (2015) Multi-baseline SAR interferometry using elaboration of amplitude and phase data. Univ J Electr Electron Eng 3(2). https://doi.org/10.13189/ ujeee.2015.030204 14. Bayramov E, Buchroithner M, Kada M (2020) Radar remote sensing to supplement pipeline surveillance programs through measurements of surface deformations and identification of geohazard risks. Remote Sens 12(23). https://doi.org/10.3390/rs12233934 15. Sun H, Shimada M, Xu F (2017) Recent advances in synthetic aperture radar remote sensing— systems, data processing, and applications. IEEE Geosci Remote Sens Lett 14(11). https://doi. org/10.1109/LGRS.2017.2747602 16. Geldmacher J, Yerkes C, Zhao Y (2020) Convolutional neural networks for feature extraction and automated target recognition in synthetic aperture radar images. In: CEUR workshop proceedings, vol 2819 17. Scaioni M, Marsella M, Crosetto M, Tornatore V, Wang J (2018) Geodetic and remote-sensing sensors for dam deformation monitoring. Sensors (Switzerland) 18(11). https://doi.org/10. 3390/s18113682 18. Trouvé E et al (2007) Combining airborne photographs and spaceborne SAR data to monitor temperate glaciers: potentials and limits. IEEE Trans Geosci Remote Sens 45(4). https://doi. org/10.1109/TGRS.2006.890554 19. Shaban M et al (2021) A deep-learning framework for the detection of oil spills from SAR data. Sensors 21(7). https://doi.org/10.3390/s21072351 20. Jung J, Kim DJ, Lavalle M, Yun SH (2016) Coherent change detection using InSAR temporal decorrelation model: a case study for volcanic ash detection. IEEE Trans Geosci Remote Sens 54(10). https://doi.org/10.1109/TGRS.2016.2572166 21. Minh DHT, Toan TL, Rocca F, Tebaldini S, d’Alessandro MM, Villard L (2014) Relating P-band synthetic aperture radar tomography to tropical forest biomass. IEEE Trans Geosci Remote Sens 52(2). https://doi.org/10.1109/TGRS.2013.2246170 22. Gong M, Zhao J, Liu J, Miao Q, Jiao L (2016) Change detection in synthetic aperture radar images based on deep neural networks. IEEE Trans Neural Netw Learn Syst 27(1):125–138. https://doi.org/10.1109/TNNLS.2015.2435783 23. Yu J, Zhou G, Zhou S, Yin J (2021) A lightweight fully convolutional neural network for SAR automatic target recognition. Remote Sens 13(15). https://doi.org/10.3390/rs13153029 24. Bovenga F (2020) Special issue synthetic aperture radar (SAR) techniques and applications. Sensors (Switzerland) 20(7). 
https://doi.org/10.3390/s20071851 25. Wang L, Bai X, Zhou F (2019) SAR ATR of ground vehicles based on ESENet. Remote Sens 11(11). https://doi.org/10.3390/rs11111316 26. Xinyan F, Weigang Z (2019) Research on SAR image target recognition based on convolutional neural network. J Phys: Conf Ser 1213(4). https://doi.org/10.1088/1742-6596/1213/4/042019 27. Marino A, Hajnsek I (2014) A change detector based on an optimization with polarimetric SAR imagery. IEEE Trans Geosci Remote Sens 52(8). https://doi.org/10.1109/TGRS.2013. 2284510
28. Karakus O, Achim A (2021) On solving SAR imaging inverse problems using nonconvex regularization with a Cauchy-based penalty. IEEE Trans Geosci Remote Sens 59(7). https:// doi.org/10.1109/TGRS.2020.3011631 29. Ferreira B, Iten M, Silva RG (2020) Monitoring sustainable development by means of earth observation data and machine learning: a review. Environ Sci Eur 32(1). https://doi.org/10. 1186/s12302-020-00397-4 30. https://github.com/uafgeoteach 31. Soumekh M (2022) Synthetic aperture radar signal processing with MATLAB algorithms. (https://www.mathworks.com/matlabcentral/fileexchange/2188-synthetic-aperture-radar-sig nal-processing-with-matlab-algorithms). MATLAB Central File Exchange. Retrieved 16 Aug 2022 32. Kim S, Ka MH (2016) SAR simulation of realistic target using general purpose em simulators. https://doi.org/10.1109/ICARES.2015.7429828 33. Chiang CY, Chen KS, Yang Y, Zhang Y, Zhang T (2021) SAR image simulation of complex target including multiple scattering. Remote Sens 13(23). https://doi.org/10.3390/rs13234854 34. Moreira A, Prats-Iraola P, Younis M, Krieger G, Hajnsek I, Papathanassiou KP (2013) A tutorial on synthetic aperture radar. IEEE Geosci Remote Sens Mag 1(1). https://doi.org/10. 1109/MGRS.2013.2248301 35. Skolnik MI (2001) Introduction to radar systems. Electrical engineering series. McGraw-Hill
Efficient Segmentation of Tumor with Convolutional Neural Network in Brain MRI Images Archana Ingle , Mani Roja, Manoj Sankhe , and Deepak Patkar
Abstract Brain and other nervous system cancers are the 10th leading cause of morbidity. It is essential to locate the tumor area and its boundaries exactly before treatment such as chemotherapy or brain surgery so that the patient can resume a normal life. This paper discusses different traditional segmentation techniques and various deep learning approaches incorporating convolutional neural networks (CNN). An automatic tumor segmentation and classification model is implemented for four different classes with a U-shaped encoder–decoder architecture. The architecture's performance is measured and compared with the best available models using standard metrics like accuracy, sensitivity, specificity, Dice similarity coefficient (DSC), and mean intersection over union (MIoU) on different datasets with freely available resources. The implemented architecture outperforms various existing algorithms on the accuracy and sensitivity metrics. Keywords Brain tumor segmentation · CNN · Encoder–decoder architecture · Deep learning · U-Net
A. Ingle (B) · M. Roja
TSEC, University of Mumbai, Mumbai, India
e-mail: [email protected]
M. Roja
e-mail: [email protected]
M. Sankhe
MPSTME, NMIMS Mumbai, Mumbai, India
e-mail: [email protected]
D. Patkar
Nanavati Hospital, Mumbai, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_53

1 Introduction

According to a study, the occurrence of central nervous system tumors is about 10 individuals per 100,000 population in India [1]. Adequate knowledge and timely treatment of tumors can save many lives from these severe and deadly diseases.
Our body cells grow, age, die, and are replaced by new cells as part of the regular body cycle. If this regular cycle is disrupted, more and more cells grow to create a bulging mass, known as a tumor. This abrupt development of cells in the brain causes brain tumors. Cancerous brain tumors are known as malignant and noncancerous brain tumors as benign. A very rigid skull encloses the brain, so any growth inside such a compact area can cause an increase in pressure within the brain. This can lead to brain impairment and is highly risky. The function of the nervous system will be affected depending on the tumor location and its growth rate [2]. In the diagnosis of tumors, imaging them with good precision is very important. High-resolution techniques like magnetic resonance imaging (MRI), positron emission tomography (PET), computed tomography (CT), etc., are used for imaging. The high resolution and quality of its images of malignant tissue make MRI a better choice over other imaging technologies. Image processing in the medical field extracts accurate and relevant data from images acquired with different imaging techniques with the least possible error [3, 4]. Still, the identification of tumors is hard due to their complicated structure.

Contributions to this work
• Traditional tumor segmentation methods are discussed and compared with their advantages and disadvantages.
• An exhaustive study and comparison of different convolutional neural networks for tumor segmentation and classification in brain MRI images is presented.
• Tumor regions with different degrees of aggressiveness are classified into edema, necrotic core, enhancing tumor core, and the whole tumor comprising all tumor tissues.
• The implemented U-shaped encoder–decoder architecture for tumor segmentation and classification outperforms other recent deep learning methods on the accuracy and sensitivity performance measures.
2 Literature Review

2.1 Traditional Segmentation Methods

Many traditional brain tumor segmentation and classification techniques are available and can be broadly classified into the following:

Edge-Based Segmentation Thakur et al. [5] discussed various edge-preserving filtering approaches for improving poor-quality images. Aslama et al. [6] implemented a procedure in C in which Sobel edge detection and image-dependent thresholding are combined to segment regions using a closed contour algorithm. Within the closed contours, tumors are segmented using intensity information. The performance of this improved edge detection algorithm is measured, and simulation results show that it performs better than conventional segmentation methods.
Thresholding-Based Segmentation Ilhan and Ilhan [7] obtained sufficiently clear segmented images with more details about the tumor, which helps a medical practitioner in diagnosis. The algorithm combines morphological operations, pixel subtraction, thresholding-based segmentation, and image filtering techniques to achieve an overall success rate of 96%.

Region-Based Segmentation Chaudhari et al. [8] conducted an experiment on the open BraTS FLAIR dataset for tumor core segmentation. Seeded region growing is a well-accepted image segmentation algorithm, but detailed information on image features is required in the seed selection procedure, and the lack of automatic seed generation is its main drawback. The authors implemented a novel method for automatic seed detection by using fuzzy C means in region growing.

Clustering-Based Segmentation Alam et al. [9] proposed a model consisting of a combination of the K means algorithm and an enhanced fuzzy C means algorithm for tumor segmentation. Various image features like contrast, homogeneity, difference, correlation, and entropy are considered in the improved model for detecting the tumor position. This algorithm detects brain tumors in a few seconds, compared with other algorithms that require minutes. Traditional segmentation techniques are listed in Table 1 and assessed on the basis of different performance measures, together with their advantages and disadvantages.
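As a small, generic illustration of the thresholding-and-morphology family of methods reviewed above (it is not a re-implementation of any cited pipeline), the sketch below applies Otsu's threshold and simple morphological clean-up to a single 2D slice using scikit-image.

```python
import numpy as np
from skimage import filters, morphology, measure

def threshold_segment(slice_2d, min_area=100):
    """Rough tumor-candidate mask from a single 2D MRI slice (float array)."""
    smoothed = filters.gaussian(slice_2d, sigma=1.0)       # denoise
    mask = smoothed > filters.threshold_otsu(smoothed)     # global threshold
    mask = morphology.binary_opening(mask, morphology.disk(2))
    mask = morphology.remove_small_objects(mask, min_size=min_area)
    mask = morphology.binary_closing(mask, morphology.disk(2))
    return mask

# Illustrative usage on a synthetic slice containing one bright blob.
slice_2d = np.random.rand(128, 128) * 0.3
rr, cc = np.ogrid[:128, :128]
slice_2d[(rr - 60) ** 2 + (cc - 70) ** 2 < 15 ** 2] += 0.7
labels = measure.label(threshold_segment(slice_2d))
print("candidate regions found:", labels.max())
```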
2.2 Convolutional Neural Network-Based Methods

Manual brain tumor segmentation is tough and time-intensive work given the large number of MRI images produced. Another option for segmentation is a neural network. Convolutional neural networks require far fewer parameters to train than a fully connected network. In most images, neighboring pixels are usually correlated, and a fully connected neural network does not consider this relation between nearby pixels. A traditional neural network is therefore not well suited to image processing, as it does not scale properly for images and neglects the information carried by pixel location and the correlation of pixels with adjacent pixels. It also cannot handle transformations. Thus, a way is needed to handle the structural interrelationship of image features, and CNNs consider this structural interrelationship within an image. To avoid the above drawbacks, CNNs can be used for tumor segmentation; they also require less pre-processing than other algorithms. Different CNN architectures can be used. The authors of [10, 11] give an overview of the elementary concepts of CNNs and their implementation in different radiological tasks, and also discuss the difficulties of small datasets, overfitting, and future directions in the field of radiology. Akkus et al. [12] discussed numerous kinds of deep learning methods, where the latest self-learned feature extraction is used instead of traditional handcrafted feature extraction. For image segmentation and classification, different CNN
Table 1 Comparison of traditional segmentation techniques
| Author/dataset | Method | Accuracy | Sensitivity | Specificity | Advantages | Disadvantages |
| Aslama et al. [6] | Improved Sobel | – | – | – | Proved to be better for edge detection in comparison with the Sobel edge detection method | Verification on the standard dataset and comparison of the performance metric is not done |
| Ilhan and Ilhan [7] (100 images from TCIA) | Thresholding | 96% | 94.28% | 100% | – | The inaccurate classification was noticed for some images; not an automatic procedure |
| Chaudhari et al. [8] (BraTS 2012) | Seed point selection using fuzzy clustering | 97% | 70% | 98% | Automatic seed detection for region growing | – |
| Alam et al. [9] | K means and improved fuzzy C means clustering | 97.5% | 97.43% | 100% | Execution time between 40 and 50 s; better detection of abnormal and normal tissues | – |
architectures are discussed. In a patch-wise CNN architecture, an N × N patch around each pixel is extracted, and the model is trained on these patches with specified class labels; two class labels are defined, normal brain and tumor. In a semantic CNN architecture, image classification is performed at the pixel level by connecting every pixel in an image to a specific class label. It is similar to autoencoders: the encoder extracts features, and the decoder deconvolves the higher-level features to classify pixels while minimizing the loss function. Havaei et al. [13] use a cascaded design for automatic brain tumor segmentation, in which the output of the first CNN is provided as input to the next CNN, so the first CNN supplies a supplementary source of information to the subsequent CNN. Results are reported on the publicly available BraTS 2013 test dataset and show that this kind of structure performs better than previously published work, while the cascade structure is also 30 times faster. The time required for tumor segmentation is between 25 s and 3 min using different CNN architectures. To remove flat holes, a
post-processing algorithm is implemented. They investigated three cascaded architectures (input concatenation, local pathway concatenation, and a further cascading variant), listed in Table 2 as INPUTCASCADECNN, LOCALPATHCNN, and MFCASCADECNN. Pereira et al. [14] built a deep CNN with small 3 × 3 filters. The authors showed that intensity normalization is required to tackle the heterogeneity produced by capturing MRI images on different MRI machines, which helps to obtain better segmentation results. The training dataset is augmented by rotating the patches and also by sampling patches from high-grade glioma classes that were underrepresented in the low-grade gliomas. Data augmentation is very important in deep learning, though it still needs to be explored further. They observed that more feature maps with a shallow architecture give poorer performance. The leaky rectifier linear unit (LReLU) activation function was more effective than the rectifier linear unit (ReLU) in successfully training the CNN with a better DSC. Lakshmi [15] classifies malignant and benign tumors; for this classification, a pointing kernel classifier (PKC) is proposed. If the classified image is abnormal, the given test image is again classified as benign or malignant, and due to this two-stage classification, the classification performance is improved. Various parameters are calculated and compared for the proposed PKC and existing classifiers. The proposed algorithm gives better accuracy by introducing priority particle cuckoo search optimization. Dong et al. [16] developed a U-Net-based fully automatic deep CNN using data augmentation to increase the dataset and five-fold cross-validation. The soft Dice metric is used as the error function with an ADAM optimizer, and performance is assessed on the BraTS 2015 dataset. Hasan and Linte [17] use image resizing with the nearest-neighbor operation, and elastic transformation is used for data augmentation; the best-suited kernels and strides are selected for the convolution layers to implement a modified U-Net architecture. The architecture is trained on the BraTS 2017 dataset with both high-grade and low-grade glioma tumors. Anaraki et al. [18] discovered a suitable CNN architecture for MRI image classification by selecting proper parameters with a genetic algorithm, reducing the computational cost. Bagging is used to decrease the classification error variance of the best model discovered by the genetic algorithm. Sultan et al. [19] classify brain tumors from two different datasets on the basis of type and grade, respectively, using a 16-layer CNN architecture. A dropout layer is used to avoid overfitting; finally, the output from the dropout layer is given to the softmax function through a fully connected layer to predict the class. Stochastic gradient descent with momentum was found to be the best optimizer for the proposed architecture. Jijja and Rai [20] implemented an automatic brain tumor detection algorithm. The database consists of grayscale images. Median filtering is used as pre-processing to remove noise and distortion from the input grayscale images, and then the tumor images are segmented using a CNN. For optimal clustering of the tumor images, a water cycle optimization algorithm (WCA) is applied.
Shelke and Mohod [21] implemented a simple CNN for semi-automatic segmentation using a MATLAB graphical user interface. For better results, different pre-processing techniques like denoising, skull stripping, and normalization are implemented. For feature extraction from the images, the gray level co-occurrence matrix (GLCM) is utilized, and classification of the test images is done with the CNN classifier in MATLAB. Afshar et al. [22] designed a modified CapsNet architecture for brain tumor classification that avoids the need for exactly annotated images and increases the CapsNet focus by giving the tumor boundaries as extra input in its pipeline. Febrianto et al. [11] implemented two convolutional networks with one and two convolutional layers, respectively, showing that classification quality improves with the number of convolution layers at the cost of an increase in training time; data augmentation also improves the accuracy results. Mehrotra et al. [23] evaluated five different deep learning architectures with hyperparameter optimization for brain tumor classification into two classes, benign or malignant, using the transfer learning approach. Training results of these architectures using different optimizers are evaluated and compared; the pre-trained AlexNet using the SGDM optimizer performs best for accuracy with minimum training time. Various CNN architectures with the datasets used for brain tumor segmentation and classification are listed in Table 2.
Table 2 Various CNN algorithms for brain tumor segmentation
| Author | Dataset | Algorithm/architecture |
| Havaei et al. [13] | BraTS 2013 | INPUTCASCADECNN, LOCALPATHCNN, MFCASCADECNN |
| Pereira et al. [14] | BraTS 2013 | Fully CNN |
| Lakshmi [15] | TCIA | M-MOTF |
| Dong et al. [16] | BraTS 2015 | U-Net-based deep CNN |
| Hasan and Linte [17] | BraTS 2017 | NNRET U-Net |
| Anaraki et al. [18] | IXI, CIA, REMBRANDT, and TCGA-GBM | GA-CNN |
| Sultan et al. [19] | Nanfang Hospital dataset, TCIA | CNN |
| Jijja and Rai [20] | Data obtained from Internet [400 images] | CNN with WCA |
| Shelke and Mohod [21] | – | GLCM feature extraction for CNN |
| Afshar et al. [22] | Nanfang Hospital dataset | CapsNet |
| Febrianto et al. [11] | Obtained from Kaggle.com | CNN |
| Mehrotra et al. [23] | TCIA | PT-CNN (AlexNet) |
3 Convolutional Neural Network Training Process
The convolutional neural network is composed of several convolution, pooling, and fully connected layers. High-level feature extraction is carried out by the convolution layer, which consists of a convolution operation and an activation function, mainly rectified linear units (ReLU). For extracting dominant features, the downsampling operation of pooling is used. Finally, softmax activation is used to convert the output values into probabilities of the different classes. The training process illustrated in Fig. 1 finds the kernels in the convolution layers that decrease the difference between the prediction and the ground truth using the training dataset. The prediction from the final convolution layer and the ground truth are applied to a loss function such as the cross-entropy loss shown in Eq. (1).
Cross Entropy Loss = −t log(p) − (1 − t) log(1 − p)
(1)
where p represents the predicted probability and t represents the ground truth. If the model predictions are close to the ground truth, the loss will be small, and if the predictions are far from the ground truth, the loss value will be large. The optimizer finds a weight vector that diminishes the error by initializing with an approximately random initial value and progressively modifying it in small steps; this procedure is repeated until the error approaches its minimum, following Eqs. (2)–(4). The error is calculated from the loss function, and a slope is needed to find the direction in which to move the weights so as to get a smaller error on subsequent iterations.
Fig. 1 CNN training process [12]
742
A. Ingle et al.
Error = f (Weights)
(2)
Gradient = derivative(Error)
(3)
Knowing the downhill direction from the derivative, the weights can be updated. The learning rate determines how much the weights can change on every update. Weight = Weight − (learning rate ∗ Gradient)
(4)
The process is repeated till the error becomes zero or close to zero. Momentum is another argument in optimizers which can be tweaked to obtain faster convergence. The momentum term increases the updates for dimensions whose gradients point in similar directions and decreases the updates for dimensions whose gradients change directions; consequently, we gain quicker convergence and reduced oscillation. To introduce momentum, a temporal element needs to be added, as shown in Eqs. (5) and (6). V_t = γ V_{t−1} + Learning Rate ∗ Gradient
(5)
Weights = Weights − (Vt )
(6)
where γ is the momentum coefficient, V_t is the update at the current time step, and V_{t−1} is the update at the previous time step.
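As a minimal illustration of Eqs. (2)–(6), the NumPy sketch below performs gradient descent with momentum on a toy error function; the quadratic error, learning rate, and momentum value are illustrative assumptions, not settings from this work.

```python
import numpy as np

def momentum_step(weights, gradient, velocity, learning_rate=0.01, gamma=0.9):
    """One weight update with momentum, following Eqs. (5) and (6)."""
    velocity = gamma * velocity + learning_rate * gradient   # Eq. (5)
    weights = weights - velocity                             # Eq. (6)
    return weights, velocity

# Toy example: minimize Error = sum(weights**2), whose gradient is 2 * weights.
weights = np.array([1.0, -2.0, 0.5])
velocity = np.zeros_like(weights)
for _ in range(100):
    gradient = 2.0 * weights                                 # Eq. (3): derivative of the error
    weights, velocity = momentum_step(weights, gradient, velocity)
print(weights)  # values approach zero as the error is minimized
```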
4 Methodology
4.1 Dataset
The BraTS 2020 multimodal MRI dataset, consisting of T1-weighted, T1c, T2-weighted, and T2-FLAIR modality samples as shown in Fig. 2, is used for testing the implemented architecture. The data is acquired from various institutions. The dataset consists of images from 293 high-grade glioma (HGG) and 76 low-grade glioma (LGG) tumor patients [24].
Fig. 2 Visualization of image samples of the dataset in different modalities (Flair, T1, T1c, T2, Mask)
4.2 Implementation
All layers of the U-shaped encoder–decoder architecture are written directly in TensorFlow rather than using prebuilt library functions. Python is used to build the models on Kaggle; TensorFlow and Keras are used for model building. For data preparation, Scikit-Learn, NumPy, and Pandas are used, while data visualization is done using NumPy and Matplotlib. Data in 3D format is loaded using the Nibabel and Nilearn neuroimaging libraries. The available data is divided into training, test, and validation sets in a 68:12:20 ratio, as illustrated in Fig. 3. The original image of size 240 × 240 × 155 is resized to 124 × 124 × 100; up to 100 slices of each patient are utilized by skipping a few of the first and last slices. The learning algorithm is run for 35 epochs with categorical cross-entropy as the loss function and the ADAM optimizer with a reduce-LR-on-plateau adaptive learning rate schedule. A 23-layered semantic segmentation architecture is implemented that works with a smaller number of training images and gives better segmentation outcomes [25]. It consists of repeated blocks of two convolutional layers, each followed by ReLU activation. Max-pooling with stride 2 is used for downsampling in the contracting path, and the number of feature channels is doubled at every downsampling step. To convert the image back to its original size, the expansive path consists of upsampling followed by a 2 × 2 convolution that halves the number of feature maps. These feature channels are concatenated with the corresponding feature channels from the encoder path, followed by two convolutions, each followed by ReLU activation. Finally, a 1 × 1 convolution with softmax activation maps the 32 feature maps to four classes.
Fig. 3 Data distribution for training, testing, and validation
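The following Keras sketch illustrates the block structure described above (two convolutions with ReLU, max-pooling with stride 2, channel doubling, upsampling with a 2 × 2 convolution, skip concatenation, and a final 1 × 1 convolution with softmax over four classes). It is a reduced single-level illustration with assumed input size and filter counts, not the authors' exact 23-layer network.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    # Two 3x3 convolutions, each followed by ReLU activation
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

inputs = layers.Input(shape=(128, 128, 4))        # assumed input size and 4 MRI modalities
e1 = conv_block(inputs, 32)                       # encoder level
p1 = layers.MaxPooling2D(pool_size=2)(e1)         # downsampling with stride 2
e2 = conv_block(p1, 64)                           # feature channels doubled
u1 = layers.UpSampling2D(size=2)(e2)              # expansive path: upsampling
u1 = layers.Conv2D(32, 2, padding="same", activation="relu")(u1)  # 2x2 conv halves the feature maps
d1 = layers.Concatenate()([u1, e1])               # skip connection from the encoder path
d1 = conv_block(d1, 32)
outputs = layers.Conv2D(4, 1, activation="softmax")(d1)  # 1x1 conv mapping to four classes

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```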
5 Results and Discussion
The modeled architecture is evaluated using standard metrics like accuracy, sensitivity, specificity, Dice similarity coefficient (DSC), and intersection over union (IoU) [23], where tp signifies true positive, tn signifies true negative, fn signifies false negative, and fp signifies false positive. Sensitivity is the true positive rate used to detect tumor tissue, while specificity is the true negative rate used to detect healthy tissue; they are calculated using Eqs. (7) and (8), respectively.
Sensitivity = tp / (tp + fn)   (7)
Specificity = tn / (tn + fp)   (8)
The accuracy of the model is calculated for healthy and tumor areas using Eq. (9).
Accuracy = (tp + tn) / (tp + tn + fp + fn)   (9)
Due to the class imbalance problem, other suitable metrics used are DSC and IoU, calculated using Eqs. (10) and (11), respectively.
DSC = 2tp / (2tp + fp + fn)   (10)
IoU = tp / (tp + fp + fn)   (11)
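A minimal NumPy sketch of Eqs. (7)–(11), computing the metrics from binary prediction and ground-truth masks; the example masks are hypothetical placeholders.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Compute sensitivity, specificity, accuracy, DSC, and IoU from boolean masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)
    tn = np.sum(~pred & ~truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    return {
        "sensitivity": tp / (tp + fn),                # Eq. (7)
        "specificity": tn / (tn + fp),                # Eq. (8)
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # Eq. (9)
        "dsc": 2 * tp / (2 * tp + fp + fn),           # Eq. (10)
        "iou": tp / (tp + fp + fn),                   # Eq. (11)
    }

# Hypothetical 2D masks for illustration
pred = np.zeros((8, 8), dtype=bool); pred[2:5, 2:5] = True
truth = np.zeros((8, 8), dtype=bool); truth[3:6, 3:6] = True
print(segmentation_metrics(pred, truth))
```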
During the training and validation phase, metrics like accuracy, DSC, and mean IoU increase with an increasing number of epochs, while the training and validation loss decreases with an increasing number of epochs, as represented graphically in Fig. 4. After the training and validation phase, the model is tested on the testing dataset for performance evaluation. Table 3 lists the performance metrics for the implemented encoder–decoder architecture, namely loss, accuracy, mean IoU, DSC, sensitivity, specificity, and the DSC for the necrotic core, edema, and enhancing tumor core, for 35 epochs. The achieved performance metrics can be analyzed in Fig. 5. Table 4 compares various algorithms for tumor segmentation in brain MRI images for different performance measures. The implemented encoder–decoder architecture performs best for the accuracy and sensitivity metrics at 99.51% and 99.39%, respectively, with a minimum loss of 0.0142. The available dataset with its ground truth and the corresponding segmentations of the different tumor regions, shown in different colors, can be visualized in Fig. 6.
Fig. 4 Graphical representation of different metrics through training and validation phase for 35 epochs
Table 3 Different performance metrics achieved for 35 epochs
| Performance metric | Value |
| Loss | 0.0142 |
| Accuracy | 99.51% |
| Mean IoU | 84.06% |
| DSC | 65.92% |
| Sensitivity | 99.39% |
| Specificity | 99.84% |
| DSC (necrotic core) | 65.90% |
| DSC (edema) | 76.22% |
| DSC (enhancing tumor core) | 73.25% |
Fig. 5 Illustration for different metrics achieved for 35 epochs
Table 4 Comparison of various algorithms for different metrics
| Algorithm | Accuracy (%) | Sensitivity (%) | Specificity (%) | Loss |
| Fuzzy C means [8] | 97.88 | 70.40 | 98.74 | – |
| TKFCM [9] | 97.5 | 97.43 | 100 | – |
| CNN with 2 convolutions [11] | 93 | – | – | 0.2326 |
| INPUTCASCADECNN [13] | – | 87 | 89 | – |
| LOCALPATHCNN [13] | – | 80 | 91 | – |
| MFCASCADECNN [13] | – | 81 | 92 | – |
| Fully CNN [14] | 92 | – | – | – |
| M-MOTF [15] | 98 | 94.43 | 96.93 | – |
| GA-CNN [18] | 94.2 | 98.7 | 98.33 | – |
| CNN [19] | 96.13 | 99.33 | 98.50 | – |
| CNN with WCA algorithm [20] | – | 50 | 98.5 | – |
| GLCM feature extraction for CNN algorithm [21] | 92.86 | – | – | – |
| CapsNet [22] | 90.89 | – | – | – |
| PT-CNN (AlexNet) [23] | 99.04 | – | – | – |
| Implemented encoder–decoder architecture | 99.51 | 99.39 | 99.84 | 0.0142 |
Bold signifies the highest respective metric achieved
6 Conclusion and Future Scope
Automatic brain tumor segmentation is hard due to the complicated structure of tumors. Along with a partial survey of traditional segmentation techniques, a thorough comparison of different convolutional neural networks is done. A U-shaped encoder–decoder architecture is implemented with minimum loss for tumor segmentation, which outperforms the latest existing algorithms for accuracy and sensitivity at 99.51% and 99.39%, respectively. Metrics like mean IoU and DSC can be improved using suitable data augmentation techniques and cross-validation, given the scarcity of images available in the medical domain. Further improvements in the metrics can be achieved using transfer learning from different benchmark deep learning models.
Fig. 6 a Initial image with b ground truth and segmented tumor images in c whole tumor (all classes), d necrotic core (red), e edema (green), f enhancing tumor core (blue)
References 1. https://indianexpress.com/article/lifestyle/health/world-brain-tumour-day-2020-symptomscause-treatment-6448213/. Last Accessed 31 July 2022 2. Hebli A, Gupta S (2016) Brain tumor detection using image processing: a survey. In: Proceedings of 65th IRF international conference, 20th November 2016, Pune, India. ISBN: 978-93-86291-38-7 3. Swamy S, Kulkarni P (2015) Image processing for identifying brain tumor using intelligent system. Int J Innovative Res Sci Eng Technol 4(11). ISSN (Online): 2319-8753; ISSN (Print): 2347-6710 4. Tahir M, Iqbal A, Khan A (2016) A review paper of various filters for noise removal in MRI brain image. Int J Innovative Res Comput Commun Eng 4(12). ISSN (Online): 2320-9801; ISSN (Print): 2320-9798 5. Thakur N, Khan N, Sharma S (2021) A comparative analysis of edge-preserving approaches for image filtering. In: Intelligent learning for computer vision, CIS 2020, vol 61. Springer 6. Aslama A, Khan E, Beg M (2015) Improved edge detection algorithm for brain tumor segmentation. Procedia Comput Sci 58:430–437. Elsevier Science Direct 7. Ilhan U, Ilhan A (2017) Brain tumor segmentation based on a new threshold approach. Procedia Comput Sci 120:580–587 8. Chaudhari A, Choudhari V, Kulkarni J (2017) Automatic brain MR image tumor detection using region growing 5(12). ISSN (p): 2347-6982 9. Alam M et al (2019) Automatic human brain tumor detection in MRI Image using templatebased K means and improved fuzzy C means clustering algorithm. Big Data Cogn Comput 3:27
10. Yamashita R, Nishio M, Do R, Togashi K (2018) Convolutional neural networks: an overview and application in radiology 9(4):611–629. Springer 11. Febrianto D, Soesanti I, Nugroho H (2020) Convolutional neural network for brain tumor detection. IOP Conf Ser: Mater Sci Eng 771:012031 12. Akkus Z et al (2017) Deep learning for brain MRI segmentation: state of the art and future directions. J Digit Imaging 30:449–459. https://doi.org/10.1007/s10278-017-9983-4 13. Havaei M et al (2016) Brain tumor segmentation with deep neural networks. https://doi.org/ 10.1016/j.media.2016.05.004. Elsevier. 1361-8415/© 2016 14. Pereira S, Oliveira A, Alves V, Silva C (2016) Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans Med Imaging 35 15. Lakshmi A (2017) Thesis: performance analysis of brain tumor segmentation and classification 16. Dong H et al (2017) Automatic brain tumor detection and segmentation using U-Net based fully convolutional networks. In: Communications in computer and information science medical image understanding and analysis, pp 506–517 17. Hasan S, Linte C (2018) A modified U-Net convolutional network featuring a nearest-neighbor Re-sampling-based elastic-transformation for brain tissue characterization and segmentation. IEEE. 978-1-7281-0255-9/18/$31.00 © 2018 18. Anaraki A, Ayati M, Kazemi F (2018) Magnetic resonance imaging-based brain tumor grades classification and grading via convolutional neural networks and genetic algorithms. https:// doi.org/10.1016/j.bbe.2018.10.004. Published by Elsevier. ISSN: 0208-5216/© 2018 19. Sultan H, Alem N, Al-Atabany W (2019) Multi-classification of brain tumor images using deep neural network. IEEE Access 7 20. Jijja A, Rai D (2019) Efficient MRI segmentation and detection of brain tumor using convolutional neural network. IJACSA 10(4) 21. Shelke S, Mohod S (2019) Semi-automated brain tumor segmentation and detection from MRI. IRJET 06(01). e-ISSN: 2395-0056; p-ISSN: 2395-0072 22. Afshar P, Plataniotis K, Mohammadi A (2019) Capsule networks for brain tumor classification based on MRI images and coarse tumor boundaries. In: ICASSP. IEEE, p 1368. 978-1-53864658-8/18/$31.00©2019 23. Mehrotra R, Ansari M, Agrawal R, Anand R (2020) A transfer learning approach for AIbased classification of brain tumors. https://doi.org/10.1016/j.mlwa.2020.100003. Published by Elsevier Ltd. ISSN: 2666-8270/© 24. https://www.kaggle.com/datasets/awsaf49/brats2020-training-data. Last Accessed 05 Feb 2022 25. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: Navab N et al. (eds) MICCAI 2015, Part III, LNCS 9351. Springer International Publishing, Switzerland, pp 234–241. https://doi.org/10.1007/978-3-319-245744_28
Gradient-Based Physics-Informed Neural Network Kirti Beniwal and Vivek Kumar
Abstract Physics-informed neural networks (PINNs) are one of the most effective deep learning tools for solving differential equations. However, one of the significant drawbacks of PINN is its low accuracy in the initial stage, even with a large number of training points. This study introduces gradient-enhanced physics-informed neural networks (gPINNs) that are trained for solving partial differential equations. Here, the proposed approach is contrasted with the physics-informed neural network, which has proven to be an efficient technique for solving forward and inverse PDE problems. PINN increases the performance of neural networks (NNs) by using the physical information contained in the PDE as a regularization term. The gradient-based physics-informed neural network (gPINN) is introduced here to improve the performance of PINNs: gPINN uses the gradient information of the equation error and embeds this term into the loss. This method is tested for various PDEs and found to be more effective than PINN. Further, gPINN was implemented with the residual-based adaptive refinement (RAR) method to enhance the wave equation accuracy. Finally, we used the Keras library in the PINN and gPINN algorithms to find the unknown parameter of an inverse PDE. Keywords Partial differential equation · Physics-informed neural network · Residual-based adaptive refinement
K. Beniwal · V. Kumar (B) Department of Applied Mathematics, Delhi Technological University, Bawana Road, Delhi, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_54
1 Introduction
While deep learning (DL) has demonstrated remarkable successes in a variety of applications, it has not been widely used to solve partial differential equations (PDEs) until recently, as discussed by Karniadakis et al. [1, 2]. The artificial neural network (ANN) was introduced in an article by McCulloch and Pitts [2, 3], but it was not popular at that time because of the limitations of computational machines. This network is made
Fig. 1 Graph showing a feedforward neural network (FNN), in which one or more artificial neurons are placed into an input layer, one or more hidden layers, and one or more output layers
up of input, output, and hidden layers, each with a number of neurons. An ANN with more than one hidden layer is called a deep neural network (DNN). A lot of attention has been paid to neural networks (NNs) for solving differential equations in recent years. Their universal approximation properties provide an alternative approach to solving differential equations by providing a nonlinear approximant via a variety of network structures and activation functions. Classification, pattern recognition, and regression have all been revolutionized by deep learning. There are various numerical techniques to solve differential equations, discussed by Lochab and Kumar [4–8], such as the finite element method [9–11] and the finite difference method [12, 13]. A new class of neural networks called the Physics-Informed Neural Network (PINN) was introduced. It is specially designed to solve ordinary differential equations (ODEs), partial differential equations (PDEs), integro-differential equations, etc. It is used not only to solve differential equations but also to predict the solution of an equation from data, called 'data-driven discovery', as given by Raissi et al. [14]. The main advantage of PINN is that it is mesh-free [15, 16] and simple; therefore, it has been successfully applied to various problems in different fields of science such as fluid mechanics and biomedicine. In the past decades, various types of NN have been introduced, such as the convolutional neural network (CNN) and the recurrent neural network (RNN), but the simplest one is the feedforward neural network (FNN). In this paper, an FNN and a residual neural network have been implemented for solving PDEs. Consider an NN with L layers and L − 1 hidden layers. N_l denotes the number of neurons in the lth layer (N_0 = input layer neurons and N_L = output layer neurons). Let W^l and b^l denote the weight matrix and bias vector, respectively, at the lth layer (Fig. 1). For any nonlinear activation function σ, the feedforward neural network is defined as follows [17]:
input layer: N_0(x) = x
hidden layers: N_l(x) = σ(W^l N_{l−1}(x) + b^l), for 1 ≤ l ≤ L − 1
output layer: N_L(x) = W^L N_{L−1}(x) + b^L
In most cases, tanh, sigmoid, and rectified linear units (ReLUs) are used as activation functions. Furthermore, it is required to compute the derivatives of the output with respect to the input. In a deep learning framework, automatic differentiation is used to
compute such derivatives using backpropagation. The basic formulation of solving a PDE using PINN requires a loss function that has to be minimized, with the required derivatives obtained via automatic differentiation. Usually, the loss function is minimized using gradient-based optimizers like Adam, the limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm (L-BFGS), and gradient descent. Automatic differentiation can be conveniently done with machine learning libraries such as TensorFlow, of which Abadi et al. [18] presented a detailed study, and PyTorch, discussed by Paszke et al. [19]. It is still necessary to address some issues with PINNs, despite promising results; the problem of improving the accuracy and efficiency of PINNs remains an open one. There are still aspects of PINN that can be improved: for example, during training the residual points are randomly distributed, while other methods of training and sampling that use the same number of training points, such as RAR, give better accuracy. Also, in PINN it becomes difficult to balance the loss terms, as the loss contains various terms corresponding to the PDE and the PDE residuals; for large domains, a domain decomposition method can be used. In PINN, neural networks are trained to minimize a loss function obtained using PDE residuals, and therefore the PDE residual is used as a loss. Since this concept is simple and employed by many researchers in the field, little attention is currently being paid to other types of losses connected to PDEs. It can be clearly observed that if the PDE residual becomes zero, its gradient must also be zero, as discussed by Lu et al. [20]. By including the gradient part of the PDE residuals, a method is developed, i.e., the gradient-enhanced PINN (gPINN), which improves the accuracy of PINNs by using gradient-enhanced loss functions. Using neural networks and the Gaussian process [21], it has been shown that the general concept of using gradient information is advantageous for function regression. These approaches, however, were mainly constructed for function regression and do not work for solving forward or inverse PDEs; instead, they use function gradients in addition to function values. The sole method in this area that employs the loss gradient with respect to the input as an additional loss is input gradient regularization. Hence, a fundamentally different approach from these prior approaches is provided here, as the gradient of the PDE residual with respect to the model inputs is used. Further performance improvement is achieved by combining gPINN with the RAR method. This paper is organized as follows: In Sect. 2, the formulation of PINN, gPINN, and gPINN with RAR is discussed. In Sect. 3, a forward equation is solved using gPINN+RAR and an inverse problem is solved using the Keras library. Then, a summary of this study is given in Sect. 4.
2 Methods
This section includes a brief overview of PINN and the gradient-enhanced physics-informed neural network (gPINN). The relevant algorithms are introduced.
2.1 PINN Algorithm for Solving Forward and Inverse PDEs
Consider a PDE defined on the domain Ω with boundary conditions defined on ∂Ω:
F(x; ∂y/∂x_1, ..., ∂y/∂x_n; ∂²y/∂x_1∂x_1, ..., ∂²y/∂x_1∂x_n; ...) = 0,  x = (x_1, x_2, ..., x_n) ∈ Ω   (1)
B[y](x) = 0,  x ∈ ∂Ω   (2)
Here, y is the unknown solution, and B denotes the boundary conditions; initial conditions are treated in the same way as boundary conditions. F can be a linear or nonlinear differential operator (∂/∂x, ∂²/∂x², y·∂/∂x, etc.). In the PINN algorithm, first construct a neural network model to approximate the solution y(x); this NN solution is denoted as ŷ(x; θ), where x is taken as input with parameters θ, and the output is a vector of the same dimension as y(x). Generally, there are three types of neural network (NN) layers: input, hidden, and output layers. In this model, assume L − 1 hidden layers, one input layer, and a single output layer. Each hidden layer receives its input from the previous layer and is composed of N_k neurons. Each layer's weight matrix and bias vector are contained in the parameter θ. During the training phase, these parameters are continually optimized (Fig. 2). Therefore, specify the training sets T_d and T_b for the PDE and the boundary/initial conditions to train the model: T_d denotes the data points in the domain, and T_b is the set of boundary points. In a PINN, the loss function is defined as:
J(θ; T) = w_d J_d(θ; T_d) + w_b J_b(θ; T_b)   (3)
where
J_d(θ; T_d) = (1/|T_d|) Σ_{x∈T_d} |ŷ − y(x_y^i, t_y^i)|²   (4)
J_b(θ; T_b) = (1/|T_b|) Σ_{x∈T_b} |B(ŷ, x)|²   (5)
where w_d and w_b are the weights, and T = (x_1, x_2, ..., x_|T|) is the training set of size |T|. T_d and T_b denote the sets of residual points of the training set. Here, y(x_y^i, t_y^i) is the training data from the initial and boundary conditions. The above loss function contains partial and normal derivatives, which are handled via automatic differentiation (AD). In the last step, train the NN by minimizing the loss function J(θ; T) to find the best parameters θ. As this function is nonlinear, it is minimized using gradient-based optimizers like Adam [22] and L-BFGS [22]:
θ_{n+1} = θ_n − β ∇_θ J(θ_n)
Fig. 2 An illustration of the physics-informed neural network, used to solve partial differential equations [17]
PINNs have the advantage of being applicable to forward problems and to inverse PDE-based problems as well. If there is an unknown parameter α in Eq. (1), then an additional loss term on the set of points T_i is added to Eq. (3). The new loss function for the inverse problem is defined as
J(θ, T) = w_d J_d(θ; T_d) + w_b J_b(θ; T_b) + w_i J_i(θ, α; T_i)   (6)
where
J_i(θ, α; T_i) = (1/|T_i|) Σ_{x∈T_i} |ŷ − y(x)|²   (7)
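As a hedged sketch of the PINN loss structure in Eq. (3), the code below combines a PDE residual term on domain points with a boundary term, using TensorFlow automatic differentiation; the network size, the toy residual dy/dx + y = 0, and the weights are assumptions for illustration only, not the equations solved in this paper.

```python
import tensorflow as tf

# Small fully connected network approximating y(x); built lazily on first call
net = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation="tanh"),
    tf.keras.layers.Dense(20, activation="tanh"),
    tf.keras.layers.Dense(1),
])

def pinn_loss(x_domain, x_boundary, y_boundary, w_d=1.0, w_b=1.0):
    # PDE residual term on domain points (assumed toy residual: dy/dx + y = 0)
    with tf.GradientTape() as tape:
        tape.watch(x_domain)
        y = net(x_domain)
    dy_dx = tape.gradient(y, x_domain)        # automatic differentiation
    residual = dy_dx + y
    loss_d = tf.reduce_mean(tf.square(residual))
    # Boundary term on boundary points
    loss_b = tf.reduce_mean(tf.square(net(x_boundary) - y_boundary))
    return w_d * loss_d + w_b * loss_b        # weighted sum as in Eq. (3)

x_dom = tf.random.uniform((64, 1))
x_bnd = tf.constant([[0.0]])
y_bnd = tf.constant([[1.0]])
print(pinn_loss(x_dom, x_bnd, y_bnd))
```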
2.2 Gradient-Based Physics-Informed Neural Networks (gPINNs)
In PINN, only the PDE residuals are enforced to be zero; since f(x) = 0 for any x, its derivatives should also be zero [20]. Suppose that the PDE solution is smooth enough for the PDE residual f(x) to have a gradient; gPINN is then introduced to force the derivative of the PDE residual to be zero as well.
∇F(x) = (∂F/∂x_1, ∂F/∂x_2, ..., ∂F/∂x_n) = 0,  x ∈ Ω
Therefore, the loss function is defined as
J = w_d J_d + w_b J_b + w_i J_i + Σ_{i=1}^{n} w_{f_i} J_{f_i}(θ; T_{f_i})   (8)
where
J_{f_i}(θ; T_{f_i}) = (1/|T_{f_i}|) Σ_{x∈T_{f_i}} |∂F/∂x_i|²   (9)
Here, T_{f_i} denotes the set of residual points for the derivative term. In this study, grid search is used to determine the optimal values of the weights, and we choose w_{f_1} = w_{f_2} = ... = w_{f_n}. When gPINN is tested on some PDEs, the performance is sensitive to the weight value, while on others it is not. As will be demonstrated in the numerical examples, gPINN improves the estimation of PINN by imposing the gradient of the PDE residual. As a result, it is possible to predict the solution u with more accuracy and with fewer training points. The gPINN algorithm also enhances the accuracy of the predictions for ∂u/∂x_i. The motivation of gPINN is that the PDE residual of PINN usually fluctuates around zero, and penalizing the slope of the residual lessens this fluctuation and brings the residual closer to zero.
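Continuing the previous sketch, the extra gPINN term of Eqs. (8)–(9) can be obtained by differentiating the PDE residual once more with a nested GradientTape; the toy residual and the weight value are again assumptions.

```python
import tensorflow as tf

def gradient_enhanced_term(net, x_f, w_f=0.01):
    """Mean squared derivative of the PDE residual with respect to the input x."""
    with tf.GradientTape() as outer:
        outer.watch(x_f)
        with tf.GradientTape() as inner:
            inner.watch(x_f)
            y = net(x_f)
        dy_dx = inner.gradient(y, x_f)
        residual = dy_dx + y                 # same assumed toy residual: dy/dx + y = 0
    dres_dx = outer.gradient(residual, x_f)  # gradient of the residual, as in Eq. (9)
    return w_f * tf.reduce_mean(tf.square(dres_dx))

net = tf.keras.Sequential([tf.keras.layers.Dense(20, activation="tanh"),
                           tf.keras.layers.Dense(1)])
x_f = tf.random.uniform((64, 1))
print(gradient_enhanced_term(net, x_f))  # added to the PINN loss, Eq. (8)
```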
2.3 Residual-Based Adaptive Refinement (RAR)
As the residual points are randomly distributed in the domain of PINN, a residual-based adaptive refinement (RAR) [20] method is created to optimize the distribution of residual points throughout the training phase. To increase the model's precision and effectiveness during training, extra residual points are added where the PDE residual is high. The steps for the gPINN with RAR algorithm are as follows (a sketch is given after this list):
1. Train the NN using gPINN for a certain number of iterations.
2. Compute the PDE residual at random points in the domain.
3. Add to the training set T the n new points that have the largest residuals.
4. Repeat Steps 1, 2, and 3 until the mean residual falls below a threshold.
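A schematic Python sketch of this loop; train_gpinn and pde_residual are hypothetical helpers standing in for the gPINN training step and the residual evaluation, and the thresholds and point counts are assumed values.

```python
import numpy as np

def rar_loop(train_gpinn, pde_residual, T, n_add=10, n_candidates=1000,
             tol=1e-3, max_rounds=20):
    """Residual-based adaptive refinement: grow the training set where the residual is largest."""
    for _ in range(max_rounds):
        train_gpinn(T)                                       # Step 1: train for some iterations
        candidates = np.random.uniform(0.0, 1.0, size=(n_candidates, T.shape[1]))
        res = np.abs(pde_residual(candidates)).ravel()       # Step 2: residual at random points
        if res.mean() < tol:                                 # Step 4: stopping criterion
            break
        worst = candidates[np.argsort(res)[-n_add:]]         # Step 3: points with largest residuals
        T = np.vstack([T, worst])
    return T
```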
3 Results
In this section, forward and inverse PDEs are discussed using gradient-enhanced PINN (gPINN) and gPINN with RAR to improve the accuracy. Approximations are constructed using neural network models based on the provided initial and boundary conditions. To test gradient-enhanced PINN, the approximation results are compared with the true solutions and with PINN results.
3.1 Function Approximation Using NN and gNN
First, the benefits of adding gradient information are demonstrated through a pedagogical example. Consider the function
y(x) = sin(10x²),
x ∈ [0, 1]
(10)
Based on the training dataset (x_1, y(x_1)), (x_2, y(x_2)), ..., (x_n, y(x_n)), where (x_1, x_2, ..., x_n) are equispaced points sampled in [0, 1], a neural network is trained using the loss function
J = (1/n) Σ_{i=1}^{n} |ŷ(x_i) − y(x_i)|²   (11)
Additionally, consider the gNN with the additional gradient loss term [22]:
J = (1/n) Σ_{i=1}^{n} |ŷ(x_i) − y(x_i)|² + w_g (1/n) Σ_{i=1}^{n} |∇ŷ(x_i) − ∇y(x_i)|²   (12)
The code for this function was implemented using DeepXDE [20] with the TensorFlow 1 backend. In this example, the hyperbolic tangent is used as the activation function, and the hyperparameters are as follows: the Adam optimizer, as it is efficient and provides higher performance because it combines two gradient descent approaches, momentum and root mean square propagation (RMSP); the Glorot uniform initializer; 10 hidden layers; and 1000 iterations. Training of the network was done with different weight values w_g, including 1, 0.1, 0.01, and 0.001, and it was found that w_g does not affect gNN accuracy. Both NN and gNN show smaller L2 relative errors as the number of training points increases; here, 20 training points were taken. The gNN algorithm has a smaller error than the plain neural network, by about one order of magnitude. Figure 3 shows the prediction of y and dy/dx using 20 training points.
Fig. 3 Comparison between exact, NN, and gNN: a and b show examples of the predicted y and dy/dx, respectively. The black dots represent the locations of the training points
3.2 Forward Problem Using gPINN
Having shown how effectively the gradient loss can be added to function approximation, gPINN is now applied to the solution of PDEs.
3.2.1
Wave Equation
There are numerous applications of the wave equation [23, 24] in physics, such as the study of how waves propagate through space and radio communications. It appears in many fields of science, such as the study of acoustic wave propagation, radio communications, and seismic wave propagation [25–27]. The wave equation is mathematically expressed as:
y_tt(x, t) − c² y_xx = 0,
x ∈ (0, 1), t ∈ (0, 1)
(13)
where y is function of x (spatial variables) and t (time). The initial and boundary conditions are given as follows: y(0, t) = y(1, t) = 0, t ∈ [0, 1]
(14)
y(x, 0) = sin(π x) + sin(aπ x), x ∈ [0, 1]
(15)
yt (x, 0) = 0, x ∈ [0, 1]
(16)
The solution of above equation is y(x, t) = sin(π x) cos(cπ t) + sin(aπ x) cos(acπ t)
With respect to the spatiotemporal domain, we specifically treat the initial condition (15) as a special boundary condition. Equations (14) and (15) can be summarized as: f(x) = sin(πx) + sin(aπx),
x ∈ ∂Ω
To train the network, minimize the loss function defined as
J(θ) = J_y(θ) + J_yt(θ) + J_r(θ)   (17)
= (1/N_y) Σ_{i=1}^{N_y} |y(x_y^i, θ) − f(x_y^i)|² + (1/N_yt) Σ_{i=1}^{N_yt} |y_t(x_yt^i, θ)|² + w_g (1/N_r) Σ_{i=1}^{N_r} |∂²y_θ(x_r^i)/∂t² − c² ∂²y_θ(x_r^i)/∂x²|²   (18)
where a = 2 and c = 10. The batch sizes are set to N_y = N_yt = N_r = 360, and at each gradient descent iteration, all data points {x_y^i, f(x_y^i)}_{i=1}^{N_y}, {x_yt^i}_{i=1}^{N_yt}, and {x_r^i}_{i=1}^{N_r} are uniformly selected from the relevant regions of the computational domain. 1000 residual points were considered, and hard parameter constraints were used for both algorithms, i.e., PINN and gPINN. The predictions and errors of these algorithms are shown in Fig. 4. The results of gPINN are more accurate than those of PINN over the whole domain.
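The quoted exact solution can be checked numerically; the sketch below verifies that it satisfies Eq. (13) and the conditions (14) and (15) at random sample points, using a = 2 and c = 10 as in the experiment.

```python
import numpy as np

a, c = 2, 10

def y(x, t):
    return np.sin(np.pi * x) * np.cos(c * np.pi * t) + np.sin(a * np.pi * x) * np.cos(a * c * np.pi * t)

def y_tt(x, t):
    return (-(c * np.pi) ** 2 * np.sin(np.pi * x) * np.cos(c * np.pi * t)
            - (a * c * np.pi) ** 2 * np.sin(a * np.pi * x) * np.cos(a * c * np.pi * t))

def y_xx(x, t):
    return (-(np.pi) ** 2 * np.sin(np.pi * x) * np.cos(c * np.pi * t)
            - (a * np.pi) ** 2 * np.sin(a * np.pi * x) * np.cos(a * c * np.pi * t))

x = np.random.rand(1000)
t = np.random.rand(1000)
print(np.max(np.abs(y_tt(x, t) - c ** 2 * y_xx(x, t))))      # PDE residual, Eq. (13): ~0
print(np.max(np.abs(y(0.0, t))), np.max(np.abs(y(1.0, t))))  # boundary condition, Eq. (14): ~0
print(np.max(np.abs(y(x, 0.0) - (np.sin(np.pi * x) + np.sin(a * np.pi * x)))))  # Eq. (15): ~0
```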
3.3 Inverse Problem Using gPINN
Aside from solving forward PDE problems, gPINN is also applied to inverse PDE problems. The loss term for the inverse problem is discussed in Sect. 2, and the gPINN algorithm is used to solve the inverse PDE in the same way.
3.3.1
Eikonal Equation
The Eikonal equation is a first-order nonlinear PDE that is typically encountered in wave propagation. This example shows how to solve partial differential equations [28, 29] with an unknown parameter λ using the PINN and gPINN methods. Therefore, consider the time-dependent Eikonal equation over the domain D = [−1, 1]:
−∂_t y(t, x) + |∇y(t, x)| = λ⁻¹,  (t, x) ∈ [0, T) × [−1, 1]
y(T, x) = 0,  x ∈ [−1, 1]
y(t, −1) = y(t, 1) = 0,  t ∈ [0, T).
λ is the unknown parameter. The solution is y*(t, x) = λ⁻¹ min{1 − t, 1 − |x|}. Whenever the Eikonal equation runs backward in time, it is consistent with the interpretation that it is the optimality condition for control systems. The change of variables t* = T − t can transform this equation into a forward evolution problem. Our final model was based on a Leaky ReLU activation function with slope parameter r = 0.1 and consisted of only two hidden layers. The Leaky ReLU is defined as
σ(x) = x if x ≥ 0, and σ(x) = r·x otherwise.
For the Eikonal equation, 500 uniformly distributed training points were taken into consideration. Next, a new network class was added which includes the additional parameter λ. The model was then trained for 1000 epochs, which takes about 30 s. Figure 5 shows the solution y_θ(t, x) of the Eikonal equation. The relative error of the identified parameter λ obtained with the
Fig. 4 Comparison between PINN and gPINNs predicted value and the absolute value of y for 1D wave equation
Fig. 5 a The solution of the Eikonal equation; b evolution of loss values and estimated parameter; c evolution of λ, its mean (blue) and one standard deviation (shaded blue), true value of λ (dashed blue); d mean relative error
gPINN model (4.4365 × 10⁻⁶) is smaller than that of the PINN model (6.3427 × 10⁻⁴). Further, the evolution of λ and the true value of λ, together with its mean and standard deviation, are shown in Fig. 5.
4 Conclusion
In this study, a method for solving partial differential equations, the physics-informed neural network (PINN), was introduced together with its new version, the gradient-enhanced PINN, which improves the accuracy of PINN. All the examples show that gPINN clearly gives better accuracy and outperforms PINN in the relative errors of the solution and its derivatives with the same number of data points. Also, for the inverse problem, gPINN recovers the unknown parameter more precisely in less computational time.
References 1. Karniadakis GE, Kevrekidis IG, Lu L, Perdikaris P, Wang S, Yang L (2021) Physics-informed machine learning. Nat Rev Phys 3(6):422–440 2. McCulloch WS, Pitts W (1943) A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys 5(4):115–133 3. Saraswat M, Sharma H, Balachandran K, Kim JH, Bansal JC (2021) Congress on intelligent systems. In: Proceedings of CIS 2021, vol 1 4. Kumar V, Rao SVR (2008) Composite scheme using localized relaxation with non-standard finite difference method for hyperbolic conservation laws. J sound vib. 311(3-5):786–801 5. Lochab R, Kumar V (2021) A new reconstruction of numerical fluxes for conservation laws using fuzzy operators. Int J Num Methods Fluids 93(6):1690–1711 6. Lochab Ruchika, Kumar Vivek (2021) An improved flux limiter using fuzzy modifiers for Hyperbolic Conservation Laws. Math Comput Simul 181:16–37 7. Lochab R, Kumar V (2022) A comparative study of high-resolution methods for nonlinear hyperbolic problems. ZAMM-J Appl Math Mech/Zeitschrift für Angewandte Mathematik und Mechanik: e202100462 8. Saraswat M, Sharma H, Balachandran K, Kim JH, Bansal JC () Congress on intelligent systems. In: Proceedings of CIS 2021, vol 2 9. Khari K, Kumar V (2022) An efficient numerical technique for solving nonlinear singularly perturbed reaction diffusion problem. J Math Chem 60:1356–1382. https://doi.org/10.1007/ s10910-022-01365-4 10. Khari K, Kumar V (2022) Finite element analysis of the singularly perturbed parabolic reactiondiffusion problems with retarded argument. Numer Methods Partial Differ Eq 38:997–1014. https://doi.org/10.1002/num.22785 11. Sharma H, Saraswat M, Kumar S, Bansal JC (2020) Intelligent learning for computer vision. In: Proceedings of congress on intelligent systems 12. Kumar V, Mehra M (2007) Wavelet optimized finite difference method using interpolating wavelets for solving singularly perturbed problems. J Wave Theory Appl 1(1):83–96 13. Wang S, Wang H, Perdikaris P (2021) On the eigenvector bias of fourier feature networks: from regression to solving multi-scale pdes with physics-informed neural networks. Comput Methods Appl Mech Eng 384:113938 14. Raissi M, Perdikaris P, Karniadakis GE (2017) Physics informed deep learning (part i): datadriven solutions of nonlinear partial differential equations. arXiv preprint arXiv:1711.10561 15. Kumar V, Srinivasan B (2019) A novel adaptive mesh strategy for singularly perturbed parabolic convection diffusion problems. Differ Equ Dyn Syst 27(1):203–220 16. Sharma H, Saraswat M, Yadav A, Kim JH, Bansal JC (2020) Congress on intelligent systems. In: Proceedings of CIS 2020, vol 1 17. Guo Y, Cao X, Liu B, Gao M (2020) Solving partial differential equations using deep learning and physical constraints. Appl Sci 10(17):5917 18. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M et al. (2016) TensorFlow: a system for large-scale machine learning. In: 12th USENIX symposium on operating systems design and implementation (OSDI 16), pp 265–283 19. Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A (2017) Automatic differentiation in PyTorch, in NIPS 2017 Workshopon Autodiff. https://openreview.net/forum?id=BJJsrmfCZ 20. Lu L, Meng X, Mao Z, Karniadakis GE (2021) DeepXDE: A deep learning library for solving differential equations. SIAM Rev 63(1):208–228 21. Deng Y, Lin G, Yang X (2020) Multifidelity data fusion via gradient-enhanced Gaussian process regression. arXiv preprint arXiv:2008.01066 22. 
Yu J, Lu L, Meng X, Karniadakis GE (2022) Gradient-enhanced physics-informed neural networks for forward and inverse PDE problems. Comput Methods Appl Mech Eng 393:114823 23. Mehra M, Kumar V (2007) Fast wavelet-Taylor Galerkin method for linear and non-linear wave problems. Appl Math Comput 189(2):1292–1299
24. Kumar V, Srinivasan B (2019) A novel adaptive mesh strategy for singularly perturbed parabolic convection diffusion problems. Differ Equ Dyn Syst 27(1):203–220 25. Li Jing, Feng Zongcai, Schuster Gerard (2017) Wave-equation dispersion inversion. Geophys J Int 208(3):1567–1578 26. Gu J, Zhang Y, Dong H (2018) Dynamic behaviors of interaction solutions of (3 + 1)− dimensional shallow water wave equation. Comput Math Appl 76(6):1408–1419 27. Kim D (2019) A modified PML acoustic wave equation. Symmetry 11(2):177 28. Kant S, Kumar V (2015) Analysis of an eco-epidemiological model with migrating and refuging prey. In: Mathematical analysis and its applications, Springer, New Delhi, pp 17–36 29. Arora C, Kumar V, Kant S (2017) Dynamics of a high-dimensional stage-structured preypredator model. Int J Appl Comput Math 3(1):427–445
Automated Lesion Image Segmentation Based on Novel Histogram-Based K-Means Clustering Using COVID-19 Chest CT Images S. Nivetha and H. Hannah Inbarani
Abstract COVID-19 has wreaked havoc in the world, causing epidemic scenarios in the majority of countries. As a result, medical practitioners urgently seek an early-detection, rapid, and precise diagnosis technique for COVID-19 infection, and a reliable automated method for identifying and quantifying infected lung areas would be extremely beneficial. Discovering infected regions in chest Computed Tomography (CT) scan images is a crucial approach to halt the transmission of the pathogen. Segmentation of lung CT images is the first step in lung image analysis. Due to intensity inhomogeneity and the existence of artifacts, the primary problems of segmentation algorithms are accentuated. This research proposes a novel image segmentation method, Novel Histogram-Based K-Means Clustering (NHKMC), to localize the diseased lesion. The average Dice Similarity Coefficient (DSC), Accuracy, Sensitivity, Specificity, and Structural Similarity Index Method (SSIM) for the proposed NHKMC segmentation task are obtained as 86.00%, 82.00%, 86.07%, 86.18%, 85.09%, and 86.04%, respectively. The outcomes of the experiments demonstrate that the proposed Novel Histogram-Based K-Means Clustering (NHKMC) approach performs well and has a great deal of potential for segmenting the COVID-19 lesion region of CT scan images. Keywords Clustering · Novel Histogram-Based K-Means Clustering · Lesion · Segmentation · K-Means segmentation
S. Nivetha · H. Hannah Inbarani (B) Department of Computer Science, Periyar University, Salem, India e-mail: [email protected] S. Nivetha e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_55
1 Introduction
The United Nations health agency designated the new contagious disease caused by the novel coronavirus as Coronavirus Disease 2019 (COVID-19), and the virus itself was named SARS-CoV-2 by the International Committee on Taxonomy of Viruses
(ICTV) [1, 2]. People with chest infection or pneumonia are hospitalized for diagnostic procedures, which can include laboratory-based and other types of diagnostics to evaluate the infection's etiology, region, and severity. Laboratory testing includes standard procedures such as Arterial Blood Gas (ABG) tests, common blood tests, and tests for pleural effusion (a collection of fluid between the tissue layers that cover the chest cavity and lungs); as noted by Bhandary et al. [3], it may require time-consuming sample transportation from the hospital to the test center. Non-laboratory testing, on the other hand, includes digital Chest Radiography or Computed Tomography scan techniques that are used to investigate lung areas utilizing computer-assisted image analysis techniques. Inflammation growths in the lungs can be dangerous to a person's health. In medical image analysis, image segmentation is a critical stage. It additionally permits the extraction of vital properties such as tissue texture and form [4]. Segmentation is crucial in COVID-19 image processing and analysis for predicting illness severity. In chest X-ray or Computed Tomography (CT) images, the lung, lobes, bronchopulmonary segments, and infectious regions or lesions must be delineated for further assessment and quantification. CT scans have been found to detect common signs of infection, such as Ground-Glass Opacity (GGO) in the initial phases and Pulmonary Infiltrates in the final phases, according to recent research [5]. A GGO is a lung area with higher attenuation and distinct bronchi and arterial signs on Computed Tomography (CT). Manual observation, identification of textural information, and feature extraction from the lung are time-consuming, difficult, and tedious. The main novelty of the paper is given as follows:
1. In this research, the Novel Histogram-Based K-Means Clustering (NHKMC) procedure is presented for the segmentation of COVID-19-infected areas and delineation of the overall lung from chest Computed Tomography (CT) images.
2. In the Novel Histogram-Based K-Means Clustering Algorithm, the number of regions is initialized based on the peak values of the image histogram. This value helps to separate the foreground as the lung region and the other regions as background and infected lesion areas.
3. The proposed model is used to evaluate COVID-19 CT images from GitHub datasets (https://github.com/UCSD-AI4H/COVID-CT).
4. The proposed model outperformed other approaches such as K-Means-based segmentation and multi-Otsu thresholding segmentation.
The novel idea behind this work is that the number of regions in the CT image is automatically learned based on histogram peaks, and the dataset taken contains a varying number of regions; i.e., some COVID-CT images have only three regions, such as the lung region, lesion region, and breast tissue region, and some images have the lung region, breast tissue, spinal cord, and body fat under the skin. In a medical image, the identification of regions and the localization of the infected region cannot be determined automatically. The novelty of the proposed work is that the number of regions can be detected automatically using the peak values of the histogram.
2 Related Work In recent years, various lung segmentation and nodule detection methods and a few algorithms based on machine learning have been reported. The overview of relevant literature is shown in Table 1.
3 Dataset and Its Description
Images of the lungs from CT scans of COVID-19-positive patients and healthy people are used to implement the proposed strategy. The database includes 397 CT-non-COVID images and 349 CT-COVID-19-positive images from 216 individuals [18]. The minimum, average, and maximum heights are 153, 491, and 1853 pixels, respectively, and the minimum, average, and maximum widths are 124, 383, and 1485 pixels, respectively. The input CT images come in a variety of sizes and image formats (JPEG, PNG). The input images are first downsized to 256 × 256 and converted to the PNG image format to maintain uniformity.
4 K-Means Clustering Algorithm
The classification of an image into different categories is known as image segmentation. For the proper evaluation and interpretation of medical images, image segmentation is a crucial step [19]. The K-Means algorithm, proposed by J. B. MacQueen, is one of the partitioning-based clustering approaches [20]. The K-Means clustering algorithm is given in [21].
5 Multi-Otsu Thresholding
Thresholding is an important image segmentation technique that uses the distribution of gray levels to detect and separate a target from its background. The global thresholding analysis by Pun [22] found that Otsu's method was among the best threshold selection techniques for generic everyday images in terms of uniformity and shape metrics. With more classes in an image, Otsu's approach takes too long to be viable for multilevel thresholding. The multi-Otsu thresholding approach is depicted in [23].
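As a hedged illustration (not necessarily the implementation referenced in [23]), scikit-image provides a multi-Otsu routine that returns the thresholds separating an image into several intensity classes; the file name and the choice of three classes are assumptions.

```python
import numpy as np
from skimage import io, filters

# Hypothetical grayscale chest CT slice
image = io.imread("ct_slice.png", as_gray=True)

# Multi-Otsu returns (classes - 1) thresholds; here 3 classes -> 2 thresholds
thresholds = filters.threshold_multiotsu(image, classes=3)

# Assign each pixel to a class (0, 1, or 2) according to the thresholds
regions = np.digitize(image, bins=thresholds)
print(thresholds, np.unique(regions))
```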
Table 1 Summary of related work
| S. No. | Authors | Year | Dataset | Methods |
| 1 | Dhruv et al. [6] | 2022 | 15 chest CT images of patients who suffered from COVID-19 | Fuzzy C-means clustering-based segmentation |
| 2 | Rathod and Khanuja [7] | 2022 | https://www.kaggle.com/luisblanche/covidct | K-Means clustering |
| 3 | Abd Elaziz et al. [8] | 2021 | Images collected from many datasets, including CheX (also known as CheXpert), OpenI, Google, PC and PadChest, NIH Chest X-ray14, and MIMIC-CXR | Density peak clustering based on generalized extreme value |
| 4 | Hussein et al. [9] | 2021 | China National Center for Bioinformation | AI-based medical hub platform |
| 5 | Chakraborty et al. [10] | 2021 | 115 chest CT scan images | Super pixel-based fuzzy-modified flower pollination algorithm |
| 6 | Kumar et al. [11] | 2021 | https://coronacases.org | Fast fuzzy C-means clustering |
| 7 | Farki et al. [12] | 2021 | Datasets collected from Sun Yat-sen Memorial Hospital, Wuhan University's Renmin Hospital, and Sun Yat-sen University in Guangzhou | Fuzzy C-ordered means with an improved version of the enhanced capsule network (ECN) |
| 8 | Akbari et al. [13] | 2020 | https://medicalsegmentation.com/covid19/ | Comparison study on active contour models |
| 9 | Medeiros et al. [14] | 2019 | 72 lung images of donor lungs, either healthy or with fibrosis | Fast morphological geodesic active contour (FGAC) method |
| 10 | Paulraj et al. [15] | 2019 | Real dataset: 3D CT lung image dataset | Possibilistic fuzzy C-means (PFCM) approach |
| 11 | Shariaty et al. [16] | 2019 | LIDC-IDRI database | Thresholding methods |
| 12 | Dai et al. [17] | 2015 | General Hospital of Ningxia Medical University | Improved graph cuts algorithm with Gaussian mixture models (GMMs) |
6 Proposed Novel Histogram-Based K-Means Clustering Algorithm
The segmentation of the lung is entirely automated in the proposed system, requiring no human input. The proposed segmentation method computes a histogram of the provided image and analyzes it to choose the peak values automatically. Based on these peak values, the number of regions is determined. The image's outer portion is recognized and separated by color, and the remaining area is then processed further to extract the lungs. The lung segmentation approach makes use of histogram and image processing methods. As shown in Fig. 1, the converted PNG image contains four parts: a black backdrop; lungs in dark gray; the lesion region; and the spinal cord, ribs, and shoulder blades in the white region. Some of the images contain five parts: a black backdrop; a circular area of dark gray; a brighter area; the lungs in a dark gray tone; and a lesion region. A graphical depiction of the number of pixels in an image as a function of their intensity is called an image histogram. The relative frequency of occurrence of each pixel value in a digital image (ranging from 0 to 255) is shown in the histogram of the image. The histogram, which is made up of bins, provides a thorough and easy-to-understand overview of the image intensity. In this research, the Novel Histogram-Based K-Means Clustering (NHKMC) is proposed to segment the infected lesion. The peak values of an image's histogram are found, and the region numbers are initialized from them; this eliminates the need for region numbers to be generated at random. Peaks in the histogram represent the image's more common values, which usually correspond to nearly uniform regions, while the valleys in the histogram represent less common values. For some of the images in the dataset, the histograms of the lung images are presented in Figs. 2 and 3. The histogram exhibits four distinct peaks. There is one extremely high peak at 0, which corresponds to the image's dark background. The dark gray circular band enclosing the luminous region gives rise to the second peak, which is located at approximately gray level 110. The third peak is generated by the intensities of the lungs and the patches inside the bright region surrounding the lungs, and it is located around gray level 35. A high peak at 255 corresponds to the white region mostly around the lungs. Thus, from the peak range values, we can segment the background region, lesion region, brighter region, and spinal cord region
Fig. 1 CT scan for the lung in four to five regions for the proposed NHKMC
The proposed Novel Histogram-Based K-Means Clustering approach is elucidated in Algorithm 1.

Algorithm 1. Proposed Novel Histogram-Based K-Means Clustering algorithm
Input: P(x, y) = input image, K = number of clusters (peak values of the histogram), Ck = region centroids
Output: Segmented image
Fig. 2 Segmentation mask results of NHKMC
Algorithm 1 (continued)
Step 1: The peak values of the histogram are calculated, giving the number of regions K and the initial centers Ck:
K = Hist(P(x, y))  (1)
where P(x, y) denotes a pixel of the image.
Step 2: Compute the Euclidean distance d between each center and the individual pixels:
d = ||P(x, y) − Ck||  (2)
where d is the Euclidean distance and Ck is a center.
Step 3: Assign every pixel to the nearest center based on the distance d.
Step 4: After all pixels have been assigned, recompute the position of each center as
Ck = (1/k) Σ_{y∈Ck} Σ_{x∈Ck} P(x, y)  (3)
Step 5: Repeat the procedure until the error criterion is met.
Step 6: Reshape the clustered pixels back into the image regions.
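To make the procedure concrete, the following Python/NumPy sketch implements the idea of Algorithm 1 for a grayscale CT slice: histogram peaks seed the number of regions and the initial centers, followed by ordinary intensity K-means iterations. The peak-detection thresholds and helper names are illustrative assumptions rather than the authors' exact implementation.

```python
# Hypothetical sketch of the NHKMC idea: histogram peaks give K and the initial
# centers (Eq. 1), then standard intensity K-means iterations follow (Eqs. 2-3).
import numpy as np
from scipy.signal import find_peaks

def nhkmc_segment(image, min_peak_height=500, min_peak_distance=20, max_iter=100):
    pixels = image.astype(np.float64).ravel()
    hist, _ = np.histogram(pixels, bins=256, range=(0, 255))
    # Step 1: histogram peaks define the number of regions and the centers C_k
    peaks, _ = find_peaks(hist, height=min_peak_height, distance=min_peak_distance)
    centers = peaks.astype(np.float64)
    if centers.size == 0:                      # fall back to a single cluster
        centers = np.array([pixels.mean()])
    for _ in range(max_iter):
        # Steps 2-3: assign each pixel to the nearest center (1-D Euclidean distance)
        d = np.abs(pixels[:, None] - centers[None, :])
        labels = d.argmin(axis=1)
        # Step 4: recompute each center as the mean of its assigned pixels
        new_centers = np.array([pixels[labels == k].mean() if np.any(labels == k)
                                else centers[k] for k in range(centers.size)])
        if np.allclose(new_centers, centers):  # Step 5: stop when centers no longer move
            break
        centers = new_centers
    # Step 6: reshape the labels back into the image grid
    return labels.reshape(image.shape), centers
```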
7 Assessment Criteria for Segmentation Models

The effectiveness of the image segmentation system is assessed using standard and well-known criteria that allow the system to be compared to existing methodologies. This work uses several evaluation measures to assess the performance of the Novel Histogram-Based K-Means Clustering approach.

Sorensen–Dice similarity. The Dice similarity coefficient measures the quality of the segmented regions against the ground truth [24]:
Dice similarity coefficient = 2TP / (2TP + FP + FN)  (4)

Intersection over Union (IoU). The overlap between the prediction and the ground truth is measured by the intersection over union (IoU) [24]:
Intersection over union = TP / (TP + FP + FN)  (5)
Accuracy. Accuracy is defined as the proportion of correct predictions among the total number of predictions, as shown in Eq. (6) [25]:
Fig. 3 Lung and lesion segmentation using NHKMC
Accuracy = (TP + TN) / (TP + FP + TN + FN)  (6)
Sensitivity. The recall, given in Eq. (7), is the ratio of accurately identified positive cases [26]:
Sensitivity = TP / (TP + FN)  (7)
Specificity. The ratio of successfully predicted negative class samples to all negative class samples is known as specificity [27]:
Specificity = TN / (TN + FP)  (8)
*TP, TN, FP, and FN stand for true positive, true negative, false positive, and false negative, respectively.

Structural Similarity Index Method. The structural similarity index measures how similar two images are [28]:
SSIM(A, B) = ((2μA μB + C1)(2σAB + C2)) / ((μA² + μB² + C1)(σA² + σB² + C2))  (9)
where μA and μB are the mean values of the input and deformed images, σA and σB are the standard deviations of the input and deformed images, σAB is the covariance of the two images, and C1 and C2 are stabilizing constants.
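For reference, the metrics in Eqs. (4)–(9) can be computed from a predicted binary mask and its ground truth as in the following illustrative NumPy sketch; the SSIM call relies on scikit-image as an assumed dependency, and this is not the authors' evaluation code.

```python
# Illustrative computation of the segmentation metrics for a predicted binary mask
# and its ground truth (Eqs. 4-9).
import numpy as np
from skimage.metrics import structural_similarity  # assumed available for Eq. (9)

def evaluate_mask(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return {
        "dice":        2 * tp / (2 * tp + fp + fn),
        "iou":         tp / (tp + fp + fn),
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ssim":        structural_similarity(pred.astype(float), gt.astype(float),
                                             data_range=1.0),
    }
```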
8 Experimental Results and Discussion

The proposed NHKMC segmentation models were implemented on a Windows system. The Novel Histogram-Based K-Means Clustering (NHKMC) was implemented using the Anaconda software. This paper used the peak values found in the histogram analysis to estimate the ideal K value, and four or five regions were chosen for K. In Fig. 2, the input image is shown in the first column, the second column shows the histogram of the image, the third column shows the segmented image, and the fourth and fifth columns show the lung and lesion masks. In Fig. 3, the fourth and fifth columns show the lung and lesion regions of the image for the proposed NHKMC. Figure 4 shows the output of the K-Means clustering-based segmentation: the first column depicts the input image, the second column the segmented image, and the third and fourth columns the lung and lesion masks. Figures 5 and 6 show the output of the multi-Otsu thresholding technique: the first column depicts the original image, the second column the segmented image, and the third and fourth columns the lung and lesion masks. In Fig. 6, the fourth column shows the lung and lesion regions of the image.
Fig. 4 Segmentation results of K-Means clustering algorithm
9 Evaluation Results of Metrics Table 2 depicts the values of validation measures for the various segmentation algorithms on COVID-19 images.
Fig. 5 Lung and lesion segmentation using multi-Otsu thresholding
10 Conclusion

Machine learning methods are used to diagnose COVID-19 effectively from chest CT images. The initial step in assessing severity and predicting diagnosis in COVID patients is to employ segmentation models, which can aid resource allocation and patient care. For high-risk individuals, it is important to segment the lesion
Fig. 6 Segmentation results of multi-Otsu thresholding

Table 2 Analysis of validation measures for COVID images
Method | DSC | IOU | SENSI | SPECI | SSIM | ACC
Proposed NHKMC | 86.00 | 82.00 | 86.07 | 86.18 | 85.09 | 86.04
K-Means clustering | 80.07 | 78.77 | 80.00 | 88.00 | 80.09 | 80.74
Multi-Otsu thresholding | 79.05 | 79.48 | 79.00 | 79.00 | 79.12 | 78.45
regions in CT images and distinguish them from the surrounding lung tissue. This paper examined the performance of the Novel Histogram-Based K-Means Clustering (NHKMC) approach for detecting lesion areas in medical images of COVID-19 patients' lungs. The findings revealed the ability of NHKMC to differentiate between infected areas and lung areas in the images. The proposed NHKMC segmentation method yielded an average Jaccard distance of 0.03 and a Dice coefficient of 0.01, which is a robust performance.

Acknowledgements The first author is extremely grateful for the financial support provided by the University Research Fellowship Program, Periyar University, Salem, for carrying out this research in the Department of Computer Science, Periyar University, Salem, Tamil Nadu, India. The second author gratefully acknowledges the UGC-Special Assistance Program (SAP) funding at the level of DRS-II in the Department of Computer Science, Periyar University, Salem, Tamil Nadu, India.
References 1. Ahmadi M, Sharifi A, Dorosti S, Ghoushchi SJ, Ghanbari N (2020) Investigation of effective climatology parameters on COVID-19 outbreak in Iran. Sci Total Environ 729:138705. Elsevier 2. Zhang J, Chu Y, Zhao N (2020) Supervised framework for COVID-19 classification and lesion localization from chest CT. Ethiop J Health Dev 34(4):235–342 3. Bhandary A, Prabhu GA, Rajinikanth V, Thanaraj KP, Satapathy SC, Robbins DE, Shasky C, Zhang Y-D, Tavares JMRS, Raja NSM (2020) Deep-learning framework to detect lung abnormality—a study with chest X-Ray and lung CT scan images. Pattern Recogn Lett 129:271–278. Elsevier 4. Havaei M, Davy A, Warde-Farley D, Biard A, Courville A, Bengio Y, Pal C, Jodoin P-M, Larochelle H (2017) Brain tumor segmentation with deep neural networks. Med Image Anal 35:18–31. Elsevier 5. Ai T, Yang Z, Hou H, Zhan C, Chen C, Lv W, Tao Q, Sun Z, Xia L (2020) Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases. Radiology 296(2):E32–E40 6. Dhruv B, Mittal N, Modi M (2022) Hybrid particle swarm optimized and fuzzy C means clustering based segmentation technique for investigation of COVID-19 infected chest CT. Comput Methods Biomech Biomed Eng: Imaging Vis 1–8. Taylor & Francis 7. Rathod SR, Khanuja HK (2022) COVID-19 segmentation and classification from CT scan images. J Sci Res 66(2):40–45. Special Issue-The Banaras Hindu University 8. Abd Elaziz M, Al-Qaness MAA, Abo Zaid EO, Lu S, Ali Ibrahim R, Ewees AA (2021) Automatic clustering method to segment COVID-19 CT images. PLoS One 16(1):e0244416 9. Hussein K, Hussein A, Chehab A (2021) AI-based image processing for COVID-19 detection in chest CT scan images. AI-based CT-scan analysis for COVID-19 detection. Frontiers 2 10. Chakraborty S, Mali K (2021) SuFMoFPA: a superpixel and meta-heuristic based fuzzy image segmentation approach to explicate COVID-19 radiological images. Expert Syst Appl 167:114142 11. Kumar SN, Ahilan A, Fred AL, Kumar HA (2021) ROI extraction in CT lung images of COVID19 using fast fuzzy C means clustering. In: Biomedical engineering tools for management for patients with COVID-19. Academic Press, pp 103–119 12. Farki A, Salekshahrezaee Z, Tofigh AM, Ghanavati R, Arandian B, Chapnevis A (2021) Covid19 diagnosis using capsule network and fuzzy-means and mayfly optimization algorithm. BioMed Res Int. Hindawi
13. Akbari Y, Hassen H, Al-Maadeed S, Zughaier SM (2021) COVID-19 lesion segmentation using lung CT scan images: comparative study based on active contour models. Appl Sci 11 (MDPI.178039) 14. Medeiros AG, Guimarães MT, Peixoto SA, Santos LDO, da Silva Barros AC, Rebouças EDS, de Albuquerque VHC, Rebouças Filho PP (2019) A new fast morphological geodesic active contour method for lung CT image segmentation. Measurement 148:106687 15. Paulraj T, Chelliah KSV, Chinnasamy S (2019) Lung computed axial tomography image segmentation using possibilistic fuzzy C-means approach for computer aided diagnosis system. Int J Imaging Syst Technol 29(3):374–381. Wiley 16. Shariaty F, Hosseinlou S, Rud VY (2019) Automatic lung segmentation method in computed tomography scans. J Phys: Conf Ser 1236:012028. IOP Publishing 17. Dai S, Lu K, Dong J, Zhang Y, Chen Y (2015) A novel approach of lung segmentation on chest CT images using graph cuts. Neurocomputing 168:799–807. ACM Digital Library 18. https://github.com/UCSD-AI4H/COVID-CT 19. Moftah HM, Azar AT, Al-Shammari ET, Ghali NI, Hassanien AE, Shoman M (2014) Adaptive k-means clustering algorithm for MR breast image segmentation. Neural Comput Appl 24(7):1917–1928. Springer 20. Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Inc. 21. Thangarasu M, Hannah Inbarani H (2015) Analysis of K-means with multi view point similarity and cosine similarity measures for clustering the document. Int J Appl Eng Res 10(9):6672– 6675 22. Pun T (1980) A new method for grey-level picture thresholding using the entropy of the histogram. Signal Process 2(3):223–237 23. Liao P-S, Chen T-S, Chung P-C (2001) A fast algorithm for multilevel thresholding. J Inf Sci Eng 17:713–727 24. Garcia-Garcia A, Orts-Escolano S, Oprea S, Villena-Martinez V, Martinez-Gonzalez P, GarciaRodriguez J (2018) A survey on deep learning techniques for image and video semantic segmentation. Appl Soft Comput 70:41–65. Elsevier 25. Nivetha S, Hannah Inbarani H (2022) Neighborhood rough neural network approach for COVID-19 image classification. Neural Process Lett 1–23. Springer 26. Nivetha S, Hannah Inbarani H (2022) Classification of COVID-19 CT scan images using novel tolerance rough set approach. In: Machine learning for critical internet of medical things. Springer, Cham, pp 55–80 27. Iglesias JE, Liu C-Y, Thompson PM, Tu Z (2011) Robust brain extraction across datasets and comparison with publicly available methods. IEEE Trans Med Imaging 30(9):1617–1634 28. Hore A, Ziou D (2010) Image quality metrics: PSNR vs. SSIM. In: 2010 20th international conference on pattern recognition. IEEE, pp 2366–2369
Real-Time Operated Medical Assistive Robot Ann Mariya Lazar , Binet Rose Devassy , and Gnana King
Abstract In the context of the increasing workload of nurses in hospitals, and also to reduce human intervention, a real-time operated medical assistive robot is proposed here. It is a line-following robot that can deliver medicine and food to the patients, forward emergency alerts from the patients to the nurse, and disinfect the ward. The nurse can control it, and it takes over part of a nurse's role in the patient's ward. The design is such that it can function only in wards arranged with the line-following path. Line following is realized with IR sensors and DC motors driven by an L293D driver. A Raspberry Pi monitors the state of the robot and also retrieves the medicine information. The timings for food and medicine delivery and for disinfection of the ward are set by the nurse. A carrier with LED-equipped slots holds the food and medicines of each patient; when the robot reaches the destination, the corresponding LED turns ON to indicate that particular patient's slot. During emergencies, by pressing the emergency switch on the robot, the patients can notify the nurse through the Blynk mobile application. A servo motor is used for disinfecting the ward. Since social distancing is highly valued in the present scenario, this robot can create a great impact on society. Experimental results show that its features not only help the nurse but are also useful for the patients and their bystanders. Keywords Line following method · Medical assistive robot · Nurse robot · Robotic motion · Time settings
1 Introduction Nowadays, in the hospitals the workload of nurses is increasing. Especially, the pandemic situation is changing their lifestyle and they are very much struggling with their daily routine. This will affect the overall activities in the hospital and also to the non-COVID patients. In recent years, modern society shows an increasing interest A. M. Lazar (B) · B. R. Devassy · G. King Department of Electronics and Communication Engineering, Sahrdaya College of Engineering and Technology, Kodakara, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_56
toward nurse robot technologies [1]. Furthermore, this demand is expected to increase exponentially in the coming years, because many patients suffer from depending on others for their daily activities. Therefore, a new 'medical assistive robot' is evolving. From a patient's point of view, a nurse's presence itself is a good treatment for recovery. Even though nurses will not be replaced by robots, a line-following robot that can move along a particular path is proposed here [2]. The design is such that the robot can function only in a patient's ward with that kind of arrangement. It focuses mainly on three features: delivering medicine and food to the patients, forwarding emergency alerts from the patients to the nurse, and disinfecting the ward. The robot has one Raspberry Pi single-board computer which monitors the state of the robot and also retrieves the medicine information. The nurse can set the timings for food and medicine delivery. The robot has enough space to carry the medicines and food, so medicine and food delivery is possible [3]. The line following is realized with infrared sensors, and DC motors with an L293D driver provide the motion. The emergency switch enables the bystanders or even the patients to inform the nurse during emergencies [4]. A servo motor is used for disinfecting the ward [5]. It reduces human intervention while disinfecting the rooms, especially wards accommodating patients with COVID-19 or other infectious diseases. Emerging technologies always look to reduce human effort, which is why a robot is proposed to assist the nurse. In the present scenario, social distancing is highly valued, so introducing a medical assistive robot will create a great impact on society.
2 Related Works

According to the evaluation by Patel et al. [6], a MedBuddy robot was created using the Arduino Uno. The robot is equipped with a spare smartphone whose camera provides the live feed to the application, which is developed using MIT App Inventor. According to Bono et al. [7], the implant is a combination of absorbable and non-absorbable materials, with the latter connected by a tether to the skin; during removal, the device is disassembled in place. Zhao et al. [8] presented a paper which introduces a novel chlorine dioxide (ClO2) sterilization technology used to reduce bacteria and viruses present in the air and on surfaces. A line follower robot was made by Hadi [9] to transfer the patient from the ambulance to the stone room; it is driven by an Arduino program, and the motion follows the lines on the floor of the hospital. Guo et al. [10] researched a new motorized robotic bed mover with omnidirectional mobility. This device consists of an omnidirectional mobility unit, a force
sensing-based human–machine interface, and control hardware with batteries and electronics. Mišeikis et al. [11] proposed a multifunctional mobile robot to be implemented for personal care and human–robot interaction. It is with ROS-enabled setup. An operator-free intelligent robot system was implemented by Turnip et al. [12]. It is for the medical services to reduce the usage of personal protective healthcare equipment. In the blood removal, to fulfill the surgical task, Su et al. [13] designed a robotic system consisting of a pair of dual cameras. According to the impedance control method, Chen et al. [14] proposed a remote human–robot collaborative strategy. Through this, the experience of medical practitioners can integrate to the automatic sampling process. Sayed et al. [15] proposed an approach for a game setting in hospital field. It is for the field where Hexapod is used to scan, and a map of the field of the hospital environment can be drawn.
3 Design and Build Specifications

3.1 Components Required

The components with their specifications are summarized in Table 1. A Raspberry Pi 3 Model B+ is used as the microcontroller here [16]. Two IR sensors with 3.3 V input are used on the right and left sides of the robot for the line-following movement. Two DC motors (10 rpm, 12 V) are connected to the wheels of the robot for motion. Since the DC motors need a high current, which cannot be provided by the Raspberry Pi, the L293D motor driver is used, and a 12 V battery is connected to the driver. A 50 Hz metal-gear servo motor (MG996R) is used for the disinfection of the ward [17].

Table 1 Components of real-time operated medical assistive robot
Sr. No. | Component | Specifications
1 | Raspberry Pi 3 | Model B+
2 | IR sensor | 3.3 Vin
3 | DC motors | 10 rpm, 12 V
4 | Motor driver | L293D
5 | Battery | 12 V
6 | Servo motor | MG996R
7 | LED | Red, 2 V
8 | Resistors | 220 Ω
9 | Toggle button | Emergency switch
10 | Blynk mobile app | To access notification
Four red LEDs are connected to the Raspberry Pi via 220 Ω resistors. A toggle button (push button) acts as the emergency switch. Blynk is a mobile application that can be installed from the Play Store and is used for notification access.
3.2 Block Diagram The block diagram is shown in Fig. 1. Raspberry Pi 3 Model B + is used as the main microcontroller here. Two IR sensors connected on either side of the robot make the line following movement. There will be a servo motor, for disinfecting the ward. The robotic wheels motion is by the two DC motors, and a motor driver L293D is used for this. The motors get the high current by a 12 V battery power supply which is connected to the driver. Hence, the driver acts as an intermediate between the Raspberry Pi’s 3.3 V and motor’s 12 V. At the time when the Pi becomes powered, gets 3.3 V, and the IR sensor detects black, the motor takes 12 V from the battery and starts to rotate. Four red color LEDs connected by resistors are used to represent each patient’s food medicine carrier. An emergency switch is used to alert the nurse during emergencies. Message reaches the nurse through the Blynk mobile application.
3.3 Circuit Diagram

Figure 2 shows the circuit diagram of the medical assistive robot. The GPIO pin numbering of the Raspberry Pi 3 Model B+ is used here. The two IR sensors (3.3 Vin) are connected to the 13th and 15th pins of the Pi. Since the DC motors need a high current, a 12 V battery is connected via the L293D driver. The battery is connected to the 8th pin of the driver; the 2nd, 7th, 10th, and 15th input pins of the driver are taken from the 5th, 3rd, 11th, and 7th pins of the Raspberry Pi. Output pins 3 and 6 of the driver are connected to one DC motor, and pins 11 and 14 are connected to the other DC motor. When the Pi is powered and the IR sensors detect black, the motors rotate by drawing 12 V from the battery. The positive and negative terminals of the battery are connected to Vin and ground of the driver, respectively. The signal pin of the servo motor is connected to the 40th pin of the Pi, its 5 V supply is taken from the 16th pin of the driver, and its ground wire is grounded. The four LEDs are connected to pins 31, 33, 35, and 37 of the Pi via four 220 Ω resistors. The push-button switch is connected to the 29th pin of the Raspberry Pi. Emergency notification messages are received through the Blynk mobile app.
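The wiring just described can be expressed as a short RPi.GPIO setup sketch (physical/BOARD pin numbering). The pin roles follow the text above; the pull-down on the emergency switch and the variable names are assumptions, and this is not the authors' code.

```python
# Minimal RPi.GPIO pin setup matching the circuit description (BOARD numbering).
import RPi.GPIO as GPIO

IR_LEFT, IR_RIGHT = 13, 15            # IR sensor outputs
MOTOR_PINS = (5, 3, 11, 7)            # Pi pins feeding the L293D input pins
SERVO_PIN = 40                        # disinfection servo signal
LED_PINS = (31, 33, 35, 37)           # one LED per patient slot
EMERGENCY_PIN = 29                    # push-button emergency switch

GPIO.setmode(GPIO.BOARD)
GPIO.setup([IR_LEFT, IR_RIGHT], GPIO.IN)
GPIO.setup(list(MOTOR_PINS) + list(LED_PINS), GPIO.OUT, initial=GPIO.LOW)
GPIO.setup(SERVO_PIN, GPIO.OUT)
GPIO.setup(EMERGENCY_PIN, GPIO.IN, pull_up_down=GPIO.PUD_DOWN)  # assumed pull-down

servo = GPIO.PWM(SERVO_PIN, 50)       # 50 Hz PWM for the MG996R servo
servo.start(0)
```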
Fig. 1 Block diagram of real-time operated medical assistive robot
3.4 Methodology Methodology of medical assistive robot is shown in Fig. 3. The first step is assembling the different parts to make a base of robotic structure, then interfacing the IR sensors and DC motor with the Raspberry Pi and then pasting the black insulation on the white floor to create a line following the path in the hospital ward. The robot can focus mainly on three features such as medicine food delivery to the patients, informing emergency alerts from the patients to the nurse and disinfecting the ward.
Fig. 2 Circuit diagram of real-time operated medical assistive robot
4 Proposed System

The robot moves along a line-following path created by pasting black insulation tape on the white floor. The implementation is such that the robot can function only in wards with this line-following path. Delivering medicine and food to the patients, forwarding emergency alerts from the patients to the nurse, and disinfecting the ward are the features incorporated in it. There is a tray with LED-equipped slots to carry the food and medicine for each patient, and a sanitizer bottle is used for disinfecting the ward. Nurses can set the food, medicine, and disinfection timings. At the set time, the robot moves along the line-following path and stops near the particular bed. While it is stopped, the corresponding LED is ON and the patients can easily identify
Fig. 3 Methodology of real-time operated medical assistive robot
their food and medicine. When the cleaning time arrives, the robot moves along the same line-following path and disinfects the ward with the disinfectant spray. The emergency switch can be pressed by the patients to alert the nurse at any time of emergency. Figure 4 shows different views of the robot. The line-following path and the robotic motion are shown in the front view in Fig. 4a. The top view in Fig. 4b shows the distribution of food and medicine; the carrier tray with LEDs indicates the particular patient's slot. Figure 4c shows the back view, which represents the disinfection process of the ward. The push-button emergency switch used to alert the nurse is shown in Fig. 4d.
Fig. 4 Proposed system a front view b top view c back view d side view
5 Experimental Results and Analysis A line following medical assistive robot is implemented for helping the nurse in the hospital. It has the features of medicine, food delivery, emergency alert and disinfection.
5.1 Line Following Method

The robotic motion is based on the line following method; that is, the wards are designed with the special layout shown in Fig. 5. Each T-junction in the black insulation represents a patient's bed, and the robot moves only along this insulated path. When neither IR sensor detects black, the two DC motors rotate in the same direction simultaneously, so both wheels of the robot move forward. When the right sensor detects black, the right motor and
Fig. 5 Line following path
the right wheel stop and only the left motor will rotate, causing the left wheel to turn right. Similarly, when the black is detected by the left sensor, the left wheel stops and the right motor rotates, which leads the right wheel to turn left. If both IR sensors detect the black, the two motors and two wheels stop rotation, which makes the robot stop.
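The behaviour described in this subsection can be sketched as a simple polling loop; it assumes the IR sensors read HIGH on the black tape and reuses the pin assignment from Sect. 3.3, so it is an illustration rather than the authors' implementation.

```python
# Line-following loop: both sensors off the line -> forward; one sensor on the
# line -> pivot towards it; both on the line (T-junction) -> stop.
import time
import RPi.GPIO as GPIO

IR_LEFT, IR_RIGHT = 13, 15
LEFT_FWD, LEFT_BWD, RIGHT_FWD, RIGHT_BWD = 5, 3, 11, 7   # assumed L293D input mapping

def drive(left_on, right_on):
    GPIO.output(LEFT_FWD, GPIO.HIGH if left_on else GPIO.LOW)
    GPIO.output(RIGHT_FWD, GPIO.HIGH if right_on else GPIO.LOW)
    GPIO.output(LEFT_BWD, GPIO.LOW)
    GPIO.output(RIGHT_BWD, GPIO.LOW)

def follow_line_step():
    left_black = GPIO.input(IR_LEFT)     # assumed HIGH when black tape is detected
    right_black = GPIO.input(IR_RIGHT)
    if left_black and right_black:       # T-junction / bed position: stop
        drive(False, False)
    elif right_black:                    # right sensor on the line: turn right
        drive(True, False)
    elif left_black:                     # left sensor on the line: turn left
        drive(False, True)
    else:                                # on track: both wheels forward
        drive(True, True)

while True:
    follow_line_step()
    time.sleep(0.01)
```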
5.2 Medicine and Food Delivery to the Patients

The nurse can set the food and medicine timings in the robot and fill each patient's tray slot with the corresponding tablets. A tray-like carrier divided into different slots carries the food and medicine for each patient. According to the time settings, the robot moves to the corresponding bed and delivers the food or medicine; the Raspberry Pi retrieves the medicine information, as in Fig. 6. When the robot reaches the allotted bed, the corresponding slot's LED turns ON, which helps the patients to identify their medicines easily.
Fig. 6 Food and medicine distribution to the a first patient and b second patient
Fig. 7 a Pressing push button switch in emergencies, b notification message through Blynk mobile app
5.3 Informing Emergency Alerts from the Patient to the Nurse

An emergency switch helps the bystanders or even the patients to inform the nurse during emergencies. When the emergency switch is pressed, as in Fig. 7, the nurse in the nursing station receives a notification via the Blynk mobile application. The nurse is alerted by a notification stating 'Need Help'. This helps the patients to get assistance at the proper time.
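A possible shape of the emergency-alert logic is sketched below. The notification helper is a placeholder: the real call depends on the Blynk account token and on whichever Blynk library or HTTP endpoint the authors configured, so both the URL and the helper function are hypothetical.

```python
# Poll the push button and, on a press, send a 'Need Help' notification.
import time
import requests            # assumed transport for the notification
import RPi.GPIO as GPIO

EMERGENCY_PIN = 29
BLYNK_NOTIFY_URL = "https://<blynk-server>/<auth-token>/notify"   # placeholder endpoint

def send_blynk_notification(message="Need Help"):
    # Hypothetical HTTP push; replace with the Blynk API call used in the deployment.
    requests.post(BLYNK_NOTIFY_URL, json={"body": message}, timeout=5)

def watch_emergency_button(poll_interval=0.05):
    while True:
        if GPIO.input(EMERGENCY_PIN):          # button pressed
            send_blynk_notification()
            time.sleep(2)                      # simple debounce / rate limit
        time.sleep(poll_interval)
```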
5.4 Disinfecting the Ward

The nurse can also set the cleaning times in the robot. During the cleaning time, the robot disinfects the ward as in Fig. 8. A 50 Hz servo motor (period 1/50 = 0.02 s = 20 ms) is used for this. A sanitizer bottle is kept near this motor in such a way that, at a particular angle of rotation, the motor presses the bottle and disinfects the ward. Duty cycles of 8 and 4 are applied, so the motor rotates between the angles corresponding to these duty cycles. The cleaning is done during the forward motion of the robot.
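The duty-cycle behaviour described above might be driven as follows, reusing the 50 Hz PWM servo object from the Sect. 3.3 sketch; the number of spray repetitions and the delays are illustrative assumptions.

```python
# Swing the servo between duty cycles 8 and 4 so it presses and releases the
# sanitizer bottle while the robot moves forward.
import time

def disinfect(servo, sprays=5):
    for _ in range(sprays):
        servo.ChangeDutyCycle(8)   # rotate towards the bottle and press it
        time.sleep(0.5)
        servo.ChangeDutyCycle(4)   # rotate back to release
        time.sleep(0.5)
    servo.ChangeDutyCycle(0)       # stop the pulse to relax the servo
```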
6 Conclusion A ‘real-time operated medical assistive robot’ is implemented which can take the role of a nurse. Even though simply a robot cannot replace a nurse, this robot helps the patients to fulfill their needs. Timely distribution of food and medicines to the patients, alerting the nurse properly and frequent cleaning of wards are the possible features of this robot. Line following method is used for the movement. Since modern society is always developing new technologies to reduce human power, the demand
Fig. 8 Disinfecting the ward by servo motor
for nurse robots will also increase. During this pandemic situation, implementing assistance in the medical service has great influence in the society.
References 1. Ahamed A, Ahmed R, Patwary MIH, Hossain S, Ul Alam S, al Banna H (2020) Design and implementation of a nursing robot for old or paralyzed person. In: 2020 IEEE region 10 symposium (TENSYMP), pp 594–597. https://doi.org/10.1109/TENSYMP50017.2020.923 0956 2. Zaman HU, Bhuiyan MMH, Ahmed M, Aziz SMT (2016) A novel design of line following robot with multifarious function ability. In: 2016 International conference on microelectronics, computing and communications (MicroCom), pp 1–5. https://doi.org/10.1109/Mic roCom.2016.7522507 3. Maan R, Madiwale A, Bishnoi M (2021) Design and analysis of ‘Xenia: the medi-assist robot’ for food delivery and sanitization in hospitals. In: 2021 2nd Global conference for advancement in technology (GCAT), pp 1–7. https://doi.org/10.1109/GCAT52182.2021.9587776 4. Tresa Sangeetha SV et al (2021) IoT based smart sensing and alarming system with autonomous guiding robots for efficient fire emergency evacuation. In: 2021 2nd International conference for emerging technology (INCET), pp 1–6. https://doi.org/10.1109/INCET51464.2021.9456142 5. Vimala P, Gokulakrishnan R (2021) Implementation of IOT based automatic disinfectant robot. In: 2021 International conference on system, computation, automation and networking (ICSCAN), pp 1–5. https://doi.org/10.1109/ICSCAN53069.2021.9526420 6. Patel A, Sharma P, Randhawa P (2021) MedBuddy: the medicine delivery robot. In: 2021 9th International conference on reliability, infocom technologies and optimization (Trends and Future Directions) (ICRITO), pp 1–4. https://doi.org/10.1109/ICRITO51393.2021.9596130 7. Del Bono V et al (2022) Non-surgical removal of partially absorbable bionic implants. IEEE Trans Med Robot Bionics 4(2):530–537. https://doi.org/10.1109/TMRB.2022.3155291 8. Zhao Y-L et al (2021) A smart sterilization robot system with chlorine dioxide for spray disinfection. IEEE Sens J 21(19):22047–22057. https://doi.org/10.1109/JSEN.2021.3101593
9. Hadi HA (2020) Line follower robot Arduino (using robot to control Patient bed who was infected with Covid-19 Virus). In: 2020 4th International symposium on multidisciplinary studies and innovative technologies (ISMSIT), pp 1–3. https://doi.org/10.1109/ISMSIT50672. 2020.9254906 10. Guo Z, Xiao X, Yu H (2018) Design and evaluation of a motorized robotic bed mover with omnidirectional mobility for patient transportation. IEEE J Biomed Health Inform 22(6):1775– 1785. https://doi.org/10.1109/JBHI.2018.2849344 11. Mišeikis J et al (2020) Lio-A personal robot assistant for human-robot interaction and care applications. IEEE Robot Autom Lett 5(4):5339–5346. https://doi.org/10.1109/LRA.2020.300 7462 12. Turnip A, Tampubolon GM, Ramadhan SF, Nugraha AV, Trisanto A, Novita D (2021) Development of medical robot Covid-19 based 2D mapping LIDAR and IMU sensors. In: 2021 IEEE international conference on health, instrumentation & measurement, and natural sciences (InHeNce), pp 1–4. https://doi.org/10.1109/InHeNce52833.2021.9537209 13. Su B et al (2021) Autonomous robot for removing superficial traumatic blood. IEEE J Transl Eng Health Med 9(2600109):1–9. https://doi.org/10.1109/JTEHM.2021.3056618 14. Chen Y-L, Song F-J, Gong Y-J (2020) Remote human-robot collaborative impedance control strategy of pharyngeal swab sampling robot. In: 2020 5th International conference on automation, control and robotics engineering (CACRE), pp 341–345. https://doi.org/10.1109/CAC RE50138.2020.9230152 15. Sayed AS, Ammar HH, Shalaby R (2020) Centralized multi-agent mobile robots SLAM and navigation for COVID-19 field hospitals. In: 2020 2nd Novel intelligent and leading emerging sciences conference (NILES), pp 444–449. https://doi.org/10.1109/NILES50944. 2020.9257919 16. Harshitha R, Hussain MHS (2018) Surveillance robot using raspberry pi and IoT. In: 2018 International conference on design innovations for 3Cs compute communicate control (ICDI3C), pp 46–51. https://doi.org/10.1109/ICDI3C.2018.00018 17. Lin J, Gu Z, Amir AM, Chen X, Ashim K, Shi K (2021) A fast humanoid robot arm for boxing based on servo motors. In: 2021 International conference on high performance big data and intelligent systems (HPBD&IS), pp 252–255. https://doi.org/10.1109/HPBDIS53214. 2021.9658471
Enhancing Graph Convolutional Networks with Variational Quantum Circuits for Drug Activity Prediction Pranshav Gajjar, Zhenyu Zuo, Yanghepu Li, and Liang Zhao
Abstract This study pertains to understanding the applicability of variational circuits and quantum enhancements for graph convolutional networks to predict drug activity accurately. The study empirically analyzes four graph convolutional approaches, namely the standard graph convolutions, edge-conditioned convolutions, Chebyshev convolutions, and the XENet methodology, along with their quantum-enhanced counterparts. The study also offers an extensive temporal analysis where each architectural strategy is evaluated in identical environments to assess the computation–utility trade-off. The proposed approach is thoroughly validated on an undersampled, publicly available HIV dataset from MoleculeNet in an unbiased manner. Each architecture is tested on two primary train–test splits along with experiments on stratified k-fold cross-validation and the associated median and mean percentage accuracy. Through extensive experimentation, the empirical analysis is able to justify the use of variational circuits, as the quantum-enhanced architectures showed superior performance in comparison with their classical counterparts, with the highest increase in accuracy being 8.9% for the XENet architectures. Keywords Graph convolutional network · Drug activity · Quantum circuits · Quantum neural networks · Machine learning
1 Introduction Accurate and efficient predictions of drug activities and the associated paradigms are considered an immensely important area of research, and it does depict substantial societal value [1]. To complete this and other related and similar problem statements, numerous machine learning-based methods have been developed. However, a wide P. Gajjar Institute of Technology, Nirma University, Gujarat, India P. Gajjar (B) · Z. Zuo · Y. Li · L. Zhao Graduate School of Advanced Integrated Studies in Human Survivability, Kyoto University, Kyoto, Japan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_57
array of predicaments is observed. The biggest predicament which is indicated in the underlying functionalities of these approaches is the string-based representation of these molecular structures, which cannot be considered a natural way for these computational intelligence-based tasks [1]. For predictive methodologies concerning medical informatics [2], object detection, and even cyber security [3, 4], numerous deep learning techniques have been developed, and their associated utility has proceeded to become a normal use case [5]. Through the same motivations, extensive development can be directed for a remedial effect to the currently observed predicaments. For perceiving structured data like images, the use of convolutional neural networks (CNNs) [6, 7] can be leveraged for their automated abilities to extract classifiable image features. However, due to the irregular shapes and sizes and the absence of spatial order, specialized algorithms are required [1]. Throughout the recent literature, various efforts have been made to generalize the convolutional operator over non-euclidean data, and the obtained solution is the graph convolutional network (GCN) [1], which can also be considered an extended version of the standard graph neural network (GNN) [8]. The related work for similar domains can also be understood by the articles [9–11]. The paper [11] leveraged the graph neural network paradigm for de novo drug design and was able to provide a superlative result for the domain. The article [9] also recommended the use of GNNs by introducing a sequential molecular graph generator with a comparative study on the QM9 and Zinc dataset. The paper [10] used graph convolutional networks for accurately predicting pharmacological activities, by using the ChEMBL dataset, further justifying the utility of GCNs. To improve the generalizability and predictive performance of classical networks, the paradigm of quantum computing has shown immensely favorable results and has led to the fairly recent domain of quantum machine learning and quantum neural networks. The relevance and the usability of aforementioned appendages can be understood by metrics like the effective dimension [12], which implicates the space occupied by the model in the model space, that correspond to all thee possible functions related to a specific model class. The study [12] depicted a substantially higher effective dimension for models of quantum neural networks, further implicating a higher generalization power. When compared with classical machine learning, sparse research exists for quantum neural networks especially quantum-enhanced graph convolutional networks and drug efficacy, with only some papers which have shown plausible utilities of quantum enhancements in the general domain of graph architectures [13]. This novel study aims to leverage the benefits and understand the applicability of quantum enhancements or quantum machine learning in conjunction with the already recommended graph convolutional networks for the paradigm of drug efficacy and activity. To have a thorough measure of applicability, four different convolutional paradigms and their related architectures are included, and each hyperparameter is carefully assessed. Each architecture is primarily validated on two data distributions and a k-fold cross-validation experiment [14], along with an analysis pertaining to their temporal characteristics. The study also assess the total iterations required for convergence to have a richer analysis. 
These measures permit us to accurately
understand the performance for the tested approaches, hence supporting the proposed methodologies. This section is followed by the used methodology, the thorough empirical results, and the concluding statements along with the future directions of research.
2 Methodology The scope of this paper pertains to validating the applicability of quantum enhancements or parametrized circuits in graph convolutional networks in a hybrid-classical fashion for accurately predicting the drug or pharmacological activity [10]. This section contains a condensed description of the tested methodologies and the neural architectures, along with the dataset and the associated data distributions.
2.1 Architecture

This subsection offers an overview of the tested architectural strategies and the layer positions within the networks. To obtain a complete empirical analysis, this study covers four different graph convolutional strategies, each validated against its quantum-enhanced counterpart. A graphical description of the backbone architecture is illustrated in Fig. 1. For the non-quantum-enhanced networks, the QLayer is replaced with a standard ReLU-activated [15] layer with 12 neurons. The term GraphConv denotes a graph convolutional layer, GlobalAvgPool [16] denotes a global average pooling layer, and Dense denotes a fully connected layer. The layer preceding the QLayer contains exactly 12 neurons, as the QLayer consists of 12 qubits and therefore requires 12 input features. The number-of-qubits hyperparameter is kept at 12
Fig. 1 Backbone architecture of this study (blocks shown: Input Graph (X, A, E); Graph Conv, 32, ReLU; Graph Conv, 16, ReLU; Global Average Pooling; Dense, 128, ReLU; Dense, 64, ReLU; Dense, 128, ReLU; QLayer (QNode), 12; Dense, 2, Softmax; Output)
to adhere to the computational constraints and to have the maximum feasible value for experimentation. For each graph convolutional variant that is studied, the same architectural orientation is used, and the number of filters and other training conditions are kept identical to maintain an impartial study. The inputs for this system, that is, X, A, and E are the node features, adjacency matrices, and the edge features, respectively [16, 17].
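As a concrete illustration of this backbone, the following hedged Keras/Spektral sketch builds the network with the plain GCNConv encoder in batch mode; ECCConv, ChebConv, or XENetConv from spektral.layers can be swapped in. The quantum_or_dense_layer argument, the fixed node count, and the exact ordering of the dense blocks are assumptions for illustration and do not reproduce the authors' implementation.

```python
# Hedged sketch of the Fig. 1 backbone with Spektral graph convolutions.
import tensorflow as tf
from spektral.layers import GCNConv, GlobalAvgPool

def build_backbone(n_nodes, n_node_features, quantum_or_dense_layer):
    x_in = tf.keras.Input(shape=(n_nodes, n_node_features))   # node features X
    a_in = tf.keras.Input(shape=(n_nodes, n_nodes))           # adjacency matrix A
    x = GCNConv(32, activation="relu")([x_in, a_in])
    x = GCNConv(16, activation="relu")([x, a_in])
    x = GlobalAvgPool()(x)
    x = tf.keras.layers.Dense(128, activation="relu")(x)
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    x = tf.keras.layers.Dense(12, activation="relu")(x)       # 12 features feed the QLayer
    x = quantum_or_dense_layer(x)                             # QLayer or 12-unit ReLU layer
    out = tf.keras.layers.Dense(2, activation="softmax")(x)
    return tf.keras.Model(inputs=[x_in, a_in], outputs=out)
```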
2.2 Quantum Layer

A universal approximation theorem applies to neural networks, and the same principle holds for quantum circuits, i.e., there is always a quantum circuit that can represent a goal function within an arbitrarily small error [18]. The quantum layer may be understood as either a variational circuit or a parametrized quantum circuit [19]. Each quantum layer can be a series of single-qubit rotations followed by a predetermined order of entangling gates or, in the case of optical modes, a few passive and active Gaussian operations accompanied by single-mode non-Gaussian gates [20, 21]. A variational quantum circuit (VQC) with a depth of q can be described as a concatenation of several quantum layers. The full functionality of the quantum enhancements is leveraged by using the Pennylane package [22]. This study also tests two qubit simulators: the base or default variant, denoted as BaseQ, and the C++-based lightning qubit package with adjoint differentiation, denoted as LQ; both simulators are accessed through Pennylane [22]. The inherent functionality of a VQC starts with encoding the input data into the joint qubit state of an appropriate number of qubits [23]. This is followed by the transformation of the qubit state through entangling gates and parametrized rotations. Each qubit state is represented as a normalized complex vector, where |α|² and |β|² are the probabilities of observing |0⟩ and |1⟩ [23]. The qubit state representation can be understood from the following equation [23]:
|ψ⟩ = α|0⟩ + β|1⟩,  |α|² + |β|² = 1  (1)
The expected value of a hamiltonian operator, like Pauli gates, is leveraged for measuring the previously transformed qubit state [23], followed by converting these measurements to an appropriate data format. The parameters are changed and updated using standard machine learning optimization algorithms [24].
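A minimal Pennylane sketch of such a 12-qubit variational layer is given below: angle encoding, entangling rotation layers, and Pauli-Z expectation values, wrapped as a Keras layer. The template choice and circuit depth are assumptions, not the exact circuit of the paper; the "lightning.qubit" device corresponds to the LQ simulator mentioned above.

```python
# Hedged 12-qubit variational quantum circuit wrapped as a Keras layer.
import pennylane as qml

n_qubits, n_layers = 12, 2
dev = qml.device("lightning.qubit", wires=n_qubits)       # LQ simulator

@qml.qnode(dev, diff_method="adjoint")
def qnode(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits))             # encode 12 classical features
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))  # parametrized rotations + entanglement
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]   # measure each qubit

weight_shapes = {"weights": (n_layers, n_qubits, 3)}
qlayer = qml.qnn.KerasLayer(qnode, weight_shapes, output_dim=n_qubits)
```

The resulting qlayer could be passed as the quantum_or_dense_layer argument of the backbone sketch in Sect. 2.1; for the classical baselines it would be replaced by a 12-unit ReLU Dense layer.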
2.3 Graph Convolutional Encoders By assessing the recent literature, it can be strongly inferred that the use of graph convolutional layers is justified over standard graph neural networks. The core implementation of each graph convolution variant has been performed by using the Spektral
Table 1 Notations pertaining to the graph convolutional layers
Symbol | Description
X | Node attributes matrix
x_i | The node attributes corresponding to the i-th node
e_{i→j} | The edge attributes of the edge from node i to node j
A | Adjacency matrix
b | Trainable bias vector
W | Trainable weight matrix
D | Degree matrix
package [16], and every associated training and testing environment are maintained identically. The convolution operations can be defined as a linear mechanism that functions on diagonalizing the Fourier bases which are depicted as the eigenvectors of the Laplacian operator [1]. The notations as followed in this section and throughout the paper are mentioned in Table 1.
2.3.1 ChebConv
The initial generation of graph convolutional networks was defined in the paper [1]. However, several limitations prevent its use in practical or utility-driven use cases: the absence of a localized filter, a high learning complexity resulting from matrix-vector multiplication, and the dependence of the number of parameters on the input size [1]. These predicaments were initially addressed by the paper [25], resulting in the Chebyshev convolution layer, abbreviated as ChebConv for the remainder of this paper. This layer computes the following, where T^(0), ..., T^(K−1) are Chebyshev polynomials of L̃, as described in [16, 25]:
X' = Σ_{k=0}^{K−1} T^(k) W^(k) + b^(k)  (2)
T^(0) = X  (3)
T^(1) = L̃ X  (4)
T^(k≥2) = 2 · L̃ T^(k−1) − T^(k−2)  (5)
L̃ = (2 / λ_max) · (I − D^(−1/2) A D^(−1/2)) − I  (6)
Here, λ_max denotes the maximum real non-negative eigenvalue of the Laplacian and can also be understood in terms of the frequencies of the graph [25]. The primary contributions of the paper [25] can be summarized as follows: the development of a graph coarsening technique that groups similar vertices, and the creation of a graph pooling operation that trades spatial resolution for greater filter resolution [25].
2.3.2 GCNConv
The primary baseline convolutional layer, abbreviated as GCNConv for the remainder of this paper, was defined in the research [26], which aimed to improve upon the predicaments of the ChebConv layer and introduced a localized first-order approximation of spectral graph convolutions. This layer computes the following [16]:
X' = D̂^(−1/2) Â D̂^(−1/2) X W + b  (7)
Here, Â = A + I is the adjacency matrix with added self-loops, and D̂ is the associated degree matrix [1]. The resulting model scales linearly with the number of graph edges and is capable of learning appropriate representations for both the node features and the local structure of the graph [26].
2.3.3 ECCConv
The edge-conditioned convolutional layer, abbreviated as ECConv or ECCConv throughout this paper, is a convolution-like operation developed to be carried out on graph signals in the spatial domain [27]. This layer computes the following, where MLP denotes a multilayer perceptron operating on the edge attributes, and the term W_root indicates that the root node is used in computing the message passing instead of only the neighbors [16, 27]:
x'_i = x_i W_root + Σ_{j∈N(i)} x_j MLP(e_{j→i}) + b  (8)
This operation involves filter weights being conditioned on edge labels, which can be discrete or continuous and are dynamically created for each particular input sample. The constructed networks were able to function effectively on graphs that had an arbitrary variable structure for the entirety of a dataset [27]. The original work focused on categorizing point clouds and was able to outperform the existing volumetric techniques [27]. The underlying intelligent methodologies were also able to reach a competitive standard of performance on the graph classification benchmark NCI1, an important cheminformatics dataset, where each input graph represents a different chemical molecule for anti-cancer screenings, further justifying its inclusion in this study.
2.3.4 XENetConv
The XENet architecture was initially developed to improve the modeling of protein structures, and the primary motivation was to improve the emphasis on the edge information, as most of the graph learning methodologies which are being leveraged indicate minimal attention to the information stowed in the edges [28]. As the usage of edges reflected crucial geometric correlations regarding residue pair interactions, the original work was able to generate a superlative outcome for protein modeling [28]. The inclusion of XENet as an encoder which is abbreviated as XENetConv in this work is primarily motivated by the inference that the improved processing of edge properties can result in substantial gains when modeling chemical data. The tested architectures which pertain to the XENet family add a few hyperparameters like the number of edge and node channels and their associated activation functions [16]. To preserve the unbiased assertion, the number of channels is kept as 16, or 32, based on the layer, and the related activation functions are maintained as ReLU.
2.4 Dataset

The dataset used in this study is obtained from the MoleculeNet database [29], a publicly available collection of 41,127 drugs containing experimentally measured abilities of the drug samples to inhibit the replication of HIV [29]. The percentage of active molecules in the complete dataset is only 3.5%, which indicates extreme skewness and an unfavorable distribution for training machine learning models [29]. As the scope of this study is concerned with gauging applicability while adhering to computational constraints, we randomly sampled 1443 molecules to create the primary data distribution, with 32 node features and 5 edge features. The new distribution contains 722 inactive molecules and 721 active molecules in the simplified molecular-input line-entry system (SMILES) format [30]. The data is converted into usable graphs with the NetworkX package [31], and two different stratified train–test splits [32] were generated, using 75% and 85% of the data for training. The paper also analyzes the experiments in a stratified k-fold cross-validation setting, with k set to 5, to solidify the results. The training and testing splits were kept identical for each neural architecture to have a consistent and impartial comparison.
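As an illustration of the data preparation step, the sketch below converts a SMILES string into a NetworkX graph with simple atom and bond attributes. RDKit is used here purely as an assumed parsing tool, and the toy featurization does not reproduce the 32 node features and 5 edge features used in the paper.

```python
# Illustrative SMILES -> NetworkX graph conversion with toy node/edge attributes.
import networkx as nx
from rdkit import Chem

def smiles_to_graph(smiles):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    g = nx.Graph()
    for atom in mol.GetAtoms():
        g.add_node(atom.GetIdx(),
                   atomic_num=atom.GetAtomicNum(),
                   degree=atom.GetDegree(),
                   aromatic=atom.GetIsAromatic())
    for bond in mol.GetBonds():
        g.add_edge(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(),
                   bond_type=bond.GetBondTypeAsDouble())
    return g

graph = smiles_to_graph("CCO")   # ethanol, used only as a toy example
```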
3 Results and Discussion This section contains thorough information about the performed experiments along with the apparatus and training conditions. For each tested neural architecture strategy, the modality of hyperparameters and callback conditions is kept identical. This paper utilized the early stopping criteria and the learning rate reduction on plateau
conditions for gauging convergence, as available in [33], and to provide a richer training experience. Each strategy is tested for its F1-score and percentage accuracy [5], which are calculated as follows [34]:
Accuracy = (TP + TN) / (TP + TN + FP + FN)  (9)
F1-Score = 2·TP / (2·TP + FP + FN)  (10)
Here, the terms TP, TN, FP, and FN indicate the true positives, true negatives, false positives, and false negatives, respectively. Experiments are also conducted to understand the temporal characteristics for both the training and testing phases for each possible architecture on a dummy data distribution. The maximum possible epochs for training a model are 35, which adheres to the computational constraints. By assessing how early a model converges, we can obtain thorough insights concerning its temporal performance.
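The training configuration described above (early stopping, learning-rate reduction on plateau, and a 35-epoch cap) could be expressed in Keras roughly as follows; the monitored quantity, patience values, and reduction factor are assumptions, since the paper does not report them.

```python
# Hedged sketch of the convergence/callback setup used during training.
import tensorflow as tf

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
]

# model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# history = model.fit(train_data, validation_data=val_data,
#                     epochs=35, callbacks=callbacks)
```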
3.1 Accuracy Analysis

In Table 2, for the 75% split, GCNConv and QGCNConv denote the standard and the quantum-enhanced variants, respectively. It can be positively inferred that the utility of quantum-enhanced networks is justified, as the best-performing model was the hybrid classical-quantum ECC network. For all four graph convolutional layers, a non-trivial increase in the percentage accuracy is observed, which further strengthens the proposed methodology. For the QECCConv, an increase of 6% accuracy was observed when compared with the standard variant, and a 6.6% increase was
Table 2 Obtained results for the 75% and 85% train–test splits
Model | Accuracy (75%) | F1-score (75%) | Training epochs (75%) | Accuracy (85%) | F1-score (85%) | Training epochs (85%)
GCNConv | 0.698 | 0.697 | 14 | 0.728 | 0.728 | 22
QGCNConv | 0.737 | 0.736 | 35 | 0.733 | 0.730 | 23
ECCConv | 0.704 | 0.703 | 15 | 0.714 | 0.714 | 14
QECCConv | 0.765 | 0.763 | 22 | 0.756 | 0.756 | 20
XENetConv | 0.717 | 0.716 | 11 | 0.728 | 0.727 | 20
QXENetConv | 0.729 | 0.728 | 21 | 0.816 | 0.816 | 27
ChebConv | 0.693 | 0.690 | 14 | 0.719 | 0.716 | 14
QChebConv | 0.712 | 0.709 | 12 | 0.760 | 0.760 | 18
Table 3 Results pertaining to the five-fold stratified cross-validation, with the obtained accuracy and the average training epochs
Model | Mean accuracy | Median accuracy | Training epochs
GCNConv | 0.702 | 0.740 | 21.2
QGCNConv | 0.731 | 0.720 | 29.6
ECCConv | 0.748 | 0.758 | 25.4
QECCConv | 0.778 | 0.765 | 25.8
XENetConv | 0.719 | 0.737 | 21.8
QXENetConv | 0.760 | 0.774 | 31.2
ChebConv | 0.652 | 0.664 | 22.2
QChebConv | 0.669 | 0.670 | 25
observed for the standard GCNConv comparison. When the total number of epochs required for training these models is considered, the quantum variants depicted an inefficient behavior, and the best performing architecture is obtained as the standard XENetConv. From Table 2, a similar inference can be deduced for the 85% distribution, as the quantum-enhanced variants outperformed the standard or vanilla variants, with the best performing architecture being QXENetConv, with an increase of 8.9% from the standard counterpart and an 8.8% increase from the vanilla GCNConv. It can also be inferred from Table 2, that the architectures which emphasized the edge information depicted an overall better performance. The enhanced performance can be credited to their higher intrinsic generalization power as for the modality of performance metrics, an increment is observed. When the convergence criteria were assessed, ChebConv and ECCConv were able to converge in only 14 epochs, with the most accurate model being the ChebConv. For the experiments pertaining to the five-fold cross-validation, a similar result is obtained, where all quantum-enhanced architectures performed in a superlative fashion for the mean accuracy metric. When the median accuracy was considered, the standard GCNConv variant depicted a superior performance. However, for all the other encoder variants, the quantum-enhanced architectures did depict an enhanced accuracy but also required a higher training epoch. The same architectural backbone is leveraged for the five-fold cross-validation experiment with an identical early stopping and reduces learning rate criteria as mentioned in the previous experiments. The results associated with the aforementioned settings are mentioned in Table 3.
3.2 Temporal Analysis For the computational task of accurately assessing the temporal characteristics of the proposed methodologies, a dummy dataset is generated, and the best performance
Table 4 Related temporal analysis; both the training and testing times are in ms
Model | Training time | Testing time
GCNConv | 61.7 | 60.3
GCNConv + BaseQ | 1170 | 591
GCNConv + LQ | 233 | 111
ECCConv | 293 | 154
ECCConv + BaseQ | 1430 | 847
ECCConv + LQ | 863 | 484
XENetConv | 218 | 106
XENetConv + BaseQ | 1160 | 571
XENetConv + LQ | 488 | 234
ChebConv | 59.8 | 57.1
ChebConv + BaseQ | 1120 | 555
ChebConv + LQ | 232 | 107
time is obtained from 5 loops of 15 iterations each in an identical system configuration. Through Table 4, the use of the LQ standard can be recommended, as the substantial time boost can help us gage the applicability of quantum enhancements more quickly and hence, is utilized throughout the main experiments of Tables 2, and 3.
4 Conclusion This novel study aimed to accurately understand the applicability of quantum enhancements in graph convolutional networks for estimating pharmacological or drug activities. Here, four different variants of graph convolutional layers are leveraged, and the experiments were performed on a balanced undersampled version of the publicly available HIV dataset. Each architectural strategy was assessed on multiple performance criteria for a thorough unbalanced study. By assessing the obtained results, an 8.8% accuracy boost was obtained for the quantum XENet from the standard GCN for the 85% split, and a 6.6% accuracy increase was observed for the quantum ECC network when compared against the baseline GCN for the 75% split; and the highest accuracy enhancement when compared with the classical variant was for XENet with an 8.9% increase in the 85% split. The study also leveraged a fivefold stratified cross-validation strategy, further supporting the proposed study, as the modality of the quantum-enhanced networks indicated an enhanced performance. The quantum-enhanced networks did depict computation inefficiency; however, the utility can still be recommended for a performance trade-off due to the potentially higher accuracies. For future work, the authors would like to improve the temporal characteristics and work on fully quantum variants on a more generalized domain.
References 1. Sun M, Zhao S, Gilvary C, Elemento O, Zhou J, Wang F (2019) Graph convolutional networks for computational drug development and discovery 21(3):919–935. https://doi.org/10.1093/ bib/bbz042 2. Gajjar P, Mehta N, Shah P (2022) Quadruplet loss and squeezenets for Covid-19 detection from chest-x rays. Comput Sci J Moldova 30(2) 3. Chauhan M, Joon A, Agrawal A, Kaushal S, Kumari R (2021) Intrusion detection system for securing computer networks using machine learning: a literature review. In: Sharma H, Saraswat M, Yadav A, Kim JH, Bansal JC (eds) Congress on intelligent systems. Springer Singapore, Singapore, pp 177–189 4. Thaw AM, Zhukova N, Aung TT, Chernokulsky V (2021) Data classification model for fogenabled mobile iot systems. In: Sharma H, Saraswat M, Yadav A, Kim JH, Bansal JC (eds) Congress on intelligent systems. Springer Singapore, Singapore, pp 125–138 5. Mehta N, Shah P, Gajjar P (2021) Oil spill detection over ocean surface using deep learning: a comparative study 16(3–4):213–220. https://doi.org/10.1007/s40868-021-00109-4 6. Gajjar P, Shah P, Sanghvi H (2022) E-mixup and Siamese networks for musical key estimation. In: International conference on ubiquitous computing and intelligent information systems. Springer, pp 343–350 7. Karthi S, Kalaiyarasi M, Latha P, Parthiban M, Anbumani P (2021) Emerging applications of deep learning. In: Integrating deep learning algorithms to overcome challenges in big data analytics. CRC Press, pp 57–72. https://doi.org/10.1201/9781003038450-4 8. Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2008) The graph neural network model. IEEE Trans Neural Netw 20(1):61–80 9. Bongini P, Bianchini M, Scarselli F (2021) Molecular generative graph neural networks for drug discovery 450:242–252. https://doi.org/10.1016/j.neucom.2021.04.039 10. Sakai M, Nagayasu K, Shibui N, Andoh C, Takayama K, Shirakawa H, Kaneko S (2021) Prediction of pharmacological activities from chemical structures with graph convolutional neural networks 11(1). https://doi.org/10.1038/s41598-020-80113-7 11. Xiong J, Xiong Z, Chen K, Jiang H, Zheng M (2021) Graph neural networks for automated de novo drug design 26(6):1382–1393. https://doi.org/10.1016/j.drudis.2021.02.011 12. Abbas A, Sutter D, Zoufal C, Lucchi A, Figalli A, Woerner S (2021) The power of quantum neural networks 1(6):403–409. https://doi.org/10.1038/s43588-021-00084-1 13. Choi J, Oh S, Kim J (2021) A tutorial on quantum graph recurrent neural network (QGRNN). In: 2021 International conference on information networking (ICOIN). IEEE. https://doi.org/ 10.1109/icoin50884.2021.9333917 14. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830 15. Dubey SR, Chakraborty S (2018) Average biased relu based CNN descriptor for improved face retrieval. https://doi.org/10.1007/s11042-020-10269-x. http://arxiv.org/abs/1804.02051v2 16. Grattarola D, Alippi C (2020) Graph neural networks in tensorflow and keras with spektral. http://arxiv.org/abs/2006.12138v1 17. Reiser P, Eberhard A, Friederich P (2021) Implementing graph neural networks with tensorflowkeras. http://arxiv.org/abs/2103.04318v1 18. Benedetti M, Lloyd E, Sack S, Fiorentini M (2019) Parameterized quantum circuits as machine learning models 4(4):043001. https://doi.org/10.1088/2058-9565/ab4eb5 19. 
Mari A, Bromley TR, Izaac J, Schuld M, Killoran N (2020) Transfer learning in hybrid classicalquantum neural networks 4:340. https://doi.org/10.22331/q-2020-10-09-340 20. Schuld M, Bocharov A, Svore KM, Wiebe N (2020) Circuit-centric quantum classifiers 101(3). https://doi.org/10.1103/physreva.101.032308 21. Sim S, Johnson PD, Aspuru-Guzik A (2019) Expressibility and entangling capability of parameterized quantum circuits for hybrid quantum-classical algorithms 2(12):1900070. https://doi. org/10.1002/qute.201900070
800
P. Gajjar et al.
22. Bergholm V, Izaac J, Schuld M, Gogolin C, Alam MS, Ahmed S, Arrazola JM, Blank C, Delgado, A, Jahangiri S, McKiernan K, Meyer JJ, Niu Z, Száva A, Killoran N (2018) Pennylane: automatic differentiation of hybrid quantum-classical computations. http://arxiv.org/abs/1811. 04968v3 23. Kwak Y, Yun WJ, Jung S, Kim J (2021) Quantum neural networks: concepts, applications, and challenges. In: 2021 Twelfth international conference on ubiquitous and future networks (ICUFN). IEEE. https://doi.org/10.1109/icufn49451.2021.9528698 24. Sweke R, Wilde F, Meyer J, Schuld M, Faehrmann PK, Meynard-Piganeau B, Eisert J (2020) Stochastic gradient descent for hybrid quantum-classical optimization 4:314. https://doi.org/ 10.22331/q-2020-08-31-314 25. Defferrard M, Bresson X, Vandergheynst P (2016) Convolutional neural networks on graphs with fast localized spectral filtering. Adv Neural Inf Process Syst 29 26. Welling M, Kipf TN (2016) Semi-supervised classification with graph convolutional networks. In: J. international conference on learning representations (ICLR 2017) 27. Simonovsky M, Komodakis N (2017) Dynamic edge-conditioned filters in convolutional neural networks on graphs. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR). IEEE. https://doi.org/10.1109/cvpr.2017.11 28. Maguire JB, Grattarola D, Klyshko E, Mulligan VK, Melo H (2021) Xenet: using a new graph convolution to accelerate the timeline for protein design on quantum computers. https://doi. org/10.1101/2021.05.05.442729 29. Wu Z, Ramsundar B, Feinberg EN, Gomes J, Geniesse C, Pappu AS, Leswing K, Pande V (2018) Moleculenet: a benchmark for molecular machine learning 9(2):513–530. https://doi. org/10.1039/c7sc02664a 30. Bjerrum EJ (2017) SMILES enumeration as data augmentation for neural network modeling of molecules. http://arxiv.org/abs/1703.07076v2 31. Treinish M, Carvalho I, Tsilimigkounakis G, Sá N (2021) rustworkx: a high-performance graph library for python. http://arxiv.org/abs/2110.15221v2 32. Merrillees M, Du L (2021) Stratified sampling for extreme multi-label data. http://arxiv.org/ abs/2103.03494v1 33. Deep learning using keras (2019). In: Keras to Kubernetes® . Wiley, pp 111–129. https://doi. org/10.1002/9781119564843.ch4 34. Fourure D, Javaid MU, Posocco N, Tihon S (2021) Anomaly detection: how to artificially increase your f1-score with a biased evaluation protocol. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 3–18
Improving Pneumonia Detection Using Segmentation and Image Enhancement
Ethiraj Thipakaran, R. Gandhiraj, and Manoj Kumar Panda
Abstract Pneumonia is one of the deadliest diseases, causing difficulty in breathing and reducing oxygen intake in the lungs. It primarily affects children and seniors over the age of 65, and it causes a significant fraction of child fatalities. Early detection helps to cure it with affordable medications. The most commonly used diagnostic test is chest X-ray imaging, since X-rays are considerably cheaper and quicker than other imaging modalities. The emergence of artificial intelligence has simplified many activities, particularly the processing and categorization of images using deep convolutional networks. Therefore, in this paper, a model is trained using a transfer learning approach to provide a rapid pneumonia detection system. The pre-trained network DenseNet201 with a global average pooling layer was employed to evaluate several techniques, namely segmentation, enhancement, and augmentation. The experiments were conducted on the openly accessible RSNA pneumonia detection challenge dataset. DenseNet201 with enhanced images achieved the highest results of 95.95% accuracy, 95.13% precision, 86.41% recall, and 90.56% F1 score, which shows that this technique outperforms some of the existing techniques. Keywords Transfer learning · DenseNet201 · Segmentation · Augmentation · Enhancement
E. Thipakaran (B) · R. Gandhiraj Department of Electronics and Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India e-mail: [email protected] R. Gandhiraj e-mail: [email protected] M. K. Panda Department of Electronics and Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_58
1 Introduction Pneumonia is an acute respiratory illness that mostly affects the lungs. When a healthy person breathes, tiny sacs in the lungs called alveoli fill with air. When someone has pneumonia, the alveoli are filled with pus and fluid, which makes breathing difficult and reduces oxygen intake. Pneumonia is the leading infectious cause of mortality in children globally: 740,180 children under the age of five died from pneumonia in 2019, accounting for 14% of all fatalities in that age group and 22% of all fatalities among children aged 1 to 5. Viruses, bacteria, and fungi are among the infectious organisms that cause pneumonia. The condition can be identified and treated with affordable oral antibiotics in the earliest stage of the disease [1]. Early diagnosis would enable patients to receive treatment or lessen the severity of their illness.
Imaging the chest with an X-ray machine is one of the usual and efficient diagnostic methods. These images are read by doctors who specialize in radiology, generally known as radiologists, to identify patients with symptoms. However, there are not enough radiologists to evaluate all of these X-rays, particularly in remote places, and because symptoms and appearances are similar, the images can be misclassified. Therefore, it is necessary to create a system that is dependable, precise, and accurate.
Several models based on artificial intelligence have lately been established across the majority of sectors [2, 3]. In the medical industry especially, deep learning models have fared very well in biomedical imaging, and in recent years the development of convolutional neural networks (CNNs) has resulted in enormous success. Xu et al. [4] presented DeepCXray, a deep neural network that can automatically diagnose 14 illnesses from the ChestX-ray14 dataset. Luo et al. [5] developed models based on DenseIBN-121 as a backbone to classify diseases from the NIH, CheXpert, and InsightCXR datasets. Sogancioglu et al. [6] employed a segmentation model and a classification model to detect cardiomegaly and obtained area under the curve (AUC) values of 0.977 and 0.941, respectively; a standard U-Net was utilized for segmentation, while pre-trained networks such as ResNet18, ResNet50, and DenseNet121 were employed for classification. To segment the ribs and clavicles, Wang et al. [7] used a multitask dense connection U-Net and reported a Dice similarity coefficient of 88.38%. Using a fully convolutional network (FCN) U-Net, Adegun et al. [8] segmented skin lesions, brain magnetic resonance imaging (MRI), and retina pictures with accuracy and Dice coefficients over 90%. A wide range of neural networks has been trained to anticipate various illnesses and anomalies in chest X-rays [9]. However, more accurate and exact models that can aid doctors in diagnosis are still necessary. Many strategies, such as the transfer learning approach and segmentation, have been employed to increase the accuracy of predictions.
The significant contributions of this study are as follows. First, the performance of the U-Net was explored by utilizing a pre-trained network as the backbone, extending the dataset through augmentation, and employing several loss functions. Second, DenseNet201 was developed utilizing the transfer learning technique, with the addition of the global average pooling layer. Finally, the pneumonia dataset
was categorized, and numerous strategies, including augmentation, enhancement, and segmentation, were used to examine the effectiveness of the algorithm.
2 Related Studies Deep learning networks, as previously mentioned, are particularly popular in image processing. Rajpurkar et al. [10] created CheXNeXt, a convolutional algorithm, to predict 14 pathologies. The ChestX-ray14 dataset included these 14 pathologies as well as 112,120 frontal chest X-rays from 30,805 individuals. CheXNeXt is a 121-layer DenseNet: each layer is directly connected to the other layers of its block, the feature maps of all previous layers are used as inputs to each layer, and each layer's feature maps are passed along as inputs to all subsequent layers. After this network had been trained, its effectiveness was assessed and compared to that of radiologists. The network fared as well as radiologists in ten diseases and outperformed radiologists in one: it produced an AUC of 0.862 for atelectasis, which was statistically significantly greater than the AUC of 0.808 achieved by radiologists. The radiologists outperformed CheXNeXt in cardiomegaly, emphysema, and hernia, with AUCs of 0.888, 0.911, and 0.985, respectively, whereas CheXNeXt's AUCs were 0.831, 0.704, and 0.851. The AUCs for the remaining ten diseases did not differ statistically significantly.
The aforementioned study created new networks to conduct its experiments. However, studies [11–13] demonstrate how well alternative methods, such as segmentation, augmentation, and pre-trained networks, commonly known as transfer learning approaches, can be used to enhance performance. Transfer learning has gained a lot of attention recently, and the majority of current studies make use of it to boost performance. To identify viral and COVID-19 pneumonia, Chowdhury et al. [14] suggested a method based on data augmentation and transfer learning. Eight pre-trained networks were tested, and six separate sub-databases were combined to produce one database for the experiment. Two distinct experiments, two-class and three-class classification with and without image augmentation, were conducted. CheXNet and DenseNet201 both performed well with augmentation for two-class classification, with accuracy rates of 99.69% and 99.70%, respectively, while ResNet18 and CheXNet achieved an accuracy of 99.41% without any augmentation. For three-class classification, CheXNet achieved 97.74% accuracy without augmentation, whereas DenseNet201 reached 97.94% with augmentation. Pham [15] tested two- and three-class categorization of chest X-ray pictures into COVID-19, viral pneumonia, and healthy classes. This study also used the transfer learning approach, with classification performed by AlexNet, GoogLeNet, and SqueezeNet, and obtained better outcomes on the six datasets. The researcher stated that the findings imply that fine-tuning network learning parameters is crucial, because it can prevent the construction of more complicated models when current ones can achieve the same or better outcomes. Studies [16–19] also applied transfer learning techniques to various imaging issues. DenseNet121 was employed in study [16],
whereas DenseNet201 was used in study [18]. Jasil and Ulagamuthalvi [17] used pre-trained models to test the performance of their approach, and pre-trained networks such as DenseNet201 and VGG19 were employed by Jiang [19].
As stated in studies [11–13], numerous works used a segmentation network [20], primarily U-Net, to improve the performance of their models. Rahman et al. [21] used segmentation and classification to divide chest X-rays into tuberculosis and normal. The work started by examining two deep learning segmentation models, U-Net and a modified U-Net, both applied to 704 X-ray images and their matching masks. The highest accuracy, intersection over union (IoU), and Dice values achieved by U-Net were 98.14%, 92.82%, and 96.19%, respectively, and 98.1%, 92.71%, and 96.08% for the modified U-Net. Three distinct X-ray datasets were also collected, and the images were segmented and classified using nine pre-trained deep CNNs. Finally, the performance of the networks with and without segmentation was calculated: CheXNet had the best accuracy without segmentation at 96.47%, whereas DenseNet201 reached 98.6% accuracy with segmentation. According to Abedalla et al. [22], the Ens4B-U-Net ensemble may be used for pneumothorax segmentation. The ensemble U-Net is built as a weighted average of four segmentation models, each of which employs a pre-trained network as the U-Net backbone. The SIIM-ACR dataset was used to assess this study, and the ensemble network reached a Dice coefficient of 0.86. For segmenting the lungs, Kim and Lee [23] devised an X- and Y-attention approach. The X-attention module focuses on the crucial attributes needed for lung segmentation, while the Y-attention module uses the global attributes of the input images. The U-Net employed in this experiment had the X- and Y-attention modules applied at various positions, with ResNet101 serving as the U-Net's backbone. The model was evaluated on the Montgomery, JSRT, and Shenzhen datasets, and the highest Dice scores on each dataset were 0.982, 0.968, and 0.954, respectively. Novikov et al. [24] investigated three fully convolutional architectures for multiclass segmentation of the lung fields, clavicles, and heart in chest X-ray images. The authors adopted the U-Net, made certain changes to its architecture, and integrated these designs in various ways to obtain a higher-performing model. InvertedNet and All-Dropout worked well together, yielding Jaccard coefficients of 0.949, 0.833, and 0.888 for the lungs, clavicles, and heart, respectively.
Furthermore, several studies tested their algorithms with enhancement techniques. In order to analyze acute respiratory distress syndrome, Reamaroon et al. [25] introduced a technique called total variation-based active contour, which consists of three major phases. Total variation denoising was used to identify and remove medical equipment visibly obstructing the lung fields, a recursive binarization approach was applied to systematically identify the lungs, and a stacked active contour model was used to improve lung border formation. Prior to execution, the chest X-ray images are standardized using contrast-limited adaptive histogram equalization (CLAHE).
The system effectively segregated the patients’ lung fields, yielding 0.86 and 0.85 Dice coefficients for the adult and pediatric cohorts, respectively. Rahman et al. [26]
altered the original U-Net by using three convolutional layers rather than two in the decoding process. Additionally, five different image enhancement techniques were used to compare seven pre-trained CNN models on non-segmented and segmented lung X-ray images for the classification of COVID-19, non-COVID lung opacity, and normal images, in order to study the impact of image enhancement and lung segmentation on COVID-19 detection with the COVQU dataset. Without segmentation, the experiment reported accuracy, precision, and recall of 96.29%, 96.28%, and 96.28%, respectively; with segmentation, these results were 95.11%, 94.55%, and 94.56%. DenseNet201 with the gamma enhancement approach excelled for the segmented lungs, whereas the CheXNet model with the gamma enhancement technique performed better without image segmentation.
3 Methodology The methodology of the study follows four distinct directions, as seen in Fig. 1. Two neural networks were investigated and utilized: the first network employs the U-Net architecture to segment the lung region in the X-ray images, and the second network was constructed from a pre-trained network to classify and identify patients with pneumonia. Augmentation was performed in both algorithms to optimize network performance, and image enhancement was also carried out in the classification network. The loss and accuracy curves were used to confirm that the networks trained stably. To observe the performance of the networks, accuracy, IoU, and F1 score were computed for segmentation, and accuracy, precision, recall, and F1 score were obtained for classification.
Fig. 1 Study overview
3.1 Dataset Description Two datasets were utilized to train the networks, one for segmentation and the other for classification. Dataset for Segmentation Chest X-ray images and their masks from Kaggle [27] were utilized to train the U-Net. The dataset comprises 800 X-ray images and 704 masks, so only 704 images have a corresponding mask; 344 of the 704 images are abnormal, whereas 360 images show no abnormalities. For this study, 53 images were eliminated since their masks contained inappropriate features, and 651 images were finalized. Figure 2a and b show a few discarded and selected images. Dataset for Classification The RSNA dataset [28] was utilized for classification. It contains a total of 26,684 images: 6012 opacity images, 8851 normal images, and 11,821 not-normal and not-opacity images. For this study, all 6012 opacity images and 6012 not-opacity images were selected; the not-opacity images are a composite of 2000 normal images and 4012 abnormal images.
Fig. 2 a Example of removed images and annotated masks. b Example of selected images and masks
Table 1 Details of images for U-Net segmentation

Total images   Train   Test
651            521     130

Table 2 Training parameters for segmentation

Training parameter   Value
Epochs               20
Batch size           8
Learning rate        0.001
Optimizer            Adam
3.2 Preprocessing Resizing is the one preprocessing step employed before sending the images to the neural networks, because all inputs must have the same size. The segmentation dataset contains images of varying sizes, so all of its images and masks were resized to 256 × 256 pixels. Similarly, the classification dataset's images were resized to 224 × 224 pixels.
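A minimal sketch of this resizing step, assuming OpenCV is used and with placeholder folder names (the paper does not state the exact implementation):

```python
import glob
import os

import cv2


def resize_folder(src_dir, dst_dir, size):
    """Resize every image in src_dir to the given (width, height) and save it to dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    for path in glob.glob(os.path.join(src_dir, "*.png")) + glob.glob(os.path.join(src_dir, "*.jpg")):
        img = cv2.imread(path)
        if img is None:
            continue  # skip unreadable files
        resized = cv2.resize(img, size, interpolation=cv2.INTER_AREA)
        cv2.imwrite(os.path.join(dst_dir, os.path.basename(path)), resized)


# 256 x 256 for the segmentation images and masks, 224 x 224 for the classification images
resize_folder("segmentation/images", "segmentation/images_256", (256, 256))
resize_folder("segmentation/masks", "segmentation/masks_256", (256, 256))
resize_folder("classification/images", "classification/images_224", (224, 224))
```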
3.3 Segmentation U-Net is one of the typical neural networks used for deep learning segmentation. Here, the U-Net was altered to use ResNets as the backbone. Five ResNet variants were considered: ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152; the primary distinction between them is the number of residual layers. All of these networks were tested on the segmentation dataset. Furthermore, loss functions such as binary cross-entropy (BCE), focal loss, Jaccard/IoU loss, and Dice loss were compared to identify performance differences. The images and masks were divided with a five-fold split: 80% of the images and masks were used for training and 20% for testing. Table 1 gives the number of images used for segmentation, while Table 2 gives the parameters used for training the segmentation network.
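A sketch of this segmentation setup (ResNet34-backed U-Net, Dice loss, and the Table 2 parameters), assuming the third-party segmentation_models package on top of Keras; the array names are placeholders for the resized images and masks, not the authors' actual code:

```python
import segmentation_models as sm
import tensorflow as tf

# U-Net with a ResNet34 encoder pre-trained on ImageNet (one of the backbones compared in Table 5)
model = sm.Unet("resnet34", input_shape=(256, 256, 3), classes=1,
                activation="sigmoid", encoder_weights="imagenet")

# Dice loss was the best single loss in Table 6; Adam with learning rate 0.001 follows Table 2
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss=sm.losses.DiceLoss(),
              metrics=["accuracy", sm.metrics.IOUScore(), sm.metrics.FScore()])

# x_train/y_train (521 pairs) and x_test/y_test (130 pairs) are assumed to be
# float32 arrays of shape (N, 256, 256, 3) and (N, 256, 256, 1)
model.fit(x_train, y_train, validation_data=(x_test, y_test), batch_size=8, epochs=20)
```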
3.4 Augmentation Augmentation was performed on both the Kaggle segmentation dataset and the RSNA dataset, but the functions were quite distinct.
In the case of segmentation, the images and their masks must stay aligned, so both must be augmented with the same transformation. As a result, the images and masks were rotated 10° clockwise and anticlockwise, as well as zoomed in and out (85% and 110%). Table 3 lists the number of original and augmented images, while Fig. 3 displays some of the sample images and masks used for segmentation.

Table 3 Details of images and masks of the augmented Kaggle dataset

          Normal   Augmented
Images    651      1908
Masks     651      1908

Fig. 3 Example images and masks of the augmented dataset

The ImageDataGenerator function in Keras was used to enrich the classification dataset. The functions utilized include rotation, width shift, height shift, shear range, and zoom range, and the ranges were carefully selected to ensure that the lung area is not lost in the augmented images. The study [29] employed a total of 20,599 opacity images and 6002 not-opacity images from the RSNA dataset. Therefore, the 6012 opacity X-rays were augmented as specified in Table 4, yielding a total of 20,600 opacity images, and the same functions were applied to the 6012 not-opacity X-rays, yielding 6012 augmented not-opacity images. Following the augmentation, 21,291 and 5322 images were assigned to training and testing, respectively. Because the opacity images were up-sampled, similar images could end up in both training and testing, which would bias the neural networks and inflate accuracy and other performance metrics. Therefore, the separation was done thoroughly, and it was ensured that the training and test sets share no similar images.

Table 4 Functions of augmentation for classification dataset

Augmentation function   Range
Rotation                10
Width shift             0.1
Height shift            0.1
Shear                   0.1
Zoom                    0.1
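A sketch of the Keras call implied by Table 4; the directory layout, batch size, and rescaling are illustrative assumptions:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation ranges taken from Table 4
datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    rescale=1.0 / 255,
)

# Stream 224 x 224 chest X-rays from a folder with one subfolder per class (opacity / not_opacity)
train_gen = datagen.flow_from_directory(
    "rsna/train", target_size=(224, 224), batch_size=32, class_mode="binary"
)
```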
Fig. 4 Example of normal and enhanced images
3.5 Enhancement Enhancement improves image quality by preserving the necessary features and removing noise. It was applied only to the classification dataset. Two enhancement techniques were used on all of the classification images: sharpness improvement and contrast enhancement. Figure 4 clearly shows the distinction between an original image and an enhanced one.
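The paper does not state how the two operations were implemented; one simple realization, assuming Pillow and illustrative enhancement factors, is:

```python
from PIL import Image, ImageEnhance


def enhance_xray(path, contrast_factor=1.5, sharpness_factor=2.0):
    """Apply contrast enhancement followed by sharpening to a single chest X-ray."""
    img = Image.open(path).convert("L")                       # X-rays are greyscale
    img = ImageEnhance.Contrast(img).enhance(contrast_factor)
    img = ImageEnhance.Sharpness(img).enhance(sharpness_factor)
    return img


enhance_xray("rsna/train/opacity/sample.jpg").save("rsna_enhanced/sample.jpg")
```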
3.6 Classification Classification is the most important aspect of this research. This study divides the RSNA images into two categories: opacity and not-opacity. The transfer learning approach was employed, which means taking a network trained in earlier research and altering it to meet the needs of the current problem. The network used is DenseNet201, a 201-layer deep convolutional neural network trained on the ImageNet dataset. In this architecture, global average pooling was applied instead of the flattening layer for this study.
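A sketch of this classifier in Keras: ImageNet-pretrained DenseNet201 followed by global average pooling and a sigmoid output for the opacity/not-opacity decision. The 5% dropout reflects the rate reported later in Sect. 4.2; the optimizer and loss are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.DenseNet201(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3)
)

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),        # used instead of a flattening layer
    layers.Dropout(0.05),                   # optimal dropout rate found in Sect. 4.2
    layers.Dense(1, activation="sigmoid"),  # opacity vs. not-opacity
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy",
                       tf.keras.metrics.Precision(name="precision"),
                       tf.keras.metrics.Recall(name="recall")])
```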
3.7 Performance Metrics Performance metrics are employed to assess network performance; however, different metrics are preferred for the segmentation and classification networks. Accuracy, IoU, and F1 score are frequently employed for the segmentation network. The receiver operating characteristic (ROC) curve and AUC are
also used for classification, in addition to the metrics accuracy, precision, recall, and F1 score.
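For reference, these classification metrics can be computed from held-out predictions with scikit-learn; y_true, y_pred, and y_score below are placeholder arrays of true labels, predicted labels, and predicted probabilities:

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)


def classification_report_dict(y_true, y_pred, y_score):
    """Return the metrics used for the classification network."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_score),  # area under the ROC curve
    }
```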
4 Results and Discussion 4.1 Segmentation Segmentation Training with 651 Images The U-Net network was trained using ResNets as the backbone, and the performance of the variants is stated in Table 5. All ResNet variations performed similarly. However, ResNet34 was chosen since it achieved slightly better results, and its simple 34 layers are sufficient without picking a more sophisticated network that would require additional training time. Table 6 shows the performance of ResNet34 with different loss functions; the Dice loss led the Adam optimizer to reach the minimum loss and outperformed the other losses. Segmentation with Augmented Images The same ResNet34 was used to observe the performance on the 1908 augmented images and masks. Table 7 lists the performance of the segmentation network on augmented images with various losses.

Table 5 Performance of ResNets
Variation   Accuracy   IoU/Jaccard   F1 score
ResNet18    98.34      93.63         96.70
ResNet34    98.37      93.74         96.77
ResNet50    98.38      93.67         96.77
ResNet101   98.34      93.68         96.73
ResNet152   98.35      93.72         96.75
Bold refers to the highest value among all the techniques utilized in the testing
Table 6 Network performance on different losses

Loss function   Accuracy   IoU     F1
BCE             98.37      93.74   96.77
Focal           98.34      93.55   96.66
Jaccard/IoU     98.37      93.74   96.76
Dice            98.39      93.82   96.81
Bold refers to the highest value among all the techniques utilized in the testing
Table 7 Performance of U-Net/ResNet34 on different losses

Loss            Accuracy   IoU     F1
Dice            95.47      94.18   97.00
Focal + Dice    95.43      94.00   96.90
BCE + Jaccard   95.45      94.09   96.95
BCE + Dice      95.49      94.28   97.05
Bold refers to the highest value among all the techniques utilized in the testing
Even though the Dice loss obtained the highest values with the 651 original images, the combination of BCE and Dice loss surpassed all the other losses on the augmented data, including the Dice loss alone. The IoU and F1 scores were improved by using augmented images and masks, but the accuracy diminished. There are two plausible explanations. First, some annotated masks did not match the X-ray images completely; therefore, after augmentation, the predicted masks outperformed the annotated masks (Fig. 5). Second, the augmentation stabilized the segmentation network: a variation can be seen in the loss and accuracy curves before augmentation, but the curves are smoother afterwards (Figs. 6 and 7). Also, Fig. 8 shows an example of a produced mask and segmented image for an internal image using the trained U-Net/ResNet34 network.
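A sketch of the combined BCE + Dice objective that performed best in Table 7, assuming a Keras/TensorFlow setup and an equal weighting of the two terms (the weighting used in the paper is not stated):

```python
import tensorflow as tf


def dice_loss(y_true, y_pred, smooth=1.0):
    """Soft Dice loss on flattened binary masks."""
    y_true_f = tf.reshape(tf.cast(y_true, tf.float32), [-1])
    y_pred_f = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    return 1.0 - (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + smooth)


def bce_dice_loss(y_true, y_pred):
    """Equal-weight sum of binary cross-entropy and Dice loss."""
    bce = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y_true, y_pred))
    return bce + dice_loss(y_true, y_pred)
```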
Fig. 5 Sample images and masks with predicted masks
Fig. 6 Loss and accuracy curves without augmentation
Fig. 7 Loss and accuracy curves with augmentation
Fig. 8 Image with created mask and cropped image

4.2 Classification The classification part contains four major steps, as mentioned in Sect. 3. The first step performs the classification directly after preprocessing the 6012 opacity and 6012 not-opacity images using DenseNet201 with global average pooling. The network achieved scores of 72.52% for accuracy, 73.85% for precision, 71.92% for recall, and 72.52% for F1 score.
The second step is to augment the preprocessed images, which yielded 20,600 opacity images and 6012 not-opacity images; Fig. 9 displays some of the augmented images. The augmented images were fed into the classification network, and the accuracy, precision, recall, and F1 score were determined to be 85.22%, 72.91%, 55.12%, and 62.78%, respectively.
The next step was to apply enhancement to the augmented images and classify them. The accuracy, precision, recall, and F1 score were 96.44%, 92.81%, 91.34%, and 92.07%, respectively. Despite the good performance, discrepancies were seen in the loss and accuracy curves while training the DenseNet201 (Fig. 10). At first, the validation loss was smaller than the training loss, and as the network trained further, the training loss dropped below 0.1 and continued to decrease, while the validation loss stopped at 0.1. This suggests that the model was overfitting and memorizing, and it had to be generalized. As a result, a dropout of 0.2 was applied to generalize the model. Figure 11 depicts the curves acquired after the dropout. The performance metrics were also computed, yielding 95.58% accuracy, 91.13% precision, 89.08% recall, and 90.10% F1 score. After the 20% dropout, the model's training loss was 0.18 and its validation loss was 0.13, so the model appeared to be under-fitting. Since the network lost too many training features, the dropout rate had to be reduced, and the optimal dropout rate of 5% was found. Figure 12 depicts the improved model's loss and accuracy curves. The optimum model's loss value for training and validation was 0.1, resulting in accuracy, precision, recall, and F1 score of 95.95%, 95.13%, 86.41%, and 90.56%, respectively.

Fig. 9 Example of augmented images for classification
Fig. 10 Loss and accuracy curves of DenseNet201 on the augmented dataset
Fig. 11 Loss and accuracy curves after 20% dropout was applied
Fig. 12 Optimized model's loss and accuracy curves

Fourth and last, the trained segmentation network (U-Net/ResNet34) was used to generate the masks for the enhanced images, which were then utilized to segment the same dataset. Figure 13 illustrates the masks applied to the images, while Fig. 14 illustrates some of the segmented images. In this experiment, the segmented images were fed into the classification model, and the model achieved accuracy, precision, recall, and F1 scores of 91.47%, 86.83%, 73.27%, and 79.47%, respectively. The segmentation results did not improve on the earlier outcomes of enhanced-image classification, even though other studies have shown that segmentation improves classification performance. The primary factor contributing to the decline was that the lung areas in the Kaggle segmentation dataset were clearer than the lung regions in the pneumonia images; the segmentation model therefore performed well on the segmentation dataset, but the lung regions in abnormal RSNA images were not clear enough for it to predict the precise lung area. Table 8 compares the steps performed in this study and demonstrates that step 3 achieved better performance than the other steps. The following tables also compare the algorithms used in this study with existing studies: Table 9 compares segmentation network performance, whereas Table 10 compares classification network performance.

Fig. 13 Example of mask applied images
Fig. 14 Example of segmented images

Table 8 Results comparison of the techniques

Description     Accuracy   Precision   Recall   F1 score
Normal images   72.81      73.85       71.92    72.52
Augmented       85.22      72.91       55.12    62.78
Enhanced        95.95      95.13       86.41    90.56
Segmented       91.47      86.83       73.27    79.47
Bold refers to the highest value among all the techniques utilized in the testing

Table 9 Segmentation network performance comparison on the same dataset

Author                            Method                       Accuracy   IoU     F1
Rahman et al. [21]                U-Net and modified U-Net     98.14      92.82   96.19
This study without augmentation   U-Net/ResNet34 as backbone   98.39      93.82   96.81
This study with augmentation      U-Net/ResNet34 as backbone   95.49      94.28   97.05
Bold refers to the highest value among all the techniques utilized in the testing

Table 10 Classification results comparison on a similar dataset

Author              Method                                                                                   Dataset   Accuracy   Precision   Recall   F1 score
Kundu et al. [29]   Ensemble network using GoogleNet, ResNet18, and DenseNet121                             RSNA      86.85      86.89       87.02    86.95
Zhou et al. [30]    Inception V3 + ResNet50 without scaled features                                          NIH-8     79.70      80.00       –        –
This study          DenseNet201 + global average pooling layer along with augmentation and enhancement on images   RSNA      95.95      95.13       86.41    90.56
Bold refers to the highest value among all the techniques utilized in the testing

5 Conclusion This study presents a transfer learning technique to detect pneumonia. The technique utilized a pre-trained DenseNet201 to identify pneumonia from other images. In addition, a few other techniques were analyzed: segmentation, augmentation, and enhancement. For the segmentation, a U-Net was applied, and it was trained and improved using transfer learning and augmentation. Despite the enhanced U-Net outperforming the existing U-Net model, it could not facilitate the classification task. The major reason is that severely affected X-rays cannot be segmented by the network, since the lung region cannot be seen properly. However, the augmentation and enhancement improved the model's performance and achieved better results, with an accuracy of 95.95%, precision of 95.13%, recall of 86.41%, and F1 score of 90.56%. Two limitations of the study were observed. First, the dataset is limited to a few thousand images, and augmentation cannot satisfy the need beyond a certain limit; once that limit is exceeded, the model becomes biased. Second, the model was validated with internal images only. Since the segmentation was unable to meet expectations in this study, object detection could be employed to increase performance in future investigations.
References 1. Who.int. (2021) Pneumonia. (Online) Available at: https://www.who.int/news-room/fact-sheets/detail/pneumonia. Accessed 2 July 2022 2. Nandan K, Panda M, Veni S (2020) Handwritten digit recognition using ensemble learning. In: 2020 5th International conference on communication and electronics systems (ICCES) 3. Priyanka R, Shrinithi S, Gandhiraj R (2021) Big data based system for biomedical image classification. In: 2021 Fourth international conference on electrical, computer and communication technologies (ICECCT) 4. Xu X, Guo Q, Guo J, Yi Z (2018) DeepCXray: automatically diagnosing diseases on chest X-Rays using deep neural networks. IEEE Access 6:66972–66983 5. Luo L, Yu L, Chen H, Liu Q, Wang X, Xu J, Heng P (2020) Deep mining external imperfect data for chest X-Ray disease screening. IEEE Trans Med Imaging 39(11):3583–3594 6. Sogancioglu E, Murphy K, Calli E, Scholten E, Schalekamp S, Van Ginneken B (2020) Cardiomegaly detection on chest radiographs: segmentation versus classification. IEEE Access 8:94631–94642 7. Wang W, Feng H, Bu Q, Cui L, Xie Y, Zhang A, Feng J, Zhu Z, Chen Z (2020) MDU-Net: a convolutional network for clavicle and rib segmentation from a chest radiograph. J Healthc Eng 2020:1–9
8. Adegun A, Viriri S, Ogundokun R (2021) Deep learning approach for medical image analysis. Comput Intell Neurosci 2021:1–9 9. Sathyan H, Panicker J (2018) Lung nodule classification using deep ConvNets on CT images. In: 2018 9th International conference on computing, communication and networking technologies (ICCCNT) 10. Rajpurkar P, Irvin J, Ball R, Zhu K, Yang B, Mehta H, Duan T, Ding D, Bagul A, Langlotz C, Patel B, Yeom K, Shpanskaya K, Blankenberg F, Seekins J, Amrhein T, Mong D, Halabi S, Zucker E, Ng A, Lungren M (2018) Deep learning for chest radiograph diagnosis: a retrospective comparison of the CheXNeXt algorithm to practicing radiologists. PLoS Med 15(11):e1002686 11. Liu X, Song L, Liu S, Zhang Y (2021) A review of deep-learning-based medical image segmentation methods. Sustainability 13(3):1224 12. Kumar S, Singh P, Ranjan M (2021) A review on deep learning based pneumonia detection systems. In: 2021 International conference on artificial intelligence and smart systems (ICAIS) 13. Hesamian M, Jia W, He X, Kennedy P (2019) Deep learning techniques for medical image segmentation: achievements and challenges. J Digit Imaging 32(4):582–596 14. Chowdhury M, Rahman T, Khandakar A, Mazhar R, Kadir M, Mahbub Z, Islam K, Khan M, Iqbal A, Emadi N, Reaz M, Islam M (2020) Can AI help in screening viral and COVID-19 pneumonia? IEEE Access 8:132665–132676 15. Pham T (2020) Classification of COVID-19 chest X-Rays with deep learning: new models or fine tuning? Health Inf Sci Syst 9(1) 16. Allaouzi I, Ben Ahmed M (2019) A novel approach for multi-label chest X-Ray classification of common thorax diseases. IEEE Access 7:64279–64288 17. Sanagala S, Gupta S, Koppula V, Agarwal M (2019) A fast and light weight deep convolution neural network model for cancer disease identification in human lung(s). In: 2019 18th IEEE international conference on machine learning and applications (ICMLA) 18. Jasil SG, Ulagamuthalvi V (2021) Skin lesion classification using pre-trained DenseNet201 deep neural network. In: 2021 3rd International conference on signal processing and communication (ICPSC) 19. Jiang Z (2020) Chest X-Ray pneumonia detection based on convolutional neural networks. In: 2020 International conference on big data, artificial intelligence and internet of things engineering (ICBAIE) 20. Reddy D, Dheeraj, Kiran, Bhavana V, Krishnappa H (2018) Brain tumor detection using image segmentation techniques. In: 2018 International conference on communication and signal processing (ICCSP), pp 0018–0022 21. Rahman T, Khandakar A, Kadir M, Islam K, Islam K, Mazhar R, Hamid T, Islam M, Kashem S, Mahbub Z, Ayari M, Chowdhury M (2020) Reliable tuberculosis detection using chest X-Ray with deep learning, segmentation and visualization. IEEE Access 8:191586–191601 22. Abedalla A, Abdullah M, Al-Ayyoub M, Benkhelifa E (2021) Chest X-Ray pneumothorax segmentation using U-Net with efficientnet and resnet architectures. PeerJ Comput Sci 7:e607 23. Kim M, Lee B (2021) Automatic lung segmentation on chest X-Rays using self-attention deep neural network. Sensors 21(2):369 24. Novikov A, Lenis D, Major D, Hladuvka J, Wimmer M, Buhler K (2018) Fully convolutional architectures for multiclass segmentation in chest radiographs. IEEE Trans Med Imaging 37(8):1865–1876 25. Reamaroon N, Sjoding M, Derksen H, Sabeti E, Gryak J, Barbaro R, Athey B, Najarian K (2020) Robust segmentation of lung in chest X-Ray: applications in analysis of acute respiratory distress syndrome. BMC Med Imaging 20(1) 26. 
Rahman T, Khandakar A, Qiblawey Y, Tahir A, Kiranyaz S, Abul Kashem S, Islam M, Al Maadeed S, Zughaier S, Khan M, Chowdhury M (2021) Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-Ray images. Comput Biol Med 132:104319 27. Pandey N (2022) Chest Xray masks and labels. (Online) Kaggle.com. Available at: https://www. kaggle.com/datasets/nikhilpandey360/chest-xray-masks-and-labels. Accessed 5 June 2022
28. Kaggle (2018) RSNA pneumonia detection challenge. (Online) Kaggle.com. Available at: https://www.kaggle.com/competitions/rsna-pneumonia-detection-challenge. Accessed 9 June 2022 29. Kundu R, Das R, Geem Z, Han G, Sarkar R (2021) Pneumonia detection in chest X-Ray images using an ensemble of deep learning models. PLoS ONE 16(9):e0256630 30. Zhou S, Zhang X, Zhang R (2019) Identifying cardiomegaly in ChestX-ray8 using transfer learning. Stud Health Technol Inform 264:482–486
Object Detection Application for a Forward Collision Early Warning System Using TensorFlow Lite on Android
Barka Satya, Hendry, and Daniel H. F. Manongga
Abstract The forward collision mitigation system is a safety feature available on cars. It prevents collisions caused by failing to maintain a safe distance between vehicles, as well as by drivers who lack concentration. However, not all vehicles have this feature, so in this research an Android-based application whose function and objectives are similar to this safety feature is developed. The application uses TensorFlow Lite for the custom object detector, SSD MobileNet V2 as the pre-trained model, and the OpenCV library as the application framework. The research follows the CRISP-DM methodology, with literature studies from previous research as a comparison. The output of this research is an application that gives a passive alert to the driver if the distance between vehicles is less than 3 m for cars, less than 2 m for motorcycles, and less than 5 m for buses and trucks, leaving a safe distance to brake if needed. Keywords Forward collision mitigation system · TensorFlow lite · SSD MobileNet V2 · OpenCV
B. Satya (B) · Hendry · D. H. F. Manongga Faculty of Information Technology, Satya Wacana Christian University, Salatiga, Central Java, Indonesia e-mail: [email protected] Hendry e-mail: [email protected] D. H. F. Manongga e-mail: [email protected] B. Satya Faculty of Computer Science, Universitas Amikom, Yogyakarta, Indonesia © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_59
1 Introduction Safety while driving is very important, which is why car manufacturers compete to design technologies that can protect the driver from accidents. Forward collision mitigation calculates the distance from the car to objects in front of the car using
sensors, from radar to laser [1]. However, the hardware needed for this system is quite expensive, so only upper-middle-class cars have it installed. The system prevents frontal collisions by signaling the driver with a warning light, a vibration on the steering wheel, and an alarm sound so the driver can brake, and it can even brake automatically when the distance to the detected object is already very close and the driver takes no action.
In the research done by [2], the application was made using SSD MobileNet V1 as the pre-trained model, which is not the newest model, and its level of accuracy is still not optimal. Such a system works by detecting an object in front of the car and estimating the distance to the detected object, so it depends heavily on the model's accuracy; therefore, SSD MobileNet V2 is used in this research. A custom model trained using SSD MobileNet V2 has a higher accuracy in detecting objects, as shown in the research done by [3], in which the model classified objects into five types with an average accuracy of 93.02%. Traffic sign detection using a Raspberry Pi with a specification equivalent to a standard desktop computer gives a detection accuracy above 90% [4]. However, using a Raspberry Pi as the hardware is not easy or user friendly for the common user.
This research uses an Android smartphone as the hardware so that the system can be used, cheaply and in a more user-friendly way, by car owners whose cars do not have a forward collision mitigation system, increasing driving safety. This system is not as accurate and effective as the forward collision mitigation system provided by the car manufacturer, which is connected directly to the car's radar and braking systems, but it can provide a passive alert to the driver. Car manufacturers could also integrate this system into their cars through the vehicle's head unit platform, which uses the Android operating system, only by adding a camera positioned at the front of the car, which can reduce the manufacturing cost of the car.
2 Literature Review 2.1 Previous Study As Tables 1 and 2 show, the previous research used SSD MobileNet V1 as the pre-trained model [2], while SSD MobileNet V2, which has higher frames per second and accuracy than the V1 model [5], is used in this research.

Table 1 Frames per second comparison between V1 and V2 models
Version        iPhone 7   iPhone X   iPad Pro 10.5
MobileNet V1   118        162        204
MobileNet V2   145        233        220
Table 2 Level of accuracy comparison between V1 and V2 models

Version        Top-1 accuracy   Top-5 accuracy
MobileNet V1   70.9             89.9
MobileNet V2   71.8             91.0
The selection of SSD MobileNet V2 as the pre-trained model is also supported by the research done by [3], where SSD MobileNet V2 was used as the pre-trained model for a custom model that can detect five object classes accurately; the test was done using 50 test sets and resulted in an average accuracy of 93.02%. Better accuracy is needed since this system is highly dependent on the accuracy of the model in detecting an object. Furthermore, the previous research only detects three types of vehicles: cars, trucks, and buses. This is not suitable for the driving conditions in Indonesia, where motorcyclists are the majority. Motorcycles are also one of the main causes of accidents in Indonesia because of rider behavior that does not keep a safe distance between vehicles. For those reasons, motorcycles are added to the types of vehicles to be detected, so in this research there are four types of vehicles that can be detected by the system: cars, trucks, buses, and motorcycles.
2.2 Forward Collision Mitigation System and TensorFlow Lite The forward collision mitigation (FCM) safety feature was developed by the Japanese car manufacturer Mitsubishi, whose vehicles can be found easily in Indonesia; the "Pajero Sport", one of its best-selling cars from January to June 2021, also has this safety feature [6]. FCM works using a 77 GHz radar located in the front grille of the car, which can detect objects up to 200 m ahead. FCM has two stages. The first stage is when the system detects a low collision risk; in this situation, the system gives an audio and visual alert to the driver. The second stage is when the driver does not respond to the alert and keeps reducing the distance to the vehicle in front; the system then gives repeated alerts while in parallel applying soft automatic braking. If the distance to the object keeps decreasing and the risk of collision rises, FCM performs maximum braking while giving audio and visual alerts to the driver [7]. FCM is active when the car is travelling from 30 to 180 km/h, and the automatic braking is adjusted based on the car's speed.
Figure 1 shows the general architecture of TensorFlow Lite, the mobile version of TensorFlow, which uses the FlatBuffers format to run machine learning models on mobile devices such as Android and iOS and on microcontrollers like the Raspberry Pi. TensorFlow Lite reduces latency, increases data security, has lower power consumption, and does not need any connectivity since no data are transmitted to a server. The reduced model size also allows it to run efficiently with limited computing resources.
Fig. 1 TensorFlow general architecture
TensorFlow Lite supports the Java, Swift, Objective-C, C++, and Python programming languages [8].
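In Python, one of the supported languages, running a converted .tflite detector looks roughly like the following; the model path and the zero-filled input frame are placeholders:

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="detect.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# A single frame shaped like the model input, e.g. [1, 320, 320, 3] for SSD MobileNet V2 FPNLite 320x320
frame = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()

# Typical detection outputs: boxes, classes, scores, and number of detections
outputs = [interpreter.get_tensor(d["index"]) for d in output_details]
```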
2.3 SSD MobileNet and OpenCV Library MobileNet is an architecture used for classification and feature extraction, while the single shot multibox detector (SSD) is an architecture that generates bounding boxes to localize detected objects. SSD is based on a convolutional network that uses a single shot to generate fixed-size bounding boxes covering multiple scales and various sizes [7]. SSD MobileNet takes input as seen in Fig. 2a, with the bounding boxes specified manually using Labelimg on each image during the training process. The bounding boxes are evaluated with different aspect ratios at each box position over several different feature maps, as shown in Fig. 2b and c. During training, each bounding box has its shape offsets and confidences predicted for each object category (c1, c2, …, cp). The default bounding boxes are first compared to the ground truth boxes; for example, in Fig. 2a the cat and dog get a positive value while the other objects are negative. As shown in Fig. 3, some additional feature layers are added at the end of the network to predict the offsets to default boxes of different aspect ratios and scales, along with the object category confidences. SSD with a 300 × 300 input image outperforms YOLO with a 400 × 400 input image, and SSD is also faster than YOLO [7]. SSD MobileNet V2 FPN Lite 320 × 320 is selected for this research because of its faster processing speed and higher accuracy [9]; it has a detection speed of 22 ms with a mean average precision of 20.2, and the resulting output is a box.
Fig. 2 SSD MobileNet feature a SSD MobileNet input, b 8 × 8 feature map, and c 4 × 4 feature map
Open computer vision (OpenCV) is an open-source library, based on the C++ programming language, with specific functions for image processing [10]. OpenCV mimics how the human eye works: the eye sees an object and the information is transmitted to the brain to be processed, so the human can recognize the object. OpenCV can run on many operating systems, for both desktop and mobile.
Fig. 3 Differences between SSD MobileNet and YOLO
3 Research Methodology This research follows the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology, a six-stage process that describes the data science life cycle. The methodology is used to organize the data and implement the machine learning in this project [11].
3.1 Business Understanding The forward collision mitigation safety feature is only available on upper-middle-class vehicles, yet it can help the driver avoid accidents caused by a lack of concentration. To resolve this issue, the researcher implements this feature on an Android smartphone, which most drivers own, using the single shot multibox detector (SSD) method. With this application, it is expected that the number of traffic accidents can be reduced and that users of non-premium cars can also enjoy this safety feature.
The system workflow is described in Fig. 4. The system takes its input from the camera of an Android smartphone placed in the middle of the dashboard or level with the middle rearview mirror, with no other objects nearby that could interfere with the camera's view. The frame capture step is done using the OpenCV library, which plays the role of the human eye; the captured frame is optimized to a good image quality, which increases the accuracy of the system. The image processing step is the core of the system.
Fig. 4 System workflow
The TensorFlow object detection model is used to detect and classify objects in the captured frame. The model has been trained to recognize the expected objects starting from the COCO pre-trained SSD MobileNet V2 model, which is available in the TensorFlow model zoo. This step also decides the classification of the object based on the trained model; the system recognizes cars, motorcycles, buses, and trucks. The result of the previous step, the bounding box of the detected vehicle along with its classification, is displayed on the phone's screen. The distance calculation works by comparing the size of the detected bounding box with the smartphone's screen frame. An empirical method is used, since the distance cannot be determined with certainty when vehicle shapes and sizes differ; each type of vehicle therefore relates bounding box size to distance differently. To simplify the system, the maximum ratio of the bounding box to the screen frame is set to 75%: if the bounding box is larger than that, the vehicle is concluded to be too close, and if it is less than 75%, the distance between the vehicles is considered safe; a sketch of this heuristic is given below. The researcher performed a data collection for the research. The data are images of vehicles passing on Indonesian streets: a total of 1000 images in JPG format were collected, directly by the researcher using a smartphone camera and also from sources on the Internet such as YouTube and Kaggle, between April and July 2021. Table 3 shows the details of the data collected.
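A sketch of the 75% heuristic described above; whether the ratio is taken over the box area or a single dimension is not stated in the text, so the area is used here as an assumption:

```python
def too_close(box, frame_width, frame_height, limit=0.75):
    """box = (xmin, ymin, xmax, ymax) in pixels; True when the detected vehicle's
    bounding box covers more than `limit` of the camera frame."""
    xmin, ymin, xmax, ymax = box
    box_area = max(0, xmax - xmin) * max(0, ymax - ymin)
    return box_area / float(frame_width * frame_height) > limit


# Example: a detection filling most of a 1280 x 720 preview frame triggers the warning
if too_close((40, 30, 1200, 700), 1280, 720):
    print("Jaga jarak!!!")  # the "keep your distance" label drawn on the bounding box
```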
Table 3 Vehicle image initial data

No.   Source              Amount   Data format   Location
1     Kaggle              56       PNG           Kaggle.com
2     Internet            445      PNG           Youtube.com
3     Smartphone camera   502      JPG           Streets in Salatiga and Wonosobo City
3.2 Data Preparation Data preparation consists of several processes, from converting the data format from PNG to JPG to generating the TFRecords and training the dataset into a model. The single shot multibox detector (SSD) method with Python 3.8.10 and TensorFlow 2.3.0 is used in this research. The research flow can be seen in Fig. 5; it shows the steps from data collection to the implementation of the TFLite model on Android so that the "forward collision early warning system" application can be tested.
After the data collection is finished and all image formats have been standardized, all the images are put into a project folder "vehicle" and subfolder "Images", and each image is labeled. The image annotation is done using the Labelimg tool: the author marks the vehicle type and coordinates of each vehicle, and Labelimg saves the coordinate data and image annotation into an XML file.
Based on Fig. 5, the next steps are configuring the TensorFlow API, setting up the Python environment, and configuring TensorFlow-GPU. To run the training process on a local GPU, some additional configuration is needed. The first step is to install Anaconda Navigator as a place to create the virtual environment; after Anaconda is installed, a virtual environment is created with Python 3.8, TensorFlow 2.3, and TensorFlow-GPU 2.3 with cuDNN 7.6.4 and CUDA 10.1. In addition, the TensorFlow models repository is needed as a workspace to organize the work, since all the commands are run in this directory.
The XML files are then converted into CSV. Each XML file contains the xmin, xmax, ymin, and ymax coordinates for each class defined during image labeling, and these are converted into a single CSV file that contains all the coordinates in a table format; an example of the output CSV file can be seen in Fig. 6.
Class definition using a PBTXT file: before generating the TFRecord, a PBTXT file is needed. Its content is the class definition of the objects to be detected, based on the annotation done with Labelimg in the previous steps; the annotated labels are car, motorcycle, bus, and truck (a sketch of such a label map is given below).
TFRecord generation: once the required CSV and PBTXT files are available, the TFRecord files can be generated. TFRecord is a simple format for storing a sequence of records that is used in the dataset training process, so the model can learn to recognize and detect vehicle images. The author uses a Python script provided by the TensorFlow repository for this process.
Pipeline configuration: the author chooses SSD MobileNet V2 FPN Lite for the pipeline because, currently, only TensorFlow models using SSD MobileNet are compatible with conversion to TFLite.
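A minimal label map of the kind the TensorFlow Object Detection API expects, written here from Python; the file path and the id ordering of the four classes are assumptions:

```python
LABEL_MAP = """\
item { id: 1  name: 'car' }
item { id: 2  name: 'motorcycle' }
item { id: 3  name: 'bus' }
item { id: 4  name: 'truck' }
"""

with open("annotations/label_map.pbtxt", "w") as f:
    f.write(LABEL_MAP)
```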
Fig. 5 Research flow diagram
Fig. 6 Conversion from XML to CSV
Several settings need to be modified in the pipeline configuration, such as the number of classes, the batch size, and the locations of the PBTXT file, train record, and test record files. The batch size can be adjusted based on the available RAM in the hardware: it is the number of images used simultaneously in training, and the bigger the batch size, the more RAM is needed. In this research, the batch size is set to 6. File and directory structure: before training the model, it is good to organize all the files and directories accordingly; this is also a prerequisite for running the training process. The SSD MobileNet V2 FPN Lite 320 × 320 COCO17 TPU-8 folder is used to store the pre-trained model, the model folder contains the output from the training process, and the annotation folder holds the PBTXT, train.record, and test.record files.
3.3 Modeling and Evaluation At this step, the dataset is ready to be processed to create the object detection model by training with the single shot multibox detector (SSD) method. The training uses SSD MobileNet V2 FPN Lite 320 × 320, which is provided in the TensorFlow model zoo, as the pre-trained model. The training can be done online using the free Google Colab GPU or locally; here it was done locally for roughly 8 h, or about 24,000 steps, on hardware with an Intel i7-7700HQ, an Nvidia GTX 1050 M, and 16 GB of RAM. Figure 7 shows the decrease in the loss during the training phase. The training took roughly 8 h and was stopped
Object Detection Application for a Forward Collision Early Warning …
831
when the loss value is already stable between 0.150 and 0.200 to avoid over-fitting and under-fitting of the model. The second evaluation of the model is done by using the model outputted from the training process. A simple Python script is run on the model to detect an image. Figure 8 shows that the model can detect objects in an image, which contains cars. The high percentage signifies that the model is already valid. The next evaluation is to check the input and output model after the model is converted into TFLite for the android operating system. This is done using a web-based application that can be accessed from Netron.app. The result is the model has an input shape [1, 320, 320, 3] and an output shape [1, 10].
Fig. 7 TensorFlow loss graph
Fig. 8 Detection and classification vehicle
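The input/output check described above, done with Netron, can also be performed programmatically with the TensorFlow Lite interpreter. The short sketch below is illustrative; the model filename is an assumption.

import tensorflow as tf

# Load the converted TFLite model and allocate its tensors
interpreter = tf.lite.Interpreter(model_path="detect.tflite")
interpreter.allocate_tensors()

# Print tensor shapes; for this model the input is expected to be [1, 320, 320, 3]
for detail in interpreter.get_input_details():
    print("input:", detail["shape"], detail["dtype"])
for detail in interpreter.get_output_details():
    print("output:", detail["shape"], detail["dtype"])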
4 Research Result

The result of this research is a simple Android application with a feature similar to a forward collision mitigation system. The application uses the camera bridge view base and imgproc components available in the OpenCV library together with the custom TensorFlow Lite model. It was tested on an Android smartphone, a Xiaomi Redmi Note 5 with a 5.99-inch screen. The live test was done directly on the streets while abiding by safety considerations. The test verifies the capabilities of the application in detecting vehicles, classifying the object types, checking the bounding box accuracy, and showing the early warning alert. Table 4 summarizes the result of the research in terms of vehicle detection and distance calculation.

Table 4 Research result

Vehicle type   Distance detected as safe   Distance detected as danger (m)
Car            5 m                         3
Motorcycle     4 m                         2
Truck          6 m                         5
Bus            Detected as truck           –

Figure 8a shows that the application can classify sedans and MPVs into the car category with acceptable bounding box accuracy, with the video frame taken from the back of the object. The research in [2] mentioned that a video frame taken from the front of the object does not represent how a forward collision mitigation system works: the system estimates the distance between cars traveling in the same direction, not in opposite directions. The model is therefore trained with images of vehicles taken from the back to increase its accuracy. Figure 8b shows the classification of the vehicles detected by the application. The vehicle types expected to be detected are car, motorcycle, truck, and bus. The bounding box generated by the model is accurate enough, but the classification is not yet accurate enough; this can be seen in buses that are classified as trucks. This lack of accuracy does not have a significant impact on the research, since the main objective is to detect the safe distance to the object. However, some vehicles are not detected by the application, which can be caused by the angle and the color of the vehicle.

The main feature of the application, the forward collision early warning, works by calculating the size of the bounding box of the detected vehicle and comparing it with the screen frame size. If the ratio of the bounding box is larger than 75%, the application alerts the driver by showing the label "Jaga jarak!!!" ("keep your distance") on the bounding box. If the bounding box ratio is less than 75%, the system assumes that the distance to the object is still safe. Figure 9 shows a bounding box with a ratio larger than 75% of the screen size; the application detects that the distance is not safe and shows the alert "Jaga jarak!!!" on the bounding box.
Fig. 9 Distance estimation
During the live testing, the alert shows up when the distance to the object is approximately 3 m. This distance is enough for the driver to brake when the car is moving at low speed. In Fig. 8, the system assumes that the distance is still safe, since the bounding box size has not yet reached the threshold of 75% of the screen frame. However, the model also has a weakness, as shown in Fig. 10, where the size of the bounding box does not match the shape of the detected object. Nevertheless, the test showed that the size of the bounding box does not affect the safe distance at which the application alerts the driver to keep their distance.
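The warning logic described above can be summarized by a short sketch. The application source is not published with the paper, and it is not stated whether the 75% ratio is computed over the box area or over a single dimension, so the area-based helper below is an assumption with illustrative names.

# Illustrative helper for the 75% bounding-box threshold (area-based, assumed).
WARNING_RATIO = 0.75


def is_distance_unsafe(box_w, box_h, frame_w, frame_h, ratio=WARNING_RATIO):
    """Return True when the detected vehicle fills too much of the camera frame."""
    box_area = box_w * box_h
    frame_area = frame_w * frame_h
    return (box_area / frame_area) > ratio


# Example: a 900 x 720 box in a 1080 x 1920 frame covers about 31% (no alert),
# while a 1000 x 1600 box covers about 77% and would trigger "Jaga jarak!!!".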
5 Conclusion and Future Work

The forward collision early warning application is effective in passively preventing forward collisions at low speeds. The application can detect four types of vehicles (cars, motorcycles, trucks, and buses) and warns the driver to keep a safe distance between vehicles. The application warns the driver with a visual alert on the phone screen when the distance is 3 m for cars, 2 m for motorcycles, and 5 m for trucks and buses. However, camera positioning has a big influence on the performance of the application. Positioning the smartphone correctly will give
a larger distance for braking, since the application depends on the bounding box size. In addition, the physical specifications of the smartphone also affect the application's performance. In future research, the car speed obtained from GPS information can be taken into account to obtain a better calculation of the safe distance and to give a better braking alert based on the real speed of the car. Additional alert modes using vibration and audio can also be added to improve the passive alert system. A car head unit mode can be added for cars that have an Android operating system in their head unit, using a 360° camera. An ultrasonic sensor can be used to give a better estimation of the distance from the car to an object.
References

1. Isa ISBM, Yong CJ, Shaari NLABM (2022) Real-time traffic sign detection and recognition using Raspberry Pi. Int J Electr Comput Eng 12:331–338
2. Rahmaniar W, Hernawan A (2021) Real-time human detection using deep learning on embedded platforms: a review. J Robot Control (JRC) 2
3. Evan, Wulandari M, Syamsudin E (2020) Recognition of pedestrian traffic light using TensorFlow and SSD MobileNet V2. IOP Conf Ser: Mater Sci Eng 1007:012–022
4. Sudana O, Putra I, Arismandika A (2014) Face recognition system on Android using eigenface method. J Theor Appl Inf Technol 61:128–134
5. Hollemans M (2018) SSD MobileNetV2. https://machinethink.net/blog/mobilenet-v2. Accessed 10 Aug 2021
6. Data IAI (2021) Gaikindo wholesales data January–June 2021. https://files.gaikindo.or.id/myfiles/. Accessed 10 Aug 2021
7. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg A (2016) SSD: single shot multibox detector 9905:21–37
8. Bhagwat S, Salunkhe A, Raut M, Santra S (2021) Android based object recognition for visually impaired. ITM Web Conf 40
9. NCAP E (2013) Reward 2013 Mitsubishi forward collision mitigation (FCM). https://cdn.euroncap.com/media/2098/2013-mitsubishi-forward-collision-mitigation-fcm.pdf. Accessed 10 Aug 2021
10. Raphael E, Kiefer R, Reisman P, Hayon G (2011) Development of a camera-based forward collision alert system. SAE Int J Passeng Cars—Mech Syst 4:467–478
11. Alliance DSP (2018) CRISP-DM. https://www.datascience-pm.com/crisp-dm-2/. Accessed 26 July 2021
12. Yadav G, Kancharla T, Nair S (2011) Real time vehicle detection for rear and forward collision warning systems 193:368–377
A LSTM Deep Learning Approach for Forecasting Global Air Quality Index

Ulises Manuel Ramirez-Alcocer, Edgar Tello-Leal, Jaciel David Hernandez-Resendiz, and Bárbara A. Macías-Hernández
Abstract The air quality index (AQI) provides information on air quality and its impact on human health using a single number. The AQI is derived from the measured concentration of air pollutants over a period. In this work, we present an approach based on deep learning using the long short-term memory (LSTM) neural network architecture, with the ability to predict the AQI for the next few hours. The predictive model learns to infer an air quality index at the hourly level and over an average range of 8 h, a requirement of the environmental regulations for AQI notification. Our experiment uses real data on pollutant concentration levels and meteorological factors collected at four air quality monitoring stations. The performance of the global AQI forecasting made by the LSTM model is measured using the mean absolute error (MAE), root-mean-squared error (RMSE), and mean absolute percentage error (MAPE), reaching very acceptable results in most cases.

Keywords Forecasting · Deep learning · AQI · LSTM · Pollution
U. M. Ramirez-Alcocer · J. D. Hernandez-Resendiz Multidisciplinary Academic Unit Reynosa-Rodhe, Autonomous University of Tamaulipas, 88779 Reynosa, Mexico e-mail: [email protected] J. D. Hernandez-Resendiz e-mail: [email protected] E. Tello-Leal (B) · B. A. Macías-Hernández Faculty of Engineering and Science, Autonomous University of Tamaulipas, 87000 Victoria, Mexico e-mail: [email protected] B. A. Macías-Hernández e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 613, https://doi.org/10.1007/978-981-19-9379-4_60
1 Introduction

Nowadays, large cities face a severe air pollution problem due to the indiscriminate use and burning of fossil fuels; added to this are global warming, the rapid depletion of resources, and highly toxic and dangerous waste [1–3]. Specifically, air pollution has been increasing along with population growth and occurs when there is an excessive influx of gases, particles, and biological molecules into the earth's atmosphere [4, 5]. Long-term exposure to air pollution is related to a wide range of diseases, including respiratory and cardiovascular diseases, allergies, reduced lung function, and even death [6–9]. For this reason, there is great interest on the part of public, private, and government institutions in being able to predict air quality accurately and reliably, and then, through this prediction, to generate timely alerts so that the population is not exposed to contaminating particles, as well as to carry out mitigation actions, for example, reducing vehicular mobility and using masks [10, 11]. In this way, health and environmental control institutions rely on the air quality index (AQI). This universal index describes the level of pollutants present in the air and how this pollution can affect our health over a period. The main pollutants, monitored because of their danger and the damage they can cause to people's health, are particulate matter (PM10 and PM2.5), ozone (O3), carbon monoxide (CO), nitrogen dioxide (NO2), and sulfur dioxide (SO2). The AQI can be adapted to any type of pollutant, and a separate AQI can be obtained for each one, allowing institutions to generate a forecast per pollutant. However, predicting a single index for each pollutant can be problematic, since it can confuse the population.

In the AQI prediction task, machine learning algorithms must be trained using historical air quality datasets [12, 13], which are transformed into time series. Recent studies have shown that this type of task is highly influenced by meteorological factors, as well as by the spatial variability of air pollutants [14]. Moreover, it exhibits clear nonlinear effects, so traditional techniques for predicting air quality fail to perform accurately. Deep learning architectures, which can build complex nonlinear models, have therefore been implemented [15–18]. In this work, we propose a multivariable regression model based on long short-term memory (LSTM) neural networks, which predicts a global AQI for the next few hours. The data on pollutant concentrations in the air and meteorological factors used in our experiment were collected at four air quality monitoring stations. For each station's records, an individual AQI is calculated for the pollutants PM10, PM2.5, CO, and O3. Subsequently, the worst of the four AQIs obtained is selected and defined as the global AQI. The LSTM predictive model predicts the global AQI for each instance of the test dataset at the monitoring station level. The precision of the LSTM predictive model was evaluated using the mean absolute error (MAE), root-mean-square error (RMSE), and mean absolute percentage error (MAPE) metrics. The precision results obtained by the LSTM model are acceptable, reporting values for MAE of 9.7228, RMSE of 13.1550, and MAPE of 27.8246, according to the scenarios analyzed.
2 Background

2.1 LSTM

LSTM networks are an enhanced type of recurrent neural network (RNN) that has shown significant capabilities when modeling dynamic behavior in time sequences compared to feed-forward networks. An LSTM is composed of an internal state block (memory unit), which is applied to processing input sequences, with the ability to learn both long-term and short-term dependencies [19]. Unlike its predecessor, the RNN, the LSTM is not affected by the vanishing gradient problem [20].
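For reference, the gate equations of a standard LSTM cell are reproduced below in the usual textbook form (cf. [19]); this formulation is background material and is not taken from the paper itself.

f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)          (forget gate)
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)          (input gate)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)          (output gate)
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)   (candidate state)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t    (memory unit)
h_t = o_t \odot \tanh(c_t)                         (hidden state)

The additive update of the memory unit c_t is what lets the network retain long-term information, while the gates control the short-term response to new inputs.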
2.2 AQI

The AQI is a measure that reports the quality of the air; it lets us know how air pollution affects our health over a short period [21]. A standardized and easy way to communicate air quality based on the AQI is through a scale of numbers, colors, and qualifiers according to the degree of risk to human health. Table 1 describes the AQI levels in detail. Between 0 and 50, the green color corresponds to a good (satisfactory) condition; between 51 and 100, the yellow color indicates that the air quality is regular. As the concentrations of pollutants increase, a higher number is assigned and the colors indicate an increased risk to health, according to the Environmental Standard (NADF-009-AIRE-2017) of Mexico City [22]. Equations 1 and 2 are used to calculate the AQI; some of the variables in these equations are predefined, with different values depending on the pollutant to be measured. The values used in this work are those given in the Environmental Standard of Mexico City [22].

k = (Isup − Iinf) / (PCsup − PCinf)    (1)
where:
• k = constant of proportionality, in ppm^-1 for O3, NO2, SO2, and CO, and in m3/µg for PM10 and PM2.5.
• PCsup = concentration of the cut-off point greater than or equal to the concentration to be evaluated, in ppm for O3, NO2, SO2, and CO, and in µg/m3 for PM10 and PM2.5.
• PCinf = concentration of the cut-off point less than or equal to the concentration to be evaluated, in ppm for O3, NO2, SO2, and CO, and in µg/m3 for PM10 and PM2.5.
• Isup = index of the PCsup cut-off point, dimensionless.
• Iinf = index of the PCinf cut-off point, dimensionless.
Table 1 Health risks and recommendations based on the AQI color range [22]

Good (0–50). Health risk: Low: there is little or no health risk. Recommendation: any outdoor activity can be done.
Regular (51–100). Health risk: Moderate: susceptible groups may present symptoms in health. Recommendation: people who are extremely susceptible to pollution should consider limiting outdoor exposure.
Bad (101–150). Health risk: High: susceptible groups have health effects. Recommendation: children, older adults, people with respiratory and cardiovascular diseases, and people who physically engage outdoors should limit outdoor exposure.
Very bad (151–200). Health risk: Very high: all may have health effects; those belonging to susceptible groups experience severe effects. Recommendation: children, older adults, and people with intense physical activity or respiratory and cardiovascular diseases should avoid exposure to the open air; the rest of the population should limit exposure to the open air.
Extremely bad (201–300). Health risk: Extremely high: the entire population is likely to experience serious health effects. Recommendation: the entire population should avoid exposure to the open air.
Hazardous (301–500). Health risk: Hazard: the entire population experiences serious health effects. Recommendation: suspension of outdoor activities.
AQI = (k × (Cobs − PCinf)) + Iinf    (2)
where:
• AQI = air quality index, dimensionless.
• Cobs = observed concentration of the pollutant, in ppm for O3, NO2, SO2, and CO, and in µg/m3 for PM10 and PM2.5.
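As a concrete illustration of Eqs. (1) and (2), the sketch below computes an AQI value once the correct breakpoint band for the pollutant has been identified. The breakpoint numbers in the usage comment are hypothetical placeholders; the real values must be taken from the tables of the NADF-009-AIRE-2017 standard for each pollutant.

def aqi_from_breakpoint(c_obs, pc_inf, pc_sup, i_inf, i_sup):
    """Eqs. (1)-(2): linear interpolation of the AQI inside one breakpoint band."""
    k = (i_sup - i_inf) / (pc_sup - pc_inf)  # Eq. (1)
    return k * (c_obs - pc_inf) + i_inf      # Eq. (2)


# Hypothetical example (placeholder breakpoints, not the official NADF-009 values):
# aqi = aqi_from_breakpoint(c_obs=60.0, pc_inf=45.1, pc_sup=97.4, i_inf=101, i_sup=150)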
3 Related Work Recently, work has been carried out using deep learning neural networks in the air quality domain to predict the concentration of a pollutant or the AQI for a given period. Next, the research that is close to our proposed approach is described. In [13],
an analysis of air quality prediction using machine learning techniques is presented; the stacking ensemble, AdaBoost, and random forest algorithms obtained the best performance. Similarly, in [15], the combination of support vector regression (SVR) and an LSTM network for air quality classification is proposed. The stages followed in that work for the classification of the AQI are (1) data preprocessing, where filters are applied to the data to remove non-relevant information; (2) feature extraction using the gray level co-occurrence matrix (GLCM), which transforms the data into matrices and extracts characteristics by treating the data as images; (3) classification using a deep learning mechanism combining SVR with an LSTM model, where the LSTM network is trained with the data and then used with SVR to classify the AQI; and (4) evaluation of the methodology using data from three air quality monitoring stations with the RMSE and R2 metrics, obtaining 10.9 and 0.97, respectively. In [23], the authors propose a hybrid model for predicting the air quality index based on a first GRU neural network layer and a second LSTM layer, obtaining better performance with the hybrid model than with the LSTM and GRU methods executed individually. In [17], the authors present an approach for hourly air quality prediction that combines multi-scale deep learning (MDL) algorithms with an ensemble for the optimal combination (MDL-OCE). Preprocessing of the dataset fills missing data and identifies outliers. Subsequently, a time series decomposition is applied using the ensemble empirical mode decomposition algorithm, generating a finite number of intrinsic variables given the complexity of the data. Furthermore, the fine-to-coarse technique is used to reconstruct all the components in high and low frequencies and unify the number of components of the different variables. In the results, the proposed MDL-OCE model performs better than the base models in the prediction task. Similarly, in [16], a method for air quality prediction called CT-LSTM is presented, which combines the chi-square test (CT) with an LSTM model. The performance of the proposed method was compared against models based on SVR, multilayer perceptrons (MLP), backpropagation artificial neural networks (BPNN), and RNN techniques, exceeding the prediction capacity of these techniques. On the other hand, [18] presents a regression method based on LSTM networks for the prediction of air quality from social network data, using air quality data with a daily sampling frequency; the performance of the proposed approach was compared against the results achieved by models based on multi-linear regression (MLR) and ARIMA. In [24], the authors propose a method based on ARIMA, an optimized extreme learning machine (ELM), and a fuzzy time series model to forecast the AQI, in which the time series is decomposed in a multi-level approach and later reconstructed, managing to extract better information from the time series and remove random noise.
4 Methodology The main objective of this work is to apply the regression task to a dataset of criteria pollutants and meteorological factors to predict the AQI for the next few hours. The main problem is that the dataset contains information on different pollutants, so for each pollutant (individually), an AQI can be obtained without considering the other pollutants. Therefore, we intend to get a single global air quality index from different pollutants. The proposed methodology comprises three stages: data preparation, global AQI, and AQI forecasting.
4.1 Data Preparation

The values of the concentration levels of pollutants and of the meteorological factors are automatically stored in a cloud repository from the monitoring stations. In addition, the monitoring station name and location, as well as the date and time at which the data were recorded by the sensors, are stored. Each monitoring station measures particulate matter (PM2.5 and PM10), O3, CO, relative humidity (RH), and temperature (T). Table 2 presents an extract from the per-minute dataset. The raw data are extracted and processed to generate two datasets: the first contains the 8-h average value of each attribute, as shown in Table 3, and the second contains the hourly average value of each attribute.
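A minimal pandas sketch of how the two aggregated datasets can be derived from the per-minute records is shown below; the input filename and the exact column names are assumptions based on Tables 2 and 3.

import pandas as pd

# Assumed per-minute records with columns: Station, PM2.5, PM10, O3, CO, RH, T, Date, Hour
raw = pd.read_csv("monitoring_minute.csv")
raw["Timestamp"] = pd.to_datetime(raw["Date"] + " " + raw["Hour"])
raw = raw.set_index("Timestamp")

variables = ["PM2.5", "PM10", "O3", "CO", "RH", "T"]

# First dataset: 8-hour mean per station; second dataset: hourly mean per station
eight_hour = raw.groupby("Station")[variables].resample("8H").mean().reset_index()
hourly = raw.groupby("Station")[variables].resample("1H").mean().reset_index()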
4.2 Global AQI

The global AQI for a set of pollutants is generated by applying the following steps, both for the hourly AQI and for the 8-h AQI.

• Let us consider the dataset shown in Table 3 as a data matrix, where each data_i is the mean of the values registered during 8 h.
• For each data_i, the AQI is calculated for the pollutants PM2.5, PM10, O3, and CO using Eq. 2:
  – PM2.5: AQI(data_i(PM2.5))
  – PM10: AQI(data_i(PM10))
  – O3: AQI(data_i(O3))
  – CO: AQI(data_i(CO)).
• From the constructed data matrix and the AQIs of PM2.5, PM10, O3, and CO, a second matrix called dataGI is generated, which contains the values of the data matrix plus an added column that stores the global AQI for each record. To select and add the global AQI of each record, we define the following rules:
Table 2 Structure of the air quality monitoring system dataset

Station     PM2.5   PM10   O3      CO     RH     T      Date       Hour
AQ-IoT-02   30      34     0.004   0.63   77.3   21.2   5/3/2021   08:00
AQ-IoT-02   30      34     0.005   0.60   77.3   21.2   5/3/2021   08:01
AQ-IoT-02   30      34     0.004   0.63   77.2   21.2   5/3/2021   08:02
...         ...     ...    ...     ...    ...    ...    ...        ...
AQ-IoT-02   33      41     0.010   0.31   99.9   31.1   5/6/2021   02:00
AQ-IoT-02   33      41     0.010   0.25   99.9   31.3   5/6/2021   02:01
Table 3 Mean value of each variable for every 8 h

Station     PM2.5   PM10    O3      CO     RH      T       Timestamp
AQ-IoT-02   25.26   27.44   0.009   1.67   98.49   29.03   5/6/2021 08:00
AQ-IoT-02   23.04   25.22   0.009   1.74   99.87   25.90   5/6/2021 16:00
AQ-IoT-02   22.64   24.45   0.007   1.58   77.6    31.42   5/7/2021 00:00
...         ...     ...     ...     ...    ...     ...     ...
AQ-IoT-02   14.69   15.48   0.024   0.53   68.74   30.04   6/6/2021 08:00
AQ-IoT-02   26.94   29.54   0.018   0.52   52.46   34.65   6/6/2021 16:00
dataGI_i = data_i + AQI(data_i(PM10))    if AQI(data_i(PM10)) is worst
           data_i + AQI(data_i(PM2.5))   if AQI(data_i(PM2.5)) is worst
           data_i + AQI(data_i(O3))      if AQI(data_i(O3)) is worst
           data_i + AQI(data_i(CO))      if AQI(data_i(CO)) is worst        (3)
where the values of AQI(data_i(PM10)), AQI(data_i(PM2.5)), AQI(data_i(CO)), and AQI(data_i(O3)) are the AQIs computed for each pollutant separately; for the global index, one of these indices is selected based on the "is worst" condition. Each air quality index has a category based on Table 1, where the worst case is the "Hazardous" category and the level decreases until reaching the "Good" category, which is the best air quality level in all cases. To find the worst index among the pollutants, AQI(data_i(PM10)), AQI(data_i(PM2.5)), AQI(data_i(CO)), and AQI(data_i(O3)) are categorized and the worst case is selected. The index of the selected pollutant is added as the global index to the corresponding record in the dataGI matrix.
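A small Python sketch of the selection rule in Eq. (3) is given below; the dictionary keys are illustrative names for the per-pollutant indices computed with Eq. (2). Because the categories in Table 1 are ordered by increasing AQI value, taking the maximum index is equivalent to the "is worst" comparison.

def global_aqi(record):
    """Sketch of Eq. (3): keep the worst (highest) per-pollutant AQI as the global AQI."""
    per_pollutant = {
        "PM10": record["AQI_PM10"],
        "PM2.5": record["AQI_PM2.5"],
        "O3": record["AQI_O3"],
        "CO": record["AQI_CO"],
    }
    worst_pollutant = max(per_pollutant, key=per_pollutant.get)
    return per_pollutant[worst_pollutant], worst_pollutant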
4.3 AQI Forecasting

This stage involves training and evaluating the regression model based on the LSTM neural network using the dataGI matrix. Figure 1 provides an overview of the segmentation of the time series extracted from the dataGI matrix and illustrates the process for selecting the LSTM prediction model. In this learning process, the tasks described below are executed.

Data preprocessing. The data for each monitoring station were extracted from the dataGI matrix and transformed into multivariable time series. This dataset is divided into two parts: 80% of the randomly selected instances are used as the training dataset, and the remaining 20% are used as the test dataset. Subsequently, each monitoring station's time series was transformed into a supervised learning problem defined to predict the air quality index.

Training/Testing model. The LSTM model is designed layer by layer (input layer, hidden layer, and output layer). Subsequently, the selected model is trained and evaluated using the datasets preprocessed in the previous stage.
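The framing step described above (time series to supervised learning) can be sketched as follows. The look-back window length is not reported in the paper and is a placeholder here; the random 80/20 split mirrors the description in the text.

import numpy as np


def make_supervised(series, look_back=8):
    """Turn a (timesteps, features) array into (X, y) pairs, where y is the
    global AQI (assumed to be the last column) at the next timestep."""
    X, y = [], []
    for t in range(len(series) - look_back):
        X.append(series[t:t + look_back, :])
        y.append(series[t + look_back, -1])
    return np.array(X), np.array(y)


# Random 80/20 split of the instances, as described for the training/testing datasets
# X, y = make_supervised(station_matrix)
# idx = np.random.permutation(len(X))
# cut = int(0.8 * len(X))
# X_train, y_train, X_test, y_test = X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]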
5 Results 5.1 Dataset Our experiment uses a dataset collected through a network of IoT air quality monitoring stations presented in [25]. We use data from four monitoring stations; each
system comprises a set of sensors that collect the concentration of different pollutants and meteorological factors. The software system of the monitoring station transmits the collected data in real time to a repository in the cloud, which is managed through a set of Web services. The data extracted from the repository cover the period from May 1 to October 31, 2021. The dataset is processed according to the proposed methodology. Table 4 shows the number of instances in each monitoring station's dataset. The "raw data" column presents the total original records for each station, while the "1-h mean" and "8-h mean" columns show the total number of aggregated records calculated for each station.

Fig. 1 Tasks that integrate the AQI forecasting stage

Table 4 Total instances in each dataset per monitoring station

Station     Raw data     1-h mean    8-h mean
AQ-IoT-2    329,483      4396        552
AQ-IoT-3    262,915      4399        552
AQ-IoT-4    371,264      2640        468
AQ-IoT-5    529,366      4108        525
Total       1,493,028    15,543      2097
Table 5 Error metrics results for hourly prediction

              Training                        Testing
Station       MAE      RMSE     MAPE          MAE      RMSE      MAPE
AQ-IoT-02     5.3874   8.0094   21.5458       7.5439   12.0854   29.1155
AQ-IoT-03     6.0423   8.8903   17.8137       8.3935   12.4988   26.2672
AQ-IoT-04     6.1603   9.8646   18.0253       8.7617   14.8593   26.0575
AQ-IoT-05     5.9098   8.8984   16.7512       7.5223   11.5689   24.7677
5.2 Design of the Prediction Model

The LSTM neural network models that obtained the best error metrics for the 1-h mean and the 8-h mean were designed with a first LSTM layer of 500 memory units, a ReLU activation function, and the return_sequences property set to false, followed by a dropout layer with a rate of 0.3 and a dense layer with one unit. As additional parameters, the mae loss function, the adamax optimizer, a batch size of 64, and 300 epochs were defined.
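The description above maps directly onto a Keras definition. The following sketch reflects the stated hyperparameters; the input window length and number of features are assumptions, since they are not reported in the paper.

from tensorflow import keras
from tensorflow.keras import layers


def build_lstm_model(timesteps, n_features):
    """LSTM regressor as described in Sect. 5.2: 500 units, ReLU, dropout 0.3, dense(1)."""
    model = keras.Sequential([
        layers.LSTM(500, activation="relu", return_sequences=False,
                    input_shape=(timesteps, n_features)),
        layers.Dropout(0.3),
        layers.Dense(1),
    ])
    model.compile(loss="mae", optimizer="adamax")
    return model


# Training setup reported in the paper: batch_size=64, epochs=300
# model = build_lstm_model(timesteps=8, n_features=7)  # window length assumed
# model.fit(X_train, y_train, batch_size=64, epochs=300, validation_data=(X_test, y_test))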
5.3 Hourly AQI Prediction Approach

Table 5 shows the results obtained by the prediction model based on the LSTM network for both the training and test phases. In the training phase, the AQ-IoT-02 station achieved the lowest error metrics, with an MAE of 5.3874 and an RMSE of 8.0094. On the other hand, the AQ-IoT-04 station obtained the highest MAE and RMSE values, 6.1603 and 9.8646, respectively; however, its MAPE of 18.0253 is not the highest error among the training results. In this sense, the AQ-IoT-05 station obtained the lowest MAPE (16.7512), and the AQ-IoT-02 station reached the highest MAPE, with a value of 21.5458.

On the other hand, in the test stage of the LSTM model with the dataset of the AQ-IoT-05 monitoring station, the lowest error metrics were obtained, with an MAE of 7.5223, an RMSE of 11.5689, and a MAPE of 24.7677 (see Table 5). Figure 2d shows the behavior of the time series of the AQI prediction; the blue line represents the original values, and the predicted values, represented by the red line, are very close to the real values. Similarly, in Fig. 2a, the predicted values are very close to the actual values of the hourly AQI; with the dataset of this station (AQ-IoT-02), values of 7.5439, 12.0854, and 29.1155 were obtained for MAE, RMSE, and MAPE, respectively. In this case, the MAE and RMSE values are very close to those of the AQ-IoT-05 station. In Fig. 2c, outliers are identified, for example, at points 125, 450, and 475 on the X-axis.
Fig. 2 Comparison between actual data and prediction data for the hourly AQI: a AQ-IoT-02 station, b AQ-IoT-03 station, c AQ-IoT-04 station, d AQ-IoT-05 station
These values affect the prediction model’s performance; the RMSE metric is more sensitive than MAE in the presence of outliers, which is corroborated by a high value of 14.8593 in RMSE obtained with the dataset of the AQ-IoT-04 station.
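For completeness, the three error metrics used in Sects. 5.3 and 5.4 can be computed as follows; this is the standard formulation (with MAPE expressed as a percentage), not code taken from the authors.

import numpy as np


def evaluation_metrics(y_true, y_pred):
    """MAE, RMSE, and MAPE as used to evaluate the AQI forecasts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0
    return mae, rmse, mape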
5.4 8-H AQI Prediction Approach

Table 6 shows the error metrics achieved by the LSTM predictive model at each monitoring station for both the training and test stages. With the dataset of the AQ-IoT-02 station, the values closest to zero were reached in the training stage, with an MAE of 8.9049 and an RMSE of 12.3844; however, the MAPE calculated for this station (23.8632) was the highest in this experiment, and the best training MAPE, 21.4243, was achieved with the AQ-IoT-05 station dataset. In the testing stage, the dataset from the AQ-IoT-02 station again yields the lowest error metrics, with an MAE of 9.7228 and an RMSE of 13.1550 (see Table 6), which is confirmed by the time series of the 8-h AQI prediction shown in Fig. 3a. The 8-h AQI prediction error metrics for the AQ-IoT-05 station are also very acceptable, with 10.4789 and 13.7088 for MAE and RMSE, respectively. Figure 3d shows that the predicted data can correctly follow the behavior of the real data. The highest error metrics in the 8-h AQI prediction are obtained with the data from the AQ-IoT-04 station. Figure 3c shows that the predicted data lag behind the actual data; in most test samples, the predicted values are higher than the real values, which is observed between points 46 and 90 of the X-axis (see Fig. 3c) and is consistent with the values obtained in the MAE (13.3479) and RMSE (16.8803) metrics.

Table 6 Error metrics results for 8-h mean prediction

                            Training                         Testing
Station                     MAE       RMSE      MAPE         MAE       RMSE      MAPE
AQ-IoT-02                   8.9049    12.3844   23.8632      9.7228    13.1550   34.5883
AQ-IoT-03                   9.6392    13.9619   21.8494      10.5319   13.7373   29.0611
AQ-IoT-04                   14.5450   24.9827   23.1530      13.3479   16.8803   27.8246
AQ-IoT-05                   10.2250   13.5467   21.4243      10.4789   13.7088   31.3216
AdaBoost (Zhongli) [13]     –         –         –            11.801    17.386    –
AdaBoost (Changhua) [13]    –         –         –            12.747    17.877    –
Stacking ensemble [13]      –         –         –            11.517    16.302    –
MDL-OCE station 1 [17]      –         –         –            10.8876   17.5334   17.97
MDL-OCE station 2 [17]      –         –         –            7.9726    13.6109   17.14
MDL-OCE station 3 [17]      –         –         –            6.5031    8.8657    14.33
LSTM [18]                   –         –         –            37.32     48.25     –

The second section of Table 6 presents the performance achieved by some state-of-the-art investigations. The performance reported in [13] corresponds to 8-h AQI prediction on different datasets and shows error metrics greater than those of our proposal. On the other hand, [17] reports lower error metrics, but for one-hour predictions. Finally, the performance reported in [18] corresponds to a forecast of 24 continuous hours and is far from the performance achieved by our proposal. Therefore, the performance achieved by our proposal using real datasets from different monitoring stations is very acceptable, with low error metric values that allow confidence in the model's prediction.
Fig. 3 Comparison of prediction performance for the 8-h AQI versus actual data: a AQ-IoT-02, b AQ-IoT-03, c AQ-IoT-04, d AQ-IoT-05
6 Conclusions

The proposal presented here differs from the works available in the state of the art, where the AQI range is transformed into a categorical variable. In our approach, the time series use the real-valued AQI, providing more information and flexibility to the prediction model. The prediction performance of the LSTM model is highly acceptable, with MAE errors of 7.5223 and 9.7228 for the prediction of the hourly AQI and the 8-h AQI, respectively. In the forecasting evaluation using the RMSE metric, low values of 11.5689 for the hourly AQI and 13.1550 for the 8-h AQI were reached. Furthermore, it was confirmed that outliers affect the predictive model's performance, as reflected in the error metrics of the AQ-IoT-04 station. Moreover, in the experimentation it was observed that the LSTM models that obtained the best error metrics in the training stage also reached the best error metrics in the test stage. The proposed LSTM model for predicting the AQI of the next few hours should be evaluated more thoroughly; however, in the first instance, the results allow trusting deep learning architectures for this task.
References

1. Ghahremanloo M, Lops Y, Choi Y, Mousavinezhad S (2021) Impact of the COVID-19 outbreak on air pollution levels in East Asia. Sci Total Environ 754:142226
2. Goudarzi G, Shirmardi M, Naimabadi A, Ghadiri A, Sajedifar J (2019) Chemical and organic characteristics of PM2.5 particles and their in-vitro cytotoxic effects on lung cells: the Middle East dust storms in Ahvaz, Iran. Sci Total Environ 655:434–445
3. Rivera NM (2021) Air quality warnings and temporary driving bans: evidence from air pollution, car trips, and mass-transit ridership in Santiago. J Environ Econ Manag 108:102454
4. Marlier ME, Jina AS, Kinney PL, DeFries RS (2016) Extreme air pollution in global megacities. Curr Clim Change Rep 2(1):15–27
5. Ulpiani G, Hart MA, Di Virgilio G, Maharaj AM (2022) Urban meteorology and air quality in a rapidly growing city: inter-parameter associations and intra-urban heterogeneity. Sustain Cities Soc 77:103553
6. National Institute of Environmental Health Sciences (2022) Air pollution and your health. https://www.niehs.nih.gov/health/topics/agents/air-pollution/index.cfm. Accessed 09 June 2022
7. World Health Organization (2022) Air pollution. https://www.who.int/health-topics/air-pollution/#tab=tab_1. Accessed 09 June 2022
8. Almetwally AA, Bin-Jumah M, Allam AA (2020) Ambient air pollution and its influence on human health and welfare: an overview. Environ Sci Pollut Res 27(20):24815–24830
9. Rhee J, Dominici F, Zanobetti A, Schwartz J, Wang Y, Di Q, Balmes J, Christiani DC (2019) Impact of long-term exposures to ambient PM2.5 and ozone on ARDS risk for older adults in the United States. Chest 156(1):71–79
10. Ravindra K, Singh T, Biswal A, Singh V, Mor S (2021) Impact of COVID-19 lockdown on ambient air quality in megacities of India and implication for air pollution control strategies. Environ Sci Pollut Res 28(17):21621–21632
11. Zhang H, Wang S, Hao J, Wang X, Wang S, Chai F, Li M (2016) Air pollution and control action in Beijing. J Cleaner Prod 112:1519–1527
12. Amuthadevi C, Vijayan D, Ramachandran V (2021) Development of air quality monitoring (AQM) models using different machine learning approaches. J Ambient Intell Humanized Comput 1–13
13. Liang YC, Maimury Y, Chen AHL, Juarez JRC (2020) Machine learning-based prediction of air quality. Appl Sci 10(24):9151
14. Lee LC, Sino H (2022) Assessment of the spatial variability of air pollutant concentrations at industrial background stations in Malaysia using self-organizing map (SOM). In: Congress on intelligent systems. Springer, pp 291–304
15. Janarthanan R, Partheeban P, Somasundaram K, Elamparithi PN (2021) A deep learning approach for prediction of air quality index in a metropolitan city. Sustain Cities Soc 67:102720
16. Wang J, Li J, Wang X, Wang J, Huang M (2021) Air quality prediction using CT-LSTM. Neural Comput Appl 33(10):4779–4792
17. Wang Z, Chen H, Zhu J, Ding Z (2021) Multi-scale deep learning and optimal combination ensemble approach for AQI forecasting using big data with meteorological conditions. J Intell Fuzzy Syst 40(3):5483–5500
18. Zhai W, Cheng C (2020) A long short-term memory approach to predicting air quality based on social media data. Atmos Environ 237:117411
19. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
20. Manaswi NK (2018) RNN and LSTM. In: Deep learning with applications using Python. Springer, pp 115–126
21. US EPA (2018) Technical assistance document for the reporting of daily air quality—the air quality index (AQI), EPA 454/B-18-007. Tech. rep. United States Environmental Protection Agency. https://www.airnow.gov/sites/default/files/2020-05/aqi-technical-assistance-document-sept2018.pdf
22. Norma ambiental NADF-009-AIRE-2017 para elaborar el índice de calidad del aire en la Ciudad de México. Tech. rep. Secretaría del Medio Ambiente (2017). http://www.aire.cdmx.gob.mx/descargas/monitoreo/normatividad/NADF-009-AIRE-2017.pdf
23. Hossain E, Shariff MAU, Hossain MS, Andersson K (2021) A novel deep learning approach to predict air quality index. In: Kaiser MS, Bandyopadhyay A, Mahmud M, Ray K (eds) Proceedings of international conference on trends in computational and cognitive engineering. Springer Singapore, Singapore, pp 367–381
24. Li H, Wang J, Yang H (2020) A novel dynamic ensemble air quality index forecasting system. Atmos Pollut Res 11(8):1258–1270
25. Tello-Leal E, Macías-Hernández BA (2021) Association of environmental and meteorological factors on the spread of COVID-19 in Victoria, Mexico, and air quality during the lockdown. Environ Res 196:110442
Author Index
A Agarwal, Devendra, 11 Ahlawat, Neha, 79 Alfred Kirubaraj, A., 173 Ancy, P. R., 557 Antonijevic, Milos, 23 Arroyo, José E. C., 223 B Baby, Cyril Joe, 315 Baby, Cysil Tom, 315 Bacanin, Nebojsa, 23 Bala, Indu, 651 Balasubramanian, A., 525 Beniwal, Kirti, 749 Benjelloun, Ahmed, 639 Bhakta, Dip, 543 Bhatt, Devershi Pallavi, 147 Bhoge, Lokita, 667 C Chakrabarti, Kisalaya, 421 Charan, P. S. R., 455 Chatterjee, Indranath, 407 Chauhan, Rajdeep, 581 Choudhury, Nupur, 253 D de Freitas, Matheus, 223 Devassy, Binet Rose, 777 Dutta, Dipankar, 613 F Félix, Gabriel P., 223
Flores-Quispe, Roxana, 567 Franklin Vinod, D., 79
G Gagan Machaiah, P. C., 347 Gajjar, Pranshav, 789 Gandhiraj, R., 801 Ghose, Debasish, 65 Ghosh, Dibyendu, 65 Gopalapillai, Radhakrishnan, 53 Goplani, Dewang, 581 Govind, P. Jai, 723 Gupta, Ankit, 65 Gupta, Bhavana, 625
H Hajdarevic, Zlatko, 23 Hannah Inbarani, H., 763 Haveri, Banushruti, 393 Hendry, 821 Hernandez-Resendiz, Jaciel David, 835 Hlavac, Vladimir, 443 Honkote, Vinayak, 65 Hossain, Bayzid Ashik, 511
I Ingle, Archana, 735 Islam, Md. Adnanul, 511
J Jani, Y., 379 Jassim, Hothefa Shaker, 23 Jemima, R. Prarthna Grace, 683
Jiby, Binita, 667 Jose, Deepa V., 37 Joseph, Daniel Bennett, 525 Joseph, Jeffin, 173 Jovanovic, Dijana, 23 Jovanovic, Luka, 23
K Kabak, Saad, 639 Kannan, Srirangan, 525 Kant, Shashi, 11 Kanungo, Sanjeet, 89 Karthikeyan, S., 711 Kaur, Amritpal, 147 King, Gnana, 777 Kiranmai, B. N. S. S., 237 Krishna, Addapalli V. N., 557 Kulshreshtha, Vyom, 127 Kumar, Deepak, 613 Kumar, Naveen, 723 Kumar, Prince, 613 Kumar, Vivek, 749
L Lal, Ashish Kumar, 711 Lanjewar, Madhusudan G., 117 Lazar, Ann Mariya, 777 Li, Yanghepu, 789
M Macías-Hernández, Bárbara A., 835 Makri, Eleni G., 361 Malave, S. H., 209 Manongga, Daniel H. F., 821 Meena, M., 273 Meleet, Merin, 305 Mishra, Nishchol, 625 Mital, Tanya, 581 Mohammed Abdul Razak, W., 305 Mohana, H. S., 285 Mothkur, Rashmi, 481 Motwani, Deepak, 127 Mountstephens, James, 185 Mukhopadhyay, Debarka, 421 Mukta, Md. Saddam Hossain, 511
N Najmusseher, 197 Natarajan, K., 329 Nivetha, S., 763
O Oza, Parita, 1
P Panda, Manoj Kumar, 801 Papakostas, George A., 495 Parab, Jivan S., 117 Parate, Rajesh K., 117 Patel, Samir, 1 Pathi, A. M. V., 455 Patkar, Deepak, 735 Paul, Bikram, 253 Pavithra, G., 347 Pembarti, Hrithika, 667 Phadatare, Sakshi, 667 Pooja, C. L., 103 Prabhu, Shreekanth M., 53 Praveena, V., 455
Q Quen, Mathieson Tan Zui, 185
R Raajan, P., 379 Raja, Linesh, 147 Ramdasi, Dipali, 667 Ramirez-Alcocer, Ulises Manuel, 835 Ramson, Jino S. R., 173 Rana, Md. Sohel, 543 Ranjan, P. Vanaja, 683 Ranjan, Rakesh, 613 Rathee, Geetanjali, 161 Reena, J., 683 Rishabh, R., 305 Roja, Mani, 735 Roy, O. P., 467 Roy, Sourabh Prakash, 467
S Saikia, Eeshankur, 253 Sangle, Shailesh S., 595 Sankhe, Manoj, 735 Santhosh Kumar, Ch. V. V., 455 Satya, Barka, 821 Sedamkar, Raghavendra R., 595 Sekar, M., 89 Selvam, Sheba, 581 Senith, S., 173 Sethurajan, Monikka Reshmi, 329 Shankar Gowda, B. N., 103
Sharma, Jaya, 699 Sharma, Paawan, 1 Sharma, Pankaj, 127 Sharma, Paras, 65 Shashi Raj, K., 393 Shinde, S. K., 209 Shreedevi, P., 285 Shubham, 467 Shukla, Praveen Kumar, 11 Siddeshwar, V. A., 525 Singh, A. K., 467 Singh, Aparna, 161 Singha, Poulami, 613 Sivasankaran, K., 525 Snegha, J., 683 Strumberger, Ivana, 23 Swathi, S., 455
T Tanisha, V., 581 Tapna, Suparba, 421 Tarun Kumar, S., 89 Tello-Leal, Edgar, 835 Thangaraj, Viswanathan, 237 Thipakaran, Ethiraj, 801 Thomas, Priya, 37 Trivedi, Gaurav, 253 Tsimenidis, Stefanos, 495
U Umme Salma, M., 197
V Valadi, Jayaraman, 407 Varna Kumar Reddy, P. G., 273 Veerappa, B. N., 481 Velazco-Paredes, Yuber, 567 Venkateswara Rao, Ch., 455 Venkat, P. R., 525 Vidhya, R., 683 Vignesh, R., 683 Vinod, D. Franklin, 699 Vinodha, D., 525
W Wakodikar, Rupesh, 117
Y Yadav, Anupam, 651 Yousuf, Mohammad Abu, 543
Z Zaman, Akib, 511 Zhao, Liang, 789 Zivkovic, Miodrag, 23 Zuo, Zhenyu, 789