Lecture Notes in Networks and Systems 551
Mukesh Saraswat Chandreyee Chowdhury Chintan Kumar Mandal Amir H. Gandomi Editors
Proceedings of International Conference on Data Science and Applications ICDSA 2022, Volume 1
Lecture Notes in Networks and Systems Volume 551
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and others. Of particular value to both the contributors and the readership are the short publication timeframe and the worldwide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them.

Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.

For proposals from Asia please contact Aninda Bose ([email protected]).
Mukesh Saraswat · Chandreyee Chowdhury · Chintan Kumar Mandal · Amir H. Gandomi Editors
Proceedings of International Conference on Data Science and Applications ICDSA 2022, Volume 1
Editors

Mukesh Saraswat
Department of Computer Science and Engineering & Information Technology, Jaypee Institute of Information Technology, Noida, India

Chandreyee Chowdhury
Department of Computer Science and Engineering, Jadavpur University, Kolkata, India

Chintan Kumar Mandal
Department of Computer Science and Engineering, Jadavpur University, Kolkata, India

Amir H. Gandomi
Data Science Institute, University of Technology Sydney, Sydney, NSW, Australia
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-981-19-6630-9 ISBN 978-981-19-6631-6 (eBook) https://doi.org/10.1007/978-981-19-6631-6 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
This book contains outstanding research papers presented as the proceedings of the 3rd International Conference on Data Science and Applications (ICDSA 2022). ICDSA 2022 was organized by the School of Mobile Computing and Communication, Jadavpur University, Kolkata, India, and technically sponsored by the Soft Computing Research Society, India. The conference was conceived as a platform for disseminating and exchanging the ideas, concepts, and results of researchers from academia and industry, and for developing a comprehensive understanding of the challenges of advancing intelligence from a computational viewpoint. The book will also help in strengthening congenial networking between academia and industry. We have tried our best to enrich the quality of ICDSA 2022 through a stringent and careful peer review process. ICDSA 2022 received many technical contributions from distinguished participants at home and abroad: 482 research submissions from 34 different countries, viz. Afghanistan, Albania, Australia, Austria, Bangladesh, Bulgaria, Cyprus, Ecuador, Ethiopia, Germany, Ghana, Greece, India, Indonesia, Iran, Iraq, Italy, Japan, Malaysia, Morocco, Nepal, Nigeria, Saudi Arabia, Serbia, South Africa, South Korea, Spain, Sri Lanka, Taiwan, Thailand, Ukraine, USA, Viet Nam, and Yemen. After a very stringent peer review process, only 130 high-quality papers were finally accepted for presentation and the final proceedings. This first volume presents 65 research papers on data science and applications and serves as reference material for advanced research.

Noida, India
Kolkata, India
Kolkata, India
Sydney, Australia
Mukesh Saraswat Chandreyee Chowdhury Chintan Kumar Mandal Amir H. Gandomi
Contents
Cancer Prognosis by Using Machine Learning and Data Science: A Systematic Review . . . . . . . . . . 1
T. Lakshmikanth Rajath Mohan and N. Jayapandian
A Qualitative Research of Crops Scenario in Punjab with Relation to Fertilizer Usage Using Power BI . . . . . . . . . . 13
Palvi Mittal and Vijay Kumar Sinha
Face Detection-Based Border Security System Using Haar-Cascade and LBPH Algorithm . . . . . . . . . . 25
Arpit Sharma and N. Jayapandian
Proposed Experimental Design of a Portable COVID-19 Screening Device Using Cough Audio Samples . . . . . . . . . . 39
Kavish Rupesh Mehta, Punid Ramesh Natesan, and Sumit Kumar Jindal
Big Data Framework for Analytics Business Intelligence . . . . . . . . . . 51
Farhad Khoshbakht and S. M. K. Quadri
Technological Impacts of AI on Hospitality and Tourism Industry . . . . . . . . . . 71
Sunil Sharma, Yashwant Singh Rawal, Harvinder Soni, and Debasish Batabyal
Improvement of Real-Time Kinematic Positioning Using Kalman Filter-Based Singular Spectrum Analysis During Geomagnetic Storm for Thailand Sector . . . . . . . . . . 79
Worachai Srisamoodkham, Kutubuddin Ansari, and Punyawi Jamjareegulgarn
Performance Evaluation Metrics of NBA, NAAC, NIRF, and Analysis for Grade up Strategy . . . . . . . . . . 89
M. Parvathi and T. Amy Prasanna
Optimal Extraction of Bioactive Compounds from Gardenia and Ashwagandha Using Sine Cosine Algorithm . . . . . . . . . . . . . . . . . . . . . . 109 Vanita Garg, Mousumi Banerjee, and Bhavita Kumari A Systematic Review of User Authentication Security in Electronic Payment System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 Md. Arif Hassan and Zarina Shukur Skin Diseases Detection with Transfer Learning . . . . . . . . . . . . . . . . . . . . . . 139 Vo Van-Quoc and Nguyen Thai-Nghe Quadratic Dragonfly Algorithm for Numerical Optimization and Travelling Salesman Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 Divya Soni, Nirmala Sharma, and Harish Sharma A Survey on Defect Detection of Vials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171 C. R. Vishwanatha, V. Asha, Sneha More, C. Divya, K. Keerthi, and S. P. Rohaan Tversky-Kahneman: A New Loss Function for Skin Lesion Image Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187 Do-Hai-Ninh Nham, Minh-Nhat Trinh, Van-Truong Pham, and Thi-Thao Tran Trends of Tea Productivity Based on Level of Soil Moisture in Tea Gardens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201 Manoj Kumar Deka and Yashu Pradhan Structural Optimization With the Multistrategy PSO-ES Unfeasible Local Search Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215 Marco Martino Rosso, Angelo Aloisio, Raffaele Cucuzza, Rebecca Asso, and Giuseppe Carlo Marano Fuzzy 2-Partition Kapur Entropy for Fruit Image Segmentation Using Teacher-Learner Optimization Algorithm . . . . . . . . . . . . . . . . . . . . . . 231 Harmandeep Singh Gill and Guna Sekhar Sajja Population-Based Meta-heuristics for Feature Selection: A Multi-objective Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243 Jyoti Ahuja and Saroj Ratnoo Distributed Item Recommendation Using Sentiment Analysis . . . . . . . . . . 265 Tinku Singh, Vinarm Rajput, Nikhil Sharma, Satakshi, and Manish Kumar Methods for Optimal Feature Selection for Sentiment Analysis . . . . . . . . 281 Sakshi Shringi and Harish Sharma
Machine Learning Techniques for the Prediction of Bovine Tuberculosis Among the Cattle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295 Rita Roy, Marada Ravindra, Nitish Marada, Subhodeep Mukherjee, and Manish Mohan Baral Dynamic-Differential Pricing, Buy-Back Pricing and Carbon Trading in Renewable Power Distribution Network Design . . . . . . . . . . . . 305 Yu-Chung Tsao, Tsehaye Dedimas Beyene, and Sisay Geremew Gebeyehu Abstractive Summarization is Improved by Learning Via Semantic Similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317 R. Hariharan, M. Dhilsath Fathima, A. Kameshwaran, A. Bersika M. C. Sweety, Vaidehi Rahangdale, and Bala Chandra Sekhar Reddy Bhavanam A Systematic Review on Explicit and Implicit Aspect Based Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329 Sameer Yadavrao Thakur, K. H. Walse, and V. M. Thakare Experimenting Different Classification Techniques to Identify Unknown Author Using Natural Language Processing . . . . . . . . . . . . . . . . 357 Priyanshu Naudiyal, Chaitanya Sachdeva, and Sumit Kumar Jindal Categorical Data: Need, Encoding, Selection of Encoding Method and Its Emergence in Machine Learning Models—A Practical Review Study on Heart Disease Prediction Dataset Using Pearson Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369 Nishoak Kosaraju, Sainath Reddy Sankepally, and K. Mallikharjuna Rao Intelligent Computational Model for Accurate and Early Diagnosis of Heart Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383 Isaac Kofi Nti, Owusu Nyarko-Boateng, Adebayo Felix Adekoya, Patrick Kwabena Mensah, Mighty Abra Ayidzoe, Godfred Kusi Fosu, Henrietta Adjei Pokuaa, and R. Arjun Genetic Algorithm-Based Clustering with Neural Network Classification for Software Fault Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . 399 Pushpendra Kumar Rajput, Aarti, and Raju Pal Nodule Detection and Prediction of Lung Carcinoma in CT Images: A Relative Study of Enhancement and Segmentation Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415 K. A. Nyni and J. Anitha Fuzzy C-Means and Fuzzy Cheetah Chase Optimization Algorithm . . . . 431 M. Goudhaman, S. Sasikumar, and N. Vanathi PASPP Medical Transformer for Medical Image Segmentation . . . . . . . . 441 Hong-Phuc Lai, Thi-Thao Tran, and Van-Truong Pham
Fuzzy Optimized Particle Swarm Algorithm for Internet of Things Based Wireless Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455 S. L. Prathapa Reddy, Poli Lokeshwara Reddy, K. Divya Lakshmi, and M. Mani Kumar Reddy Fuzzy TOPSIS Approaches for Multi-criteria Decision-Making Problems in Triangular Fuzzy Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467 Sandhya Priya Baral, P. K. Parida, and S. K. Sahoo Black–Scholes Option Pricing Using Machine Learning . . . . . . . . . . . . . . . 481 Shreyan Sood, Tanmay Jain, Nishant Batra, and H. C. Taneja Literature Review on Waste Management Using Blockchain . . . . . . . . . . . 495 S. S. Sambare, Kalyani Khandait, Kshitij Kolage, Keyur Kolambe, and Tanvi Nimbalkar Applicability of Klobuchar Model for STEC Estimation Over Thailand Region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511 Worachai Srisamoodkham, Kutubuddin Ansari, and Punyawi Jamjareegulgarn A Survey on Neural Networks in Cancer Research . . . . . . . . . . . . . . . . . . . . 519 Jerin Reji and R. Sunder To Optimize Google Ad Campaign Using Data Driven Technique . . . . . . 535 K. Valli Priyadharshini and T. Avudaiappan Identification and Detecting COVID-19 from X-Ray Images Using Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553 Rahman Farhat Lamisa and Md. Rownak Islam A Novel Metaheuristic with Optimal Deep Learning-Based Network Slicing in IoT-Enabled Clustered Wireless Sensor Networks in 5G Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567 B. Gracelin Sheena and N. Snehalatha Student Attention Base Facial Emotion State Recognition Using Convolutional Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579 Md. Mahmodul Hassan, Khandaker Tabin Hasan, Idi Amin Tonoy, Md. Mahmudur Rahman, and Md. Asif Bin Abedin Detection, Depth Estimation and Locating Stem Coordinates of Agricultural Produce Using Neural Networks and Stereo Vision . . . . . 591 R. Nimalan Karthik, S. Manishankar, Srikar Tondapu, and A. A. Nippun Kumaar Psychosomatic Study of Criminal Inclinations with Profanity on Social Media: Twitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611 Angelo Baby, Jinsi Jose, and Akshay Raj
Grading of Diabetic Retinopathy Using Machine Learning Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 629 H. Asha Gnana Priya and J. Anitha Cognitive Radio Networks Implementation for Optimum Spectrum Utilization Through Cascade Forward and Elman Backpropagation Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 641 Rahul Gupta and P. C. Gupta Tracking Digital Device Utilization from Screenshot Analysis Using Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 661 Bilkis Jamal Ferdosi, Mohaomd Sadi, Niamul Hasan, and Md Abdur Rahman A Simulative Analysis of Prominent Unicast Routing Protocols over Multi-hop Wireless Mobile Ad-hoc Networks . . . . . . . . . . . . . . . . . . . . 671 Lavanya Poluboyina, Ch. S. V. Maruthi Rao, S. P. V. Subba Rao, and G. Prasad Acharya A Secured Novel Classifier for Effective Classification Using Gradient Boosting Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 697 E. Baburaj, R. Barona, and G. Arulkumaran E-Learning Acceptance in Higher Education in Response to Outbreak of COVID-19: TAM2 Based Approach . . . . . . . . . . . . . . . . . . . 713 Amarpreet Singh Virdi and Akansha Mer Classification of Malignant Skin Cancer Lesion Using CNN, KNN, and SVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 731 Renuka Kulkarni, Sanket Giri, Samay Sanghvi, and Ravindra Keskar A BBF, RANSAC, and GWO-Based Hybrid Model for Forensic Image Investigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 743 Sharda Devi and Vikram Singh Football Player and Ball Tracking System Using Deep Learning . . . . . . . 757 Kadir Diwan, Rajeev Bandi, Sakshi Dicholkar, and Mrinal Khadse What and Why? Interpretability in Colon Cancer Detection . . . . . . . . . . . 771 Ratnabali Pal, Samarjit Kar, and Arif Ahmed Sekh A OTP-Based Lightweight Authentication Scheme in Python Toward Possible Uses in Distributed Applications . . . . . . . . . . . . . . . . . . . . . 781 Anishka Chauhan and Arnab Mitra Face Mask Detection Using YOLOv3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 791 Devesh Singh, Himanshu Kumar, and Shweta Meena Review on Facial Recognition System: Past, Present, and Future . . . . . . . 807 Manu Shree, Amita Dev, and A. K. Mohapatra
Analysis of Wormhole Attack Detection in Customized Ad Hoc Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 831 Soumya Shrivastava and Punit Kumar Johari Software Fault Prediction Using Particle Swarm Optimization and Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 843 Samudrala Santhosh, Kiran Khatter, and Devanjali Relan Cardinality Constrained Portfolio Selection Strategy Based on Hybrid Metaheuristic Optimization Algorithm . . . . . . . . . . . . . . . . . . . . 853 Faisal Ahmad, Faraz Hasan, Mohammad Shahid, Jahangir Chauhan, and Mohammad Imran Homograph Language Identification Using Machine Learning Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 863 Mohd Zeeshan Ansari, Tanvir Ahmad, Sunubia Khan, Faria Mabood, and Mohd Faizan A Review on Data-Driven Approach Applied for Smart Sustainable City: Future Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 875 Rosmy Antony and R. Sunder Sensor Fusion Methodologies for Landmine Detection . . . . . . . . . . . . . . . . 891 Parag Narkhede, Rahee Walambe, and Ketan Kotecha Machine Learning Techniques to Predict Intradialytic Hypotension: Different Algorithms Comparison on Unbalanced Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 909 Domenico Vito Real-Time Hand Gesture Recognition Using Indian Sign Language . . . . 927 Rabinder Kumar Prasad, Abhijit Boruah, and Sudipta Majumdar Controller for Electromechanical Flap Actuation System in More Electric Aircraft . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 939 L. Sreelekshmy and S. Sreeja Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 955
Editors and Contributors
About the Editors

Mukesh Saraswat is an Associate Professor at Jaypee Institute of Information Technology, Noida, India. Dr. Saraswat obtained his Ph.D. in Computer Science and Engineering from ABV-IIITM Gwalior, India. He has more than 19 years of teaching and research experience. He has guided three Ph.D. students and is presently guiding four Ph.D. students. He has published more than 70 journal and conference papers in the areas of image processing, pattern recognition, data mining, and soft computing. He was part of a successfully completed project funded by SERB, New Delhi, on image analysis and is currently running one project funded by CRS, RTU, Kota. He has been an active member of many organizing committees for various conferences and workshops. He is also a guest editor of Array, the Journal of Swarm Intelligence, and the Journal of Intelligent Engineering Informatics. He is one of the General Chairs of the International Conference on Data Science and Applications. He is also an Editorial Board Member of the journal MethodsX and a series editor of the SCRS Book Series on Computing and Intelligent Systems (CIS). He is an active member of the IEEE, ACM, CSI, and SCRS professional bodies. His research areas include image processing, pattern recognition, data mining, and soft computing.

Chandreyee Chowdhury is an Associate Professor in the Department of Computer Science and Engineering at Jadavpur University, India. She received her M.E. in Computer Science and Engineering in 2005 and her Ph.D. in 2013 from Jadavpur University. Her research interests include IoT in healthcare, indoor localization, and human activity recognition. She was awarded a Post-Doctoral Fellowship by Erasmus Mundus in 2014 to carry out research work at Northumbria University, UK. She has served as a technical program committee member for many international conferences. She has published more than 100 papers in reputed journals, book chapters and international peer-reviewed conferences. She is a member of the IEEE and the IEEE Computer Society.
Chintan Kumar Mandal is presently working in the Department of Computer Science and Engineering at Jadavpur University, Kolkata, India. He did his graduation and post-graduation in Computer Science and Engineering from Calcutta University, followed by his Ph.D. from Motilal Nehru National Institute of Technology, Allahabad. Prior to these, he did his Physics with Honours from Calcutta University. He has published papers in various journals and conferences. His areas of interest are Computational Geometry, Computer Graphics and Robotics.

Amir H. Gandomi is a Professor of Data Science and an ARC DECRA Fellow at the Faculty of Engineering and Information Technology, University of Technology Sydney. Prior to joining UTS, Prof. Gandomi was an Assistant Professor at Stevens Institute of Technology, USA, and a distinguished research fellow at the BEACON center, Michigan State University, USA. Professor Gandomi has published over 300 journal papers and 12 books which collectively have been cited 31,000+ times (H-index = 81). He has been named one of the most influential scientific minds and received Highly Cited Researcher awards (top 1% publications and 0.1% researchers) for five consecutive years, 2017 to 2021. He also ranked 17th in the GP bibliography among more than 15,000 researchers. He has received multiple prestigious awards for his research excellence and impact, such as the 2022 Walter L. Huber Prize, which is known as the highest-level mid-career research award in all areas of civil engineering. He has served as associate editor, editor, and guest editor for several prestigious journals, such as AE of IEEE TBD and IEEE IoTJ. Professor Gandomi is active in delivering keynotes and invited talks. His research interests are (big) data analytics and global optimisation.
Contributors Aarti Lovely Professional University, Phagwara, Punjab, India Md. Asif Bin Abedin Department of Computer Science and Engineering, American International University-Bangladesh, Dhaka, Bangladesh Mighty Abra Ayidzoe Department of Computer Science and Informatics, University of Energy and Natural Resources, Sunyani, Ghana Adebayo Felix Adekoya Department of Computer Science and Informatics, University of Energy and Natural Resources, Sunyani, Ghana Henrietta Adjei Pokuaa Department of Computer Science, Sunyani Technical University, Sunyani, Ghana Faisal Ahmad Workday Inc, Pleasanton, USA Tanvir Ahmad Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India
Jyoti Ahuja Government P.G. College for Women, Rohtak, India Angelo Aloisio Civil Environmental and Architectural Engineering Department, Università degli Studi dell’Aquila, L’Aquila, Italy T. Amy Prasanna BVRIT HYDERABAD College of Engineering for Women, Hyderabad, Telangana, India J. Anitha Department of Electronics & Communication Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India Kutubuddin Ansari Integrated Geoinformation (IntGeo) Solution Private Limited, New Delhi, India Mohd Zeeshan Ansari Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India Rosmy Antony Sahrdaya college of Engineering and Technology, Kodakara, India R. Arjun School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India G. Arulkumaran Bule Hora University, Bule Hora, Ethiopia H. Asha Gnana Priya Department of Electronics and Communication Engineering, Karunya Institute of Technology and Sciences, Coimbatore, India V. Asha Department of MCA, New Horizon College of Engineering, Bengaluru, India; Department of MCA, New Horizon College of Engineering, Visvesvaraya Technological University, Belagavi, Karnataka, India Rebecca Asso DISEG, Department of Structural, Geotechnical and Building Engineering, Politecnico di Torino, Turin, Italy T. Avudaiappan Department of Computer Science and Engineering, K.Ramakrishnan College of Technology, Samayapuram, Trichy, India E. Baburaj Bule Hora University, Bule Hora, Ethiopia Angelo Baby Rajagiri College of Social Sciences, Kalamassery, Kerala, India Rajeev Bandi Department of Information Technology, SIES Graduate School of Technology Nerul, Navi, Mumbai, India Mousumi Banerjee Division of Mathematics, School of Basic and Applied Sciences, Galgotias University, Greater Noida, Uttar Pradesh, India Manish Mohan Baral Department of Operation Management, GITAM School of Business, GITAM (Deemed to be University), Visakhapatnam, Andhra Pradesh, India Sandhya Priya Baral Department of Mathematics, C.V. Raman Global University, Bhubaneswar, India
R. Barona St.Xavier’s Catholic College of Engineering, Chunkankadai, India Debasish Batabyal Amity University, Kolkata, India Nishant Batra Department of Applied Mathematics, Delhi Technological University, Delhi, India Tsehaye Dedimas Beyene Department of Industrial Management, National Taiwan University of Science and Technology, Taipei, Taiwan; Faculty of Mechanical and Industrial Engineering, Bahir Dar Institute of Technology, Bahir Dar University, Bahir Dar, Ethiopia Bala Chandra Sekhar Reddy Bhavanam Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, India Abhijit Boruah Department of CSE, DUIET, Dibrugarh University, Dibrugarh, Assam, India Anishka Chauhan Department of Computer Science and Engineering, SRM University-AP, Amaravati, Andhra Pradesh, India Jahangir Chauhan Department of Commerce, Aligarh Muslim University, Aligarh, India Raffaele Cucuzza DISEG, Department of Structural, Geotechnical and Building Engineering, Politecnico di Torino, Turin, Italy Manoj Kumar Deka Bodoland University, Kokrajhar,, Assam, India Amita Dev Indira Gandhi Delhi Technical University for Women (IGDTUW), New Delhi, India Sharda Devi Department of Computer Science and Engineering, Chaudhary Devi Lal University, Sirsa, India M. Dhilsath Fathima Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, India Sakshi Dicholkar Department of Information Technology, SIES Graduate School of Technology Nerul, Navi, Mumbai, India K. Divya Lakshmi Department of Electronics and Communication Engineering, K.S.R.M College of Engineering, Kadapa, Andhra Pradesh, India C. Divya Department of MCA, New Horizon College of Engineering, Bengaluru, India; Department of MCA, New Horizon College of Engineering, Visvesvaraya Technological University, Belagavi, Karnataka, India Kadir Diwan Department of Information Technology, SIES Graduate School of Technology Nerul, Navi, Mumbai, India
Mohd Faizan Department of Electronics and Communication, Jamia Millia Islamia, New Delhi, India Bilkis Jamal Ferdosi University of Asia Pacific, Dhaka, Bangladesh Godfred Kusi Fosu Department of Computer Science, Sunyani Technical University, Sunyani, Ghana Vanita Garg Division of Mathematics, School of Basic and Applied Sciences, Galgotias University, Greater Noida, Uttar Pradesh, India Sisay Geremew Gebeyehu Faculty of Mechanical and Industrial Engineering, Bahir Dar Institute of Technology, Bahir Dar University, Bahir Dar, Ethiopia Harmandeep Singh Gill Mata Gujri Khalsa College, Punjab, India Sanket Giri Visvesvaraya National Institute of Technology, Nagpur, India M. Goudhaman CSE, Saveetha Institute of Medical and Technical Sciences— [SIMATS], Chennai, India B. Gracelin Sheena Department of Computational Intelligence, SRM Institute of Science and Technology Kattankulathur, Chennai, Tamil Nadu, India P. C. Gupta Department of Computer Science and Informatics, University of Kota, Kota, India Rahul Gupta Department of Computer Science and Informatics, University of Kota, Kota, India R. Hariharan Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, India Faraz Hasan Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Guntur, India Khandaker Tabin Hasan Department of Computer Science and Engineering, American International University-Bangladesh, Dhaka, Bangladesh Niamul Hasan University of Asia Pacific, Dhaka, Bangladesh Md. Mahmodul Hassan Department of Computer Science and Engineering, American International University-Bangladesh, Dhaka, Bangladesh Md.Arif Hassan Center for Cyber Security, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia Mohammad Imran Department of Computer Science, Aligarh Muslim University, Aligarh, India Md. Rownak Islam Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh
Tanmay Jain Department of Applied Mathematics, Delhi Technological University, Delhi, India Punyawi Jamjareegulgarn King Mongkut’s Institute of Technology Ladkrabang, Prince of Chumphon Campus, Chumphon, Thailand N. Jayapandian Department of Computer Science and Engineering, CHRIST (Deemed to be University), Kengeri Campus, Bangalore, India Sumit Kumar Jindal School of Electronics Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India Punit Kumar Johari Madhav Institute of Technology and Science, Gwalior, MP, India Jinsi Jose Rajagiri College of Social Sciences, Kalamassery, Kerala, India A. Kameshwaran Dr. M. G. R Educational and Research Institute, Chennai, India Samarjit Kar National Institute of Technology Durgapur, Durgapur, India K. Keerthi Department of MCA, New Horizon College of Engineering, Bengaluru, India; Department of MCA, New Horizon College of Engineering, Visvesvaraya Technological University, Belagavi, Karnataka, India Ravindra Keskar Visvesvaraya National Institute of Technology, Nagpur, India Mrinal Khadse Department of Information Technology, SIES Graduate School of Technology Nerul, Navi, Mumbai, India Sunubia Khan Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India Kalyani Khandait Pimpri Chinchwad College of Engineering, Pune, India Kiran Khatter BML Munjal University, Kapriwas, India Farhad Khoshbakht Department of Computer Science, Jamia Millia Islamia(A Central University), New Delhi, India Kshitij Kolage Pimpri Chinchwad College of Engineering, Pune, India Keyur Kolambe Pimpri Chinchwad College of Engineering, Pune, India Nishoak Kosaraju Data Science and Artificial Intelligence, International Institute of Information Technology, Naya Raipur, India Ketan Kotecha Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International (Deemed University), Pune, India Renuka Kulkarni Visvesvaraya National Institute of Technology, Nagpur, India Himanshu Kumar Department of Software Engineering, Delhi Technological University, New Delhi, India
Manish Kumar Indian Institute of Information Technology Allahabad, Prayagraj, Uttar Pradesh, India Bhavita Kumari Division of Mathematics, School of Basic and Applied Sciences, Galgotias University, Greater Noida, Uttar Pradesh, India Hong-Phuc Lai Department of Automation Engineering, School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Vietnam T. Lakshmikanth Rajath Mohan Department of Computer Science and Engineering, CHRIST (Deemed to Be University), Kengeri Campus, Bangalore, India Rahman Farhat Lamisa Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh Poli Lokeshwara Reddy Department of Electronics and Communication Engineering, Anurag University, Hyderabad, Telangana, India Faria Mabood Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India Sudipta Majumdar Department of CSE, DUIET, Dibrugarh University, Dibrugarh, Assam, India K. Mallikharjuna Rao Data Science and Artificial Intelligence, International Institute of Information Technology, Naya Raipur, India M. Mani Kumar Reddy Department of Electronics and Communication Engineering, JNTUA College of Engineering Pulivendula, Pulivendula, Andhra Pradesh, India S. Manishankar Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India Nitish Marada Department of Mechanical Engineering, GITAM Institute of Technology, GITAM (Deemed to be University), Visakhapatnam, Andhra, India Giuseppe Carlo Marano DISEG, Department of Structural, Geotechnical and Building Engineering, Politecnico di Torino, Turin, Italy Shweta Meena Department of Software Engineering, Delhi Technological University, New Delhi, India Kavish Rupesh Mehta School of Electronics Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India Patrick Kwabena Mensah Department of Computer Science and Informatics, University of Energy and Natural Resources, Sunyani, Ghana Akansha Mer Department of Commerce and Management, Banasthali Vidyapith, Banasthali, Rajasthan, India
Arnab Mitra Department of Computer Science and Engineering, SRM UniversityAP, Amaravati, Andhra Pradesh, India Palvi Mittal Department of CSE, Chandigarh University, Sahibzada Ajit Singh Nagar, India A. K. Mohapatra Indira Gandhi Delhi Technical University for Women (IGDTUW), New Delhi, India Sneha More Department of MCA, New Horizon College of Engineering, Bengaluru, India; Department of MCA, New Horizon College of Engineering, Visvesvaraya Technological University, Belagavi, Karnataka, India Subhodeep Mukherjee Department of Operation Management, GITAM School of Business, GITAM (Deemed to be University), Visakhapatnam, Andhra Pradesh, India Parag Narkhede Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India Punid Ramesh Natesan School of Electronics Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India Priyanshu Naudiyal School of Electronics Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India Do-Hai-Ninh Nham School of Applied Mathematics and Informatics, Hanoi University of Science and Technology, Hanoi, Vietnam R. Nimalan Karthik Amrita School Vidyapeetham, Bengaluru, India
of
Engineering,
Amrita
Vishwa
Tanvi Nimbalkar Pimpri Chinchwad College of Engineering, Pune, India A. A. Nippun Kumaar Amrita Vidyapeetham, Bengaluru, India
School
of
Engineering,
Amrita
Vishwa
Isaac Kofi Nti Department of Computer Science and Informatics, University of Energy and Natural Resources, Sunyani, Ghana Owusu Nyarko-Boateng Department of Computer Science and Informatics, University of Energy and Natural Resources, Sunyani, Ghana K. A. Nyni Department of Mechatronics Engineering, Jyothi Engineering College, Cheruthuruthy, Kerala, India; Department of Electronics & Communication Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India Raju Pal CSE & IT Department, Jaypee Institute of Information Technology, Noida, Uttar Pradesh, India
Ratnabali Pal National Institute of Technology Durgapur, Durgapur, India; Brainware University, Kolkata, India P. K. Parida Department of Mathematics, C.V. Raman Global University, Bhubaneswar, India M. Parvathi BVRIT HYDERABAD College of Engineering for Women, Hyderabad, Telangana, India Van-Truong Pham Department of Automation Engineering, School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Vietnam Lavanya Poluboyina Department of Electronics and Communication Engineering, Sreenidhi Institute of Science and Technology, Hyderabad, Telangana, India Yashu Pradhan Bodoland University, Kokrajhar,, Assam, India G. Prasad Acharya Department of Electronics and Communication Engineering, Sreenidhi Institute of Science and Technology, Hyderabad, Telangana, India Rabinder Kumar Prasad Department of CSE, DUIET, Dibrugarh University, Dibrugarh, Assam, India S. L. Prathapa Reddy Department of Electronics and Communication Engineering, K.S.R.M College of Engineering, Kadapa, Andhra Pradesh, India S. M. K. Quadri Department of Computer Science, Jamia Millia Islamia(A Central University), New Delhi, India Vaidehi Rahangdale Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, India Md. Mahmudur Rahman Department of Computer Science and Engineering, American International University-Bangladesh, Dhaka, Bangladesh Md Abdur Rahman University of Asia Pacific, Dhaka, Bangladesh Akshay Raj Smater Codes, Bengaluru, Karnataka, India Pushpendra Kumar Rajput School of Computer Science, University of Petroleum and Energy Studies, Dehradun, Uttarakhand, India Vinarm Rajput Indian Institute of Information Technology Allahabad, Prayagraj, Uttar Pradesh, India Ch. S. V. Maruthi Rao Department of Electronics and Communication Engineering, Sreyas Institute of Engineering and Technology, Hyderabad, Telangana, India Saroj Ratnoo Department of CSE, GJUST, Hisar, India Marada Ravindra Digital Specialist Engineer, Infosys, Hyderabad, India
Yashwant Singh Rawal Parul University, Vadodara, Gujarat, India Jerin Reji Sahrdaya College of Engineering and Technology, Kodakara, Kerala, India Devanjali Relan BML Munjal University, Kapriwas, India S. P. Rohaan Department of MCA, New Horizon College of Engineering, Bengaluru, India; Department of MCA, New Horizon College of Engineering, Visvesvaraya Technological University, Belagavi, Karnataka, India Marco Martino Rosso DISEG, Department of Structural, Geotechnical and Building Engineering, Politecnico di Torino, Turin, Italy Rita Roy Department of Computer Science and Engineering, GITAM Institute of Technology, Visakhapatnam, Andhra Pradesh, India Chaitanya Sachdeva School of Electronics Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India Mohaomd Sadi University of Asia Pacific, Dhaka, Bangladesh S. K. Sahoo Institute of Mathematics & Applications, Bhubaneswar, India Guna Sekhar Sajja Information Technology Department, University of the Cumberlands, Kentucky, USA S. S. Sambare Pimpri Chinchwad College of Engineering, Pune, India Samay Sanghvi Visvesvaraya National Institute of Technology, Nagpur, India Sainath Reddy Sankepally Data Science and Artificial Intelligence, International Institute of Information Technology, Naya Raipur, India Samudrala Santhosh BML Munjal University, Kapriwas, India S. Sasikumar Faculty of CSE, Saveetha Engineering College, Chennai, India Satakshi SHUATS Allahabad, Allahabad, Uttar Pradesh, India Arif Ahmed Sekh XIM University, Bhubaneswar, India Mohammad Shahid Department of Commerce, Aligarh Muslim University, Aligarh, India Arpit Sharma Department of Computer Science and Engineering, CHRIST (Deemed to be University), Kengeri Campus, Bangalore, India Harish Sharma Rajasthan Technical University, Kota, Rajasthan, India Nikhil Sharma Indian Institute of Information Technology Allahabad, Prayagraj, Uttar Pradesh, India Nirmala Sharma Rajasthan Technical University, Kota, Rajasthan, India
Sunil Sharma Pacific University, Udaipur, Rajasthan, India Manu Shree Indira Gandhi Delhi Technical University for Women (IGDTUW), New Delhi, India Sakshi Shringi Rajasthan Technical University, Kota, Rajasthan, India Soumya Shrivastava Madhav Institute of Technology and Science, Gwalior, MP, India Zarina Shukur Center for Cyber Security, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia Devesh Singh Department of Software Engineering, Delhi Technological University, New Delhi, India Tinku Singh Indian Institute of Information Technology Allahabad, Prayagraj, Uttar Pradesh, India Vikram Singh Department of Computer Science and Engineering, Chaudhary Devi Lal University, Sirsa, India Vijay Kumar Sinha Department of CSE, Chandigarh University, Sahibzada Ajit Singh Nagar, India N. Snehalatha Department of Computational Intelligence, SRM Institute of Science and Technology Kattankulathur, Chennai, Tamil Nadu, India Divya Soni Rajasthan Technical University, Kota, Rajasthan, India Harvinder Soni Taxila Business School, Jaipur, Rajasthan, India Shreyan Sood Department of Applied Mathematics, Delhi Technological University, Delhi, India S. Sreeja College of Engineering Trivandrum, Trivandrum, Kerala, India L. Sreelekshmy College of Engineering Trivandrum, Trivandrum, Kerala, India Worachai Srisamoodkham Faculty of Agricultural and Industrial Technology, Phetchabun Rajabhat University, Sadiang, Thailand; Faculty of Agricultural and Industrial Technology, Phetchabun Rajabhat University, Phetchabun, Thailand S. P. V. Subba Rao Department of Electronics and Communication Engineering, Sreenidhi Institute of Science and Technology, Hyderabad, Telangana, India R. Sunder Sahrdaya College of Engineering and Technology, Kodakara, Kerala, India A. Bersika M. C. Sweety Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, India
H. C. Taneja Department of Applied Mathematics, Delhi Technological University, Delhi, India Nguyen Thai-Nghe Can Tho University, Can Tho City, Viet Nam V. M. Thakare P.G. Department of CS and Engineering, Sant Gadge Baba Amravati University, Amravati (M.S), India Sameer Yadavrao Thakur P.G. Department of CS and Engineering, Sant Gadge Baba Amravati University, Amravati (M.S), India Srikar Tondapu Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India Idi Amin Tonoy Department of Computer Science and Engineering, American International University-Bangladesh, Dhaka, Bangladesh Thi-Thao Tran Department of Automation Engineering, School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Vietnam Minh-Nhat Trinh Department of Automation Engineering, School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Vietnam Yu-Chung Tsao Department of Industrial Management, National Taiwan University of Science and Technology, Taipei, Taiwan; Artificial Intelligence for Operations Management Research Center, National Taiwan University of Science and Technology, Taipei, Taiwan; Department of Business Administration, Asia University, Taichung, Taiwan; Department of Medical Research, China Medical University Hospital, China Medical University, Taichung, Taiwan K. Valli Priyadharshini Department of Computer Science and Engineering, K.Ramakrishnan College of Technology, Samayapuram, Trichy, India Vo Van-Quoc Nhi Dong Hospital, Can Tho City, Viet Nam N. Vanathi Faculty of Science and Humanity, KCG College of Technology, Chennai, India Amarpreet Singh Virdi Department of Management Studies, Kumaun University, Bhimtal (Nainital), Uttarakhand, India C. R. Vishwanatha Department of MCA, New Horizon College of Engineering, Bengaluru, India; Department of MCA, New Horizon College of Engineering, Visvesvaraya Technological University, Belagavi, Karnataka, India Domenico Vito Politecnico di Milano, Milano, MI, Italia
Rahee Walambe Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International (Deemed University), Pune, India K. H. Walse Department of CS and Engineering, Anuradha Engineering College, Chikhali (M.S), India
Cancer Prognosis by Using Machine Learning and Data Science: A Systematic Review
T. Lakshmikanth Rajath Mohan and N. Jayapandian
Abstract Cancer is one of the most fatal diseases and a leading cause of death worldwide. Diagnosing cancer early has become a pressing need for doctors and researchers, as it allows them to classify patients into high-risk and low-risk categories, which in turn supports correct diagnosis and treatment. Machine learning is a subset of artificial intelligence that uses raw data to make predictions and derive insights. Machine learning for cancer prognosis has been studied for a long time, and several papers have been published on the topic. Although many papers have also applied statistical methods to cancer prognosis, machine learning models have been shown to be more accurate than conventional statistical methods of detection. These machines can be trained to detect abnormalities such as a tumour by learning from real-world examples. Models such as artificial neural networks, decision trees, clustering techniques, and K-Nearest Neighbours (KNN) are being used for cancer prediction, prognosis, and research. The key aim of this article is to review the main trends in using machine learning algorithms for cancer prognosis, the types of input datasets to be fed, the different types of cancers that can be studied, and the performance of these models. Keywords Artificial intelligence · Cancer · Data science · Decision trees · Deep learning · Machine learning
T. Lakshmikanth Rajath Mohan · N. Jayapandian (B)
Department of Computer Science and Engineering, CHRIST (Deemed to Be University), Kengeri Campus, Bangalore, India
e-mail: [email protected]
T. Lakshmikanth Rajath Mohan
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_1

1 Introduction

The number of cancer cases in India has been skyrocketing and it is said to increase by 12% in the next five years. As of 2020, it is estimated that one out of every seventy
thousand individuals is affected by cancer. Cancer is fatal in most cases, and even though a hundred percent recovery cannot be promised, diagnosing it at an early stage is very helpful for categorizing patients into low-risk and high-risk categories, which becomes a vital part of the further outlook and treatment. Back in the day, cancer was very hard to detect [1]. Each and every cell had to be examined and checked for malignancy. This becomes very tedious and time consuming with traditional methods such as screening, where multiple doctors from different fields analyse the patient by taking into account the many factors that go into a cancer diagnosis, such as age, general health, inheritance patterns and other biomarkers. Cancer prognosis is concerned not only with diagnosis but also with predicting the vulnerability to cancer (risk factors), the likelihood of recurrence (developing cancer again after successful treatment) and the mortality rate [2]. All of these are a huge challenge even for the most experienced and skilled doctors. Doctors in the past relied on macro details for cancer prognosis such as tumours, cankers, lymph nodes, and environmental variables. However, with the recent development of technology, machines like MRI and CT scanners have helped us concentrate on the microscopic details that go into cancer diagnosis using image recognition [3]. With the development of AI and machine learning, machines can learn to detect such abnormalities by looking at real-world examples [4]. Using machine learning algorithms and tools such as K-means clustering, K-Nearest Neighbour (KNN), support vector machines (SVM), and artificial neural networks, information can be fed directly into the model, which performs a large number of calculations on each data point before deciding what is a tumour and what is not. In the detection of cancer, results have repeatedly shown that statistical and computational abilities are far more reliable than intuition and human judgement alone [5]. Researchers have found that using machine learning can enhance the accuracy of cancer prediction and prognosis by about 20% compared to statistical methods. Using machine learning and data science for cancer prognosis is beneficial not only for the patient but also for physicians, doctors, and hospitals [6]. Multiple papers have been published about cancer prognosis and the use of machine learning algorithms for it. Cancer prognosis is concerned with three main domains: first, the rate of vulnerability or susceptibility to cancer; second, prediction of the frequency of cancer recurrence; and third, prediction of the mortality rate. The first has to do with the risk of developing cancer, the second with the risk of redeveloping cancer after a successful treatment, and the last with the rate of a successful outcome, that is, life expectancy or survivability. All these predictions rely on three types of input data. The first is genomic data, which refers to the genomes and DNA of an organism and is concerned with the collection, storage and analysis of the genomes of living organisms. The second is proteomic data, which refers to the systematic recognition and analysis of the proteins (or proteome) of a biological component of a living organism, such as a tissue or a cell [7]. The third is clinical data, which refers to the wide range of information collected at both micro and macro levels for clinical purposes or trials. Examples include tumour size, tumour weight, and inheritance patterns.
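To make this concrete, the short sketch below trains two of the classifier families mentioned above, a support vector machine and a K-Nearest Neighbour model, on clinical-style features. It is a hypothetical illustration using the Wisconsin breast cancer dataset bundled with scikit-learn, not the pipeline of any study reviewed here; the dataset choice and parameter values are assumptions made purely for demonstration.

```python
# Hypothetical sketch: SVM and KNN on clinical-style features, using
# scikit-learn's built-in Wisconsin breast cancer dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Labelled measurements of cell nuclei (malignant vs. benign tumours)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

for name, model in models.items():
    model.fit(X_train, y_train)      # learn from past, labelled examples
    y_pred = model.predict(X_test)   # predict unseen cases
    print(name, "accuracy:", round(accuracy_score(y_test, y_pred), 3))
```

Feature scaling is included because both SVM and KNN are distance-based and therefore sensitive to the differing ranges of clinical measurements.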
With the continuous development of genomic technologies and imaging support such as PET scans, both micro- and macro-level data about patients can now be obtained easily. Recently it has even been found that using multiple biological datasets for prediction is far more effective than relying on a single biomarker or a composite test. If molecular datasets are combined with macro-scale details, the robustness and accuracy of these machines increase drastically. However, it remains a challenge to make sense of the output information. The key aim of this article is to go through the popular trends in using machine learning algorithms for cancer prognosis, the types of input datasets to be fed, the different types of cancers that can be studied, and the performance of these models.
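The benefit of combining datasets can be sketched with a simple early-fusion experiment: concatenate the feature matrices and compare cross-validated accuracy against a single feature set. The arrays below are synthetic placeholders standing in for clinical and molecular measurements, not real patient data, and the classifier choice is an assumption made for illustration only.

```python
# Hypothetical sketch: single feature set vs. concatenated (clinical + molecular)
# features. All arrays are synthetic placeholders, not real biomedical data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)                            # outcome label (e.g. recurrence)
clinical = rng.normal(size=(n, 5)) + y[:, None]           # macro-scale features
molecular = rng.normal(size=(n, 30)) + 0.5 * y[:, None]   # micro-scale features

combined = np.hstack([clinical, molecular])               # simple early fusion

clf = LogisticRegression(max_iter=1000)
print("clinical only:", cross_val_score(clf, clinical, y, cv=5).mean().round(3))
print("combined     :", cross_val_score(clf, combined, y, cv=5).mean().round(3))
```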
2 Literature Review

The key to solving any problem using machine learning depends largely on choosing the right model. But before getting there, we must understand what machine learning actually is and how it solves our problem [8]. As much as it is important to understand what machine learning can do for us, it is equally important to understand what it cannot do. As the name itself suggests, machine learning is the process of learning by a machine. It is a subset of artificial intelligence which uses probabilistic and statistical methods to classify data and learn from past examples to perform certain tasks. It can also be used to predict future outcomes based on previous data and the experience gained. The accuracy, however, depends on the amount of input data fed in. The accuracy of a machine is determined by dividing the number of correct predictions by the total number of predictions; the fewer incorrect predictions, such as false positives, the higher the accuracy. In other words, machine learning is just "framed statistics". Many machine learning algorithms use Boolean logic (true/false), conditional statements (IF/ELSE/ELSE IF) and probabilistic reasoning (given X, what is Y?) for data classification [9]. Machine learning has become so popular because the learning pattern of these machines resembles that of humans: we humans learn from past examples, and so do these machines. Machine learning has earned wide recognition for its accuracy and precision. Even though statistical methods such as regression help us to classify data, their use is limited because regression assumes linearity of the variables. It is very powerful, but it is extremely sensitive to outliers. Outliers are the surprising points in the data. Suppose a man aged 130 made $50M. This is a multivariate outlier, because it is very rare to find people of that age, let alone making so much money. Outliers like these have an outsized effect on regression. In such situations, the use of machine learning becomes inevitable, especially in the medical field, where data can be very surprising and is rarely linear. Machine learning is definitely the state-of-the-art technology we have at this point. That being said, it does not always promise success and a hundred percent results. A good understanding of machine learning algorithms and good judgement in understanding the
problem at hand are very much required, and so is an understanding of the drawbacks of these models. The chances of success in a machine learning experiment depend largely on the experience of the machine learning architect (though not always), the number of false positives, the quality of the input data, and the robustness of the model. If all these are well taken care of, the rate of failure can be drastically reduced. One must also be cautious not to include more variables than events, that is, to keep the dimensionality of the data in check. If this is not taken care of, the model may end up with redundant learners. Bellman in 1961 described the problem of having too many variables and examples in one model as "the curse of dimensionality" [10]. Having too many variables is a huge problem, a curse indeed, for both statistical and machine learning models. How do we solve this problem of having too many variables? The way to go about it is to increase the size of the "training set". A training set is the material fed into the computer through which it learns to process data. The samples-per-feature ratio, that is, the number of training examples to the number of variables, should always exceed five to one. Just increasing the amount of training data will not be enough on its own; a few other characteristics also have to be met. For example, the training set should be diverse, that is, it should cover the span of representations that the learner expects to encounter. Overtraining is also a problem we often face. Overtraining refers to training on too few examples with very little diversity, and is also referred to as training on noise. This brings us to another important characteristic of a good training set: it is expected to have a low ratio of noise to variance. If these characteristics are not met, the model performs poorly. As stated earlier, machine learning methods generally have the upper hand over statistical methods, but sometimes statistical methods prove more useful than machine learning methods [11]. This happens when the user assumes that the data is non-linear and this assumption turns out to be wrong. This goes to show how important it is to choose the right model for a particular problem. Machine learning models are not all designed in the same way; each model is trained differently to tackle a particular problem. One model, say "X", may solve problem "A" better than model "Y", which was designed to solve problem "B". The capability of a data scientist is judged by how accurately he decides which model to use for which particular problem [12], because the best-suited model is rarely obvious and choosing it can be a tedious task. It is therefore very important that a data scientist tries more than just one model to figure out which one is best suited for the problem. There is a common misconception that the trends or outcomes produced by a machine cannot be detected intuitively. This is not really true: sometimes, without the use of any machine learning model, an expert in the field can rightly spot the problem, given enough experience and time.
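A minimal sketch of these two precautions, checking the samples-per-feature ratio and trying more than one model under cross-validation, is given below. The dataset, the candidate models, and the five-to-one threshold are illustrative assumptions, not a prescription from the reviewed literature.

```python
# Hypothetical sketch: check the samples-per-feature ratio discussed above and
# compare several candidate models with cross-validation instead of committing
# to a single one. Dataset, models and threshold are illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)
n_samples, n_features = X.shape
ratio = n_samples / n_features
print(f"samples per feature: {ratio:.1f}",
      "(ok)" if ratio >= 5 else "(add data or drop features)")

candidates = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Neural network": make_pipeline(StandardScaler(),
                                    MLPClassifier(max_iter=2000, random_state=0)),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)   # accuracy on held-out folds
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

Cross-validation, rather than a single train/test split, keeps the comparison honest when the number of examples is modest.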
3 Machine Learning Strategies

There are broadly three categories of machine learning models. The first is supervised learning which, as the name suggests, learns in the presence of a "supervisor", that is, from data that has been labelled. This can be illustrated with a real-life example. Suppose I show a person a photo of a carrot; he immediately recognizes it, because in his mind the picture is recognized and labelled as a carrot. But what happens if I show him a photo of a vegetable he has never seen before? His guess might be right or wrong. The same applies to a machine learning model. If the model gives the right output, well and good [13]; but if the answer is wrong, we need to make sure the model is updated and the missed data points are labelled, so that the next time the same input is given the machine is able to recognize it. As discussed, image processing is a widely used application of supervised learning [14]. The second category is unsupervised learning, which deals with a set of data or examples fed into the machine without labels. It is up to the model to figure out patterns and make predictions from those examples. A common example is that of a bank: banks collect more information than required during application filling, and not all of it is relevant. If a customer wishes to repay a loan, his middle name has nothing to do with that decision. In unsupervised learning it is important to note that not all the information fed into the machine is relevant; some of it may be useful and some simply is not. "Deep learning" is a branch often associated with unsupervised learning that has gained a lot of attention in the past decade. Self-organizing feature maps (SOMs), the naive Bayes classifier and k-means clustering are some of the algorithms commonly cited in this context [15]. The third category, reinforcement learning, is essentially the art of decision-making: it is about learning the behaviour of the environment through interaction with it so as to obtain maximum reward. It may be conducted in the presence or absence of a supervisor. Examples include self-driving cars and robots. Reinforcement learning can work in dynamic situations and can be used to solve highly complex problems, which is why it is deemed one of the most complex fields in machine learning. In fact, almost all the machine learning algorithms used in cancer prognosis and prediction are based on supervised learning. Interestingly, these algorithms belong to a family of classifiers that classify on the basis of conditions and conditional decisions. Some of the widely used conditional algorithms are artificial neural networks (ANNs) [16], decision trees (DTs) and k-nearest neighbours (KNN). Artificial neural networks are capable of working in dynamic environments and are widely used for image recognition, although they cannot solve every complex pattern recognition problem [17]. As the name makes clear, they work similarly to a human brain, consisting of multiple neurons interconnected by axon junctions. These neurons
are trained using labelled training datasets. Mathematically, these neurons are represented by matrices or tables, analogous to the human cerebral cortex [18]. Neural networks consist of multiple layers, commonly referred to as hidden layers, which are responsible for processing the input data and generating a specific output. The biggest question when working with neural networks is how to map real-world data, such as an image, to a mathematical vector or matrix. The labelled training data injected into the network is what changes the numbers in the network's weight matrices. Many neural networks follow a multi-layered feed-forward architecture, which means they do not support feedback; feedback essentially means re-injecting part of the output back into the input. In ANNs of this kind the connections do not loop. The design of an ANN can be customized depending on the situation and the problem at hand. A rather substantial disadvantage of artificial neural networks is that figuring out where a mistake or miscalculation has occurred inside an ANN is almost impossible; they are very hard to decipher once trained. This paved the way for decision trees (DTs), which are very easy to interpret. DTs are essentially structured flowcharts or graphs in which each step is the consequence of the previous one, and these consequences finally lead to the main goal. DTs have been around for a long time and are nothing new to the medical field; a simple illustration would be the sequence of steps involved in the detection of breast cancer. Even so, ANNs have sometimes proved more powerful than DTs. A newer spark in the machine learning industry has been the introduction of support vector machines, commonly known as SVMs, although there has so far been comparatively little research on how SVMs can be used for cancer prognosis. Consider a situation in which cancer is to be detected using an SVM. Given a scatter plot of points, say benign versus malignant cells, an SVM generates a line that separates the two clusters optimally. With the inclusion of more variables this separating line becomes a plane and then a hyperplane. Conventionally, the algorithm creates a hyperplane that separates the two classes with maximum margin. What makes SVMs so effective is that they can perform both linear and non-linear classification. SVMs achieve non-linear classification by using a non-linear kernel, a mathematical function that maps the data into a higher-dimensional feature space where it becomes separable. Like ANNs, SVMs can be used for complex image and pattern recognition, and they are widely used for non-linear classification. A recent advancement at the intersection of neuroscience and machine learning is abnormality detection. EEG (electroencephalography) is a procedure in which small metal discs are attached to the patient's scalp to detect the tiny electrical charges emitted by the brain. Machine learning has proven successful in leveraging EEG analysis to diagnose epileptic seizures, which in turn may point to brain cancer.
Neural network architectures such as convolutional neural networks and temporal convolutional neural networks are being used to decode and encode pathological patterns from the EEG data. One of the algorithms
in EEG analysis using machine learning is the feature-based decoding framework, in which a selected number of features represent the entire dataset. For example, we could consider only five patients experiencing epilepsy and psychosis, perform EEG analysis on them and then decode the pathological patterns to find matches in the entire dataset, provided these five cases are informative enough. The selection of these features (typically chosen a priori) thus becomes an integral part of the analysis.
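The supervised strategies described in this section can be made concrete with a short scikit-learn sketch (not the toolchain of the cited studies) that trains three of the classifiers discussed above, an RBF-kernel SVM, a decision tree and k-nearest neighbours, on the library's built-in benign-versus-malignant Wisconsin breast cancer dataset, which serves here only as a stand-in for clinical data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Labelled benign/malignant data: the "supervised learning" setting.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "Decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "k-nearest neighbours": make_pipeline(StandardScaler(),
                                          KNeighborsClassifier(n_neighbors=5)),
}

# Train each model on the labelled examples and report held-out accuracy.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```

Trying several models on the same data, as here, reflects the point made earlier that the best-suited model is rarely evident in advance.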
4 Machine Learning Usage in Cancer Prognosis

The use of data science and machine learning in cancer prognosis is growing rapidly as technology progresses. The number of papers published on the use of machine learning for cancer prognosis is increasing by about 25% every year. It is also interesting that the majority of these papers (around 86%) were associated with the prediction of mortality (around 44%) and of cancer recurrence (around 42%). More recently, an increasing number of papers deal with the risk factors associated with cancer. These numbers show how much machine learning has influenced the health industry; as stated earlier, it can detect cancer with about 25% more precision than traditional methods of detection. There have been strong reservations among scientists about the use of machine learning for risk analysis of breast and prostate cancer; regardless, machine learning has proved its reliability in predicting multiple kinds of cancer. About 70% of these predictions are made using neural networks, followed by SVMs, then DTs and clustering techniques; algorithms such as fuzzy logic are rarely used. Researchers have described this as disappointing, because the ANN is a relatively old technology compared with SVMs and fuzzy systems. Regarding susceptibility prediction, out of 79 published papers only three have emphasized the use of machine learning for predicting cancer risk susceptibility. One particular paper used single nucleotide polymorphisms (SNPs) for spontaneous breast cancer prediction. The thesis of that publication was that a few particular types of SNPs would eventually result in the accumulation of environmental toxins in the breast tissue, which would in turn lead to a higher risk of cancer susceptibility. The key to this research was that the researchers used several methods to reduce the number of variables involved, thereby improving the sample-per-feature ratio. The authors used several machine learning algorithms to find the classifier best suited for prognosis. They originally used 98 SNPs for classification, but towards the end they brought this down to about 2–3 SNPs, which proved to be quite informative and avoided the curse of dimensionality. Interestingly, both DTs and SVMs hit their highest accuracy with 2–3 SNPs, attaining an accuracy of 68–69%; these results were approximately 25% better than those obtained with all 98 SNPs. Another noticeable feature of this study was the
level of importance given to confirmation and cross-validation. The accuracy of these models was measured in three ways. First, each model was cross-validated over about 20 folds. Second, bias in the feature selection was minimized; in other words, the SNPs selected were the most informative and effective ones, and the selection was validated about a hundred times. Third, the accuracy expected from random guessing, close to 50%, was used as a reference. The authors tried their best to minimize the stochastic (randomly distributed) elements, realizing that specific methods such as leave-one-out cross-validation (LOOCV) would eliminate them almost completely. A standard deviation of a little over 4% was obtained when multiple cross-validation techniques were used which, set against differences of the order of 25% seen elsewhere, was deemed negligible. Even though many other authors have since replicated these results and conducted several follow-up experiments, the gist remains the same: with the right design, suitable data and effective implementation and validation, machine learning models can produce robust and accurate cancer-prediction tools. Turning to survivability prediction, nearly half of all published studies on machine learning for cancer prognosis are related to survivability. One particular paper, by Matthias E. Futschik, Mike Sullivan, Anthony Reeve and Nikola Kasabov, emphasized the use of more than one machine learning technique to predict the outcome of patients suffering from DLBCL (diffuse large B-cell lymphoma). Unlike the earlier single-classifier approach, this method combined more than one type of data in its classifier. Futschik rightly noted that the inclusion of clinical data can enrich genomic data: the predictive power of the two combined into one classifier is much higher than that of using either clinical data or genomic data alone. To collect real-world samples as training data, the authors gathered clinical data and microarray expression profiles of 56 DLBCL patients. All the clinical data were taken from the International Prognostic Index (IPI), in which patients are categorized as low-risk or high-risk based on assessed risk factors. This model showed a fairly good accuracy of around 70–75% in predicting the mortality of these patients. Different methods, such as evolving fuzzy neural networks (EFuNN) together with Bayesian classifiers, were also developed to analyse the genomic data; these models had accuracies of nearly 78%. The two models were later combined into a hierarchical model for consensus prediction, and this combined model attained an accuracy of above 85%. Owing to the small sample size available, the validation technique adopted for the EFuNN was leave-one-out cross-validation. As for recurrence prediction, of all the papers published on the use of machine learning for cancer prognosis, 43% were aimed at predicting cancer recurrence. One such paper was published by De Laurentiis in 1999. That study addressed a few of the limitations noted in earlier studies: it predicted a five-year recurrence window for breast cancer, and seven prognostic variables were included along with the clinical data.
The clinical data took into consideration the patient's age, medical history, and the size and shape of the tumour. The key aim was to develop a much better
model than the existing TNM (tumour-node-metastasis) staging system. TNM is a traditional system that relied heavily on the personal judgement of pathologists or physicians rather than on data. The authors of this paper used artificial neural network models fed with data from roughly 2500 breast cancer patients, with seven data points (variables) each, giving more than seventeen thousand data values in total. This ensured that the sample-to-feature ratio was maintained as desired. The data was split into three equal groups: one third was dedicated to training, one third to monitoring, and one third to validation. For external validation, the authors included an extra three hundred breast cancer clinical datasets, which gave them a more general view of their model's behaviour on data from outside their own institution as well as within it. This kind of validation had not been done in any of the studies discussed earlier. The study was widely appreciated not only for the thoroughness of its validation but also for the quality of the input data: all the training datasets were stored in relational databases and were constantly verified by physicians themselves to maintain quality. With approximately 2500 patients and nearly 17,000 data values in hand, the sample was large enough to cover the normal population of breast cancer patients within the dataset. Despite this advantage, the authors decided to look explicitly at the distribution of patient information inside every training set, and found that these distributions were nearly identical. Attention was given to every single detail while carrying out this experiment. The sole aim was to develop a model better at predicting cancer recurrence than TNM, so the ANN had to be compared with TNM predictions. The performance of each was measured using the receiver operating characteristic (ROC) curve: the larger the area under the ROC curve, the more accurate the model. The ANN model outperformed the TNM model, covering more area under the ROC curve. This study stands out from all the others for its sheer thoroughness. Its drawback, however, is that the authors compared TNM with only one type of machine learning model.
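The evaluation style used across these studies, repeated cross-validation with ROC AUC as the comparison metric, can be sketched in a few lines of scikit-learn. The models below (a small multi-layer perceptron standing in for an ANN, and a logistic-regression baseline standing in for a simpler scoring system) and the public breast cancer dataset are illustrative stand-ins, not the cited studies' actual data or models.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)

ann = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0))
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))

# Roughly 20-fold cross-validation, echoing the "about 20 folds" used above.
cv = StratifiedKFold(n_splits=20, shuffle=True, random_state=0)
for name, model in [("ANN (MLP)", ann), ("logistic baseline", baseline)]:
    auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(f"{name}: mean ROC AUC = {auc.mean():.3f} (std {auc.std():.3f})")

# Leave-one-out cross-validation (LOOCV) uses cv=LeaveOneOut() instead, but each
# test fold then holds a single sample, so it is paired with accuracy rather than
# ROC AUC, which needs both classes present in every fold.
```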
5 Research Challenges

The three cases illustrated above are examples of how machine learning models should be implemented, assessed and described. There are, however, many more studies that use many more algorithms to yield equally effective results. Identifying the possible drawbacks in the learning, validation or deployment of these algorithms is critical, not only for those deploying machine learning techniques but also for those who wish to explore machine learning alternatives. A recurring weakness in studies is the lack of attention paid to the size of the training set and to learner validation. External validation is also an important sanity check, and even
the external validation set must be large so as to ensure reproducibility. As stated many times, the size of the input dataset has major implications for the robustness and accuracy of a model. The first implication is that on small datasets almost all machine learning models suffer from overtraining, which in turn leads to misleading results. In one example, data from just 28 patients were used to build an ANN model to predict the recurrence of throat cancer, and the reported accuracy turned out to be 86%; with only 28 training cases this is a clear-cut case of overtraining. The size of the data also has a significant impact on the sample-per-feature ratio. Small sample-per-feature ratios are especially problematic for microarray studies, because there are thousands of genes (features) but typically only a few hundred samples at most. There is then a high chance of generating redundant classification models, and redundancy is a real problem because the robustness of a model cannot be guaranteed when it has effectively been trained and tested on many near-identical cases. Along with data size, the quality of the data is equally important. The features chosen must be precise and easily measurable from one laboratory to another. A model tied, say, to the survivability of breast cancer patients at one specific hospital at one given time becomes irrelevant over time. Clinical data is also always subject to periodic change, with new data being added, created or modified, so every time new clinical codes are created the model must be retrained to account for them; an ideal classifier is therefore one that adapts to changing features over time. Another important takeaway from this discussion is the value of using multiple predictive models for one task. While ANNs are considered extremely sophisticated and robust, they are not always the best; as noted earlier, tracing the consequences of individual inputs is nearly impossible in an ANN, and DTs and Bayesian models have many times proved their worth over ANNs. The most critical decision in data science is which model to use for classification. It is not guaranteed that the most sophisticated model will give the best prediction, and traditional models can sometimes outperform machine learning models, which is why more and more published articles use more than one machine learning model for classification. Like any other hypothesis-driven experiment, a machine learning experiment follows a set of defined procedures and requires validation of the data. The datasets and training sets used should therefore be made publicly available, and the details of the algorithms and procedures recorded, so that others can verify the findings and reproduce the results themselves. In essence, the results and outcomes of a well-designed machine learning experiment must be properly documented and open for the public to reproduce. Many efforts have been made and many techniques employed to find the optimal algorithm for a given task, and a variety of datasets have been used to compare the overall performance of the models.
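The small-sample problem described above can be demonstrated in a few lines. The sketch below is illustrative only: it uses a public dataset rather than the 28-patient throat-cancer data, trains an unpruned decision tree on just 28 cases, and shows how optimistic and unstable the resulting accuracy estimates are.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.RandomState(0)

# Repeatedly draw 28 training cases, fit, and score on the held-out remainder.
for trial in range(5):
    idx = rng.permutation(len(y))
    train_idx, test_idx = idx[:28], idx[28:]
    model = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    print(f"trial {trial}: "
          f"train acc = {model.score(X[train_idx], y[train_idx]):.2f}, "   # typically 1.00
          f"held-out acc = {model.score(X[test_idx], y[test_idx]):.2f}")    # lower, and varies
```

The perfect training accuracy paired with fluctuating held-out accuracy is exactly the overtraining signature the text warns about.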
For many decades we used only bare clinical data to find even the smallest of patterns. Now, with the ever-increasing growth of technology and interest, several alternatives have emerged. One of them is the "omics" technologies,
which broadly refer to the collection and analysis of large amounts of data representing some universal set of molecules. There are several subcategories, such as proteomics and metabolomics, aimed at the collection and detailed analysis of proteins and metabolites respectively. These have turned out to be far more helpful than the exploitation of bare clinical data alone.
6 Conclusion

In this review article, an attempt has been made to document the different kinds of machine learning algorithms in use, the state-of-the-art models and some recent trends in the application of machine learning to cancer prognosis. The types of cancer being studied, the kinds of datasets one needs to focus on while conducting the analysis, and an overall glimpse of how well these models predict cancer have also been discussed. Even though a number of machine learning algorithms are on the rise, ANNs are currently the most widely used for predicting more than one type of cancer. We have also seen that the performance and accuracy of machine learning models are far more reliable than those of traditional statistical models. With improved biological validation and better design, the overall performance of these machine learning models can be increased significantly. Altogether, if the quality and persistence of these studies continue, machine learning classifiers will very soon become a common sight in medical research and hospital environments.
References 1. Bronkhorst AJ, Ungerer V, Holdenrieder S (2019) The emerging role of cell-free DNA as a molecular marker for cancer management. Biomol Detect Quantification 17:100087 2. Borgi M, Collacchi B, Ortona E, Cirulli F (2020) Stress and coping in women with breast cancer: unravelling the mechanisms to improve resilience. Neurosci Biobehav Rev 119:406–421 3. Yu C, Helwig EJ (2021) The role of AI technology in prediction, diagnosis and treatment of colorectal cancer. Artif Intell Rev 55(1):323–343 4. Sree SR, Vyshnavi SB, Jayapandian N (2019) Real-world application of machine learning and deep learning. In: 2019 International conference on smart systems and inventive technology (ICSSIT). IEEE, pp 1069–1073 5. Liao H, Xiong T, Peng J, Xu L, Liao M, Zhang Z, …, Zeng Y (2020) Classification and prognosis prediction from histopathological images of hepatocellular carcinoma by a fully automated pipeline based on machine learning. Ann Surg Oncol:1–11 6. Smiti A (2020) When machine learning meets medical world: current status and future challenges. Comput Sci Rev 37:100280 7. Martínez-Jiménez F, Muiños F, López-Arribillaga E, Lopez-Bigas N, Gonzalez-Perez A (2020) Systematic analysis of alterations in the ubiquitin proteolysis system reveals its contribution to driver mutations in cancer. Nat Cancer 1(1):122–135
8. Vinod AM, Venkatesh D, Kundra D, Jayapandian N (2021) Natural disaster prediction by using image based deep learning and machine learning. In: International conference on image processing and capsule networks. Springer, Cham, pp 56–66 9. Natarajan J (2020) Cyber secure man-in-the-middle attack intrusion detection using machine learning algorithms. In: AI and big data’s potential for disruptive innovation. IGI global, pp 291–316 10. Verleysen M, François D (2005) The curse of dimensionality in data mining and time series prediction. In: International work-conference on artificial neural networks. Springer, Berlin, pp 758–770 11. Castillo TJM, Arif M, Niessen WJ, Schoots IG, Veenland JF (2020) Automated classification of significant prostate cancer on MRI: a systematic review on the performance of machine learning applications. Cancers 12(6):1606 12. Khalajzadeh H, Simmons AJ, Abdelrazek M, Grundy J, Hosking J, He Q (2020) An end-to-end model-based approach to support big data analytics development. J Comput Lang 58:100964 13. Goecks J, Jalili V, Heiser LM, Gray JW (2020) How machine learning will transform biomedicine. Cell 181(1):92–101 14. Adams J, Qiu Y, Xu Y, Schnable JC (2020) Plant segmentation by supervised machine learning methods. Plant Phenome J 3(1):e20001 15. Patil V, Jadhav Y, Sirsat A (2021) Categorizing documents by support vector machine trained using self-organizing maps clustering approach. In: Techno-societal 2020. Springer, Cham, pp 13–21 16. Nikitha MA, Swetha S, Mantripragada KH, Jayapandian N (2022) The future warfare with multidomain applications of artificial intelligence: research perspective. In: Proceedings of second international conference on sustainable expert systems, vol 351. Springer, Singapore, pp 329–341 17. Sushma S, Sundaram N, Jayapandian N (2021) Machine learning based unique perfume flavour creation using quantitative structure-activity relationship (QSAR). In: 2021 5th international conference on computing methodologies and communication (ICCMC). IEEE, pp 1397–1402 18. Levy WB, Calvert VG (2021) Communication consumes 35 times more energy than computation in the human cortex, but both costs are needed to predict synapse number. Proc Natl Acad Sci 118(18)
A Qualitative Research of Crops Scenario in Punjab with Relation to Fertilizer Usage Using Power BI Palvi Mittal and Vijay Kumar Sinha
Abstract Agricultural data in the Indian context suffers from a shortage of data analysis. Research is generally conducted on village-level samples, and generalizations are then made for larger regions. This can be harmful when decisions need to be made for the benefit of the larger population. Similarly, within the Agriculture Ministry there are separate departments for each crop category, e.g., Directorate of Oilseeds, Directorate of Fodder, etc. Each department caters only to its focus crop, so only a partial picture is presented to the government for action. With datasets now being released by the Government on a regular basis, the focus of development can be shifted from crop-specific planning to a wholesome view of planning. This work focuses on presenting the complete picture of agriculture. We take into consideration the area, productivity and fertilizer use in field crops and horticultural crops over time for Punjab state. The results show that acreages of the major crops follow clear trends and that fertilizer usage has a direct impact on crop productivity. The data has been visualized in the form of charts, tables, maps, etc., in a dashboard built with the powerful Microsoft Power BI software. These results suggest that looking at the relation of crop parameters with fertilizer usage will help individuals, organizations, and governments in better and more holistic planning. Dashboards built with Microsoft Power BI are widely used in organizations for visualizing and analyzing data on a single screen.

Keywords Yield · Fertilizer · Power BI
P. Mittal (B) · V. K. Sinha Department of CSE, Chandigarh University, Sahibzada Ajit Singh Nagar, India e-mail: [email protected] V. K. Sinha e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_2
1 Introduction

Agriculture, with its allied sectors, is unquestionably the largest livelihood provider in India, more so in the vast rural areas. It also contributes a significant share of the gross domestic product (GDP) [1]. India's agriculture is composed of many crops, with the foremost food staples being rice and wheat. Indian farmers also grow pulses, potatoes, sugarcane, oilseeds, and such non-food items as cotton, tea, coffee, rubber, and jute (a glossy fiber used to make burlap and twine). India is a fisheries giant as well [2]. Punjab has only 1.53% of India's geographical area but contributes nearly 35% of the rice pool and 45% of the wheat pool of India [3]. Punjab is one of the few states with 99.5% irrigated land. The total number of land holdings is 10.93 lakh, out of which 2.04 lakh (18.7%) belong to marginal farmers and 1.83 lakh (16.7%) to small farmers, while 7.06 lakh (64.6%) farmers hold land above 2 hectares [4]. The major row crops in Punjab are wheat, paddy, and maize. The major vegetable crops are potato, peas, and cauliflower, while the fruit crop is dominated by Kinnow [5]. In recent years, certain districts of Punjab have seen changes in crop parameters: cotton area decreased from 4.46 lakh hectare in 2013 to 2.65 lakh hectare in 2018, and then increased to nearly 5 lakh hectare by 2020. The increase in cotton area has been due to a shift of farmers away from the paddy crop; paddy is grown on an area of 3 million hectare in the Kharif season in Punjab. Many factors contribute to this crop shift, as follows:
1. The government is encouraging farmers to grow less water-consuming crops than paddy because of the falling groundwater table [6].
2. The quality of groundwater in the state has deteriorated over the years, leading to increased dependence on canal water rather than bore-wells [7].
3. The government is banning the uptake of hybrid paddy because it consumes more water than research or open-pollinated varieties.
4. Paddy cultivation leads to heavy salt-crust deposits on the surface, leading to depletion of soil quality.
5. Paddy is an MSP-driven crop, leading to the farmer's dependence on the crop.
6. The government is promoting high-value crops such as cotton and vegetables instead of paddy.
7. Vegetables and cash crops take less time to grow and bring more profit into the farmer's finances.
8. Crops like cotton consume less water than paddy; cotton cultivation uses about one-eighth of the water that paddy does.
9. Average urea consumption for paddy is 10 bags per acre (1 bag of urea = 45 kg), as against the 3 bags suggested by the State Agriculture University, Ludhiana. This is leading to an extreme imbalance in fertilizer consumption.
10. Bulk fertilizers such as urea and DAP come under the subsidized group of fertilizers. Subsidy is the portion of the price that the government bears to incentivize companies to bring fertilizer into the country, and it puts a big burden on the financial structure of the nation.
11. The government has been promoting fertilizers like single super phosphate (SSP) and NPK to encourage low-cost, multi-nutrient fertilizers in the state.
12. The average productivity of paddy has decreased from 35 Qt/acre in 2016 to 26 Qt/acre in 2020, even after heavy use of bulk fertilizers.

Hence, from the above, we can clearly see that crop area, productivity, and fertilizer usage together shape the agricultural outlook of a region. A 360° view of the crop situation is required at the individual, organizational, and governmental level for wholesome agricultural development. In Sect. 2, we discuss the history of dashboards, the different dashboard tools available, and the most widely used ones. In Sect. 3, we introduce Microsoft Power BI, and in Sect. 4 we use it for analyzing and visualizing agriculture-related data. Section 5 presents the conclusions of our work, and the last section deals with the future scope of using dashboards in agriculture.
2 Dashboards

The concept of the dashboard originated in the 1980s, when managers wanted to identify the key parameters defining the performance of a business or vertical. To cater to this need, two individuals from Harvard, Dr. David Norton and Dr. Robert Kaplan, created a framework which translated activities into actions using the balanced scorecard (BSC) technique. Companies had generally focused on short-term financial goals as the measure of achievement; the balanced scorecard technique also considers non-financial strategic measures in order to focus on long-term success, giving the company scorecard a 'balance' beyond the purely financial perspective. The balanced scorecard, referred to as the BSC, is a framework to implement and manage strategy. It links a vision to strategic objectives, measures, targets, and initiatives, and it balances financial measures with performance measures and objectives related to all other parts of the organization. It is a business performance management tool [8]. The balanced scorecard answers four major questions pertaining to managers [9]:

• How do customers see us? (customer perspective)
• What must we excel at? (internal perspective)
• Can we continue to improve and create value? (innovation and learning perspective)
• How do we look to shareholders? (financial perspective)

Organizations with a good understanding of the above points will have a greater competitive advantage. The need to develop the balanced scorecard arose because general financial accounting measures like return on investment (ROI) and earnings per share (EPS) can give deceiving signals in the ever-changing landscape of today's business environment.
Figure 1 gives an overview of the balanced scorecard, which focuses on the four perspectives of finance, customer, innovation and learning, and internal business for scoring businesses and identifying areas of improvement. The balanced scorecard is a static tool in terms of data input and output, whereas the requirements of modern-day businesses are quite dynamic. Nowadays, live data is gathered from multiple data points, bringing volatility, uncertainty, complexity, and ambiguity (VUCA) into the system. Hence, the modern-day dashboard is paving the way forward. A dashboard is a tool that gives a visual representation of the underlying data, which is held in the form of tables. The strength of a dashboard depends upon its ability to connect multiple sources of data, and it is used to give its user a complete view of the data. Dashboards are used in more than 200,000 organizations in 200 countries worldwide, by organizations
Fig. 1 Overview of balanced scorecard
like Hilton, Infosys, Ingersoll Rand, Kraft Foods, Merck, Lockheed Martin, Marriott, Motorola, Ricoh, Saatchi and Saatchi, Siemens, Cisco, Skandia, Statoil, UPS, the US Department of Commerce, the US Army, the FBI and the Royal Air Force [10]. Dashboards form an integral part of a company's data assimilation and visualization, making managers more data-aware so that decisions are data-driven instead of being based on gut feeling [11]. The Indian Government is also focusing on the importance of data analytics. In recent years, the Government, through the National Small Industries Corporation, has conducted training sessions on data analytics and Power BI [12]. Courses related to data analytics have been introduced, such as one offered by NIIT [13], and national-level institutes such as the IITs and NITs are including data analytics and visualization courses in their curricula [14]. Various vendors provide data analytics and dashboard solutions, such as SpagoBI from SpagoWorld, Power BI from Microsoft, Tableau, JasperReports by Jaspersoft and Qlik Sense [15]. The most widely used ones are MS Power BI and Tableau. In the next section, we describe Power BI, the Microsoft software which forms the basis of this research work.
3 Power BI

Power BI is a powerful business intelligence software developed by Microsoft and widely used by students, researchers, and corporations. It is offered as an online Web service with a desktop application for creating reports and dashboards that can be viewed across desktop and mobile devices. The software is used to gather data, perform ETL (extract, transform, load), analyse the data, and create dashboards using visuals such as charts, maps, and tables. It has powerful extraction functions for pulling data from Web pages, PDFs, Websites, Excel, and many other sources, and it brings the data into tabular form for further analysis and representation. Power BI has a strong development team that releases regular updates, and a committed online community of users who are ready to resolve issues. The ability to publish reports and dashboards to the Web interface unlocks the full potential of the software: reports can be viewed on the Power BI Web service and mobile app, and sharing reports and dashboards with individuals increases the ease of use. Data transformation in Power BI is based on the M query language, while languages like R and Python are widely used for advanced visual creation. In the next section, we state the problem and examine the relationship of crop parameters with fertilizer usage using the Power BI software [16].
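Before moving on, the Python-visual route mentioned above can be illustrated with a hedged sketch. Inside Power BI, the fields selected for a Python visual arrive as a pandas DataFrame named dataset; the sketch emulates that frame with a small hand-made one whose column names and numbers are purely illustrative, not the paper's actual schema or data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for the DataFrame that Power BI injects into a Python visual as `dataset`.
dataset = pd.DataFrame({
    "Year": [2016, 2017, 2018, 2016, 2017, 2018],
    "Crop": ["Paddy", "Paddy", "Paddy", "Cotton", "Cotton", "Cotton"],
    "Area_lakh_ha": [30.0, 30.5, 31.0, 4.4, 3.9, 2.7],   # illustrative values
})

# Aggregate area by year and crop, the same shape of summary the line-chart visual shows.
trend = dataset.groupby(["Year", "Crop"], as_index=False)["Area_lakh_ha"].sum()

fig, ax = plt.subplots(figsize=(7, 4))
for crop, grp in trend.groupby("Crop"):
    ax.plot(grp["Year"], grp["Area_lakh_ha"], marker="o", label=crop)
ax.set_xlabel("Year")
ax.set_ylabel("Area (lakh hectare)")
ax.set_title("Crop area trend (demo data)")
ax.legend()
plt.tight_layout()
plt.show()
```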
4 Power of Power BI in Agriculture

There are two major problems in the context of Indian agriculture and data analytics:

1. There are separate crop planning committees, so planning is carried out in silos instead of holistically for the state.
2. There is no single repository of data, which leads to a partial picture being presented.

Power BI can handle big data, and distinct parameters such as crop acreage can be compared against fertilizer usage, giving Power BI the versatility we require [17]. The data for cereal crops, horticultural crops, and fertilizer sales was taken from the respective ministries' Websites [18–20]. Figure 2 shows the dashboard prepared in Power BI. The dashboard is divided into two segments: the selector area on the top-right of the canvas, with visuals forming the rest of the canvas. The selectors/slicers used in the dashboard are district, year, month, fertilizer, and crop. The visual components of the canvas are as follows:

• Map: district-wise area of the selected crop(s)
• Line chart: trend of area, yield, and fertilizer sales
• Bubble chart: area versus production comparison by district, crop group, and crop
• Matrix: areas of crops
• Funnel chart: top 4 crops in each crop segment w.r.t. area
Fig. 2 Outlook of dashboard
The dashboard gives us full interaction among the visuals; cross-filtering from visual to slicer and vice versa is done very easily [11]. From the dashboard, we have obtained two categories of results:

• the trend of change in area under crops at the district level, and
• the change of yield in relation to fertilizer usage.
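A minimal pandas sketch of the first category of result, labelling each district-crop area series as increasing, decreasing or constant, is shown below; the input layout and the numbers are assumptions for illustration, not the ministry datasets themselves.

```python
import pandas as pd

# Illustrative input: yearly crop area per district (not the actual ministry data).
data = pd.DataFrame({
    "District": ["Fazilka"] * 3 + ["Hoshiarpur"] * 3,
    "Crop":     ["Kinnow"] * 3 + ["Maize"] * 3,
    "Year":     [2016, 2017, 2018] * 2,
    "Area":     [40, 44, 49, 60, 55, 51],
})

def trend_label(series: pd.Series) -> str:
    """Label a yearly area series by the sign of its average year-on-year change."""
    slope = series.diff().mean()
    if slope > 0:
        return "Increasing"
    if slope < 0:
        return "Decreasing"
    return "Constant"

trend = (data.sort_values("Year")
             .groupby(["District", "Crop"])["Area"]
             .apply(trend_label)
             .reset_index(name="Area trend"))
print(trend)
```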
4.1 Trend of Change in Area Under Crops at District Level

An overview of the change in agricultural parameters (area, yield) of crops over the years gives us a good understanding of the trend at the district and state level. The summary of the trend in area for the major crops in key districts is shown in Table 1, which gives an in-depth view of the trend in crop acreages in different districts of Punjab state. These results are important for understanding crop dynamics and for deciding what measures can be taken to achieve a good crop mix in the state. In the next section, we examine the effect of fertilizer usage on crop productivity.

Table 1 Trend of area under crops in different districts of Punjab state

Crop type | Area trend | Major districts | Major crops
Vegetables | Increasing | Jalandhar, Hoshiarpur, Amritsar, Ludhiana | Potato, peas, cauliflower
Cereals and millets | Constant | Sangrur, Ludhiana, Patiala, Bathinda | Wheat
Cereals and millets | Decreasing | Hoshiarpur | Maize
Cereals and millets | Increasing | Sangrur, Ludhiana, Patiala | Paddy
Fibers | Decreasing | Bathinda, Fazilka, Mansa, Muktsar | Cotton
Fruits | Increasing | Fazilka | Kinnow
Fruits | Increasing | Pathankot, Hoshiarpur | Mango
Flowers | Increasing | Kapurthala, Patiala, Sangrur, Ludhiana | Marigold
Sugars | Variable | Hoshiarpur, Gurdaspur, Jalandhar, Amritsar | Sugarcane
Pulses | Increasing | Amritsar, Fazilka, Ludhiana, Taran Taran | Green gram (moong)
Spices | Increasing | Moga, Ludhiana, Jalandhar, Firozepur | Garlic
Spices | Decreasing | Moga, Jalandhar, Nawashahar | Mint
Spices | Increasing | Jalandhar, Gurdaspur, Nawashahar | Turmeric
Spices | Decreasing | Ludhiana | Coriander
Table 2 Positive correlation of fertilizer against yield

District | Fertilizer | Crop type | Major crop | Month
Jalandhar, Hoshiarpur, Ludhiana, Kapurthala, Moga, Amritsar, Patiala, Bathinda | MOP, NPK | Vegetables | Potato | 8, 9
Amritsar | DAP, MOP, NPK | Vegetables | Peas | 10, 11
Fazilka | DAP, MOP, NPK, Urea | Fruits | Kinnow | 1, 2, 8, 9, 12
Hoshiarpur, Gurdaspur, Jalandhar, Amritsar | DAP, MOP, Urea | Sugars | Sugarcane | 4, 5
Fazilka | DAP | Pulses | Guar | 7
4.2 Change of Yield with Relation to Fertilizer Usage

In this section, we try to understand the effect of fertilizer usage on yield [21, 22]. Table 2 summarizes the positive correlation of yield with fertilizer usage for the major crops grown in Punjab. The Month column represents the months of fertilizer application in each crop, with month numbers standing for month names (January = 1 to December = 12). Figure 3 shows the correlation of the guar crop against DAP for Fazilka district. We can see that DAP application has a positive correlation with the yield of guar: from 2013 to 2015 the sales of DAP decreased and the yield of guar seed decreased with them, even though the area under guar seed increased; likewise, from 2015 to 2016 the yield increased as the sales of DAP increased, even though the area under the crop decreased. Hence there is a positive correlation between DAP and guar yield from 2013 to 2018. In the current work, these results were found by applying different combinations of crop, district, and fertilizer and through discussions with agricultural experts. Software advancement and the addition of statistical operations like correlation and regression will make the identification of such combinations easier in future.
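The correlation behind observations such as Fig. 3 can be computed directly once the yearly series are in a table. The sketch below uses pandas with illustrative numbers (not the actual Fazilka data) to show the calculation.

```python
import pandas as pd

# Illustrative yearly DAP sales and guar yield for one district.
df = pd.DataFrame({
    "Year":       [2013, 2014, 2015, 2016, 2017, 2018],
    "DAP_sales":  [100,   90,   80,   95,  105,  110],   # arbitrary units
    "Guar_yield": [ 5.0,  4.6,  4.2,  4.8,  5.1,  5.3],  # quintal/acre, made up
})

r = df["DAP_sales"].corr(df["Guar_yield"])   # Pearson correlation by default
print(f"Pearson r between DAP sales and guar yield: {r:.2f}")  # close to +1 here
```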
5 Conclusion

In this work, a dashboard has been designed which shows the key parameters in agriculture. The dashboard is based on district-wise crop trends and fertilizer usage trends, and it shows that productivity is very closely related to fertilizer application. The dashboard gives a complete view of the data, with capabilities of cross-filtering, drill-down and drill-up, and it saves the time spent on data representation and manual charting that previously had to be done in MS Excel. These abilities
Fig. 3 Example of positive correlation for guar against DAP for Fazilka district
and results were earlier very difficult to achieve with spreadsheet software like MS Excel. The dashboard provides a different view of already-available data, and the software's interactive and cross-filtering options add to the efficacy of the dashboard and the decision-making process.
6 Future Scope

The presented Power BI tool is used for data analysis, visualization, and planning in agriculture. The scope of this agricultural dashboard extends to individuals, institutions, organizations, and government bodies, who can benefit from it in planning and forecasting for better resource management. Agriculture is a dynamic field in which the crop patterns of a region change from season to season. This places a greater responsibility on governing bodies to make policies relevant to the current situation and the times ahead. However, major policy making in agriculture was last done more than 40 years ago, and the same regulations are still being followed today. Government schemes could instead be designed in accordance with the dynamic needs of different areas, and trends in crop areas and production could be used to implement better crop rotations and increase the net income of farmers. Data analytics in agriculture thus puts the power in our hands to bring the much-needed change in policy making. Data analytics is also becoming a mainstream course in curricula rather than a subject studied out of curiosity by a few individuals, and this kind of analytics is bound to increase the avenues of job creation among today's youth. Young professionals can be attracted to contribute toward agricultural development and
policy making, which is otherwise considered of little interest and is taken up by only a few. Research and development in business intelligence software and the introduction of statistical tools will bring more power and processing ability, leading to better decision-making. Large enterprises already have great infrastructure around such business intelligence tools, and collaboration between government bodies and enterprises on the data exploration and policy-making front will improve the current agro-technological situation of the country.
References 1. Government of India (2020) Agriculture. Accessed online at https://www.india.gov.in/topics/agriculture 2. Nations Encyclopedia (2020) India—Agriculture. Accessed online at https://www.nationsencyclopedia.com/economies/Asia-and-the-Pacific/India-AGRICULTURE.html 3. Kang M, Parshad V, Khanna P, Bal S, Gosal S (2010) Improved seeds and green revolution. J New Seeds 11:65–103. https://doi.org/10.1080/1522886X.2010.481777 4. Government of India, Ministry of Agriculture (2018) Farmer's Guide. Accessed online at https://farmech.dac.gov.in/FarmerGuide/PB/index1.html 5. Grover DK, Singh JM, Kaur AR, Kumar S (2017) State agricultural profile-Punjab. https://doi.org/10.13140/RG.2.2.29375.87203 6. Pujara M (2016) Problems of Punjab agriculture 7. Chand R (1999) Emerging crisis in Punjab agriculture: severity and options for future. Econ Polit Wkly 34(13):A2–A10. Accessed May 9, 2021. http://www.jstor.org/stable/4407788 8. Accessed online at https://www.intrafocus.com/balanced-scorecard/ 9. Kaplan RS, Norton DP (2019) The balanced scorecard—measures that drive performance. Accessed online at https://hbr.org/1992/01/the-balanced-scorecard-measures-that-drive-performance-2 10. Bradea IA, Sabau-Popa CD, Bolos M (2014) Using dashboards in business analysis. Annals of the University of Oradea. Econ Sci. Tom XXIII 11. Vico G, Mijić D, Bodiroga R (2019) Business intelligence in agriculture-a practical approach. https://doi.org/10.13140/RG.2.2.18626.43204 12. Ministry of MSME (2019) Business intelligence using applied excel and power BI. Accessed online at https://www.nsic.co.in/NTSC/AppliedExcelandPowerBI 13. NIIT (2020) Data analysis and visualization in excel and power BI. Accessed online at https://www.niit.com/india/short-term-courses/data-analytics/data-analysis-and-visualization-excel-and-power-bi 14. IIT Kharagpur (2020) Data analytics (CS40003). Accessed online at https://cse.iitkgp.ac.in/~dsamanta/courses/da/index.html 15. Gowthami K, Pavan Kumar MR (2017) Study on business intelligence tools for enterprise dashboard development. Int Res J Eng Technol (IRJET) 04(04), e-ISSN: 2395-0056, p-ISSN: 2395-0072 16. Microsoft (2021) MANAGILITY—agribusiness and farming analytics & planning. https://powerbi.microsoft.com/en-us/partner-showcase/managility-pty-ltd-agribusiness-and-farming-analytics-planning/ 17. https://www.star-knowledge.com/case-studies/microsoft-power-bi-solution-for-an-agriculture-farming-organization/ 18. Ministry of Horticulture (2019) HAPIS. Accessed online at https://aps.dac.gov.in/
19. Ministry of Agriculture (2020) Crop production statistics information system. Accessed online at https://www.aps.dac.gov.in/APY/Index.htm 20. Government of India, Department of Fertilizers (2021) MIS report. Accessed online at https://reports.dbtfert.nic.in/mfmsReports/displayPortal 21. Singh R (2012) Organic fertilizers: types, production and environmental impact 22. Sandeepanie I (2020) Big data analytics in agriculture. https://doi.org/10.13140/RG.2.2.25154.81604
Face Detection-Based Border Security System Using Haar-Cascade and LBPH Algorithm Arpit Sharma and N. Jayapandian
Abstract Border security comprises the border management measures taken by a country, or a set of countries, to counter unauthorized travel or trade across its borders, to limit illegal migration, to combat various crimes and to prevent dangerous criminals from entering the country. A system of this kind helps to keep a check on personnel who forge legal documents with the intention of crossing the border. This article discusses border security in the whole Indian context; various such systems have been built since 2010, such as the wireless sensor network system named "Panchendriya" and remote and manually switched armed systems using ultrasonic sensors for border security. In this article we make use of the Haar-Cascade classifier along with the LBPH algorithm and describe their functioning. In the results and discussion section we compare the most prominent face recognition techniques used in the last ten years. The proposed prototype is discussed and demonstrated through a simulation model, and it provides better results than the existing model; the proposed Haar-Cascade and LBPH combination gives about 10% better performance.

Keywords Border security · Border surveillance · Haar-Cascade classifier · Face detection · Face recognition · LBPH · Criminal identification
1 Introduction

The main objective is to establish the correct identity of a person by means of face detection; if the identity matches the database, the person is allowed to cross the border. In this way the security of the border can be strengthened.

A. Sharma · N. Jayapandian (B) Department of Computer Science and Engineering, CHRIST (Deemed to be University), Kengeri Campus, Bangalore, India e-mail: [email protected] A. Sharma e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_3
Face detection also reduces pen-and-paper work, since the process is done digitally; this raises the security level, helps reduce illegal activities and helps officers recognize a person easily and carry out criminal identification [1]. The main goal is to verify a captured face against an already enrolled, verified face. Face recognition is an application of software that aims to identify somebody by tracing and matching facial features [2]. The primary goal of this review-style paper is to work with a distinct, one-of-a-kind image of the frontal face of each individual. The prospective applications span many security-sensitive places, such as border crossings, international airports, railway terminals, professional institutions, and large showrooms. The ultimate intention of this review is to strengthen veracity and accuracy. Presently, almost all countries face the serious problem of internal security in their border areas. Terrorists and intruders carrying various arms and weapons are disturbing the peace and harmony of the country. The Pulwama attack in Jammu and Kashmir and the planned Uri attack against the Indian military forces revealed the importance of a well-organized surveillance system in the border region. Conventional border patrol systems need a large number of military personnel along with high-cost, high-tech surveillance devices and equipment, so a survey of border security in the whole Indian context describes the various systems built so far. One system still in use is "Panchendriya", a remote, unattended wireless sensor network system that combines five types of sensors, geophones, hydrophones, small microphones, infrared sensors, and high-resolution camera sensors, for effective surveillance and early-warning detection of human intruders in border security scenarios [3]. There are, however, still problems with this system: its recognition rate can be affected by environmental factors, and people cannot always be identified properly. To overcome the problem of illegal activities at the border, a proposed system is discussed in this paper. Haar-Cascade is an object detection algorithm that falls under the machine learning domain and is used to find a variety of objects in a picture [4]. It is a systematic solution in which a cascade of Haar features is trained from a large number of positive and negative images. Different machine learning algorithms are also used for this application, among them decision trees, random forests, principal component analysis and many more; but before that, let us understand what machine learning actually is. It is a domain covering different types of algorithms and approaches used in computing whose main intention is to anticipate outcomes on various large datasets [5].
Under machine learning there are two broad categories: supervised learning and unsupervised learning. Supervised learning is mainly used when the class labels are known, and unsupervised learning when the class labels are not known [6]. Frontal face identification is a long-established process that falls under biometric security; other biometric concepts include verification by DNA, by iris, and by thumb impression. Let us
discuss the working of the automated system. In the first step, the system connected to the video camera searches the camera's field of view for faces; if the resolution is very low, a special algorithm is used to find the frontal faces. Once a face is identified, the head's location, size, and pose are measured. The face then needs to be aligned, to within about 36 degrees of rotation, so that the automated system can log the face picture. In the third step, the picture of the head is calibrated and scaled so that the image is clearly registered in the system. The facial data of the individual is then converted into a distinct binary code, and that code is compared with the already stored facial data to verify the identity. In the proposed prototype model, a vehicle approaches the checkpoint and its occupants undergo personal screening with the help of a surveillance camera. Since a database of registered faces of all the people who migrate across the border is already maintained, the face detection system grants permission to move across the border if a screened face matches any of the faces stored in the database. If an individual's face does not match the database during screening, access is denied; the person is not permitted to cross the border and must follow some other legal procedure to obtain permission. Figure 1 shows the flowchart of the proposed border security face detection system. The safety of society matters in every sector: day by day, crimes and other illegal activities are growing, and ordinary people can face severe consequences. To control such illegal activities, a safety system is required, because some places are very difficult to reach, yet with a technical, automated system we can easily detect criminal activity in different fields. Nowadays, police forces in every country use hi-tech systems that help them track criminals and keep crime under control [7]. Figure 1 also shows the overall flow of the security system and how it proceeds in several steps. Several approaches are utilized here, such as the Haar-Cascade classifier, LBPH, and OpenCV, with Python used for the implementation; let us discuss the algorithms one by one [8]. The first is the Haar-Cascade classifier, an object detection algorithm used to identify particular objects in a picture, and one of the most widely used algorithms in border security. In this approach, a cascaded function is trained from a large number of positive and negative images. The cascaded function is a classifier made up of a collection of stages, where each stage is a group of weak learners. The weak learners are simple and efficient classifiers called decision stumps. Each stage is trained using an efficient method called boosting, which makes it possible to build a highly precise classifier by taking a weighted combination of the decisions made by the weak learners. The features of the Haar-Cascade classifier are illustrated in the accompanying diagram.
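A minimal OpenCV sketch of this detection step is given below; the image file name is a placeholder (not part of the paper) and the detection parameters are typical defaults rather than values taken from the proposed system.

```python
import cv2

# Load the pre-trained frontal-face Haar cascade shipped with OpenCV.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

# Placeholder file name for a frame captured at the checkpoint camera.
frame = cv2.imread("checkpoint_frame.jpg")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# detectMultiScale slides the cascaded classifier over the image at several scales.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                      minSize=(60, 60))

# Draw a rectangle around every detected face and save the annotated frame.
for (x, y, w, h) in faces:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("detected_faces.jpg", frame)
print(f"{len(faces)} face(s) detected")
```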
Fig. 1 Flowchart of on border security-based face detection system
For face recognition, a simple solution to the recognition problem is the local binary pattern histogram (LBPH) algorithm, which can match and identify both the frontal face and the side face in a picture [9]. To address this problem, a refined LBPH algorithm (RLBPH) that works mainly on the grey median of the pixel neighbourhood has been proposed. The first action of LBPH is to build a histogram of an intermediate image that describes the original image in a better way by emphasizing the facial characteristics. To do this, the algorithm uses the concept of a sliding window, based on parameters such as the radius and the number of neighbours. The working of the LBPH algorithm is shown below; Fig. 2 shows the whole working of the algorithm in several stages.
OpenCV is a library available in Python that is designed to solve real-world computer vision problems. OpenCV supports a wide range of programming languages such as C++, Python, and Java, and it runs on various operating system platforms such as Windows, Linux, and macOS. The OpenCV module in Python is essentially a wrapper around the original C++ library. Using this library, all OpenCV array structures are converted to and from NumPy arrays, which makes it easy to combine OpenCV with other libraries that use NumPy, such as SciPy and Matplotlib. Some of the basic operations that can be performed with the OpenCV library are illustrated next.
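A minimal sketch of the LBPH matching step, assuming the opencv-contrib-python package (which provides the cv2.face module) and a small set of already-cropped, labelled face images; the file names and label values are illustrative:

```python
import cv2
import numpy as np

# Illustrative training data: grayscale face crops and integer person IDs
train_images = [cv2.imread(p, cv2.IMREAD_GRAYSCALE)
                for p in ["person1_01.png", "person1_02.png", "person2_01.png"]]
train_labels = np.array([1, 1, 2])

# LBPH recognizer with the usual radius / neighbours / grid parameters
recognizer = cv2.face.LBPHFaceRecognizer_create(radius=1, neighbors=8,
                                                grid_x=8, grid_y=8)
recognizer.train(train_images, train_labels)

# Predict the identity of a new face crop; a lower distance means a closer match
probe = cv2.imread("unknown_face.png", cv2.IMREAD_GRAYSCALE)
label, distance = recognizer.predict(probe)
print("predicted id:", label, "distance:", distance)
```

In a deployed system the predicted label would be checked against the registered-face database before the barrier is opened.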
Fig. 2 LBPH model for on border security-based face detection system
2 Literature Review
To gain insight into border security in the Indian context, the table below compares various techniques used in border-security-based face detection systems, along with their objectives and analysis. The main goal of one study is an IoT-based security system (IBSSS) that aims at reducing the burden on the forces and providing an additional layer of security with high precision and accuracy [10]. The authors chose IoT together with LDR sensing for their proposed work and concluded that LDR is the best method to achieve a better recognition rate. Another study focuses on perimeter vigilance automation by applying cellular remote actuators in different areas, based on a wireless sensor network (WSN). According to the authors, the proposed system could be an excellent achievement in heightening the safety of different border regions, especially regions facing dangerous weather conditions where redeployment of personnel is a serious problem. A further study presents a prototype model of criminal identification using a Raspberry Pi together with the Haar-Cascade classifier and LBPH algorithms for face matching and recognition; the accuracy achieved by this procedure is about 95%, although the final output may differ depending on parameters such as distance, camera intensity, and lighting. Another study shows the functioning of a simple, unique electronic and hand-operated cannon device for the safety
of the border. The mechanism was built using a microcontroller, and the authors concluded that the proposed system greatly benefits army soldiers and that their unique electronic, hand-operated aiming weapons are viable for highly alert, sheltered spaces. Another study implements a frontal face recognition security system using the PCA algorithm, applying Eigenfaces and PCA; the recognition rate attained by this process is 81–85%, and the authors propose that PCA is the best method amongst the other algorithms. A further study scrutinizes the benefits and drawbacks of many improved face recognition techniques, applying SVM and HMM, and concludes that SVM and HMM can produce better face recognition results [11]. Another study proposes improving the matching rate using a genetic algorithm and concludes that the genetic algorithm produces better recognition results, at about 80%. A further study defines a supervised auto-encoding system that captures the deep internal structure of the numeric values of a facial dataset; the proposed models use two important concepts, a supervised auto-encoder and a neural network, and the authors show that the supervised auto-encoder method gives better accuracy results along with other relevant benefits [12]. Another study gives a deep investigation of present research directions of a specific algorithm and its different real-life applications in the domain of biometric safety, applying CBIR and CBFR; the authors determine that the applied solution rectifies the robustness complications of presently implemented frontal-face image verification systems and achieves highly accurate performance regardless of preliminaries. A further study specifies a computational model for person authentication in a face recognition security system, using the elastic bunch method and graph matching in the recognition process, and concludes that graph matching produces better recognition results, at about 78%. Another study analyzes an adaptive classification system (ACS) designed for videotape-based frontal face verification, using DPSO, and projects that the accuracy will reach 75–78% in the future. A further study analyzes the motion detection concept, using the sum of absolute differences (SAD) motion detection algorithm, and concludes that it can spot a moving object from a sequence of images. Another study designates a self-organizing map (SOM) to triangulate similarities in an image and concludes that, using the SOM neural network algorithm, the recognition rate will approach 70–75% in the future. A final study explains the use of the correlation method in the area of frontal face verification and concludes that, using the correlation method, the recognition rate in the future will be 65–70%.
The focus of another study is to design a fusion system that merges depth and intensity data to refine face matching and verification. By adopting the LDA process, the authors concluded that the LDA
method gives better depth and intensity information with confidence.
We also present a comprehensive, analytical review of machine vision methods in the literature. Face detection is an essential starting point in computer vision applications, because it enables the detected face to be identified and separated from its surroundings. It can also be used for knowledge-based image extraction, data compression, teleconferencing, community tracking, and intelligent communication, amongst several other functions. This machine vision challenge, on the contrary, did not attract considerable attention from the scientific community until relatively recently. Face detection is a complex object recognition problem, since the human face is a non-rigid object with a significant degree of variation in expression. Solutions ranging from basic edge-based algorithms to complex, higher-level strategies exploiting advanced detection techniques have been described; the techniques covered by the study are grouped into feature-based and image-based approaches, and their respective technical methods and efficiency are addressed. One study presents an object detection architecture that can analyze photographs in real time whilst delivering high specificity and sensitivity. Three prominent contributions are made. The first is an original image representation known as the "integral image". The second is a quick and well-organized feature selector that extracts a minimal set of distinctive cues from a broad variety of potential features using the AdaBoost optimization algorithm [13]. The third is a "cascade" structure for combining classifiers, which allows background regions of a photograph to be dismissed instantly, whereas the convincing face-like portions receive the most computation. Another study employs a neural-network-based frontal face detection model: a network of visual connections assesses small windows of a picture and decides whether or not each window contains a face. To achieve high performance, the programme arbitrates amongst several networks. A consistent technique for aligning face instances during retraining is demonstrated, and a cascading (bootstrapping) methodology is deployed to accumulate negative examples that produce false detections into the training sample as retraining progresses. This eliminates the tedious effort of explicitly generating negative training instances that would have to cover the whole space of non-face images. Additional optimizations, such as exploiting the assumption that faces in photographs usually overlap across detectors, could enhance accuracy even more. The outcomes of tests on several public face detection datasets show that the solution achieves comparable detection and false-alarm rates. With the help of image processing, natural disasters can sometimes even be predicted [14]. Considering applications such as video monitoring, human–computer interaction, expression authentication, and facial image data management, human face detection is critical. In the context of fluctuating lighting and varied backgrounds, researchers have developed a face detection approach for colour photographs. This system detects skin patches across the image using a luminance-based technique together with a nonlinear chromatic adjustment, and it then constructs face candidates based on the spatial configuration of the detected skin patches. For every face
candidate, the algorithm utilizes eye, jaw, and boundary representations. Real-time face detection across pictures from distinct samples proved acceptable over a huge assortment of physical variations in colour, orientation, size, direction, 3-dimensional movement, and pose [15, 16].
3 Proposed Methodology
Nowadays, various illegal activities take place when crossing the border for exporting and importing material. To reduce these illegal activities, officers currently use pen-and-paper work, but sometimes they are unable to catch a criminal with this pen-and-paper method alone. To improve security, a face detection system is used, so that every person is double-checked, first through face detection and then through the pen-and-paper procedure, which reduces the illegal activities. To prevent the illegal activities happening at the border, a face matching system is designed to support the border security guards in the procedure of confirming the identity of the travel document holder who intends to cross the border [17]. Figure 3 shows sample images taken for the purpose of the Haar-Cascade method. The system is built from advanced face recognition algorithms that detect faces and match them against the existing database; with this solution, the guards can easily catch criminals, and materials intended for export can move smoothly. This section explains how the dataset specimens are collected, how the data specimens are trained and tested by building a model, and how the accuracy rate is then estimated based on that data. When evaluating an algorithm, it is advisable to use a baseline test dataset so that results can be compared directly.
Fig. 3 Image samples for dataset using Haar-Cascade
There is also another way to select the dataset, specific to the component under test (for example, how the algorithm behaves when given images with changing lighting or images with different facial expressions). We therefore captured about 250 sample photos using the Haar-Cascade classifier method in OpenCV via a webcam, in different environments (a minimal capture sketch is given after Fig. 4). In the training module, the dataset images are trained using the various training algorithms of the border-security-based face detection system; the object detection classifier training algorithm is mainly used to prepare the model, bringing the various sample images to an accurate angle and size using OpenCV, which comes with a detector as well as a trainer. For recognition, we use the LBPH algorithm, in which the sample images collected in the database are matched in real time; if a face is matched, the door unlocks and the person is allowed to cross the border [18]. The flowchart of LBPH is shown in the figure.
When the video surveillance camera is connected to the system, the recognition system searches the camera's field of view for faces. A special multi-scale algorithm is used to search for faces at low resolution. Once a face is detected and captured by the camera, the system computes the position of the head, its size, and the facade. A face needs to be oriented within at least 36 degrees relative to the camera for the verification system to register it. The picture of the head is calibrated, set to a fixed size, and then rotated so that it can be mapped into an appropriate dimension and pose. The normalization process alone does not provide high accuracy in finding the head position in the image. The system then encodes the facial data into a unique, specific code for further processing. This newly generated code allows the newly collected facial data to be compared easily with the already stored facial data. After matching, it is associated with at least one stored facial representation. Figure 4 elaborates the proposed security-based face recognition model.
Fig. 4 Flowchart of security-based face recognition system
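A minimal sketch of the webcam-based sample collection described above (about 250 cropped face images per person), again assuming OpenCV; the person ID, output folder, and crop size are illustrative choices, not values fixed by the proposed system:

```python
import cv2
import os

person_id = 1                           # illustrative ID assigned to the person
out_dir = f"dataset/person_{person_id}"
os.makedirs(out_dir, exist_ok=True)

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)               # default webcam

count = 0
while count < 250:                      # roughly the sample size used in this work
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 5):
        count += 1
        crop = cv2.resize(gray[y:y + h, x:x + w], (200, 200))
        cv2.imwrite(os.path.join(out_dir, f"{count:03d}.png"), crop)

cap.release()
```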
SVM is a well-known classification algorithm that comes under supervised learning and is also used for regression problems. The main task of the SVM is to design the decision boundary that separates the n-dimensional space into different classes so that each data point can be placed in the correct class. SVM chooses some vector points (support vectors) to design the hyperplane. The formula for calculating the distance from the hyperplane is given as follows:

d = \frac{|\omega \cdot x_0 + b|}{\|\omega\|} \quad (1)
From Eq. (1), ω is the vector normal to the hyperplane, b is the offset of the hyperplane, and x0 is the particular data point whose distance is measured.
Principal component analysis (PCA) is another approach used in this work. It is an unsupervised technique whose main task is to reduce the dimensionality of the dataset and enhance its interpretability. Using PCA, the different attributes of a dataset can be separated and visualized in the form of a graph.
As configured, the motion sensor detects the arrival of a vehicle. As the vehicle approaches the barrier, all the hardware and software components of the face detection system start working. The surveillance camera then starts detecting the individual faces, and the screened faces are automatically matched against the stored database. Once the matching of faces is done, the system grants access, and the people are allowed to cross the border. The pictures shown in Fig. 5 summarize the whole working scenario of the security system: first the sensors are activated; then, when a car arrives, it is stopped; the video surveillance cameras spot the human faces; once a face is spotted, it is saved to the database; the recognition process then starts; and if the face is paired with a saved image, the car is allowed to cross the barrier. This model requires several pieces of equipment to be implemented, and once completed it can be deployed in various domains that sometimes involve critical situations. If a face is found to be unmatched with the original database, that person undergoes some other legal procedures, which simply means checking all real documents such as the Aadhaar card and domicile certificate and collecting information from the person's family and school to verify whether the respective person is a criminal or not.
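As a small numeric illustration of Eq. (1), assuming an arbitrary two-dimensional hyperplane (the weight vector, offset, and point below are made-up values, not taken from the proposed system):

```python
import numpy as np

w = np.array([2.0, 1.0])   # assumed normal vector of the hyperplane
b = -3.0                   # assumed offset
x0 = np.array([4.0, 5.0])  # assumed data point

# Eq. (1): distance of x0 from the hyperplane w.x + b = 0
d = abs(np.dot(w, x0) + b) / np.linalg.norm(w)
print(round(d, 3))         # 4.472 for these values
```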
4 Result and Discussion
In this section, we discuss the results, focusing mainly on accuracy. In the whole study, different algorithms are applied and their respective accuracies are calculated; the graph below gives a brief understanding of the whole research idea. Before that, the dataset to be used was created at that point in time by collecting samples of different frontal faces. Figure 6 shows the accuracy level of our proposed model. The recognition accuracy proposed in this article is 82%, but this model will be refined in the future to improve the security level more effectively.
Fig. 5 Pictorial representation of proposed model
Even at the time of creating the sample dataset, each person is assigned an ID number. During recognition (verification), when the matching phase is performed and a particular person's image is authenticated against the dataset, an alert-type message is shown, identifying an unauthorized person, such as a criminal or terrorist, through the security verification system; if the sample data of the person's image is not authenticated against the dataset, no message is shown, and the person, as a normal citizen, undergoes the other legal procedures. The computational and combinational models discussed in this research article were selected after voluminous research, and the successful testing outcomes confirm that the decisions made by the researchers were sound. A system with manual face perception and automatic face verification did not reach a verification accuracy over 85%. The HOG algorithm performs quite well but has some issues in identifying small faces. We therefore preferred the Haar-Cascade classifier algorithm, which performs roughly as well as HOG overall, and we mainly used Haar-Cascade in our proposed project due to its speed and accuracy. When provided with an ideal atmosphere, the accuracy achievable by this algorithm is 95–97%. In the future, when this proposed project is implemented practically with all the essential hardware and software components, it will minimize the effort of border guards in screening individuals and keep the border safe from fraudulent intruders. Table 1 shows the different algorithms applied in the implementation of the proposed border security system together with their accuracies. The proposed model was tested with different algorithms, but the best accuracy is 95%, which is given by LBPH; we have in fact improved the accuracy of LBPH to a certain level, and we came to the conclusion that accuracy is dynamic with respect to real-time environment parameters.
Fig. 6 Proposed accuracy level
Table 1 Proposed algorithms with their accuracies
Algorithms          Accuracy (%)
LDR                 65
WSN                 70
DPSO                75
SOM                 78.9
Genetic algorithm   80
LBPH                95
The proposed model is very efficient for security because it gives appropriate accuracy in real time. For the experimental setup, we used a Web camera to detect and collect facial images at different postures, which creates a real-time dataset, and we then used a Python 3 IDE for the implementation of the proposed model.
5 Conclusion
The accuracy is at most 95% and not less than 80%, so it is clear that the accuracy depends heavily on environmental factors. At present, researchers are working on this real-time project, mainly on accuracy and efficiency, but in the future the accuracy can go beyond 95%, and a huge number of
sample images can be productively used to train the proposed model in order to increase its proficiency and accuracy, as well as the security aspect at the border. This automated system can be used to identify or compare particular individuals crossing the border in just a few seconds based on their facial characteristics. Face recognition and detection using this proposed automated system can be applied when issuing identity papers and are helpful in the prevention of identity fraud and identity theft. The algorithms are compared on the basis of two parameters, computation time and accuracy; in terms of speed, HOG is a slightly faster algorithm, followed by Haar-Cascade and CNN. In the future, if the dataset is optimized with values such as the variance extracted from the various images using the wavelet transform concept, instead of raw images, then there are higher chances of improving the accuracy of the outcomes.
References
1. Marsot M, Mei J, Shan X, Ye L, Feng P, Yan X, …, Zhao Y (2020) An adaptive pig face recognition approach using convolutional neural networks. Comput Electron Agric 173:105386
2. Li L, Correia PL, Hadid A (2018) Face recognition under spoofing attacks: countermeasures and research directions. IET Biometrics 7(1):3–14
3. Pawar BB, Shinde GN (2021) Real-time powdery mildew crop disease monitoring using risk index theory-based decision support system. In: Proceeding of first doctoral symposium on natural computing research: DSNCR 2020, vol 169. Springer Nature, p. 463
4. Shetty AB, Rebeiro J (2021) Facial recognition using Haar cascade and LBP classifiers. Global Transitions Proc 2(2):330–335
5. Sree SR, Vyshnavi SB, Jayapandian N (2019) Real-world application of machine learning and deep learning. In: 2019 International conference on smart systems and inventive technology (ICSSIT). IEEE, pp 1069–1073
6. Natarajan J (2020) Cyber secure man-in-the-middle attack intrusion detection using machine learning algorithms. In: AI and big data's potential for disruptive innovation. IGI Global, pp 291–316
7. Govender D (2021) An evaluation of safety and security. In: Police behavior, hiring, and crime fighting: an international view, vol 100
8. Jagtap AM, Kangale V, Unune K, Gosavi P (2019) A study of LBPH, Eigenface, Fisherface and Haar-like features for face recognition using OpenCV. In: 2019 International conference on intelligent sustainable systems (ICISS). IEEE, pp 219–224
9. Yang B, Chen S (2013) A comparative study on local binary pattern (LBP) based face recognition: LBP histogram versus LBP image. Neurocomputing 120:365–379
10. Manavi SY, Nekkanti V, Choudhary RS, Jayapandian N (2020) Review on emerging internet of things technologies to fight the COVID-19. In: 2020 Fifth international conference on research in computational intelligence and communication networks (ICRCICN). IEEE, pp 202–208
11. Xiong X, Chen L, Liang J (2017) A new framework of vehicle collision prediction by combining SVM and HMM. IEEE Trans Intell Transp Syst 19(3):699–710
12. Song J, Zhang H, Li X, Gao L, Wang M, Hong R (2018) Self-supervised video hashing with hierarchical binary auto-encoder. IEEE Trans Image Process 27(7):3210–3221
13. Chang V, Li T, Zeng Z (2019) Towards an improved Adaboost algorithmic method for computational financial analysis. J Parallel Distrib Comput 134:219–232
14. Vinod AM, Venkatesh D, Kundra D, Jayapandian N (2021) Natural disaster prediction by using image based deep learning and machine learning. In: International conference on image processing and capsule networks. Springer, Cham, pp 56–66
15. Jayapandian N, Negi MS, Agarwal S, Prasanna A (2019) Community based open source geographical classical data analysis. Int J Recent Technol Eng (IJRTE) 8(1):413–418
16. Tolosana R, Vera-Rodriguez R, Fierrez J, Morales A, Ortega-Garcia J (2020) Deepfakes and beyond: a survey of face manipulation and fake detection. Inf Fusion 64:131–148
17. Bolakis C, Mantzana V, Michalis P, Vassileiou A, Pflugfelder R, Litzenberger M, …, Kriechbaum-Zabini A (2021) Foldout: a through foliage surveillance system for border security. In: Technology development for security practitioners. Springer, Cham, pp 259–279
18. Shanthi KG, Vidhya SS, Vishakha K, Subiksha S, Srija KK, Mamtha RS (2021) Algorithms for face recognition drones. Mater Today Proc
Proposed Experimental Design of a Portable COVID-19 Screening Device Using Cough Audio Samples
Kavish Rupesh Mehta, Punid Ramesh Natesan, and Sumit Kumar Jindal
Abstract With the proliferation of COVID-19 cases, it has become indispensable to conceive of innovative solutions to abate the mortality count due to the pandemic. With a steep rise in daily cases, it is a known fact that the current testing capacity is a major hindrance in providing the right healthcare for individuals. The common methods of detection include swab tests, blood test results, CT scan images, and cough sounds paired with AI. The unavailability of data for the application of deep learning techniques has proved to be a major issue in the development of deep learning-enabled solutions. In this work, a novel solution is proposed: a screening device that is capable of collecting audio samples and utilizing deep learning techniques to predict the probability of an individual being diagnosed with COVID-19. The model is trained on public datasets, which are manually examined and processed. Audio features are extracted to create a dataset for the model, which is developed using the TensorFlow framework. The trained model is deployed on an ARM Cortex-M4-based nRF52840 microcontroller using the lite version of the model. The in-built PDM-based microphone is used to capture the audio samples, and the captured audio sample serves as the input to the model for screening.
Keywords COVID-19 · Cough detection · Deep learning · Embedded device · Audio analysis · Neural networks
K. R. Mehta · P. R. Natesan · S. K. Jindal (B) School of Electronics Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_4
1 Introduction
Coronavirus disease 2019 (COVID-19) is a disease that is caused by the contraction of the SARS-CoV-2 virus. The virus is responsible for causing symptoms such as fever, difficulty in breathing, dry cough, and loss of smell and taste. The WHO declared COVID-19 a global pandemic on 11th March 2020. Due to the initial lack of a vaccine against the disease, the disease, which originated in Wuhan, China, spread across the globe. At the moment when this work was compiled, a total of
nine million confirmed cases had been reported globally, with a death toll of over a hundred and thirty-five thousand people. The disease is transmitted through close contact: an individual can contract it via airborne respiratory droplets, which are released when people cough or sneeze in public without wearing appropriate masks, or via contaminated surfaces carrying such aerosols. The virus can survive airborne for up to 3 hours, and upon inhalation it enters the individual's lungs. The clinical diagnosis of COVID-19 includes nucleic acid amplification tests such as reverse transcription polymerase chain reaction (RT-PCR), which is usually done via the collection of a nasopharyngeal swab. Although this is the current gold standard for detecting the presence of the coronavirus, the shortage of equipment and medical personnel makes the process of diagnosis and treatment sluggish.
The use of machine learning in the field of medicine is not uncommon: it has been used to develop deep learning-based radiomics, early detection of certain types of cancer, classification of diabetic retinopathy types and cardiovascular diseases, genome-based sequencing for detecting genetic diseases, and lesion detection. Deep learning models have been employed to perform MRI reconstruction and analysis of X-rays and CT scans using convolutional neural networks (CNNs) for image classification. These solutions utilize the image classification capabilities of convolutional neural networks, which are trained on multiple image samples of the respective scans. X-ray imaging-based diagnosis requires sufficient equipment and radiologists, which can further delay diagnosis and treatment. With limited resources, there is a need to screen patients before they are diagnosed by medical personnel. The use of machine learning techniques has attracted many data scientists to provide pre-screening solutions, including the use of CT scans [1] and X-rays [2] for the detection of COVID-19. Studies have shown that respiratory syndrome patients have distinct cough features [3], which, on subjection to signal analysis and processing, can be extracted for further analysis. These features can be used to create a dataset which can then be used to train a classification model. The audio samples can be acquired with the consent of volunteers who provide cough data via a smartphone app or a Website; the recommended method of collection is to collaborate with hospitals to acquire the cough audio samples.
In this work, the goal is to develop a portable device that is capable of capturing audio and performing in situ analysis of the audio sample without the need for an active Internet connection. The dataset used to develop the model can be created using publicly available audio samples along with samples collected with the consent of patients. The model is developed using the TensorFlow framework and is converted to the TensorFlow Lite format so that it can be deployed on a low-power device. The lite version of the model is deployed on the microcontroller, with the in-built microphone as the primary data source for the audio sample. The device generates audio features, which serve as the input for the model. The model utilizes a CNN-based architecture for improved performance.
2 Related Works
Enhancements and initiatives in the field of audio recognition and analysis have been presented to assist the various AI models being developed for the detection of COVID-19, alongside an overview of the various deep learning methods employed by the community. Speech analysis models based on cough for the detection of medical conditions such as tuberculosis and asthma have been up to 90% accurate. Speech and audio analysis have been used for the analysis of cough and breath, and the need for a proper dataset has been stressed by the authors. Lastly, various monitoring methods have been discussed to slow the spread of the virus, along with how deep learning technology can assist in that [4].
Cough is a major symptom of many known non-COVID-19 medical conditions, and hence detection of COVID-19 based on cough becomes a challenge. This challenge has been addressed by analyzing the distinctness of pathomorphological alterations in the respiratory system induced by COVID-19. For the detection of cough, a mel spectrogram is generated, which is then fed to a CNN. Once a cough is detected, the audio sample is sent to three parallel, distinct classifier systems based on a deep transfer learning (DTL) model and classical machine learning (CML). The results from the three classifiers are compared, and a conclusion is drawn if all have the same output; otherwise the result is deemed indecisive. The average accuracy of the DTL model was found to be 92.5%, and for the CML model it was 88.6%. These high levels of accuracy are a strong indication of the effectiveness of AI models in detecting COVID-19 from cough and encourage the community to collect and provide more labeled data for analysis [5].
With multiple sensors and high-speed processors on board, mobile phones are a compact powerhouse in the world of electronics and have been used to collect cough samples for COVID-19. These samples were run through a custom AI model, and notable signals were obtained indicating the presence of COVID-19. 3621 samples from confirmed COVID-19-positive patients were collected to train the AI model, following a proper 3-step procedure to maintain the high quality of the dataset. Alongside the manually collected data, publicly available data were also used to train the model. For classification, an end-to-end CNN model was employed which directly outputs the binary classification depicting the probability of the presence of COVID-19. Various training strategies were followed to enhance the AI model, and analytical evidence has been provided proving the high performance of the proposed model [6].
Subirana et al. developed a CNN-based speech processing framework that utilizes mel frequency cepstral coefficients (MFCCs) as an input for the CNNs, where data from 4256 subjects were used to train the model and data from 1064 subjects were used to evaluate it. The speech processing framework utilizes three ResNet50 models in parallel for transfer learning of biomarker features. The biomarkers included muscle degradation, vocal cords, sentiment, and the lungs and respiratory tract. A Poisson mask was applied to an MFCC point from the recording via multiplication with a random Poisson distribution parameter and the average
value of all MFCCs. It is to be noted that the biomarkers used in detecting COVID-19 were the same as those used in the detection of Alzheimer's disease. The model achieved the highest accuracy compared to other AI-enabled solutions, at 97.1%, with a false positive rate of 16.8%, which may arise due to errors in labeling [7].
There has been an increasing need for prognostic analysis to estimate the spread and reach of diseases such as COVID-19, and Talha et al. have proposed a predictive model for this purpose. To examine the prognostic ability of their model and validate its correctness, they calculated F1, area under the curve (AUC), and fidelity scores. Eighteen laboratory findings were used to evaluate the performance of the data with the help of various cross-fold validation and train–test split techniques. The results have been very encouraging and show scores between 80 and 99%. It can be concluded that laboratory findings are a great source for training models and a promising tool that can be used for the detection of COVID-19 [8].
Hassan et al. [9] developed a recurrent neural network (RNN)-based COVID-19 detection system. Automatic speech recognition was accomplished through the RNN via long short-term memory (LSTM) units for the analysis of acoustic features of cough. Audio features such as spectral centroid, spectral roll-off, and zero crossing rate were extracted from the collected audio samples. The model was trained using 70% of the samples, with the remaining 30% used for testing. The model uses 512 LSTM units and three fully connected layers and was trained for 200 epochs. Performance metrics such as F1 score, AUC, and accuracy were used to judge the model, and an accuracy of 97% was achieved. It is to be noted that the amount of training data used was comparatively small when compared to other solutions [9].
Vijayakumar et al. [10] have suggested a deep learning-based model trained on a public dataset available online and were able to achieve a rigorous score of 94%. Four different classes of diseases with symptoms matching COVID-19, namely pneumonia, pertussis, and normal cough, were considered to train the model. To start, the mel spectrogram of the sample is generated and it is determined whether the audio clip is a cough sound or not; once this pre-screening is done, the audio is sent to two machine learning classifiers based on a support vector machine with a radial basis function (RBF) kernel and an LSTM. The task of the above-mentioned classifiers is to classify the audio clip into the four classes taken into account. If the results obtained from both classifiers are very similar, the final presumption is displayed. Various scores are calculated to validate the functioning and correctness of the model, and the results obtained are truly encouraging. Recently, various COVID-19 data analysis models have been proposed in the literature for prediction [11], analysis [12], and spread rate prediction [13].
3 Dataset
3.1 Data Acquisition
In the span of 8 months since the onset of the COVID-19 pandemic in March, the medical community has worked hard and has successfully collected cough samples of COVID-19 patients. However, due to patient data confidentiality, it has become an ordeal for data scientists to procure sufficient data to work on AI-based solutions. As a means of collecting data, online platforms have been developed where individuals can volunteer to submit their audio recordings to the study. For generating the dataset for the model, public datasets containing .WAV files can be utilized. Audio samples from the IISc's Coswara initiative [14], Stanford University's Virufy initiative, EPFL's CoughVid dataset [15], and audio samples from volunteers can be collected and processed. Due to the scarcity of data, the audio files from different sources are merged based on their labels and subjected to manual scrutiny. This ensures the quality of the audio files taken into consideration.
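A minimal sketch of how cough recordings from several public sources could be merged into a single labelled index before manual scrutiny; the folder layout, source names, and label conventions below are assumptions for illustration and do not reflect the actual directory structure of the Coswara, Virufy, or COUGHVID releases:

```python
import glob
import os
import pandas as pd

# Assumed layout: <source>/<label>/<file>.wav, e.g. coswara/positive/xyz.wav
sources = ["coswara", "virufy", "coughvid"]
rows = []
for source in sources:
    for label in ["positive", "negative"]:
        for path in glob.glob(os.path.join(source, label, "*.wav")):
            rows.append({"source": source, "label": label, "path": path})

index = pd.DataFrame(rows)
print(index["label"].value_counts())      # class balance before manual review
index.to_csv("merged_index.csv", index=False)
```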
3.2 Feature Extraction
It is a known fact that respiratory syndromes have distinct cough patterns. This information can be materialized via the use of spectrograms. Spectrograms are a visual representation of the frequency spectrum of a signal. The generation of this spectrum is done via the short-time Fourier transform (STFT). This is then converted to the mel scale, given by

\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \quad (1)
The scale in Hz is converted to bins, and the bins are scaled to their corresponding values on the mel scale. The audio sample is converted to mono, sampled at 16 kHz, and normalized to a bit depth from −1 to 1, and the spectrogram is then generated. This can be achieved via the use of the LibROSA library. To obtain the spectrogram, parameters including a window length of 256 samples, a 512-point FFT, and a hop length of 512 samples are set. The entire frequency spectrum of the audio sample is divided into 128 equally spaced mel frequency bins. When the spectrogram image is analyzed, it can be observed that patients with COVID-19 produce low-frequency sounds when they inhale after coughing. The spectrogram images for both cases are shown in Fig. 1. This information is useful for the generation of a usable dataset. To do this, the mel frequency cepstral coefficients (MFCCs), which can be considered as representations of distinct units of sound shaped by the vocal tract, are extracted. It is known that the shape of the vocal tract regulates the type of sound that will be produced. The aim is to represent the envelope of the short-time power spectrum by determining these coefficients.
Fig. 1 a Mel spectrogram of COVID-19 negative cough audio sample, b Mel spectrogram of COVID-19 positive cough audio sample
For training and testing, the LibROSA library was used to set the sampling rate to 16 kHz, reduce the audio range, set the audio channels to mono, and normalize the bit depth in the range from −1 to 1. In this work, a total of forty MFCCs for each audio sample are considered. The same process is applied to each audio sample to generate the dataset.
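A minimal sketch of the feature extraction described above, assuming the LibROSA library; the file name is illustrative, and the STFT values follow the numbers quoted in this section, read here as sample counts (an assumption on our part):

```python
import librosa
import numpy as np

# Load as mono, 16 kHz; librosa returns float samples normalised to [-1, 1]
y, sr = librosa.load("cough_sample.wav", sr=16000, mono=True)

# Mel spectrogram with 128 mel bins (window 256, FFT 512, hop 512 samples)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=512,
                                     win_length=256, hop_length=512,
                                     n_mels=128)
log_mel = librosa.power_to_db(mel)

# Forty MFCCs per sample, averaged over time to give a fixed-length vector
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
feature_vector = np.mean(mfcc, axis=1)   # shape (40,)
print(log_mel.shape, feature_vector.shape)
```

Repeating this over every file in the merged index yields the tabular dataset used for training.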
4 Proposed Solution
The flowchart for the proposed work is illustrated in Fig. 2.
4.1 Convolutional Neural Networks Architecture
Convolutional neural networks have been extensively used in applications involving image classification and audio analysis, and they have been demonstrated to perform well in extracting features from their input [16]. The proposed solution utilizes CNNs operating on the MFCCs to predict the probability of the presence of the coronavirus via binary classification. To ensure optimum performance of the model, it is essential to initialize the weights of the fully connected layers well. Hence, the model undergoes a pre-training process where it is first trained on public datasets [17] containing audio samples to distinguish cough from other sounds. The proposed model architecture is illustrated in Fig. 3. The hardware capabilities of the on-board microcontroller have been kept in mind, and as a result, one-dimensional CNNs have been used in place of the standard two-dimensional CNNs.
Fig. 2 Flowchart of proposed work
Two-dimensional CNNs require more training data to fine-tune kernel sizes and are resource-intensive, hence they are not practical to implement on low-power embedded devices. The model has two convolutional layers along with max-pooling layers to discard unimportant features. The activation function for each convolutional layer is the ReLU function. The first convolutional layer takes in the reshaped MFCC matrix as an input and has 8 filters with a kernel size of 3; the second convolutional layer has similar parameters. The output of the CNN layers is a multidimensional tensor, which is then flattened and serves as the input to the fully connected part of the network for analysis of the extracted features. The fully connected part consists of two linear layers with dropout layers of probability 0.25, to prevent overfitting. The activation function for the first linear layer is the ReLU function, and the activation function for the output layer is the softmax function, given by

\mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}} \quad (2)
This is done to obtain probability values at the output. The model uses a learning rate of 0.001 and the binary cross-entropy loss. The Adam optimizer is suggested as it uses an adaptive learning rate, which helps in mitigating the vanishing gradient problem.
Fig. 3 Proposed model architecture
The Cortex Microcontroller Software Interface Standard Neural Network (CMSIS-NN) inference engine translates the model instructions into a form suitable for running on the microcontroller.
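A minimal Keras sketch of the one-dimensional CNN described above (two Conv1D layers with 8 filters and kernel size 3, max pooling, two dense layers with dropout 0.25, a softmax output, Adam at a learning rate of 0.001, and binary cross-entropy); the input shape follows the forty MFCCs, while the dense-layer width is an assumption since it is not fully specified in the text:

```python
import tensorflow as tf

NUM_MFCC = 40                                            # forty MFCCs per sample

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_MFCC, 1)),          # reshaped MFCC vector
    tf.keras.layers.Conv1D(8, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Conv1D(8, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation="relu"),         # width is an assumption
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Dense(2, activation="softmax"),       # positive / negative
])

# The text pairs a softmax output with binary cross-entropy; with two-way
# one-hot labels this behaves like categorical cross-entropy.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```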
4.2 Choice of Embedded Device
TensorFlow Lite has made it possible to run TinyML on ultralow-power and highly compact devices with ease. The concept of edge computing has been gaining momentum and recognition among researchers and developers, and with the widespread usage of microsensors, data collection has become effortless. For the detection and analysis of cough samples, the Arduino Nano 33 BLE Sense has been proposed as the preferred development board, shown in Fig. 4. The development board hosts the nRF52840 microcontroller with 1 MB of flash memory and 256 KB of RAM, which is sufficient to run a simple multi-layer TFLite model and meets the hardware requirements of the model. The development board also hosts various in-built sensors for data collection. The micro-electromechanical systems (MEMS)-based digital microphone sensor is used as the primary means of capturing audio samples.
Fig. 4 Arduino Nano 33 BLE Sense board
The digital microphone is capable of outputting a digital signal in pulse density modulation (PDM) format.
4.3 Deployment of ML Model on Board
The amount of RAM available on ultralow-power devices is considerably less than what is needed to run the models trained and tested on personal computers. Hence, there is a need to quantize the parameters of the model to ensure ease of deployment on compact development boards [18]. The required quantization involves converting 32-bit floating-point values into 8/4/2-bit integer values. The model to be deployed is first trained using the TensorFlow library and is then quantized. To achieve this, the TensorFlow model is converted into a TensorFlow Lite (TFLite) model using the in-built resources of the TensorFlow library. As the development board does not have the provision of on-board file storage, the model needs to be compiled alongside the C program. Hence, the TFLite file is parsed, converted into a C array, and stored as a C header file. The conversion to a C array can be done with a tool named 'xxd', which is also available for the Windows operating system. This process is illustrated in Fig. 5. Supporting header files, including the interpreter to load the model into the microcontroller program and the operations resolver for running the model, are also imported. The header file is compiled along with the C program to be used by the microcontroller for inference. Once the model is deployed, the performance can be analyzed by monitoring the time taken to execute the inference and the correctness of the output received.
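A minimal sketch of the conversion step described above, assuming a trained Keras model saved to disk; the file names and the random calibration generator are illustrative placeholders, not the actual pipeline used by the authors:

```python
import numpy as np
import tensorflow as tf

# Trained model from the previous step (illustrative path)
model = tf.keras.models.load_model("cough_model.h5")

def representative_data():
    # Illustrative calibration samples; real MFCC vectors would be used here
    for _ in range(100):
        yield [np.random.rand(1, 40, 1).astype(np.float32)]

# Convert to TensorFlow Lite with post-training quantization
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
tflite_model = converter.convert()

with open("cough_model.tflite", "wb") as f:
    f.write(tflite_model)

# The .tflite file is then turned into a C array for the microcontroller build,
# for example with:  xxd -i cough_model.tflite > cough_model.h
```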
5 Analysis and Discussion
In the course of the proposed research, various methods of analyzing audio samples were considered. These included generating spectrograms, MFCCs, and Mel Filter Bank Energy (MFE) features.
Fig. 5 Proposed flowchart of model deployment
The above-mentioned methods of acquiring audio features are equally useful for the analysis of non-speech-related audio samples. It was found that generating spectrogram-based features is best suited for this application, as it generates almost double the number of features compared to MFCC or MFE. However, because it generates such a large number of features, it requires a considerable amount of ROM space and hence is not the best choice for deployment on a compact embedded device. It is therefore suggested to use MFCCs, since they involve fewer features but still make it viable to achieve significant accuracy after training the model. It is to be noted that there is no significant decrease in model accuracy after quantization of the data, as 32-bit floating-point numbers are converted into 8-bit integers. This reassures us that TensorFlow Lite models are as effective as TensorFlow models. The usage of image-based classification on mel spectrograms using two-dimensional CNNs shows promising results, but due to the lack of a method for on-board generation of the mel spectrogram images, this approach was not considered a viable solution.
6 Limitations
Although this research work covers the possibility of detecting COVID-19 and its variants, it faces some limitations. Due to privacy and health concerns, the collection of audio samples from infected individuals and the compilation of a dataset can be a challenge. Furthermore, publicly available datasets contain inaccurately labeled data, which has a negative impact on the performance of the model. Finally, in the field, the built-in microphone on the microcontroller may not capture the intricacies of the cough audio sample, which may lead to poor classification.
7 Future Directions
The proposed solution can be applied in various scenarios due to its compact size, high processing power, and independence from an Internet connection. The suggested device can be deployed as a data collection device to collect cough audio samples. Due to its compact size, researchers can construct a wearable device and wear it while traveling. The on-board microphone can constantly detect environmental sounds, and if a cough sound is detected, it can run the captured audio sample through the model and save it as a '.WAV' file on an SD card with a label indicating a positive or negative sample. This can help collect the valuable data required by the community to train ML models for COVID-19 detection with cough samples and achieve accurate results. TFLite models with high accuracy can then be deployed on the development board. They can be used as pre-screening devices in test centers, which will significantly reduce the number of cases to be handled and make the process of testing quick and easy. These modules can also be deployed at various public places such as malls, airports, and railway stations, which serve as primary sites for the proliferation of the virus. If a cough audio sample indicates a high chance that the individual is infected, the concerned authorities can be contacted, and this data can be used for contact tracing and to prevent the emergence of new COVID-19 hotspots. Continuous monitoring for cough can ensure a better medical response and early treatment. Further, similar models can be trained for detecting various diseases whose major symptoms include cough.
8 Conclusion
In this work, an experimental design for a portable COVID-19 detector has been proposed. Due to the lack of feasibility of collecting data directly, the utilization of public audio samples has been considered for this solution. The detector is capable of recording audio using the on-board microphone. The audio sample is then converted into raw data from which MFCC features are extracted using the on-board DSP. The extraction is facilitated via the use of appropriate libraries to perform short-time Fourier transforms and scale the result to the mel scale. This extracted audio feature is then used as an input for the model for binary classification. The model is based on a CNN architecture with two CNN layers and two fully connected layers. The model is first trained using the TensorFlow library on the dataset formed by extracting MFCCs from the collected audio samples. This model is then converted into a TensorFlow Lite model and is further processed into header files for deployment on the microcontroller. The model header files, along with the microcontroller driver code, are written onto the microcontroller and paired with the in-built microphone, serving as a screening device.
References
1. Li L, Qin L, Xu Z, Yin Y, Wang X, Kong B, …, Xia J (2020) Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT. Radiology
2. Ahammed K, Satu MS, Abedin MZ, Rahaman MA, Islam SMS (2020) Early detection of coronavirus cases using chest X-ray images employing machine learning and deep learning approaches. MedRxiv, 2020–06
3. Infante C, Chamberlain D, Fletcher R, Thorat Y, Kodgule R (2017) Use of cough sounds for diagnosis and screening of pulmonary disease. In: 2017 IEEE global humanitarian technology conference (GHTC). IEEE, pp 1–10
4. Deshpande G, Schuller B (2020) An overview on audio, signal, speech, & language processing for covid-19. arXiv preprint arXiv:2005.08579
5. Imran A, Posokhova I, Qureshi HN, Masood U, Riaz MS, Ali K, …, Nabeel M (2020) AI4COVID-19: AI enabled preliminary diagnosis for COVID-19 from cough samples via an app. Inf Med Unlocked 20:100378
6. Bagad P, Dalmia A, Doshi J, Nagrani A, Bhamare P, Mahale A, …, Panicker R (2020) Cough against covid: evidence of covid-19 signature in cough sounds. arXiv preprint arXiv:2009.08790
7. Laguarta J, Hueto F, Subirana B (2020) COVID-19 artificial intelligence diagnosis using only cough recordings. IEEE Open J Eng Med Biol 1:275–281
8. Alakus TB, Turkoglu I (2020) Comparison of deep learning approaches to predict COVID-19 infection. Chaos Solitons Fractals 140:110120
9. Hassan A, Shahin I, Alsabek MB (2020) Covid-19 detection system using recurrent neural networks. In: 2020 International conference on communications, computing, cybersecurity, and informatics (CCCI). IEEE, pp 1–5
10. Vijayakumar DS, Sneha M (2021) Low cost Covid-19 preliminary diagnosis utilizing cough samples and keenly intellective deep learning approaches. Alex Eng J 60(1):549–557
11. Singh V, Poonia RC, Kumar S, Dass P, Agarwal P, Bhatnagar V, Raja L (2020) Prediction of COVID-19 corona virus pandemic based on time series data using support vector machine. J Discrete Math Sci Crypt 23(8):1583–1597
12. Bhatnagar V, Poonia RC, Nagar P, Kumar S, Singh V, Raja L, Dass P (2021) Descriptive analysis of COVID-19 patients in the context of India. J Interdisc Math 24(3):489–504
13. Kumari R, Kumar S, Poonia RC, Singh V, Raja L, Bhatnagar V, Agarwal P (2021) Analysis and predictions of spread, recovery, and death caused by COVID-19 in India. Big Data Min Analytics 4(2):65–75
14. Sharma N, Krishnan P, Kumar R, Ramoji S, Chetupalli SR, Ghosh PK, Ganapathy S (2020) Coswara—a database of breathing, cough, and voice sounds for COVID-19 diagnosis. arXiv preprint arXiv:2005.10548
15. Liu Y, Pu H, Sun DW (2021) Efficient extraction of deep image features using convolutional neural network (CNN) for applications in detecting and analysing complex food matrices. Trends Food Sci Technol 113:193–204
16. Orlandic L, Teijeiro T, Atienza D (2021) The COUGHVID crowdsourcing dataset, a corpus for the study of large-scale cough analysis algorithms. Sci Data 8(1):1–10
17. Piczak KJ (2015) ESC: dataset for environmental sound classification. In: Proceedings of the 23rd ACM international conference on multimedia, pp 1015–1018
18. Novac PE, Hacene GB, Pegatoquet A, Miramond B, Gripon V (2021) Quantization and deployment of deep neural networks on microcontrollers. Sensors 21(9):2984
Big Data Framework for Analytics Business Intelligence
Farhad Khoshbakht and S. M. K. Quadri
Abstract Business intelligence (BI) systems collect, store, analyze, and present data to help businesses make better decisions. In today's extremely competitive and difficult industry, it is critical to develop and operate BI systems, and BI approaches are now widely used in many industries that rely on decision-making. BI is used to obtain, evaluate, and anticipate business-critical information. Traditionally, business intelligence seeks to collect, retrieve, and categorize data in order to handle requests efficiently and effectively. The Internet of things (IoT), the advent of big data, cloud computing, and artificial intelligence (AI) have all increased the importance of BI. Many issues arise in strategic decision-making, and the main goal of this research is to get past these problems and build a technical foundation for big data analytics and business intelligence.
Keywords Business intelligence · Big data analytics · Decision-making
F. Khoshbakht (B) · S. M. K. Quadri Department of Computer Science, Jamia Millia Islamia (A Central University), New Delhi 110025, India e-mail: [email protected]
S. M. K. Quadri e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_5
1 Introduction
An enormous amount of information has recently been collected and produced by computer equipment, including GPS, sensors, Websites, and communication technologies, or through users of social networking sites (LinkedIn, Facebook, Twitter, and Instagram), in the fourth industrial revolution [1]. Businesses produce massive amounts of data, which are persistently saved in database systems throughout the business. BI and big data (BD) analytics involve methods of data management which are utilized in industries and organizations to collect past and existing information, appraise it using analytics and technology, and make suggestions for decision-making.
Nowadays, up to eighty percent of a company's revenue can be attributed to the evaluation of unstructured data [2]. The techniques a company uses to increase its commercial efficiency can be enhanced by analyzing unstructured material that contains helpful knowledge. BI refers to the technologies, techniques, systems, and applications used to collect, analyze, integrate, and present business information in a way that enables effective corporate decision-making. This strategy also provides continuous support in acquiring, understanding, and governing data to assist decision-making as well as the growth of company systems and practices [3]. BI can be defined as an organization's capacity to derive business insight from the stream of continuous data produced by its operational processes [4]. Decision makers benefit greatly from BI since it provides the information they need to facilitate growth or make quick and informed decisions. BI can also help enhance and promote the execution of established standards and their impact on business decision-making, tracking, management, planning, and corporate finance, thereby improving business techniques in rapidly changing business contexts [5]. BI can further help businesses be more efficient by letting them take advantage of new opportunities, revealing new business insights, pointing out possible threats, and making decisions easier [6, 7]. The development of computational power and data collection methods has led businesses to accumulate large amounts of data. The following terminology is used in this research:

BD: The term "big data" refers to information that does not fit neatly into any of the conventional categories of data. It also denotes a collection of technologies that work together toward a shared goal, namely the creation of value from data that was once considered useless.

Analytics: Analytics refers to the skills, technologies, systems, and processes necessary for ongoing iterative investigation and analysis of past company performance in order to drive transactions and/or improve future performance.

BI: BI is the art of obtaining a commercial advantage from data by answering critical questions about how to classify more customers, how the company is doing, and, if it continues on the current path, which clinical studies should continue and which should stop.

BI Analytics: Analytics created with BI applications. In information technology (IT) organizations, BI analytics are applied to sales and revenue analysis, usage analysis, profitability analysis, bank analysis, sales and general administrative expense analysis, quality analysis (project delivery), training analysis, assumptions analysis, and attrition analysis. In this investigation, the usefulness and significance of these analyses were rated.

Tools of BI: BI tools enable the methodical gathering, combining, examination, and analysis of findings from both the organization's internal and external environments [8]. Microsoft, SAP, MicroStrategy, IBM, Oracle, SAS, and QlikTech are the primary providers [9] of BI software and
tools. To collect technical data on the BI tools applied by IT businesses, the research makes use of the products offered by the most significant providers.

Decision Categories: White (2006) [10–12] classified BI into three categories, strategic, tactical, and operational, based on business, users, timeframe, and emphasis on information. The use of the BI tool for operational, tactical, and strategic decisions within the firm was quantified.

Apache Hadoop: Apache Hadoop is an open-source software library that provides a framework for the distributed processing of large datasets across clusters of computers using simple programming models. It can scale from a single computer to thousands of machines, each offering local computation and storage. Rather than relying on hardware, the library itself is designed to detect and handle failures and to guarantee high availability at the application level. Apache Hadoop includes the following modules: (a) Hadoop Common: common utilities that support the other modules; (b) Hadoop Distributed File System (HDFS): provides high-throughput access to application data; (c) Hadoop YARN: a framework for job scheduling and resource management; and (d) Hadoop MapReduce: a framework for parallel processing of large datasets.

With computerized software and connected devices becoming more common, a large amount of digital data is created every day. This enormous quantity of data has been further expanded by developments in digital sensors and communication technologies, which have enabled the collection of vital information for corporations and enterprises. The distinguishing characteristics of the present study are as follows:
1. Finding the most accurate machine learning (ML) method with the use of an innovative model.
2. Evaluating this model's performance in terms of time and space.
3. Analyzing the MapReduce and Spark models for cluster computing after job allocation.
4. Using supervised learning, training on historical data and evaluating on test data.
5. Noting that applying irrelevant and redundant features to a model can lead to poor performance due to discrepancies.
1.1 Objectives

The following goals are intended to be accomplished with this study:
(i) To study big data analytics (BDA) and BI approaches that can be used as strategic performance diagnostic techniques for organizations to identify the differences between various excellence structures.
(ii) To explore real-time BDA and its implications for business performance.
(iii) To find out why and how important it is for BI to use BDA.
(iv) To investigate applications of BDA in the real world and to identify prospective "big" datasets.
1.2 Contribution of the Paper

The paper contributes in the following ways:
• To examine data in order to extract information and knowledge that helps organizations become better informed.
• To help executives use business analytics to better plan and achieve corporate goals based on decision-making excellence.
2 Literature Review (LR)

The LR provides an overview of the various studies that have been carried out in connection with the topic of this investigation. Figure 1 depicts the three-tier structure of a standard BI system, as reported by Gad-Elrab [13]. The architecture has three layers: (1) the presentation layer (PL), (2) the application layer (AL), and (3) the database layer (DL). The most significant challenge presented by this three-tier architecture is meeting service level objectives, such as minimum throughput rates and maximum response times. This is because the application layer is unaware of the manner in which
Fig. 1 Three tier architecture [13]
the lower levels store data. BD consists of datasets that are typically very large in size as well as heterogeneous in data type and velocity. Because of these features, traditional forms and procedures can be difficult to apply to BD [14, 27]. This has necessitated the investigation and development of methods for managing and retrieving data from large databases, and additional techniques and structures have had to be designed to handle the vast volumes of data being generated. As a direct consequence, a great number of models, frameworks, technologies, hardware components, and other technological improvements have been developed specifically for extracting data from large databases [15]. Consumer contacts, along with social media, can supply crucial insights to decision makers, despite the vast amount of information constantly arising from daily activities [16, 17]. Academic research on analytics for large amounts of data remains comparatively infrequent; the majority of articles regarding technological innovations and developments have been published in business publications [18, 24]. Scientists and business people adopt varied definitions of "big data," according to Gandomi and Haider [19]. Such BD notions vary depending on the user's understanding, with some focusing on BD characteristics like volume, variety, and velocity, while others are centered on their professions, and the majority on business criteria. Analysis of vast volumes of data typically involves the use of "non-relational" databases, more generally referred to as NoSQL ("not only SQL") database systems. As a result, although relational databases have a more structured and hierarchical organization, unstructured data can be organized in a NoSQL database. The analysis of large amounts of data using distributed storage and processing is part of the scope of BDA. Technical solutions using BDA include the methods outlined below: 1. MapReduce [20]: a Hadoop distributed programming model for batch-processing workloads that uses shared keys to handle work allotment and task management. A MapReduce job has two parts, referred to as the "mapper" and the "reducer." The mapper is responsible for transforming and filtering the data. To feed the MapReduce process, the data is stored in a large HDFS storage file. Each mapper receives and analyzes its own blocks of data, which must first be cleaned of duplicates and unclean records. It then produces an intermediate output stream of data, which is sent through the rest of the system toward the reducer so that reductions can be made. The reducer is responsible for grouping the records and aggregating them; the keys determine the creation of the final output file. 2. Hive [21]: Hive, developed at Facebook, is an SQL-like interface to Hadoop. Using standard SQL commands and relational table structures, it enables SQL developers to write MapReduce jobs without prior knowledge of MapReduce. Hive stores all of the gathered information, or at least the majority
of it, in tables, and it helps us construct table structures directly on top of data items. Additionally, it translates queries into MapReduce jobs and organizes the metadata of unstructured data in tabular form (a brief SQL-style sketch of this idea appears after this list). 3. Pig [22]: a data-flow scripting language developed at Yahoo!. Pig compiles its scripts into MapReduce jobs. Collections of reusable Pig scripts are traditionally kept in a shared "piggy bank" repository, and a schema is optional and can be supplied at runtime. 4. Flume [23, 25, 26]: an agent-based approach for collecting enormous amounts of information into and within Hadoop. One effective way to use Flume is to gather Web logs from a variety of sources by means of agents. Users can choose among the different source and sink types provided, based on the requirements of their specific situations.
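Hive itself is queried with HiveQL on a Hadoop cluster; purely to illustrate the same SQL-over-tables idea within this paper's Python tooling, the hypothetical sketch below uses PySpark's SQL interface (not Hive), with an invented reviews table and column names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-on-bigdata-sketch").getOrCreate()

# Invented review records standing in for data stored as Hive-style tables.
rows = [("p1", "positive", 5), ("p2", "negative", 2), ("p1", "negative", 1)]
df = spark.createDataFrame(rows, ["product", "sentiment", "score"])
df.createOrReplaceTempView("reviews")

# A declarative query; the engine turns it into distributed jobs,
# much as Hive translates SQL into MapReduce jobs.
result = spark.sql("""
    SELECT product, sentiment, COUNT(*) AS n, AVG(score) AS avg_score
    FROM reviews
    GROUP BY product, sentiment
""")
result.show()

spark.stop()
```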
3 Methodology

Data analytics (DA) is the process of examining, analyzing, transforming, and modeling data in order to discover useful information that can be applied to decision-making. It incorporates a wide variety of approaches and processes and covers many aspects and domains. BD is a collection of vast data stored using effective data storage methods; it concentrates the analytics procedure on growing data scattered across many different domains. Much of it consists of unstructured material on the Internet, found in the form of videos, text, and pictures. Examining unstructured data is next to impossible with conventional database techniques, as it is extremely difficult to organize the information. The term "big data" most typically refers to datasets that are too large to be explored, managed, and analyzed using conventional software techniques. Such large volumes of data are distributed and processed in parallel on commodity hardware clusters. In general, information is distributed and analyzed for analytics using the MapReduce structure. This model was introduced by Google as a parallel programming method for examining and generating massive data collections. Because of its distinctive characteristics, such as load balancing and fault tolerance, the model is an appealing candidate for conducting data analytics at large scale. Graph processing, text processing, and Web processing are all possible uses for it. Google originally applied MapReduce over its own distributed file system, and Hadoop provides an open-source implementation of this data processing methodology. Many firms, including Facebook and Yahoo!, adopt Hadoop's MapReduce methodology to analyze BD collections.
3.1 MapReduce Model

MapReduce is an information flow model for data-centric applications. It is a straightforward, explicit data-flow programming paradigm that outperforms conventional methods on large datasets. The MapReduce methodology parallelizes processing of BD sets by utilizing clusters or grids of computers. A MapReduce program consists of two functions, referred to as the map procedure (MP) and the reduce procedure (RP). These procedures are executed on multiple computers in order to manage enormous amounts of information. The signatures of the Map() and Reduce() functions are shown below, with K1 and K2 standing for two different key sets and V1 and V2 representing different value sets:

Map(K1, V1) → list(K2, V2)        (3.1)

Reduce(K2, list(V2)) → list(V2)        (3.2)
The MP is responsible for the selection and filtering operations, while the RP provides a concise summary (aggregation) of the mapped results; together they form the basis of the processing framework. The framework coordinates the transit of data between the system's different components, which allows it to perform a number of operations concurrently on multiple servers. The various steps involved in the processing of data are depicted in Fig. 2.
Fig. 2 MapReduce programming model framework stages
MapReduce separates the input into a number of fixed divisions, referred to as input splits, and then executes the user-defined map function (MF) on each record. The time required to process a split depends on the number of map tasks utilized by the cluster: performance improves as the number of maps increases, and degrades as the number of maps decreases. The small segments are handled in parallel in order to distribute the workload more evenly, and a fast system can process more splits in a single step. In a cluster environment, individual computers are liable to break down; if any machine fails, the job that was running on its map task (MT) is relocated to another MT to keep the load balanced. For finer granularity, the number of splits is increased. However, if the splits are too small, the disc read/write overhead grows, which worsens the overhead problem and slows down performance; as the total processing time goes up, the overhead also goes up. A minimal sketch of this split-map-shuffle-reduce flow is given below.
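The following is a minimal, illustrative Python sketch (not the Hadoop implementation used in the study) of the split-map-shuffle-reduce flow that Eqs. (3.1) and (3.2) describe; the word-count map and reduce functions, the number of splits, and the sample text are hypothetical choices made for the example.

```python
from collections import defaultdict
from multiprocessing import Pool

def map_fn(_, line):                      # Map(K1, V1) -> list(K2, V2)
    return [(word.lower(), 1) for word in line.split()]

def reduce_fn(key, values):               # Reduce(K2, list(V2)) -> list(V2)
    return [sum(values)]

def run_split(split):
    # Each "mapper" processes its own block of records independently.
    out = []
    for k1, v1 in split:
        out.extend(map_fn(k1, v1))
    return out

if __name__ == "__main__":
    records = list(enumerate([
        "big data needs parallel processing",
        "mapreduce splits big data",
        "spark keeps data in memory",
    ]))
    n_splits = 2                           # too many tiny splits raise overhead
    splits = [records[i::n_splits] for i in range(n_splits)]

    with Pool(n_splits) as pool:           # map phase runs on splits in parallel
        mapped = [pair for part in pool.map(run_split, splits) for pair in part]

    shuffled = defaultdict(list)           # shuffle: group intermediate pairs by key
    for k2, v2 in mapped:
        shuffled[k2].append(v2)

    reduced = {k: reduce_fn(k, vs) for k, vs in shuffled.items()}
    print(reduced)                         # e.g. {'data': [3], 'big': [2], ...}
```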
3.2 Spark Framework

Hadoop is often preferred by companies and the tech industry to analyze data according to how each dataset is formed. MapReduce is an effective parallel programming method that allows for scalable, flexible, fault-tolerant, and cost-effective programming. Spark addresses the limitations of the MapReduce programming model, namely heavy disc I/O, low throughput, and degraded performance on the cluster. The Apache Foundation maintains the Spark framework for parallel distributed systems. It is not a part of the Hadoop platform; it is a separate technology with its own cluster management structure and better storage handling, and it is distinguished by a fast computational engine. Its features include dynamic query analysis as well as data streaming beyond the MapReduce paradigm. An essential feature of Spark is its use of in-memory computing to improve the processing speed of an application. Spark's characteristics make it suitable for a wide range of applications, including batch applications, interactive queries, iterative methods, streaming, and several others.
3.3 Data Sharing Using Spark

Resilient distributed datasets (RDDs) are immutable collections of data; once created, they cannot be changed. They can be operated on through Spark's APIs, including the Python API (PySpark), alongside Hadoop components such as MapReduce and Hive. The RDD data structure has the following features:
1. Read-only record structure
2. Partitioned record set
3. Protected storage
4. Computational procedure
5. Rapid processing procedures
6. Parallel processing
7. Fault tolerance
8. Immutable data model
9. Logical distribution
10. Logical data structure
The process of data exchange in MapReduce takes a significant amount of time because of the replication, serialization, and disc I/O involved. In most Hadoop applications, around ninety percent of the time is spent on read-write operations, which is unfavorable to their overall efficiency. This problem can be addressed by employing RDDs with suitably formatted data, since RDDs support in-memory processing whose state can be shared across jobs. Iterative operations on Spark RDD: intermediate results are kept temporarily in distributed memory rather than being written to stable storage at every step. Interactive operations on Spark RDD: for better execution, queries evaluated repeatedly on the same dataset keep that dataset in memory; each transformed RDD may otherwise be re-calculated when an action is executed. Spark keeps track of the cached records in the cluster so that they can be accessed quickly and can survive failures, which helps arrange further replicas across several nodes. A short PySpark sketch of RDD caching for iterative use follows.
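As a rough illustration of the in-memory data sharing described above, here is a minimal PySpark sketch (assumed to run wherever PySpark is installed); the data and the repeated computations are invented for the example and are not taken from the study.

```python
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("rdd-cache-sketch").setMaster("local[2]")
sc = SparkContext(conf=conf)

# Create an RDD and keep it in memory so repeated (iterative/interactive)
# computations over the same data avoid re-reading from stable storage.
scores = sc.parallelize([("pos", 4), ("neg", 1), ("pos", 5), ("neg", 2)])
scores.cache()                       # mark the RDD for in-memory persistence

# Iterative-style reuse: several actions over the same cached RDD.
total_by_label = scores.reduceByKey(lambda a, b: a + b).collect()
count_by_label = scores.mapValues(lambda _: 1).reduceByKey(lambda a, b: a + b).collect()

print(total_by_label)                # e.g. [('pos', 9), ('neg', 3)]
print(count_by_label)                # e.g. [('pos', 2), ('neg', 2)]

sc.stop()
```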
3.4 Workflow Stages

Figure 3 shows the execution of the steps discussed in this research. The self-explanatory design of the flowchart makes it simple to identify the steps of the workflow.
3.5 Experimental Work Stages

The different stages are explained below:
1. Installation of Tools and Related Libraries
a. VMware: The virtual machine library appears in the Workstation Pro window on the left-hand side. We use the library in Workstation Pro to access and select virtual machines, folders, and remote hosts. The library is shown by default.
Fig. 3 Workflow stages
Specifications: If the library is not visible, choose View > Customize > Library.
b. Ubuntu: Ubuntu Linux is a convenient and straightforward operating system. Some applications are not available as a .deb package or an AppImage file inside the Ubuntu Software Center. In that case, you may use the terminal to install/uninstall an application through an official or unofficial PPA, and if that does not work, you can always build it from source. Installing software from .deb files on Ubuntu
seems fairly simple and is equivalent to installing software with a .exe file on a Windows operating system. Most vendors distribute their software in .deb form. The user downloads the .deb file from the product-specific vendor and double-clicks it to open the Ubuntu Software Center, which includes the install button.
c. Hadoop: The Apache Hadoop project develops open-source software for distributed, efficient, and scalable computing. The Apache Hadoop software library is a platform that allows massive datasets to be processed in a distributed manner across clusters of computers using simple programming models. There are two ways of running Hadoop, i.e., multi-node and single node. The process of installing Hadoop on a single-node cluster is outlined below.
Specifications
• VIRTUAL BOX: where the operating system is installed.
• OPERATING SYSTEM: Hadoop can be deployed on Linux-based operating systems; Ubuntu and CentOS are used very often. We use CentOS in this tutorial.
• JAVA: the Java 8 package must be installed on the system.
• HADOOP: the Hadoop 2.7.3 package is required.
The installation of Hadoop involves downloading the package, setting environment variables, checking the Hadoop version, and editing the configuration files. Figure 4 gives a screenshot of the Hadoop configuration files present during the study. Once all the steps are completed, Mozilla Firefox is opened and localhost:50070/dfshealth.html is loaded to reach the NameNode interface shown in the screenshot in Fig. 5. After the installation of Hadoop, configurations are modified for MapReduce, Hive, and Python. The pre-processing step involves the following:
Separating words: Finding the smallest deterministic finite automaton that responds differently to two supplied strings is the problem of separating (differentiating) words.
Fig. 4 Hadoop configuration files
Fig. 5 Hadoop installation–starting WebUI
This means that one of the given strings is accepted, while the other string is rejected.
Removing stop list: The process of translating information into something that can be interpreted by a computer is referred to as pre-processing. One of its most important aspects is removing unnecessary data. In natural language processing (NLP), words that carry little useful information are referred to as "stop words."
Extracting features:
• Word score
• Bigrams: A bigram or digram is a sequence of two consecutive elements taken from a string of tokens, which are typically letters, syllables, or words. The terms bigram and digram are sometimes used interchangeably. Bigrams are useful when calculating the conditional probability of a token.
• Negative features: The probability of an experiment's outcome is never negative, whereas a quasiprobability distribution allows a negative probability or quasiprobability in certain cases.
These steps are followed by feature score identification, identification of weighted features, and the application of a decision tree and Bayesian network-based hybrid classifier. The environment setup is also done accordingly; a short illustrative sketch of the stop-word and bigram steps is given below.
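Below is a minimal, hypothetical Python sketch of the stop-word removal and bigram extraction steps just described; it uses NLTK (which needs its "stopwords" resource downloaded) and an invented sample review, rather than the study's actual pipeline or feature scores.

```python
import nltk
from nltk.corpus import stopwords
from nltk.util import bigrams

# One-time resource download (assumes network access is available).
nltk.download("stopwords", quiet=True)

review = "The product was not good and the delivery was very slow"

# Simple whitespace tokenization; the study's exact tokenizer is not specified.
tokens = [t for t in review.lower().split() if t.isalpha()]

# Remove stop words, but keep simple negations since they carry sentiment.
stop = set(stopwords.words("english")) - {"not", "no"}
filtered = [t for t in tokens if t not in stop]

# Bigram features over the filtered tokens.
bigram_features = list(bigrams(filtered))

# A crude "word score": relative frequency of each remaining token.
word_score = {t: filtered.count(t) / len(filtered) for t in set(filtered)}

print(filtered)        # e.g. ['product', 'not', 'good', 'delivery', 'slow']
print(bigram_features) # e.g. [('product', 'not'), ('not', 'good'), ...]
print(word_score)
```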
Table 1 Dataset features

Attribute                Value
Name of dataset          Amazon product review dataset
Description of dataset   Contains user reviews for products
URL of dataset           jmcauley.ucsd.edu/data/amazon
Products included        Multiple
Number of reviews        83.68 million
Number of sentiments     2 (positive or negative)
File format              .csv
3.6 Dataset Description

The dataset description is given in Table 1, which lists each attribute with its value. The dataset is the Amazon product review dataset, which contains user reviews for products and is available at jmcauley.ucsd.edu/data/amazon. It covers multiple products, contains 83.68 million reviews labeled with two sentiments (positive or negative), and is provided in .csv format.
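As an illustration only, a hypothetical snippet for loading such a review file with pandas is shown below; the file name, column names, and rating-to-sentiment rule are assumptions, since the paper does not specify them.

```python
import pandas as pd

# Hypothetical local export of the Amazon review data in CSV form.
df = pd.read_csv("amazon_reviews.csv", usecols=["reviewText", "overall"])

# Assumed labeling rule: ratings of 4-5 are positive, 1-2 negative,
# and neutral 3-star reviews are dropped.
df = df[df["overall"] != 3].copy()
df["sentiment"] = (df["overall"] >= 4).map({True: "positive", False: "negative"})

print(df["sentiment"].value_counts())
print(df.head(3)[["reviewText", "sentiment"]])
```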
4 Results

4.1 Experiment I

In experiment I, the properties and their corresponding values, summarized in Table 2, are as follows:
• Size of training data (SOTD): 20,000
• Size of test set (SOTS): 1000, 10,000, 30,000
• Text features (TF) gathered: bigrams, unigrams, and negative TF
• Pre-processing: stemming and stop list removal
• Other characteristics: weights and frequencies
• Number of classes: 2

Table 2 Properties in experiment I

Properties        Values
SOTD              20,000
SOTS              1000, 10,000, 30,000
TF gathered       Bigrams, unigrams, and negative TF
Pre-processing    Stemming and stop list removal
Other features    Weights and frequency
Number of class   2
Table 3 Sample (samp.) 1 confusion matrix

                       Predicted
Actual       Positive    Negative
Positive     591         121
Negative     25          263

Table 4 Samp. 2 confusion matrix

                       Predicted
Actual       Positive    Negative
Positive     4291        846
Negative     401         4462

Table 5 Samp. 3 confusion matrix

                       Predicted
Actual       Positive    Negative
Positive     18,556      3008
Negative     578         7858
The confusion matrices for samp. 1, samp. 2, and samp. 3 are shown in Tables 3, 4, and 5, respectively. Samp. 3 (88%) and samp. 2 (87%) are more accurate than samp. 1 (85%), as shown in Fig. 6. In Figs. 7 and 8, the mean squared error (MSE) and the precision have been analyzed for the different sample sets, respectively.

Fig. 6 Accuracy analysis for different samp. sets
Fig. 7 MSE analysis for different samp. sets

Fig. 8 Precision analysis for different samp. sets

Fig. 9 Recall analysis for different samp. sets
Figure 9 shows that samp. 3 gives better recall than samp. 2 among the different samp. sets. From all results, we can say that there is a significant impact of the BD framework on analytics BI and organization performance. A small sketch of how such metrics follow from the confusion matrices is given below.
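For illustration, the following Python sketch recomputes accuracy, precision, and recall for the positive class directly from the confusion matrices in Tables 3-5; it is a plain arithmetic check, not the study's evaluation code, and the MSE of Fig. 7 is not reproduced because its exact definition is not given in the text.

```python
# Confusion matrices from Tables 3-5, stored as [[TP, FN], [FP, TN]].
samples = {
    "samp. 1": [[591, 121], [25, 263]],
    "samp. 2": [[4291, 846], [401, 4462]],
    "samp. 3": [[18556, 3008], [578, 7858]],
}

for name, ((tp, fn), (fp, tn)) in samples.items():
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    print(f"{name}: accuracy={accuracy:.3f}, "
          f"precision={precision:.3f}, recall={recall:.3f}")

# Accuracies come out near 0.85, 0.88, and 0.88, in line with Fig. 6; the
# precision/recall curves in Figs. 8-9 may use a different averaging or
# class convention than this positive-class calculation.
```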
4.2 Experiment II

In experiment II, the properties and their corresponding values, summarized in Table 6, are as follows:
• SOTD: 20,000
• SOTS: 100,000, 500,000, 1,000,000
• TF gathered: bigrams, unigrams, and negative TF
• Pre-processing: stemming and stop list removal
• Other characteristics: weights and frequencies
• Number of classes: 2

Table 6 Experiment II properties

Properties        Values
SOTD              200,000
SOTS              100,000, 500,000, 1,000,000
TF gathered       Bigrams, unigrams, and negative TF
Pre-processing    Stemming and stop list removal
Other features    Frequency, weights
Number of class   2

Table 7 Samp. 1 confusion matrix (experiment II)

                       Predicted
Actual       Positive    Negative
Positive     65,108      10,524
Negative     1714        22,654

Table 8 Samp. 2 confusion matrix (experiment II)

                       Predicted
Actual       Positive    Negative
Positive     279,093     45,598
Negative     12,408      162,901
Table 9 Samp. 3 confusion matrix (experiment II)

                       Predicted
Actual       Positive    Negative
Positive     605,474     98,689
Negative     20,712      275,125

Fig. 10 Accuracy analysis for different samp. sets (experiment II)

Tables 7, 8, and 9 show the confusion matrices for the sample sets and elaborate the findings of experiment II. It can be observed in Fig. 10 that samp. 2 (88%) was more accurate than samp. 1 (87%). As seen in Figs. 11, 12, and 13, the evaluation metrics (MSE, precision, and recall) have been evaluated for the various sample sets. In addition, across both experiments I and II, we found similar behavior in all three samples, with very little variation. From all results, we can say that there is a significant
impact of the big data framework on analytics business intelligence and organization performance. Applying the BI approach in different organizations would be beneficial for sound decisions and for the progress made following the implementation of the BI tool. The study's findings will aid BI tool-based firms in improving the tool used to increase decision-making reliability and organizational effectiveness, and the research will assist non-BI tool-based organizations in developing the business case for implementing a BI tool. This study thus explores the role of the big data system for evaluating market intelligence and its wider effect on business efficiency.

Fig. 11 MSE analysis for different samp. sets (experiment II)
Fig. 12 Precision analysis for different samp. sets (experiment II)

Fig. 13 Recall analysis for different samp. sets (experiment II)
5 Conclusion

The purpose of this study is to carry out a comprehensive investigation of BD and business analytics strategies with the intention of enhancing business decision-making, along with the related technical methodologies, applications, and open research issues. The research also aims to raise awareness of the enormous benefits that BD has given to businesses in industrialized nations and how these advantages might be replicated by local corporate groups. In addition, the research covers a variety of challenges presented by BDA, with particular emphasis on data management and protection, as well as accessibility, legislation, and compliance. We conclude that our suggested system achieved higher recall for samples 1 (89.53%) and 3 (89.49%) than for sample 2 (89.44%) across the different sample sets. It can also be concluded that both the output of analytics BI and business performance are significantly impacted by the BD framework.
6 Future Directions

Non-BI tool-based firms can benefit from the study by better applying the BI tool business case. As a result of this investigation, concepts for formulating and solving business problems through quality decision-making are proposed. This research can be extended to cover an entire company, and its parameters can be expanded by whoever attempts it. The models benefit from the exponential expansion of computing power, and determining the optimal performance results requires the use of this technique. Such an approach can help make the model more accessible for future projections.
References

1. Yafooz WMS, Abidin SZ, Omar N (2011) Challenges and issues on online news management. In: 2011 IEEE international conference on control system, computing and engineering (ICCSCE). IEEE, pp 482–487
2. Unstructured data: the hidden threat in digital business (2021) TechNative. https://technative.io/unstructured-data-the-hidden-threat-in-digital-business/
3. Balachandran BM, Prasad S (2017) Challenges and benefits of deploying big data analytics in the cloud for business intelligence. Procedia Comput Sci 112:1112–1122
4. Kimble C, Milolidakis G (2015) Big data and business intelligence: debunking the myths. Glob Bus Organ Excell 35:23–34
5. Richards G, Yeoh W, Chong AYL, Popovic A (2017) Business intelligence effectiveness and corporate performance management: an empirical analysis. J Comput Inf Syst:1–9
6. Xia BS, Gong (2014) Review of business intelligence through data analysis. Benchmarking Int J 21:300–311
7. Kowalczyk M, Buxmann P (2014) Big data and information processing in organizational decision processes: a multiple case study. Bus Inf Syst Eng 5(2014):267–278
8. Ştefănescu A, Ştefănescu L, Ciora IL (2009) Intelligent tools and techniques for modern management. Chin Bus Rev 8(2):46–54
9. Gartner (2014) Worldwide business intelligence and analytics software market grew 8% in 2013.Retreived from http://www.gartner.com/newsroom/id/2723717 10. Markarian J, Brobst S, Bedell J (2007) Critical success factors deploying pervasive BI. Informatica Teradata MicroStrategy:1–18 11. Dagan B (2007) Dashboards and scorecards aid in performance management and monitoring. Nat Gas Electricity 24(2):23–27 12. White C (2006) The next generation of business intelligence: operational BI. Retrieved from http://www.bi-research.com, 16 13. Gad-Elrab Ahmed AA (2021) Modern business intelligence: big data analytics and artificial intelligence for creating the data-driven value. IntechOpen. https://EconPapers.repec.org/ RePEc:ito:pchaps:212245 14. Constantiou ID, Kallinikos J (2015) New games, new rules: big data and the changing context of strategy. J Inf Technol 30:44–57 15. Oussous A, Benjelloun FZ, Lahcen AA, Belfkih S (2018) Big data technologies: a survey. J King Saud Univ Comput Inf Sci:431–448 16. Provost F, Fawcett T (2013) Data science and its relationship to big data and data-driven decision making. Big Data J:51–59 17. Elgendy N, Elragal A (2014) Big data analytics: a literature review paper. s. l. Springer, cham, pp 214–227 18. Elragal A, Klischewski R (2017) Theory-driven or process-driven prediction? Epistemological challenges of big data analytics. J Big Data:19 19. Gandomi A, Haider M (2015) Beyond the hype: big data concepts, methods, and analytics. Int J Inf Manage 35:137–144 20. White T (2012) Hadoop: the definitive guide, 3rd edn. O’Reilly Media Inc., Sebastopol, CA, USA 21. Apache Hive. Available online: http://hive.apache.org/. Accessed 28 July 2021 22. Apache Pig. Available online: http://pig.apache.org/. Accessed 28 July 2021 23. Apache Flume. Available online: https://flume.apache.org/. Accessed 28 July 2021 24. Khoshbakht F (2021) Role of the big data analytic framework in business intelligence and its impact: need and benefits. Turk J Comput Math Educ (TURCOMAT) 12(10):560–566 25. Khoshbakht F, Shiranzaei A, Quadri SMK (2020) Adoption of big data analytics framework for business intelligence and its effectiveness: an analysis. PalArch’s J Archaeol Egypt/Egyptology 17(9):4776–4791 26. Khoshbakht F, Shiranzaei A, Quadri SMK (2020) A Technological performance analysis of big data analytics framework for business intelligence. Solid State Technol 63(6):19701–19713 27. Khoshbakht F, Quadri SMK, A study on analytics of big data and business intelligence-a review
Technological Impacts of AI on Hospitality and Tourism Industry
Sunil Sharma, Yashwant Singh Rawal, Harvinder Soni, and Debasish Batabyal
Abstract The importance of technology in the hospitality and tourism industry is undeniable. Technology has penetrated the tourism industry, and today's travelers are no exception to its influence. From choosing a location to booking hotels and resorts, people can plan their vacation without any hassle; in fact, more than 70% of travelers plan their trips online. With the introduction of new technologies such as chat-bots, messengers, artificial intelligence, and robots, technology in the tourism business is becoming more important than ever. The growing popularity of the Internet, smartphones, and smart mobile apps has added to the development of tourism and allowed the market to rise to an unprecedented level. This paper focuses on the technological impact of AI in the hospitality and tourism industry.

Keywords Hospitality · Tourism industry · Artificial intelligence · Impacts · Internet
1 Introduction

Anurag [1] noted that in the old days, hotels, motels, and other hospitality businesses were simply places for travelers to rest their heads far away from home. On the other hand, anyone who has spent time in the hospitality trade over recent years recognizes that the role of technology in the industry has grown tremendously. In many cases, these technologies have moved
into the core of their work. Barnes [2] noted that a large amount of this work is directly or indirectly reliant on technology, making IT more relevant to hotel operations and guest information than ever before. This paper therefore introduces technology trends and their impact on the hospitality and tourism industry.
1.1 Technological Trends in Hospitality and Tourism Industry

The latest technological trends that are currently in very high demand in the hospitality industry are discussed in the next subsections.
1.1.1 Cloud Migration
As shown in Fig. 1, cloud migration may not appear as attractive as the other trends; however, Greenwood et al. [3] reported that tourism businesses have begun to move toward cloud technology. From the cost benefits of an OpEx model to software integration preferences and innovative improvements from emerging technologies, cloud adoption offers hotels the opportunity for workplace effectiveness at a low cost of acquiring expertise [4–6]. This may matter less for new, smaller properties, but large hotel companies are exploring how cloud technology can streamline basic procedures, decrease staffing requirements, and provide an enhanced guest experience. The challenge remains to ensure the consistency and security of a better-than-99.99% SLA for those who need it.

Fig. 1 Cloud migration [4]
Fig. 2 Service automation [8]
1.1.2 Service Automation Through AI
Automation continues to be a real development aiming to change the way guests are served. With the development of artificial intelligence (AI) [7], hotels are exploring innovative ways to communicate with guests digitally while freeing hotel personnel to work on other tasks. This also enhances the travel experience, as language barriers can be removed, guaranteeing comprehensible communication [8] with travelers wherever they are. Different service automations through AI are shown in Fig. 2. Crafting genuinely human-like interactions has been a challenge for engineers in the past, but it is improving daily. While many people dislike the idea of chatting with software that cannot understand language and respond naturally, the day is approaching when the difference between a real person and a computer cannot be seen [9].
2 Artificial Intelligence

Gartner envisaged that by 2020, customers around the world would manage 83% of their business dealings without the need for a human representative. In the tourism and hospitality industry, consumer engagement and the resulting information are essential for continued success [10]. AI will play the role of
game-changer in this. Through platforms ranging from virtual assistants to chatbots, AI has enabled businesses to remain open to customer inquiries 24 × 7 without dedicated human personnel.
3 Integrated Guest Applications

The concept of a connected guest encompasses almost every aspect of the guest experience. A cleverly designed app covers everything from event announcements to hotel services to loyalty programs [11]. If we know a guest is attending a conference, we can use that application to send the guest an electronic event itinerary, complete with an interactive plan of the conference venue where the meeting will be held. Many hotels already have mobile apps available, and they add innovative features on a daily basis. Similarly, emerging technology developers working in the hospitality industry offer capabilities that can be integrated with such hotel applications to provide guests with the added facility of interacting with hotel employees [12–17] and other hotel guests. A number of big brands have begun exploring innovative techniques to create an exclusive, tailored experience for their guests on their mobile devices. Ultimately, a hotel with a tightly connected experience will be filled with the happiest, most engaged guests.
4 Self-Service Meeting Spaces

The contemporary meeting space has changed a great deal. Modern meetings are technologically advanced [18, 19]. Industry professionals want to set up multimedia presentations and video calls, sometimes both at the same time, and during marathon meetings they would like access to services such as food with minimal disturbance. Above all, smart hotels know that these spaces need to be as easy to use as possible; the A/V rooms of the past that required dedicated engineers are no longer an option [20]. And these requirements are no longer limited to boardroom-style meeting rooms: business clients are increasingly asking for the same advanced technology in ballrooms and event halls, too.
5 Predictive Analytics

As more technology is added to improve the guest experience, software and processes will generate additional information concerning how travelers interact [21] with staff and resources across the property. Depending on when the lights or the TV are turned on, employees can determine the typical moment a
Fig. 3 Predictive analytics [21]
guest wakes up. These profiles can be maintained and tracked as guests move from one place to another, adapting to their circumstances so that the experience stays consistent no matter where they stay. The different steps of predictive analytics are shown in Fig. 3.
6 Mobile Friendly

Over 46.95% of Web page views worldwide come from mobile phones alone. This means that, no matter what travel or tourism service is offered to visitors, businesses need to make sure their entire online presence is compatible with mobile phones. Moreover, businesses need to give clients access to essential services at their property via smartphones; for example, a hotel can let guests reserve amenities such as the spa, cabins, leisure activities, the restaurant, and in-room dining services [22] through a mobile application, which is quite different from having to call reception to do so.
7 Immersive Visual Experiences

If we offer prospective customers abroad a visual tour of our hotel, leisure center, or a popular tourist destination where our business operates, it will certainly increase visitor numbers. This has become possible today only because of advances in visual technologies such as augmented reality (AR), virtual reality (VR), and mixed reality (MR) [23]. With these, we can showcase tourist attractions and accommodations even in interactive content marketing campaigns.
8 Internet of Things

Technology now extends beyond the confines of computers and smartphones and covers nearly all the visible space around us. The Internet of Things (IoT) paradigm has unlocked innovative opportunities to improve the client experience significantly, and the hospitality and tourism industry knows how to use the power of IoT to serve its clientele effectively [24], from hotels offering guests smart room and climate controls to airlines that help travelers get in and out of the airport.
9 Big Data Analytics

From the vast amount of data produced by tourists and travelers, businesses in the hospitality and tourism industry are able to extract information that helps them make the best decisions for growth. This is encouraged by the powerful data analytics capabilities [25] available today, even on a subscription basis. That makes the proposition even more appealing to small businesses, since they can now compete with the giants of their sector through fast, valuable information regarding customer behavior, spending habits, and well-being [26, 27].
10 Conclusion Technology has revolutionized the way the world works, travels, and thrives by making things easier, safer, and more efficient. It is easy to specify in detail the expectations and make sure they will be met if there are hi-tech programs in place to reduce the chances of human error. The hospitality industry, which is important for customer communication, is currently following guidelines set for the development of information and communication technology; this becomes an important factor in achieving the highest possible performance. A growing number of hoteliers are recognizing the importance of this feature, thus considering the definition of an online marketing strategy, as well as an increasingly important budget allocation to improve physical services or even the development of dedicated mobile applications, thanks to the latest global trends. Concluding it is observed that AI plays a vital and impactful role in the field of hospitality and tourism industry, and by the use of these abovementioned technological trends, this sector got lots of advantages.
References 1. Anurag (2018) 4 Emerging trends of artificial intelligence in travel. Available at: www.newgen apps.com/blog/artificial-intelligence-in-travel-emerging-trends. Accessed 5 Sept 2019 2. Barnes S (2016) Understanding virtual reality in marketing: nature, implications, and potential: implications and potential. Available at: https://ssrn.com/abstract=2909100. Accessed 3 Nov 2016 3. Beerli A, Martin JD (2004) Factors influencing destination image. Ann Tour Res 31(3):657–681 4. Boiano S, Borda, A, Gaia G (2019) Participatory innovation and prototyping in the cultural sector: a case study. Proceedings of EVA, London, pp 18–26 5. Bulanov A (2019) Benefits of the use of machine learning and AI in the travel industry. Available at: https://djangostars.com/blog/benefits-of-the-use-of-machine-learning-and-ai-in-the-travelindustry/. Accessed 2 Sept 2019 6. Chavre P, Ghotkar A (2016) Scene text extraction using stroke width transform for tourist translator on the android platform. In: 2016 international conference on automatic control and dynamic optimization techniques (ICACDOT), IEEE 7. Chawla S (2019) 7 Successful applications of AI & machine learning in the travel industry. Available at: https://hackernoon.com/successful-implications-of-ai-machine-learning-in-tra vel-industry-3040f3e1d48c. Accessed 5 Sept 2019 8. Dirican C (2015) The impacts of robotics, artificial intelligence on business and economics. Procedia Soc Behav Sci 195:564–573 9. Gajdošík T, Marciš M (2019) Artificial intelligence tools for smart tourism development. Computer science on-line conference. Springer 10. Issa H, Sun T, Vasarhelyi MA (2016) Research ideas for artificial intelligence in auditing: the formalization of audit and workforce supplementation. J Emerg Technol Account 13(2):1–20 11. Ivanov S, Webster C (2019) Perceived appropriateness and intention to use service robots in tourism. Information and communication technologies in tourism 2019. Springer, Cham, pp 237–248 12. Ivanov S, Webster C (2017) Adoption of robots, artificial intelligence and service automation by travel, tourism and hospitality companies—a cost-benefit analysis. International scientific conference contemporary tourism—traditions and innovations, pp 19–21 October, Sofia University 13. Ivanov SH, Webster C, Berezina K (2017) Adoption of robots and service automation by tourism and hospitality companies. Revista Turismo Desenvolvimento 27(28):1501–1517 14. Jung T, Tom Dieck MC, Lee, H, Chung N (2016) Effects of virtual reality and augmented reality on visitor experiences in museum. In: Inversini A, Schegg R (eds) Information and communication technologies in tourism 2016. Springer, Cham 15. Jung T, Tom Dieck MC, Moorhouse N, Tom Dieck D (2017) Tourists’ experience of virtual reality applications. In: 2017 IEEE international conference on consumer electronics (ICCE), IEEE 16. Kannan P, Bernoff J (2019) The future of customer service is AI-Human collaboration. MIT Sloan Management Review 17. Kaushik N, Kaushik J, Sharma P, Rani S (2010) Factors influencing choice of tourist destinations: a study of North India. IUP J Brand Manage 7(1/2):116–132 18. Kim T, Kim MC, Moon G, Chang K (2014) Technology-based self-service and its impact on customer productivity. Serv Mark Q 35(3):255–269 19. Kumar R, Li A, Wang W (2018) Learning and optimizing through dynamic pricing. J Revenue Pricing Manage 17(2):63–77 20. Kumar VM, Keerthana A, Madhumitha M, Valliammai S, Vinithasri V (2016) Sanative chatbot for health seekers. 
Int J Eng Comput Sci 5(3):16022–16025 21. Leong B (2019) Facial recognition and the future of privacy: I always feel like… somebody’s watching me. Bull At Scientists 75(3):109–115 22. Nagaraj S (2019) AI enabled marketing: what is it all about? Int J Res Commer Econ Manage 8(6):501–518
23. Nagaraj S (2020) Marketing analytics for customer engagement: a viewpoint. Int J Inf Syst Soc Change (IJISSC) 11(2):41–55 24. Nagaraj S, Singh S (2019) Millennial’s engagement with fashion brands: a moderatedmediation model of brand engagement with self-concept, involvement and knowledge. J Fashion Mark Manage Int J 23(1):2–16. https://doi.org/10.1108/JFMM-04-2018-0045 25. Patel V (2018) Airport passenger processing technology: a biometric airport journey. Available at: https://commons.erau.edu/edt/385/. Accessed 5 Sept 2019 26. Peranzo P (2019) AI assistant: the future of travel industry with the increase of artificial intelligence. Available at: www.imaginovation.net/blog/the-future-of-travel-with-the-inc rease-of-ai/. Accessed 5 Sept 2019 27. Sharma S, Rawal YS, Pal S, Dani R (2022) Fairness, accountability, sustainability, transparency (FAST) of artificial intelligence in terms of hospitality industry. In: Fong S, Dey N, Joshi A (eds) ICT analysis and applications. Lecture notes in networks and systems, vol 314. Springer, Singapore. https://doi.org/10.1007/978-981-16-5655-2_48
Improvement of Real-Time Kinematic Positioning Using Kalman Filter-Based Singular Spectrum Analysis During Geomagnetic Storm for Thailand Sector
Worachai Srisamoodkham, Kutubuddin Ansari, and Punyawi Jamjareegulgarn

Abstract The ionospheric error is the largest error source in positioning, especially when the electron density becomes high during geomagnetic storms. Real-time kinematic (RTK) positioning during storm time often suffers from higher fluctuation and noise errors. Therefore, in this work, a technique based on the Kalman filter with an implementation of singular spectrum analysis (namely KF-SSA) is proposed for RTK positioning. The RTK drone data were collected over an interval of about 50 min (7.37 AM to 8.28 AM) on May 12, 2021, with respect to a base station located at 13.84° N, 100.29° E. RTK positioning tests have been carried out to determine the positioning accuracy of the proposed algorithm. The simulated results reveal that implementing SSA with the KF yields very high-precision estimates and improves the positioning accuracy.

Keywords Global navigation satellite system · KF-SSA positioning · Real-time kinematic
1 Introduction

Global navigation satellite systems (GNSS) have been implemented for precise positioning, navigation, and timing services in several modern technologies. The GNSS receiver positioning approach is based on a trilateration concept that computes ranges from the received satellite signals, subject to ionospheric range delays and several related biases. The main error source in calculating the receiver position is the ionosphere,
where the electron density always fluctuates with local time, season, solar activity, location, etc. [1]. Moreover, the ionospheric electron density also responds to geomagnetic storms and exhibits numerous irregularities such as equatorial plasma bubbles (EPBs), ionospheric storms, and promptly penetrating electric fields (Wintoft and Cander [2]). The ionospheric time delays of GNSS signals are related directly to the total electron content (TEC) and are employed to achieve precise receiver positioning effectively. However, due to the ionospheric anomalies mentioned above, the receiver positioning accuracy is sometimes severely degraded. Meanwhile, many modern technologies for new businesses, transportation, and industry require high-accuracy GNSS in conjunction with 5G mobile cellular networks, for example drones, autonomous cars, and other unmanned autonomous vehicles (UAVs). Several approaches exist for GNSS-based high accuracy for 5G technology, such as real-time kinematic (RTK), precise point positioning (PPP), and combined RTK-PPP. So far, although prediction models like the IRI and IRI-Plas models have been considerably improved to predict and report ionospheric parameters and TECs with and without geomagnetic storms, these models have never focused on GNSS receiver positioning or added methods for mitigating positioning errors. In contrast, the International GNSS Service (IGS), known as a very large network of GNSS receivers providing the GIM TEC map, has been focusing on high-accuracy GNSS for 5G technology, UAVs, autonomous cars, etc., with new techniques for RTK and PPP. Therefore, new techniques for real-time positioning are very important for future technologies. Recently, numerous investigations of SSA for modeling and forecasting time series data such as ionospheric TEC, positioning, and velocity have been carried out, because SSA employs time-domain data to extract information from noisy time series without prior dynamical information about the series. In particular, the trends of the obtained data need not be linear, and modulation of oscillations in both phase and amplitude is possible [3]. Dabbakuti [4] proposed a new ionospheric prediction model based on SSA and an artificial neural network, namely the SSA-ANN model. They studied hourly GPS-TEC observations for the period 2009–2017 at a station located in Bangalore, India (13.02° N and 77.57° E). The root mean square errors between the observed and the SSA-ANN TEC values were within 1.40 TECU, and the correlation coefficient was close to 0.99. Likewise, Ansari et al. [1] applied the SSA method to GPS TECs during the year 2017 over the low-latitude Nepal region. Higher and lower TEC variabilities were found clearly during equinoctial and solstice seasons, respectively. The correlation coefficient of about 1.00 between the observed and reconstructed data demonstrates that the SSA method is a promising tool for TEC prediction over equatorial and low-latitude regions. As for SSA-based positioning, Ansari and Park [5] investigated the PPP method of multi-constellation GNSSs and predicted the propagation errors in the East-Asian sector. The PPP accuracy relies on the dilution of precision (DOP) and is also impacted by the corrections of the atmospheric delays. The accuracies were enhanced by about 70% after applying the SSA method.
Afterward, Ansari [6] initially proposed a technique based on the Kalman filter with an implementation of singular spectrum analysis (namely KF-SSA) for RTK. Real-time car data were used for validating the proposed method, and the results showed that the KF-SSA-based positioning accuracy improved compared with the ordinary KF model. In this paper, the KF-SSA method is further applied to a moving drone, and its performance is analyzed during the geomagnetic storm of May 12, 2021, over the Thailand sector (Fig. 1).

Fig. 1 Location where the RTK drone data were collected over about 50 min (7.37 AM to 8.28 AM) on May 12, 2021, with respect to the base station located at 13.84° N, 100.29° E
2 Method of Modeling The Kalman filter (KF), named in honor of Rudolf E. Kalman, is an algorithm that processes a series of measurements observed over time, containing statistical inaccuracies or other noise, and produces estimates of unknown variables that are more accurate than the raw measurements. For the linear Kalman filter, let the previous state matrix and process covariance matrix be denoted by X_{k-1} and P_{k-1}, and let the predicted state and process covariance matrices be denoted by X_{kp} and P_{kp}. According to the definition of the KF, the prediction of the new state is given by:
X_{kp} = A X_{k-1} + B μ_k + w_k,
P_{kp} = A P_{k-1} A^T + Q_k    (1)
where A and B are adaptation matrices used for the conversion from the previous state to the predicted state, w is the predicted state noise matrix, μ is the control variable matrix, and Q is the process covariance noise matrix. Let the Kalman gain be denoted by K_G and let R be the measurement error, or sensor noise, covariance; the Kalman gain is then expressed in terms of the predicted process covariance as:

K_G = P_{kp} M^T (M P_{kp} M^T + R)^{-1}    (2)
Here, R is the measurement error matrix, and M is a simple adaptation matrix. Let the measurement noise matrix be z_k and let X_{km} be the measured value matrix with an adaptation matrix C; the measured state matrix Y_k is then computed as:

Y_k = C X_{km} + z_k    (3)
Finally, the updated state matrix and the next process covariance matrix are obtained from the predicted values as shown in (4):

X_k = X_{kp} + K_G [Y_k − M X_{kp}],
P_k = (I − K_G M) P_{kp}    (4)
The estimated values X_k obtained from the KF are then forecasted by the SSA method. SSA consists of four main steps: (i) embedding, (ii) singular value decomposition (SVD), (iii) eigentriple grouping, and (iv) diagonal averaging. The detailed SSA equations are not reproduced in this section; they can be found in Ansari [6].
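To make the recursion in Eqs. (1)-(4) concrete, a minimal NumPy sketch of one KF predict/update cycle is given below. This is an illustration, not the authors' implementation; the function name and the matrices A, B, M, Q, and R are assumptions, and the SSA forecasting step would be applied afterward to the sequence of estimated states.

```python
import numpy as np

def kalman_step(x_prev, P_prev, u, y, A, B, M, Q, R):
    """One predict/update cycle of the linear Kalman filter, Eqs. (1)-(4)."""
    # Prediction, Eq. (1): propagate the previous state and covariance
    x_pred = A @ x_prev + B @ u
    P_pred = A @ P_prev @ A.T + Q
    # Kalman gain, Eq. (2)
    K = P_pred @ M.T @ np.linalg.inv(M @ P_pred @ M.T + R)
    # Correction with the measured state y, Eqs. (3)-(4)
    x_new = x_pred + K @ (y - M @ x_pred)
    P_new = (np.eye(x_prev.shape[0]) - K @ M) @ P_pred
    return x_new, P_new
```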
3 Data Used The RTK drone data were collected over an interval of about 50 min (7.37 AM to 8.28 AM) on May 12, 2021, with respect to the base station located at 13.84° N, 100.29° E (Fig. 1). The date May 12, 2021 (DOY 132), was selected because the most intense geomagnetic storm of the period (a G3 storm with Kp index = 7) occurred on that day. A coronal mass ejection (CME) connected with a filament eruption left the Sun on May 9, 2021. A shock signature from this CME was seen in the DSCOVR and ACE satellite data on May 12, 2021, at around 05:50 (UT), and a sudden commencement
was noted at Earth on the morning (just after 6:30 UT) of May 12, 2021. The geomagnetic field was disturbed and reached G3 storm conditions [7]. The GNSS data were processed with the RTKLIB software [8], and the rover positions and velocities with their uncertainties were recorded and are shown in the present work.
4 Result and Discussion The GNSS data were obtained in RINEX format and processed with the RTKLIB software. The plots of rover position and velocity with uncertainty are shown in Figs. 2 and 3. The starting point of the drone is taken as the origin, and the displacements from that point in the north, east, and up directions are estimated over time; the X-axis of each plot shows the time of the recorded data. It can be seen that the drone moved smoothly, with some peaks in the south, east, and up directions, which is exactly the kind of data needed to study the SSA performance for positioning purposes. The velocity plots of the drone in the north, east, and up directions are shown in Fig. 3. The velocity and position of the drone, as well as the estimated position shown in Fig. 4, were used as input signals for the Kalman filter (KF). Afterward, the SSA technique was applied, and the positions were predicted. The observed positions are plotted in blue, the ordinary KF results in red, and the KF-SSA results in green. Figure 4 clearly shows that the observed and KF-estimated positions have some fluctuations, whereas the positions predicted by KF-SSA are smoother in all three directions.
Fig. 2 GNSS positioning plots of rover position with uncertainty
Fig. 3 GNSS velocity plots of rover position with uncertainty
Fig. 4 GNSS positioning plots of rover position obtained from the observation, the KF-based predicted values, and the KF-SSA-based predicted values
Table 1 Range values in the (north, east, up) directions of the observed, the KF-based, and the KF-SSA-based positioning, as well as the respective RMS improvements

Direction | Observation (m) | KF (m) | KF-based RMS improvement (%) | KF-SSA (m) | KF-SSA-based RMS improvement (%)
North | 10.5140 | 9.0051 | 14.35 | 8.3886 | 20.22
East | 9.4893 | 6.4679 | 31.80 | 6.3355 | 33.24
Up | 8.2777 | 7.6922 | 7.07 | 7.1852 | 13.20
Total | 9.4712 | 7.8669 | 16.94 | 7.3515 | 22.38
To verify the accuracy of the proposed model, the root mean square (RMS) values have been estimated and are shown in Table 1. It can be seen from Table 1 that, after using the KF, the RMS is improved by about 14.35% in the north direction, 31.80% in the east direction, and 7.07% in the up direction, and the total combined position RMS improvement reaches 16.94%. After implementing the KF-SSA method, the RMS values were estimated again and notable enhancements were recorded: the RMS improvement in the north direction increased from 14.35% with the KF alone to 20.22% with the KF-SSA technique; similarly, the improvement of 31.80% before applying SSA rose to 33.24% in the east direction, and from 7.07% to 13.20% in the up direction. Combining all three position components, the RMS of the observed position was 9.4712 m, which decreased to about 7.8669 m after applying the KF, an improvement of 16.94%; it decreased further to 7.3515 m (a 22.38% improvement) when the KF-SSA was used. Recently, Ansari [6] evaluated the performance of the KF-SSA method using real-time car data (Swift Piksi Multi 1.2 FW and u-blox M8T) that can be accessed via http://rtkexplorer.com/downloads/gpsdata/?fbclid=IwAR2O_8ixUQMxdanYuEFnhpgFGgoA0lxBjTa2pEi8h7C53lG6-cTV7qSJ7_M. Hence, these real-time car data were also used as the first dataset for validation in this work. We found that the residuals estimated by the KF model give RMS values of (7.21, 4.42, 4.20) m, which decrease with the KF-SSA approach to (4.53, 3.44, 3.54) m. These results indicate that the accuracy of the KF-SSA method is higher than that of the KF model. Furthermore, we also used the positioning data of the IISC station in Bangalore, India, one of the Indian IGS stations (13.02° N, 77.57° E), as the second validation dataset; in this case, only IRNSS signals were used. The results show that the RMS values in the (north, east, up) directions are (14.36, 3.12, 9.84) m before applying KF-SSA, and after implementing the method the accuracy was improved by up to 23%, as shown in Fig. 5. Finally, we conclude that the KF-SSA technique works well to improve the positioning accuracy over the Thailand sector.
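For reference, the RMS improvement percentages quoted above follow directly from the observed and filtered RMS values in Table 1; a minimal sketch of the computation (values copied from the Total row of Table 1) is:

```python
def rms_improvement(rms_obs, rms_filtered):
    """Percentage reduction of the RMS error relative to the raw observation."""
    return 100.0 * (rms_obs - rms_filtered) / rms_obs

# Total (three-dimensional) position RMS from Table 1
print(round(rms_improvement(9.4712, 7.8669), 2))  # KF:     16.94 %
print(round(rms_improvement(9.4712, 7.3515), 2))  # KF-SSA: 22.38 %
```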
Fig. 5 GNSS positioning plots of static position at IISC station, Bangalore, India
5 Conclusions This study investigates the positioning performance obtained from the raw observations of a moving drone taken during storm time. It is clear, and has already been shown, that because of ionospheric errors the RTK positioning exhibits large discrepancies along with noise errors. Although the KF model is widely used to predict the RTK positioning and remove noise errors, it is not sufficient on its own. Hence, in this work we implemented the SSA forecasting method in conjunction with the KF (the so-called KF-SSA). The results show that more noise errors can be removed and the positioning performance can be improved; the final RMS positioning improvement was around 22%. The proposed KF-SSA technique has thus proved to be a suitable technique for RTK positioning improvement over the Thailand sector. The improvement is limited to about 22% because the observed data contain strong fluctuations. In conclusion, this type of study can help minimize the estimation errors while optimizing the computational resources used, and the results provide insights into the Kalman filter for GNSS-based position, velocity, and acceleration estimation. Acknowledgements This research is financially supported by the Broadcasting and Telecommunications Research and Development Fund for Public Interest (project code: B2-001/6-2-63). The authors would also like to express their sincere thanks to the Department of Public Works and Town & Country Planning, Bangkok, Thailand, for providing the RINEX observation files.
References 1. Ansari K, Panda SK, Jamjareegulgarn P (2020) Singular spectrum analysis of ionospheric TEC variations over Nepal during the low solar activity from GPS network observables. Acta Astronaut 169:216–223 2. Wintoft P, Cander LR (2000) Ionospheric foF2 storm forecasting using neural networks. Phys Chem Earth Part C Solar Terr Planet Sci 25:267–273 3. Ghil M, Allen MR, Dettinger MD, Ide K, Kondrashov D, Mann ME, Robertson AW, Saunders A, Tian Y, Varadi Yiou P (2002) Advanced spectral methods for climatic time series. Rev Geophys 40(1):3–1
4. Dabbakuti JRKK (2019) Application of singular spectrum analysis using artificial neural networks in TEC predictions for ionospheric space weather. IEEE J Sel Top Appl Earth Observations Remote Sens 12(12):5101–5107 5. Ansari K, Park KD (2018) Multi constellation GNSS precise point positioning and prediction of propagation errors using singular spectrum analysis. Astrophys Space Sci 363(12):1–7 6. Ansari K (2021) Real-time positioning based on Kalman filter and implication of singular spectrum analysis. IEEE Geosci Remote Sens Lett 18(1):58–61 7. Seok HW, Ansari K, Panachai C, Jamjareegulgarn P (2022) Individual performance of multiGNSS signals in the determination of STEC over Thailand with the applicability of Klobuchar model. Adv Space Res 69(3):1301–1318 8. Takasu T (2018) RTKLIB: an open-source program package for GNSS positioning. Available online, www.rtklib.com
Performance Evaluation Metrics of NBA, NAAC, NIRF, and Analysis for Grade up Strategy M. Parvathi and T. Amy Prasanna
Abstract Accreditation is a process for grading the technical or non-technical programs offered by various colleges and universities. The TIER-I system is designed for technical programs run by independent institutions and university departments, whereas institutions that are government controlled, deemed-to-be universities, or affiliated colleges follow the TIER-II accreditation system. NBA and NAAC are the major accreditation systems, whereas the National Institutional Ranking Framework (NIRF) is a methodology adopted by the Ministry of Education, Government of India, to rank institutions of higher education in India. The NAAC evaluation method is qualitative, whereas NIRF follows quantitative methodologies to rank educational institutions. The existing literature restricts its study to comparative analyses of the accreditation processes. This paper suggests the measures that an affiliated institute, autonomous college, or university should take to obtain a better grade during the accreditation process. Keywords NBA · NAAC accredits · TIER-I & II systems · NIRF ranking · Qualitative evaluation method · Better grade
M. Parvathi (B) · T. Amy Prasanna, BVRIT HYDERABAD College of Engineering for Women, Hyderabad, Telangana, India, e-mail: [email protected]; T. Amy Prasanna, e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023, M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_8

1 Introduction Any programme or institute must be accredited by the authority of a set of professional councils or regulatory bodies established by the University Grants Commission. Any institute that seeks accreditation of a programme has to undergo a rigorous process and must pass the required quality checks in terms of managing and maintaining the institution's data, designing the curriculum and courses, following procedures for assessment and evaluation, improving the
students' performance, providing top-class infrastructure and facilities, etc. Among these, peer review, the committee visit, and the examination of documents are the main steps leading to a final decision in the NBA accreditation process [1, 2]. To pass the first step, the peer review, institutions need to compile their data for at least the first five years from inception and submit them, free of errors, in the form of compliance reports. If this is satisfactory, the institute is granted a visit by the accreditation committee, whose main role is to verify the truthfulness of the submitted data against the available information by validating the necessary testimonials and documents. The committee then decides, positively or negatively, whether to grant accreditation. In this process, accreditation obliges higher educational organizations to maintain certain procedures, such as curriculum gap analysis, and to strengthen their processes, which in turn builds confidence among individuals. It gives institutions an opportunity to develop a curriculum strategy as a roadmap toward sustainable development. It predominantly helps in meeting benchmarks of quality and excellence, increasing the admission rate by gaining the confidence of students and parents, aiding employers in defining curriculum standards that enrich student knowledge, encouraging students to appear for further certification examinations, increasing the probability of placements, and finally providing routes for the self-improvement of educational institutions. The validity period of NBA accreditation under the revised guidelines ranges from a minimum of 3 years to 6 years for full accreditation. Apart from program accreditation, NAAC carries out the quality assessment and accreditation of higher educational institutions (HEIs) [3]. Several HEIs, such as universities and autonomous colleges, have their own program curricula to provide a better research culture for their future endeavors. NAAC accreditation is an essential tool for bringing out a committed, quality-oriented culture in HEIs in line with existing requirements. Beyond the methods and procedures of the various accreditation styles followed by institutes and HEIs, the parametric measurement used to assess the quality of a particular educational institute is performed by the National Institutional Ranking Framework (NIRF), which was approved by the MHRD and launched by the Honorable Minister of Human Resource Development in 2015. Based on quality measures such as "Teaching, Learning and Resources," "Research and Professional Practices," "Graduation Outcomes," "Outreach and Inclusivity," and "Perception," the core committee of NIRF ranks the various universities and institutions. The studies in [4-6] investigated the growth of publications in different subject categories, as well as the impact of this growth before and after NIRF, and identified research as one of the key parameters that strongly influence the ranking. Chowdhury et al. [7] published a comparative study of global ranking frameworks for higher educational institutions, yet the parameters for grade improvement were not explored in depth. Addressing these issues, the remainder of this paper is structured as follows: Sect.
2 discusses the various evaluation parameters considered by NBA and NAAC during their accreditation processes and compares them; Sect. 3 analyzes existing grade information and the improvement possible based on the evaluation;
Sect. 4 presents the method of estimating the grade of a particular institute based on the existing grade criteria; and the results analysis and conclusions are discussed in Sects. 5 and 6.
2 Performance Indicators for Accreditation Basically, there are two accreditation systems, namely TIER-I and TIER-II. The TIER-I accreditation system covers the technical programs [2, 3] offered by autonomous institutions and university departments, whereas TIER-II covers the technical programs offered by non-autonomous institutions, both private and government controlled, i.e., those affiliated to a university. The accreditation process follows certain performance indicators, defined individually by the NBA and NAAC bodies, which can be seen in Tables 1 and 2. In both tables, the highest-scoring criteria are represented with colors. In the NBA system, priority is given to faculty information, which belongs to criterion 5; among the criterion factors, importance is given to research and development, with a somewhat higher score, to strengthen the research culture among institutions. In the NAAC system, priority is given first to research and innovation and then to teaching-learning and evaluation. Teaching-learning and evaluation is given the highest priority in the case of affiliated colleges under NAAC, the reason being to bring out the best from such affiliated institutions on par with universities and autonomous colleges. Compared with TIER-II, the weightage of the TIER-I outcome parameters is increased, mainly to strengthen the outcome-based education system and to attain the desired outcomes of a program. Table 3 highlights the common parameters that belong to both the NBA and NAAC accreditation systems, taking affiliated institutions under consideration.
Table 1 NBA performance indicators

S. No. | Performance indicator | Marks
1 | Criterion 1: Vision, mission, and program educational objectives | 60
2 | Criterion 2: Program curriculum and teaching-learning processes | 120
3 | Criterion 3: Course outcomes and program outcomes | 120
4 | Criterion 4: Students' performance | 150
5 | Criterion 5: Faculty information and contributions | 200
6 | Criterion 6: Facilities and technical support | 80
7 | Criterion 7: Continuous improvement | 50
8 | Criterion 8: First year academics | 50
9 | Criterion 9: Student support systems | 50
10 | Criterion 10: Governance, institutional support, and financial resources | 120
 | Total | 1000
Table 2 NAAC performance indicators

S. No. | Performance indicator | Universities (marks) | Autonomous colleges (marks) | Affiliated/Constituent colleges (marks)
1 | Curricular aspects | 150 | 150 | 100
2 | Teaching-learning and evaluation | 200 | 300 | 350
3 | Research, innovations, and extension | 250 | 150 | 120
4 | Infrastructure and learning resources | 100 | 100 | 100
5 | Student support and progression | 100 | 100 | 130
6 | Governance, leadership, and management | 100 | 100 | 100
7 | Institutional values and best practices | 100 | 100 | 100
 | Total | 1000 | 1000 | 1000
The program curriculum and teaching-learning process of NBA criterion 2 is scored 120 and is equivalent to criteria 1 and 2 of the NAAC system, with a combined score of 450 that includes both curricular aspects (scored 100) and teaching-learning and evaluation (scored 350). As per the NBA evaluation criteria, curriculum evaluation is one of the important measuring indicators and an essential phase of curriculum development. Using the respective outcomes of a program, one can assess whether the curriculum is fulfilling its purpose and whether students are actually learning.
Table 3 Common parameters and comparisons between NBA and NAAC

NBA criterion | NBA score | NAAC (affiliated) score | NAAC criterion
Criterion 2: Program curriculum and teaching-learning processes | 120 | 100 | Criterion 1: Curricular aspects
Criterion 2: Program curriculum and teaching-learning processes | 120 | 350 | Criterion 2: Teaching-learning and evaluation
Criterion 3: Course outcomes and program outcomes | 120 | 350 | Criterion 2: Teaching-learning and evaluation
Criterion 4: Students' performance | 150 | 350 | Criterion 2: Teaching-learning and evaluation
Criterion 5: Faculty information and contributions | 200 | 350 | Criterion 2: Teaching-learning and evaluation
Criterion 9: Student support systems | 50 | 130 | Criterion 5: Student support and progression
Curriculum is linked with the program objectives, which have to be identified beforehand when evaluating and assessing curricular effectiveness; the assessment of the teaching-learning process is part of curriculum assessment. In the case of NAAC, however, more weightage is given to teaching-learning and evaluation than to curriculum evaluation. The key point to note is that in both accreditation systems the evaluation of teaching and learning relies purely on the assessment of students in terms of classroom teaching, conducted experiments, and question papers. It is the largest criterion, with a 35% share compared with all other criteria; hence, more focus should be given to best academic practices to ensure better learning skills. Along the same lines, criteria 3 and 5 of NBA also closely match criteria 1 and 2 of NAAC: a good evaluation of NBA criteria 2, 3, and 5 yields a score of 560, compared with a score of 450 for NAAC criteria 1 and 2. Similarly, NBA criteria 4 and 9 correspond to NAAC criterion 5; the total score of NBA criteria 4 and 9 is 200, whereas the score of NAAC criterion 5 is 130. The commonality of each criterion can be explored further through the internal sub-criteria [8-12] of both accreditation systems, but what matters is how the final score of each criterion influences the final grade. Hence, one can understand that any institute that completes its accreditation evaluation through NBA can easily meet the NAAC evaluation system as well. However, piece-wise evaluation is more essential than whole-program evaluation, since it leads to consistent maintenance of the data in every corner of the work environment and allows systematic procedures to be inculcated for refining curriculum policy decisions, making continuous curriculum adjustments based on feedback, and processing curriculum implementations.
3 Analysis on Existing Grade Information and Improvement Based on the Evaluation The cumulative grade point average (CGPA) is one of the measuring parameters for NAAC; it allots grades ranging from D (lowest) to A++ (highest), as shown in Table 4 [3]. A CGPA score of 1.5 or less indicates that the respective educational institution is not accredited, whereas institutions with an A+ grade have CGPA scores in the range 3.26-3.50. Along the same lines, the CGPA range for grade A is 3.01-3.25, and for grade B it is 2.01-2.50. Equations (1) and (2) give the calculation of the criterion-wise and cumulative grade point averages:

CrGPA_j = (Σ_{i=1}^{n} (KAWGP)_i) / (Σ_{i=1}^{n} W_i) = (CrWGP)_j / W_j    (1)

where 'i' indicates the key aspects,
Table 4 Grades and CGPA ranges in the NAAC system

Grade in NAAC system | Range of CGPA
D | 0-1.5
C | 1.51-2.0
B | 2.01-2.50
B+ | 2.51-2.75
B++ | 2.76-3.00
A | 3.01-3.25
A+ | 3.26-3.50
A++ | 3.51-4.00
'j' indicates the criterion, 'n' indicates the number of key aspects in that criterion, Σ(KAWGP)_i is the summation of the allotted key aspect-wise weighted grade points of that criterion, and ΣW_i is the summation of the predetermined weightages of the key aspects of that criterion. The cumulative grade point average (CGPA) of a particular institute is the ratio of the sum of the criterion-wise weighted grade points to the pre-assigned weightage points of all the criteria; based on the CGPA, the grade is allotted:

Institutional CGPA = (Σ_{j=1}^{7} (CrWGP)_j) / (Σ_{j=1}^{7} W_j)    (2)
For example, consider criterion 1, curricular aspects, whose key aspects (curriculum design and development, academic flexibility, curriculum enrichment, and the feedback system) have predetermined weightages of 50, 50, 30, and 20, respectively. If the corresponding key aspect grade points (KAGP)_i assigned by the peer team are 3, 2, 1, and 2 on a 4-point scale, the key aspect-wise weighted grade points are calculated as:

KAWGP_i = KAGP_i × W_i    (3)

Then, the sub-criteria under curricular aspects score 150, 100, 30, and 40, respectively. In this case, the calculated criterion-wise grade point average is CrGPA_I = (CrWGP)_I / W_I = 320/150 = 2.13. In the same manner, by computing CrGPA for all seven criteria, the institutional CGPA can be calculated using Eq. (2). The accreditation status of various institutes was studied from the reliable sources [13-22], the relevant data were gathered, and they were finally categorized based on the type of institution. For our work, information on 347 universities was collected based on their CGPA and accreditation validity period; a sample is shown in Table 5. A point worth noting from the NAAC accreditation grades for universities is that a few of them, although their CGPA is not in the stipulated range, have been allotted higher
Table 5 Grades, corresponding CGPA range, and accreditation period for universities

S. No. | Name of the university | State | CGPA | Grade | Accreditation period
248 | Periyar Maniammai Institute of Science and Technology, Thanjavur (Second Cycle) | Tamil Nadu | 2.66 | B | 15-11-2020
250 | Ponnaiyah Ramajayam Institute of Science and Technology, Thanjavur (First Cycle) | Tamil Nadu | 2.95 | B | 15-11-2020
254 | St. Peter's Institute of Higher Education and Research, Chennai (First Cycle) | Tamil Nadu | 2.52 | B | 15-11-2020
303 | Santosh University, Ghaziabad (First Cycle) | Uttar Pradesh | 2.56 | B | 15-11-2020
304 | Shobhit Institute of Engineering and Technology, Meerut (First Cycle) | Uttar Pradesh | 2.12 | B | 15-11-2020
3 | Sri Krishnadevaraya University, Anantapur—515003 (Third Cycle) | Andhra Pradesh | 2.85 | B | 24-05-2021
5 | Yogi Vemana University, Kadapa (Cuddapah)—516003 (First Cycle) | Andhra Pradesh | 2.54 | B | 18-01-2021
180 | Shivaji University, Kolhapur—416004, Maharashtra (Fourth Cycle) | Maharashtra | 3.52 | A++ | 30-03-2026
271 | Sri Ramachandra Institute of Higher Education and Research (Deemed-to-be University u/s 3 of the UGC Act 1956), Tamil Nadu, 600116 (Third Cycle) (Seven Years) | Tamil Nadu | 3.53 | A++ | 24-01-2028
10 | Sri Venkateswara University, Tirupati, Chittoor—517502 (Third Cycle) | Andhra Pradesh | 3.52 | A+ | 08-06-2022
44 | University of Delhi, Delhi—110007 (First Cycle) | Delhi | 3.28 | A+ | 29-11-2023
grades (S. No. 180, 129, and 10), while some of them are given a lower grade even though their CGPA score is within the qualifying range (S. No. 248, 250, 254, etc.). Universities and deemed-to-be universities run their institutes with full independence in administration and operations and with freedom in deciding syllabus, pedagogy, etc., compared with autonomous institutions that are affiliated to the government. This freedom in deciding programs and syllabi has raised the number of institutions to the level of autonomous deemed universities; according to the SIR reports from 2019 to 2021, the number of institutions from the Asian region in the world ranking increased by 433 [23]. For our work, a list of 209 accredited autonomous colleges and affiliated institutes and their grades was collected; a few are listed in Tables 6 and 7. A point worth noting from the NAAC accreditation grades for transition autonomous colleges is that a few of them, although their CGPA is in the stipulated range, have been allotted lower grades (S. No. 20, 25, 27, 42, 44, 64, etc.). The ranking metrics are evaluated for a score of 500 (each parameter with a score of 100) and are common to all universities, colleges, and institutes, but the difference lies in the weightage given to each individual parameter: for universities, more weightage is given to research productivity, whereas for institutes and colleges more weightage is given to teaching, learning, and resources. Hence, both research and teaching-learning practices should be given more weight, irrespective of accreditation or ranking, for quality upliftment.
4 Method of Estimation of Grades Based on the Existing Grade Criteria Data from numerous college and institution Web sites were collected to understand institutional academic performance in relation to the grade obtained in NAAC accreditation. We considered NAAC grade performance rather than NBA because our interest is in overall academic performance, which is better captured by NAAC grading than by the program-based NBA. It is observed from the literature that most universities and deemed-to-be universities hold the grade A++, owing to their independent program curricula, which help them inculcate procedures and systems oriented toward outcome-based learning practices. A few of the differentiators identified through various university Web sites are department-wise individual research labs, facilities for startups, incubation centers, sustained flagship programs and activities under a research wing, project-oriented education programs, and the establishment of education technology units that ease the development of the teaching-learning materials needed by teachers and students, with the objective of improving education in schools and colleges. Some universities, or those deemed to be so, focus on parameters that are more important for the development of an individual's skills or performance by incorporating industry-relevant courses, creating in-house trainings by pooling industry experts, training for high-salary core placements, offering special
Table 6 NAAC grade information for transition autonomous colleges

Sr. No. | Name of the college | State | Institutional CGPA | Grade | Accreditation period
19 | Raghu Institute of Technology (Autonomous) | Andhra Pradesh | 3.02 | A | 31-12-2023
20 | Sagi Rama Krishnam Raju Engineering College | Andhra Pradesh | 3.60 | A | 31-12-2021
21 | Sasi Institute of Technology and Engineering | Andhra Pradesh | 3.14 | A | 31-12-2023
24 | Sri Vasavi Engineering College | Andhra Pradesh | 3.18 | A | 31-12-2023
25 | Vignan's Institute of Information Technology | Andhra Pradesh | 3.41 | A | 31-12-2022
27 | Vishnu Institute of Technology, Bhimavaram—534202 (First Cycle) | Andhra Pradesh | 3.51 | A | 31-12-2024
41 | Nowgong College, Nagaon—782001 (Third Cycle) | Assam | 3.27 | A | 31-12-2025
42 | Patna Women's College | Bihar | 3.58 | A | 31-12-2023
43 | St. Xavier's College of Education, Patna—800011 (Third Cycle) | Bihar | 3.02 | A | 31-12-2023
44 | Parvatibai Chowgule College of Arts and Sciences | Goa | 3.41 | A | 31-12-2020
64 | Marian College, Kuttikkanam | Kerala | 3.52 | A | 31-12-2021
162 | Sri Sivasubramaniya Nadar College of Engineering | Tamil Nadu | 3.55 | A+ | 31-12-2023
182 | K. Ramakrishnan College of Technology, Tiruchirappalli—621112 (First Cycle) | Tamil Nadu | 3.54 | A+ | 31-12-2025
academic programmes and certificate courses offered by MNCs; skill-drill activities in coordination with academic staff colleges or NITTTR centers; a special wing for foreign languages (not limited to those of South Korea, Japan, Germany, etc.); BEC Business English certification; encouragement to conduct or participate in global hackathons and skill competitions; global certifications; and coordination of design thinking activities. Moreover, platforms such as LinkedIn, Adobe, and Coursera are part of the skilling activities across all programmes and also help institutions publicize themselves and get more connected
Table 7 NAAC grade information for affiliated colleges

Sr. No. | Name of the college | State | CGPA | Grade | Accreditation period
609 | Government Lal Chakradhar Shah College, Ambagarh Chowki, Rajnandgaon—477227 (Second Cycle) | Chhattisgarh | 1.75 | C | 22-01-2022
153 | GIET Engineering College, Chaitanya Knowledge City, Velugubanda, Rajanagaram, Rajahmundry | Andhra Pradesh | 2.78 | B++ | 26-11-2022
168 | Gates Institute of Technology, NH 44, Gootyanantapuram Village, Gooty, Anantapuramu Dist. | Andhra Pradesh | 2.81 | B++ | 30-04-2024
4208 | Dr. Harekrushna Mahtab College, Kupari, Balasore—756059 (First Cycle) | Odisha | 2.53 | B+ | 26-11-2022
4265 | Sri Manakula Vinayagar Medical College and Hospital, Kalitheerthalkuppam, Madagadipet, Puducherry, 605107 | Puducherry | 2.69 | B+ | 28-10-2025
902 | I. M. Nanavati Law College, Ellisbridge, Ahmedabad, Gujarat—380006 (II Cycle) | Gujarat | 2.4 | B | 14-02-2023
909 | Government Arts College, Amirgadh, At.-Amirgadh, Ta.-Amirgadh, Dist.-Banaskantha, Gujarat | Gujarat | 2.41 | B | 07-02-2024
4956 | Bannari Amman Institute of Technology, Sathy-Bhavani State Highway, Tamil Nadu, 638401 (Third Cycle) | Tamil Nadu | 3.36 | A+ | 07-02-2026
4961 | Jayaraj Annapackiam College for Women (Autonomous), Tamil Nadu, 625601 (IV Cycle) | Tamil Nadu | 3.46 | A+ | 01-02-2026
1203 | The Islamia College of Science and Commerce, Srinagar—190001 (II Cycle) | Jammu and Kashmir | 3.27 | A | 11-09-2022
1258 | Ghatsila College, Powrah, Ghatsila, East Singhbhum—832303 (First Cycle) | Jharkhand | 3.07 | A | 29-10-2022
5160 | BVRIT Hyderabad College of Engineering for Women, 8-5/4, Rajiv Gandhi Nagar, Bachupally, Medchal-Malkajigiri Dist., Hyderabad (I Cycle) | Telangana | 3.23 | A | 13-02-2025
with global peer competitors. Further, most of the programmes that are stakeholder-centric enrich interpersonal skills, management skills, professionalism, team-building skills, analytical skills, and life skills. Most of the literature has examined the evaluation parameters of all the accreditation systems along with the ranking evaluation system, but none has identified the best practices required for sustainable performance upgradation. This paper is the outcome of identifying such best practices and predicting the best methods for obtaining a better rank while also preserving the CGPA.
4.1 Grade Calculation with Example A sample grade calculation for a university is given in Table 8a, b. The calculations for the other criteria follow in the same way, and their GPAs are observed as follows:
For criterion III, the calculated GPA_III = (CrWGP)_III/W_III = 580/200 = 2.90
For criterion IV, the calculated GPA_IV = (CrWGP)_IV/W_IV = 250/100 = 2.50
For criterion V, the calculated GPA_V = (CrWGP)_V/W_V = 340/100 = 3.40
Table 8 Sample grade calculation for a university: (a) criterion I, (b) criterion II

(a) Criterion I: curricular aspects
Criteria indicators | Predetermined weightage (Wi) | Assessed key aspect grade points (KAGP)i (4/3/2/1/0) | Key aspect-wise weighted grade points KAWGPi = KAGPi × Wi
Curriculum design and development | 50 | 3 | 150
Academic flexibility | 50 | 2 | 100
Curriculum enrichment | 30 | 1 | 30
Feedback system | 20 | 2 | 40
Total | WI = 150 | | (CrWGP)I = 320
Calculated CrGPA_I = (CrWGP)_I/W_I = 320/150 = 2.13

(b) Criterion II: teaching-learning and evaluation
Criteria indicators | Predetermined weightage (Wi) | Assessed key aspect grade points (KAGP)i (4/3/2/1/0) | Key aspect-wise weighted grade points KAWGPi = KAGPi × Wi
Student enrolment and profile | 10 | 3 | 30
Catering to student diversity | 20 | 4 | 80
Teaching-learning process | 50 | 3 | 150
Teacher quality | 50 | 3 | 150
Evaluation process and reforms | 40 | 2 | 80
Student performance and learning outcomes | 30 | 3 | 90
Total | WII = 200 | | (CrWGP)II = 580
Calculated CrGPA_II = (CrWGP)_II/W_II = 580/200 = 2.90
For criterion VI, the calculated GPA_VI = (CrWGP)_VI/W_VI = 240/100 = 2.40
For criterion VII, the calculated GPA_VII = (CrWGP)_VII/W_VII = 200/100 = 2.00
Summing the criterion-wise weighted grade points gives a grand total of 2450. Taking the total predetermined weightage W_j of 1000, the institutional CGPA is:

Institutional CGPA = (Σ_{j=1}^{7} (CrWGP)_j) / (Σ_{j=1}^{7} W_j) = 2450/1000 = 2.45
The final grade is decided by the CGPA score, in the form of A, B, etc.; for example, for CGPA = 2.45 the allotted grade is B, the performance indicator is 'Good', and the accreditation status is 'Accredited'. In the dataset, the performance indicators are chosen from their minimum to maximum value limits so as to cover the range from the best grade A++ (CGPA 3.51-4) down to the lowest grade D (0-1.5).
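As a small illustration of Eq. (2) and the grade bands of Table 4, the following sketch computes the institutional CGPA and grade for one of the score combinations listed in Table 9; the function names are ours and are not part of the NAAC methodology.

```python
def institutional_cgpa(crwgp, weights):
    """Eq. (2): sum of criterion-wise weighted grade points over sum of weightages."""
    return sum(crwgp) / sum(weights)

def naac_grade(cgpa):
    """Map a CGPA to a NAAC grade using the bands of Table 4."""
    bands = [(3.51, "A++"), (3.26, "A+"), (3.01, "A"), (2.76, "B++"),
             (2.51, "B+"), (2.01, "B"), (1.51, "C"), (0.0, "D")]
    for lower, grade in bands:
        if cgpa >= lower:
            return grade

weights = [150, 200, 250, 100, 100, 100, 100]   # predetermined weightages for a university
crwgp = [320, 750, 520, 250, 340, 240, 200]     # first row of Table 9
cgpa = institutional_cgpa(crwgp, weights)
print(cgpa, naac_grade(cgpa))                   # 2.62 B+
```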
Table 9 Dataset preparation for grade prediction

Possible cases | CP1 | CP2 | CP3 | CP4 | CP5 | CP6 | CP7 | CGPA | GR
KAWG|max | 600 | 800 | 1000 | 400 | 400 | 400 | 400 | 4 | NA
Wi|Tot | 150 | 200 | 250 | 100 | 100 | 100 | 100 | 1 | NA
1 | 320 | 750 | 520 | 250 | 340 | 240 | 200 | 2.62 | B+
2 | 550 | 750 | 800 | 350 | 350 | 350 | 350 | 3.5 | A++
3 | 100 | 80 | 50 | 40 | 80 | 200 | 100 | 0.65 | D
4 | 550 | 700 | 500 | 350 | 350 | 300 | 150 | 2.9 | B++
5 | 550 | 700 | 900 | 300 | 160 | 400 | 200 | 3.21 | A
6 | 550 | 700 | 700 | 300 | 160 | 400 | 200 | 3.01 | A
7 | 300 | 240 | 150 | 120 | 240 | 100 | 300 | 1.45 | D
8 | 350 | 280 | 175 | 140 | 280 | 240 | 350 | 1.815 | C
9 | 400 | 320 | 200 | 160 | 320 | 100 | 400 | 1.9 | C
10 | 450 | 360 | 225 | 180 | 360 | 200 | 400 | 2.175 | B
11 | 550 | 700 | 700 | 350 | 350 | 300 | 150 | 3.1 | A
12 | 550 | 440 | 275 | 220 | 400 | 400 | 200 | 2.485 | B
13 | 550 | 750 | 800 | 350 | 350 | 350 | 350 | 3.5 | A+
4.2 Dataset Preparation The dataset contains entries for 2400 institutions, with a mix of universities, affiliated institutions, and colleges, covering performances from worst case to best case, as shown in Table 9 (only 13 entries are shown for convenience).
4.3 Grade Prediction Using ML Algorithms Many methodologies have evolved for assessing grades, but several flaws in the ranking methodologies have caused inconsistencies in university placings across different rankings [24-26], and identifying a better way remains an open research question. In our work, machine learning techniques are used, as they have proved very effective in educational data mining over the last decade owing to their ease of application. Many machine learning techniques, in either classification or prediction style, are used to predict the performance of a required parameter, such as a student or institutional grade, with the other parameters as reference. In this paper, the total CGPA is used as the reference parameter, and the final grade is estimated from the summation of the various weighted grade points. The algorithms used in this estimation experiment are multilinear regression, support vector machine, K-means clustering, and random forest; several algorithms are used so that the prediction mechanism can be validated through high estimation accuracies that help in the evaluation of the grade.
Fig. 1 Grade prediction based on estimated CGPA
4.4 Multilinear Regression Linear regression identifies a linear relationship between the independent variables (X) and the dependent variable (Y) to be forecasted; from a visual perspective, we seek the line with the shortest overall distance to the data points compared with all other lines. While simple linear regression handles only one independent variable, multiple regression extends it to use many variables to predict Y. In our work, 2400 samples are considered, with each criterion's weighted grade points in each column as the independent variables and the final CGPA, along with the grade, as the dependent parameters. In this process, each criterion is varied over its probable values from the minimum to the maximum while the other criterion values are held constant, which helps assess the grade variation caused by the deviation of any criterion from its standard score. The regression algorithm achieved a maximum accuracy of up to 100%, and the predicted grade graph is shown in Fig. 1. It is observed that the maximum variation of the CGPA occurs in the range 1.8-2.2, for which the grades fall between 1 and 3; very few institutions reach a grade of 4, and their CGPA is beyond 2.5.
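A minimal sketch of this regression setup is shown below; it is not the authors' code, and the synthetic criterion scores are an assumption used only to make the example self-contained. Because the CGPA is an exact linear combination of the criterion scores (Eq. 2), the fitted model recovers it almost perfectly, which is consistent with the near-100% accuracy reported above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# 2400 hypothetical institutions; CP1..CP7 drawn up to the maxima of Table 9
X = rng.uniform(0, [600, 800, 1000, 400, 400, 400, 400], size=(2400, 7))
y = X.sum(axis=1) / 1000.0           # institutional CGPA, Eq. (2)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
print("R^2 on the 20% test split:", model.score(X_te, y_te))
```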
4.5 Support Vector Machine Algorithm In the SVM approach, the institutional dataset is converted into a real-valued rating matrix with institute grades from 0 to 4. The dataset is split into 80% for training the model and 20% for testing the model accuracy. It can be observed from Table 10 that the precision and recall values are 0.93 and 1.00, respectively, for the 444 samples that
Table 10 SVM algorithm outcomes

Class | Precision | Recall | F1_score | Support
0 | 0.00 | 0.00 | 0.00 | 2
1 | 0.00 | 0.00 | 0.00 | 24
2 | 0.93 | 1.00 | 0.96 | 444
3 | 0.00 | 0.00 | 0.00 | 10
Accuracy | | | 0.93 | 480
lead to an F1-score of 0.96. The F1-score summarizes the accuracy of the test: precision is the ratio of the number of correctly identified positive results to the total number of results identified as positive (which includes those identified incorrectly), and recall is the ratio of the number of correctly identified positive results to the number of all samples that are actually positive. An accuracy of 93% is observed with the SVM algorithm.
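A minimal sketch of this classification step is given below; the synthetic data and the binning of the CGPA into four grade classes are assumptions, and the printed report mirrors the layout of Table 10.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.uniform(0, [600, 800, 1000, 400, 400, 400, 400], size=(2400, 7))
cgpa = X.sum(axis=1) / 1000.0
g = np.digitize(cgpa, bins=[1.5, 2.5, 3.5])   # hypothetical grade classes 0..3

X_tr, X_te, g_tr, g_te = train_test_split(X, g, test_size=0.2, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_tr, g_tr)
print(classification_report(g_te, clf.predict(X_te), zero_division=0))
```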
4.6 K-Means Clustering K-means is an unsupervised learning algorithm, but the advantage of clustering is that it predicts the group of institutes falling within a certain grade. To cover grades in the range 0-3, the number of clusters is also set to 4 in this implementation. The 2400 samples are grouped into 4 clusters, in which 1149 samples fall in cluster 3 with an expected grade score of 3, 135 samples fall in cluster 2 with an expected grade score of 2, 492 samples fall in cluster 1 with an expected grade score of 1, and 624 samples fall in cluster 0 with an expected grade score of 0. The outcome of the clustering algorithm is shown in Fig. 2.
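A minimal sketch of the clustering step, under the same synthetic-data assumption, is:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.uniform(0, [600, 800, 1000, 400, 400, 400, 400], size=(2400, 7))

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
# Number of institutions assigned to each of the four clusters
print(np.bincount(km.labels_))
```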
4.7 Random Forest Since this algorithm handles both regression and classification tasks and copes well with large datasets, it is an added advantage to choose it for grade prediction. As is customary, the entire dataset is divided into two sets, with 80% used as training data and the remaining 20% for testing. Similar to the SVM, the maximum F1-score observed in this case is 0.95, with an accuracy of 90%, as shown in Fig. 3.
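A minimal sketch of the random forest step (again on synthetic data) follows; the feature importances it reports offer one way of seeing which criterion drives the grade, in line with the discussion of criterion 3 in Sect. 5.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, [600, 800, 1000, 400, 400, 400, 400], size=(2400, 7))
g = np.digitize(X.sum(axis=1) / 1000.0, bins=[1.5, 2.5, 3.5])   # grade classes 0..3

X_tr, X_te, g_tr, g_te = train_test_split(X, g, test_size=0.2, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, g_tr)
print("test accuracy:", round(rf.score(X_te, g_te), 2))
print("criterion importances (CP1..CP7):", rf.feature_importances_.round(3))
```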
Fig. 2 Grade prediction over cluster of institutes
Fig. 3 Accuracy and prediction analysis of random forest algorithm
5 Results Analysis Early CGPA predictions are a valuable source for determining an institution's performance in comparison with others. In this study, we discussed ML techniques for predicting the CGPA of an institution or university, using multilinear regression, support vector machine, k-means clustering, and random forest to predict institutional or university performance on the various criteria.
The dataset is prepared from the available real-world empirical institutional and university grade values, which demonstrate the effectiveness of the prediction accuracy. From the prediction analysis, a few cases are considered, as shown in Table 11, and are discussed as follows:
Case 1: If the curricular aspects weighted grade points KAWGP1 = KAGP1 × W1 are taken as low as possible (say 100) while the other criteria keep their maximum scores, the CGPA reaches 3.5, i.e., grade A+.
Case 2: If the teaching-learning and evaluation weighted grade points KAWGP2 = KAGP2 × W2 are taken as low as possible (say 100) while the other criteria keep their maximum scores, the CGPA reaches 3.3, i.e., grade A+.
Case 3: If the research, innovation, and extension weighted grade points KAWGP3 = KAGP3 × W3 are taken as low as possible (say 100) while the other criteria keep their maximum scores, the CGPA reaches 3.1, i.e., grade A.
Case 4: If the infrastructure and learning resources weighted grade points KAWGP4 = KAGP4 × W4 are taken as low as possible (say 100) while the other criteria keep their maximum scores, the CGPA reaches 3.7, i.e., grade A++.
Case 5: If the student support and progression weighted grade points KAWGP5 = KAGP5 × W5 are taken as low as possible (say 100) while the other criteria keep their maximum scores, the CGPA reaches 3.7, i.e., grade A++.
Case 6: If the governance, leadership, and management weighted grade points KAWGP6 = KAGP6 × W6 are taken as low as possible (say 100) while the other criteria keep their maximum scores, the CGPA reaches 3.7, i.e., grade A++.
Case 7: If the institutional values and best practices weighted grade points KAWGP7 = KAGP7 × W7 are taken as low as possible (say 100) while the other criteria keep their maximum scores, the CGPA reaches 3.7, i.e., grade A++.
Table 11 Grade performance evaluation

Possible cases | CP1 | CP2 | CP3 | CP4 | CP5 | CP6 | CP7 | CGPA | G
KAWG|max | 600 | 800 | 1000 | 400 | 400 | 400 | 400 | 4 | NA
Wi|Tot | 150 | 200 | 250 | 100 | 100 | 100 | 100 | 1 | NA
Case 1 | 100 | 800 | 1000 | 400 | 400 | 400 | 400 | 3.5 | A+
Case 2 | 600 | 100 | 1000 | 400 | 400 | 400 | 400 | 3.3 | A+
Case 3 | 600 | 800 | 100 | 400 | 400 | 400 | 400 | 3.1 | A
Case 4 | 600 | 800 | 1000 | 100 | 400 | 400 | 400 | 3.7 | A++
Case 5 | 600 | 800 | 1000 | 400 | 100 | 400 | 400 | 3.7 | A++
Case 6 | 600 | 800 | 1000 | 400 | 400 | 100 | 400 | 3.7 | A++
Case 7 | 600 | 800 | 1000 | 400 | 400 | 400 | 100 | 3.7 | A++
From the above case analysis, one can understand that the grade is not greatly affected by a poor score in criteria 4, 5, 6, and 7, but Cr-3 plays a major role in shifting the grade from best to poor, followed by Cr-2 and Cr-1. The grade performance evaluation for all the cases mentioned is shown in Table 11. Hence, one should be cautious about the score of Cr-3, i.e., research, innovation, and extension, and keep monitoring it to maintain the grade level. The other two criteria that deserve close observation along with Cr-3 are curricular aspects (Cr-1) and teaching-learning and evaluation (Cr-2).
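The Case 1-7 values in Table 11 can be reproduced with a short sensitivity loop; a minimal sketch, with the maxima and the assumed minimum of 100 taken from the table, is:

```python
# Drop one criterion at a time to an assumed minimum of 100 while the others
# stay at their maxima, then recompute the institutional CGPA (Eq. 2).
kawg_max = [600, 800, 1000, 400, 400, 400, 400]   # maximum weighted grade points per criterion
total_w = 1000                                     # sum of the predetermined weightages

for k in range(7):
    cp = kawg_max.copy()
    cp[k] = 100
    print(f"Case {k + 1}: CGPA = {sum(cp) / total_w:.1f}")
```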
6 Conclusions In this work, the commonalities between NBA and NAAC accreditation were discussed, and the high-scoring practices common to both accreditation systems were identified. Further, a machine learning-based grade analysis for institutes and universities was carried out, and a grade-up strategy was discussed using different cases. Several types of ML algorithms were considered in order to examine their capability of estimating the grade easily and with high prediction accuracy. The analysis also revealed the importance of Cr-3, i.e., research, innovation, and extension, compared with the other criteria. The sustainability of a good grade lies in maintaining the best practices associated with each criterion. The reader should note that NIRF ranking is not part of accreditation but a procedure for ranking any university or institute based on best practices, for which the grades are the initial indicators in that process. This kind of analysis and feedback model can be used as a component of an early warning system that provides further motivation and warns institutions early if they need to improve on particular criteria. It also helps an institute determine its weak parameters so that more focus can be placed on them, enabling the interventions necessary to improve overall performance. In this way, the grade of an institute or university can be retained and raised further.
References 1. Manual for accreditation of undergraduate engineering programs. https://www.nbaind.org/ 2. National board of accreditation manual for UG engineering programs (Tier-I) https://www.nba ind.org/ 3. National institutional ranking framework 2020, department of higher education ministry of education government of India. http://naac.gov.in/index.php/en/ 4. Kumar A, Tiwari S, Chauhan AK, Ramswaroop A (2019) Impact of NIRF on research publications: a study on top 20 (ranked) Indian Universities. COLLNET J Scientometrics Inf Manage 13(2):219–229. https://doi.org/10.1080/09737766.2020.1741194 5. Anbalagan M, Tamizhchelvan M (2021) Ranking of Indian Institutions in Global and Indian Ranking system: a comparative study. Library Philosophy and Practice (e-journal). 5100. https://digitalcommons.unl.edu/libphilprac/5100
6. Kumar V, Yadav SB (2020) How efficient are university library portals of NIRF ranked Indian universities?: an evaluative study. DESIDOC J Libr Inf Technol 40(1):3–10. https://doi.org/10. 14429/djlit.40.1.14932 7. Chowdhury AR (2021) Global ranking framework and indicators of higher educational institutions: a comparative study. Library Philosophy and Practice (e-journal). 5268. https://digita lcommons.unl.edu/libphilprac/5268 8. NIRF ranking: a methodology for ranking of universities and colleges in India. https://www. nirfindia.org/Docs/Ranking%20Framework%20for%20Universties%20and%20Colleges.pdf 9. Mahendra Gowda RV (2020) A comparative analysis of NIRF ranking, NAAC accreditation and NBA accreditation. Int J Adv Sci Eng 7(1):1572–1578 10. Gholap P, Kushare PA (2019) A comparative study of accreditation grades of NAAC vis -a-vis NBA for quality improvement of higher education in India. Int J 360 Manage Rev 07(02):70–84. ISSN: 2320-7132 11. Vasudevan N, SudalaiMuthu T (2019) Development of a common framework for outcome based accreditation and rankings. In: Proceedings of 9th world engineering education forum 2019, WEEF 2019, Science Direct, pp 270–276 12. Balatsky EV, Ekimova NA (2020) Global competition of universities in the mirror of international rankings. Her Russ Acad Sci 90:417–427. https://doi.org/10.1134/S10193316200 40073 13. https://www.timeshighereducation.com/world-university-rankings. Accessed on 26-05-2022 14. https://www.shanghairanking.com/rankings/arwu/2021. Accessed on 26-05-2022 15. https://www.ariia.gov.in/. Accessed on 26-05-2022 16. https://roundranking.com/. Accessed on 26-05-2022 17. https://cwur.org/2022-23.php. Accessed on 26-05-2022 18. https://www.leidenranking.com/ranking/2021/list. Accessed on 26-05-2022 19. https://www.4icu.org/in/. Accessed on 26-05-2022 20. https://www.nirfindia.org/2021/UniversityRanking.html. Accessed on 26-05-2022 21. https://www.nirfindia.org/2021/EngineeringRanking.html. Accessed on 26-05-2022 22. https://www.nirfindia.org/2021/CollegeRanking.html. Accessed on 26-05-2022 23. De-Moya-Anegón F, Herrán-Páez E, Bustos-González A, Corera-Álvarez E, Tibaná-Herrera G, Rivadeneyra F (2020) Ranking iberoamericano de instituciones de education superior 2020 (SIR Iber). Granada: Ediciones Profesionales de la Information. ISBN: 978 84 120239 3 0. https://doi.org/10.3145/sir-iber-2020 24. Fauzi MA, Tan CNL, Daud M, Awalludin MMN (2020) University rankings: a review of methodological flaws. Issues Educ Res 30(1):79–96. http://www.iier.org.au/iier30/fauzi.pdf 25. Adwan AA, Areiqat AY, Zamil AMA (2021) Development of theoretical framework for management departments’ ranking systems in Jordanian Universities. Int J of High Educ 10(1):106–112. Published by Sciedu Press, ISSN 1927-6044 26. , Ravikumar K, Samanta S, Rath AK (2021) Impact of NAAC accreditation on quality improvement of higher education institutions in India: a case study in the State of Karnataka. Purushartha 14(1):34–49
Optimal Extraction of Bioactive Compounds from Gardenia and Ashwagandha Using Sine Cosine Algorithm Vanita Garg, Mousumi Banerjee, and Bhavita Kumari
Abstract Bioactive compounds are extracted from natural resources and have beneficial effects on human health. Ashwagandha and Gardenia are rich in crocin, phenolic compounds, saponins, and vitamin C, among others. The extraction process for Ashwagandha and Gardenia depends on several factors, e.g., the technique used, the raw material, and the organic solvent. In this paper, the sine cosine algorithm is used to optimize the extraction of bioactive compounds from Ashwagandha and Gardenia. To apply the sine cosine algorithm, the extraction problems for Ashwagandha and Gardenia are modeled as nonlinear multi-objective optimization problems. There are three objective functions for Gardenia, namely the yields of crocin, geniposide, and total phenolic compounds, and three for Ashwagandha, namely the yields of alkaloids, steroidal lactones, and saponins. The yields of these bioactive compounds depend on the extraction time, temperature, and ethanol concentration. The sine cosine algorithm is a nature-inspired, population-based algorithm whose inspiration comes from the mathematical properties of the sine and cosine trigonometric functions. SCA was used to solve the problem of extracting the maximum yield from Gardenia and Ashwagandha, and it outperformed other nature-inspired optimization techniques. Keywords Extraction optimization · Bioactive compounds · Nonlinear multi-objective optimization · Sine cosine algorithm · Population-based algorithm
V. Garg · M. Banerjee (B) · B. Kumari, Division of Mathematics, School of Basic and Applied Sciences, Galgotias University, Greater Noida, Uttar Pradesh, India, e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023, M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_9

1 Introduction to Operation Research Swarm intelligence is a way of exchanging information through cooperative communication between nature-inspired search agents. Communication between search agents is required to avoid local optima during the search process, and information exchange provides a guide for exploring promising regions of the search space. Various algorithms have been
published that describe swarm intelligence. These subclasses include some popular and recent algorithms:
• Evolutionary techniques such as genetic algorithms (GA) and differential evolution (DE).
• Swarm intelligence algorithms such as ant colony optimization (ACO), particle swarm optimization (PSO), and artificial bee colony (ABC).
• Physics-based techniques such as gravitational search algorithms (GSAs), colliding bodies optimization (CBO), and black holes (BHs).
Many healing properties have been discovered through research on the Ashwagandha and Gardenia plants; the benefits of Ashwagandha and Gardenia were described by Bhati et al. [1]. To improve the performance of BBO, Jain and Sharma [2] proposed an improved LX-BBO. Zhang et al. [3] proposed an algorithm for balancing exploration and exploitation. Hu et al. [4] proposed and tested improved memetic algorithms on 13 well-known standard benchmark functions and four technology-constrained optimization problems. The Laplacian migration operator proposed by Garg and Deep [5] is based on the real-coded Laplacian crossover. Bajpai et al. [6] studied the efficiency of different organic extracts derived from leaves. Koffi et al. [7] looked at the total content of phenolic compounds, antioxidant activity, reducing power, and ferrous ion. Kulkarni [8] aimed to provide students with basic knowledge and information about animal and tissue mounting and a basic understanding of the procedures involved. Mohamed et al. [9] presented an improved version of SCA that employs opposition-based learning (OBL) as a mechanism for exploring the search space and generating more precise solutions. Das et al. [10] applied the sine cosine algorithm, a population-based algorithm, to solve the short-term hydrothermal scheduling problem. Mirjalili [11] proposed the sine cosine algorithm as a population-based algorithm, and Li et al. [12] proposed a method for large-scale parameter optimization. Mirjalili and Lewis [13] proposed the gray wolf optimizer (GWO), a meta-heuristic inspired by gray wolves. Park et al. [14] used a variety of methods to show that an ethanol extract of Gardenia fruit (Gardenia jasminoides Ellis) has antiangiogenic properties. Lee et al. [15] noted that, globally, gastric cancer is the most common oncological disease and that, although its incidence has decreased dramatically in recent years, the survival rate remains alarming owing to poor diagnostic strategies. Chen et al. [16] investigated the anti-inflammatory properties of geniposide (GE), an iridoid glycoside compound extracted from Gardenia jasminoides Ellis (GJ) fruit, as well as its pharmacokinetic (PK) basis in adjuvant-induced arthritis (AA) rats. Ozaki et al. [17] investigated the use of genipap as a substitute for natural blue pigment in the food industry. Mohajeri and Nazari [18] looked into the protective effects of crocin, a unique water-soluble glycosylated carotenoid. Pham et al. [19] reported that crocin, a water-soluble carotenoid, is found in Gardenia (Gardenia jasminoides Ellis) fruits and saffron stigmas (Crocus sativus Linne), and that for crocin purification, Gardenia fruits are extracted with 50% acetone.
2 Sine Cosine Algorithm

In the sine cosine algorithm, new candidate positions are generated using the trigonometric sine and cosine functions. The algorithm was introduced by Mirjalili [11] for global optimization problems, similarly to other optimization algorithms. Like related methods, SCA searches for the best solution in a random manner while tracking the global best solution. In SCA, a destination point is regarded as the best global solution. SCA employs the following search equations to determine the positions of candidate solutions.
X_{i,j}^{t+1} = X_{i,j}^{t} + r_1 \sin(r_{2,j}) \left| r_{3,j} P_{i,j}^{t} - X_{i,j}^{t} \right|    (1)

X_{i,j}^{t+1} = X_{i,j}^{t} + r_1 \cos(r_{2,j}) \left| r_{3,j} P_{i,j}^{t} - X_{i,j}^{t} \right|    (2)

Here, X_{i,j}^{t} represents the jth dimension of the ith solution vector at the tth iteration, X_{i,j}^{t+1} its position at the (t + 1)th iteration, r_1, r_2, and r_3 are random numbers, and the position of the destination point is indicated by P_i. To use these equations, they are combined as follows:

X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} + r_1 \sin(r_{2,j}) \left| r_{3,j} P_{i,j}^{t} - X_{i,j}^{t} \right|, & r_4 < 0.5 \\ X_{i,j}^{t} + r_1 \cos(r_{2,j}) \left| r_{3,j} P_{i,j}^{t} - X_{i,j}^{t} \right|, & r_4 \geq 0.5 \end{cases}    (3)
where r_4 is a random variable in the interval [0, 1]. In the sine cosine algorithm, the operators r_1, r_2, r_3, and r_4 have the following roles. r_1 favors exploration during the first half of the iterations and exploitation during the remaining iterations; r_2 decides whether the movement is toward the current solution (exploitation) or away from it (exploration); r_3 applies a random weight to the destination point, which emphasizes exploration when r_3 > 1 and exploitation when r_3 < 1; and r_4, a random number drawn from a uniform distribution on (0, 1), handles the transition between the sine and cosine functions. The sine and cosine cyclic patterns can be used to reposition a solution around another, so the space between two solutions can be exploited. Solutions should also be able to search outside the space between their corresponding destinations, so that the search explores and exploits the space, finds the promising regions, and eventually converges toward the global optimum; the range of the sine and cosine terms in the equations balances exploration and exploitation. Exploration and exploitation are controlled during the search process by the random number r_1, computed as

r_1 = a - t \frac{a}{T}    (4)
where t indicates the current iteration, T denotes the maximum number of iterations, and a is a constant.

Algorithm 1: Pseudocode of the basic SCA
1. Initialize the population of candidate solutions using a uniform distribution
2. Evaluate the fitness of each candidate solution
3. Select the destination point x_D
4. Initialize the algorithm parameters α, ϕ, and c
5. Initialize the iteration count t = 0
6. While t < T
7.   For each candidate solution
8.     Update the solution using Eq. (3)
9.   End for
10.  Evaluate the fitness of the updated candidate solutions
11.  Update the destination point x_D
12.  Update the algorithm parameter α
13.  t = t + 1
14. End while
The SCA algorithm starts the optimization process with a random set of solutions. As the iteration counter increases, the ranges of the sine and cosine functions are modified so that they emphasize exploitation of the search space. By default, the SCA algorithm ends the optimization as soon as the number of iterations exceeds the maximum. Other termination conditions, e.g., the number of function evaluations or the accuracy of the obtained optimum, can also be considered.
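To make Algorithm 1 concrete, the following is a minimal sketch of the basic SCA loop, assuming NumPy; the population size, search bounds, the constant a, and the sphere test objective are illustrative assumptions rather than settings taken from this chapter.

```python
# Minimal SCA sketch implementing the updates of Eqs. (1)-(4).
import numpy as np

def sca(objective, dim, bounds, pop_size=30, max_iter=500, a=2.0):
    low, high = bounds
    X = np.random.uniform(low, high, (pop_size, dim))      # random initial population
    fitness = np.apply_along_axis(objective, 1, X)
    best = X[np.argmin(fitness)].copy()                    # destination point P
    best_fit = fitness.min()

    for t in range(max_iter):
        r1 = a - t * (a / max_iter)                        # Eq. (4): shrinking amplitude
        for i in range(pop_size):
            r2 = 2 * np.pi * np.random.rand(dim)
            r3 = 2 * np.random.rand(dim)
            r4 = np.random.rand(dim)
            sin_step = X[i] + r1 * np.sin(r2) * np.abs(r3 * best - X[i])   # Eq. (1)
            cos_step = X[i] + r1 * np.cos(r2) * np.abs(r3 * best - X[i])   # Eq. (2)
            X[i] = np.where(r4 < 0.5, sin_step, cos_step)                  # Eq. (3)
            X[i] = np.clip(X[i], low, high)                # keep solutions inside bounds
        fitness = np.apply_along_axis(objective, 1, X)
        if fitness.min() < best_fit:                       # update the destination point
            best_fit = fitness.min()
            best = X[np.argmin(fitness)].copy()
    return best, best_fit

if __name__ == "__main__":
    sphere = lambda x: float(np.sum(x ** 2))               # illustrative test objective
    solution, value = sca(sphere, dim=3, bounds=(-10, 10))
    print("best solution:", solution, "objective:", value)
```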
3 Problem Formulation for the Extraction of Compounds from Gardenia and Ashwagandha

The first step is to model the given data properly. The data include the yields of bioactive compounds obtained from Gardenia fruits, namely the yield of crocin (Y1), the yield of geniposide (Y2), and the yield of the total phenolic compounds (Y3), as well as the yields obtained from Ashwagandha plants, namely the yield of alkaloids (Y1), the yield of steroidals (Y2), and the yield of saponins (Y3). These bioactive compounds are affected by three independent variables: the intensity of EtOH (ethanol), denoted by X1, the extraction temperature X2, and the extraction time X3. Table 1 reproduces the levels of the independent variables (extraction parameters) and the responses of the dependent variables; the data are taken from Yang et al. [20]. The data are modeled using the least-squares fitting method. To represent the yields in terms of the independent variables, the following second-order polynomial is used:

Y_k = a_0 + \sum_{i=1}^{3} a_i X_i + \sum_{i=1}^{3} a_{ii} X_i^2 + \sum_{i \neq j = 1}^{3} a_{ij} X_i X_j    (5)
Table 1 Dependent variables' responses to the Gardenia extraction parameters, reproduced from Yang et al. [20] (yields in mg/g dry powder)

S. No. | EtOH X1 (%) | Temp X2 (°C) | Time X3 (min) | Crocin Y1 | Geniposide Y2 | Total phenolic compounds Y3
1  | 50   | 27.1 | 30   | 6.907 | 88.85  | 16.35
2  | 50   | 50   | 30   | 7.646 | 102.01 | 22.08
3  | 50   | 50   | 52.9 | 7.619 | 99.7   | 21.12
4  | 70   | 65   | 45   | 8.123 | 105.44 | 22.21
5  | 30   | 35   | 15   | 6.645 | 87.6   | 16.02
6  | 30   | 35   | 45   | 7.396 | 98.56  | 18.24
7  | 70   | 65   | 15   | 7.24  | 92.9   | 20.22
8  | 19.5 | 50   | 30   | 6.149 | 83     | 16.43
9  | 50   | 50   | 30   | 7.649 | 101.15 | 22.01
10 | 30   | 65   | 45   | 7.334 | 95.37  | 21.53
11 | 70   | 35   | 45   | 6.954 | 87.8   | 16.07
12 | 30   | 65   | 15   | 7.895 | 99.51  | 21.38
13 | 50   | 72.9 | 30   | 8.335 | 108.29 | 24.05
14 | 50   | 50   | 30   | 7.715 | 103.23 | 22.04
15 | 80.5 | 50   | 30   | 6.663 | 87.73  | 16.22
16 | 50   | 50   | 30   | 7.674 | 102.61 | 22.06
17 | 70   | 35   | 15   | 5.787 | 72.64  | 13.84
18 | 50   | 50   | 30   | 7.625 | 101.38 | 22.11
19 | 50   | 50   | 30   | 7.639 | 101.48 | 22.08
20 | 50   | 50   | 7.1  | 7.126 | 88.57  | 17.15
The model coefficients a_i, a_{ii}, and a_{ij} represent the linear, quadratic, and interaction coefficients of the model, respectively; Y_k represents the yield of the bioactive compound, a_0 is a constant, and X_i and X_j are the independent variables. The sine cosine algorithm is used to fit the data, where the objective function is

\min \sum_{i=1}^{n} (y_i - y_{\mathrm{true},i})^2    (6)

where n denotes the number of observations, y_i denotes the fitted value of the ith observation, and y_{\mathrm{true},i} denotes the observed value of the ith observation. The resultant equations for the yields of the bioactive compounds Y_1, Y_2, and Y_3 are as follows:
y_1 = 3.8384907903 + 0.0679672610 X_1 + 0.0217802311 X_2 + 0.0376755412 X_3 − 0.0012103181 X_1^2 + 0.0000953785 X_2^2 − 0.0002819634 X_3^2 + 0.0005496524 X_1 X_2 − 0.0009032316 X_2 X_3 + 0.0008033811 X_1 X_3    (7)

y_2 = 46.6564201287 + 0.6726057655 X_1 + 0.4208752507 X_2 + 0.9999909858 X_3 − 0.0161053654 X_1^2 − 0.0034210643 X_2^2 − 0.0116458859 X_3^2 + 0.0122000907 X_1 X_2 − 0.0095644212 X_2 X_3 + 0.0089464814 X_1 X_3    (8)

y_3 = −6.3629169281 + 0.4060552042 X_1 + 0.3277005337 X_2 + 0.3411029105 X_3 − 0.0053585731 X_1^2 − 0.0020487593 X_2^2 − 0.0042291040 X_3^2 + 0.0017226318 X_1 X_2 − 0.0011990977 X_2 X_3 + 0.0007814998 X_1 X_3    (9)
Quadratic relationships and interactions are represented by the quadratic and multiple-factor terms in the response equations. This also shows that the responses do not always follow linear relationships with the factors. This formulation is used when multiple factors, e.g., extraction time and extraction temperature, are altered at the same time and each factor can have a different level of impact. For response Y_1, the extraction temperature X_2 and the extraction time X_3 interact positively, whereas they interact negatively for responses Y_2 and Y_3. Data for Ashwagandha are taken from Shashi [21]. The above equations show that each independent variable, i.e., the ethyl alcohol concentration (X_1), the extraction temperature (X_2), and the extraction time (X_3), has a positive effect on the bioactive compounds.
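The least-squares fit of Eq. (5) that yields coefficient vectors such as those in Eqs. (7)–(9) can be sketched as follows, assuming NumPy and using the Table 1 data for the crocin yield; note that np.linalg.lstsq solves Eq. (6) directly and is used here only as an illustrative stand-in for the SCA-based fitting described above, so the recovered coefficients may differ slightly in the last digits.

```python
# Ordinary least-squares fit of the second-order polynomial model of Eq. (5).
import numpy as np

# Table 1 rows: EtOH X1, temperature X2, time X3, crocin yield Y1.
data = np.array([
    [50, 27.1, 30, 6.907], [50, 50, 30, 7.646], [50, 50, 52.9, 7.619],
    [70, 65, 45, 8.123],   [30, 35, 15, 6.645], [30, 35, 45, 7.396],
    [70, 65, 15, 7.240],   [19.5, 50, 30, 6.149], [50, 50, 30, 7.649],
    [30, 65, 45, 7.334],   [70, 35, 45, 6.954], [30, 65, 15, 7.895],
    [50, 72.9, 30, 8.335], [50, 50, 30, 7.715], [80.5, 50, 30, 6.663],
    [50, 50, 30, 7.674],   [70, 35, 15, 5.787], [50, 50, 30, 7.625],
    [50, 50, 30, 7.639],   [50, 50, 7.1, 7.126],
])
X1, X2, X3, y1 = data[:, 0], data[:, 1], data[:, 2], data[:, 3]

# Design matrix with constant, linear, quadratic, and interaction terms,
# in the same order as the terms of Eq. (7).
A = np.column_stack([np.ones_like(X1), X1, X2, X3,
                     X1**2, X2**2, X3**2, X1*X2, X2*X3, X1*X3])

coeffs, residuals, rank, _ = np.linalg.lstsq(A, y1, rcond=None)  # minimizes Eq. (6)
print("fitted coefficients (a0, a1, ...):", np.round(coeffs, 10))
```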
4 Mathematical Formulation of the Problem of Ashwagandha

This section formulates the problem of extracting compounds from Ashwagandha roots. Shashi [21] generated the data using high-pressure liquid chromatography (HPLC). HPLC is based on the chromatography principle: chromatography is a separation technique that separates mixtures based on the affinity of the individual components. Table 2 shows the data generated by this method. The coefficients of Eq. (10) are fitted using the least-squares fitting method. The bioactive compounds extracted from Ashwagandha roots, namely Withaferin-A (Y1) and Withanolide-A (Y2), are the dependent variables.
Table 2 Responses of the dependent variables to the extraction parameters of Ashwagandha (reproduced)

S. No. | MeOH X1 (%) | Temperature X2 (°C) | Withaferin-A yield Y1 (%) | Withanolide-A yield Y2 (%)
1  | 80.0 | 30.0 | 0.055746 | 0.052
2  | 55.0 | 55.0 | 0.061624 | 0.061
3  | 30.0 | 30.0 | 0.056002 | 0.037
4  | 55.0 | 19.7 | 0.04701  | 0.048
5  | 19.7 | 55.0 | 0.054492 | 0.036
6  | 55.0 | 55.0 | 0.05548  | 0.062
7  | 55.0 | 55.0 | 0.060528 | 0.062
8  | 55.0 | 55.0 | 0.058675 | 0.063
9  | 80.0 | 80.0 | 0.045047 | 0.04
10 | 55.0 | 55.0 | 0.060426 | 0.061
11 | 90.4 | 55.0 | 0.070218 | 0.055
12 | 55.0 | 90.4 | 0.030424 | 0.039
13 | 30.0 | 80.0 | 0.02087  | 0.031
As a result, these two yields are treated as two separate response variables of the problem. They are derived from the methanol concentration (MeOH, X1) and the extraction temperature (X2). The goal is to extract the highest yields of the two bioactive compounds in the least amount of time possible. The yield of each compound is written as a second-order polynomial, a nonlinear function of the two independent variables:

Y_n = b_0 + \sum_{i=1}^{2} b_i X_i^2 + \sum_{i=1}^{2} b_{ij} X_i X_j    (10)
where Y_n is the nth yield, b_0 is a constant, and b_i and b_{ij} denote the quadratic and interaction coefficients of the model, respectively. The resultant equations for the yields Y_1 and Y_2 are as follows:

Y_1 = 0.0320790574 + 0.0001292883 X_1 + 0.0012707700 X_2 − 0.0000016421 X_1^2 − 0.0000195955 X_2^2 + 0.0000097853 X_1 X_2    (11)

Y_2 = −0.0457594117 + 0.002021757 X_1 + 0.0017892681 X_2 − 0.0000148519 X_1^2 − 0.000016451 X_2^2 − 0.0000164518 X_1 X_2    (12)
where X_i ∈ [30, 80] for i = 1, 2 (as given in Shashi [21]). After choosing the problem model, the problem is transformed into a multi-objective optimization problem in which both the Y_1 and Y_2 yields must be maximized at the same time. This multi-objective optimization problem is converted into a single-objective optimization problem using the weighted linear approach.
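A minimal sketch of this weighted linear (weighted-sum) scalarization is given below, assuming NumPy; the equal weights and the evaluation point are illustrative assumptions, and negating the weighted sum turns the joint maximization of Eqs. (11)–(12) into the minimization form handled by SCA.

```python
# Weighted-sum scalarization of the two Ashwagandha yield models.
import numpy as np

def yield_withaferin(x1, x2):        # Eq. (11)
    return (0.0320790574 + 0.0001292883 * x1 + 0.0012707700 * x2
            - 0.0000016421 * x1**2 - 0.0000195955 * x2**2
            + 0.0000097853 * x1 * x2)

def yield_withanolide(x1, x2):       # Eq. (12)
    return (-0.0457594117 + 0.002021757 * x1 + 0.0017892681 * x2
            - 0.0000148519 * x1**2 - 0.000016451 * x2**2
            - 0.0000164518 * x1 * x2)

def single_objective(x, w1=0.5, w2=0.5):
    """Negated weighted sum: maximizing both yields becomes one minimization."""
    x1, x2 = x
    return -(w1 * yield_withaferin(x1, x2) + w2 * yield_withanolide(x1, x2))

# Evaluate the scalarized objective at a candidate point inside [30, 80]^2.
print(single_objective(np.array([57.0, 37.0])))
```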
5 Results and Analysis

Tables 3 and 4 present the results of the sine cosine algorithm for the optimal extraction of bioactive compounds from Gardenia and Ashwagandha, respectively. As can be seen, the ten runs produce very similar results. This shows that the sine cosine algorithm can be used to determine the best extraction of bioactive compounds from Gardenia and Ashwagandha.

Table 3 Results obtained in 10 runs for solving the optimal extraction problem of Gardenia using the sine cosine algorithm
Objective function value | x1 | x2 | x3
−48.0276 | 57.1487 | 75 | 34.2391
−48.0276 | 57.1498 | 75 | 34.2107
−48.0276 | 57.143  | 75 | 34.1854
−48.0276 | 57.1466 | 75 | 34.2132
−48.0276 | 57.213  | 75 | 34.2007
−48.0276 | 57.126  | 75 | 34.1938
−48.0276 | 57.1948 | 75 | 34.2275
−48.0276 | 57.1816 | 75 | 34.1988
−48.0276 | 57.1817 | 75 | 34.1946
−48.0276 | 57.149  | 75 | 34.2256

Table 4 Results obtained in 10 runs for solving the optimal extraction problem of Ashwagandha using the sine cosine algorithm

Objective function value | x1 | x2
−0.052574 | 57.3595 | 37.7363
−0.05253  | 57.0244 | 35.5385
−0.052569 | 56.4957 | 36.9479
−0.052537 | 57.0996 | 38.6801
−0.052574 | 58.6875 | 36.9899
−0.052519 | 59.8615 | 35.7258
−0.052572 | 57.555  | 37.8409
−0.052559 | 56.2416 | 36.6808
−0.052559 | 58.14   | 38.1529
−0.052581 | 57.4776 | 37.1102
Fig. 1 Convergence curve of sine cosine for best Ashwagandha bioactive compounds extraction
6 Convergence Analysis

Figures 1 and 2 depict the convergence graphs of the SCA algorithm for the problem of bioactive compound extraction from Ashwagandha and Gardenia. The number of iterations is represented on the horizontal axis, while the value of the objective function at each iteration is represented on the vertical axis. The convergence graphs show the high convergence rate of the algorithm. The optimal extraction of bioactive compounds from Gardenia and Ashwagandha gives low-dimensional problems with three variables and two variables, respectively. Figures 1 and 2 demonstrate that SCA solves these low-dimensional problems efficiently in very few iterations.
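A minimal sketch of how such convergence curves can be drawn is shown below, assuming Matplotlib; the per-iteration best objective values would normally be recorded during an SCA run, and the synthetic trajectory used here is purely illustrative.

```python
# Plotting a convergence curve of best objective value versus iteration.
import matplotlib.pyplot as plt
import numpy as np

iterations = np.arange(1, 101)
best_values = -48.0276 + 5.0 * np.exp(-0.1 * iterations)   # placeholder trajectory

plt.plot(iterations, best_values)
plt.xlabel("Iteration")
plt.ylabel("Objective function value")
plt.title("Convergence of SCA")
plt.show()
```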
7 Conclusion

By applying these investigative ideas and targeted scientific analysis, drugs used in traditional medicine may prove to be a rich source of new medicines for treating intractable diseases. In this context, Ashwagandha and Gardenia are chemically rich, with varied contents of active compounds. The study in this paper clearly indicates the suitability of the sine cosine algorithm for solving the optimal extraction of bioactive compounds from Gardenia and Ashwagandha.
Fig. 2 Convergence curve of sine cosine for best Gardenia bioactive compound extraction
References

1. Bhatia P, Rattan SIS, Cavallius J, Clark BFC (1987) Withania somnifera (Ashwagandha), a so-called rejuvenator, inhibits growth and macromolecular synthesis of human cells. Med Sci Res
2. Jain NK, Sharma SN (2003) A textbook of professional pharmacy. Vallabh Prakashan
3. Zhang X, Wang D, Chen H, Mao W, Liu S, Liu G, Dou Z (2020) Improved Laplacian biogeography-based optimization algorithm and its application to QAP. Hindawi, pp 1–9
4. Hu Z, Cai X, Fan Z (2014) An improved memetic algorithm using ring neighborhood topology for constrained optimization. Soft Comput 18:2023–2041
5. Garg V, Deep K (2016) Performance of Laplacian biogeography-based optimization algorithm on CEC 2014 continuous optimization benchmarks and camera calibration problem. Swarm Evol Comput 7:12–45
6. Bajpai VK, Rahman A, Shukla S, Mehta A, Shukla S, Arafat SMY, Rahman MM, Ferdousi Z (2009) Antibacterial activity of leaf extracts of Pongamia pinnata from India. Pharm Biol 47:1162–1167
7. Koffi E, Sea T, Dodehe Y, Soro S (2010) Effect of solvent type on extraction of polyphenols from twenty three Ivorian plants. J Anim Plant Sci, pp 550–558
8. Kulkarni SK (1993) Handbook of experimental pharmacology, 2nd edn. Vallabh Prakashan, New Delhi
9. Mohamed AE, Oliva D, Xiong S (2017) An improved opposition-based sine cosine algorithm for global optimization. Expert Syst Appl 12:484–500
10. Das S, Bhattacharya A, Chakraborty AK (2017) Solution of short-term hydrothermal scheduling using sine cosine algorithm. Soft Comput 6:1–19
11. Mirjalili S (2016) SCA: a sine cosine algorithm for solving optimization problems. Knowl-Based Syst 96:120–133
12. Li S, Fang H, Liu X (2018) Parameter optimization of support vector regression based on sine cosine algorithm. Expert Syst Appl 91:63–77 13. Mirjalili S, Lewis A (2014) Grey wolf optimizer. Adv Eng Softw 69:46–61 14. Park EH, Joo MH, Kim SH, Lim CJ (2003) Antiangiogenic activity of Gardenia jasminoides fruit. Phytother Res 17:961–962 15. Lee JH, Lee DU, Jeong SC (2009) Gardenia jasminoides Ellis’s ethanol extract and its constituents reduce the risks of gastritis and reverse gastric lesions in rats. Food Chem Toxicol 47:1127–1131 16. Chen J, Wu H, Li H, Hu S, Dai M, Chen J (2015) Anti-inflammatory effects and pharmacokinetics study of geniposide on rats with adjuvant arthritis. Int Immunopharmacol 20:46–53 17. Ozaki A, Kitano M, Furusawa N, Yamaguchi H, Kuroda K, Endo G (2002) Genotoxicity of gardenia yellow and its components. Food Chem Toxicol 40:1603–1610 18. Mohajeri D, Nazari M (2012) Inhibitory effect of crocin on hepatic steatosis in the rats fed with high fat diet. 11:2373–2379 19. Pham TQ, Cormier F, Farnworth E, Tong VH, Calsteren MR (2000) Antioxidant properties of crocin from Gardenia jasminoides Ellis and study of the reactions of crocin with linoleic acid and crocin with oxygen. J Agric Food Chem 48:1455–1461 20. Yang B, Liu X, Yanxiang G (2009) Extraction Optimization of bioactive compounds (crocin, geniposide and total phenolic compounds) from Gardenia (Gardenia jasminoides Ellis) Fruits with response surface methodology. Innov Food Sci Emerg Technol 10:610–615 21. Shashi B (2011) New real coded genetic algorithms and their application to bio related problem. Ph.D. thesis. Indian Institute of Technology Roorkee
A Systematic Review of User Authentication Security in Electronic Payment System

Md. Arif Hassan and Zarina Shukur
Abstract Recently, security has become an increasingly crucial component of any financial organization, and as a result, the need for authentication has expanded. To ensure the security of electronic transactions, several authentication mechanisms have been created. The present paper provides a comprehensive and systematic literature review of articles published in the area of electronic payment systems. It reviews papers selected from 2013 to 2021 to identify the user authentication techniques used in, and the threats to, electronic payment methods, which guides researchers toward future directions. In addition, this article presents the chosen articles with a comparative table of multifactor security measures and analyzed problems. Such findings not only outline the latest research efforts but also indicate possible gaps and future directions for study by suggesting a research agenda. The findings of this study should be used as the basis for potential electronic payment analysis and related topics. Furthermore, all papers are categorized according to the origin of the research, publishing sources and country of affiliation of the authors, publishing journal, and year of publication. This paper also provides an exhaustive analysis of various models of attacks on authentication systems.

Keywords User authentication · Knowledge factor · Ownership · Biometric · Multifactor authentication
Md. A. Hassan (B) · Z. Shukur
Center for Cyber Security, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
e-mail: [email protected]
Z. Shukur
e-mail: [email protected]
1 Introduction

Modern technology innovations are continuously improving, and their effect is vigorously changing social, economic, and cultural interactions. E-services, such as e-commerce and e-payment, make life much simpler as they are time saving, convenient, and accessible [1–4]. The development of e-payment services is exponential because of their speed, user-friendliness, accessibility, digitization, and precious time-saving properties. There has been an increase in the number of consumers using mobile devices to perform various online transactions and shopping tasks, such as reservations, ordering, mobile cash transfers, and payments. The e-payment system requires an electronic payment form [5–7] that enables individuals to purchase products and services online [8–10]. Several e-payment systems have been developed worldwide. Due to the increase in security attacks against electronic payment systems, there have been significant changes in electronic transactions. Many of these threats arose because of the vulnerabilities of online application authentication schemes. When conducting transactions electronically, the first step is always authentication. It is the primary concern of the consumer and can be achieved by providing a password, a unique token, or biometrics. This authentication is confirmed before the user can access details. Authentication methods that hinge on more than a single element are harder to compromise than one-component systems. To make authentication effective, an MFA component is required in order to increase the security of electronic transactions. This article aims to analyze recently released papers on user authentication and threats in electronic payments; we looked at several basic approaches, namely knowledge-based, ownership-based, and multifactor authentication. Consequently, the article has gathered and examined many recent papers on authentication systems and threats to electronic payments. This paper provides an in-depth study of the MFA security and interface options of financial institutions in different organizations. Likewise, it investigates the vulnerabilities of electronic payment methods that could influence online transactions. This article selected 142 academic publications from different continents worldwide.
2 Material and Methods

The selection of the method was driven by the research questions, from which a set of search terms referred to as keywords was extracted. The keywords and relevant initiatives that make up the research questions and that were used during the review protocol are: e-wallet, electronic payment system, online payment system, e-wallet security issues, and electronic payment system security issues.
2.1 Research Questions

This paper aims to consider the state of the user authentication mechanisms used by financial institutions in terms of acceptance, protection, and sophistication. We have raised three research questions. RQ1: What are the user authentication threats of the electronic payment system? RQ2: What are the countermeasures to minimize the threats? RQ3: What are the most used cryptographic functions in electronic payment systems? The third question examines several cryptographic functions to find out which are most used; the nominations are based on the selected studies.
2.2 Data Sources

To find relevant literature, information sources were selected from peer-reviewed journals, articles, conferences, and Internet sources published from 2013 to 2021. The following digital libraries were used as our basic sources: IEEE Xplore, Google Scholar, Science Direct, Springer, MDPI, the ACM digital library, and Taylor and Francis.
2.3 Search Steps

This article's search process consists of the following steps:
• The keywords electronic payments, user authentication, and electronic payment threats were used to search the titles and abstracts of the papers.
• Journals, conference papers, white papers, technical reports, workshops, and books, with available full-text versions (peer-reviewed, published, and pressed), were all considered.
2.4 Data Collection

For every included paper, two forms of data were gathered and included in the database for each individual text. For a journal paper, designations, classifications, and the impact factors were collected. For a conference paper, the conference names and conference years were collected.
2.5 Data Selection

All research publications retrieved with these keywords were screened using the following criteria:
• Publications from before 2013 and publications not in English were excluded
• Duplicate publications were excluded from the systematic literature analysis
• The research article mentions or discusses authentication security, threats, or vulnerabilities regarding the electronic payment system.
2.6 Data Extraction

Out of 472 research papers, 222 articles were found to be related to the abstract concepts of the research topic. After further screening of these 222 publications, 142 research publications were used for the research in question. The data selected for the primary analysis are presented in Table 1.
3 Results Analysis

The context of this review is very likely to offer some fascinating insights into the improvement of user authentication in electronic payments. As a result, we have studied the variations in the distribution of papers across the areas being analyzed. We examined the distribution of the electronic payment documents at five levels: origins, country, publications, sources, and year of publication.

Table 1 Information collected for the primary analysis
Sources | Found | Candidate | Elected
Google Scholar     | 109 | 40  | 22
IEEE               | 190 | 85  | 52
ACM                | 36  | 26  | 19
Science Direct     | 38  | 19  | 11
Springer           | 40  | 20  | 13
Taylor and Francis | 29  | 13  | 10
MDPI               | 30  | 19  | 15
Total              | 472 | 222 | 142
3.1 Articles of the Study Focused on Origins

The search results comprised 14 articles of Malaysian origin and 128 international articles. The most regularly reviewed articles are published in academic journals as well as information technology and banking journals. The classification of the papers depends on their theory, model, or structure, the constructs studied, the geographical location, and the possibilities for study that they have suggested.
3.2 Publication List of Articles

There are three types of publications: journals, conference proceedings, and technical reports. From the 142 research publications, there were 93 journal publications, 42 conference proceedings publications, and 7 technical reports.
3.3 Number of Papers Focused on Country

From the 142 research articles, 44 countries were found to contribute to electronic payment system research. At the country level, 47 countries were discussed, and the most frequently studied countries were India, Malaysia, the USA, China, Bangladesh, Nigeria, and the UK. Seven countries were the subject of two articles, whereas the remaining countries were the subject of single articles.
3.4 Sources for Research Publishing

From the 142 research publications, the sources with the most articles were IEEE and Google Scholar. IEEE contributed 52 articles, Google Scholar contributed 22 articles (including journal and conference papers), ACM 19 articles, MDPI 15 articles, Springer 13 articles, Science Direct 11 articles, and Taylor and Francis 10 articles.
3.5 Publication Year of Study

From the 142 research publications about user authentication of the electronic payment system, the most active year of electronic payment research was 2019, the top year, during which 25 papers were collected.
Table 2 Attack models on knowledge-based authentication

Threats | Refs.
Phishing attacks               | [16, 25, 54, 62, 65, 101, 103, 108–113, 132, 140]
Social engineering             | [11, 16, 23, 25, 30, 62, 63, 103, 106, 114–116]
Password reset attack          | [16, 25, 54, 62, 65, 101]
Shoulder surfing attack        | [16, 25, 54, 62, 65, 101]
Rainbow table attack           | [16, 21, 101, 125, 126]
Session hijacking              | [16, 27, 73, 118, 121–124]
Brute-force attack             | [16, 25, 54, 62, 65, 101]
Dictionary attack              | [19, 21, 26, 27]
Password guessing and cracking | [17–19]
4 User Authentication Threats of EPS

In general, there are three main methods employed to verify entities. The first method is the use of things the user knows, such as passwords and PINs. The second method is the use of things the user has, such as a device, token, or card. The third method is the use of biometrics. The threats to each type of authentication are presented as follows.
4.1 Knowledge-Based Authentication

Currently, knowledge-based authentication is commonly used throughout the community, as it is distinctive and operator-friendly [11–13]. However, it has been recognized that authentication with only one element cannot sufficiently provide security due to the number of security threats [15, 22]. In addition, recent studies on password habits revealed that 86% of user-selected passwords are incredibly vulnerable [32]. Consequently, an unauthorized user may attempt to gain access by using different kinds of attacks. Attack models on the knowledge-based authentication system are presented in Table 2.
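Although not prescribed by the surveyed papers, a common mitigation for several of the attacks in Table 2 (dictionary, brute-force, and rainbow-table attacks) is salted, iterated password hashing on the server side; the following minimal sketch uses PBKDF2 from the Python standard library, with an illustrative iteration count and salt size.

```python
# Salted, iterated password hashing with PBKDF2 (standard library only).
import hashlib, hmac, os

def hash_password(password, salt=None, iterations=310_000):
    salt = salt or os.urandom(16)                      # a unique salt defeats rainbow tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

def verify_password(password, salt, expected, iterations=310_000):
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)       # constant-time comparison

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
```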
4.2 Ownership-Based Authentication

Ownership-based authentication, the extra security measure that requires individuals to enter a code sent to their email or phone, has usually been effective in protecting usernames and passwords from phishing attacks. The use of ownership factors has reduced the occurrence of fraud but has not stopped it [23]. However, attackers are working hard to break this technique and have found numerous methods to hack it and to expose the sensitive information of individuals. Attack models on the ownership-based authentication system are presented in Table 3.
Table 3 Attack models on ownership-based authentication

Threats | Refs
Impersonation attacks     | [16, 25, 54, 62, 65, 101, 103, 108–113, 132]
Replay attacks            | [11, 16, 25, 30, 62, 63, 101, 106, 115, 116]
Masquerade attack         | [16, 25, 54, 61, 67, 97–99, 101]
Spoofing attack           | [16, 25, 54, 62, 65, 101]
Social engineering attack | [11, 16, 23, 25, 30, 62, 63, 101–103, 106, 114–116]
MITM attack               | [21, 28, 30, 31, 63, 100, 106–108, 116, 117, 127–130, 133–135]
Phishing attack           | [16, 25, 54, 62, 65, 101, 103, 108–113, 132]
Salami slicing attack     | [84]
Insider attack            | [35, 72, 96]
DoS/DDoS attacks          | [16, 101, 103, 108, 115]
Mobile phone lost         | [22, 36, 74]
Eavesdropping attack      | [12, 14, 35, 40, 47, 78]
Brute-force attack        | [16, 25, 54, 62, 65, 101]
Guessing attack           | [19, 21, 26, 27]
Shoulder surfing attack   | [17–19]
Malware                   | [23, 37, 50, 62, 104, 105, 108–110, 117, 118]
4.3 Multifactor Authentication

Multifactor authentication combines two or more of the three authentication measures that a user can present. It may be implemented in numerous ways, the most widely used being the combination of login credentials with additional information. We present the studies related to multifactor authentication and their threats as follows:
4.3.1 One-Time Password
An OTP method depends on the capability of hardware or software to generate a one-time code, which is then delivered to the server or device for verification [44, 45]. There are two primary kinds of OTP authentication tokens, namely hardware tokens (often recognized as “hard tokens”) and application tokens (often recognized as “soft tokens”) [46–50]. The attack models on OTP are presented in Table 4.
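As a concrete illustration of the hard/soft token mechanism, the following minimal sketch generates RFC 4226/6238-style HOTP and TOTP codes using only the Python standard library; the shared secret, time step, and digit count are illustrative assumptions rather than values mandated by any particular payment system.

```python
# HOTP (RFC 4226) and TOTP (RFC 6238) generation with the standard library.
import hashlib, hmac, struct, time

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password over an 8-byte counter."""
    msg = struct.pack(">Q", counter)                          # big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                                # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, time_step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password: HOTP over a time-derived counter."""
    counter = int(time.time()) // time_step
    return hotp(secret, counter, digits)

if __name__ == "__main__":
    shared_secret = b"illustrative-shared-secret"             # assumed pre-shared secret
    print("Current TOTP:", totp(shared_secret))
```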
Table 4 Attack models on one-time password

Attacks | Refs
Key loggers             | [32, 50]
Replay attacks          | [12, 29, 35, 40]
Shoulder surfing        | [34, 41, 47, 71, 91]
Stored browser password | [64, 69]
Brute-force attack      | [17, 42, 47, 68, 72, 91]
Dictionary attack       | [11, 17, 40, 72]
Phishing attack         | [17, 29, 40, 47, 68]
MITM                    | [36, 48, 117–120]

4.3.2 Biometric Authentication
Biometrics is a means to recognize customers through their biological features, such as fingerprint, palm, face, voice, hand geometry, tooth shape, handprint, retina, vein, signature, iris, ear, and gait recognition [49–51]. Every biometric feature is different and difficult to counterfeit [52–54]. As electronic payment data fraud increases, the use of traditional approaches for authentication, such as identification records and passwords, is no longer sufficient [55, 56, 60]. Table 5 presents the authentication techniques in electronic payment systems, whereas Table 6 presents the different biometric system attacks.
5 Countermeasures of Threats

In authenticated financial transactions, cryptographic techniques are extensively employed to guarantee security. They are all techniques for providing basic safety services such as integrity, authentication, and non-repudiation. The primary role of cryptographic algorithms is to conceal information: they are employed to keep information out of the hands of users who should not have it. Cryptographic methods convert the real information into an unreadable format, preventing attackers from accessing it. Plaintext refers to the original readable information, whereas ciphertext refers to the unreadable format of the message after transformation. Encryption and decryption are the two operations used by cryptographic algorithms to accomplish this. Encryption is the process of converting plaintext to ciphertext, while decryption is the process of converting the encrypted text back to plaintext. Cryptographic algorithms are divided into two categories: private-key encryption and public-key encryption. Table 7 shows the summary of countermeasures for authentication techniques against the attacks.
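The encrypt/decrypt round trip described above can be sketched as follows, assuming the third-party Python cryptography package; its Fernet recipe is authenticated private-key encryption (AES in CBC mode with an HMAC), and the SHA-256 call illustrates a hash-based integrity check.

```python
# Private-key encryption round trip plus a hash-based integrity check.
from cryptography.fernet import Fernet
import hashlib

key = Fernet.generate_key()                     # shared secret key (private-key encryption)
cipher = Fernet(key)

plaintext = b"transfer 100.00 to account 12345"
ciphertext = cipher.encrypt(plaintext)          # unreadable to an attacker without the key
recovered = cipher.decrypt(ciphertext)          # back to the original plaintext
assert recovered == plaintext

# A secure hash function provides an integrity fingerprint of the same message.
print("SHA-256:", hashlib.sha256(plaintext).hexdigest())
```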
Table 5 Authentication techniques in electronic payment systems

Method | Method type | Method role | Ref
User_id/name            | Knowledge      | User identification     | [65–67]
Password                | Knowledge      | User authentication     | [65, 67–69]
Security question       | Knowledge      | User authentication     | [65, 70]
Pattern recognition     | Knowledge      | User identification     | [33, 71, 132]
TOTP/HOTP               | Ownership      | Timing                  | [29, 72–75]
Token based             | Identification | User identification     | [33, 52, 68, 70, 76]
NFC/other mechanism     | Combination    | User authentication     | [77–79]
Location based          | Tracking       | Location authentication | [80]
Fingerprint recognition | Biometric      | User authentication     | [24, 34, 52, 58, 59, 76, 81–87]
Palm recognition        | Biometric      | User authentication     | [81]
Facial recognition      | Biometric      | User authentication     | [24, 58, 83, 88–91]
Iris recognition        | Biometric      | User authentication     | [92, 93]
Vein recognition        | Biometric      | User authentication     | [94]
Voice recognition       | Biometric      | User authentication     | [95]
Gait recognition        | Biometric      | User authentication     | [57, 94]
Retina recognition      | Biometric      | User authentication     | [64]
Signature recognition   | Biometric      | User authentication     | [30]
6 Most Used Cryptographic Functions in EPS

This section presents the cryptographic functions most used in relation to the electronic payment system; the nominations are based on the studies selected from 2013 to 2021. Cryptographic functions help in the fulfillment of the security goals of confidentiality, authenticity, and integrity in the authentication process for electronic payment systems.
Table 6 Summary of different biometric system attacks

Attacks | Attack points | Ref
Spoofing                    | User interface                        | [28, 52]
Zero-effort attack          | User interface                        | [59]
Replay attack               | Interface between modules             | [28, 45]
Hill climbing attack        | Interface between modules             | [49, 59, 83]
Denial-of-service attack    | Channel between modules               | [20]
Trojan horse attack/malware | Software modules                      | [23, 50]
Template database attack    | Attack in template database           | [59, 104, 105]
Function creep              | Attack in template database           | [28, 52, 59]
Side-channel attacks        | Template protection techniques module | [116, 131]
Table 7 Summary of countermeasures for authentication techniques against the attacks

No. | Countermeasures | Proposed technique | Refs
1 | Private-key encryption | Advanced encryption standard | [17, 22, 38, 39, 141]
  |                        | Data encryption standard     | [38, 66, 68, 78, 136]
2 | Public-key encryption  | Rivest–Shamir–Adleman        | [20, 29, 33, 43, 52, 67, 68, 72, 74, 78, 88, 136, 142]
  |                        | Elliptic curve cryptography  | [28, 34, 70, 82, 117]
3 | Secure hash function   | SHA-1                        | [24, 65, 70, 74, 77, 78, 137–139]
Asymmetric encryption functions, symmetric encryption functions, and hash functions are the most commonly utilized cryptographic functions in electronic payment systems. The cryptographic functions against threats are shown in Fig. 1 [84]. The analysis in Fig. 2 shows the frequency of studies on electronic payment authentication threats: about 13 studies proposed the RSA cryptographic function to secure payment systems. The key security services that come with RSA are privacy and secrecy, authentication, integrity, and non-repudiation [70], which make RSA an excellent public-key cryptosystem for security. However, private-key encryption was not widely used in electronic payment systems [20].
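For illustration, a minimal RSA public-key encrypt/decrypt sketch is given below, assuming the third-party Python cryptography package; the 2048-bit key size and OAEP padding parameters are common illustrative choices, not requirements drawn from the surveyed papers.

```python
# RSA public-key encryption and decryption with OAEP padding.
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

ciphertext = public_key.encrypt(b"one-time payment token", oaep)  # anyone can encrypt
plaintext = private_key.decrypt(ciphertext, oaep)                 # only the key owner can decrypt
print(plaintext)
```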
Fig. 1 Cryptographic functions against threats, adopted from [84] (the figure groups the functions as: public-key encryption (RSA cryptosystem, elliptic curve cryptography), private-key encryption (advanced encryption standard, data encryption standard), and hash algorithms (SHA-1, SHA-2, MD5))
Fig. 2 Most often used cryptographic functions in electronic payment systems (bar chart of the number of studies per cryptographic function: advanced encryption standard, data encryption standard, Rivest–Shamir–Adleman, elliptic curve cryptography, and SHA-1)
7 Conclusion

The key objective of this study was to examine and evaluate, in a systematic manner, the empirical progress in subjects related to user authentication of the electronic payment system, as well as to provide insights into knowledge-based, ownership-based, and multifactor authentication and the threats in this field. The research provides a systematic analysis of the user authentication literature on electronic payments published between 2013 and 2021. Since 2012, the number of conceptual and empirical studies in the area of electronic payment services has increased tremendously. Many of the studies highlight that the Internet will increase the use of electronic payment services in the near future and will significantly improve the adoption of digital devices. As is obvious from the literature, electronic payments have recently been the subject of much research. To improve the quality of authentication in electronic payments, many researchers have suggested numerous authentication algorithms, particularly for user authentication mechanisms. This analysis identifies different methods for user authentication and different user threats, which may require further research consideration in electronic payments.
References 1. Hassan A, Shukur Z, Hasan MK (2021) Enhancing Multi-factor user authentication for electronic payments. Lecture Notes Netw Syst 173(LNNS):869–882 2. Jun J, Cho I, Park H (2018) Factors influencing continued use of mobile easy payment service: an empirical investigation. Total Qual Manag Bus Excell 29:1043–1057 3. Oney E, Guven GO, Rizvi WH (2017) The determinants of electronic payment systems usage from consumers’ perspective. Econ Res Istraz 30:394–415 4. Li S (2021) Research on the Design of electronic payment system of financial company. In: 2021 2nd international conference on e-commerce and internet technology, pp 91–94 5. Uddin MS, Akhi AY (2014) E-Wallet system for bangladesh an electronic payment system. Int J Model Optim 4(3):216–219 6. Hassan A, Shukur Z, Hasan MK (2020) An Efficient secure electronic payment system for e-commerce. Computers 9(3):13 7. Zhang J, Luximon Y (2020) A quantitative diary study of perceptions of security in mobile payment transactions. Behav Inf Technol, 1–24 8. Rancha, Singh P (2013) Issues and Challenges of electronic payment systems. Int J Res Manag Pharmacy(IJRMP) 2(9):25–30 9. Xuanzhi L, Ahmad K (2019) Factors affecting customers satisfaction on system quality for e-commerce. Proc Int Conf Electr Eng Inf 2019(July):360–364 10. Liébana-Cabanillas F, García-Maroto I, Muñoz-Leiva F, Ramos-de-Luna I (2020) Mobile payment adoption in the age of digital transformation: the case of apple pay. Sustain 12(13):1– 15 11. Ometov A, Bezzateev S, Mäkitalo N, Andreev S, Mikkonen T, Koucheryavy Y (2018) Multifactor authentication: a survey. Cryptography 2(1):1 12. Kaur N, Devgan M (2015) A comparative analysis of various multistep login authentication mechanisms. Int J Comput Appl 127(9):20–26 13. Fan K, Li H, Jiang W, Xiao C, Yang Y (2017) U2F based secure mutual authentication protocol for mobile payment. ACM Int Conf Proc Ser Part F1277:1–6 14. Emeka BO, Liu S (2017) Security requirement engineering using structured object-oriented formal language for m-banking applications. In: Proceedings 2017 IEEE international conference on software quality, reliability and security. QRS 2017, 176–183 15. Ali MA, Arief B, Emms M, Van Moorsel A (2017) Does the online card payment landscape unwittingly facilitate fraud? IEEE Secur Priv 15(2):78–86 16. Enisa (2016) Security of mobile payments and digital wallets, no. December. European Union Agency for Network and Information Security (ENISA) 17. Sudar C, Arjun SK, Deepthi LR (2017) Time-based one-time password for Wi-Fi authentication and security. In: 2017 international conference on advances in computing, communications and informatics (ICACCI) 2017(Janua):1212–1215 18. Kogan D, Manohar N, Boneh D (2017) T/Key: second-factor authentication from secure hash chains Dmitry, 983–999 19. Isaac SZJT (2014) Secure mobile payment systems. J Enterp Inf Manag 22(3):317–345
20. Dwivedi A, Kumar S, Pandey SK, Dabra P (2013) A cryptographic algorithm analysis for security threats of semantic e-commerce web (SECW) for electronic payment transaction system. Adv Comput Inf Technol, 367–379 21. Yang W, Li J, Zhang Y, Gu D (2019) Security analysis of third-party in-app payment in mobile applications. J Inf Secur Appl 48:102358 22. Gualdoni J, Kurtz A, Myzyri I, Wheeler M, Rizvi S (2017) Secure online transaction algorithm: securing online transaction using two-factor authentication. Procedia Comput. Sci. 114:93–99 23. Khattri V, Singh DK (2019) Implementation of an additional factor for secure authentication in online transactions. J Organ Comput Electron Commer 29(4):258–273 24. Venugopal, Viswanath N (2016) A robust and secure authentication mechanism in online banking. In: Proceedings on 2016 online international conference on green engineering and technologies (IC-GET) 2016, pp 0–2 25. Roy S, Venkateswaran P (2014) Online payment system using steganography and visual cryptography. In: 2014 IEEE students’ conference on electrical, electronics and computer science SCEECS 2014, pp 1–5 26. Ataya MAM, Ali MAM (2019) Acceptance of website security on e-banking. a-review. In: ICSGRC 2019 IEEE 10th control and system graduate research colloquium, proceeding, no. August, pp 201–206 27. Kaur R, Li Y, Iqbal J, Gonzalez H, Stakhanova N (2018) A security assessment of HCE-NFC enabled e-wallet banking android apps. Proc Int Comput Softw Appl Conf 2:492–497 28. Chaudhry SA, Farash MS, Naqvi H, Sher M (2016) A secure and efficient authenticated encryption for electronic payment systems using elliptic curve cryptography. Electron Commer Res 16(1):113–139 29. Skraˇci´c K, Pale P, Kostanjˇcar Z (2017) Authentication approach using one-time challenge generation based on user behavior patterns captured in transactional data sets. Comput Secur 67:107–121 30. Ibrahim RM (2018) A review on online-banking security models, successes, and failures. In: 2018 international conference on electrical, electronics, computers, communication, mechanical and computing (EECCMC) IEEE, no. February 31. Tan SF, Samsudin A (2018) Enhanced security of internet banking authentication with extended honey encryption (XHE) scheme, pp 201–216 32. Bajwa G, Dantu R, Aldridge R (2015) Pass-pic: A mobile user authentication. In: 2015 IEEE international conference on intelligence and security informatics. World through an Alignment Technol. Intell. Humans Organ. ISI 2015, p 195 33. Vengatesan K, Kumar A, Parthibhan M (2020) Advanced access control mechanism for cloud based e-wallet, vol 31(August 2016). Springer International Publishing 34. Shaju S, Panchami V (2017) BISC authentication algorithm: an efficient new authentication algorithm using three factor authentication for mobile banking. In: Proceedings 2016 online international conference on green engineering and technologies, IC-GET 2016, pp 1–5 35. Mohammed, Yassin (2019) Efficient and flexible multi-factor authentication protocol based on fuzzy extractor of administrator’s fingerprint and smart mobile device. Cryptography 3(3):24 36. Eman DA, Alharbi T (2019) Two factor authentication framework using OTP-SMS based on blockchain. Trans Mach Learn Artif Intell 7(3) 37. Nwabueze EE, Obioha I, Onuoha O (2017) Enhancing multi-factor authentication in modern computing. Commun Netw 09(03):172–178 38. Wang F et al (2020) Identity authentication security management in mobile payment systems. J Glob Inf Manag 28(1):189–203 39. 
Emin H, Marc SJ (2019) Physical presence verification using TOTP and QR codes. In: International conference on ICT systems security and privacy protection—IFIP SEC 2019, Lisbon (Portugal) 40. Shukla V, Chaturvedi A, Srivastava N (2019) A new one time password mechanism for client-server applications. J Discret Math Sci Cryptogr 22(8):1393–1406 41. Mohan R, Partheeban N (2014) Secure multimodal mobile authentication using one time password. Int J Recent Technol Eng 1(1):131–136
42. Uymatiao MLT, Yu WES (2014) Time-based OTP authentication via secure tunnel (TOAST): a mobile TOTP scheme using TLS seed exchange and encrypted offline keystore,” ICIST 2014 - Proc. 2014 4th IEEE Int. Conf. Inf. Sci. Technol., pp. 225–229, 2014 43. B. Rajendran, A. K. Pandey, and B. S. Bindhumadhava, “Secure and privacy preserving digital payment. IEEE SmartWorld Ubiquitous Intell. Comput. Adv. Trust. Comput. Scalable Comput. Commun. Cloud Big Data Comput. Internet People Smart City Innov. SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI 2017, pp 1–5 44. Collins L (2013) Access controls. Cyber Secur IT Infrastruct Prot, pp 269–280 45. Huseynov E, Seigneur JM (2017) Context-aware multifactor authentication survey. Elsevier Inc 46. Esiner E, Hanley SH, Datta A (2016) DMZtore: a dispersed data storage system with decentralized multi-factor access control (Demo). Proc Int Conf Distrib Comput Syst 2016(Augus):757–758 47. Divya R, Kumarasamy SM (2015) Visual authentication using QR code to prevent keylogging. Int J Eng Trends Technol 20(3):149–154 48. R, Idayathulla CC (2019) Enhanced adaptive security system for SMS—based one time password. Int J Adv Res Ideas Innov Technol 5(4):538–541 49. Meng W, Wong DS, Furnell S, Zhou J (2015) Surveying the development of biometric user authentication on mobile phones. IEEE Commun. Surv. Tutorials 17(3):1268–1293 50. Sinigaglia F, Carbone R, Costa G, Zannone N (2020) A survey on multi-factor authentication for online banking in the wild. Comput Secur no. February, p 101745 51. Atanda AA (2019) Biometric Smartcards and payment disbursement: a replication study of building state capacity in India. J Dev Eff 11(4):360–372 52. Alibabaee A, Broumandnia A (2018) Biometric authentication of fingerprint for banking users, using stream cipher algorithm. J Adv Comput Res 9(4):1–17 53. F. Nizam, H. J. Hwang, and N. Valaei, Measuring the Effectiveness of E-Wallet in Malaysia, vol. 786. Springer International Publishing, 2019. 54. Alam SS, Ali MH, Omar NA, Hussain WMHW (2020) Customer satisfaction in online shopping in growing markets: an empirical study. Int J Asian Bus Inf Manag 11(1):78–91 55. Kim SS (2020) Purchase intention in the online open market: do concerns for E-commerce really matter? Sustain 12(3) 56. Malathi R, Raj RJR (2016) An integrated approach of physical biometric authentication system. Proc Comput Sci 85(Cms):820–826 57. Sharma L, Mathuria M (2018) Mobile banking transaction using fingerprint authentication. Proceedings on 2nd international conference on inventive systems and control ICISC 2018, no. Icisc, pp 1300–1305 58. Okpara OS, Bekaroo G (2017) Cam-wallet: fingerprint-based authentication in m-wallets using embedded cameras. IEEE Int Conf Environ Electr Eng 59. Yang W, Wang S, Hu J, Zheng G, Valli C (2019) Security and accuracy of fingerprint-based biometrics: a review. Symmetry (Basel) 11(2) 60. Sain M, Normurodov O, Hong C, Hui KL (2021) A survey on the security in cyber physical system with multi-factor authentication. Int Conf Adv Commun Technol ICACT 2021, 2021– Febru, pp 1322–1329. https://doi.org/10.23919/ICACT51234.2021.9370515 61. Shaji NA, Soman S, Science C (2017) Multi-factor authentication for net banking. Int J Syst Softw Eng 5(1):1–4 62. Hassan MA, Shukur Z (2019) Review of digital wallet requirements. In: 2019 international conference on cybersecurity, ICoCSec 2019, pp 43–48 63. Yildirim N, Varol A (2019) A research on security vulnerabilities in online and mobile banking systems. 
In: 7th international symposium on digital forensics and security ISDFS 2019, pp 1–5 64. Hammood WA, Abdullah R, Hammood OA, Mohamad S, Mohammed A (2020) A review of user authentication model for online banking system based on mobile IMEI number 65. Tyagi H, Rakesh N (2018) Enhanced online hybrid model for online fraud prevention and detection. Smart Innov Syst Technol 79:97–106
66. Be ABH, Balasubramanian R (2018) Developing an enhanced high-speed key transmission (EHSKT) technique to avoid fraud activity in E-commerce. Indones J Electr Eng Comput Sci 12(3):1187–1194 67. Oo KZ (2019) Design and implementation of electronic payment gateway for secure online payment system. Int J Trend Sci Res Dev 3(5):1329–1334 68. Taher KA, Nahar T, Hossain SA (2019) Enhanced cryptocurrency security by time-based token multi-factor authentication algorithm. In: 1st international conference on robotics, electrical and signal processing techniques ICREST 2019, pp 308–312 69. Jeong H, Jung H (2021) MonoPass: a password manager without master password authentication, pp 52–54. https://doi.org/10.1145/3397482.3450720 70. Sharma N, Bohra B (2017) Enhancing online banking authentication using hybrid cryptographic method. In: International conference on computational intelligence and communication technology, pp 1–8 71. Yusuf SI, Boukar MM, Mukhtar A, Yusuf AD (2019) User define time based change pattern dynamic password authentication scheme. In: 14th international conference on electronics computer and computation ICECCO 2018, pp 206–212 72. Aina F, Yousef S, Osanaiye O (2018) Design and implementation of challenge response protocol for enhanced e-commerce security, vol 3. Springer International Publishing 73. Chanajitt R, Viriyasitavat W, Choo KKR (2018) Forensic analysis and security assessment of android m-banking apps. Aust J Forensic Sci 50(1):3–19 74. Alhothaily A, Alrawais A, Song T, Lin B, Cheng X (2017) Quickcash: secure transfer payment systems. Sensors (Switzerland) 17(6):1–20 75. Pukkasenunk P, Sukkasem S (2016) An efficient of secure mobile phone application for multiple bill payments. In: Proceedings on IEEE 30th International Conference on Advanced Information Networking and Applications Workshops WAINA 2016, pp 487–432 76. Harish M, Karthick R, Rajan RM, Vetriselvi V (2019) A new approach to securing online transactions—the smart wallet 500(January), Springer Singapore 77. Song J, Yang F, Choo KKR, Zhuang Z, Wang L (2017) SIPF: a secure installment payment framework for drive-thru internet. ACM Trans Embed Comput Syst 16(2) 78. Thammarat C (2020) Efficient and secure nfc authentication for mobile payment ensuring fair exchange protocol. Symmetry (Basel) 12(10):1–19 79. Cigoj P, Blažiˇc BJ (2015) An authentication and authorization solution for a multiplatform cloud environment. Inf Secur J 24(4–6):146–156 80. Journal UGCC (2020) A methodology for electronic money transaction security using multilayer security. UGC Care J. 04:1834–1842 81. Wang, He Q, Han Q (2017) Research on internet payment security based on the strong authentication of the timeliness and multi-factors 59(Emcm 2016):19–23 82. Khachane D, Sant Y, Sachan Y, Ghodeswar A (2018) Enhancing security of internet banking using biometrics. J Comput Eng 20(1):22–25 83. Ibrahim DR, Teh JS, Abdullah R (2020) Multifactor authentication system based on color visual cryptography, facial recognition, and dragonfly optimization. Inf Secur J 00(00):1–11 84. Ali G, Dida MA, Sam AE (2020) Two-factor authentication scheme for mobile money: a review of threat models and countermeasures. Futur. Internet 12(10):1–27 85. Hassan MA, Shukur Z (2021) A secure multi factor user authentication framework for electronic payment system 86. Juremi J (2021) A secure integrated e-wallet mobile application for education institution. Int Conf Cyber Relig 87. 
Iqbal S, Irfan M, Ahsan K, Hussain MA, Awais M, Shiraz M, Hamdi M, Alghamdi A (2020) A novel mobile wallet model for elderly using fingerprint as authentication factor. IEEE Access 8:177405–177423. https://doi.org/10.1109/ACCESS.2020.3025429 88. Gode P, Nakhate ST, Mane SS (2017) Authentication for mobile banking by using android based smart phones. Imp J Interdiscip Res 3:1314–1318 89. Houngbo PJ et al (2019) Embedding a digital wallet to pay-with-a-selfie, from functional requirements to prototype, vol 206. Springer International Publishing
90. Aria M, Agnihotri V, Rohra A, Sekhar R (2020) Secure online payment with facial recognition using MTCNN. Int J Appl Eng Res 15(3):249–252 91. Azimpourkivi M, Topkara U, Carbunar B (2017) Camera based two factor authentication through mobile and wearable devices. ACM Interactive Mobile Wearable Ubiquitous Technol 1(3) 92. Gupta A, Kaushik D, Gupta S (2020) Integration of biometric security system to improve the protection of digital wallet. SSRN Electron J ICICC, 1–6 93. Islam I, Munim KM, Islam MN, Karim MM (2019) A proposed secure mobile money transfer system for SME in Bangladesh: an industry 4.0 perspective. 2019 Int Conf Sustain Technol Ind 4.0, STI 2019, pp 1–6 94. Noh KS (2016) A study on the authentication and security of financial settlement using the finger vein technology in wireless internet environment. Wirel Pers Commun 89(3):761–775 95. Hashan B, Abeyrathna Y, Kaluaratchi M, Thelijjagoda S (2019) VoiceNote: an intelligent tool for monetary transactions with integrated voice support. In: 2019 international research conference on smart computing and systems engineering, pp 119–125 96. Aigbe P, Akpojaro J (2014) Analysis of security issues in electronic payment systems. Int J Comput Appl 108(10):10–14 97. Xin T, Xiaofang B (2014) Online banking security analysis based on STRIDE threat model. Int J Secur Appl 8(2):271–282 98. Qiao Z, Yang Q, Zhou Y, Zhang M (2021) Improved secure transaction scheme with certificateless cryptographic primitives for IoT-based mobile payments. IEEE Syst J, 1–9. https:// doi.org/10.1109/JSYST.2020.3046450 99. Guan X, Xie SF, Liu F, Zhao HB, Liang Z (2021) Risk prediction in e-commerce mobile payment based on PSO-SVM. In: Proceedings of the 2021 international conference on bioinformatics and intelligent computing BIC 2021, pp 208–213 100. Bosamia M (2017) Mobile wallet payments recent potential threats and vulnerabilities with its possible security measures. In: International Conference on Soft Computing and its Engineering Applications CHARUSAT, Chang, India 101. Solat S (2017) Security of electronic payment systems: a comprehensive survey 102. Conteh NY, Schmick PJ (2016) Cybersecurity: risks, vulnerabilities and countermeasures to prevent social engineering attacks. Int J Adv Comput Res 6:31–38. https://doi.org/10.19101/ ijacr.2016.623006 103. Ali L, Ali F, Surendran P, Thomas B (2017) The effects of cyber threats on customer’s behaviour in e-banking services. Int J E-Educ E-Bus E-Manage E-Learn 7(1):70–78 104. Bezhovski Z (2016) The future of the mobile payment as electronic payment system. Eur J Bus Manag 8(8):2222–2839 105. Masihuddin M, Khan BUI, Mattoo MMUI, Olanrewaju RF (2017) A survey on e-payment systems: elements, adoption, architecture, challenges and security concepts. Indian J Sci Technol 10(20):1–19 106. Caldwell T (2015) Locking down the e-wallet. Comput Fraud Secur Bull 2012(4):5–8 107. Paymentsforum.uk (2015) The Open banking standard, p 11 108. Urs B-A (2015) Security issues and solutions in e-payment systems. Fiat Iustitia 1(1):172–179 109. Vimala V (2016) An evaluative study on internet banking security among selected Indian bank customers. Amity J Manag Res 1(1):63–79 110. Chun SH (2019) E-commerce liability and security breaches in mobile payment for e-business sustainability. Sustain 11(3) 111. Hassan A, Shukur Z, M K, Hasan ASAK (2020) A review on electronic payments security. Symmetry (Basel) 12(8):24 112. Pabian A, Pabian B, Reformat B (2020) E-customer security as a social value in the sphere of sustainability. 
Sustainability, pp 1–14 113. August M, Summary M (2017) Multi-faceted evolution of mobile payment strategy. Authentication Technol 114. Hassan A, Shukur Z, Hasan MK (2020) An improved time-based one time password authentication framework for electronic payments. Int J Adv Comput Sci Appl 11:359–366
115. Kouicem DE, Bouabdallah A, Lakhlef H (2018) Internet of things security: a top-down survey. Comput Netw 141:199–221 116. Sherif MH (2016) Protocols for electronic commerce 53(9) 117. Kisore NR, S S (2015) A secure SMS protocol for implementing digital cash system. In: 2015 international conference on advances in computing, communications and informatics, pp 1883–1892 118. PCI Security Standards Council LLC (2013) PCI mobile payment acceptance security guidelines for developers. Pci Dss Inf . Suppl, no. February, pp 0–27 119. US Payments Forum and Secure Technology Alliance (2018) Mobile and digital wallets: U.S. landscape and strategic considerations for merchants and financial institutions. No . January, pp 1–50 120. ECB (2013) Recommendations for the security of internet payments. no. January, pp 1–26 121. Santos J, Antunes M, Mangana J, Monteiro D, Santos P, Casal J (2018) Security testing framework for a novel mobile wallet ecosystem. In: Proceedings on 9th international conference on computational intelligence and communication networks, CICN 2017, vol 2018–Janua, pp 153–160 122. Roland M, Langer J, Scharinger J (2013) Applying relay attacks to Google Wallet. In: 2013 5th international workshop on near field communication NFC 2013, pp 1–6 123. Kwon Y (2021) Session details: session 7 software security and Malware 2021 124. Hu Y, Wang S, Tu GH, Xiao L, Xie T, Lei X, Li CY (2021) Security threats from bitcoin wallet smartphone applications, pp 89–100 125. Mazumder FK, Jahan I, Das UK (2015) Security in electronic payment transaction. Int J Sci Eng Res 6(2):955–960 126. Uzoka F (2016) Development of e-wallet system for tertiary institution in a developing country. Comput Sci Telecommun 3(3) (49):18–29 127. Dai W, Deng J, Wang Q, Cui C, Zou D, Jin H (2018) SBLWT: a secure blockchain lightweight wallet based on trustzone. IEEE Access 6(1):40638–40648 128. Alhothaily A, Alrawais A, Cheng X, Bie R (2015) A novel verification method for payment card systems. Pers Ubiquitous Comput 19:1145–1156 129. Dmitrienko A, Noack D, Yung M (2017) Secure wallet-assisted offline bitcoin payments with double-spender revocation. In: ACM on Asia conference on computer and communications security 130. Akinyokun N, Teague V (2017) Security and privacy implications of NFC-enabled contactless payment systems. ACM Int Conf Proc Ser 131. Rahaman S, Wang G, Yao D (2020) Security certification in payment card industry: Testbeds, measurements, and recommendations. Internet Secure, pp 481–498 132. El Orche A, Bahaj M (2019) Approach to use ontology based on electronic payment system and machine learning to prevent Fraud. ACM Int Conf Proc Ser 133. Huang TY, Huang C (2019) Fraud payment research- Payment through credit card. ACM Int. Conf. Proceeding, pp 189–194 134. Boureanu I, Chen L, Ivey S (2020) Provable-security model for strong proximity-based attacks: with application to contactless payments. In: Proceedings of the 15th ACM Asia conference on computer and communications security. ASIA CCS 135. Boureanu I, Chothia T, Debant A, Delaune S (2020) Security analysis and implementation of relay-resistant contactless payments. In: Proceedings of the 2020 ACM SIGSAC conference on computer and communications security, pp 879–898 136. P, D, Babu SS, Vijayalakshmi Y (2020) Enhancement of e-commerce security through asymmetric key algorithm. Comput Commun 137. Izhar A, Khan A, Khiyal MSH, Javed W, Baig S (2016) Designing and implementation of electronic payment gateway for developing countries, pp 3643–3648 138. 
Hussain S, Khan BUI, Anwar F, Olanrewaju RF (2018) Secure annihilation of out-of-band authorizationfor online transactions. Indian J Sci Technol 11:1–9 139. Kogan D, Manohar N, Boneh D (2017) T/Key: Second-factor authentication from secure hash chains Dmitry, pp 983–999
138
Md. A. Hassan and Z. Shukur
140. Damodaram R (2016). Study on phishing attacks and antiphishing tools. Int Res J Eng Technol 3 141. Yoo C, Kang BT, Kim HK (2015) Case study of the vulnerability of OTP implemented in internet banking systems of South Korea. Multimed Tools Appl 74:3289–3303 142. Yeh KH (2018) A secure transaction scheme with certificate less cryptographic primitives for IoT-based mobile payments. IEEE Syst J 12(2):2027–2038
Skin Diseases Detection with Transfer Learning Vo Van-Quoc and Nguyen Thai-Nghe
Abstract With extreme changes in climate, weather, and environmental influences, skin diseases have become increasingly dangerous and common. The identification and detection of skin diseases are therefore important research topics that help patients obtain timely prevention and treatment. In this study, we propose a transfer learning approach to detect and identify skin diseases. The input image is preprocessed, segmented, and passed to a model transferred from the pre-trained VGG19 deep learning network. Different scenarios were tested on a set of images collected from the International Skin Imaging Collaboration (ISIC). This dataset contains a total of 2655 images (including male and female patients), classified into 3 categories: carcinoma disease (1127 images), skin hemorrhagic disease (1006 images), and normal skin (522 images). The data were labelled by experts in the field of dermatology. The experimental results show that transfer learning from the pre-trained VGG19 model is very positive, with accuracy and F1 measures of 0.85–0.84, 0.93–0.93, and 0.87–0.86, respectively, for the 3 test scenarios. Keywords Skin diseases · Skin carcinoma · Skin hemorrhagic · Transfer learning · VGG19 · Image segmentation
1 Introduction Human skin is the largest part of the body and is an important part that covers and protects the body from external and internal influences. Because of its role in communicating with the environment, the skin plays an important immune role in defending the body against pathogens and excessive water loss. However, human V. Van-Quoc Nhi Dong Hospital, 345 Nguyen Van Cu Street, Can Tho City, Viet Nam N. Thai-Nghe (B) Can Tho University, 3-2 Street, Can Tho City, Viet Nam e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_11
skin can be affected by many factors such as alcohol drinking, lifestyle, viruses, tobacco, working environment, and ultraviolet rays. Human diseases may directly come from the skin. Skin diseases have impacted nearly 33% of the total population of about 1.9 billion people. About 1.79% of all diseases worldwide are skin diseases. For example, Australia is one of the countries with the highest death rate from skin cancer in the world. The number of people who die each year from skin cancer in Australia is about 1890 people, and in the US, it is 7180 people (in 2021).1 Skin diseases such as inflammation, infection, cancer have affected the daily life of people of all ages. Diseases also cause serious consequences such as cancer, viscera damage, depression, and even death. This is the reason why it is necessary to detect and identify skin diseases early to have timely solutions to support patients. In this study, we propose using a transfer learning approach to identify common skin diseases such as carcinoma and hemorrhagic. The input images will be preprocessed, segmented, and used transfer learning from the pre-trained VGG19 deep learning model to identify the skin diseases. The research results will help users to detect the disease early based on the pictures taken of symptoms on the skin so that users can make decisions in the treatment of skin diseases. The article consists of 5 parts. Part 1 is an introduction. Related studies will be presented in Sect. 2. Section 3 is the implementation methodology. The next part is the experiment, and finally the review.
2 Related Work There are many related studies in the field of skin disease diagnosis with many different approaches. In the study [1], the authors applied deep learning and SVM techniques to find and detect skin melanoma; the study performed a feature transformation, and this transformation helps in the treatment of melanoma. Disease imaging is like observing pictures in nature. The proposed approach is to mimic the procedure that clinicians use to describe skin lesion patterns. And this method has yielded good results with an accuracy of 93.1%. Study [2] proposed an integrated model for skin lesion boundary segmentation and skin lesion classification proposed by cascading novel deep learning networks. In the first phase, a new full-resolution complex network (FrCN) is used to demarcate the boundaries of skin lesions from dermatoscopy images. The segmented lesions are then transferred into a network such as ResNet-50 for classification. The presegmentation process allows ResNet-50 to extract more specific and representative features from skin lesions and use them to improve the classification. Evaluation through the F1 measure reached 81.57% and 75.75%, respectively. In the study [3], the authors proposed a multi-tasking deep neural network architecture to simultaneously solve the problem of classification and segmentation of skin lesions. To get the most out of the features from different tasks and thus get richer 1
https://www.skincancer.org/skin-cancer-information/skin-cancer-facts/.
knowledge of the pattern, they designed a feature transfer module to pass messages between the shard branch and the classifier branch. Since the feature transfer module is not always useful and may involve individual samples, gate functions are used to control the transmission of information. Thus, features from one task are learned and selectively transferred to another, and vice versa, which effectively improves the performance of both tasks. The results are obtained with an accuracy of 80%. In the study [4], an intelligent digital diagnostic scheme is proposed to improve the classification accuracy of many diseases. The multi-class multilevel (MCML) classification algorithm inspired by the “divide and conquer” rule was discovered to solve research challenges. MCML classification algorithm is implemented using traditional machine learning and advanced deep learning approaches. Improved techniques are proposed to remove noise in traditional machine learning methods. The proposed algorithm is evaluated on 3672 classified images, collected from different sources, and the diagnostic accuracy achieved is 96.47%. Other related studies published in [5–10] also gave very positive results in the diagnosis of skin diseases and other works [11–16]. In this study, we propose using the transfer learning approach from the pre-trained VGG19 deep learning model to take advantage of the trained parameters on more than one million different images to identify the skin diseases which are carcinoma and hemorrhagic.
3 Proposed Approach We propose a combination of steps such as image preprocessing, image segmentation, and transfer learning to retrain the models from existing deep learning models [17–19].
3.1 Skin Images Preprocessing In the preprocessing step, to remove confounding components such as hair on the skin, this study performed the following steps, as illustrated in Fig. 1 (a code sketch is given after the list):
• Convert the RGB color image to a grayscale image;
• Apply the Black-Hat filter on the grayscale image to find the hairlines;
• Enhance the response to highlight the hairlines using a threshold (thresholding);
• Recreate the image after the hair has been removed;
• Finally, convert the result back to RGB (using the inpaint method in the CV2 library).
Also, at this stage, if the images are not uniform in size, we resize each image to 256 × 256.
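The preprocessing pipeline above can be sketched with OpenCV as follows. This is only an illustrative sketch of the described steps, not the authors' code: the structuring-element size, threshold value, and inpainting radius are assumptions chosen for demonstration.

```python
import cv2

def remove_hair(bgr_image, inpaint_radius=3):
    """Remove hair-like structures from a skin image (illustrative sketch)."""
    # Resize to the uniform input size used in this study
    img = cv2.resize(bgr_image, (256, 256))
    # 1. Convert the colour image to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # 2. Black-Hat filtering emphasises thin dark structures such as hair
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (17, 17))  # illustrative size
    blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)
    # 3. Threshold the Black-Hat response to obtain a hair mask
    _, mask = cv2.threshold(blackhat, 10, 255, cv2.THRESH_BINARY)  # illustrative threshold
    # 4.-5. Reconstruct the masked regions and return the image in colour
    return cv2.inpaint(img, mask, inpaint_radius, cv2.INPAINT_TELEA)

# Example usage (path is hypothetical):
# clean = remove_hair(cv2.imread("lesion.jpg"))
```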
Fig. 1 Preprocess the images of the skin
Fig. 2 VGG19 architecture (Picture Source saicharanars.medium.com)
The training dataset from the input images contains both diseased and nondiseased skin. Therefore, this stage of image segmentation will help to remove the non-diseased skin areas, focusing only on the diseased areas. This will help the training model in the next step to extract the most accurate feature of the disease area. From there, it is possible to increase the accuracy of the model. In this step, the UNET model [20] is used to perform segmentation for the diseased image region.
3.2 Skin Disease Detection Using Transfer Learning To identify two common skin diseases, namely carcinoma and hemorrhagic skin disease, this study proposes transfer learning from the deep learning model VGG19 [21], retraining and refining this model (fine-tuning). The overall architecture2 as well as the parameters of the VGG19 model are presented in Fig. 2.
2
https://saicharanars.medium.com/building-vgg19-with-keras-f516101c24cf.
• Transfer learning phase: During training, we reuse the parameters of the previously trained VGG19 model. This model has been pre-trained on the ImageNet dataset of 1.2 million images covering 1000 classes, so transfer learning takes advantage of its available parameters without having to train from scratch. This helps the model improve accuracy when the input dataset is small. To use it for skin disease identification, the last layers of VGG19 (the fully connected and output layers) are removed and replaced with new fully connected layers and a softmax output layer over three classes: carcinoma, hemorrhagic, and disease-free skin. The proposed architecture of the transfer learning model for skin disease recognition is illustrated in Fig. 3 (see the sketch after the figure caption).
• Fine-tuning phase: During the model calibration phase, we continue to tune the hyperparameters to help the model achieve the highest accuracy. The hyperparameters used for calibration are as follows: (a) the number of epochs is tested from 20 to 150 to find a suitable epoch threshold; (b) batch sizes are tested from 16 to 256; (c) hidden layer sizes are 512, 256, and 128; (d) learning rates tested: 0.1, 0.01, 0.001, and 0.0001.
Fig. 3 Transfer learning for skin disease detection
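A minimal Keras sketch of the transfer learning model described above is given below. It is an illustrative reconstruction, not the authors' implementation: the optimiser (Adam), the initial learning rate, and the use of a Flatten layer are assumptions, while the 512/256/128 hidden sizes and the three-class softmax output follow the description above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19

def build_transfer_model(num_classes=3, input_shape=(256, 256, 3)):
    # Load VGG19 with ImageNet weights and drop its fully connected head
    base = VGG19(weights="imagenet", include_top=False, input_shape=input_shape)
    base.trainable = False  # transfer learning phase: reuse the pre-trained parameters

    model = models.Sequential([
        base,
        layers.Flatten(),
        # New fully connected layers (hidden sizes follow the 512/256/128 setting above)
        layers.Dense(512, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dense(128, activation="relu"),
        # Softmax output over carcinoma, hemorrhagic, and normal skin
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Fine-tuning phase (sketch): unfreeze the base and retrain with a smaller learning rate
# model.get_layer(index=0).trainable = True
# model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
#               loss="categorical_crossentropy", metrics=["accuracy"])
```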
4 Experiments 4.1 Dataset for Experiments Experimental data were collected from ISIC3 (International Skin Imaging Collaboration). The dataset contains a total of 2212 images (including male and female patients), classified into 3 categories: carcinoma (586 images), hemorrhagic (1099 images), and normal skin (527 images). The dataset was labeled by experts in the field of dermatology. Images with sizes ranging from 600 × 450 to 6688 × 4439 were resized to 256 × 256.
4.2 Evaluation Methods and Baselines To evaluate the model, this study divides the dataset into three sets: training, validation, and test, with ratios of 60-20-20. Accuracy (acc) and the F1 measure are used to compare the results with other methods such as VGG16 [21], MobileNet [22], Inception V3 [23], and Xception [24] (a small sketch of the split and metrics follows the list). Three experimental methods are proposed as follows:
• Experiment 1: original data → preprocessing → image segmentation → training and validation.
• Experiment 2: original data → preprocessing → training and validation.
• Experiment 3: original data → segmentation → training and validation.
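The 60-20-20 split and the two evaluation measures can be sketched as below; stratified splitting and macro-averaged F1 are assumptions, since the paper does not state how the split is drawn or how F1 is averaged over the three classes.

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

def split_60_20_20(X, y, seed=42):
    """Split data into 60% train, 20% validation, 20% test (stratified, assumed)."""
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.4, stratify=y, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)

def report(y_true, y_pred):
    """Accuracy and F1 (macro-averaged over the three classes, assumed)."""
    return accuracy_score(y_true, y_pred), f1_score(y_true, y_pred, average="macro")
```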
4.3 Experiment 1 In this experiment, we first perform hair removal on the original dataset. This removal process has been described in Sect. 3.1. After preprocessing the data, image segmentation will be performed using the UNET model [20]. Next is the model training phase consisting of two steps of transfer learning (A) and refinement (B). All models are executed under the same set of parameters: epoch = 20, batch size = 16, hidden unit = [512, 256, 128], hidden layer = number of hidden layers of the pre-trained VGG19 model, remove the last layer and replace it with 11 new hidden layers and output layers corresponding to skin diseases. Table 1 shows the results of training and testing for experiment 1.
3
https://www.isic-archive.com/#!/topWithHeader/wideContentTop/main.
Table 1 Results on experiment 1

| Phase             | Model        | acc  | val_acc | val_loss | test_acc | F1   |
|-------------------|--------------|------|---------|----------|----------|------|
| Transfer learning | VGG16        | 0.76 | 0.81    | 0.49     | 0.76     | 0.73 |
|                   | VGG19        | 0.86 | 0.66    | 0.69     | 0.82     | 0.82 |
|                   | MobileNet    | 0.94 | 0.8     | 0.67     | 0.8      | 0.79 |
|                   | Xception     | 0.87 | 0.79    | 0.51     | 0.76     | 0.76 |
|                   | Inception V3 | 0.83 | 0.77    | 0.48     | 0.76     | 0.76 |
| Fine-tuning       | VGG16        | 0.91 | 0.81    | 0.40     | 0.82     | 0.84 |
|                   | VGG19        | 0.96 | 0.81    | 0.45     | 0.85     | 0.84 |
|                   | MobileNet    | 0.76 | 0.8     | 0.66     | 0.82     | 0.82 |
|                   | Xception     | 0.81 | 0.8     | 0.41     | 0.8      | 0.79 |
|                   | Inception V3 | 0.79 | 0.77    | 0.43     | 0.77     | 0.77 |

acc: accuracy; val_acc: validation accuracy; val_loss: validation loss; test_acc: test accuracy
From this experiment, we see that the transfer learning model with VGG19 gives the best results with an accuracy of 85% on the test set and an F1 measure of 84%.
4.4 Experiment 2 In this experiment, we perform hair removal on the original dataset. After preprocessing the images, the data are passed to the training phase of the transfer learning model (A) and refining (B). Table 2 shows the results of training and testing for the scenario in experiment 2. From this experiment, we see that the transfer learning model with the VGG19 network still gives the best results with an accuracy of 93% on the test set and an F1 measure of 93%. Figures 4 and 5 illustrate the accuracy and the loss during training and validation times.
4.5 Experiment 3 In this experiment, we proceed to segment images from the original dataset using the UNET model [20]. After performing image segmentation, the data will be transferred to the transfer learning (A) and refining (B) stages. The results of the training and testing process are presented in Table 3. From this experiment, we see that the transfer learning model with VGG19 gives the best results with an accuracy of 87% on the test set, and the F1 measure is 86%.
Table 2 Results on experiment 2

| Phase             | Model        | acc  | val_acc | val_loss | test_acc | F1   |
|-------------------|--------------|------|---------|----------|----------|------|
| Transfer learning | VGG16        | 0.84 | 0.87    | 0.30     | 0.87     | 0.87 |
|                   | VGG19        | 0.83 | 0.87    | 0.33     | 0.85     | 0.85 |
|                   | MobileNet    | 0.95 | 0.87    | 0.38     | 0.85     | 0.84 |
|                   | Xception     | 0.92 | 0.79    | 0.53     | 0.78     | 0.76 |
|                   | Inception V3 | 0.88 | 0.76    | 0.6      | 0.78     | 0.78 |
| Fine-tuning       | VGG16        | 0.91 | 0.93    | 0.23     | 0.92     | 0.92 |
|                   | VGG19        | 0.97 | 0.94    | 0.17     | 0.93     | 0.93 |
|                   | MobileNet    | 0.93 | 0.82    | 0.58     | 0.84     | 0.83 |
|                   | Xception     | 0.88 | 0.84    | 0.37     | 0.81     | 0.81 |
|                   | Inception V3 | 0.81 | 0.78    | 0.74     | 0.76     | 0.76 |
Fig. 4 Accuracy on training and validation phases
4.6 Discussion on Experimental Results From the above experimental results, we can see that the transfer learning model with the VGG19 network gives the best results across the experimental scenarios, which makes it suitable for the problem of skin disease diagnosis.
Fig. 5 Loss on training and validation phases
Table 3 Results on experiment 3

| Phase             | Model        | acc  | val_acc | val_loss | test_acc | F1   |
|-------------------|--------------|------|---------|----------|----------|------|
| Transfer learning | VGG16        | 0.88 | 0.83    | 0.40     | 0.79     | 0.78 |
|                   | VGG19        | 0.88 | 0.81    | 0.46     | 0.80     | 0.80 |
|                   | MobileNet    | 0.97 | 0.84    | 0.80     | 0.80     | 0.78 |
|                   | Xception     | 0.92 | 0.83    | 0.44     | 0.78     | 0.77 |
|                   | Inception V3 | 0.87 | 0.83    | 0.42     | 0.80     | 0.80 |
| Fine-tuning       | VGG16        | 0.90 | 0.87    | 0.33     | 0.85     | 0.85 |
|                   | VGG19        | 0.99 | 0.87    | 0.31     | 0.87     | 0.86 |
|                   | MobileNet    | 0.80 | 0.84    | 0.61     | 0.82     | 0.83 |
|                   | Xception     | 0.94 | 0.84    | 0.40     | 0.82     | 0.81 |
|                   | Inception V3 | 0.87 | 0.80    | 0.45     | 0.80     | 0.79 |
Figure 6 shows the results of the transfer learning model on three different experimental methods. Through experimental methods, we can see that going through two steps of preprocessing and image segmentation has significantly reduced the training results of the model. The reason is that the preprocessing and segmentation have lost too many features. In situations where only data are preprocessed or image segmented, the results are markedly improved. In which, the preprocessing method
Fig. 6 Results of transfer learning on 3 experiments

|          | Experiment 1 | Experiment 2 | Experiment 3 |
|----------|--------------|--------------|--------------|
| acc      | 0.96         | 0.97         | 0.99         |
| val_acc  | 0.81         | 0.94         | 0.87         |
| test_acc | 0.85         | 0.93         | 0.87         |
| F1       | 0.84         | 0.93         | 0.86         |
only gives better results than the image segmentation method, possibly because the image segmentation has lost more features than the image preprocessing. In addition, the results from this study are also relatively satisfactory compared with the studies presented in [3] with an accuracy of 80%, and in [2] with an accuracy of 81.57%.
5 Conclusion This study proposed to use a transfer learning model to identify common skin diseases. The input image will be preprocessed, segmented, and used transfer learning from the pre-trained VGG19 deep learning model to identify skin diseases. With a combination of data preprocessing, retraining from the existing model, and fine-tuning, the model has achieved an accuracy of 93%. We will continue to test different data preprocessing techniques to improve model accuracy. In addition, it is necessary to collect more data on existing disease classes and on augmenting other skin disease classes. Data augmentation also has the potential to help improve model accuracy. Finally, learning more models like Gradcam and Alibi to help visualize deep learning models.
References 1. Codella N, Cai J, Abedini M, Garnavi R, Halpern A, Smith JR (2015) Deep learning, sparse coding, and SVM for melanoma recognition in dermoscopy images. In: Zhou L, Wang L, Wang Q, Shi Y (eds) Machine learning in medical imaging. MLMI 2015. Lecture notes in computer science, vol 9352. Springer, Cham. https://doi.org/10.1007/978-3-319-24888-2_15 2. Al-masni MA, Al-antari MA, Park HM, Park NH, Kim T-S (2019) A deep learning model integrating FrCN and residual convolutional networks for skin lesion segmentation and classification. In: 2019 IEEE Eurasia conference on biomedical engineering, healthcare and sustainability (ECBIOS), pp 95–98. https://doi.org/10.1109/ECBIOS.2019.8807441 3. Chen S, Wang Z, Shi J, Liu B, Yu N (2018) A multi-task framework with feature passing module for skin lesion classification and segmentation. In: 2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018), pp 1126–1129. https://doi.org/10.1109/ISBI.2018.8363769 4. Hameed N, Shabut AM, Ghosh MK, Hossain MA (2020) Multi-class multi-level classification algorithm for skin lesions classification using machine learning techniques. Expert Syst Appl 141:112961. ISSN 0957–4174. https://doi.org/10.1016/j.eswa.2019.112961 5. Abhishek K, Kawahara J, Hamarneh G (2021) Predicting the clinical management of skin lesions using deep learning. Sci Rep 11:7769. https://doi.org/10.1038/s41598-021-87064-7 6. Albahli S, Nida N, Irtaza A, Yousaf MH, Mahmood MT (2020) Melanoma lesion detection and segmentation using YOLOv4-darknet and active contour. IEEE Access 8:198403–198414. https://doi.org/10.1109/ACCESS.2020.3035345 7. Zakeri A, Hokmabadi A (2018) Improvement in the diagnosis of melanoma and dysplastic lesions by introducing ABCD-PDT features and a hybrid classifier. Biocybernet Biomed Eng 38:456–466 8. Ma Z, Manuel J, Tavares RS (2017) Effective features to classify skin lesions in dermoscopic images. Expert Syst Appl 84(C):92–101. https://doi.org/10.1016/j.eswa.2017.05.003 9. Hagerty JR, Stanley RJ, Almubarak HA, Lama N, Kasmi R, Guo P, Drugge RJ, Rabinovitz HS, Oliviero M, Stoecker WV (2019) Deep learning and handcrafted method fusion: higher diagnostic accuracy for melanoma dermoscopy images. IEEE J Biomed Health Inform 23(4):1385–1391. https://doi.org/10.1109/JBHI.2019.2891049 Epub 2019 Jan 4 PMID: 30624234 10. Celebi ME, Zornberg A (2014) Automated quantification of clinically significant colors in dermoscopy images and its application to skin lesion classification. IEEE Syst J 8(3):980–984. https://doi.org/10.1109/JSYST.2014.2313671 11. López-Leyva JA, Guerra-Rosas E, Álvarez-Borrego J (2021) Multi-class diagnosis of skin lesions using the Fourier spectral information of images on additive color model by artificial neural network. IEEE Access 9:35207–35216. https://doi.org/10.1109/ACCESS.2021.306 1873 12. Romero Lopez A, Giro-i-Nieto X, Burdick J, Marques O (2017) Skin lesion classification from dermoscopic images using deep learning techniques. In: 2017 13th IASTED international conference on biomedical engineering (BioMed), pp 49–54. https://doi.org/10.2316/P.2017. 852-053 13. Dorj UO, Lee KK, Choi JY et al (2018) The skin cancer classification using deep convolutional neural network. Multimed Tools Appl 77:9909–9924. https://doi.org/10.1007/s11042018-5714-1 14. Melbin K, Raj YJV (2021) Integration of modified ABCD features and support vector machine for skin lesion types classification. Multimed Tools Appl 80:8909–8929. https://doi.org/10. 1007/s11042-020-10056-8 15. 
Al-antari MA, Al-masni MA, Park SU et al (2018) An automatic computer-aided diagnosis system for breast cancer in digital mammograms via deep belief network. J Med Biol Eng 38:443–456. https://doi.org/10.1007/s40846-017-0321-6 16. Chang H, Han J, Zhong C, Snijders AM, Mao J-H (2018) Unsupervised transfer learning via multi-scale convolutional sparse coding for biomedical applications. IEEE Trans Pattern Anal Mach Intell 40(5):1182–1194, 1 May 2018. https://doi.org/10.1109/TPAMI.2017.2656884
17. Dildar M, Akram S, Irfan M et al (2021) Skin cancer detection: a review using deep learning techniques. Int J Environ Res Public Health 18(10):5479. Published 2021 May 20. https://doi. org/10.3390/ijerph18105479 18. Yu Z et al (2019) Melanoma recognition in dermoscopy images via aggregated deep convolutional features. IEEE Trans Biomed Eng 66(4):1006–1016. https://doi.org/10.1109/TBME. 2018.2866166 19. Yu L, Chen H, Dou Q, Qin J, Heng P (2017) Automated melanoma recognition in dermoscopy images via very deep residual networks. IEEE Trans Med Imaging 36(4):994–1004. https:// doi.org/10.1109/TMI.2016.2642839 20. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, Cham 21. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. ICLR 2015 22. Howard AG et al (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 23. Szegedy C et al (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition 24. Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition
Quadratic Dragonfly Algorithm for Numerical Optimization and Travelling Salesman Problem Divya Soni, Nirmala Sharma, and Harish Sharma
Abstract The dragonfly algorithm (DA) is a swarm optimisation technique which imitates the foraging and swarming behaviour of dragonflies. Despite the popularity of nature-inspired (NI) algorithms, there are still areas that require further study. One is the lack of a rigorous mathematical approach for analysing such techniques; another crucial domain is parameter tuning and parameter control, which remains challenging and largely unsolved. At the same time, it is difficult to attain a fine balance between exploration and exploitation. A basic review of DA has shown limitations such as overflowing of the search space [1], increased convergence time [2], trapping into local optima, high exploration with low exploitation, and difficulty in optimising complex problems. A quadratic dragonfly algorithm (QDA) is introduced to overcome the limitations of the traditional dragonfly algorithm. It modifies the swarming behaviour of the dragonflies so that they can move towards exploration and exploitation at a faster pace than the basic dragonfly algorithm. The performance of the proposed technique is scrutinised on the comprehensive set of CEC-C06 2019 [3] benchmark functions. The efficiency of QDA was also evaluated on a real-world application, the travelling salesman problem, and the outcomes are compared with well-known metaheuristic algorithms. Keywords Dragonfly algorithm · Quadratic dragonfly algorithm · Swarm intelligence · Travelling salesman problem
1 Introduction Swarm intelligence (SI)-based techniques have been flourishingly exercised to solve various optimisation problems. SI amplifies the collective behaviour of networked human cliques using controlled algorithms modelled after natural swarms. In past few years, a dozen of new optimisation algorithms such as ant colony optimiser [4], cuckoo search [5], particle swarm optimisation [5], and firefly algorithm [6] have D. Soni (B) · N. Sharma · H. Sharma Rajasthan Technical University, Kota, Rajasthan, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_12
emerged, and they have manifested significant potential in solving tough engineering optimisation problems. Ant colony optimisation, developed by Marco Dorigo (1992), strives to mimic the foraging behaviour of social ants. All individuals make use of a chemical messenger, known as pheromone, to coordinate with one another; however, the communication is based on local information. The model of pheromone deposition and evaporation may vary slightly depending on the algorithm variant [4]. Cuckoo search was introduced by Yang and Deb, based on the intriguing brood parasitism of some cuckoo species; the cuckoo algorithm has both local and global search capabilities [5]. Particle swarm optimisation was introduced by Kennedy and Eberhart (1995) and is based on the swarming behaviour of birds and fish. PSO has been extended to solve multi-objective optimisation problems; however, PSO can often encounter premature convergence, in which the population may get stuck in local optima and thus lose exploration capability [5]. The firefly algorithm, introduced by Yang [6], is an optimiser inspired by the swarming and light-flashing nature of fireflies. This algorithm subdivides the whole swarm into multiple subswarms, which makes FA suitable for multi-modal optimisation problems. A hybridised version of artificial bee colony and differential evolution (2017) was introduced to develop an efficient metaheuristic algorithm with a better convergence rate and balance between exploitation and exploration factors [7]. An improvised grey wolf optimisation algorithm (GWO) was introduced to solve the transmission network expansion planning problem (2017) for Graver's 6-bus and Brazilian 46-bus systems and outperformed the basic GWO [8]. The bee forth artificial bee colony algorithm (2018) was introduced to solve the job-shop scheduling problem, improving the exploitation capability by altering the onlooker phase of the existing ABC algorithm [9]. The grasshopper-inspired artificial bee colony algorithm (2018) is a hybridised variant of an SI-based technique; this search technique incorporated the jumping mechanism of grasshoppers to improve the convergence rate and exploitation factor [10]. The dragonfly algorithm (DA), introduced by Mirjalili [11], is one of the recent SI algorithms. A common shortcoming of most SI algorithms is that the convergence velocity is slow, the optimisation precision is low, and they are easy to trap in local optima [11]. However, DA gives good results when compared with other state-of-the-art algorithms. In DA, the large searching steps of the Levy flight mechanism cause interruption and result in overflowing of the search space [1], increased convergence time [2], trapping into local optima, high exploration with low exploitation, and possible difficulty in optimising complex problems. In order to minimise the drawbacks of DA and to make DA more efficient when applied to optimisation problems, a new variant, namely the quadratic dragonfly algorithm (QDA), is proposed.
Besides being successfully applied to a list of test problems, DA has been exercised to solve many practical optimisation problems such as multilevel colour image segmentation [12], filter design, the travelling salesman problem, infant cry classification, structural design optimisation of vehicles, the knapsack problem, hybrid energy distributed power systems, Internet of vehicles, stress distribution, wireless node localisation, power flow management, and machine learning.
The remaining paper is assembled as follows: Sect. 2 presents the overview of the basic dragonfly algorithm and its mathematical models. Section 3 proposes the new variant of the DA algorithm named the quadratic dragonfly algorithm. Section 4 presents the results and discussion of a comparative study on several benchmark functions, and a real case study is also provided in this section. Finally, Sect. 5 provides the conclusion of the modified QDA algorithm.
2 Overview of DA The dragonfly optimiser is a modern heuristic swarm intelligence optimisation technique introduced by Mirjalili [11]. Dragonflies have a peculiar swarming behaviour. They swarm for two ruling objectives: foraging and migration. While hunting, dragonflies move back and forth over a compact region to hunt prey, whereas a huge number of dragonflies form a swarm for travelling in one direction over long distances. The former is termed static swarming and the latter dynamic swarming (Figs. 1, 2, 3, 4, 5 and 6). The swarming behaviour of dragonflies relates to the crucial metaheuristic optimisation phases: exploration and exploitation. (1) Dragonfly Algorithm: (a) Exploration and Exploitation Operators: • Separation: Denotes static collision avoidance of a dragonfly from its neighbouring dragonflies. It is mathematically calculated as:
Fig. 1 Static swarming and dynamic swarming
Fig. 2 Separation
Fig. 3 Alignment
Fig. 4 Cohesion
Fig. 5 Attraction to food
Fig. 6 Distraction from enemy
Si = − Σ_{j=1}^{n} (D − Dj)    (1)
where D is the current dragonfly position and Dj is the position of the jth neighbouring dragonfly. • Alignment: Indicates the velocity matching of a dragonfly to that of other dragonflies in the neighbourhood.
It is mathematically calculated as: Ai = ( Σ_{j=1}^{n} Vj ) / n    (2)
where n is the number of neighbouring dragonflies and Vj is the velocity of the jth neighbouring dragonfly. • Cohesion: Denotes the dragonfly's tendency towards the centre of the swarm group. It is mathematically calculated as: Ci = ( Σ_{j=1}^{n} Dj ) / n − D    (3)
where D is the current dragonfly position, Dj is the position of the jth neighbouring dragonfly, and n is the number of neighbouring dragonflies. • Attraction: An individual's attraction towards the food source. It is mathematically calculated as: Fi = Fl − D    (4)
where Fl is the position of the food source. • Distraction: An individual's tendency to move away from an enemy. It is mathematically calculated as: Ei = El + D    (5)
where El is the enemy position.
(b) Step Vector: Denotes the dragonfly's motion direction and can be calculated as:
ΔDt+1 = (s·Si + a·Ai + c·Ci + f·Fi + e·Ei) + w·ΔDt    (6)
s = 2 × r × c, a = 2 × r × c, c = 2 × r × c, f = 2 × r, e = c    (7)
w = 0.9 − I × (0.9 − 0.4) / Max_I    (8)
c = 0.1 − (0.9 − 0.4) / Max_I if (2 × I) ≤ Max_I, and c = 0 otherwise    (9)
where r is a random number in [0, 1], I is the current iteration, and Max_I is the maximum number of iterations.
(c) Position Vector: The dragonfly's position at t + 1 is updated as:
Dt+1 = Dt + ΔDt+1    (10)
where t represents the current iteration. If there are no neighbouring solutions, the dragonfly's position is updated using a Levy flight [13], which refines the randomness, chaotic behaviour, and global search capability of the dragonflies (Fig. 7):
Dt+1 = Dt + Levy(d) × Dt    (11)
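A compact NumPy sketch of one DA iteration built from Eqs. (1)–(11) is shown below. It is illustrative only and assumes a Euclidean neighbourhood radius and simple bound clipping; the Levy step uses Mantegna's algorithm, a common choice with DA, not necessarily the authors' exact implementation.

```python
import math
import numpy as np

def levy(dim, beta=1.5):
    """Levy-flight step via Mantegna's algorithm (a common choice with DA)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
             (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = np.random.randn(dim) * sigma
    v = np.random.randn(dim)
    return 0.01 * u / np.abs(v) ** (1 / beta)

def da_step(D, V, food, enemy, w, s, a, c, f, e, radius, lb, ub):
    """One dragonfly-algorithm update of positions D and step vectors V (sketch)."""
    n_agents, dim = D.shape
    new_D, new_V = D.copy(), V.copy()
    for i in range(n_agents):
        dist = np.linalg.norm(D - D[i], axis=1)
        nbrs = np.where((dist > 0) & (dist <= radius))[0]   # assumed Euclidean neighbourhood
        if nbrs.size > 0:
            S = -np.sum(D[i] - D[nbrs], axis=0)              # separation, Eq. (1)
            A = V[nbrs].mean(axis=0)                         # alignment, Eq. (2)
            C = D[nbrs].mean(axis=0) - D[i]                  # cohesion, Eq. (3)
            F = food - D[i]                                  # food attraction, Eq. (4)
            E = enemy + D[i]                                 # enemy distraction, Eq. (5)
            new_V[i] = s * S + a * A + c * C + f * F + e * E + w * V[i]   # Eq. (6)
            new_D[i] = D[i] + new_V[i]                       # Eq. (10)
        else:
            new_D[i] = D[i] + levy(dim) * D[i]               # Eq. (11): Levy-flight move
            new_V[i] = 0.0
    return np.clip(new_D, lb, ub), new_V   # clipping keeps agents inside [lb, ub]
```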
3 Quadratic-based Dragonfly Algorithm The basic review of DA has shown limitations such as overflowing of the search space [1], increased convergence time [2], trapping into local optima, high exploration with low exploitation, and difficulty in optimising complex problems. To overcome these limitations, a new variant of DA is proposed, namely the quadratic dragonfly algorithm. In the proposed quadratic dragonfly algorithm, the coefficient values are revised in a nonlinear fashion. This affects the searching behaviour of the dragonflies, which can fast-track the exploration or exploitation rate when compared to the basic DA [2]. There are various ways of balancing exploration and exploitation; one is to tune the swarming parameters (w, s, a, c, f, and e) [11]. The swarming factors are modified as:
Fig. 7 Dragonfly flow chart
s = 1.5 × rand × c, a = 2.5 × rand × c, c = 2.5 × rand × c, f = 4.5 × rand, e = c    (12)
w = ( 0.9 − I × (0.9 − 0.4) / Max_I )^2    (13)
c = ( 0.1 − (0.9 − 0.4) / Max_I )^2 if (2 × I) ≤ Max_I, and c = 0 otherwise    (14)
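The modified coefficient schedule of Eqs. (12)–(14), as reconstructed above, can be written as the following sketch; the exact reading of Eqs. (13)–(14) from the printed layout is an assumption.

```python
import numpy as np

def qda_coefficients(I, max_I, rng=np.random):
    """Swarming coefficients of QDA at iteration I, following Eqs. (12)-(14) as read above."""
    w = (0.9 - I * (0.9 - 0.4) / max_I) ** 2                            # quadratic weight, Eq. (13)
    c = (0.1 - (0.9 - 0.4) / max_I) ** 2 if 2 * I <= max_I else 0.0     # adaptive factor, Eq. (14)
    s = 1.5 * rng.rand() * c      # separation weight  \
    a = 2.5 * rng.rand() * c      # alignment weight    } Eq. (12)
    coh = 2.5 * rng.rand() * c    # cohesion weight    /
    f = 4.5 * rng.rand()          # food attraction weight
    e = c                         # enemy distraction weight
    return w, s, a, coh, f, e, c

# Example: the inertia weight shrinks quadratically over a 500-iteration run
print(qda_coefficients(1, 500)[0], qda_coefficients(500, 500)[0])
```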
The quadratic dragonfly algorithm preserves the equilibrium of exploitation and exploration and is less prone to getting trapped in local optima; in addition, it performs better in optimising complex problems while avoiding overflowing of the search area.
4 Results and Discussions Numerous standard benchmark functions are used to measure the performance of the QDA, and the results are compared with other existing algorithms. The 19 classical benchmark functions are taken from the work in [11]; in addition, the CEC-C06 test [3] was also conducted. It is worth mentioning that the results for the dragonfly algorithm (DA), whale optimisation algorithm (WOA), and salp swarm algorithm (SSA) are taken from [11, 14]. In Tables 2 and 3, the average (AV) and standard deviation (SD) compare the overall performance of the algorithms.
4.1 Classical Benchmark Function For the comparative analysis of QDA, three set of test functions, namely uni-modal, multi-modal, and composite are adopted as given in Table 1. The uni-modal functions can benchmark the convergence and exploitation of any algorithm as they have single optimum. However, the multi-modal functions have one global optima and rest local optima, so they can benchmark the exploitation and local optima avoidance of any algorithm. The last set of benchmark functions and composite functions are having shifted, rotated, combined, and biased forms of uni-modal and multi-modal functions [15, 16]. These functions mimic the real-world problems by providing immense set of local optima and search spaces which can benchmark both exploration and exploitation rate of any algorithm. Each algorithm in Table 2 has been scrutinised for 30 runs, operating 30 search agents with dimension up to 10, in particular test, the algorithms were allowed to search for the best optimum solution in 500 iterations. Hence, the results of quadratic dragonfly algorithm (QDA), dragonfly algorithm (DA), particle swarm optimisation (PSO), and genetic algorithm (GA) are conferred [14].
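For reference, two of the functions in Table 1 are defined below: the uni-modal Sphere function (F1) and the multi-modal Rastrigin function (F9), both with a minimum of 0 at the origin, which illustrates why the two groups probe exploitation and exploration differently. The sketch is illustrative only.

```python
import numpy as np

def sphere(x):
    """F1, uni-modal: a single optimum, so it tests convergence and exploitation."""
    return np.sum(x ** 2)

def rastrigin(x):
    """F9, multi-modal: many local optima, so it tests exploration and escape ability."""
    return np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x) + 10)

x = np.zeros(10)                  # global optimum of both functions in 10 dimensions
print(sphere(x), rastrigin(x))    # 0.0 0.0
```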
4.2 CEC-C06 2019 Standard Benchmark Function For additional evaluation on QDA , 10 modern CEC benchmark functions are used, as shown in Fig. 12. The test set “The 100-Digit Challenge” were improvised by Prof. Suganthan and his colleagues for single objective optimization problem [3]. These are special featured functions which make them harder to find global optima
Table 1 Test problems

| S. No. | Test function   | Search range      | Dimension | AE     | f_min |
|--------|-----------------|-------------------|-----------|--------|-------|
| 1      | Sphere function | [−100 100]        | 10        | 1.0E−1 | 0     |
| 2      | Schwefel 2.22   | [−10 10]          | 10        | 1.0E−1 | 0     |
| 3      | Schwefel 1.2    | [−100 100]        | 10        | 1.0E−1 | 0     |
| 4      | Schwefel 2.21   | [−100 100]        | 10        | 1.0E−1 | 0     |
| 5      | Rosenbrock      | [−30 30]          | 10        | 1.0E−1 | 0     |
| 6      | Step            | [−100 100]        | 10        | 1.0E−1 | 0     |
| 7      | Quartic         | [−10 10]          | 10        | 1.0E−1 | 0     |
| 8      | Schwefel        | [−500 500]        | 10        | 1.0E−1 | 0     |
| 9      | Rastrigin       | [−5.12 5.12]      | 10        | 1.0E−1 | 0     |
| 10     | Ackley          | [−32 32]          | 10        | 1.0E−1 | 0     |
| 11     | Griewank        | [−600 600]        | 10        | 1.0E−1 | 0     |
| 12     | Penalized       | [−50 50]          | 10        | 1.0E−1 | 0     |
| 13     | Penalized 2     | [−50 50]          | 10        | 1.0E−1 | 0     |
| 14     | Foxholes        | [−65.536 65.536]  | 2         | 1.0E−1 | 0     |
| 15     | Kowalik         | [−5 5]            | 4         | 1.0E−1 | 0     |
| 16     | Six Hump Camel  | [−5 5]            | 2         | 1.0E−1 | 0     |
| 17     | Branin          | [−5 0] [10 15]    | 2         | 1.0E−1 | 0     |
| 18     | GoldStein-Price | [−2 2]            | 2         | 1.0E−1 | 0     |
| 19     | Hartman 3       | [0 1]             | 3         | 1.0E−1 | 0     |
using any algorithm. All functions are scalable, and functions CEC04–CEC10 are shifted and rotated, as shown in Table 3. QDA is compared with modern optimisation techniques such as WOA and SSA. These are swarm optimisation algorithms proven to have outstanding performance on benchmark functions as well as real-world problems. Furthermore, all of these algorithms search the space for 500 iterations using 30 search agents.
4.3 Quantitative Metrics For in-depth analyses and observation of QDA, two quantitative metrics are shown in Figs. 8 and 9. The benchmark functions are selected from the set of uni-modal (F1– F7), multi-modal (F8–F13), and composite (F14–F19), functions, respectively. These functions are solved using 10 search agents through 150 iterations. Figure 8 shows the search history of dragonfly’s position over the search space. The swarm quickly explores the search pace first (exploration) and then gradually head towards optimality (exploitation). It has been observed QDA searches the promising regions of the search space more intensively. The coverage of the search space for composite
Table 2 Comparative analysis with QDA

| Test function | QDA AV   | QDA SD  | DA AV    | DA SD    | PSO AV   | PSO SD   | GWO AV    | GWO SD    |
|---------------|----------|---------|----------|----------|----------|----------|-----------|-----------|
| F1            | 31.0183  | 32.5779 | 2.85E−18 | 7.16E−18 | 4.2E−18  | 1.31E−17 | 748.5972  | 324.9262  |
| F2            | 0.6051   | 0.8907  | 1.49E−05 | 3.76E−05 | 0.003154 | 0.009811 | 5.971358  | 1.533102  |
| F3            | 68.164   | 68.1098 | 1.29E−06 | 2.1E−06  | 0.001891 | 0.003311 | 1949.003  | 994.2733  |
| F4            | 2.2825   | 2.2048  | 0.000988 | 0.00278  | 0.001748 | 0.002515 | 21.16304  | 2.605406  |
| F5            | 3733.3   | 3733.3  | 16395.28 | 6.786473 | 63.45331 | 80.12726 | 133307.1  | 85,007.62 |
| F6            | 29.5736  | 24.4079 | 4.17E−16 | 1.32E−15 | 4.36E−17 | 1.38E−16 | 563.8889  | 229.6997  |
| F7            | 0.01231  | 0.00622 | 0.010293 | 0.004691 | 0.005973 | 0.003583 | 0.166872  | 0.072571  |
| F8            | −2355.34 | 321.81  | −2857.58 | 383.6466 | −7.1E+11 | 1.2E+12  | −3407.25  | 164.4776  |
| F9            | 32.6732  | 14.5834 | 16.01883 | 9.479113 | 10.44724 | 7.879807 | 25.51886  | 6.66936   |
| F10           | 1.6757   | 1.7135  | 0.23103  | 0.487053 | 0.280137 | 0.601817 | 9.498785  | 1.271393  |
| F11           | 0.07852  | 0.0628  | 0.193354 | 0.07350  | 0.083463 | 0.035067 | 7.719959  | 3.62607   |
| F12           | 2.1600   | 1.1794  | 0.031101 | 0.098349 | 8.57E−11 | 2.71E−10 | 1858.502  | 5820.215  |
| F13           | 1.4619   | 0.7038  | 0.002197 | 0.004633 | 0.002197 | 0.004633 | 68,047.23 | 87,736.76 |
| F14           | 0.9980   | 1.9E−05 | 103.742  | 91.2436  | 150      | 135.4006 | 130.0991  | 21.32037  |
| F15           | 0.0034   | 0.0051  | 193.0171 | 80.6332  | 188.1951 | 157.2834 | 116.0554  | 19.19351  |
| F16           | −1.0312  | 0.00041 | 458.2962 | 165.3724 | 263.0948 | 187.1352 | 383.9184  | 36.60532  |
| F17           | 0.3983   | 0.000422| 596.6629 | 171.0631 | 466.5429 | 180.9493 | 503.0485  | 35.79406  |
| F18           | 3.0042   | 0.0035  | 229.9515 | 184.6095 | 136.1759 | 160.0187 | 118.438   | 51.00183  |
| F19           | −3.8625  | 0.00023 | 679.588  | 199.4014 | 741.6341 | 206.7296 | 544.1018  | 13.30161  |
Table 3 CEC-2019 test functions

| Test function | QDA AV    | QDA SD    | DA AV     | DA SD     | WOA AV    | WOA SD    | SSA AV    | SSA SD    |
|---------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| CEC01         | 398 × 10^8| 680 × 10^8| 543 × 10^8| 669 × 10^8| 4.2E−18   | 411 × 10^8| 605 × 10^7| 475 × 10^7|
| CEC02         | 105.0264  | 105.7088  | 78.0368   | 87.7888   | 17.3495   | 0.0045    | 18.3434   | 0.0005    |
| CEC03         | 11.7025   | 0.0002    | 13.7026   | 0.0007    | 13.7024   | 0.0       | 13.7025   | 0.0003    |
| CEC04         | 243.3538  | 119.9589  | 344.356   | 414.098   | 394.6754  | 248.5627  | 41.6936   | 22.2191   |
| CEC05         | 0.9037    | 0.1139    | 2.5572    | 0.3245    | 2.7342    | 0.2917    | 2.2084    | 0.1064    |
| CEC06         | 2.841     | 0.5481    | 9.8955    | 1.6404    | 10.7085   | 1.0325    | 6.0798    | 1.4873    |
| CEC07         | 787.0029  | 237.1081  | 578.9531  | 329.3983  | 490.6843  | 194.8318  | 410.3964  | 290.5562  |
| CEC08         | 1.9407    | 0.2177    | 6.8734    | 0.5015    | 6.909     | 0.4269    | 6.3723    | 0.5862    |
| CEC09         | 6.8531    | 3.6929    | 6.0467    | 2.871     | 5.9371    | 1.6566    | 3.6704    | 0.2362    |
| CEC10         | 19.4859   | 0.1169    | 2.7342    | 0.1715    | 21.2761   | 0.1111    | 21.04     | 0.078     |
Fig. 8 Search history of the QDA on uni-modal (F2), multi-modal (F10), and composite (F17) benchmark functions
function F17 appears to be high and precise that shows the effectiveness of the QDA in searching the optimal solution. Figures 9 and 10 represent the convergence of nearby global optimum through the course of iterations for QDA versus DA algorithm, respectively. Overall, the metrics proved that the QDA algorithm shows significant improvement in the exploration and exploitation rate, improving the overall performance, and converging towards global optimality by avoiding local optima (Fig. 11).
4.4 Travelling Salesman Problem Using QDA The travelling salesman problem (TSP) is a well-known NP-hard problem that aims to find a minimum-cost Hamiltonian cycle in a network. Because the problem is NP hard, no exact algorithm is known that reaches its solution in polynomial time; the expected time to obtain the optimal solution grows exponentially. It therefore becomes essential to tackle complex TSP instances with metaheuristic algorithms.
Fig. 9 QDA convergence curve on uni-modal (F2), multi-modal (F10), and composite (F17) functions
Fig. 10 QDA versus DA convergence curve on uni-modal (F2), multi-modal (F10), and composite (F17) benchmark functions
Fig. 11 CEC-19 benchmark problems

Table 4 Experimentation and results

| Instance  | Algo | Best cost | Avg. time |
|-----------|------|-----------|-----------|
| Ulysses22 | ACO  | 75.49     | 84        |
|           | PSO  | 76.22     | 62        |
|           | GA   | 75.99     | 63        |
|           | BH   | 75.68     | 52        |
|           | DA   | 77.83     | 21        |
|           | QDA  | 75.69     | 19.7      |
| Bayg29    | ACO  | 9882.22   | 100       |
|           | PSO  | 9947.03   | 75        |
|           | GA   | 9771.95   | 56        |
|           | BH   | 9375.44   | 46        |
|           | DA   | 9547.75   | 22        |
|           | QDA  | 9364.52   | 23        |
| Eil51     | ACO  | 461.02    | 59        |
|           | PSO  | 574.80    | 57        |
|           | GA   | 453.48    | 60        |
|           | BH   | 458.93    | 44        |
|           | DA   | 475.16    | 23        |
|           | QDA  | 456.09    | 22        |
| Eil101    | ACO  | 763.92    | 90        |
|           | PSO  | 1499.99   | 62        |
|           | GA   | 838.83    | 55        |
|           | BH   | 897.38    | 42        |
|           | DA   | 898.52    | 36        |
|           | QDA  | 842.74    | 39        |
Fig. 12 a Coordinates of 14 cities, b best tour for 14 cities
Predominantly, the TSP is to determine the shortest feasible route that visits each city exactly once and returns to the departure point. Some real-world TSP applications are computer wiring, vehicle routing, printed circuit boards, X-ray crystallography, and the order-picking problem in warehouses [17].
4.4.1 Experiments and Results
The travelling salesman problem can be represented as a graph G = (N, E), where N represents the set of n cities and E represents the set of pathways connecting the cities. Each edge e = (i, j) between two cities carries a cost dij, which is the distance between the two respective cities. The Euclidean distance formula is used to calculate the cost dij:
dij = √( (ui − uj)² + (vi − vj)² )    (15)
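A short sketch of how a candidate tour is costed under Eq. (15) is given below; it is independent of how QDA encodes tours, which is not detailed here.

```python
import numpy as np

def distance_matrix(coords):
    """Pairwise Euclidean costs d_ij from city coordinates, Eq. (15)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def tour_length(tour, dist):
    """Total cost of a closed tour visiting every city once and returning home."""
    return sum(dist[tour[k], tour[(k + 1) % len(tour)]] for k in range(len(tour)))

# Example: 4 cities on a unit square -> the optimal closed tour has length 4
cities = [(0, 0), (0, 1), (1, 1), (1, 0)]
d = distance_matrix(cities)
print(tour_length([0, 1, 2, 3], d))  # 4.0
```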
The performance of the QDA has been gauged using various standard TSP instances drawn from the TSPLIB library [18], with dataset sizes ranging from 14 to 101 cities. The results were compared against existing approaches in the literature: the black hole algorithm (BH) [19], genetic algorithm (GA) [20], ant colony optimisation (ACO) [21], particle swarm optimisation (PSO) [21], and dragonfly algorithm (DA) [22], as given in Table 4. The TSP benchmark instances were run 21 times with a population size of 50 and a fixed number of 10,000 iterations. In Fig. 12, the quadratic dragonfly algorithm successfully achieved the best solution for one of the TSPLIB instances [18].
5 Conclusion This paper proposed a modified version of the dragonfly algorithm, namely the quadratic-based dragonfly algorithm (QDA). QDA improves the exploration and exploitation capabilities and mitigates the problem of trapping into local optima. In the proposed strategy, the swarming coefficients of the dragonfly algorithm were updated in a nonlinear fashion. The tuning between exploration and exploitation factors resulted in an improved convergence ability of the dragonflies towards optimal solutions. Overall, 19 classical benchmark functions and 10 modern CEC-C06 test functions were used to scrutinise the performance of the QDA algorithm. When compared to widely known swarm-based techniques (PSO, DA, and GA) and three naive techniques (DA, WOA, and SSA), QDA outperformed the competing optimisation techniques in most instances and produced competitive results. QDA was also applied to the popular combinatorial problem TSP. QDA has been tested on TSPLIB [18] instances and proved to be an improved method for solving the travelling salesman problem compared with the traditional DA algorithm. The experimental data indicate that QDA has better performance compared with traditional DA and other optimisation techniques.
6 Future Work Simulation of natural behaviour for solving numerous optimisation problems has been an inspiring and fascinating area for many researchers. This research work focuses on a variant of DA, namely the quadratic dragonfly algorithm. There are several directions in which this work could be extended. The QDA algorithm can be hybridised with other nature-inspired algorithms and metaheuristics, and QDA can be applied to various practical problems, with the competitive results compared against existing swarm intelligence-based algorithms.
References 1. Ací CI, Gulcan H (2019) A modified dragonfly optimization algorithm for single- and multiobjective problems using Brownian motion 2. Hammouri AI, Mafarja M, Al-Betar MA et al (2020) An improved dragonfly algorithm for feature selection. Knowl Based Syst. https://doi.org/10.1016/j.knosys.2020.106131 3. Price KV, Awad NH, Ali MZ, Suganthan PN (2018) The 100-digit challenge: problem definitions and evaluation criteria. Electron. Eng., Nanyang Technol. Univ., Singapore, Tech. Rep, School Elect 4. Bonabeau E, Dorigo M, Theraulaz G (1999) Swarm intelligence: from natural to artificial systems. Oxford University Press, Oxford 5. Yang XS, Deb S (2014) Cuckoo search: recent advances and applications. Neural Comput Appl 24(1):169–174 6. Yang XS (2010) Firefly algorithm stochastic test functions and design optimisation. Int J BioInspired Comput 2(2):78–84 7. Jadon SS, Tiwari R, Sharma H, Bansal JC (2017) Hybrid artificial bee colony algorithm with differential evolution. Appl Soft Comput 58 8. Khandelwal A, Bhargava A, Sharma A, Sharma H (2017) Modified grey wolf optimization algorithm for transmission network expansion planning problem. Arab J Sci Eng, 1–10 9. Sharma N, Sharma H, Sharma A (2018) Bee froth artificial bee colony algorithm for job shop scheduling problem. Appl Soft Comput 10. Sharma N, Sharma H, Sharma A, Chand Bansal J (2018) Grasshopper inspired artificial bee colony algorithm for numerical optimization. J Exp Theoret Artif Intell 11. Mirjalili S (2016) Dragonfly algorithm: a new meta-heuristic optimization technique for solving single-objective, discrete, and multi-objective problems. http://www.alimirjalili.com/Projects. html 12. Xu L, Jia H, Lang C, Peng X, Sun K (2019) Novel method for multilevel color image segmentation based on dragonfly algorithm and differential evolution. A IEEE Access 1. https://doi. org/10.1109/access.2019.2896673 13. Yang XS, Deb S (2009) Cuckoo search via Lévy flights. In: Proceedings world congress on nature and biologically inspired computing (NaBIC). IEEE, pp 210–214 14. Abdullah JM, Ahmed Rashid T (2019) Fitness dependent optimizer: inspired by the Bee Swarming reproductive process. IEEE. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=& arnumber=8672851&isnumber=6514899 15. Liang J, Suganthan P, Deb K (2005) Novel composition test functions for numerical global optimization.In: Swarm intelligence symposium, 2005. SIS 2005. Proceedings 2005 IEEE, pp 68–75 16. Suganthan PN, Hansen N, Liang JJ, Deb K, Chen Y, Auger A et al (2005) Problem definitions and evaluation criteria for the CEC 2005 special session on real-parameter optimization
17. Saji Y, Riffi ME (2016) A novel discrete bat algorithm for solving the travelling salesman problem. Neural Comput Appl 27(7):1853–1866 18. http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/ 19. Hatamlou A (2017) Solving travelling salesman problem using black hole algorithm. Soft Comput, pp 1–9 20. Pullan W, Adapting the genetic algorithm to the travelling salesman problem, pp 1029–1035 21. Verma OP, Jain R, Chhabra V (2014) Solution of travelling salesman problem using bacterial foraging optimisation algorithm. Int J Swarm Intell 1(2):179–192 22. Hammouri AI, Samra ETA, Al-Betar MA, Khalil RM, Alasmer Z, Kanan M (2018) A dragonfly algorithm for solving traveling salesman. In: 2018 8th IEEE international conference on control system, computing and engineering (ICCSCE). https://doi.org/10.1109/iccsce.2018.8684963
A Survey on Defect Detection of Vials C. R. Vishwanatha, V. Asha, Sneha More, C. Divya, K. Keerthi, and S. P. Rohaan
Abstract This paper presents a survey on pharmaceutical vial inspection. Various methods, techniques, and algorithms on vial inspection have been implemented over the years to ensure the quality of final product before it is released for public use. There is an abundance of research knowledge available for inspection of vial. Vial inspection normally consists of steps like image acquisition, pre-possessing, converting it into grayscale image, image segmentation, and feature extraction of image, edge detection, comparing the image with a standard image, and making the decision. Some of the commonly employed techniques include support vector machine, wavelet transform, phase difference map, Pearson fuzzy C-means clustering, PCNN, artificial neural networks, CNN, contour extraction, location segmentation, dynamic threshold, LBP descriptors, SVM, MF estimator, decision tree, advanced non-destructive testing (NDT), pulse-coupled neural networks (PCNNs), deep convolution neural network, FCOS, and many more to list. Some of these methods have also employed machine learning algorithms as well. Each of these techniques give different levels of accuracy and precision. Furthermore, the results or parameters achieved of these algorithms are summarized. The paper helps a lot for upcoming researchers to learn about the techniques used for vial inspection in the recent times. Keywords Image acquisition · Grayscale · Edge detection · Vial or Glass defects · Thresholding · ROI
C. R. Vishwanatha · V. Asha (B) · S. More · C. Divya · K. Keerthi · S. P. Rohaan Department of MCA, New Horizon College of Engineering, Bengaluru 5600103, India e-mail: [email protected]; [email protected] C. R. Vishwanatha e-mail: [email protected] Department of MCA, New Horizon College of Engineering, Visvesvaraya Technological University, Belagavi, Karnataka, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_13
1 Introduction For any enterprise, quality of the product it manufactures is vital. Particularly for pharmaceutical company, whose products are closely related to people health, it is important that the product they produce should be defect free. Poor or defective products not only make company liable but also cause various health problems to the people who consume them. Hence, testing the quality of product is an important part of production process. Several types of defects arise during the production of pharmaceutical vials, and it is unavoidable considering various assembly-related factors. This in turn affects product quality, thereby reducing efficiency of the product. Even a minor defect on the surface of the vial may cause a chemical reaction and contaminate the chemical composition of filled medicine. When the defect is significant, the chemical reaction may drastically happen due to environment exposure of the fluid and may remain unchecked in many cases. This in turn becomes a life threatening to already suffering patient to whom the medicine is injected. Pharmaceutical drug vials need to be inspected in order to ensure that the vial meets predetermined quality specifications and does not harm the patient. The vials may have defects like cracks, scratches, dents, black spots, wrinkles, bubbles, and other defects on the surface due to various reasons. The defects do arise at the base and top portion of the vial. There can be various factors for defects. They may be in the form of factors related to environment (such as dust, fibers, and personnel), factors related to manufacturing process (like filters and metal), factors related to various formulations (precipitates, undissolved material, agglomerates, incompatibility issues in drug/excipient) or factors related to components packaging (like silica, rubber closures, plastic, latex, polymers, glass vials, silicone), etc. [1]. The damage may happen during the process of manufacturing, or during the process of transportation, or due to any other reasons as well. Such defects imply distinctly the fact that vials are not manufactured perfectly and have environmental exposure or risky situation of leakage which should be taken out before putting them into usage. The delamination of glass is one of the main reasons in the recent times for product recall. This was caused due to glass lamellae formation over the shelf life [2]. Based upon the quantity of such particles, its size or shape, composition of particle, patient count, and route of administration, the presence of such matter can result in severe conditions like anaphylactic shock, phlebitis, vein irritation, pulmonary emboli, pulmonary dysfunction, immune system dysfunction, pulmonary granulomas, infarction, and even patient death. The research is done in order to avoid manual inspection which requires enormous human effort resulting in lot of mental stress, eye fatigue, and susceptible human error. In general, automatic inspection system not only substitutes human inspection but also enhances capabilities by providing accurate outputs.
2 Literature Review Liu et al. [3] have proposed an intelligent inspection for the glass bottles using machine vision. Watershed transform method is employed to separate the defective area, and the bottle wall features are extracted. Then from the images, features of bottle finish are extracted. For this process, the wavelet transform methods are used. Then as classifier, fuzzy SVM ensemble is used. In order to have better classification ability, ensemble method proposed using GA is adopted. The conducted tests show that the proposed method can reach rate of accuracy above 97.5%. Adamo et al. [4] have a procedure which uses registration procedure for images. The method constructs a larger frame of image using multiple image frames taken from the same scene. Image recording is emphasized for a single image of the entire pane of glass, making it possible to combine different images captured by different cameras. The proposed method works like this, first, a reference image is obtained, then the use of morphological operations determines its characteristics like connected components labeling and filling of areas. An effective procedure has also been introduced for compensating the changes in lighting system intensity profiles and was used in the test systems. This method is used in a lot of cases where scattered surface targets are illuminated by non-uniform light sources. The feature image and phase map are combined to effectively identify glass defects in the methods used by Jin et al. [5]. In this method, the feature image and phase map of glass defect are combined to test whether the problem of the projection grating method is solved. Here, the phase difference is obtained as a function of the defectfree and defect-free edge images, while the feature image of the defects containing the defects is also obtained by the 1D Fourier transform. The segmentation of defect region is carried out by integrating mathematical grayscale morphology along with threshold segmentation. The size and location of the defect are calculated then by the boundary coordinates of the connection region. Grayscale-based second iterative segmentation method is applied to compute low and high thresholds. Then, a defect image with cubic value is obtained. For defect type recognition, conjugated gradient neural network (CGNN) is designed, and the recognition accuracy lands on 86%. The proposed method can identify errors reliably which is evident from the typical errors obtained. George et al. [6] have proposed the idea of using fuzzy C-means clustering for detecting glass bottle defects. In this technique, it is shown that in an image sequence, only defects remain stable unlike the noise which appears in an image randomly. It is because of the fact that the defects maintain their places, while the image is being moved on the conveyer. Fuzzy C-means clustering is most commonly applied in the manufacturing industries of glass. It is a powerful method for unsupervised model building and data analysis. Various filters like Wiener filter, Gaussian filter, high pass filter, median filter, low pass filter have been used here. After undergoing filtering process, high PSNR value is given by the Wiener filter. A better value for universal quality index for image is also given by the method. The actual image and the noisy image are considered for showing the correlation between them. This is shown by the Pearson’s correlation coefficient.
Zhuo et al. [7] have developed a method for detecting glass fragments in liquid injections. Because glass chips are heavier than other foreign particles, they cannot move smoothly with the liquid. By shaking and vibrating the container so that the glass chips move along the front wall of the bottle, image sequences are obtained and the detection problem is solved. The moving glass is detected using an optical flow algorithm, and glass fragments can be detected accurately.

Daminelli et al. [8] have proposed automated procedures for the inspection of glass products using an intelligent vision system. The system automatically checks two main types of glass product defects: serious defects, called shards of glass, in household glass cups and food packaging, and warping defects in panels for home use. The reported results show that the developed application detects the studied defects well; for both applications, the success rate is greater than 95%.

Ge et al. [9] have described a real-time vision-based automated inspection framework for foreign particles in injection fluids. Obstacles to detection caused by uneven features such as decorative symbols, scratches, and bottle surface graduations are removed using an image manipulation sequence. The PCNN parameters are obtained adaptively from the valley regions of the histogram, and the resulting segmentation is compelling: it can segment images well even when there is considerable overlap in the intensity ranges of adjacent regions. The machine judges whether an injection is qualified according to the smoothness and continuity of the property traces obtained from the moving objects. Repeated experiments demonstrate that objects are detected correctly 99.1% of the time, which is far better than employing workers for the same task.

Öztürk et al. [10] have introduced the bias-feed CNN, a new CNN structure. In brief, a bias template is added to the input so that each pixel receives a different bias, which is more useful than the single bias value used in a traditional CNN. By updating the bias template, many image manipulation tasks, such as edge detection, noise removal, line detection, and brightness correction, can be accomplished. A prototype was built, and glass surface testing, a challenging task for computer vision applications, was performed. The results were quite satisfactory, with 98% accuracy, 99% specificity, and 91% sensitivity. The method is very fast, with an average execution time of about 1 s per test, and because the usual CNN structure is preserved, it can be integrated into real-time systems.

Xie et al. [11] have studied a machine vision-based quality control method for PET bottle packaging. The packaging quality of PET bottles is assessed through localization matching, contrast detection, and edge extraction, along with comparative testing. The method quickly and accurately detects the packaging quality of a bottle and has been implemented on a high-speed production line for experiments. The results are quite satisfactory, with a false detection rate below 0.35% at a speed of 67 ms per frame, although one remaining problem is that shaking of the bottle raises the judgment error rate.
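The moving-particle idea described for Zhuo et al. [7] can be illustrated with dense optical flow between two consecutive frames. The sketch below is a minimal stand-in, not the authors' implementation; the synthetic frames and the motion threshold are assumptions.

```python
# Minimal dense optical flow sketch in the spirit of the moving-particle detection in [7].
import cv2
import numpy as np

# Two synthetic frames: a bright "particle" that moves a few pixels between frames.
prev = np.zeros((240, 320), np.uint8); prev[100:120, 100:120] = 255
curr = np.zeros((240, 320), np.uint8); curr[100:120, 106:126] = 255

flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
magnitude, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])

# Pixels with large motion are candidate foreign particles (e.g. glass chips).
moving_mask = magnitude > 2.0
print("moving pixels:", int(moving_mask.sum()))
```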
Liang et al. [12] have proposed a defect detection algorithm for tube-type bottles based on machine vision, designed and implemented as part of a general machine vision system. The work mainly investigates the defect detection algorithm for tube-type bottles and gives its working details. Using a standard tubular bottle image and the associated parameter information, the image processing algorithm performs a series of operations on the collected tube bottle image. Finally, the parameter information of the bottle under test is compared with that of the tube-type bottle model. In this way, defective objects are found along with the defect types, and location information is also obtained.

Xu et al. [13] have proposed a machine vision framework for online flaw detection of PET caps. An image processing method is put forward to identify flaws such as foreign objects on the surface and broken cap-ring interfaces. To recognize surface flaws caused by foreign objects, mean filtering is used to estimate the background, and a dynamic threshold then compares the inspected image with the background image, revealing the broken area. For cracks in the ring, the grayscale distribution is enhanced by an RGB-to-HSV transform; the components of the HSV image are compared, and in the resulting saturation image the gap between the cap and the ring is clearly identified with the help of segmentation. Tests show that the method can reach a speed of 34,000 bottles per hour with a precision rate close to 97%.

Saad et al. [14] have proposed a computer vision-based automated framework for discovering shape flaws for quality assessment, using a soft drink bottle as the test product. The inspection system includes data collection and image pre-processing, followed by morphological operations, image feature extraction, and flaw classification. Morphological erosion and dilation are adopted to segment the image, and the resulting feature set determines the flaws present in the bottle structure. A Naïve Bayes classifier is then applied to the obtained data to label the product as pass or fail. On 100 test samples, the method achieved an accuracy of 100%.

Prabuwono et al. [15] have proposed an intelligent visual inspection system (IVIS). Tests on the developed IVIS show that the image acquisition accuracy for a moving bottle is around 94.26%. The method uses an edge detection technique along with a neural network to determine defective bottles. In the developed system, the distance between two bottles was 17 cm, and the maximum speed was 106 rotations per minute.

Liu et al. [16] have proposed a method that addresses pharmaceutical vial inspection issues in the packaging process. The method first locates the region of interest (ROI); K-means clustering is then used to create image dictionaries, and an SVM classifier is used to find defective products. The TSEG methodology used here is better in terms of speed and accuracy than the MSBRW and PCRG methods. Among the feature extraction choices, the proposed LBP procedure offers better results than procedures such as SIFT and SURF. In this method, owing to the choice of SVM, recognition accuracy and
work efficiency improve greatly even when only a few test samples are available. LBP achieved a performance efficiency of 90% in this experiment.

Kumchoo et al. [17] have proposed a bottle cap defect analysis framework. The method checks for flaws in the bottle cap to determine whether the cap has loosened or whether there is a gap between the safety ring and the cap. The cap has two regions: the cap thread region and the gap between the cap thread and the safety ring. Both regions are transformed from the RGB to the HSV color space, and the groove features are obtained by adjusting the threshold for both the cap thread region and the link ring region. Flaws such as fracture cracks and broken links are then determined. The test results show that flaws in the safety cap ring are detected with 84.62% sensitivity and 100% specificity, and loose caps with 87.88% sensitivity and 83.87% specificity.

Zhang et al. [18] have proposed a system for inspecting particles in liquid vials or injections. The method uses cameras at various stages for image capture, implemented with a mechanical structure, and a prototype of the system is introduced. An LED light source is used for the inspection process, and for each vial a total of eight pictures are captured sequentially. Errors during the process are mitigated by high-speed image acquisition and position adjustment. To obtain precise positions and segmentations for tracking, an IFCNN is utilized, and an adaptive tracking system based on a sparse model builds a trajectory for each moving target, from which the presence of foreign particles is determined. Genuine foreign particles are identified by combining the movement and tracking information from the trajectory. The conducted tests show that the system determines the presence of foreign particles with high accuracy, and it can be used effectively for monitoring the pharmaceutical manufacturing process, particle classification, and drug packaging.

Liu et al. [19] have developed a real-time device and two detection methods to find flaws on the cap and body of oral liquid vials. First, a mechanical structure is designed to capture images of the vial cap and the vial body. A horizontal intercept projection method is then introduced to detect flaws on the cap, and multi-feature extraction with a black top-hat transform is introduced to find flaws on the vial body. Experiments on the developed system show flaw detection rates of 98.2% and 99.8% on the vial cap and body, respectively, with execution times of 35 and 58 ms. This shows that the production line can perform defect detection at high speed and with high precision.

Fu et al. [20] discussed a machine vision-based image acquisition model, a pre-processing model, a flaw detection model, and a positioning model for locating flaws. The bottle quality is determined by finding the defective area in the target, and the degree of the defect is then corrected using the centroid of the connected domain along with the length and width of the inspected region. The defect detection rates are 100%, 91.6%, and 94.4%, respectively, for the three selected sample types: vials with cracks, vials with missing edges, and dirty vials. The extent of the defect and its centroid are compared with the actual defect position to determine flaws.
The backlight is used to acquire the bottle image, which is then pre-processed to determine the quality of the glass bottle, after which the defect range is identified. Median filtering, image enhancement, and edge detection are used during pre-processing. Testing shows that this procedure can be used widely for vial flaw detection, with a lowest recognition rate of 91.6%.

Zhou et al. [21] present a real-time machine vision device for inspecting the bottom of the bottle, employing template matching and saliency detection algorithms with a focus on image analysis. First, the bottle bottom is located by combining Hough circle detection with the prior size. The ROI is then divided into three parts: the central panel area, the annular panel area, and the annular texture area. The proposed saliency detection method is used to detect flaw regions in the central panel area; a multiscale filtering technique is used for the annular panel area; and for the annular texture area, defects are found by combining template matching with multiscale filtering. Finally, the flaw detection results of the three measurement areas are combined to judge the quality of the bottle bottom under test. The results show that most small, low-contrast defects are inspected accurately. The proposed algorithms reach high F-measure, accuracy, and recall, with the highest accuracy and precision among the compared methods for the three measurement areas. The precision of TM, MMF, and RGES is 88.83%, 75.95%, and 41.03%, respectively; the precision is increased by 7.88% and 30.12% for the first two methods, whereas for the last method the precision is a little lower than that of ATS. Nevertheless, the recall, F-measure, and accuracy of RGES are better than those of the other existing methods, and the proposed methods are robust to fluctuations in pixel value. Still, the methods are not fully perfect, as a few defects cannot be detected reliably, especially small defects in the texture area.

Song et al. [22] address the low efficiency and slow speed of defect detection with an improved FCOS (fully convolutional one-stage object detection) algorithm. Compared with the original FCOS method, this algorithm improves the defect detection effect by 6.9%, as shown by the experimental analysis conducted.

Koodtalang et al. [23] have proposed a method based on deep learning and image processing in which the bottom of the bottle is inspected. The method first uses a high-pass filter and a median filter to remove noise. The region of interest (ROI) for the bottom area is obtained using the Hough circle transform, and unnecessary areas are masked after the image is cropped to a square. A pretrained predictive model then takes the masked and resized image as input to find the defective ones. A deep convolutional neural network (CNN) with three convolutional layers and two fully connected layers serves as the predictor. The method is implemented in Python using OpenCV and Keras. Experiments give accuracies of 99.00% and 98.50% for locating the bottom region and for defect detection, respectively. Classifying a defective bottle takes 48 ms, whereas for locating the bottom of the bottle, the
method takes 22 ms. The proposed method achieves a high accuracy level and is therefore appropriate for real-time applications.

Kulkarni et al. [24] have proposed an automated system in which bottle cap defects can be distinguished and products with flaws are rejected by the framework, using computer vision-based methods. The system comprises four strategies for locating bottle cap deformities: pattern recognition, clustering, object detection, and line detection. They also present a comprehensive examination and a comparison of all four methods on various parameters. The framework has broad practical value and good efficiency, improving the quality of assessment and productivity.
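The bottom-region localization step described for Koodtalang et al. [23] (median filtering followed by a Hough circle transform and masking) can be sketched as follows. This is an illustrative reconstruction with a synthetic image and untuned parameters, not the published pipeline.

```python
# Sketch of bottom-of-bottle ROI extraction with a Hough circle transform, loosely following [23].
import cv2
import numpy as np

# Synthetic stand-in for a bottle-bottom image: a bright circular rim on a dark background.
img = np.zeros((400, 400), np.uint8)
cv2.circle(img, (200, 200), 120, 200, thickness=6)
gray = cv2.medianBlur(img, 5)                       # suppress speckle noise

circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=200,
                           param1=100, param2=30, minRadius=80, maxRadius=200)
if circles is not None:
    x, y, r = (int(round(v)) for v in circles[0, 0])
    # Mask everything outside the detected circle, then crop a square ROI for a CNN.
    mask = np.zeros_like(gray)
    cv2.circle(mask, (x, y), r, 255, thickness=-1)
    roi = cv2.bitwise_and(gray, gray, mask=mask)[y - r:y + r, x - r:x + r]
    roi = cv2.resize(roi, (128, 128))
    print("ROI shape:", roi.shape)
```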
3 Proposed Method

Light emitting diode (LED) technology is widely used in modern automated inspection systems to detect vial defects effectively. The prototype model proposed by us provides the setup required for capturing the image of the vial with a proper lighting source [2]. The model contains an infeed conveyor through which the vials move into the inspection system and are subjected to various levels of inspection or tests before they are segregated into good or bad collection trays. The hardware assembly makes sure that only the defective vials are rejected. The steps of the proposed solution for vial inspection are shown in Fig. 1. Defect detection requires identifying the defective region (ROI). Thresholding is then used to convert the image to binary form; it is a form of segmentation performed mainly to separate foreground pixels from background pixels so that the image can be manipulated successfully. In this method, the thresholded version f(a, b) of the image g(a, b) for a given threshold value T is

$$
f(a, b) =
\begin{cases}
255 & \text{if } g(a, b) \geq T \\
0 & \text{otherwise}
\end{cases}
$$

Once the binary image is obtained, an improvised version of the conjugate gradient neural network (CGNN) method can be used to identify flaws on the vial. The Canny edge detector can also be used, as it yields better results. We are also working with clustering techniques (such as agglomerative or K-means clustering) to improve the results; this will be demonstrated in our next paper.
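A minimal sketch of the binarization and edge-detection step outlined above is given below, assuming OpenCV; the synthetic test image and the threshold value are illustrative and are not tuned values from this work.

```python
# Thresholding and edge detection sketch for the vial ROI; parameters are illustrative only.
import cv2
import numpy as np

# Synthetic ROI: a bright vial surface with one dark spot standing in for a defect.
vial = np.full((200, 200), 220, np.uint8)
cv2.circle(vial, (100, 100), 12, 40, -1)

# f(a, b) = 255 if g(a, b) >= T, else 0   (here T = 127)
_, binary = cv2.threshold(vial, 127, 255, cv2.THRESH_BINARY)
edges = cv2.Canny(vial, 50, 150)

# Dark (below-threshold) regions are defect candidates.
contours, _ = cv2.findContours(cv2.bitwise_not(binary),
                               cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print("candidate defect regions:", len(contours),
      "| edge pixels:", int(edges.sum() // 255))
```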
Fig. 1 Steps for vial inspection
4 Conclusion

Various techniques and methodologies have been proposed and implemented to find defects on glass surfaces. In recent years, techniques such as linear discriminant analysis, robust features, and the gradient location and orientation histogram have been used for extracting image features. The non-parametric LBP operator was first used by Harwood et al. and was then applied to local texture extraction by Ojala et al. [25]. V. Asha et al. have proposed several models and methods for finding texture defects which can be applied to glass surfaces as well [26–34]. This paper has presented a basic understanding of machine vision-based inspection of vials and related materials. The survey lists various techniques such as support vector machines, wavelet transforms, phase difference maps, Pearson's correlation, fuzzy C-means clustering, PCNN, artificial neural networks, CNN, contour extraction, location segmentation, dynamic thresholds, LBP descriptors, SVM, MF estimators, decision trees, advanced non-destructive testing (NDT), pulse-coupled neural networks (PCNN), deep convolutional neural networks, FCOS, and many more. Most of these methods first identify the region of interest (ROI) and then apply the techniques or algorithms to find the defects. Many methods follow the common practice of obtaining the image through an industrial camera, pre-processing the image, enhancing image quality, and then applying segmentation, edge detection, binarization, thresholding, and clustering as needed. Among these methods, the modified PCNN method, CNN method, Bayesian classification algorithm, Naïve Bayes classifier, and image median filtering have given better results, with accuracy levels approaching 100%. A few methods reviewed here are aimed at finding defects on glass surfaces rather than vial surfaces, but they still provide information on methodologies that can be tried on vial surfaces, as vials are made of glass. In most of these methods, the lighting source plays an important role while capturing the image of the product, and industrial cameras are used because they capture better-quality images without blurring even when the product is moving. Different researchers have come up with their own ideas, techniques, and algorithms for vial inspection, each with different accuracy levels, as summarized in Table 1. The survey clearly justifies the use of technology for inspecting vials, thereby reducing manual inspection and human intervention and achieving accuracy levels of up to 100%.
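As a concrete illustration of the LBP texture description cited above [25], the following sketch computes a uniform LBP histogram that could serve as a texture feature for defect detection; the random patch stands in for a real vial surface image.

```python
# Illustrative LBP texture feature, in the spirit of Ojala et al. [25]; not code from the cited works.
import numpy as np
from skimage.feature import local_binary_pattern

P, R = 8, 1                                    # 8 neighbours on a circle of radius 1
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)   # stand-in vial surface patch

lbp = local_binary_pattern(patch, P, R, method="uniform")
hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)

# `hist` is a compact, illumination-robust texture descriptor; patches whose histograms
# deviate strongly from defect-free references can be flagged as defect candidates.
print(np.round(hist, 3))
```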
Table 1 Comparison of various techniques used

| S. No. | Authors | Year | Techniques used | Accuracy or parameters achieved | Limitations of methodology used |
|---|---|---|---|---|---|
| 1 | Prabuwono et al. [15] | 2006 | Image pre-processing, image enhancement, edge detection, neural network | Average image acquisition accuracy of 94.26% at a speed of 106 rpm | No implementation details given for the edge detection, edge analysis, and neural network modules; accuracy obtained is very low; inspects only the cap portion |
| 2 | Liu et al. [3] | 2008 | Watershed transform, wavelet transform, fuzzy SVM ensemble, ensemble methods | Accuracy may reach above 97.5% | Top and base parts of the bottle are not inspected for defects; accuracy level is low |
| 3 | Ge et al. [9] | 2009 | Adaptive segmentation, modified PCNN, multi-threshold, feature extraction | Accuracy of about 99.1% | The number of bottles taken for experiments is small |
| 4 | Adamo et al. [4] | 2010 | Otsu's thresholding, Canny edge detector, morphological operations, feature detection, mosaicing | Setup detects anomalies of about 0.52 mm | The data set taken for experiments is too small; accuracy is low compared with other methods |
| 5 | George et al. [6] | 2013 | Image filtering, fuzzy C-means clustering | Wiener filter gives a higher PSNR than the other filters and a better universal image quality index | No implementation details; experimental results not recorded; only a theoretical demonstration |
| 6 | Jin et al. [5] | 2014 | Phase difference map, conjugate gradient neural network, defect region segmentation | Accuracy of around 86% | Accuracy is very low; detects flaws only on the surface; only four defect types (stone, knot, bubble, scratch) are detected |
| 7 | Wang et al. [7] | 2015 | Image binarization using the Otsu method, feature extraction, optical flow method | The adopted optical flow method finds flaws on the glass successfully | No implementation details or data sets shown; experimental results not recorded; only a theoretical demonstration |
| 8 | David et al. [8] | 2015 | Thresholding, segmentation, binarization, edge detection, Hough transform | Average accuracy of around 95.7% | Lower accuracy level; smaller data set |
| 9 | Öztürk and Akdemir [10] | 2016 | Bias-feed CNN, bias template | Accuracy of about 98%, with 91% sensitivity and 99% specificity | Number of products chosen for experiments is very small; does not mention the types of glass defects the developed system can recognize |
| 10 | Wei et al. [11] | 2017 | Bayesian classification, matching localization, edge extraction | Accuracy of up to 99.65% | Focuses only on the cap part of the bottle and finds defects related only to the bottle covering |
| 11 | Liang et al. [12] | 2017 | Image pre-processing, location segmentation, contour extraction | Compares parameter information between the inspected tube bottle and the template tube bottle | Does not mention the top and bottom portions of the bottle; no implementation details or real-time test case results |
| 12 | Xu et al. [13] | 2017 | Mean filtering, segmentation, dynamic threshold | Accuracy near 100% for normal caps with link ring, 92% for foreign caps, 100% for completely broken rings, and 97% for partly broken rings | Limited to the cap portion of the bottle; accuracy for foreign caps and partly broken rings is low |
| 13 | Saad et al. [14] | 2017 | Image pre-processing, segmentation, feature extraction, Naïve Bayes classifier | Accuracy of 100% | Targets only shape feature defects of the bottle (area, perimeter, major axis length, extent); does not state whether defective bottles were used in the experiments |
| 14 | Liu et al. [16] | 2017 | LBP feature extraction, LBP descriptors, K-means clustering, SVM classifier | Accuracy of about 90% | Only lid portion flaws are detected; lower accuracy level |
| 15 | Kumchoo and Chiracharit [17] | 2018 | HSV color space, feature extraction | Safety cap ring flaws: 84.62% sensitivity, 100% specificity; loose caps: 87.88% sensitivity, 83.87% specificity | Does not cover the surface and bottom of the vial; the average detection value falls below 100% |
| 16 | Zhang et al. [18] | 2018 | Fast acceleration split test, BRIEF descriptor, IFCNN, target tracking and classification | Accuracy of about 97.25% | Only filled vials are inspected; does not detect foreign objects attached to the base of the vial; lower accuracy value |
| 17 | Liu et al. [19] | 2018 | Horizontal intercept projection, black top-hat transform, multi-feature extraction | Accuracy of more than 98% | Flaws at the base of the vial are not considered; execution time is slightly higher and not uniform for the cap and the vial surface |
| 18 | Fu et al. [20] | 2019 | Median filtering, thresholding, Canny edge detection | Highest recognition rate of 100%; lowest recognition rate of 91.6% | Only cracks on the vial surface reach 100% accuracy; lower accuracy for flaws such as missing edges, dirty bottles, and tottle bottles; the number of vials used in the experiments is small |
| 19 | Zhou et al. [21] | 2019 | Saliency detection, multiscale mean filtering, template matching | Precision of MMF, TM, and RGES is 75.95%, 88.83%, and 41.03%, respectively | Lower precision level; various flaws such as cracks, dents, chipping, and black spots are not detected |
| 20 | Koodtalang et al. [23] | 2019 | Median and high-pass filtering, Hough circle transform, deep convolutional neural network (CNN) | Accuracies of 99.00% and 98.50% for locating the bottom region and for defect detection, respectively | Computation time is somewhat higher due to the absence of a GPU; no real-time implementation |
| 21 | Song et al. [22] | 2020 | Improved FCOS object detection | Defect detection improved by 6.9% compared with the original FCOS method | Obtained precision level is low; flaws at the base of the bottle are not considered |
| 22 | Kulkarni et al. [24] | 2021 | Pattern recognition, clustering | Works better than other methodologies; accuracy is around 94.12% (determined from the given confusion matrix) | Only flaws on the cap part are considered; obtained accuracy level is very low |
References

1. Langille S (2006) Particulate matter in injectable drug products. PDA J Pharm Sci Technol 67:186–200
2. Vishwanatha CR, Asha V (2020) Prototype model for defect inspection of vials. Int J Psychosoc Rehabil 24(05):6981–6986
3. Liu H, Wang Y, Duan F (2008) Glass bottle inspector based on machine vision. Int J Comput Electr Autom Control Inf Eng 2(8):2682–2687
4. Adamo F, Attivissimo F, Di Nisio A (2010) Calibration of an inspection system for online quality control of satin glass. IEEE Trans Instrum Measur 59(5):1035–1046
5. Jin Y, Chen Y, Wang Z (2014) A conjugate gradient neural network for inspection of glass defects. In: 11th international conference on fuzzy systems and knowledge discovery, pp 698–703
6. George J, Janardhana S, Jaya J, Sabareesaan KJ (2013) Automatic defect detection in spectacles and glass bottles based on fuzzy C means clustering. In: International conference on current trends in engineering and technology, ICCTET'13, pp 8–12
7. Wang S, Zhuo Q, Xia J (2015) Detection of glass chips in liquid injection based on computer vision. In: International conference on computational intelligence and communication networks, pp 329–331
8. Cabral JDD, de Araújo SA (2015) An intelligent vision system for detecting defects in glass products. Int J Adv Manuf Technol, pp 485–494
9. Ge J, Wang YN, Zhou BW, Zhang H (2009) Intelligent foreign particle inspection machine for injection liquid examination based on modified pulse-coupled neural networks. Sensors 9:3386–3404
10. Öztürk S, Akdemir B (2016) Novel bias-feed cellular neural network model for glass defect inspection. In: CoDIT'16, April 6–8, pp 366–371
11. Xie H, Lu F, Ouyang G, Shang X, Zhao Z (2017) A rapid inspection method for encapsulating quality of PET bottles based on machine vision. In: 3rd IEEE international conference on computer and communications. IEEE, pp 2025–2028
12. Liang X, Dong L, Wu Y (2017) Research on surface defect detection algorithm of tube-type bottle based on machine vision. In: 10th international conference on intelligent computation technology and automation, pp 114–117
13. Xu M, Ma Y, Chen S (2017) Research on real-time quality inspection of PET bottle caps. In: IEEE international conference on information and automation (ICIA), pp 1023–1026
14. Saad NM, Rahman NNSA, Abdullah AR, Wahab FA (2017) Shape defect detection for product quality inspection and monitoring system. In: Proceedings EECSI 2017, pp 19–21
15. Prabuwono AS, Sulaiman R, Hamdan AR, Hasniaty A (2006) Development of intelligent visual inspection system (IVIS) for bottling machine. IEEE
16. Liu Y, Chen S, Tang T, Zhao M (2017) Defect inspection of medicine vials using LBP features and SVM classifier. In: 2nd international conference on image, vision and computing, pp 41–45
17. Kumchoo W, Chiracharit W (2018) Detection of loose cap and safety ring for pharmaceutical glass bottles. In: 15th international conference on electrical engineering/electronics, computer, telecommunications and information technology (ECTI-NCON 2018), pp 125–130
18. Zhang H, Li X, Zhong H, Yang Y, Wu QJ, Ge J, Wang Y (2018) Automated machine vision system for liquid particle inspection of pharmaceutical injection. IEEE Trans Instrum Measur 67(6):1278–1296
19. Liu X, Zhu Q, Wang Y, Zhou X, Li K, Liu X (2018) Machine vision based defect detection system for oral liquid vial. In: Proceedings of the 2018 13th world congress on intelligent control and automation, July 4–8, pp 945–950
20. Fu L, Zhang S, Gong Y, Huang Q (2019) Medicine glass bottle defect detection based on machine vision. IEEE, pp 5681–5685
21. Zhou X, Wang Y, Xiao C, Zhu Q, Lu X, Zhang H, Ge J, Zhao H (2019) Automated visual inspection of glass bottle bottom with saliency detection and template matching. IEEE, pp 1–15
22. Song T, Liu MH, Xu Y (2020) Research on bottle defect detection based on improved FCOS. In: 5th international conference on mechanical, control and computer engineering (ICMCCE), pp 1156–1159
23. Koodtalang W, Sangsuwan T, Sukanna S (2019) Glass bottle bottom inspection based on image processing and deep learning. In: Research, invention, and innovation congress (RI2C 2019)
24. Kulkarni R, Kulkarni S, Dabhane S, Lele N, Paswan RS (2019) An automated computer vision based system for bottle cap fitting inspection. IEEE
25. Ojala T, Pietikainen M, Maenpaa T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987
26. Asha V, Bhajantri NU, Nagabhushan P (2012) Similarity measures for automatic defect detection on patterned textures. Int J Inf Commun Technol 4(2/3/4):118–131
27. Asha V, Bhajantri NU, Nagabhushan P (2012) Automatic detection of texture-defects using texture-periodicity and Jensen-Shannon divergence. J Inf Process Syst 8(2):359–374
28. Asha V, Nagabhushan P, Bhajantri NU (2011) Unsupervised detection of texture defects using texture-periodicity and universal quality index. In: Proceedings of the 5th Indian international conference on artificial intelligence (IICAI-2011), Tumkur, India, 14–16 December 2011, pp 206–217
29. Asha V, Bhajantri NU, Nagabhushan P (2011) Automatic detection of texture defects using texture-periodicity and chi-square histogram distance. In: Proceedings of the 5th Indian international conference on artificial intelligence (IICAI-2011), Tumkur, India, 14–16 December 2011, pp 91–104
30. Asha V, Bhajantri NU, Nagabhushan P (2011) Automatic detection of defects on periodically patterned textures. J Intell Syst 20(3):279–303
31. Asha V (2019) Texture defect detection using human vision perception based contrast. Int J Tomogr Simul (IJTS) 32(3):86–97
32. Asha V, Nagabhushan P, Bhajantri NU (2012) Automatic extraction of texture-periodicity using superposition of distance matching functions and their forward differences. Pattern Recogn Lett 33(5):629–640
33. Asha V, Bhajantri NU, Nagabhushan P (2011) GLCM-based chi-square histogram distance for automatic detection of defects on patterned textures. Int J Comput Vis Robot (IJCVR) 2(4):302–313
34. Asha V, Bhajantri NU, Nagabhushan P (2011) Automatic detection of texture defects using texture-periodicity and Gabor wavelets. In: Venugopal KR, Patnaik LM (eds) International conference on information processing 2011, communications in computer and information science (CCIS) 157. Springer, Berlin Heidelberg, pp 548–553
Tversky-Kahneman: A New Loss Function for Skin Lesion Image Segmentation Do-Hai-Ninh Nham, Minh-Nhat Trinh, Van-Truong Pham, and Thi-Thao Tran
Abstract This paper proposes a novel loss function inspired by the Tversky-Kahneman probability weighting function to deal effectively with medical image segmentation tasks. The proposed loss, called the Tversky-Kahneman loss function, is assessed on the official skin lesion datasets of the ISIC 2017 and ISIC 2018 challenges. To evaluate the new loss function, we propose a modified U-Net-based model and report quantitative results in terms of the Dice score and Jaccard index. Various experiments indicate that the new loss function provides a more promising and time-saving performance than other loss functions.

Keywords Medical image segmentation · Tversky-Kahneman · Skin lesion datasets
D.-H.-N. Nham
School of Applied Mathematics and Informatics, Hanoi University of Science and Technology, Hanoi, Vietnam

M.-N. Trinh · V.-T. Pham · T.-T. Tran
Department of Automation Engineering, School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Vietnam
e-mail: [email protected]

1 Introduction

Image segmentation is a critical aspect of image processing as well as computer vision [1], and it has become a dominant area in the field of medical image analysis. A frequent task in medical image analysis is detecting and segmenting pathological regions that commonly take up only part of the full image. In recent years, the research field of biomedical image segmentation has developed many successful strategies, for instance, the Particle Swarm Optimization approach of Bansal [2], Spider Monkey Optimization by Kumar et al. [3], and the Differential Evolution algorithm by Bansal and Sharma [4]. The convolutional neural network (CNN) is a popular method for pixel-wise semantic segmentation. In the area of skin lesion segmentation, CNNs have been adopted to
produce a segmentation mask that specifies the lesion region [5]. A relatively new segmentation approach with CNNs is the application of dilated convolutions, in which the weights of the convolutional layers are sparsely distributed over a large receptive field without decreasing the coverage of the whole input image [6]. CNNs have also attained notable successes: in [7], Long et al. propose a fully convolutional network (FCN) for image segmentation, where the traditional fully connected layers of CNNs are replaced with convolutional layers to obtain a coarse feature map, which is then upsampled with deconvolutional layers to leverage temporal coherence. Moreover, Conditional Random Fields (CRFs) and CNNs have been combined by Lin et al. [8] for better exploration of the spatial correlation between pixels, although a CRF still has to be built to refine the CNN output. U-Net [9] is also a famous image segmentation architecture trained end-to-end. Its skip connections propagate dense feature maps from the encoder to the respective decoder layers, so the segmentation output is more accurate, as spatial information is exploited in the deeper layers.

It has been shown that the efficiency of deep learning models can be enhanced by applying various loss functions. For classification tasks, the L2 norm, known as the mean squared error (MSE), and cross-entropy (CE) are widely employed as loss functions [10]. Cross-entropy and the Dice loss (DC) are also frequently employed for image segmentation tasks [9], with the main role of extracting features from particular regions [11]. Although they can yield good classification and segmentation performance, CE and DC have certain drawbacks in highly unbalanced class training because they assume an identical importance for the distribution of labels. In recent years, there has been growing interest in exploiting active contour models as loss functions for training neural networks. Among active contour models, the Mumford-Shah functional [12], proposed by Mumford and Shah, has inspired many methods such as active contour and level-set methods [13] and proximal methods [14] for classical image segmentation. However, in the traditional active contour-based level-set approach, the results depend strongly on the contour initialization, whereas the deep learning-based approach can overcome this drawback by eliminating contour initialization. Progressively combining the advantages of the Mumford-Shah functional and the AC loss with some adjustments yields the LMS loss proposed by Trinh et al. [15]. Even though this loss function was designed for CMR image segmentation, it has been shown to perform quite promisingly on different kinds of clinical images. However, the performance of LMS depends significantly on fine-tuning the parameters α and β in the level-set formulation, and it converges slowly. While the Mumford-Shah, AC, and LMS loss functions focus on edge-preserving filtering, the region-based Tversky loss [16] and Focal Tversky loss [17] instead control the information flow implicitly through pixel-level affinity and tackle the class-imbalance problem. However, their convergence speed is not good enough. Hence, we propose a novel regional Tversky-Kahneman loss that not only addresses non-lesion tissue but also progressively improves the convergence speed during end-to-end training. In this paper, the main contributions of our work are:
• Proposing a novel loss called Tversky-Kahneman to improve the overall segmentation performance.
• Experimenting on multiple datasets to prove the efficiency of the proposed loss function over other loss functions applied in various models. Experiments are executed on the official skin lesion datasets from the lesion boundary segmentation challenges of ISIC 2017 and ISIC 2018, without external data. The Tversky-Kahneman loss is shown to give better performance in almost all cases.
• Making modifications to a U-Net-based model to customize a new U-Net [9] architecture, in order to validate the proposed loss function and provide state-of-the-art results.
2 Related Work

FCN-based semantic segmentation. FCNs [7] generate pixel-wise labeling results for high-resolution images using an encoder-decoder architecture. For multi-scale feature fusion, feature pyramids are an efficient approach; a pyramid pooling module was introduced in PSP-Net [18] to encourage the extraction of more representative context information. Furthermore, since Atrous Spatial Pyramid Pooling (ASPP) [19] uses atrous convolution filters [20] at several dilation rates to capture small image details, it is incorporated into the model structure for higher segmentation accuracy.

Channel Attention. Attention mechanisms highlight the most salient features and suppress meaningless ones, which favors segmentation. The Residual Attention Network [21] uses self-attention to capture long-range dependencies effectively. Wang et al. [22] create attention maps that integrate local and global features through an attention module. Attention U-Net [23] uses a gating signal as cross-attention in the skip connection. Attention Up and Concate blocks feed the attention modules with more essential features before these are concatenated with the max-pooling outputs of the decoder layers.

Tversky Loss. When the labels are unbalanced, the training process may converge to unwanted local minima, and predictions may be dominated by non-lesion tissue, resulting in high-precision but low-recall segmentation. To weigh false negatives (FNs) more heavily during segmentation of imbalanced data, the Tversky loss [16] is useful, as the additional level of control over FNs yields better results on small lesions than other losses.
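A simplified additive attention gate, in the spirit of the gating signal used by Attention U-Net [23], can be sketched in Keras as follows. The layer sizes and the equal-resolution gating signal are assumptions for illustration and do not reproduce the exact block used in this paper.

```python
# Minimal additive attention gate (illustrative, simplified).
import tensorflow as tf
from tensorflow.keras import layers

def attention_gate(skip, gating, inter_channels):
    """Weights an encoder skip feature map by a gating signal coming from the decoder."""
    theta = layers.Conv2D(inter_channels, 1)(skip)
    phi = layers.Conv2D(inter_channels, 1)(gating)
    add = layers.Activation("relu")(layers.Add()([theta, phi]))
    psi = layers.Conv2D(1, 1, activation="sigmoid")(add)   # per-pixel attention coefficients
    return skip * psi                                       # broadcast over the channels

# Example: a 24x36x128 skip connection gated by a decoder feature map of equal size.
skip = tf.keras.Input(shape=(24, 36, 128))
gate = tf.keras.Input(shape=(24, 36, 128))
out = attention_gate(skip, gate, inter_channels=64)
print(out.shape)    # (None, 24, 36, 128)
```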
3 Methodology

3.1 Proposed Network Architecture

As shown in Fig. 1, an input image first goes through a customized U-Net structure, which includes an encoder and a decoder, to generate feature maps for the subsequent segmentation mask prediction. The input images are resized to 192 × 288 and then normalized by rescaling to [0, 1] before being fed into the network. The encoder, which is responsible for extracting salient features, is divided into four downsample deep blocks. Each downsample block contains two sub-blocks, and each sub-block consists of a 2D convolution layer followed by a Mean-Variance Normalization (MVN) layer [11] and a ReLU activation. While Batch Norm computes windowed statistics and can switch between accumulating and using fixed statistics, MVN simply centers and standardizes a batch at a time; MVN is used here because it is a simple but advantageous operation that substantially strengthens the network's learning ability. Instead of conventional skip connection paths, residual skip connections [24] are chosen, as in Fig. 2a, which provide additional challenging spatial information. It is experimentally validated that this kind of connection is noticeably advantageous for model convergence, as it skips some layers in the neural network before feeding the output of one layer as the input of the next. In our model structure, the dataflow passes through a chain of two downsample sub-blocks and a 2 × 2 max pooling layer before being delivered into the residual connection. As for the decoder architecture, it contains four upsample deep blocks and a bottleneck block with an ASPP-based structure. Our version of ASPP contains five parallel operations: a 1 × 1 convolution and four 3 × 3 convolutions
Fig. 1 Our proposed customized U-Net-based model
Fig. 2 Sub-modules applied for proposed architecture
with dilation rates of (1, 6, 12, 18). These rates are arranged to capture multi-scale context, and image-level features are incorporated via a Global Average Pooling branch. As noted above, both training and testing images are cropped to 192 × 288 due to GPU limitations; with the feature maps' nominal stride of 16, this yields feature maps of size 12 × 18. These output feature maps are passed through a 1 × 1 convolution with 1024 filters, bilinearly upsampled to the initial dimensions, and combined into a single vector via concatenation. We also adjust the ASPP structure by replacing the Batch Norm layer with a Mean-Variance Normalization layer. Accordingly, the feature outputs of the encoder are denoted $E_1, E_2, E_3, E_4$, where

$$
E_i \in \mathbb{R}^{H_i \times W_i \times C_i}, \quad H_i = \tfrac{1}{2} H_{i-1}, \; W_i = \tfrac{1}{2} W_{i-1}, \; C_i = 2^{5+i}, \quad H_i, W_i, C_i \in \mathbb{N}^*, \; i \in [1, 4] \tag{1}
$$
These encoder outputs are then passed into the skip-connected Attention Up and Concate block (as indicated in Fig. 2b, c) at the first stage of each upsample deep block, before going deeper into that upsample block to mitigate gradient vanishing and assemble more features. Consequently, the outputs of the upsample blocks and the ASPP-based block are $D_1, D_2, D_3, D_4, D_5$, where

$$
D_i \in \mathbb{R}^{H_i \times W_i \times C_i}, \quad H_i = \tfrac{1}{2} H_{i-1}, \; W_i = \tfrac{1}{2} W_{i-1}, \; C_i = 2^{5+i}, \quad i \in [1, 5] \tag{2}
$$
$D_{i+1}$ and $E_i$ are fed into the Attention Up and Concate block such that

$$
(D_{i+1}, E_i) \xrightarrow{\text{attention}} (\hat{D}_{i+1}, \hat{E}_i) \xrightarrow{\text{concatenation}} [\hat{D}_{i+1}, \hat{E}_i], \quad i \in [1, 4]
$$
Denote by $D_0$ the final output of the model, where $D_0$ is the convolutional output of $D_1$ and $D_0 \in \mathbb{R}^{H_1 \times W_1 \times 2}$, with 2 being the number of predicted labels. $D_0$ is used for the comparison with the ground truth.
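The exact wiring of the residual shortcut around the pooled sub-blocks is not fully specified above, so the following Keras sketch is one plausible reading: it mainly illustrates the conv → MVN → ReLU sub-block and a per-sample Mean-Variance Normalization layer; the 1 × 1 strided projection used to match shapes is an assumption.

```python
# Illustrative downsample block with MVN and a residual skip connection (one possible reading).
import tensorflow as tf
from tensorflow.keras import layers

class MeanVarianceNorm(layers.Layer):
    """Centres and scales each feature map to zero mean / unit variance (MVN, cf. [11])."""
    def call(self, x):
        mean, var = tf.nn.moments(x, axes=[1, 2], keepdims=True)
        return (x - mean) / tf.sqrt(var + 1e-6)

def downsample_block(x, filters):
    shortcut = layers.Conv2D(filters, 1, strides=2, padding="same")(x)  # shape-matching projection (assumption)
    for _ in range(2):                                                  # two conv -> MVN -> ReLU sub-blocks
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = MeanVarianceNorm()(x)
        x = layers.Activation("relu")(x)
    x = layers.MaxPooling2D(2)(x)                                       # 2 x 2 max pooling
    return layers.Add()([x, shortcut])                                  # residual skip connection [24]

inp = tf.keras.Input(shape=(192, 288, 3))
out = downsample_block(inp, 64)
print(out.shape)    # (None, 96, 144, 64)
```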
3.2 Proposed Tversky-Kahneman Loss

In the model structure, a Softmax activation is applied to the output layer, which consists of c planes, to create the loss function; c is the number of labels for classification, and for skin lesion segmentation c = 2. Suppose there are N predicted pixels and N ground truth pixels, and denote by P and L the predicted set and the ground truth set, respectively, such that |P| = |L| = N. Let $p_{ic}$ and $l_{ic}$ be the elements of P and L, with i ∈ {1, 2, ..., N} and c ∈ {0, 1}, where $p_{ic} \in [0, 1]$ represents the predicted label probability and $l_{ic} \in \{0, 1\}$ the ground truth label. In Expected Utility Theory, Cumulative Prospect Theory accounts for anomalies in the observed behavior of economic agents [25]. Several probability weighting functions have been suggested to be applied to the cumulative probability of the outcomes; one of them is the Tversky-Kahneman probability weighting function

$$
\omega(z) = \frac{z^{\gamma}}{\left[z^{\gamma} + (1 - z)^{\gamma}\right]^{1/\gamma}} \tag{3}
$$

where z ∈ [0, 1] is the cumulative probability distribution of gains or losses in a number of economic fields and γ ∈ (0, 1) is a parameter. In more detail, z is the probability distribution of gain or loss if there is only one economic field; otherwise, z is the cumulative probability distribution of gains or losses over several economic fields. The Tversky-Kahneman probability weighting function has found a variety of applications, such as waiting-time analysis for a decision-maker and behavioral economics [26]. Inspired by this function, a new loss for medical image segmentation is proposed, which is also named Tversky-Kahneman:

$$
\ell(x) = \frac{x^{\gamma}}{\left[x^{\gamma} + (1 - x)^{\gamma}\right]^{1/\gamma}} \tag{4}
$$
subject to

$$
x = \frac{\alpha \sum_{i=1}^{N} p_{i1} l_{i0} + \beta \sum_{i=1}^{N} p_{i0} l_{i1}}{0.5 \sum_{i=1}^{N} (p_{i0} l_{i0} + p_{i1} l_{i1}) + \alpha \sum_{i=1}^{N} p_{i1} l_{i0} + \beta \sum_{i=1}^{N} p_{i0} l_{i1}}, \quad i \in \{1, 2, \ldots, N\} \tag{5}
$$

where α and β regulate the penalty amplitude of $p_{i1} l_{i0}$, the false positives (FP), and $p_{i0} l_{i1}$, the false negatives (FN), respectively. These two parameters are tuned to balance the trade-off between FP and FN, as proved in [16], under the condition α + β = 1: the larger β is, the higher the weight of recall compared with precision. When α = β = 0.5, x simplifies to the accuracy loss, and it is easy to see that x ∈ [0, 1]. Notably, γ = 1 reduces the loss function to ℓ(x) = x, while for γ > 1 the loss concentrates more on low-accuracy predictions that are mis-classified. The loss is also suppressed when the class accuracy is high, which frequently happens when the model converges swiftly and γ > 2. This tendency is visualized in Fig. 3: as γ increases, increasing values of x are mapped to less flat regions of the Tversky-Kahneman curve. Experiments with higher values of γ indicate that the overall result is best when γ ∈ (1, 2), and the best performance is confirmed with γ = 4/3; thus, all experiments are trained with γ = 4/3.
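A direct transcription of Eqs. (4) and (5) into a TensorFlow loss is sketched below with the reported best setting α = β = 0.5 and γ = 4/3; the small constant added to the denominator is for numerical safety only and is not part of the formula.

```python
import tensorflow as tf

def tversky_kahneman_loss(alpha=0.5, beta=0.5, gamma=4.0 / 3.0):
    """Loss of Eqs. (4)-(5); y_true / y_pred are one-hot / softmax maps of shape (B, H, W, 2)."""
    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        l0, l1 = y_true[..., 0], y_true[..., 1]
        p0, p1 = y_pred[..., 0], y_pred[..., 1]
        fp = tf.reduce_sum(p1 * l0)                      # p_i1 * l_i0, false positives
        fn = tf.reduce_sum(p0 * l1)                      # p_i0 * l_i1, false negatives
        agree = tf.reduce_sum(p0 * l0 + p1 * l1)
        x = (alpha * fp + beta * fn) / (0.5 * agree + alpha * fp + beta * fn + 1e-10)
        return x ** gamma / (x ** gamma + (1.0 - x) ** gamma) ** (1.0 / gamma)
    return loss

# e.g. model.compile(optimizer="nadam", loss=tversky_kahneman_loss())
```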
Fig. 3 Tversky-Kahneman graph function on different γ cases
3.3 Evaluation Metrics

In biomedical image segmentation, the Dice coefficient is a statistical formula for measuring the similarity between segmentation outcomes. The Dice coefficient is calculated by

$$
\operatorname{dice}(P, L) = \frac{2 \sum_{i=1}^{N} p_{ic} l_{ic} + \epsilon}{\sum_{i=1}^{N} (p_{ic} + l_{ic}) + \epsilon} \tag{6}
$$
The smoothing coefficient $\epsilon$ prevents division by zero and is taken to be 1e−10. The Jaccard index is also a statistical formula for measuring the similarity and diversity of sample objects. It is calculated by the formula

$$
\operatorname{jaccard}(P, L) = \frac{\sum_{i=1}^{N} p_{ic} l_{ic} + \epsilon}{\sum_{i=1}^{N} (p_{ic} + l_{ic} - p_{ic} l_{ic}) + \epsilon} \tag{7}
$$
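Eqs. (6) and (7) translate directly into the following metric functions; the sketch assumes binarized single-class maps and uses the smoothing value of 1e−10 mentioned above.

```python
import tensorflow as tf

SMOOTH = 1e-10   # smoothing term of Eqs. (6)-(7)

def dice_coefficient(y_true, y_pred):
    y_true = tf.cast(y_true, y_pred.dtype)
    inter = tf.reduce_sum(y_true * y_pred)
    return (2.0 * inter + SMOOTH) / (tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + SMOOTH)

def jaccard_index(y_true, y_pred):
    y_true = tf.cast(y_true, y_pred.dtype)
    inter = tf.reduce_sum(y_true * y_pred)
    union = tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) - inter
    return (inter + SMOOTH) / (union + SMOOTH)
```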
4 Experimental Results

4.1 Datasets

Our loss is evaluated on the official ISIC Skin Lesion 2017 and 2018 datasets. ISIC 2017 provides 2000 training images and a blind held-out test set of 600 images, while ISIC 2018 officially provides 2594 images, for which we use an 80-20 train-test split in our experiments.
4.2 Training

We validate our customized U-Net with the Tversky-Kahneman loss layer for segmenting skin lesions. Our model is implemented in TensorFlow 2.7 and trained end-to-end; cost minimization over several epochs is performed using the NADAM optimizer [27] with an initial learning rate of 0.001 and Nesterov momentum to improve convergence. The learning rate is divided by 2 every 10 epochs until it reaches 0.00001, after which it is kept constant for the remainder of training. All images are preprocessed by center-cropping and normalization and are then randomly flipped and rotated in the range [−π/6, π/6] radians. The training time for the network is at most approximately 1.5 h on an NVIDIA Tesla P100 16 GB GPU.
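The optimizer, learning-rate schedule, and augmentation described above can be set up as in the following sketch; the flip mode and the total number of epochs are assumptions, and the loss and metric names refer to the earlier sketches.

```python
import tensorflow as tf

def lr_schedule(epoch, lr=1e-3):
    # Halve the initial 1e-3 rate every 10 epochs, with a floor of 1e-5.
    return max(1e-3 * 0.5 ** (epoch // 10), 1e-5)

callbacks = [tf.keras.callbacks.LearningRateScheduler(lr_schedule)]
optimizer = tf.keras.optimizers.Nadam(learning_rate=1e-3)

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),   # flip mode is an assumption
    tf.keras.layers.RandomRotation(factor=1 / 12),            # ~ +/- pi/6 radians (+/- 30 degrees)
])

# model.compile(optimizer=optimizer, loss=tversky_kahneman_loss(), metrics=[dice_coefficient])
# model.fit(train_ds, epochs=60, callbacks=callbacks)         # epoch count is illustrative
```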
Fig. 4 Representative segmentation performances of our proposed model and Tversky-Kahneman loss for some difficult input cases of a skin lesion ISIC 2017, and b skin lesion ISIC 2018 datasets
4.3 Representative Results

Representative segmentation results produced by the proposed model and the Tversky-Kahneman loss function for some difficult input cases are provided in Fig. 4. As Fig. 4 shows, our loss and model attain quite a high-quality performance on these complex inputs.
4.4 Evaluation

To evaluate the efficiency of the new Tversky-Kahneman loss function, our U-Net model is trained with several parameter pairs of α and β. The performance outcomes are displayed in Table 1. From Table 1, it is clear that, according to all combined test measures, the best performance is obtained by the customized U-Net trained with α = β = 0.5. To provide a fairer validation of the proposed U-Net model and the proposed Tversky-Kahneman loss, and to show that our results are state of the art, 8 different loss-function variants are validated on each skin lesion dataset. The ablation results are recorded in Table 2. As Table 2 demonstrates, training with Tversky-Kahneman obtains more effective and accurate segmentation results. In more detail, on the ISIC 2017 Skin Lesion Dataset, the best performance
Table 1 Performances under a variety of α and β parameters

| Dataset | Penalties | Dice coefficient | Jaccard index |
|---|---|---|---|
| Skin lesion ISIC 2017 | α = 0.5, β = 0.5 | 0.854 | 0.767 |
| | α = 0.4, β = 0.6 | 0.847 | 0.763 |
| | α = 0.3, β = 0.7 | 0.843 | 0.757 |
| | α = 0.2, β = 0.8 | 0.844 | 0.757 |
| | α = 0.1, β = 0.9 | 0.842 | 0.756 |
| Skin lesion ISIC 2018 | α = 0.5, β = 0.5 | 0.886 | 0.813 |
| | α = 0.4, β = 0.6 | 0.853 | 0.771 |
| | α = 0.3, β = 0.7 | 0.881 | 0.808 |
| | α = 0.2, β = 0.8 | 0.864 | 0.789 |
| | α = 0.1, β = 0.9 | 0.830 | 0.751 |
Table 2 Skin lesion segmentation performances for different types of loss functions

| Dataset | Loss function | Dice coefficient | Epoch | Jaccard index |
|---|---|---|---|---|
| Skin lesion ISIC 2017 | Binary cross-entropy (BCE) + Dice loss | 0.840 | 19–22 | 0.756 |
| | Accuracy loss | 0.849 | 29–32 | 0.766 |
| | Tversky loss (α = 0.3, β = 0.7) | 0.851 | 40–43 | 0.766 |
| | Focal Tversky loss (α = 0.3, β = 0.7) | 0.839 | 38–40 | 0.752 |
| | Mumford-Shah loss | 0.848 | 18–20 | 0.763 |
| | Active contour loss | 0.847 | 19–21 | 0.762 |
| | LMS loss | 0.844 | 30–32 | 0.758 |
| | Tversky-Kahneman loss (ours) | 0.854 | 46–48 | 0.767 |
| Skin lesion ISIC 2018 | Binary cross-entropy (BCE) + Dice loss | 0.861 | 18–20 | 0.781 |
| | Accuracy loss | 0.860 | 20–23 | 0.779 |
| | Tversky loss (α = 0.3, β = 0.7) | 0.847 | 32–35 | 0.764 |
| | Focal Tversky loss (α = 0.3, β = 0.7) | 0.854 | 59–62 | 0.773 |
| | Mumford-Shah loss | 0.874 | 46–48 | 0.802 |
| | Active contour loss | 0.876 | 46–49 | 0.803 |
| | LMS loss | 0.883 | 40–42 | 0.809 |
| | Tversky-Kahneman loss (ours) | 0.886 | 46–48 | 0.813 |

Here the "Epoch" column indicates the epoch range in which the Dice coefficient and the Jaccard index reach their peaks.
exceeds the others by 0.3 to 1.5%, while on the ISIC 2018 Skin Lesion Dataset it outpaces the others by 0.3 to 3.9%. From each dataset, the top 5 loss functions with the best accuracy are selected to observe the training process through their loss curves. Clearly, the ISIC 2017
Fig. 5 Experimental training process with top 5 loss functions in each dataset
Dataset shows the best performances with the Tversky-Kahneman, Tversky, Accuracy, Mumford-Shah, and Active Contour loss functions, whereas for the ISIC 2018 Dataset the top 5 are the Tversky-Kahneman, LMS, Active Contour, Mumford-Shah, and BCE Dice loss functions. The corresponding training processes are displayed in Fig. 5. As Fig. 5 shows, the Tversky-Kahneman curve is the most effective, the smoothest, and the fastest-converging of all the loss functions considered.
5 Conclusion

We have introduced a new loss function based on the Tversky-Kahneman probability weighting function to achieve a more balanced trade-off between precision and recall as well as a better convergence speed in segmentation. Moreover, the proposed loss layer has been added to a customized U-Net network to produce state-of-the-art performances (0.854 Dice coefficient and 0.767 Jaccard index on the ISIC 2017 dataset, and 0.886 Dice coefficient and 0.813 Jaccard index on the ISIC 2018 dataset). Our experimental results on skin lesion segmentation clearly indicate that all performance evaluation metrics on the test data using the Tversky-Kahneman loss layer exceed the performances provided by the other loss layers. Even though the parameters α and β are weighted equally, the proposed approach still obtains better outcomes than the latest results in skin lesion segmentation. In future work, we will investigate the relationship between the parameters α and β on different models, and work on more complex segmentation datasets to compare against state-of-the-art results using relevant criteria.
Acknowledgements This research is funded by the Hanoi University of Science and Technology (HUST) under project number T2021-PC-005. Minh-Nhat Trinh was funded by Vingroup JSC and supported by the Master, Ph.D. Scholarship Program of Vingroup Innovation Foundation (VINIF), Institute of Big Data, code VINIF.2021.ThS.33.
References

1. Szeliski R (2010) Computer vision: algorithms and applications. Springer
2. Bansal J (2019) Particle swarm optimization. In: Evolutionary and swarm intelligence algorithms, pp 11–23. https://doi.org/10.1007/978-3-319-91341-4_2. ISBN 978-3-319-91339-1
3. Kumar S, Sharma B, Sharma V, Sharma H, Bansal J (2018) Plant leaf disease identification using exponential spider monkey optimization. Sustain Comput Inf Syst 28. https://doi.org/10.1016/j.suscom.2018.10.004
4. Bansal J, Sharma H (2012) Cognitive learning in differential evolution and its application to model order reduction problem for single-input single-output systems. Mem Comput 4. https://doi.org/10.1007/s12293-012-0089-8
5. Jafari MH, Karimi N, Nasr-Esfahani E, Samavi S, Soroushmehr SMR, Ward K, Najarian K (2016) Skin lesion segmentation in clinical images using deep learning. In: 2016 23rd international conference on pattern recognition (ICPR). https://doi.org/10.1109/ICPR.2016.7899656
6. Wolterink JM, Leiner T, Viergever MA, Išgum I (2017) Dilated convolutional neural networks for cardiovascular MR segmentation in congenital heart disease. Lecture Notes in Computer Science. https://doi.org/10.1007/978-3-319-52280-7-9
7. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. arXiv
8. Lin G, Shen C, van den Hengel A, Reid I (2016) Efficient piecewise training of deep structured models for semantic segmentation. arXiv
9. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. arXiv
10. Zhou X-Y, Shen M, Riga CV, Yang G-Z, Lee S-L (2017) Focal FCN: towards small object segmentation with limited training data. CoRR
11. Tran PV (2017) A fully convolutional neural network for cardiac segmentation in short-axis MRI. arXiv
12. Mumford D, Shah J (1989) Optimal approximations by piecewise smooth functions and associated variational problems. Commun Pure Appl Math, pp 577–685
13. Chan TF, Vese LA (2001) Active contours without edges. IEEE Trans Image Process, pp 266–277. https://doi.org/10.1109/83.902291
14. Chambolle A, Pock T (2011) A first-order primal-dual algorithm for convex problems with applications to imaging. J Math Imaging Vis. https://doi.org/10.1007/s10851-010-0251-1
15. Trinh MN, Nguyen NT, Tran TT, Pham VT (2021) A semi-supervised deep learning-based approach with multiphase active contour loss for left ventricle segmentation from CMR images. In: The 3rd international conference on sustainable computing SUSCOM-2021
16. Salehi SSM, Erdogmus D, Gholipour A (2017) Tversky loss function for image segmentation using 3D fully convolutional deep networks. arXiv
17. Abraham N, Khan NM (2018) A novel focal Tversky loss function with improved attention U-Net for lesion segmentation. arXiv
18. Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. arXiv
19. Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. arXiv
20. Chen L-C, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation. arXiv
21. Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X (2017) Residual attention network for image classification. CoRR
22. Wang Y, Ni D, Dou H, Hu X, Zhu L, Yang X, Xu M, Qin J, Heng P-A, Wang T (2019) Deep attentive features for prostate segmentation in 3D transrectal ultrasound. IEEE Trans Med Imaging, pp 2768–2778. https://doi.org/10.1109/tmi.2019.2913184
23. Jo S, Ozan O, Michiel S, Mattias H, Bernhard K, Ben G, Daniel R (2019) Attention gated networks: learning to leverage salient regions in medical images. arXiv
24. Ange L, Shuyue G, Murray L (2020) DC-UNet: rethinking the U-Net architecture with dual channel efficient CNN for medical images segmentation. arXiv
25. Ingersoll J (2008) Non-monotonicity of the Tversky-Kahneman probability-weighting function: a cautionary note. Eur Finan Manag, pp 385–390. https://doi.org/10.1111/j.1468-036X.2007.00439.x
26. Takemura K, Murakami H (2016) Probability weighting functions derived from hyperbolic time discounting: psychophysical models and their individual level testing. Front Psychol 7. https://doi.org/10.3389/fpsyg.2016.00778. ISSN 1664-1078
27. Dozat T (2016) Incorporating Nesterov momentum into Adam
Trends of Tea Productivity Based on Level of Soil Moisture in Tea Gardens Manoj Kumar Deka
and Yashu Pradhan
Abstract In this paper, we have studied the trend in tea productivity when there are variations in the soil moisture of tea gardens. The variations could have occurred due to excessive rainfall, a dry season, or other factors. A system of sensors, data recorders, and processors is required to understand the trend, and an efficient watering system can help to compensate for the dry season. The proposed system can maintain the soil moisture level required for optimum tea production. There are numerous approaches to study these factors, so a real-time data acquisition system can add value to the field of tea research. In this paper, the productivity trend of tea leaves has been analyzed for a year, and an attempt has been made to achieve optimum productivity using a data acquisition system (DAS). During this analysis, the key factors responsible for tea productivity, such as soil moisture, temperature, and annual rainfall, have been studied, and a regression method has been used to find the relation among temperature, soil moisture, rainfall, and tea productivity. Soil moisture is one of the many factors and, unlike natural factors such as weather and rain, we have scope to maintain soil moisture for enhancing productivity. This system can be used to meet the rising demand for tea by maintaining soil moisture to increase tea productivity. Keywords DAS · Soil moisture · Temperature · Rainfall · Tea productivity · Data acquisition · AVR microcontroller
1 Introduction Due to lack of knowledge and poor maintenance of agricultural soil for tea plants in the past, the health of the tea industry is at risk. The evidence of such destructive change can be judged from the gradual decline in productivity and quality of tea leaves [1]. Therefore, the present scenario demands an intelligent system that can monitor and provide accurate information regarding the water requirement of individual tea plants, i.e., there is a need for control of watering of plants which will be M. K. Deka (B) · Y. Pradhan Bodoland University, Kokrajhar, Assam 783370, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_15
efficient in terms of cost, water, and electric power. However, in practice, it is quite impossible for a human workforce to care for every nook and corner of a vast, widespread tea garden. Hence, a wireless communication network helps in solving this problem in large tea gardens. The solution is a data acquisition system which helps to sense soil moisture and perform operations to raise the moisture level of the soil to a level sufficient for the nourishment of the plants [2–4]. Without a data acquisition system, real-time monitoring and control of soil moisture cannot be achieved. Hence, we gather soil temperature and soil moisture using a DAS for analysis and for controlling the tea yield.
2 Objective The main objective of this paper is to point out the inter-relation between tea productivity, soil moisture, and temperature. In addition, we seek to use the key factors, after thorough analysis, to carve out a road map which can be used to obtain optimum tea productivity [5] by maintaining the soil moisture in tea gardens. This study can also be useful in improving the data acquisition system from time to time in the near future.
3 Review of Literature Evans et al. [6] successfully designed a site-specific irrigation system using six sensor stations implemented across an agricultural field. The system monitors field conditions based on a soil property map. The inputs from the sensors are then sampled periodically and transmitted to the base station via a wireless Bluetooth medium. Jyothipriya et al. [7] suggested a drip irrigation control system using GSM-based ZigBee. The results obtained demonstrate that the system is capable of water preservation. Aqeel ur-Rehman et al. [8] provided an evaluation of WSN usage for irrigation, fertilization, pest control, and horticulture. General concepts and brief descriptions were introduced, yet the benefits of using WSN in agriculture were highlighted. Many techniques for measuring soil electrical conductivity have been proposed. The principal advantages of the electrical conductivity probe for measuring soil moisture content are its ease of use, simplicity, low cost of equipment, and the relatively large volume of soil sampled [9]. The major advantages of EMI are that it does not need to be inserted in the ground, it is easy and quick to operate, and it can provide estimates over large areas and substantial depths (of the order of 10 m). A disadvantage of this method is that the task of isolating the effects of soil moisture content at a particular depth is difficult [10]. To monitor humidity in a high-tech polyhouse environment, a wireless sensor was built with an AVR ATmega8L microcontroller and an RF ZigBee module for protected data transmission [11]. To improve the precision and reliability of monitoring humidity
at the base station, the use of a smart sensor module is beneficial. The system develops an application for observing various factors such as soil moisture and humidity and provides remote monitoring using ZigBee, which sends data wirelessly [12]. In Hade et al. [13], an embedded system helped to automate the irrigation process remotely to save farmers' time, water, and money. Important data such as soil tests for water content, salinity, and chemical constituents are collected wirelessly and processed further to come up with a better drip irrigation plan. This automatic monitoring system model using a wireless sensor network (WSN) helped the farmer community to improve the yield. Hanggoro et al. [14] developed a system for greenhouse monitoring and control using an Android mobile application. It monitors and controls the humidity inside a greenhouse with an Android mobile phone and a Wi-Fi connection, via serial communication to a microcontroller and a humidity sensor.
4 Methodology The data acquisition system discussed in this paper is designed for sensing the moisture content of the soil. The operation of the DAS has three main stages, namely the sensing stage, the processing stage, and the controlling stage. For sensing, a 10HS Decagon moisture sensor probe is installed, an AVR microcontroller [15–17] processes the signals, and, depending on the moisture requirement of the garden soil, the central control unit starts or stops the water sprinkler. Figures 1 and 2 show the flowchart of the overall system and a simplified block diagram of the moisture sensor, respectively. The data acquisition system involved in the present study can be represented in the block diagram shown in Fig. 3.
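The start/stop decision taken by the central control unit amounts to a simple threshold (hysteresis) rule on the sensed moisture. The sketch below is a simplified Python simulation of that control logic only; the moisture set-points, the simulated sensor reading, and the watering effect are illustrative assumptions and do not come from the actual AVR firmware.

```python
import random

# Illustrative set-points (volumetric soil moisture, %); the thresholds used
# in the garden are not stated in the paper and would be tuned on site.
MOISTURE_LOW = 25.0    # below this, start the sprinkler
MOISTURE_HIGH = 40.0   # above this, stop the sprinkler

def read_moisture_sensor(previous):
    """Stand-in for the 10HS probe / ADC read: a small random walk."""
    return max(0.0, min(100.0, previous + random.uniform(-2.0, 2.0)))

def control_step(moisture, sprinkler_on):
    """Hysteresis rule applied by the central control unit."""
    if not sprinkler_on and moisture < MOISTURE_LOW:
        return True      # start watering
    if sprinkler_on and moisture > MOISTURE_HIGH:
        return False     # stop watering
    return sprinkler_on  # otherwise keep the current state

moisture, sprinkler_on = 20.0, False
for cycle in range(48):                            # e.g. 48 polling cycles
    moisture = read_moisture_sensor(moisture)
    moisture += 1.5 if sprinkler_on else -0.5      # crude effect of watering
    sprinkler_on = control_step(moisture, sprinkler_on)
    print(f"cycle {cycle:02d}: moisture={moisture:5.1f}%  sprinkler={'ON' if sprinkler_on else 'OFF'}")
```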
5 Factors Affecting Tea Productivity The main factors which affect the productivity of the tea leaves are rainfall, temperature, and soil moisture. With the data acquisition system and a tea garden area of 14,400 sq. ft., we study the effect of these factors on productivity. The garden is divided equally into two parts; both parts are equipped with the data acquisition system, but only one half is equipped with facilities to maintain soil moisture using irrigation [18–22]. The soil moisture of the other half is governed by natural conditions. It is evident from Table 1 that increased soil moisture has translated into better productivity, which suggests a relation between soil moisture and productivity. Hence, an attempt to maintain the highest possible soil moisture was made using the DAS, and the results are shown in the following table. After setting up the data acquisition system, the collected data are listed in Table 2.
Fig. 1 Flowchart of the data acquisition system
Fig. 2 Simplified block diagram of a moisture sensor
Fig. 3 Block diagram of data acquisition and control system
Table 1 Rainfall, temperature, soil moisture, and productivity in the year 2015

Month   Rainfall (mm)   Temperature (°C)   Soil moisture   Productivity
Jan     0               20.96              9.45            0
Feb     53.8            22                 13.63           0
Mar     21.4            27.41              12.33           55
April   27.2            33.17              12.78           70
May     509.4           31.48              40.35           135
June    826.8           32.27              58.9            150
July    446             33.62              44.63           152
Aug     879             31.36              63.41           164
Sep     572             31.55              40.82           170
Oct     12.2            30.7               16.32           100
Nov     0               27.1               9.13            65
Dec     0.8             22.24              8.1             0

It shows the relation between soil moisture and productivity of the experimented garden area
From Table 2, Figs. 4 and 5 have been drawn as graph representations of temperature, soil moisture, and productivity without using the data acquisition system (DAS) and with the data acquisition system (DAS), respectively, for the year 2015. The productivity differences without and with the DAS for the year 2015 are shown in Fig. 6.
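As an illustration of the regression analysis mentioned in the abstract, the following sketch fits an ordinary least-squares model of productivity on rainfall, temperature, and soil moisture using the 2015 values of Table 1; it is a reproduction aid based only on the tabulated data, not the exact statistical procedure used in the study.

```python
import numpy as np

# Monthly 2015 values from Table 1: rainfall (mm), temperature (°C),
# soil moisture, productivity.
rainfall = np.array([0, 53.8, 21.4, 27.2, 509.4, 826.8, 446, 879, 572, 12.2, 0, 0.8])
temperature = np.array([20.96, 22, 27.41, 33.17, 31.48, 32.27, 33.62, 31.36, 31.55, 30.7, 27.1, 22.24])
moisture = np.array([9.45, 13.63, 12.33, 12.78, 40.35, 58.9, 44.63, 63.41, 40.82, 16.32, 9.13, 8.1])
productivity = np.array([0, 0, 55, 70, 135, 150, 152, 164, 170, 100, 65, 0])

# Design matrix with an intercept column, then a least-squares fit.
X = np.column_stack([np.ones_like(rainfall), rainfall, temperature, moisture])
coef, *_ = np.linalg.lstsq(X, productivity, rcond=None)
pred = X @ coef
r2 = 1 - np.sum((productivity - pred) ** 2) / np.sum((productivity - productivity.mean()) ** 2)

print("intercept, rainfall, temperature, moisture coefficients:", np.round(coef, 3))
print("R^2 =", round(r2, 3))
```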
Table 2 Productivity difference and different parameters affecting the productivity, such as soil moisture, in the year 2015 for the experimented garden area

Month   Rainfall (mm)   Temperature (°C)   Soil moisture (before DAS)   Soil moisture (after DAS)   Productivity (before DAS)   Productivity (after DAS)
Jan     0        20.96   9.45    37.23   0     0
Feb     53.8     22      13.63   38.13   0     0
Mar     21.4     27.41   12.33   34.55   55    94
April   27.2     33.17   12.78   40.19   70    104
May     509.4    31.48   40.35   40.35   135   149
June    826.8    32.27   58.9    58.9    150   160
July    446      33.62   44.63   44.63   152   163
Aug     879      31.36   63.41   63.41   164   170
Sep     572      31.55   40.82   40.82   170   159
Oct     12.2     30.7    16.32   36.13   100   145
Nov     0        27.1    9.13    33.51   65    140
Dec     0.8      22.24   8.1     34.17   0     0
Temperature, Soil Moisture & Producvity without Data Acquision System in 2015 Temperature , Soil Moisture, Producvity
200 Temperature 150
Soil Moisture
100
Producvity without using DAS
50 0 1
2
3
4
5
6
7
8
9
10 11 12
Months --->
Fig. 4 Graph for temperature, soil moisture, and productivity without using data acquisition system in 2015
6 Result and Discussion Trends of Productivity Captured Using the Data Acquisition System (DAS) The months of January and February are considered a non-productive period, and this proved to be true as there was no noticeable productivity in them. February has more rainfall than March, but due to the nourishment from the February rainfall, productivity starts to increase in March.
Fig. 5 Graph for temperature, soil moisture, and productivity with the data acquisition system in 2015
Fig. 6 Graph for productivity difference in 2015
From March to April, more rainfall increased the moisture level, thereby increasing productivity. From April to May, the temperature increases, but tremendous rainfall increases the moisture, which resulted in an increase in production. From May to June, temperature and moisture both increase, which resulted in the maximum productivity of the year. From June to July, the temperature increases but the moisture falls, resulting in a drop in production. From July to August, with in fact the highest monthly rainfall, the moisture is at its highest and, together with the second highest temperature, gives the highest productivity of the year. From August to September, the temperature is almost the same, but the drop in moisture reduced productivity. From September to October, the temperature drops and the moisture drastically dropped, so productivity also dropped. From October to November, the temperature and
Table 3 Productivity difference and different parameters affecting the productivity, such as soil moisture, in the year 2019 for the experimented garden area

Month   Rainfall (mm)   Temperature (°C)   Soil moisture (before DAS)   Soil moisture (after DAS)   Productivity (before DAS)   Productivity (after DAS)
Jan     9.8      21.37   9.08    38.08   0     0
Feb     1.2      23.75   8.23    33.23   0     0
Mar     43       28.94   11.56   33.11   61    140
April   188.8    29.62   20.79   35.15   82    153
May     576.6    31.51   31.77   34.77   150   152
June    1146.2   30.56   45.36   45.36   165   175
July    532.8    33.34   38.65   38.65   163   173
Aug     1155.2   32.06   46.37   46.37   166   176
Sep     310      31.81   29.49   36.49   143   162
Oct     2.4      31.63   12.86   34.54   82    159
Nov     7.4      26.36   9.84    33.13   70    138
Dec     16.8     21.81   10.22   34.22   0     0
moisture both dropped, resulting in the lowest monthly productivity. From November to December, moisture and temperature further decreased, resulting in no noticeable productivity, as shown in Tables 2, 3 and 4 for three different years. Figures 7 and 10 show the temperature, soil moisture, and productivity without using the data acquisition system, whereas Figs. 8 and 11 show the temperature, soil moisture, and productivity with the data acquisition system for two different years, i.e., 2019 and 2020. Figures 9 and 12 represent the productivity differences between the setups without and with the data acquisition system for the same two years, i.e., 2019 and 2020. Trends of Productivity with DAS as Controller of Soil Moisture The productivity started to increase after the month of February. In the months of June, July, and August, the rate of production increase per month is in the range of 7–12%. These three months also have the highest productivity of the entire year. This highlights that the optimum level of production has been achieved in the plot where the data acquisition system (Ladgaonkar et al. [23, 24]) was maintaining the soil moisture since January, as shown in Tables 2, 3 and 4 for three different years.
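The month-by-month gain discussed above can be checked directly from the productivity columns of Tables 2–4. The short sketch below uses the 2015 values of Table 2 for the productive months; the same few lines apply unchanged to the 2019 and 2020 columns.

```python
months = ["Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov"]
without_das = [55, 70, 135, 150, 152, 164, 170, 100, 65]    # Table 2, before DAS
with_das    = [94, 104, 149, 160, 163, 170, 159, 145, 140]  # Table 2, after DAS

for month, before, after in zip(months, without_das, with_das):
    gain = 100.0 * (after - before) / before
    print(f"{month}: {before} -> {after} ({gain:+.1f}%)")

annual_gain = 100.0 * (sum(with_das) - sum(without_das)) / sum(without_das)
print(f"increase over the productive months: {annual_gain:.1f}%")
```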
Table 4 Productivity difference and different parameters affecting the productivity, such as soil moisture, in the year 2020 for the experimented garden area

Month   Rainfall (mm)   Temperature (°C)   Soil moisture (before DAS)   Soil moisture (after DAS)   Productivity (before DAS)   Productivity (after DAS)
Jan     4.6      20.29   21.37   38      0     0
Feb     0        24.81   23.75   34.12   0     0
Mar     132.6    29.74   28.94   33.7    73    98
April   348.6    30.79   29.62   40.14   90    120
May     447.8    30.55   31.51   42.66   143   155
June    658      32.58   30.56   48.19   160   165
July    583.4    31.27   33.34   47.11   163   169
Aug     114.6    34.79   32.06   38.25   140   168
Sep     318.4    31.48   31.81   35.46   145   165
Oct     191.4    30.51   31.63   39.37   120   168
Nov     0        27.39   26.36   35.14   86    105
Dec     0        23.41   21.81   38.11   0     0
Fig. 7 Graph for temperature, soil moisture, and productivity without using the data acquisition system in the year 2019
7 Conclusion When a smart watering system is provided to the plants even during the non-productive season, annual production was found to improve by 25–30%. The practice of focusing on a yearly productivity plan was found effective compared to a monthly productivity
Fig. 8 Graph for temperature, soil moisture, and productivity using the data acquisition system in the year 2019
Fig. 9 Graph for productivity difference in the year 2019
plan. The monthly productivity after the use of the data acquisition system was higher than the monthly productivity before its use. It was found that, at the same temperature, a month with high productivity has a higher soil moisture level than a month with low productivity, as shown in the graphs of Figs. 6, 9, and 12 for the three different years. Also, beyond 40% soil moisture, productivity starts to saturate, and no further increase in productivity is observed. However, during this saturation phase, the productivity increase was directly proportional to the surrounding temperature, but the magnitude of the change is very small. Although maintaining the soil moisture between
Fig. 10 Graph for temperature, soil moisture, and productivity of the garden without using the data acquisition system in the year 2020
Fig. 11 Graph for temperature, soil moisture, and productivity of the garden using the data acquisition system in the year 2020
25 and 40% was continued for January, February, and March, there was no profitable productivity in January and February; with the approach of March, we observed a rise in temperature which gave a 25% increase in productivity. April had higher temperature and moisture content, so a productivity jump of 14.2% was observed. Productivity is observed above a temperature of 27 °C. It was also found that higher rainfall in the previous month increases the soil moisture for the next month.
Fig. 12 Graph for productivity difference in the year 2020
8 Future Scope The distinct scopes to be extracted from this study are drawn below:
1. The constant monitoring facility of the system gives scope for a preventive role against future drought situations; hence, it may be used in drought control.
2. Using the system gives growers more control over their irrigation by delivering accurate data on field and crop conditions, thereby lowering their costs and raising their yields (and, in theory, earning a higher profit from their operations). Access to real-time data on the state of the plants and the levels of moisture in the soil, right from a smartphone or browser, allows users to control and monitor their system for optimal irrigation scheduling.
3. Soil parameters like NPK may also be detected in the soil with the data acquisition system.
4. A production forecast may be developed using the productivity–soil moisture–temperature relation obtained from this research with yearly data; the precision will increase as the amount of data increases.
5. Using AI, the system can be trained more precisely using the data collected from the data acquisition system.
References 1. Manimaran P, Yasar Arfath D (2016) An intelligent smart irrigation system using WSN and GPRS module. Int J Appl Eng Res 11(6):3987–3992. ISSN 0973-4562 2. Wild J, Kopecký M, Macek M, Sanda M, Jankovec J, Haase T (2019) Climate at ecologically relevant scales: a new temperature and soil moisture logger for long-term microclimate measurement. Agric For Meteorol 268:40–47 3. Yu L, Gao W, Shamshiri RR, Tao S, Ren Y, Zhang Y, Su G (2021) Review of research progress on soil moisture sensor technology. Int J Agric Biol Eng 14(4) Open Access at https://www.ija be.org 4. Goswami MP, Montazer B, Sarma U (2018) Design and characterization of a fringing field capacitive soil moisture sensor. IEEE Trans Instrum Meas 68(3):913–922 5. Baruah P (2017) Origin, discovery of tea, wild tea and early development of tea in Assam, indigenous tea and tea drinking habit among the tribes in Assam of India. Report of Tea Research Association. J Tea Sci Res, 34–39 6. Kim (James) Y, Evans RG, Iversen WM (2008) Remote sensing and control of an irrigation system using a distributed wireless sensor network. J Name IEE Trans Instrum Measur, 1379– 1387 7. Jyothipriya AN, Saravanabava TP (2013) Design of Embedded system for drip Irrigation automation. Int J Eng Sci Invention, 34–37 8. Rehman A, Abbasi AZ, Islam N, Shaikh ZA (2014) A review of wireless sensors and networks’ applications in agriculture. Comput Stand Interfaces 36:263–270 9. Zegelin S (1996) Soil moisture measurement, field measurement techniques in hydrologyworkshop notes, Corpus Christi College, Clayton, 1–22 10. Brocca L, Ciabatta L, Massari C, Camici S, Tarpanelli A (2017) Soil moisture for hydrological applications: open questions and new opportunities. Water, 1–20 11. Ladgaonkar BP, Pawar AM (2011) Design and implementation of sensor node for wireless sensors network to monitor humidity of HighTechPolyhouse environment. IJAET 1(3):1–11. ISSN: 2231-1963 12. Chavan CH, Karande V (2014) Wireless monitoring of soil moisture, temperature& humidity using Zigbee in agriculture. IJETT 11:493–497. ISSN: 2231-5381 13. Hade AH, Sengupta MK (2014) Automatic control of drip irrigation system &monitoring of soil by wireless. IOSR-JAVS, 57–61. E-ISSN: 2319-2380 14. Hanggoro, Aji., Reynaldo, Rizki.: Greenhouse monitoring and controlling using Android mobile application. IEEE conference,79–85 15. Mazidi MA,Mazidi JG (2019) The 8051 microcontroller and embedded systems. Low Price Edition, PEARSON 16. Mazidi MA, Naimi S, Naimi S (2019) The AVR microcontroller and embedded systems using assembly and C. PEARSON 17. Atmel.com visited (2015) Link: http://www.atmel.com/Images/Atmel-42735-8-bit AVRMicrocontroller-ATmega328-328P_Datasheet.pdf 18. Dutta R, Remote sensing a tool to measure and monitor tea plantations in Northeast India. Page No 19. Hade AH, Sengupta MK (2014) Automatic control of drip irrigation system &monitoring of soil by wireless. IOSR-JAVS 7(4) Ver. Iii, pp 57–61. E-ISSN: 2319-2380, P-ISSN: 2319-2372 20. Awasthi A, Reddy SRN (2013) Monitoring for precision agriculture using wireless sensor network-a review. Global J Comput Sci Technol Netw Web Secur 13(7) Versions 1.0 Year 21. Int J Adv Eng Res Dev IJAERD 2(1). e-ISSN: 2348-4470 , print-ISSN:2348-6406 22. Torres-Sanchez R, Navarro-Hellin H, Guillamon-Frutos A et al (2020) A decision support system for irrigation management: analysis and implementation of different learning techniques. Water 12(2):548. https://doi.org/10.3390/w12020548
23. Khanna N, Singh G, Jain DK, Kaur M (2014) Design and development of soil moisture sensor and response monitoring system. Int J Latest Res Sci Technol 3(6):142–145. ISSN (Online):2278-5299 24. Ladgaonkar BP, Pawar AM (2011) Design and implementation of sensor node for wireless sensors network to monitor humidity of HighTech polyhouse environment. IJAE 1 1(3). TISSN: 2231-1963
Structural Optimization With the Multistrategy PSO-ES Unfeasible Local Search Operator Marco Martino Rosso , Angelo Aloisio , Raffaele Cucuzza , Rebecca Asso , and Giuseppe Carlo Marano
Abstract The convergence of meta-heuristic optimization algorithms is not mathematically ensured given their heuristic nature of mimicking natural phenomena. Nevertheless, in recent years, they have become very widespread tools due to their successful capability to handle hard constrained problems. In the present study, the particle swarm optimization (PSO) algorithm is investigated. The most important state-of-the-art improvements (inertia weight and neighbourhood) have been implemented and an unfeasible local search operator based on self-adaptive Evolutionary Strategy (ES) algorithm has been proposed. Firstly, the current PSO-ES has been tested on literature constrained benchmark numerical problems compared with PSO which adopts the traditional penalty function approach. In conclusion, some constrained structural optimization truss design examples have been covered and critically discussed. Keywords Structural optimization · Self-adaptive evolutionary strategies (ES) · Structural benchmark · Multistrategy particle swarm optimization
M. M. Rosso (B) · R. Cucuzza · R. Asso · G. C. Marano DISEG, Department of Structural, Geotechnical and Building Engineering, Politecnico di Torino, Corso Duca Degli Abruzzi, 24, 10128 Turin, Italy e-mail: [email protected] A. Aloisio Civil Environmental and Architectural Engineering Department, Università degli Studi dell'Aquila, via Giovanni Gronchi n.18, 67100 L'Aquila, Italy © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_16
1 Particle Swarm Optimization Introduction A mathematical problem involving the minimization of at least one objective function (OF) f(x) is denoted as an optimization problem, which may be constrained or not, depending on parameters gathered in a design vector x defined in a search space Ω. In recent years, meta-heuristic algorithms and evolutionary algorithms (EAs) have been successfully employed in many engineering applications and structural optimization
design tasks [1–8]. They do not require information from the OF gradient, a characteristic of the time- and computationally expensive gradient-based approaches. In the EAs field, J. Holland first developed the population-based genetic algorithm (GA) [9, 10], which mimics Darwinian theory and genetic phenomena. Eberhart and Kennedy [11] proposed the particle swarm optimization (PSO) algorithm in 1995, another famous population-based approach which mimics the food-search behaviour of animals in the natural environment, such as fish schooling or bird flocking. In the mechanisms of the algorithm, each particle of the swarm acts as an intelligent agent and explores the search space in order to improve the optimum solution of the optimization problem. At the beginning, the PSO was able to solve unconstrained optimization only, and different strategies have later been adopted to also solve constrained problems. In the next sections, after a brief review of the PSO, a particular focus on the novel multistrategy method is discussed. Then, some literature constrained mathematical benchmark problems are successfully solved by the enhanced PSO. In the final part, real-world structural optimization problems are solved, comparing the performance with other techniques.
2 The Particle Swarm Optimization (PSO) Algorithm The PSO algorithm is based on a population of N intelligent agents whose positions in the search space identify trial solutions of the optimization problem. To explore the search space, the particles of the swarm fly independently, even though a globally intelligent movement appears when considering the entire swarm during the iterative optimization approach. The standard PSO formulation is based on a classical mechanics perspective; therefore, each particle i in every generation k is fully characterized by its position {}^{k}x_i and velocity {}^{k}v_i in the search space. The next position of the particle is influenced by two kinds of information gathered from the swarm: a self-memory allows the particle to remember its best visited position, which acts as a local attractor denoted as pbest {}^{k}x_i^{Pb}. On the other hand, a global attractor denoted as gbest {}^{k}x^{Gb} is based on information shared among the particles of the entire swarm. To prevent the explosion of the velocity, this term has been clamped by an upper bound v^{max} = γ(x^{u} − x^{l})/τ, considering a time unit τ = 1 to make it consistent with a physical velocity, and γ ∈ [0.1, 1] as suggested in [12]. The position and the velocity of each particle follow the adjusting rules below:

$$ {}^{(k+1)}\mathbf{v}_i = {}^{k}\mathbf{v}_i + c_1\,{}^{(k+1)}\mathbf{r}_{1i} \ast \left({}^{k}\mathbf{x}_i^{Pb} - {}^{k}\mathbf{x}_i\right) + c_2\,{}^{(k+1)}\mathbf{r}_{2i} \ast \left({}^{k}\mathbf{x}_i^{Gb} - {}^{k}\mathbf{x}_i\right), \quad (1) $$

$$ {}^{(k+1)}\mathbf{x}_i = {}^{k}\mathbf{x}_i + \tau\,{}^{(k+1)}\mathbf{v}_i \quad (\tau = 1), \quad (2) $$
(1) (2)
where the symbol ∗ denotes the element-wise multiplication [13], whereas c1 and c2 denoted the cognitive and the social acceleration factor. To introduce a certain level of randomness in the above quite-deterministic update rules, two uniform sampled ran-
Structural Optimization With the Multistrategy PSO-ES …
217
dom scalars between 0 and 1, (k+1) r 1i and (k+1) r 2i , have been considered to increase the domain exploration. The algorithm termination is usually set as the achievement of a priorly set number of iterations kmax . However, it is not easy to priorly estimate the correct number of maximum iteration because it is strongly problem-oriented [14]. Therefore, some other approaches can be based directly on the monitoring of the variation of the OF during the iterations. A stop criterion could be based on a predefined number of stagnations, which means that the OF registers small variations within a certain threshold level for a certain number of iterations. Shi [15] improved the standard PSO formulation introducing the inertia weight term k w applied to the previous iteration velocity to manipulate the inertial effect of each particle to the movement The hyperparameters of the PSO need to be fine-tuned to reach the best performances. For example, the population size defines the level of exploration of the search space and it is suggested to be a number comprise between 20 and 100 when the design vector size is less than 30 [12]. In the following implementations, as suggested in literature, e.g. by Quaranta et al. [12], acceleration factors can be set as constant scalars equal to 2 and inertia weight . A fundamental aspect of the PSO is the information sharing among the agents, defined by the particles interconnection topology, also known as neighbourhood. If the information of every particle is shared with the entire swarm, it is denoted as fully connected or gbest topology. However, this strategy particularly suffers of premature entrapment convergence to local optima. Therefore, lbest models have been proposed to slow down the convergence ensuring enough exploration. Among the different implementations illustrated in [16], in this study, the ring topology has been adopted. Defined a neighbourhood radius and considering a particles indexing order, the information are shared only among the particles who belong to their neighbourhood. In [17], an example of multi-population PSO involves a dynamic topology adjustment during iterations. Constraint handling in EAs is a challenging task especially because of unfeasible trial solutions. Numerous strategies have been developed and in [18] have been reconducted to five main typologies: penalty functions-based methods, methods based on special operators and representations, methods based on repair algorithms, methods based on the separation between OF and constraints and hybrid methods. Due to its simplicit, the most adopted method is the penalty approach (death, static, dynamic or adaptive) which delivers an unconstrained version of the problem φ(x) = f (x) + H (x) with H (x) as a penalty function [19]. For the preservation of swarm diversity and optimization performances, it is necessary to select the best approach to deal with constraints. Indeed, the death penalty approach does not represent at all the ideal solution because it brings a dreadful loss of information from unfeasible points [18]. In the structural optimization field [20], the static and dynamic penalty functions are the most widespread used constraint handling techniques. The static penalty function Hs (x) depends on HNVC the number of constraints that are violated by each particle and HSVC the sum of all violated constraints: Hs (x) = w1 HNVC (x) + w2 HSVC (x) ; HSVC (x) =
np p=1
max{0, g p (x)}
(3)
with w1 and w2 as static control parameters, usually set to w1 = w2 = 100 [21]. In this study, w1 = 0 and 1000 < w2 < 10,000 have been assumed for the penalty-based PSO adopted as a comparison with the enhanced multistrategy PSO. The dynamic penalty approach attempts to improve the static version by allowing a more relaxed constraint handling at the beginning and an increasing penalty value approaching the end of the kmax iterations, where kh is a dynamic penalty factor [21]:

$$ \min_{\mathbf{x}\in\Omega}\left\{ f(\mathbf{x}) + {}^{k}h\, H_d(\mathbf{x}) \right\} \quad \text{with} \quad {}^{k}h = \sqrt{k}, \quad (4) $$

$$ H_d(\mathbf{x}) = \sum_{p=1}^{n_p} \theta_p(\mathbf{x}) \left[\max\{0, g_p(\mathbf{x})\}\right]^{\gamma_p(\mathbf{x})}; \qquad 10 < H_d(\mathbf{x}) < 1000 \quad (5) $$
Usual values of the above factors are reported in [21]. A careful calibration of the penalty is crucial because a high value will reduce exploration and diversity, whereas a too low value will not properly counteract the constraint violation.
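To make the update rules (1)–(2) and the static penalty (3) concrete, a compact sketch of a generic penalty-based PSO is given below. It is not the multistrategy implementation of Sect. 3; the inertia weight, clamping factor, and penalty weight shown are plausible values consistent with the ranges quoted above, not the exact settings used in the experiments.

```python
import numpy as np

def pso_static_penalty(f, constraints, lb, ub, n_particles=50, k_max=500,
                       c1=2.0, c2=2.0, w=0.7, gamma=0.5, w2=1000.0, seed=0):
    """Generic PSO minimising f(x) subject to g_p(x) <= 0 via a static penalty
    phi(x) = f(x) + w2 * sum_p max(0, g_p(x))."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = lb.size
    v_max = gamma * (ub - lb)                       # velocity clamping bound

    def phi(x):                                     # penalised objective
        return f(x) + w2 * sum(max(0.0, g(x)) for g in constraints)

    x = rng.uniform(lb, ub, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_val = np.array([phi(p) for p in x])
    gbest = pbest[pbest_val.argmin()].copy()

    for _ in range(k_max):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # Velocity update as in Eq. (1), with an inertia weight in the spirit of [15].
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -v_max, v_max)
        x = np.clip(x + v, lb, ub)                  # position update, Eq. (2)
        vals = np.array([phi(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, phi(gbest)

# Toy usage: minimise x1^2 + x2^2 subject to x1 + x2 >= 1 (optimum near (0.5, 0.5)).
best_x, best_phi = pso_static_penalty(
    f=lambda x: x[0] ** 2 + x[1] ** 2,
    constraints=[lambda x: 1.0 - x[0] - x[1]],
    lb=[-5.0, -5.0], ub=[5.0, 5.0])
print(best_x, best_phi)
```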
3 Multistrategy PSO Considering the Newtonian dynamics-based PSO [11], an enhanced PSO, illustrated in Fig. 1, has been implemented with the most well-acknowledged literature improvements, adding an unfeasible local search feature to boost the optimization performance. The initial swarm is randomly sampled in the domain through Latin hypercube sampling. The OF and the level of violation of each constraint are then evaluated. From these evaluations, each particle is assigned a precise goal according to its position in the domain and its violation value. If it is located in the feasible region, its goal is to optimize the OF. On the contrary, when it is unfeasible, its goal is to minimize the envelope of the violation levels of the violated constraints. This mechanism inspired the name "multistrategy", since it breaks free from any arbitrary penalty factor: the PSO relies only on the OF and the constraint violations. As depicted in Fig. 1, after the definition of the aim of each agent of the first population, the swarm evolution cycle can begin with the usual evolutionary stages, such as the velocity update (1) and the position update (2). The pbest (self-memory) of each particle is updated every time a new, better feasible position is explored, as is the gbest. When kmax is reached, the evolutionary cycles stop. When the feasible region is very narrow, the swarm may stagnate without finding a region of feasible design parameters. However, in the multistrategy approach, the swarm has attempted to minimize the violations. This means that the gbest probably reaches an unfeasible position located close enough to the admissible feasible area of the domain. This aspect inspired us to enhance the local exploration to attempt to seek the feasible region. Therefore, after k = kES operator stagnations, the swarm begins a local exploration adopting another meta-heuristic algorithm developed by H. P. Schwefel and I. Rechenberg, the Evolutionary Strategy (ES) [9, 23]. This EA is based on the Darwinian theory of evolution. A parent
population is sampled in the neighbourhood of the unfeasible gbest point. Every member of the population generates an offspring by a mutation of its genome, which is sampled from a normal Gaussian distribution xi + N(0, σ), ruled by a mutation step σ [23]. Finally, in every iteration, only the best individuals among parents and offspring will survive to the next generation. The advantage of the ES is that only a single parameter σ should be tuned. In this study, the self-adaptive ES (SA-ES [23]) has been taken into account. Thus, the genome encodes the mutation step (x1, ..., xn, σ), and it is indirectly evolved through the fittest individuals. It is even possible to consider a self-adaptive mutation step for each single gene (x1, ..., xn, σ1, ..., σn) [23]. In the current study, the uncorrelated-steps self-adaptive ES has been selected to perform the local search in the neighbourhood of the unfeasible gbest position x^{Gb,unfea} when the swarm stagnates for kES operator = 10 iterations. The size of the parent local population has been set to Np = 50, sampled with a multivariate Gaussian centred in the unfeasible gbest position, with the standard deviation of each genome component equal to

$$ \sigma_i = |\tau \cdot N(0, 1)| \quad (6) $$

where τ is the learning rate parameter, assumed as 1/Np. No = 100 offspring have been generated from randomly selected parents with the mutation scheme

$$ \sigma_{i,\mathrm{off}} = \max\left(0, |\sigma_i + N(0, 1)|\right). \quad (7) $$

The next generations have been obtained by updating the covariance of the multivariate Gaussian mixture model. The mating pool is thus composed of parents and offspring, in a (μ + λ)-ES strategy, from which the fittest Np elements will survive. A maximum number of local iterations has been set to kmax,Local = 50, unless a feasible point is found, which becomes the new gbest for the PSO algorithm. Figure 2 depicts an illustrative example of the capabilities of the ES unfeasible local search operator. Even with the above unfeasible local search, the admissible feasible area may not have been discovered after a further kmax Unfeas Stagn = 15 stagnations; in that case, the PSO is completely restarted from the initial LHS sampling, as shown in Fig. 1. On the other hand, if a feasible point is found, the PSO starts to evolve again, setting it as the new gbest. When the PSO stagnates kmax Feas Stagn = 50 times on the same feasible gbest, the population is restarted with Latin hypercube sampling to attempt to scan the search space again, exploiting the iterations left before the termination criterion. In the following, the enhanced multistrategy PSO has been applied to some mathematical benchmark problems and some real-world structural optimization problems.
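A minimal sketch of the uncorrelated self-adaptive (μ + λ)-ES used here as the unfeasible local search operator is given below. The population sizes and the stopping rule follow the values quoted in the text; the violation envelope, the bound handling, and the exact sampling details are simplified assumptions rather than the original implementation.

```python
import numpy as np

def es_unfeasible_local_search(violation, x_gbest_unfea, lb, ub,
                               n_parents=50, n_offspring=100,
                               k_max_local=50, seed=0):
    """(mu + lambda) self-adaptive ES minimising a constraint-violation measure.

    `violation(x)` is assumed to return 0 for feasible points and a positive
    value otherwise; the search stops as soon as a feasible point is found."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = lb.size
    tau = 1.0 / n_parents                     # learning-rate parameter, 1/Np as in the text

    # Parents sampled around the unfeasible gbest, one step size per gene (Eq. (6)-style).
    sigma = np.abs(tau * rng.standard_normal((n_parents, dim)))
    x = np.clip(x_gbest_unfea + sigma * rng.standard_normal((n_parents, dim)), lb, ub)
    fit = np.array([violation(p) for p in x])

    for _ in range(k_max_local):
        idx = rng.integers(0, n_parents, n_offspring)       # random parent selection
        sig_off = np.abs(sigma[idx] + rng.standard_normal((n_offspring, dim)))  # Eq. (7)-style
        x_off = np.clip(x[idx] + sig_off * rng.standard_normal((n_offspring, dim)), lb, ub)
        fit_off = np.array([violation(p) for p in x_off])

        # (mu + lambda) survival: keep the best n_parents among parents and offspring.
        x_all = np.vstack([x, x_off])
        sig_all = np.vstack([sigma, sig_off])
        fit_all = np.concatenate([fit, fit_off])
        best = np.argsort(fit_all)[:n_parents]
        x, sigma, fit = x_all[best], sig_all[best], fit_all[best]
        if fit[0] == 0.0:                                    # feasible point found
            break
    return x[0], fit[0]
```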
4 Numerical Benchmark Problems Implemented in the MATLAB environment, the proposed multistrategy PSO solved some constrained mathematical benchmark problems whose mathematical statements are reported in [24]. In total, 13 constrained problems have been considered, the
Fig. 1 Enhanced PSO multistrategy flow chart
ones with inequality constraints only. For the sake of comparison, the current multistrategy PSO has been compared with other PSOs with classic penalty approaches proposed by Alam [25]. The code has been adapted for the static and dynamic penalty approaches as in (3) and (4). The penalty factors were adjusted and calibrated in a problem-oriented way. The size of the swarm is N = 100, with a total maximum of kmax = 500 iterations for all the PSO implementations. Table 1 presents the results obtained from 50 independently performed executions and illustrates comparisons such as the worst and best obtained results, and the mean and standard deviation of the OF for the three different implemented PSOs. The results provided by the enhanced multistrategy PSO are in very good agreement with the theoretical global minimum solutions and in accordance with the other penalty-based PSO variants. These results demonstrate that the current multistrategy PSO implementation is effective in coping with constrained mathematical problems, without requiring troublesome calibrations of many arbitrary hyperparameters, as happens for penalty-based techniques.
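The statistics reported in Table 1 (and later in Tables 2–4) are simple aggregates over independent runs. A tiny harness of the kind sketched below is all that is needed; `solver` stands for any of the compared algorithms and is an assumed interface, not code from the paper.

```python
import numpy as np

def benchmark(solver, n_runs=50):
    """Run `solver(seed)` n_runs times and collect best/worst/mean/std statistics.
    `solver` is assumed to return the best objective value found in one run."""
    values = np.array([solver(seed) for seed in range(n_runs)])
    return {"best": values.min(), "worst": values.max(),
            "mean": values.mean(), "std": values.std(ddof=1)}

# Example with a dummy solver standing in for one of the PSO variants.
print(benchmark(lambda seed: float(np.random.default_rng(seed).normal(-15.0, 0.5))))
```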
5 Structural Optimization Examples In this final section, some well-acknowledged structural optimization benchmark problems from the literature have been solved by the proposed multistrategy PSO. In the following, the PSO-ES has been compared with other penalty PSO implementations [25].
[Fig. 2 panels: (a) OF – Generation 12; (b) Constraints – Gen. 12; (c) OF – ES operator Gen. 1; (d) Constraints – ES Gen. 1]
Fig. 2 Example problem g06 [22]; a, b the objective function and the constraints violation are depicted as contour plots. The black cross symbol stands for the unfeasible gbest, whereas the red and green dots indicate the swarm positions at a certain iteration instant. In c, d, purple dots: local search population of the ES search operator; green dots: feasible points
To make a more complete comparison with a completely different and independent approach, the GA from MATLAB Optimization Toolbox has been also analysed. In particular, a spring design and two different spatial truss optimization problems have been considered [26]. They are configured as a size optimization task, where the main goal is usually to reduce the self-weight w of the structure respecting the safety constraints, which is related to the cost [7]. Denoting with ρi the unit weight, the purpose is to find the optimal cross-sectional areas Ai of every i-th structural element which are lower and upper bounded (box-type hyper-rectangle domain) Ai ∈ [AiLB , AiUB ]. In general, two inequality constraints are considered: the assessment of the maximum allowable stress σadm (strength constraint) and the assessment of the codes admissible displacement δadm (deformation limitation). For the truss problem, the general optimization problem statement is:
Table 1 Numerical benchmark examples taken from [24], results for 50 runs

Problem  Real optimum   Algorithm     Best OF     Worse OF    Mean        Std. dev.
g01      −15.000        PSO-ES        −15.000     −12.002     −14.443     0.8948
                        PSO-static    −15.000     −12.000     −13.938     1.4333
                        PSO-dynamic   −15.000     −12.000     −13.920     1.4546
g02      0.803619       PSO-ES        0.803570    0.609630    0.758960    0.0636
                        PSO-static    0.801460    0.520130    0.710500    0.0736
                        PSO-dynamic   0.793580    0.382850    0.665970    0.0870
g04      −30,665.539    PSO-ES        −30666.0    −30666.0    −30666.0    2.197E−05
                        PSO-static    −30,666.0   −30,665.0   −30,665.0   0.8659
                        PSO-dynamic   −31,207.0   −30,137.0   −31,138.2   252.203675
g06      −6961.81388    PSO-ES        −6961.8     −6958.4     −6960.7     0.9752
                        PSO-static    −6973.0     −6973.0     −6973.0     0.0000
                        PSO-dynamic   −6973.0     −6973.0     −6973.0     0.0000
g07      24.3062091     PSO-ES        24.426      27.636      25.413      1.1209
                        PSO-static    24.034      30.203      28.508      1.4351
                        PSO-dynamic   24.477      30.112      27.043      1.8821
g08      0.095825       PSO-ES        0.095825    0.095825    0.095825    6.96E−17
                        PSO-static    0.095825    0.095825    0.095825    6.77E−17
                        PSO-dynamic   0.095825    0.095825    0.095825    7.10E−17
g09      680.6300573    PSO-ES        680.640     680.980     680.730     0.0794
                        PSO-static    680.630     680.720     680.660     0.0175
                        PSO-dynamic   680.630     680.730     680.660     0.0189
g12      1.000          PSO-ES        1.000       1.000       1.000       0.0000
                        PSO-static    1.000       1.000       1.000       2.12E−15
                        PSO-dynamic   1.000       1.000       1.000       0.0000
$$ \begin{aligned} \min_{\mathbf{x}\in\Omega} \quad & f(\mathbf{x}) = \sum_{i=1}^{N_{el}} \rho_i L_i A_i \\ \text{s.t.} \quad & A_i^{LB} \le A_i \le A_i^{UB} \\ & \sigma_i \le \sigma_{adm} \\ & \delta \le \delta_{adm} \end{aligned} \quad (8) $$

in which the structure is composed of Nel members of actual length Li. The structural steel mechanical properties are expressed in imperial units, i.e. a unit weight, identical for each member, equal to ρi = ρ = 0.1 lb/in³ and a Young's modulus of 10⁷ psi.
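In practice, the optimizer only needs the weight in (8) and the constraint values once a structural analysis returns member stresses and nodal displacements. The sketch below shows that thin wrapper; the numerical example values are invented for illustration, and the stress/displacement inputs are assumed to come from an external truss FE routine that is not part of this paper.

```python
import numpy as np

RHO = 0.1  # unit weight, lb/in^3 (same for every member in the benchmarks below)

def truss_weight(areas, lengths, rho=RHO):
    """Objective of Eq. (8): w = sum_i rho_i * L_i * A_i."""
    return float(np.sum(rho * np.asarray(lengths) * np.asarray(areas)))

def sizing_constraints(stresses, displacements, sigma_adm, delta_adm):
    """Inequalities of Eq. (8) rewritten as g(x) <= 0, ready for a penalty
    or for the multistrategy violation envelope."""
    g_stress = np.abs(np.asarray(stresses)) / sigma_adm - 1.0
    g_disp = np.abs(np.asarray(displacements)) / delta_adm - 1.0
    return np.concatenate([g_stress, g_disp])

# Example with made-up numbers (not from the paper): three members.
print(truss_weight(areas=[10.0, 5.0, 5.0], lengths=[360.0, 360.0, 509.1]))
print(sizing_constraints(stresses=[18.0, -22.0, 5.0],
                         displacements=[1.2, -0.4],
                         sigma_adm=25.0, delta_adm=2.0))
```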
5.1 Tension/Compression Spring Optimization Benchmark The first problem refers to a well-acknowledged continuous constrained engineering problem which aims to find the minimum weight of a spring, as depicted in Fig. 3a. Under the action of an axial force F, some inequality constraints are posed by requirements on the minimum deflection, on the shear stress, and on the surge frequency. The design vector is composed of three parameters: the wire diameter (x1 = d), the mean coil diameter (x2 = D), and the number of active coils x3. The design parameters are bounded in the following intervals: 0.05 ≤ x1 ≤ 2.0, 0.25 ≤ x2 ≤ 1.3, and 2.0 ≤ x3 ≤ 15.0. The problem formulation is stated as [27]:

$$ \begin{aligned} \min_{\mathbf{x}\in\Omega} \quad & f(\mathbf{x}) = (x_3 + 2)\, x_2 x_1^2 \\ \text{s.t.} \quad & g_1(\mathbf{x}) = 1 - \frac{x_2^3 x_3}{71785\, x_1^4} \le 0 \\ & g_2(\mathbf{x}) = \frac{4x_2^2 - x_1 x_2}{12566\,(x_2 x_1^3 - x_1^4)} + \frac{1}{5108\, x_1^2} - 1 \le 0 \\ & g_3(\mathbf{x}) = 1 - \frac{140.45\, x_1}{x_2^2 x_3} \le 0 \\ & g_4(\mathbf{x}) = \frac{x_2 + x_1}{1.5} - 1 \le 0 \end{aligned} \quad (9) $$
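A small helper evaluating the objective and the four constraints in (9) is sketched below, following the standard statement of this benchmark [27]; it can be fed to any of the constrained solvers compared in Table 2, either through a penalty or through the multistrategy violation envelope.

```python
def spring_objective(x):
    """f(x) = (x3 + 2) * x2 * x1^2, the weight of the spring."""
    x1, x2, x3 = x
    return (x3 + 2.0) * x2 * x1 ** 2

def spring_constraints(x):
    """g_i(x) <= 0 for the tension/compression spring benchmark [27]."""
    x1, x2, x3 = x
    g1 = 1.0 - (x2 ** 3 * x3) / (71785.0 * x1 ** 4)
    g2 = (4.0 * x2 ** 2 - x1 * x2) / (12566.0 * (x2 * x1 ** 3 - x1 ** 4)) \
         + 1.0 / (5108.0 * x1 ** 2) - 1.0
    g3 = 1.0 - 140.45 * x1 / (x2 ** 2 * x3)
    g4 = (x2 + x1) / 1.5 - 1.0
    return [g1, g2, g3, g4]

# Check at the reference solution of Table 2: f is about 0.012665,
# g1 and g2 are approximately zero (active), g3 and g4 are negative.
x_ref = [0.051690, 0.356750, 11.287126]
print(spring_objective(x_ref))
print(spring_constraints(x_ref))
```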
In Table 2, the comparison results are illustrated. The theoretical optimal solution of the above-mentioned problem is reported in the first column of Table 2. All the considered algorithms provide great performances on this simple benchmark problem, which has only three design parameters, even the penalty approaches. It is worth noting that the mean and the standard deviation of the proposed multistrategy PSO-ES appear to be lower than those of the other algorithms, demonstrating the reliability of the proposed method in solving engineering optimization tasks.
Fig. 3 a Spring problem and b ten bars truss problem

Table 2 Results for 100 runs of tension/compression spring design

Parameter        Solution [27]  PSO-static  PSO-dynamic  GA         PSO-ES
1                0.051690       0.051690    0.051470     0.051770   0.051759
2                0.356750       0.357636    0.351474     0.358910   0.358400
3                11.287126      11.287891   11.603168    11.141985  11.191072
Best OF (lb)     0.012665       0.012665    0.012666     0.012642   0.012665
Worse OF (lb)    –              0.017416    0.015080     0.321591   0.012722
Mean (lb)        –              0.012933    0.012928     0.023281   0.012714
Std. dev. (lb)   –              0.000599    0.000445     0.042785   0.000013
5.2 Ten Bars Truss Design Optimization As depicted in Fig. 3b, a planar ten bars truss cantilever structure is now considered. The cantilever span is 720 in. (1 in. = 25.4 mm) with a depth of 360 in., with elements numbered from 1 to 10. The loads are two equal downward forces of 100 kips (1 kips = 4.4482 kN). The cross-section areas to be optimized are considered as continuous variables within the admissible range [0.1, 35] in.². The maximum admissible stress is σadm = ±25 ksi, whereas the maximum displacement has been set to δadm = ±2 in. for every structural node. The population of the PSO has been set to 50 individuals, and the stopping criterion has been set to a maximum of 500 iterations both for the PSO-ES and the GA. Owing to their poor results, the swarm size for the penalty-based PSOs has been increased to 500 particles. In total, 100 runs have been independently performed, calculating the mean and standard deviation of the OF. Table 3 illustrates the optimization results comparing the PSO-ES with the static penalty and dynamic penalty PSO and, finally, with the GA. The penalty approaches provide underwhelming results, demonstrating their unsuitability for solving structural optimization tasks. Conversely, the proposed multistrategy PSO-ES algorithm
Table 3 Results for 100 runs of the optimization of the truss with 10 elements

Cross section (in²)
Element          Ref. sol. from [26]  PSO-static  PSO-dynamic  GA        PSO-ES
1                28.920               29.6888     30.3092      30.145    30.372
2                0.100                18.3211     14.7464      0.100     0.110
3                24.070               19.9891     16.5717      22.466    23.644
4                13.960               18.2381     25.1945      15.112    15.391
5                0.100                2.3404      4.5489       0.101     0.101
6                0.560                20.8674     26.1207      0.543     0.496
7                21.950               21.1805     32.2698      21.667    20.984
8                7.690                16.0851     0.2168       7.577     7.410
9                0.100                6.0845      7.5871       0.100     0.103
10               22.090               25.5632     23.524       21.695    21.378
Best OF (lb)     5076.310             6141.986    6333.035068  5063.250  5063.328
Worse OF (lb)    –                    8415.134    8675.749551  5144.148  5229.108
Mean (lb)        –                    7294.455    7501.394582  5079.744  5076.473
Std. dev. (lb)   –                    516.7823    475.3885728  14.1194   24.8666
delivers excellent results in agreement with another completely independent implementation such as the GA.
5.3 Truss Optimization With Twenty-Five Bars The last problem is depicted in Fig. 4a, and it consists of a twenty-five bars three-dimensional truss tower structure with ten numbered structural nodes and two different load cases. The footprint is a square of side 200 in., which tapers to 75 in. at an elevation of 100 in., and finally reaches the maximum elevation from the ground at 200 in. Since the cross-section areas have been gathered into 8 groups, as shown in Fig. 4b, the 8 design parameters are considered as continuous variables whose upper and lower bounds are defined by the interval [0.01, 3.40] in². The maximum admissible stress is σadm = ±40 ksi, whereas the maximum displacement is δadm = ±0.35 in. in every direction. The population of the PSO has been set to 50 individuals, and the stopping criterion has been set to a maximum of 500 iterations both for the PSO-ES and the GA. Owing to their poor results, the swarm size for the penalty-based PSOs has been increased to 500 particles. 100 runs have been independently performed, calculating the mean and standard deviation of the OF. Table 4 summarizes the results comparing the PSO-ES with the static and dynamic penalty PSO and, finally, with the GA. Even in this last example, the penalty approaches provide underwhelming results, demonstrating their unsuitability for solving structural optimization tasks.
Fig. 4 a Twenty-five bars truss problem and b eight bar groups representation

Table 4 Results for 100 runs of the optimization of the truss with 25 elements

Cross section (in²)
Bar group        Ref. sol. from [26]  PSO-static   PSO-dynamic  GA       PSO-ES
1                0.100                2.054        1.116        0.010    0.011
2                1.800                2.675        2.670        2.023    1.976
3                2.300                1.402        1.942        2.941    2.989
4                0.200                3.388        0.166        0.010    0.010
5                0.100                0.204        0.342        0.010    0.011
6                0.800                0.453        1.985        0.671    0.690
7                1.800                1.274        1.976        1.673    1.689
8                3.000                0.048        2.345        2.694    2.654
Best OF (lb)     546.010              568.186      596.058      545.236  545.249
Worse OF (lb)    –                    100,583.118  22,954.297   557.755  552.378
Mean (lb)        –                    1673.393     1122.518     547.828  546.003
Std. dev. (lb)   –                    9991.0201    3129.3192    2.0743   0.7879
Conversely, the proposed multistrategy PSO-ES algorithm delivers excellent results in agreement with another completely independent implementation such as the GA.
6 Results Analysis Summary and Discussion Similarly to [28], in this last section, the main outcomes and critical observations are finally discussed. Table 1 demonstrated the strength of the proposed PSO in dealing with mathematical benchmarks with inequality constraints. 50 independent runs have been performed and basic statistics have been extracted. On average, the mean value of the multistrategy PSO-ES approaches the theoretical solutions in a more reliable
way, since the standard deviations appear to be lower than those of the other penalty-based PSOs. Indeed, the latter sometimes stall quite far from the theoretical solutions, often with an almost nil standard deviation. This fact demonstrates the tendency of the penalty-based PSO to be trapped in local optima. The multistrategy PSO has finally been tested on real-world structural optimization case studies. Tables 2, 3 and 4 reported the results for 100 independent runs. These values demonstrated the dreadful performances of the penalty-based PSO algorithms in dealing with this kind of problem. Conversely, the multistrategy PSO-ES behaves excellently, approaching very close to the reference solutions in a reliable way. For the sake of comparison, another EA has been tested in order to produce a valuable comparison with a completely different technique from the currently adopted PSO ones. This last comparison further consolidates the effectiveness of the multistrategy PSO-ES in reaching the optimal solution. Furthermore, the comparison with the GA contributed to pointing out the reliability of the proposed method, because the solutions provided lower standard deviations with respect to the GA in two problems out of three.
7 Conclusions In the current study, the state-of-the-art improvements of the PSO have been implemented with a multistrategy approach [15, 16]. In addition, in place of the penalty function method to cope with constraint handling, the information on the violations was used to govern the optimization. When the PSO excessively stagnates in the unfeasible region, the swarm begins a local search based on a self-adaptive ES, increasing the nearby exploration, hopefully close to the feasible region. This feature powers up the current algorithm with respect to other PSOs. Moreover, outstanding results for real-world structural optimization benchmarks have been obtained. In all the problems, the PSO-ES provides solutions closer to the theoretical ones in a reliable way, as demonstrated by the lower standard deviation with respect to the GA or the other PSO implementations. The hybridisation of the PSO with machine learning and probabilistic approaches, e.g. the estimation of distribution algorithm (EDA) [29], may represent a very promising future study to further improve the capabilities of swarm-based algorithms.
References
1. Di Trapani F, Tomaselli G, Sberna AP, Rosso MM, Marano GC, Cavaleri L, Bertagnoli G (2022) Dynamic response of infilled frames subject to accidental column losses. In: Pellegrino C, Faleschini F, Angelo Zanini M, Matos JC, Casas JR, Strauss A (eds) Proceedings of the 1st conference of the european association on quality control of bridges and structures. Springer International Publishing, Cham, pp 1100–1107
2. Asso R, Cucuzza R, Rosso MM, Masera D, Marano GC (2021) Bridges monitoring: an application of ai with gaussian processes. In: 14th international conference on evolutionary and deterministic methods for design, optimization and control. Institute of Structural Analysis and Antiseismic Research National Technical University of Athens
3. Aloisio A, Pasca DP, Battista L, Rosso MM, Cucuzza R, Marano G, Alaggio R (2022) Experimental tests and validation. Indirect assessment of concrete resistance from fe model updating and young's modulus estimation of a multi-span psc viaduct. Elsevier Struct 37:686–697
4. Sardone L, Rosso MM, Cucuzza R, Greco R, Marano GC (2021) Computational design of comparative models and geometrically constrained optimization of a multi domain variable section beam based on timoshenko model. In: 14th international conference on evolutionary and deterministic methods for design, optimization and control. Institute of Structural Analysis and Antiseismic Research National Technical University of Athens
5. Cucuzza R, Rosso MM, Marano G (2021) Optimal preliminary design of variable section beams criterion. SN Appl Sci 3
6. Cucuzza R, Costi C, Rosso MM, Domaneschi M, Marano GC, Masera D, Optimal strengthening by steel truss arches in prestressed girder bridges. Proc Instit Civil Eng Bridge Eng 0(0):1–21
7. Rosso MM, Cucuzza R, Trapani FD, Marano GC (2021) Nonpenalty machine learning constraint handling using pso-svm for structural optimization. Adv Civil Eng
8. Rosso MM, Cucuzza R, Aloisio A, Marano GC (2022) Enhanced multi-strategy particle swarm optimization for constrained problems with an evolutionary-strategies-based unfeasible local search operator. Appl Sci 12(5)
9. Rafael M, Panos PM, Mauricio G, Resende C (2018) Handbook of heuristics, 1st edn. Springer Publishing Company, Incorporated
10. Lagaros ND, Papadrakakis M, Kokossalakis G (2002) Structural optimization using evolutionary algorithms. Comput Struct 80(7):571–589
11. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN'95—international conference on neural networks, vol 4, pp 1942–1948
12. Quaranta G, Lacarbonara W, Masri S (2020) A review on computational intelligence for identification of nonlinear dynamical systems. Nonlinear Dyn 99:01
13. Plevris V (2009) Innovative computational techniques for the optimum structural design considering uncertainties. Ph.D. thesis, Institute of Structural Analysis and Seismic Research, School of Civil Engineering, National Technical University of Athens (NTUA)
14. Li B, Xiao RY (2007) The particle swarm optimization algorithm: how to select the number of iteration, pp 191–196
15. Shi Y, Gireesha Obaiahnahatti B (1998) A modified particle swarm optimizer 6:69–73
16. Medina A, Pulido GT, Ramírez-Torres J (2009) A comparative study of neighborhood topologies for particle swarm optimizers, pp 152–159
17. Liang JJ, Suganthan PN (2006) Dynamic multi-swarm particle swarm optimizer with a novel constraint-handling mechanism. In: 2006 IEEE international conference on evolutionary computation, pp 9–16
18. Coello CA (2002) Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Comput Methods Appl Mech Eng 191(11):1245–1287
19. Rezaee Jordehi A (2015) A review on constraint handling strategies in particle swarm optimisation. Neural Comput Appl 26:01
20. Dimopoulos GG (2007) Mixed-variable engineering optimization based on evolutionary and social metaphors. Comput Methods Appl Mech Eng 196(4):803–817
21. Parsopoulos K, Vrahatis M (2002) Particle swarm optimization method for constrained optimization problem 76:214–220
22. Simionescu P-A, Beale DG, Dozier GV (2004) Constrained optimization problem solving using estimation of distribution algorithms. In: Proceedings of the 2004 congress on evolutionary computation (IEEE Cat. No. 04TH8753), vol 1. IEEE, pp 296–302
23. Beyer H-G (1995) Toward a theory of evolution strategies: self-adaptation. Evol Comput 3(3):311–347
24. Wen L, Ximing L, Yafei H, Yixiong C (2013) A hybrid differential evolution augmented lagrangian method for constrained numerical and engineering optimization. Comput Aided Des 45(12):1562–1574
25. Alam M (2016) Codes in matlab for particle swarm optimization
26. Camp CV, Farshchin M (2014) Design of space trusses using modified teaching-learning based optimization. Eng Struct 62–63:87–97
27. Cagnina L, Esquivel S, Coello C (2008) Solving engineering optimization problems with the simple constrained particle swarm optimizer. Informatica (Slovenia) 32:319–326
28. Pawan B, Sandeep K, Kavita S (2018) Self balanced particle swarm optimization. Int J Syst Assur Eng Manag 9(4):774–783
29. Pelikan M, Hauschild MW, Lobo FG (2015) Estimation of distribution algorithms, pp 899–928
Fuzzy 2-Partition Kapur Entropy for Fruit Image Segmentation Using Teacher-Learner Optimization Algorithm Harmandeep Singh Gill
and Guna Sekhar Sajja
Abstract For the past few years, image classification has been a rapidly growing computer vision application. For the extraction and detection of objects from images, image segmentation has been a popular digital image processing method. Type-II partition fuzzy is an excellent strategy for selecting threshold values. To choose the best threshold values for the Fuzzy-II partition, various parameters must be optimized. Partitioned fuzzy-II-oriented Kapur entropy and teacher-learner-based optimization (TLBO) were used to investigate possible parameters and subsequent appropriate threshold values. The fuzzy-II partition entropy is used in the proposed study approach to evaluate its potential for determining the best threshold value using the TLBO algorithm. Fruit images are utilized in experimental work to segment based on optimal threshold values. The SSI, PSNR, and uniformity parameters have been used to compare the segmentation performance of the proposed approach with the existing approaches. Keywords Fuzzy 2-partition · Kapur entropy · Image · Segmentation
1 Introduction Rapid development in soft computing-based optimization approaches and machine vision applications has made the platform for the image processing methods for developing image segmentation and classification algorithms [1]. Image segmentation is a prominent issue all over the world of image processing and classification. The intention of image segmentation is to segregate into recognizable zones with unique properties and then extract the items of interest. It is a key approach in image processing, computer vision, and pattern recognition. As a result, the performance H. S. Gill (B) Mata Gujri Khalsa College, Kartarpur (Jalandhar), Punjab 144801, India e-mail: [email protected] G. S. Sajja Information Technology Department, University of the Cumberlands, Kentucky, USA e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_17
of the higher-level processing that follows is determined by the segmentation technique used. The threshold-based technique (thresholding) is the most popular among the available strategies, and numerous scholars have worked extensively in this area. Threshold values divide the image into dark and bright regions. Due to the complex nature of the acquired fruit images, distinguishing dark areas from bright ones is a difficult task, and the obtained threshold results can be affected by intensity, texture, and noise issues. Image categorization and segmentation become increasingly complicated due to these concerns. To address them, integrated fuzzy-II partition Kapur entropy and TLBO are proposed [2, 3]. In the fuzzy image processing and segmentation approach, the image is converted into the fuzzy domain using fuzzy membership functions, and the membership function values are modified to segment the acquired fruit images. Sometimes, image acquisition may lower the quality of the selected image; image enhancement algorithms are utilized to improve image quality [4, 5]. The rest of the paper is organized as follows: the fuzzy-II partition Kapur entropy is introduced in Sect. 2; the TLBO algorithm and the experimental work are described in Sects. 3 and 4, respectively. Finally, Sect. 5 concludes the article and offers some suggestions for future work.
2 Fuzzy-II Partition Kapur Entropy The practice of transitioning a grayscale image to black and white using an appropriate threshold is known as image thresholding. Selecting the best combination of threshold values from a grayscale image is formulated as an optimization problem, and optimization strategies require an objective function. The TLBO algorithm is used to solve this optimization problem in the proposed research, and the objective function is developed using Kapur entropy. Entropy here refers to the presence of meaningful information (feasible level) in the image, which is used to pick the best threshold levels for segmentation. Kapur's approach is similarly unconventional. Fuzzy logic is a robust method for handling uncertainty and vagueness. In the proposed work, the fruit image (spatial domain) is transformed into the fuzzy domain using two membership functions. Let I be a fruit image with L gray levels {0, 1, 2, …, (L − 1)}. A segmented fruit image is defined as:

s(x, y) = R_BG, if I(x, y) ≤ T
          R_DG, otherwise          (1)
where T is the ideal threshold value, R_DG and R_BG are the dark and bright regions, respectively, and x = 1, 2, 3, …, M and y = 1, 2, 3, …, N. The width of the fruit image I is M, and the height is N.
The probability of a particular gray level of image I is calculated as follows:

P(i) = h(i) / (M × N), i = 1, 2, 3, …, L − 1          (2)
where h(i) is the fruit image histogram. As a result, the probability distribution of the image’s gray level values is depicted as follows: p(0), p(1), p(2), . . . , p(i), . . . , p(L − 1) To transform I from the spatial domain to the fuzzy domain, two fuzzy membership functions are used. This is done to divide I into dark and light areas. The following equations are used to define such functions:
μDG(i) = S(i; a, b, c) =
    0,                                  i ≤ a
    (i − a)² / ((c − a)(b − a)),        a < i ≤ b
    1 − (i − c)² / ((c − a)(c − b)),    b < i ≤ c
    1,                                  i > c

μBG(i) = Z(i; a, b, c) =
    1,                                  i ≤ a
    1 − (i − a)² / ((c − a)(b − a)),    a < i ≤ b
    (i − c)² / ((c − a)(c − b)),        b < i ≤ c
    0,                                  i > c
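As a minimal illustration of Eq. (2) and the two membership functions above, the following NumPy sketch computes the gray-level probabilities and the S- and Z-type memberships; the parameter values a, b, c used at the end are placeholders, since in the paper they are selected by the TLBO algorithm.

```python
import numpy as np

def gray_probabilities(image):
    """P(i) = h(i) / (M * N) for an 8-bit grayscale image, as in Eq. (2)."""
    hist = np.bincount(image.ravel(), minlength=256)
    return hist / image.size

def s_membership(i, a, b, c):
    """S-type membership function mu_DG(i) for the dark region."""
    i = np.asarray(i, dtype=float)
    out = np.ones_like(i)
    out[i <= a] = 0.0
    rise = (i > a) & (i <= b)
    out[rise] = (i[rise] - a) ** 2 / ((c - a) * (b - a))
    bend = (i > b) & (i <= c)
    out[bend] = 1.0 - (i[bend] - c) ** 2 / ((c - a) * (c - b))
    return out

def z_membership(i, a, b, c):
    """Z-type membership function mu_BG(i) = 1 - S(i; a, b, c)."""
    return 1.0 - s_membership(i, a, b, c)

# Placeholder fuzzy parameters; the paper tunes (a, b, c) with TLBO.
levels = np.arange(256)
mu_dark = s_membership(levels, 60, 120, 180)
mu_bright = z_membership(levels, 60, 120, 180)
```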
… max(p_i^t)          (7)
Q_i, R_m ≥ 0          (8)
p_i^t, p_j^t, b > 0          (9)
x_m ∈ {0, 1}          (10)
Under the constraints section, Eqs. (1) and (2) are the power consumption demands of the communities of prosumers and consumers, respectively, expressed as functions of their respective dynamic prices. Inequality (4) requires that the sum of the energy collected from the prosumers and from the DRERs satisfies the total energy demand of the communities. Constraint (5) ensures that total carbon sales cannot exceed the government budget to be invested in the carbon market. Equation (6) expresses the value of the buy-back price offered to the prosumers, which is less than the sum of the fixed price and the dynamic price. Constraint (8) is a non-negativity constraint, and constraint (9) requires all prices to be strictly positive.
Constraint (10) is a binary constraint for the establishment of DRER technology in the potential power generation location.
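The paper does not reproduce the full objective here, so the following PuLP sketch only illustrates how constraints of the shapes listed above can be encoded: an energy-balance requirement like (4), a budget-style cap like (5), non-negativity as in (8), and the binary DRER siting variables of (10). All coefficients, capacities, and the objective are illustrative placeholders, not the authors' actual model.

```python
from pulp import LpProblem, LpMaximize, LpVariable, LpBinary, lpSum

M_SITES, N_PROSUMERS = 15, 150        # illustrative problem size
demand_total = 190_000.0              # placeholder total community demand (kWh)
carbon_budget = 15_000.0              # placeholder government carbon budget S
site_capacity = 1_200.0               # placeholder capacity of one DRER site (kWh)
prosumer_capacity = 1_200.0           # placeholder generation capacity per prosumer (kWh)

prob = LpProblem("drer_planning_sketch", LpMaximize)

Q = [LpVariable(f"Q_{i}", lowBound=0, upBound=prosumer_capacity)
     for i in range(N_PROSUMERS)]                                   # energy bought from prosumer i, Q_i >= 0
R = [LpVariable(f"R_{m}", lowBound=0) for m in range(M_SITES)]      # energy generated at DRER site m, R_m >= 0
x = [LpVariable(f"x_{m}", cat=LpBinary) for m in range(M_SITES)]    # build DRER at site m? (constraint 10)

# Placeholder objective: collected energy minus an arbitrary per-site build cost.
prob += lpSum(Q) + lpSum(R) - 100 * lpSum(x)

# Energy-balance constraint in the spirit of (4): supply must cover demand.
prob += lpSum(Q) + lpSum(R) >= demand_total

# A site can only generate if it is actually built.
for m in range(M_SITES):
    prob += R[m] <= site_capacity * x[m]

# Budget-style cap in the spirit of (5), with an arbitrary 0.03 $/kWh proxy price.
prob += 0.03 * lpSum(R) <= carbon_budget

prob.solve()
print("sites selected:", [m for m in range(M_SITES) if x[m].value() == 1])
```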
3 Results and Discussion We applied a fuzzy approach to handle the uncertainty. The application of triangular fuzzy numbers of the form ũ = [u1, u2, u3] is among the commonly known fuzzy representations. The membership function, the expected interval, and the expected value of the triangular representation are formulated using the fuzzy number ũ. In this study, we consider a community comprising 150 prosumers and 200 consumers. The carbon emission rates differ for different types of traditional power generation facilities. For coal and natural gas, the carbon emission rates were estimated as 0.849 and 0.619 kg/kWh, respectively, provided that they are moderate technologies [26], and the carbon emission rate of natural gas was considered for the current model. Researchers have also implied the importance of setting a recycling rate [26, 27], and in the present study the cost of recycling the solar panels was incorporated with an estimate of $0.08/kWh. Other fuzzy and non-fuzzy parameters were set to validate the model. Test data for the constant parameters are set as β = 0.619, θ = 0.01, ϕ = 0.08, h = 0.015, k = 0.03, Pi = 0.025, Pj = 0.045, S = 15,000, and the uniform parameters are presented in Table 1. According to the results from the test data for the model, the 150 prosumers generate a total of 176,901.62 kW to secure their own consumption fully or partially and generate income by selling the surplus energy to the electricity company. The company also generates a total of 18,000 kW from the 15 potential DRER sites to cover the energy demand of the communities that cannot be covered by the prosumers. The energy generation capacity of the prosumers is approximately 1200 kW; however, the capacity is limited to 500 kW for some prosumers. The electricity company decides to offer a buy-back price of $0.175/kWh to the prosumers to collect the surplus energy they generate. By doing so, the company gains a net profit of $1.599E+08. Table 1 Uniform parameters
Parameters     Values
lm             Uniform (200, 800)
li             Uniform (90, 800)
Lj             Uniform (150, 1200)
zi             Uniform (1500, 1900)
zj             Uniform (2200, 2800)
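The triangular fuzzy parameters ũ = [u1, u2, u3] mentioned above can be handled with the usual membership, expected-interval, and expected-value formulas; the expressions below follow one common convention from the fuzzy-programming literature and are an assumption here, since the paper does not spell them out.

```python
def tri_membership(x, u1, u2, u3):
    """Membership degree of x in the triangular fuzzy number [u1, u2, u3]."""
    if x <= u1 or x >= u3:
        return 0.0
    if x <= u2:
        return (x - u1) / (u2 - u1)
    return (u3 - x) / (u3 - u2)

def expected_interval(u1, u2, u3):
    """Expected interval [E1, E2] of a triangular fuzzy number."""
    return ((u1 + u2) / 2.0, (u2 + u3) / 2.0)

def expected_value(u1, u2, u3):
    """Crisp expected value used to defuzzify a triangular parameter."""
    return (u1 + 2.0 * u2 + u3) / 4.0

# Example: a fuzzy carbon-emission rate centred on the crisp 0.619 kg/kWh used above.
print(expected_interval(0.58, 0.619, 0.66))
print(expected_value(0.58, 0.619, 0.66))
```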
Fig. 1 Dynamic energy price offered to prosumers (energy price in $/kWh over the 24-h energy planning horizon for prosumers I1–I20)
The energy price offered to the prosumers changes dynamically over time and from prosumer to prosumer; the dynamic energy pricing for representative prosumers is shown in Fig. 1. The energy company in the proposed model also offers a dynamic and differential price to the consumers of the community (see Fig. 2).
4 Conclusion In the present study, the model considers a power company that strives to establish DRERs while inspiring prosumers to participate in the energy market by offering buy-back prices for their saved surplus energy. The company also takes on the cost of recycling the solid wastes resulting from energy generation technologies, such as the solar panels of the DRERs established by both the company itself and by the prosumers. In the model, the involvement of prosumers could realize the generation of a total of 176,901.62 kWh. The establishment of new DRERs by the electricity company also generated a total of 18,000 kWh to satisfy the demands of both prosumers and consumers of the community that is not covered by the collective generation capacity of the prosumers. Therefore, the company can save carbon emissions amounting
Fig. 2 Dynamic energy prices offered to consumers (energy price in $/kWh over the 24-h energy planning horizon for consumers J1–J20)
to nearly 120,644.1 kg of carbon. The company also generated a total profit of $1.599E+08, out of which $18,096.62, or 1.13% of the total profit, was generated from carbon sales. In addition, 89.82% of the energy demand of the modern community was secured by the carbon-free energy generated by the prosumers.
References 1. Amin W et al (2020) A motivational game-theoretic approach for peer-to-peer energy trading in islanded and grid-connected microgrid. Int J Electr Power Energy Syst 123(July):106307. https://doi.org/10.1016/j.ijepes.2020.106307 2. Adeyemi A, Yan M, Shahidehpour M, Bahramirad S, Paaso A (2020) Transactive energy markets for managing energy exchanges in power distribution systems. Electr J 33(9):106868. https://doi.org/10.1016/j.tej.2020.106868 3. Hahnel UJJ, Herberz M, Pena-Bello A, Parra D, Brosch T (2020) Becoming prosumer: Revealing trading preferences and decision-making strategies in peer-to-peer energy communities. Energy Policy 137(September 2019):111098. https://doi.org/10.1016/j.enpol.2019. 111098. 4. Kuznetsova E, Anjos MF (2019) Challenges in energy policies for the economic integration of prosumers in electric energy systems: a critical survey with a focus on Ontario (Canada). Energy Policy 142(June):2020. https://doi.org/10.1016/j.enpol.2020.111429
5. Lu T, Wang Z, Wang J, Ai Q, Wang C (2019) A data-driven Stackelberg market strategy for demand response-enabled distribution systems. IEEE Trans Smart Grid 10(3):2345–2357. https://doi.org/10.1109/TSG.2018.2795007 6. Chen Y et al (2020) A comparison study on trading behavior and profit distribution in local energy transaction games. Appl Energy 280(October):115941. https://doi.org/10.1016/j.ape nergy.2020.115941. 7. US-EIA (2020) Annual energy outlook 2020 (with projections to 2050) 8. Brown D, Hall S, Davis ME (2020) What is prosumerism for? Exploring the normative dimensions of decentralised energy transitions. Energy Res Soc Sci 66(September 2019):101475. https://doi.org/10.1016/j.erss.2020.101475 9. Jiang Y, Zhou K, Lu X, Yang S (2020) Electricity trading pricing among prosumers with game theory-based model in energy blockchain environment. Appl Energy 271(January):115239. https://doi.org/10.1016/j.apenergy.2020.115239 10. Tsao YC, Vu TL (2019) Power supply chain network design problem for smart grid considering differential pricing and buy-back policies. Energy Econ. 81:493–502. https://doi.org/10.1016/ j.eneco.2019.04.022 11. Hosseini-Motlagh SM, Nouri-Harzvili M, Choi TM, Ebrahimi S (2019) Reverse supply chain systems optimization with dual channel and demand disruptions: sustainability, CSR investment and pricing coordination. Inf Sci (Ny) 503:606–634. https://doi.org/10.1016/j.ins.2019.07.021 12. Klaimi J, Rahim-Amoud R, Merghem-Boulahia L, Jrad A (2018) A novel loss-based energy management approach for smart grids using multi-agent systems and intelligent storage systems. Sustain Cities Soc 39(January):344–357. https://doi.org/10.1016/j.scs.2018.02.038 13. Qin Q, Liu Y, Huang J-P (2020) A cooperative game analysis for the allocation of carbon emissions reduction responsibility in China’s power industry. Energy Econ 92:104960. https:// doi.org/10.1016/j.eneco.2020.104960 14. Tsao YC, Van Thanh V (2020) A multi-objective fuzzy robust optimization approach for designing sustainable and reliable power systems under uncertainty. Appl Soft Comput J 92:106317. https://doi.org/10.1016/j.asoc.2020.106317 15. Lopez A, Ogayar B, Hernández JC, Sutil FS (2020) Survey and assessment of technical and economic features for the provision of frequency control services by household-prosumers. Energy Policy 146(July 2020). https://doi.org/10.1016/j.enpol.2020.111739 16. Zhang YJ, Liang T, Jin YL, Shen B (2020) The impact of carbon trading on economic output and carbon emissions reduction in China’s industrial sectors. Appl Energy 260(December 2019):114290. https://doi.org/10.1016/j.apenergy.2019.114290 17. Yavari M, Ajalli P (2021) Suppliers’ coalition strategy for green-resilient supply chain network design. J Ind Prod Eng. https://doi.org/10.1080/21681015.2021.1883134 18. Hua W, Jiang J, Sun H, Wu J (2020) A blockchain based peer-to-peer trading framework integrating energy and carbon markets. Appl Energy 279(July):115539. https://doi.org/10.1016/j. apenergy.2020.115539 19. Lee-Geiller S, Kütting G (2021) From management to stewardship: A comparative case study of waste governance in New York City and Seoul metropolitan city. Resour Conserv Recycl 164(May 2020):105110. https://doi.org/10.1016/j.resconrec.2020.105110 20. F. Fizaine, “The economics of recycling rate: New insights from waste electrical and electronic equipment,” Resour. Policy, vol. 67, no. January, p. 101675, 2020, doi: https://doi.org/10.1016/ j.resourpol.2020.101675. 21. 
Araee E, Manavizadeh N, Aghamohammadi Bosjin S (2020) Designing a multi-objective model for a hazardous waste routing problem considering flexibility of routes and social effects. J Ind Prod Eng 37(1):33–45. https://doi.org/10.1080/21681015.2020.1727970 22. Sadeghi Ahangar S, Sadati A, Rabbani M (2021) Sustainable design of a municipal solid waste management system in an integrated closed-loop supply chain network using a fuzzy approach: a case study. J Ind Prod Eng 38(5):323–340. https://doi.org/10.1080/21681015.2021.1891146. 23. Chen Y et al (2021) Modeling waste generation and end-of-life management of wind power development in Guangdong, China until 2050. Resour Conserv Recycl 169(February):105533. https://doi.org/10.1016/j.resconrec.2021.105533
24. Zhou JG, Li LL, Tseng ML, Ahmed A, Shang ZX (2021) A novel green design method using electrical products reliability assessment to improve resource utilization. J Ind Prod Eng 00(00):1–12. https://doi.org/10.1080/21681015.2021.1947402 25. Tseng ML, Tran TPT, Ha HM, Bui TD, Lim MK (2021) Sustainable industrial and operation engineering trends and challenges toward Industry 4.0: a data driven analysis. J Ind Prod Eng 00(00):1–18. https://doi.org/10.1080/21681015.2021.1950227 26. Quaschning V (2021) Specific carbon dioxide emissions of various fuels. https://www.volkerquaschning.de/datserv/CO2-spez/index_e.php. Accessed 13 Sept 2021 27. Lase IS, Ragaert K, Dewulf J, De Meester S (2021) Multivariate input-output and material flow analysis of current and future plastic recycling rates from waste electrical and electronic equipment: the case of small household appliances. Resour Conserv Recycl 174(April):105772. https://doi.org/10.1016/j.resconrec.2021.105772
Abstractive Summarization is Improved by Learning Via Semantic Similarity R. Hariharan, M. Dhilsath Fathima, A. Kameshwaran, A. Bersika M. C. Sweety, Vaidehi Rahangdale, and Bala Chandra Sekhar Reddy Bhavanam
Abstract Semantic similarity is a training approach that takes into account the semantic meanings of the generated summaries. Sentence similarity is the most difficult to measure: it is determined not only by the tokens themselves, but also by how two sentences are interconnected and framed. There are multiple degrees of semantic similarity, and sentences might be similar in one respect but opposite in another. This can be captured through the pre-training and fine-tuning phases of BERT. The presence of multiple potentially valid predictions is one of the challenges of abstractive summarization: alternative answers are handled poorly by the commonly used objective functions for supervised learning and instead act as training noise. In this research, we propose a semantic similarity technique for training that takes into account the semantic meanings of summaries. Our training goal is the semantic similarity score between the generated and reference summaries, which is computed by adding an additional layer. Using pre-trained language models, our proposed architecture achieves new state-of-the-art performance, with a ROUGE-L score of 41.5 on the CNN/DM dataset. We also conducted human evaluations to supplement the automatic validation and achieve higher scores than the baseline and reference summaries. Keywords Abstractive summarization · BERT · Semantic similarity · Pre-training · Fine-tuning
1 Introduction Machine learning (ML) is a form of artificial intelligence (AI) that understands the expertise of combining facts and algorithms to replicate the way humans learn and R. Hariharan (B) · M. Dhilsath Fathima · A. B. M. C. Sweety · V. Rahangdale · B. C. S. R. Bhavanam Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, India e-mail: [email protected] A. Kameshwaran Dr. M. G. R Educational and Research Institute, Chennai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_23
improve its accuracy over time. When someone tries to produce an article or a project, they may make mistakes: for example, even if their grammar is good and the rest of the sentences are accurate, there is a chance that two sentences may not follow each other. To address this, we employ the BERT model to determine the semantic similarity of two sentences. Pre-training with language models has been shown to enhance several natural language processing tasks. This work builds on bidirectional encoder representations from transformers (BERT), a transformer-based machine learning technique for pre-training natural language processing models. BERT is both conceptually and empirically simple. It achieves new state-of-the-art results on 11 natural language processing tasks, including increasing the GLUE score to 80.5% (7.7-point absolute improvement), MultiNLI accuracy to 86.7% (4.6-point absolute improvement), SQuAD v1.1 question answering test F1 to 93.2 (1.5-point absolute improvement), and SQuAD v2.0 test F1 to 83.1 (5.1-point absolute improvement). In this research, we use BERT to improve fine-tuning-based procedures. BERT overcomes the previously mentioned uni-directionality limitation by employing a "masked language model" (MLM) pre-training objective. Semantic similarity is a method of determining how similar two statements are in terms of meaning. We propose an approach termed semantic similarity, which trains the model using the semantic difference between the generated and reference summaries. Unlike training with maximum likelihood loss alone, the semantic similarity technique allows the proposed model to accommodate a variety of legitimate summaries, which would otherwise behave as training noise. On the CNN/Daily Mail dataset, our proposed method achieves new state-of-the-art results, with ROUGE-1 scoring 44.72 and ROUGE-L scoring 41.53. Our approach creates better summaries than both the baseline and the reference summaries, according to human review results; particularly in terms of creativity and relevance, our model outperforms others. Our model may be fine-tuned with minimal computational effort by utilizing pre-trained language models and transfer learning techniques.
2 Problem Statement Our training goal is the semantic similarity score between the generated and reference summaries, which is computed by adding an additional layer. The underlying task is to figure out how similar two sentences are in terms of meaning. The BERT model, which takes two sentences as inputs and outputs a similarity score, will be fine-tuned in this section. We improve the summarization by utilizing BERT. For example: In Chennai, Narendra Modi addresses the press. In Tamil Nadu, the Prime Minister welcomes the media.
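A minimal sketch of the sentence-pair setup just described, using the Hugging Face transformers API: the two example sentences are packed into a single BERT input and a one-unit regression head scores their similarity. The bert-base-uncased checkpoint and the untrained head are illustrative stand-ins, not the fine-tuned model from this paper.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# BERT with a single-output (regression) head for scoring sentence pairs.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=1)

sent_a = "In Chennai, Narendra Modi addresses the press."
sent_b = "In Tamil Nadu, the Prime Minister welcomes the media."

# The tokenizer packs the pair as [CLS] sent_a [SEP] sent_b [SEP].
inputs = tokenizer(sent_a, sent_b, return_tensors="pt", truncation=True)
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()
print(f"similarity score before fine-tuning (head is untrained): {score:.3f}")
```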
3 Related Work Devlin et al. released a new language illustration version called as BERT, which stands for bidirectional encoder representations from transformers, as part of their deep bidirectional transformers for language understanding pre-training [1]. Unlike current language illustration trends, BERT is designed as pre-trained deep bidirectional illustration from unlabeled text data where conditioning each left and appropriate context in all layers collectively. Finally, pre-trained BERT has fine-tune with additional output layer to produce current fashions for a wide range of activities, including query answering and language translation. Rogers et al. proposed transformer-based fully and had driven portions of the artwork in many areas of NLP, but our understanding of what is behind their success remains restricted [2]. This paper offers a summary of almost a hundred and fifty studies on the well-known BERT model. We examine the current state of knowledge on how BERT works, what kinds of statistics it learns, and how it is expressed, as well as odd changes to its training targets and architecture, the over parameterization problem, and compression methods. Following that, we establish standards for destiny research. Yu et al. suggested a work that incorporates BookCorpus and Wikipedia and obtain excellent overall performance on many natural language processing tasks by fine-tuning downstream tasks [3]. However, it still lacks project-specific and domain-related data to improve the overall performance of the BERT version, necessitating additional fine-tuning method evaluations. To address these issues, the BERT4TC textual content type model is developed, which uses auxiliary sentences to convert the type project into a binary sentence-pair one. To address these issues, a BERT-based entirely textual content type version BERT4TC is developed, which uses auxiliary sentences to convert the type project into a binary sentence-pair one, addressing the problem of limited education records and projectattention. Denk et al. proposed a pre-training assignment of filling in the blanks, i.e., guessing a term that became masked out of a sentence based completely on the final words [4]. However, in some cases, having additional context can help the model make the correct forecast, such as by taking into consideration the location or time of writing. This encourages us to improve the BERT structure by incorporating a global state for conditioning in a fixed-size environment. We discuss our innovative methodologies and apply them to a business use case in which we complete fashion clothing with missing items based on a specific customer. Our strategies greatly improve personalization when compared to other techniques in the literature, according to an experimental comparison. By fine-tuning the BERT model using the SNLI Corpus, suggested natural language inference. The task of identification how similar sentences are in terms of meaning is known as semantic similarity. This example shows how to expect sentence semantic similarity with transformers using the Stanford Natural Language Inference (SNLI) Corpus. A BERT version that takes sentences as inputs and generates a similarity rating for those sentences will be fine-tuned. Rizvi et al. discussed “the absence of training data is one of the most difficult problems in natural language processing” [5]. Because NLP is such a broad field with so
many fascinating tasks, the most task-specific datasets are only a few thousand or a few hundred thousand human-labeled training instances. The use of pre-training models on large unlabeled textual content statistics to analyze language representations began with phrase embedding’s such as Word2Vec and GloVe. The way we performed NLP tasks changed as a result of these embeddings. For comparison, the BERT base structure has the same version length as OpenAI’s GPT. BERT’s developers have created a set of rules to represent the version’s enter textual material. Many of them are creative layout choices that improve the version. Singh discussed masked language modeling is a fill in the blank job in which a version attempts to guess what the masked phrase must be based on the context phrases surrounding the masks token [6]. In a self-supervised situation, masked language modeling is an excellent way to teach a language version (without humanannotated labels). After that, the version can be fine-tuned to execute a various of supervised NLP tasks. Textual content is vectorized into integer token ids by the text vectorization layer. It converts a collection of strings into a token indices chain (one sample = 1D array of integer token indices in sequence) or a dense representation (one sample = 1D array of flow values encoding an unordered set of tokens). Akbik et al. proposed to use the internal states of a trained character language model to generate a new sort of word embedding called contextual string embeddings [7]. They proposed that embeddings have the distinct properties of (1) being trained without any explicit notion of words and thus fundamentally modeling words as sequences of characters and (2) being contextualized by their surrounding text, implying that the same word will have different embeddings depending on its context. Al-Rfou et al. demonstrated that a deep (64-layer) transformer model [8] with fixed context outperforms RNN variations by a considerable margin, attaining state-of-the-art performance on two common benchmarks.
4 Dataset Description The CNN/Daily Mail dataset is an English-language dataset with just over 300,000 unique news stories written by CNN and Daily Mail journalists. Although the initial version was designed for machine reading and comprehension as well as abstractive question answering, the revised version supports both extractive and abstractive summarization, so a model for either task can be trained on it (version 1.0.0 was developed for machine reading and comprehension and abstractive question answering). To determine a model's performance, the ROUGE score of the output summary for a given article is computed against the highlights written by the original article's author.
5 Methodology The semantic similarity technique calculates the semantic similarity score in a straightforward manner; the score assesses how similar the generated and reference summaries are. Using an autoregressive approach, our underlying model is employed to construct the generated summary Sgen. The encoder part encodes the document sequence Sdoc, whereas the decoder part computes the probability distribution p(s_t^g | s_1^g, s_2^g, …, s_{t−1}^g, Sdoc) of token s_t^g at time step t, given the prior tokens and the original document's sequence Sdoc. We discovered that if our training goal is only to reduce the SemSim loss L_semsim, the model takes an unusually long time to learn. To solve this problem, we included maximum likelihood loss as one of our training objectives and used the teacher forcing approach. There are 287k training pairs, 13k validation pairs, and 11k testing pairs in the preprocessed CNN/DM dataset. To save time throughout the learning process, we used transfer learning: rather than using random values as the starting point of the model, transfer learning uses pre-trained weights, which allows the model to learn the task quickly and efficiently [7]. As the starting point for our model, we use BERT large weights fine-tuned on the CNN/DM dataset (bart-large-cnn). Learning with the SemSim methodology is more difficult than learning with the maximum likelihood method: when the model is trained by maximizing likelihood, it learns the correct answer token directly because each token is formed from the sequence, whereas the proposed model that uses the SemSim technique learns at the sequence level. As a result, learning with SemSim takes much longer than learning with the maximum likelihood method. We can solve this problem by transferring weights from BERT-large-CNN, which already knows how to summarize using maximum likelihood. The pre-trained rewarded model of Al-Rfou et al. is used to transfer weights to the SemSim layer [8]. The rewarded model calculates the similarity between the created sequence and the reference summary and rewards the RL-based technique with it. Our model's SemSim layer has the same structure as that of Al-Rfou et al., which is made up of a language model that encodes sequences and a linear layer [8]. The language model we utilized was BERT large. Finally, the SemSim layer is frozen and none of its weights are updated, so the number of parameters remains the same, which allows an open comparison with the baseline model.
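As a sketch of the transfer-learning starting point described above, the publicly released facebook/bart-large-cnn checkpoint (which appears to be what the text calls "bart-large-cnn") can be loaded with transformers and used to generate a summary autoregressively; the decoding settings are common defaults, not the paper's exact configuration.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")

article = "..."  # a CNN/DM news article would go here

inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(
    **inputs,
    num_beams=4,      # beam search over candidate summaries
    min_length=56,
    max_length=142,   # generation bounds commonly used for CNN/DM
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```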
5.1 Dataset Collection CNN/Daily Mail is a text summarization dataset built from news stories on the CNN and Daily Mail websites. The human-written abstractive summary bullets were originally turned into cloze-style questions (with one entity obscured), with the stories serving as the passages from which the system is expected to answer the fill-in-the-blank question. We applied additional preprocessing steps after the BERT preprocessing, such as replacing escape characters. There are 287k training pairs, 13k validation pairs, and 11k testing pairs in the preprocessed CNN/DM dataset.
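A hedged sketch of loading the corpus with the Hugging Face datasets library; the "3.0.0" configuration is the library's standard CNN/DailyMail summarization release and is assumed to match the splits quoted above.

```python
from datasets import load_dataset

# Splits roughly match the 287k / 13k / 11k train/validation/test pairs cited above.
dataset = load_dataset("cnn_dailymail", "3.0.0")
print(dataset)

example = dataset["train"][0]
print(example["article"][:300])   # source news story
print(example["highlights"])      # human-written summary bullets
```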
5.2 Architecture of BERT This section introduces BERT and its implementation (Fig. 1). Pre-training and fine-tuning are the two steps in our approach. During pre-training, the model is trained on unlabeled data through a variety of pre-training tasks. The BERT model is then fine-tuned after being initialized with the pre-trained parameters, and all of the parameters are fine-tuned using labeled data from the downstream tasks. Each downstream task has its own fine-tuned model, even though they all start from the same pre-trained parameters. The sentence transformer library is used for generating text embeddings; with this framework we can find similarity among sentence embeddings for more than one hundred languages. We generate sentence-level embeddings using the sentence transformer, and once the embeddings are available we can compute semantic similarity. The model first needs to be pre-trained and the input divided into embeddings and tokens; this process is carried out for two sentences, after which we take the cosine of the two sentence embeddings and obtain the similarity prediction for these sentences. Our input representation is capable of unambiguously representing both a single sentence and a pair of sentences, allowing BERT to handle a variety of downstream tasks. Fig. 1 Architecture of BERT
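A minimal sketch of the sentence-embedding and cosine-similarity step just described, using the sentence-transformers library; all-MiniLM-L6-v2 is only an example checkpoint, not the model trained in this paper.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")   # example bi-encoder checkpoint

sentences = [
    "In Chennai, Narendra Modi addresses the press.",
    "In Tamil Nadu, the Prime Minister welcomes the media.",
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity of the two sentence-level embeddings.
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(f"cosine similarity: {similarity.item():.3f}")
```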
Fig. 2 Architecture of SEMSIM
5.3 Architecture of SEMSIM Both semantic and structural knowledge are captured using the SemSim model (Fig. 2). This model calculates the semantic similarity score, which quantifies the similarity between two provided sentences or summaries, and is very efficient. In addition to all of the model’s computational components, the model file contains rich semantic information about the model’s contents. The SemSim model can determine whether the contents of both sentences have the same meaning, indicating that it is semantically interoperable. SemSim’s architecture is multi-scale and multi-domain by default. The SemSim layer uses a simple linear function and embeddings to calculate the semantic similarity score. The SemSim model can design a summary using only its own words and sentence patterns. The reference and generated summary sequences are encoded using a pre-trained language model. BERT is the language model utilized here, which computes the complete sequence’s embeddings as well as encodes all words as tokens of the input sequence as a dense vector. Maximum likelihood loss is one of our training goals. Because the weights in the SemSim layer are not updated, the number of parameters does not increase, and the gradient is calculated and flows from the SemSim layer’s backward path.
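A rough PyTorch sketch of a SemSim-style layer as described above: a frozen pre-trained encoder embeds the generated and reference summaries, and a simple linear function maps the pooled embeddings to a similarity score. The mean pooling, the concatenation, and the bert-large-uncased encoder name are assumptions made for illustration, not the paper's exact layer.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class SemSimLayer(nn.Module):
    """Scores semantic similarity between generated and reference summaries."""

    def __init__(self, encoder_name="bert-large-uncased"):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(encoder_name)
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.linear = nn.Linear(2 * hidden, 1)
        # The paper keeps the SemSim layer frozen, so no weights are updated here.
        for p in self.parameters():
            p.requires_grad = False

    def _embed(self, texts):
        batch = self.tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
        hidden = self.encoder(**batch).last_hidden_state
        mask = batch["attention_mask"].unsqueeze(-1).float()
        return (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # mean pooling over tokens

    def forward(self, generated, reference):
        g, r = self._embed(generated), self._embed(reference)
        return torch.sigmoid(self.linear(torch.cat([g, r], dim=-1))).squeeze(-1)

scorer = SemSimLayer()
print(scorer(["the cat sat on the mat"], ["a cat is sitting on a mat"]))
```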
5.4 Training of BERT BERT employs a transformer, an attention mechanism that learns the contextual relationships between words in a sentence. An encoder receives the text input, and a decoder produces a prediction for the task. Unlike directional models, which read the text input sequentially either left-to-right or right-to-left, the transformer encoder reads the entire sequence of words at once. It is labeled as bidirectional, yet it would be more accurate to describe it as non-directional. This characteristic allows the model to derive the context of a word from its surroundings (to the left and right of the word). Before word sequences enter BERT, a [MASK] token is used to replace 15% of the tokens in each sequence. The model then attempts to predict the original value
of the masked words using the context provided by the other, non-masked words in the sentence. During BERT training, the model takes a pair of sentences as input and learns to recognize whether the second sentence of the pair is the successor of the first in the original document. For 50% of the training pairs, the second sentence is the actual follow-on sentence from the original document, and for the remaining 50% a random sentence from the corpus is chosen as the second sentence; the random sentence is assumed to be unrelated to the first. BERT can be used for a wide range of linguistic tasks, with the simplest version adding a small layer on top of the core model. Classification tasks, such as sentiment analysis, are handled similarly to next sentence classification, by adding a classification layer on top of the transformer output for the [CLS] token [9–11]. In question answering tasks (for example, SQuAD v1.1), the model receives a question about a text sequence and is required to mark the answer within the sequence; a Q&A variant can be trained with BERT by learning two extra vectors that denote the start and the end of the answer. In named entity recognition (NER), the model receives a text sequence and is required to mark the various types of entities (person, organization, date, etc.) that appear in the text; a NER model can be trained with BERT by feeding the output vector of each token into a classification layer that predicts the NER label [12].
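A small illustration of the masked-language-modelling objective described above, using the transformers fill-mask pipeline with the standard bert-base-uncased checkpoint:

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the masked word from the context on both sides of it.
for prediction in unmasker("The sound quality of this phone is very [MASK]."):
    print(f"{prediction['token_str']:>10s}  p={prediction['score']:.3f}")
```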
6 Comparison with Other Models As the main purpose of our paper is to find the semantic similarity among the two sentences, we are using BERT with SemSim strategy because it gives higher accuracy as compared to the other baseline models to meet the basic requirement of the paper. Comparison of proposed methodology with existing methods is addressed in Table 1. Some models are faster and give higher accuracy in some cases compared to our proposed model BERT, but according to our basic purpose and requirement of the paper, BERT gives higher accuracy compared to other models. BERT can achieve around 99.7% of highest training accuracy on training dataset and around 93.8% of accuracy on validation of the dataset. BERT gives overall 91% of accuracy compared to other pre-trained language processing models.
7 Testing and Validation We tested CNN/DM dataset by giving the data for the sentence pairs in the development and test sets of the project and computing their similarity with the reference summary. We will mostly work with CNN/DM, as is standard in the literature. BERT scored nearly 71% of accuracy, whereas BERT with semantic similarity scored 80% of accuracy.
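The accuracy and ROUGE figures quoted in this section can in principle be reproduced with standard tooling; a hedged sketch using the Hugging Face evaluate package (the two strings are toy inputs, not CNN/DM outputs):

```python
import evaluate

rouge = evaluate.load("rouge")

generated = ["the prime minister met reporters in chennai on monday"]
reference = ["narendra modi addressed the press in chennai on monday"]

scores = rouge.compute(predictions=generated, references=reference)
print(scores)   # reports rouge1, rouge2, rougeL and rougeLsum
```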
Table 1 Comparison of our methodology with existing methods

Long short-term memory (LSTM) — Description: It can understand context along with latest dependency. Advantages: It can process, classify, and predict time series. Disadvantages: It is prone to overfitting. Accuracy and output: It achieves 78.74% of accuracy and gives a hidden state as an output. Applications: Time series prediction; sign language translation.

Generative pre-trained transformer 3 (GPT-3) — Description: It produces human-like text using deep learning. It can automatically generate text summarizations and programming code. Advantages: It can create anything with a text structure. Disadvantages: It does not have any understanding of the words, and it also lacks a semantic representation of the real world. Accuracy and output: It achieves 76% of accuracy. Applications: It generates text-based adventure games; it translates conventional language into formal computer code.

Text-to-text transfer transformer (T5) — Description: It enables the model to learn from all input tokens instead of the small masked-out subset. Advantages: It reframes all NLP tasks into a unified text-to-text format. Disadvantages: The cost of running several different simulations can be high. Accuracy and output: It achieves 93.2% of accuracy on SQuAD 1.1 and gives a string representation for the class as an output. Applications: Language translation; question answering task.

ALBERT — Description: It is a little version of BERT. Advantages: It decreases the size of a model and reduces longer training time. Disadvantages: It is built for a specific task only, because of which it is unable to adapt to new domains and problems. Accuracy and output: It achieves 89.4% of accuracy on the RACE benchmark. Applications: Text summarization; sentence classification.

Bidirectional encoder representations from transformers (BERT) — Description: BERT is a pre-trained NLP model that practices to predict missing words in the text. Advantages: It can process larger amounts of text and language; it is capable of fine-tuning to the specific language context. Disadvantages: It is very compute-intensive at inference time, so it can become costly. Accuracy and output: It achieves 99.7% of highest training accuracy on the training dataset and 93.8% accuracy on the validation dataset, and it gives 2 variables as an output. Applications: Text classification or sentence classification; finding semantic similarity among two sentences; question answering task; text summarization.
8 Result and Discussion Our model BERT produces more favorable summaries than the reference summaries, and also with the help of SemSim architecture, semantic similarity score can be calculated among two sentences efficiently as compared to the other natural language processing models. This is due to the following reasons: (1) A large-scale language processing models have a strong ability for text generation. (2) Some of the reference summaries receive low scores compared to the generated summaries because of which the reference summaries are not always an ideal summary of the document. We also discuss the difference between our BERT model and other baseline models. The datasets we are using are CNN/DM dataset. Some of the reference summaries are of poor quality because of which it provides less scores rather than generated summaries. Generally, we assume that the test set and training set belong to the same quality because the test set has a very similar distribution of the training set, so if the test set is of poor quality, then the test training will also be of poor quality as well. We observed that the test set and test training are related to each other. So, this low quality can reflect during the training of model. Thus, it is tough for a model to produce more better results as compared to the results produced using datasets. However, the generated summaries evaluated by our model are better than the reference summaries. We perform pre-training and fine-tuning to our model on various corpus which gives higher learning rate and better performance. The purpose of our model is to understand and learn the basic context of the language after pretraining phase and what specific task it needs to perform after fine-tuning phase. BERT shows outstanding performance in both relevance and readability because the underlying structure of our model is trained to extract the salient information with the help of encoder and to generate a complete sentence by using decoder. Pre-training is one of the reasons which enables BERT to provide favorable scores. Lastly, we would like to highlight the strength of SemSim approach. Our model shows positive difference in relevance and creativity. Our model is trained to generate a sequence that has similar meaning to the sequence taken as reference. Our model achieves higher relevance and creativity scores because of its structure. Our model provides flexibility and produces more better results as compared to other models, and it meets the purpose and requirements of our project for the required implementation. It gives higher accuracy as compared to many other models for the required purpose of the paper.
9 Conclusion and Future Work In this paper, we introduce the semantic similarity strategy, a straightforward approach for training summarization models. Our approach lets the model train with more flexibility than maximum likelihood methods, and due to this flexibility the model can produce better
results. The SemSim strategy allows a model to write a summary using its own words and sentence structures, and our models efficiently carry the salient information of the original documents into the summary. Using BERT with the SemSim strategy to find the similarity between two sentences gives higher accuracy than the other baseline models: BERT achieves around 91% accuracy and, according to human evaluation, our model generates more favorable summaries than the reference summaries and the other baseline models.
References 1. Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 2. Rogers A, Kovaleva O, Rumshisky A (2020) A primer in bertology: What we know about how BERT works. Trans Assoc Comput Linguist 8:842–866 3. Yu S, Su J, Luo D (2019) Improving BERT-based text classification with auxiliary sentence and domain knowledge. IEEE Access 7:176600–176612 4. Denk TI, Ramallo AP (2020) Contextual BERT: conditioning the language model using a global state. arXiv:2010.15778 5. Rizvi MSZ (2019) Demystifying BERT: a comprehensive guide to the groundbreaking NLP framework. https://www.analyticsvidhya.com/blog/2019/09/demystifying-bert-ground breaking-nlp-framework 6. Singh A (2018) End-to-end masked language modelling using BERT in 2018. https://keras.io/ examples/nlp/masked_language_modeling/ 7. Akbik A, Blythe D, Vollgraf R (2018) Contextual string embeddings for sequence labeling. In: Proceedings of the 27th international conference on computational linguistics, pp 1638–1649 8. Al-Rfou R, Choe D, Constant N, Guo M, Jones L (2019) Character-level language modeling with deeper self-attention. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, No 01, pp 3159–3166 9. Ando RK, Zhang T, Bartlett P (2005) A framework for learning predictive structures from multiple tasks and unlabeled data. J Mach Learn Res 6(11) 10. Hariharan R, Raj SS, Vimala R (2018) A novel approach for privacy preservation in bigdata using data perturbation in nested clustering in Apache Spark. J Comput Theor Nanosci 15(9– 10):2954–2959 11. Hariharan R, Mahesh C, Prasenna P, Kumar RV (2016) Enhancing privacy preservation in data mining using cluster based greedy method in hierarchical approach. Ind J Sci Technol 9(3):1–8 12. Bentivogli L, Clark P, Dagan I, Giampiccolo D (2009) The fifth PASCAL recognizing textual entailment challenge. In: TAC
A Systematic Review on Explicit and Implicit Aspect Based Sentiment Analysis Sameer Yadavrao Thakur, K. H. Walse, and V. M. Thakare
Abstract Opinion mining is also called sentiment analysis is a branch of web mining and text mining that is the process of identifying and determining the orientation of individual web users regarding various products, services, social comments on social media, different e-commerce websites, political issues, hotels, restaurants, emotions or sentiments of individual users about different entities and services of his/her interest on web or Internet. OM or SA interchangeably can be used to monitor and build a recommendation system based on individual user’s reviews that can be document-based, sentence-based, or aspect-based on reviews. Extracting features or aspects of web-based entities or services is a core task of SA. Aspects or features are broadly categorized as implicit and explicit features. The recommendation systembased build with such functionality can enable users to improve buying or selling decisions, also it can improve market intelligence, improve the manufacturing of products, improve services to users, and so on. Most of the time, reviews contain clear information (explicit), but they may also contain information that is not clearly stated or hidden (implicit) that cannot be ignored as may lead to the wrong decision or recommendation to individual users. This area has many challenging issues or problems that are still unaddressed or unsolved. Keywords Sentiment analysis · Opinion mining · Implicit aspects · Explicit aspects · Recommendation system
S. Y. Thakur (B) · V. M. Thakare P.G. Department of CS and Engineering, Sant Gadge Baba Amravati University, Amravati (M.S), India e-mail: [email protected] V. M. Thakare e-mail: [email protected] K. H. Walse Department of CS and Engineering, Anuradha Engineering College, Chikhali (M.S), India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_24
1 Introduction Sentiment analysis (SA) or opinion mining (OM) is the field that is most benefited from technology improvement and there is a rise in the internet activities such as social media communication, manufacturer/buyer/seller websites, different reviews on websites about products and services, discussion forums, chatting of users, ecommerce websites, and many other online activities, which positively impact on research initiatives [1]. SA is the process of analyzing views, sentiments, reviews, emotions, or opinions of people about different entities, products, services, political events, medical services, organizations of interest, fashion, trends, etc. [2]. Nowday, opinion mining or sentiment analysis is one of the most important research areas with wide exposure to engineering, research, and application. In industry, as well in academics this term is in use frequently and refers to an analogous meaning in the field of study. Users express their reviews or opinions about products, services, movies, mobile, laptops, hotels and their food, ambiance, political statements issued by political leaders, and many other entities of user’s interest. But, the majority of other entities are not yet investigated. SA is categorized into three different levels that are document, sentence, and aspect level. At the document and sentence level, an objective is to find the overall sentiment of a document or sentence. Whereas, at aspect level SA, an objective is to investigate sentiment or opinion about each aspect or feature present in the review. Features or aspects are classified as either explicit or implicit. Feature or aspect is said to be explicit if it is present in the review, else, is said to be implicit. There are many applications of sentiment analysis in our diversified sphere of life. It is used for measuring the level of user satisfaction or dissatisfaction with various products or services and improving their limitations, prediction of change in price according to news sentiments, manufacturing products, and services, promoting and improving products according to customer satisfaction or requirements reflected in reviews given by customers. This paper presents a systematic review on two different categories of aspect level sentiment analysis that is implicit and explicit aspect extraction from reviews focused on different proposed research work and summarizes and classifies it. Also, provides researchers with the current state-of-the-art to help researchers for future outcomes. This SR provides directions for new opportunities in this research area along with taxonomies, open challenges, and future directions on implicit and explicit aspect extraction in SA. In Sect. 1 of this paper, we present a taxonomy of sentiment analysis, a summary of various implicit and explicit aspect extraction techniques, which may be classified mainly in unsupervised, supervised, or semi-supervised. Section 2 is about related work that is carried out. In the third and fourth sections of this paper, we discussed the results of various methods obtained during review and their analysis, and in Sects. 5 and 6, we are presenting data and language domains involved. Section 7 describes performance measures used by researchers. In Sects. 8 and 9, we have presented
major findings, open research problems, and future directions. We finally present the conclusion in Sect. 10. This state-of-the-art review is carried out (1) to investigate the techniques used for implicit and explicit aspect-sentiment extraction; (2) to investigate the various performance measures, data domains, and languages used to identify implicit and explicit aspects from the years 2017 to 2021; (3) to know the limitations or opportunities related to these techniques; and (4) to identify future research questions, and many more [2]. Earlier work by researchers indicates that the study and research focus was given to identifying explicit aspects only, which is not enough, because the research is incomplete without studying implicit aspect-sentiment extraction, as it adds value to understanding the exact meaning of the review. This study explores many facets of this research area. The classification diagram for SA is shown in Fig. 1.

Fig. 1 Classification diagram for SA (sentiment analysis at the document, sentence, and aspect level; aspect-level analysis covers explicit and implicit aspects, extracted with unsupervised, supervised, or semi-supervised approaches)

Example: implicit and explicit aspects or features for mobile phone reviews.
Comment 1: Although this mobile phone is too heavy, it has a nice exterior and is a little cheap.
Comment 2: The sound quality of this phone is very good.
Here, the explicit aspects are (1) exterior and (2) sound quality. The implicit aspects are (1) weight, implied by the word "heavy", and (2) price, implied by the word "cheap". The above example represents implicit as well as explicit models of aspect-based sentiments for reviews on mobile phones; implicit aspects are not mentioned explicitly but are implied.
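To make the explicit/implicit distinction in the example above concrete, here is a toy sketch (not a method from any of the surveyed papers): spaCy noun chunks serve as candidate explicit aspects, while a small hand-made clue lexicon maps opinion words such as "heavy" and "cheap" to the implicit aspects they imply.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Illustrative clue lexicon: opinion word -> implied (implicit) aspect.
IMPLICIT_CLUES = {"heavy": "weight", "light": "weight", "cheap": "price", "expensive": "price"}

def extract_aspects(review: str):
    doc = nlp(review)
    explicit = [chunk.text for chunk in doc.noun_chunks]           # candidate explicit aspects
    implicit = sorted({IMPLICIT_CLUES[tok.lemma_.lower()]
                       for tok in doc if tok.lemma_.lower() in IMPLICIT_CLUES})
    return explicit, implicit

review = ("Although this mobile phone is too heavy, "
          "it has a nice exterior and is a little cheap.")
print(extract_aspects(review))
# Candidate explicit aspects include "a nice exterior"; implicit aspects: ['price', 'weight']
```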
2 Literature Review Tubishat et al. [1] presented a systematic review of 45 research articles based on implicit aspect extraction in sentiment analysis. It gives attention to the taxonomy of sentiment analysis, implicit and explicit aspects of entities, and provides deep
information about opportunities and open challenges in implicit sentiment analysis. Sentiment analysis is generally divided into three types, namely, document, sentence, and aspect level. The first one investigates aspect-opinion words in a complete document, the second one investigates the opinion of complete sentences. And, the third one computes the aspect-opinion of a specific entity. Aspects are explicit and implicit ones. The researcher has mentioned that lots of the research work focuses on explicit sentiments and very little research work is there in implicit sentiments. Therefore, there is a need to do more work on implicit aspect identification. Future scopes are as (1) implicit aspect/feature extraction, and implicit opinion detection (2) implicit feature extraction for unexplored data domain (3) design and investigate a hybrid approach for implicit feature extraction (4) fake opinion detection related to implicit aspect (5) non-availability of standard datasets and many more. Major categories of techniques used are unsupervised, supervised, and semi-supervised. Very few researchers have used supervised, semi-supervised, and hybrid approaches. Huang et al. [3] emphasize implicit opinion analysis, especially on detection and categorization of opinion. They observed that it is difficult to identify implicit opinions as compared to its counterpart. They used two different methods, DNN and WE from applied convolution neural network CNN and word representation models from word2vec and Glove in their experiment. In their experiment, they used a dataset for hotel reviews written in the Chinese language, they adopted POS, chi-square test, and PMI techniques to build a lexicon. Implicit feature extraction is done with the help of Glove, word2vec, and bag of words. For sentiment classification, they used CNN and SVM models. The result shows that the CNN model achieves higher performance than the traditional SVM model. The result also confirms that implicit opinions are equally important to explicit ones. Dadhaniya et al. [4] proposed an approach for opinion summarization based on multiple aspects using implicit features. Extraction of features or aspects from the reviews is a critical stage in opinion mining. Although much of the earlier studies focus on finding explicit features, little effort is made to detect implicit features. The paper gives clear suggestions of building solid methods to identify hidden sentiment. The author first extracts explicit features and opinion words. Whenever a new opinion word is encountered, rules are used. Finally, classification-based feature selection is used to detect implicit features. They are applying LDA for opinion summary generation. Joshi et al. [5] explains various trends, approaches, datasets, and issues in sarcasm detection, which is a very crucial step in SA. Different data sets are utilized in sarcasm detection and are divided into different categories, namely, short text, long text, transcripts, and other miscellaneous data sets. Social media text tends to be short reviews and discussion forum posts, movie reviews, book reviews, news articles, and long text. Data sets of transcripts and dialogue have also been used. Miscellaneous data sets include lexical indications like smiles, different types of quotes, etc. There are three different approaches to sarcasm detection, (1) statistical, (2) rule-based, and (3) deep learning. Many of the approaches used a bag of words as features.
Identifying sarcasm through specific evidence is carried out with the help of rule-based approaches. Sarcasm detection through a statistical approach depends on different learning algorithms such as logistic regression, the chi-square test, the balanced winnow algorithm, NB, decision trees, binary logistic regression, HMM, bagging, and boosting, while DNN, CNN, and RNN (LSTM) are used in the deep learning approach. Major concerns in this area are (1) identification of sarcastic patterns and use of these patterns as features, and (2) use of contextual information for sarcasm detection; sarcastic patterns are among the future directions for sarcasm detection. Classification is a very common task in sarcasm detection, in which, given a text, the objective is to predict whether the text is sarcastic or not; the sarcasm classifier is compared with human ability to check its accuracy. There are three stages in the development of sarcasm detection research: (1) extraction of semi-supervised patterns that help to identify implicit sentiment, (2) creation of large-scale annotated datasets by making use of hashtag-based supervision, and (3) effective use of the context of the target.

Kama et al. [6] extract implicit aspects and sentiment words whose polarity depends on the aspect. Turkish informal texts are collected from a product forum and the technique of aspect-sentiment mapping is applied. They suggest that sentiment mapping can be improved by analyzing Turkish grammar.

Tubishat et al. [7] proposed a hybrid algorithm, based on the whale optimization algorithm (WOA), to extract explicit and implicit aspects. The algorithm works on the principle of extracting explicit aspects by selecting the best dependency relation patterns. As future work, they investigate whether WOA can also be applied to implicit aspect extraction.

In Feng et al. [8], a new method combining a deep convolutional network and a sequential algorithm is proposed for labeling the words in sentences. First, the aspects are extracted and composed into vectors to train the DCNN, and then a sequential algorithm is applied to obtain the sentiment labeling of the sentence. The study is limited in that it considers only reviews of mobile phone products; other domains are not covered to verify the results.

Mushtaq et al. [9] use explicit and implicit mining to find relationships between technology-related reviews and to identify architecture-relevant and technology-related knowledge in that area. Knowledge structuring is then done using classification and clustering. This research is useful for making decisions about architecture and technology in the software development process.

Hannach and Benkhalifa [10] make use of adjectives and verbs to identify implicit aspects. Their method combines two techniques, WordNet-based semantic relations and a term-weighting scheme, to improve the training data for implicit aspect identification. They first preprocess a crime dataset to obtain relevant reviews and then apply information retrieval techniques. The model is evaluated using multiple classifiers, namely MNB, SVM, and random forest. Results indicate that the WordNet semantic relation term-weighting scheme is not only required but also improves the performance of the classifier.
Irfan et al. [11] presented the identification of implicit features based on feature-opinion pairs from a training dataset using a co-occurrence association rule mining approach. A multi-aspect LDA technique is used to generate feature-opinion pairs. They proposed a hybrid context-aware recommendation framework for personalized recommendation of various items, incorporating the rating inference of users' textual reviews with collaborative filtering methods. Opinion-feature pairs with high noun frequency are extracted, and NB and SVM classifiers are used for classification. Precision, recall, and F-measure are used to evaluate performance. Bias in ratings is a major factor that may affect the results.

Kren et al. [12] developed a framework for modeling the opinion of IPTV viewers. The framework uses implicit feedback given by viewers through channel-change events and contents to determine the viewer's opinion. In their method, explicit feedback is first built from the probability of channel-change events and contents. The TF-IDF technique is used for selecting features after initial word filtering. Their analysis is oriented towards building a model of the opinion versus the interest of the viewers. Classifiers like NB, SVM, decision trees, and random forest are used for classification.

Lazhar [13] uses a data mining technique, association rule mining, to classify implicit features. First, a set of association rules is generated from a corpus of reviews, and then a learning classification model is built with which a set of opinion words is predicted. POS tagging and dependency parsing are used to extract feature-opinion pairs. The next steps are to generate frequent itemsets and to prune rules to retain the significant ones. Finally, a classifier is built using the firing rules and their confidence scores: the mean of the confidence scores of the fired rules is calculated and the feature with the maximum mean is selected. The single-opinion model shows poor accuracy, while for multiple opinion words it shows improvement.

Not all existing machine learning methods are useful for extracting implicit aspects; therefore, Ray and Chakrabarti [14] proposed a deep learning approach for extracting aspects within a text together with the corresponding user sentiment. A deep convolutional network is used to identify aspects in sentences containing an opinion, and a rule-based approach is used to improve the method. Techniques like hierarchical clustering and PCA are used for determining the category of an aspect. However, the results show that this method could not extract implicit aspects.

Wei et al. [15] presented a model that uses a BiLSTM technique with multiple attention mechanisms for extracting implicit sentiment. This model performs very well in NLP; it consists of two LSTM layers, due to which it can capture the context of the text.

Ganganwar et al. [16] presented a survey of recently proposed techniques for detecting implicit aspects, classifying the studies according to the approaches they follow. Normally, people tend to express opinions on some parts or aspects of a product or service rather than on it as a whole, in which case document-level SA may be insufficient; aspect-level sentiment analysis can give more useful knowledge about the author's opinion on various aspects of a product or service.
They found context-based, unsupervised, rule-based, supervised, SVM-based, corpus-based, and dictionary-based models exploiting co-occurrences between opinions and other words. The reported limitations are: (1) implicit features cannot be figured out in small-scale corpora, and only adjectives are considered as opinion words; (2) the accuracy of a method depends on the accuracy of the dependency parser and on the opinion lexicon; (3) for datasets with little data but many unique implicit features, the results are not good enough to be useful in practice; and (4) the training set contains sentences having only one explicit feature, sentences containing both explicit and implicit features are treated as explicit sentences, and sentences containing infrequent explicit and implicit aspects are ignored. Detection of implicit features is challenging because people express their views very differently: sentences may not be grammatically correct, abbreviations may be used, language habits differ, and there is no standard benchmark to evaluate the extraction of implicit aspects.

Xu et al. [17] make use of non-negative matrix factorization techniques for extracting implicit aspects. In this approach, aspect clustering is first done using joint information on co-occurrence with the intra-relations of aspect and opinion words, and then the context information of aspects is collected. Word vectors are used to represent reviews, and finally a constructed classifier is used to predict the target implicit aspects.

Xiang et al. [18] address the issue of identifying the implicit polarity of text using attention-mechanism-based neural network models. To train the models, they use an RNN with gated recurrent units, and a multiple-hop attention mechanism is used to capture multiple aspects. Due to the vanishing gradients of the RNN during training, it does not give the best results, but it still performs better than previous systems and the baseline.

Hajar and Mohammed [19] developed a technique for implicit aspect detection that makes use of adjectives and verbs. In this method, the training data for naive Bayes classifiers is improved using synonym and definition relations from WordNet.

Fang et al. [20] present a model for implicit opinion analysis of Chinese customer reviews. From the raw review dataset, they extract an implicit opinionated review dataset. To construct product feature categories they designed a clustering algorithm and introduced the idea of feature-based implicit opinion patterns (FBIOPs) that can be discovered from implicit-opinionated datasets. The chi-squared test and PMI techniques are used to find the sentiment intensity and polarity of each FBIOP. With the help of this model, customer needs are captured by analyzing their implicit opinions, which makes product improvement and upgrades possible.

Talpur and Huang [21] proposed a solution for the problem of implicit sentiment detection. An isolated list of sentences, called complete sentences, is formed; these sentences are then broken into words, from which anonymous words are removed for aspect matching. This method can analyze single and multiple opinionated texts. The technique uses NB tree classifiers, which give better performance in terms of F-score.

Wang et al. [22] presented a model based on two techniques that reduce the problem of weak features when identifying implicit sentiments in Chinese text. The first technique is HKEM and the second is multi-pooling: the first collects information from the text at different levels, and the latter extracts multiple features from the text. The model collects character-level, local, and global information from the text in the form of a matrix, which then becomes the input to a layer that extracts the important features.
Maylawati et al. [23] presented the detection of implicit sentiments in product reviews using the FIN algorithm, which is based on association rule mining. The explicit words from the text are first grouped into clusters, and implicit features are then extracted using association rules.

Velmurugan and Hemalatha [24] proposed work to extract implicit and explicit relationships among the products in customers' market baskets using association rules and the Apriori algorithm. This helps in making quick decisions for selecting and purchasing products and in improving business.

Liao et al. [25] proposed the identification of implicit sentiments implied by facts. They used semantic representations for the modeling of sentences; semantic features are then analyzed at multiple levels, covering sentiment words, implicit sentiment at the sentence level, and context semantics at the document level.

Xiang et al. [26] present a mechanism for the identification of implicit event labels and related implicit polarity labels. They handle the two tasks of event classification and implicit sentiment detection, building the model with multi-task learning and message-passing techniques; an attention-based RNN is used for text classification.

Verma and Davis [27] present a study based on reviews in the airline domain for the analysis and extraction of implicit aspects and sentiments. They first identify the entity and the corresponding implicit aspects and then classify these implicit aspects using machine learning and ensemble learning.

Van Hee et al. [28] worked on news texts for investigating implicit sentiments. They manually annotated the news texts and use lexicon-based and machine learning approaches for their experiments.
3 Analysis

In this section, we provide an analysis of the existing methods, datasets, and associated limitations and challenges. The analysis is based on 29 research articles on implicit and explicit aspect-based sentiment analysis collected from the years 2017 to 2021. These research articles are collected from various electronic databases like Springer, IEEE, Elsevier, Inderscience, etc. This section gives details about the various categories of techniques used for the extraction of implicit and explicit reviews. Machine learning, lexicon-based, and hybrid are the three major categories in this area; further, these techniques are classified as supervised, semi-supervised, and unsupervised. Phase one of the analysis concentrates mainly on a broad outline of these techniques.
3.1 Outline of Aspect Extraction Techniques

In ABSA, identifying aspects, whether implicit, explicit, or both, is the main activity. In this research, the techniques used are primarily grouped into supervised, semi-supervised, and unsupervised. This classification of techniques is briefly depicted in Table 2. Table 1 indicates which articles work on implicit, explicit, or both types of SA in the respective years of investigation.
3.1.1 Supervised Aspect Extraction Techniques
To extract aspects and corresponding sentiments from reviews, these techniques need labeled or annotated data and trained algorithms. Some examples of these techniques are association rule mining [22, 24], conditional random fields [10, 13], and long short-term memory [14].
3.1.2 Semi-Supervised Aspect Extraction Techniques
These techniques can work on both annotated and unannotated data and need very little training or absolutely no training for the extraction of aspects and corresponding sentiments from the reviews. Some examples of these techniques are recurrent neural network (RNN) [18] and dependency parsing [29].

Table 1 Implicit and explicit aspects

Ref. No. | Year | Aspect             | Ref. No. | Year | Aspect
[1]      | 2018 | Implicit           | [16]     | 2019 | Implicit
[3]      | 2017 | Implicit           | [17]     | 2019 | Explicit, Implicit
[4]      | 2017 | Implicit           | [18]     | 2019 | Implicit
[5]      | 2017 | Explicit, Implicit | [19]     | 2019 | Implicit
[6]      | 2017 | Implicit           | [29]     | 2019 | Implicit
[7]      | 2018 | Explicit, Implicit | [20]     | 2020 | Implicit
[8]      | 2018 | Implicit           | [21]     | 2019 | Implicit
[9]      | 2018 | Explicit, Implicit | [22]     | 2020 | Implicit
[10]     | 2018 | Implicit           | [23]     | 2020 | Implicit
[11]     | 2019 | Explicit           | [24]     | 2020 | Explicit, Implicit
[12]     | 2019 | Implicit           | [2]      | 2020 | Explicit, Implicit
[13]     | 2019 | Implicit           | [25]     | 2018 | Implicit
[14]     | 2019 | Explicit, Implicit | [26]     | 2021 | Implicit
[15]     | 2019 | Implicit           | [27]     | 2021 | Implicit
         |      |                    | [28]     | 2021 | Implicit
Table 2 Techniques and approaches

S. No. | Techniques                                    | Approach                   | References
1      | Bag of words                                  | Unsupervised               | [3]
2      | CNN                                           | Supervised                 | [3]
3      | SVM                                           | Supervised                 | [3, 10, 11, 27]
4      | Word embedding                                | NLP based                  | [3]
5      | LDA                                           | Unsupervised               | [5]
6      | Aspect sentiment mapping                      | Aspect based               | [20]
7      | Whale optimization algorithm                  | Meta-heuristics            | [7]
8      | Co-occurrence matrix                          | Supervised                 | [7, 17, 22]
9      | Normalized google distance                    | Semantic similarity        | [7]
10     | DCNN                                          | ANN                        | [8]
11     | Part of speech tagging                        | NLP                        | [3, 8, 10, 11, 14, 17, 29, 22, 24]
12     | Wordnet semantic relations                    | Dictionary based           | [10]
13     | Random forest                                 | Supervised                 | [10, 13, 27]
14     | MNB                                           | Statistical                | [10]
15     | SMO                                           | Supervised                 | [10]
16     | Word stemming                                 | Text preprocessing in NLP  | [11, 13]
17     | Naïve Bayes                                   | Supervised, Statistical    | [11, 13, 19]
18     | Stop-word-removal                             | Text preprocessing in NLP  | [13, 17, 22]
19     | TF-IDF                                        | Statistical                | [13, 23]
20     | Association rule mining                       | Unsupervised               | [22, 24]
21     | Apriori algorithm                             | Unsupervised               | [22, 24]
22     | Chi-square test                               | Statistical                | [3, 20, 22]
23     | Rule based                                    | Unsupervised               | [14]
24     | Dependency parsing                            | Unsupervised               | [14, 17]
25     | Machine learning                              | Unsupervised               | [14]
26     | BiLSTM (LSTM)                                 | Supervised                 | [14]
27     | RNN                                           | ANN                        | [18]
28     | Double propagation                            | Semi-supervised            | [17]
29     | Non-negative matrix factorization             | Supervised                 | [17]
30     | Synonym relation                              | Dictionary based           | [19]
31     | Definition relation                           | Dictionary based           | [19]
32     | Word segmentation                             | NLP based, Supervised      | [21]
33     | Parsing                                       | NLP based unsupervised     | [29]
34     | Co-occurrence rule                            | Unsupervised               | [29]
35     | PMI                                           | Unsupervised               | [3, 20]
36     | FBIOP                                         | Unsupervised               | [20]
37     | Hierarchical knowledge enhancement learning   | Unsupervised               | [22]
38     | Multi-pooling                                 | Unsupervised               | [22]
39     | FIN algorithm                                 | Unsupervised               | [23]
40     | Particle swarm optimization                   | Unsupervised               | [23]
41     | K-means                                       | Unsupervised               | [23]
42     | NLTK                                          | Unsupervised               | [24]
43     | Data augmentation                             | Supervised                 | [8]
44     | SDT-CNN                                       | Supervised                 | [25]
45     | BERT                                          | Unsupervised               | [28]
46     | Conditional random field                      | Supervised                 | [27]
47     | Decision trees                                | Supervised                 | [13, 27]
48     | Word2vec                                      | Unsupervised               | [3]
49     | Glove                                         | Unsupervised               | [3]
50     | Sequential algorithm                          | Supervised                 | [8]
51     | Conditional random fields                     | Unsupervised               | [27]
3.1.3 Unsupervised Aspect Extraction Techniques
Methods or algorithms in this category work on unlabeled data, are not trained, and are the most popularly used in ABSA. Some examples of these techniques are LDA [5] and rule-based methods [14].
3.2 Explicit Aspect Extraction Techniques

In the first phase, we have seen the outline of techniques for ABSA. Now, we discuss the techniques that are used for extracting explicit aspects and the corresponding sentiments only. From previous studies, it is observed that unsupervised techniques are used most of the time to extract explicit aspects and associated sentiments. This is because they are feasible for unlabeled data, require no training, take less time to deploy, and are faster.
Semi-supervised and supervised techniques are the next most used for the same purpose.
3.3 Implicit Aspect Extraction Techniques

Investigating implicit aspects and associated sentiments is a newer and more challenging field compared to its counterpart, because implicit sentiment is ambiguous and semantic in nature and is therefore very difficult to extract. It has been seen that mostly unsupervised techniques are used, although some research articles indicate that semi-supervised and supervised techniques can also give better results.
3.4 Implicit and Explicit Aspect Extraction Techniques

When investigating both kinds of aspects and the related sentiments from reviews, it becomes challenging to deploy a technique that can perform efficiently on both. In such conditions, semi-supervised techniques are mostly deployed because only a small amount of data annotation is required.
4 Aspect Extraction Techniques with Their Limitations

This section briefly explains the working, some of the benefits, and the limitations of the techniques widely used in the process of extracting explicit and implicit aspects and corresponding sentiments, with more or less impact. Notably, CRF is found to be a high-impact method for extracting explicit as well as combined implicit and explicit aspects.
4.1 Conditional Random Fields

Verma and Davis [27] used conditional random fields (CRF) for extracting implicit aspects based on user-generated reviews from an airline domain-specific aspect-based annotated corpus collected from TripAdvisor and Airline Ratings, which are online platforms primarily used for viewing the reviews and experiences of travelers. Conditional random fields are fundamentally a way of combining the advantages of graphical modeling and discriminative classification, joining the ability to efficiently model multivariate outputs with the capacity to leverage a large number of input features for prediction.
Being one of the prominent techniques in both explicit and implicit aspect extraction, CRF has been relied upon by a significant number of researchers for aspect extraction, eventually leading to reasonable performances [27]. However, it suffers from the drawback of a limited dataset, which affects the strength of the research findings. It is an ineffective approach for detecting emotionally related conversations such as joy, disgust, and sadness. A further drawback comes from the polarity identification approach, which lacks the capacity to handle the positive, negative, and neutral polarity categories. It also suffers from the problem of multi-word aspect terms, which entails missing many aspects due to unreliable feature selection; this, in turn, reduces the recall value. The results obtained show that the technique is effective in detecting aspect terms and their polarity as well as aspect categories and their polarity. However, it was found that the technique cannot handle such tasks in another domain due to dataset limitations.
4.2 Non-negative Matrix Factorization

Xu et al. [17] used this technique for the detection of implicit aspects. They first form clusters of product aspects and opinion words; the context of the aspect is then identified, reviews are represented using word vectors, and in the end a classifier is built to detect the implicit aspects. The approach divides the original data into two matrices that do not contain any negative values, which are then compared for investigation [17].
4.3 Double Propagation

Xu et al. [17] used this method for extracting aspects and associated opinion words from a dataset of reviews of a single product. Rules of extraction are formed and used to obtain explicit aspects and related opinion words. The method works on the syntactic information between aspects and opinion words: based on initial opinion words and the available sentiment words, it extracts new aspects and opinion words and continues until no new words are found [17].
4.4 Convolution Neural Network (CNN)

Ray and Chakrabarti [14] used this algorithm to find sentiments or opinions in the set of input reviews. It is an AI-based algorithm, also called a deep learning algorithm, and consists of different layers such as the input layer, hidden layers, and output layers.
These layers automatically learn from the input and generate features. The algorithm is useful for the classification of features and requires less pre-processing. Efficient feature selection and overfitting are limitations of this method [14].
4.5 Parts of Speech (POS) Tagging

Ray and Chakrabarti [14] used this process, also considered a preprocessing step, in which the words in a sentence are attached with tags or labels according to their part of speech, such as nouns or adjectives. This process helps in finding the polarity score of the corresponding aspect in the review sentence [14].
4.6 Dependency Parsing

Ray and Chakrabarti [14] used dependency parsing to extract corresponding sets of aspect and opinion words using syntactic relations and a set of rules. With the help of this technique, they obtain the grammatical structure of a sentence and the relationships between the main word and the other words. Here, the researchers use a dependency parser for extracting aspects and their dependency relations with opinion words. The limitation of the method is that it cannot extract all pairs of aspects and opinions [14].
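The two preprocessing steps above can be illustrated with a short sketch. This is a minimal example assuming the spaCy library and its small English model (the reviewed papers do not specify an implementation); it prints each token's part-of-speech tag and dependency relation, and then applies a crude adjective-to-noun pairing rule of the kind the rule-based extractors rely on.

```python
# Minimal POS tagging and dependency parsing sketch (assumes: pip install spacy
# and python -m spacy download en_core_web_sm; library choice is illustrative).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The phone has an amazing battery life but a disappointing camera.")

for token in doc:
    # token.pos_ -> coarse part-of-speech tag (NOUN, ADJ, ...)
    # token.dep_ -> dependency relation to token.head (amod, nsubj, ...)
    print(f"{token.text:14s} {token.pos_:6s} {token.dep_:10s} head={token.head.text}")

# A crude aspect-opinion pairing rule: adjectives attached to nouns.
pairs = [(t.head.text, t.text) for t in doc
         if t.pos_ == "ADJ" and t.head.pos_ == "NOUN"]
print(pairs)   # e.g. [('life', 'amazing'), ('camera', 'disappointing')]
```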
4.7 Word Embedding

This is a technique of word representation for text in which words having the same meaning have similar representations and are mapped to vectors of real numbers; each word in the input sentence is represented as a word vector [14]. The authors use the Skip-Gram model for the implementation of the WE technique, which is said to be very efficient and predictive for word embedding in text. One-dimensional integer vectors are used to represent the target word tokens and a sample of the context word tokens in the input sentence. If the sample word is present in the context then the prediction is 1, otherwise it is 0 [14].
4.8 Rule-Based

Ray and Chakrabarti [14] use a set of rules with deep learning techniques to extract aspects and detect polarity from electronics product reviews, movie reviews, and restaurant reviews. The researchers use Stanford Dependencies rules to extract aspects. Rule-based methods or approaches have limitations in that they
cannot detect more complex syntactic and semantic relations among the aspects [14].
4.9 Recurrent Neural Network

Xiang et al. [18] use a gated recurrent unit model of RNN to learn the representation of sentences. It is an artificial neural network that can process sequential data. It is observed that this model suffers from vanishing gradients in its training and therefore could not give better results. The errors observed by the researchers are misclassification of the neutral category, negation in incorrectly predicted examples, and, finally, that the model cannot give results for extremely complex sentences with multiple events [18].
4.10 Long Short-Term Memory

Wei et al. [15] state that aspects cannot be identified from explicit sentiment words alone. They proposed a bidirectional LSTM model with multi-polarity orthogonal attention for implicit sentiment analysis. It is a sequence processing model consisting of two LSTM layers, one taking the input in the forward direction and the other in the backward direction. The BiLSTM model is widely used in NLP tasks and has excellent performance; due to its bidirectional nature, context information can be obtained precisely [15].
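As a rough illustration of the bidirectional LSTM idea (not the multi-polarity orthogonal attention model of [15]), the following Keras sketch stacks two bidirectional LSTM layers over an embedding layer for binary sentiment classification; the vocabulary size, sequence length, and layer widths are arbitrary assumptions.

```python
# Minimal BiLSTM sentiment classifier sketch (TensorFlow/Keras assumed;
# hyperparameters are illustrative, not taken from the reviewed work).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, Dense

VOCAB_SIZE, MAX_LEN = 10000, 100   # assumed vocabulary size and padded review length

model = Sequential([
    Input(shape=(MAX_LEN,)),                         # integer-encoded, padded reviews
    Embedding(input_dim=VOCAB_SIZE, output_dim=64),  # learnable word vectors
    Bidirectional(LSTM(64, return_sequences=True)),  # first BiLSTM layer keeps the sequence
    Bidirectional(LSTM(32)),                         # second BiLSTM layer summarizes it
    Dense(1, activation="sigmoid"),                  # binary sentiment output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```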
4.11 Multinomial Naïve Bayes

Hannach and Benkhalifa [10] used MNB, a variant of the naive Bayes statistical algorithm, which is widely used for building classifiers for text classification and SA. This probabilistic model assumes that the features are independent of each other, which makes probability estimation simple [10].
4.11.1 Support Vector Machine
Hannach and Benkhalifa [10] used SVM, a classification algorithm in the supervised category that is widely used in text classification and sentiment analysis. SVM has proven to be a strong classifier and gives high performance for classification tasks. In their work, the researchers use the SVM variant known as sequential minimal optimization (SMO) [10].
4.12 Random Forest

Hannach and Benkhalifa [10] used the random forest algorithm for text classification and SA. It is a supervised machine learning algorithm constructed from decision tree algorithms; RF is a well-known classifier based on many classification trees. To construct the forest, decision tree nodes are created based on a randomly chosen subset of features, the bagging method is used to generate training data subsets for building the individual trees, and, lastly, the random forest classifier is obtained by combining these individual trees [10].
4.13 Bag of Words

This is a method of extracting features from text which are then used in text modeling. NLP algorithms operate on numbers, so text cannot be fed to them directly. Bag of words is a preprocessing step that converts text into a vector of word counts, giving the total occurrences of the most frequently used words in the text [3].
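A minimal bag-of-words sketch using scikit-learn's CountVectorizer (an assumed library choice, not named in the reviewed work); it turns a couple of toy reviews into the term-count matrix described above.

```python
# Bag-of-words feature extraction sketch (scikit-learn assumed; toy data).
from sklearn.feature_extraction.text import CountVectorizer

reviews = [
    "The battery lasts long and charges fast",
    "The battery drains fast and the screen is dim",
]
vectorizer = CountVectorizer()            # tokenizes, lowercases, counts word occurrences
X = vectorizer.fit_transform(reviews)     # sparse document-term count matrix

print(vectorizer.get_feature_names_out()) # vocabulary learned from the corpus
print(X.toarray())                        # one row of counts per review
```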
4.14 Word2vec

Word2vec is an algorithm used for producing distributed representations of words. It generates or suggests synonymous or additional words for the words in input sentences. A neural network is used to learn the associations from a large set of text, and each distinct word is represented by a list of numbers called a vector [3].
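A small word2vec training sketch using the gensim library (an assumed choice; the reviewed work does not name an implementation). With a realistic corpus the learned vectors place related words close together; the toy corpus here only demonstrates the API.

```python
# word2vec training sketch (gensim assumed; corpus is a toy example).
from gensim.models import Word2Vec

sentences = [
    ["battery", "life", "is", "great"],
    ["battery", "drains", "quickly"],
    ["camera", "quality", "is", "great"],
]
model = Word2Vec(sentences=sentences, vector_size=50, window=3,
                 min_count=1, sg=1, epochs=50)   # sg=1 selects the skip-gram variant

print(model.wv["battery"][:5])                   # first few dimensions of a word vector
print(model.wv.most_similar("battery", topn=2))  # nearest neighbours in vector space
```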
4.15 GloVe

GloVe stands for Global Vectors and is used as a distributed representation model for words. It is an unsupervised learning algorithm in which words are mapped into a meaningful space where the distance between two words reflects their semantic similarity. The algorithm is trained on aggregated global word-word co-occurrence statistics from a corpus [3].
4.16 Latent Dirichlet Allocation

LDA is a technique used to extract topics from documents. The words in each document are captured by latent topics, where the topics are identified based on the likelihood of co-occurrence of the words contained in them. Although the technique is a generative statistical model, it belongs to the AI and machine learning fields [5].
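A short LDA sketch using scikit-learn (an assumed library; the topic count and data are illustrative): documents are converted to term counts and LDA then groups co-occurring words into latent topics.

```python
# LDA topic extraction sketch (scikit-learn assumed; tiny illustrative corpus).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "battery life battery charge power",
    "camera photo lens zoom picture",
    "battery power charger charge",
    "photo camera picture quality",
]
counts = CountVectorizer().fit(docs)
X = counts.transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[::-1][:3]              # indices of the 3 most probable words
    print(f"Topic {k}:", [terms[i] for i in top])
```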
4.17 TF-IDF

TF-IDF is a popular technique for assigning weights to terms or words according to their importance in a document. TF is a numeric value indicating how many times a term occurs in a document, and IDF is a factor that reduces the weight of terms that occur commonly and frequently and increases the weight of terms that occur rarely. The technique is used in search engines, information retrieval, and user modeling [13, 23].
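A minimal TF-IDF sketch (scikit-learn assumed; toy reviews): terms that appear in every review receive lower IDF weights than rare, discriminative ones.

```python
# TF-IDF weighting sketch (scikit-learn assumed; toy reviews).
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = [
    "the screen is bright and the colors are vivid",
    "the screen is dim and the battery is weak",
    "the battery is excellent",
]
tfidf = TfidfVectorizer(stop_words="english")  # drop very common English stop words
X = tfidf.fit_transform(reviews)               # rows = reviews, columns = weighted terms

for term, idx in sorted(tfidf.vocabulary_.items()):
    print(f"{term:10s} idf={tfidf.idf_[idx]:.2f}")   # rarer terms get larger IDF values
```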
4.18 FIN Algorithm

The FIN algorithm is mainly used for discovering frequent itemsets (groups of items) from transaction databases, where each transaction is a set of items and the set of transactions is called a transaction database. Each itemset is assigned a support value, i.e., the number of times the itemset appears in the transaction database. A frequent itemset is one that appears in at least minsup transactions of the transaction database, where minsup is a parameter specified by the user; itemsets with a support value less than the minimum support are discarded [23].
4.19 Whale Optimization Algorithm

The whale optimization algorithm (WOA) for rule selection is a recently proposed bio-inspired metaheuristic algorithm which has proved its search capability in comparison with other well-known optimization algorithms such as the genetic algorithm (GA) and particle swarm optimization (PSO). The search methodology proposed in WOA is based on mimicking the hunting behavior used by humpback whales to catch their prey [7].
4.20 Explicit Aspect Sentiment Mapping

Aspect sentiment mapping is a very important step in ABSA: it maps aspects to sentiment words. In this technique, the sentiment words and noun groups are first extracted from sentences and then matched. Thereafter, the sentiment score is calculated for each aspect in the noun group and the mapping between aspect and sentiment score is generated. This technique gives better results at the basic level [6].
4.21 Feature-Based Implicit Opinion Pattern

This technique makes use of frequent pattern mining, pattern selection, and polarity calculation to extract implicit reviews, provided the review contains at least one feature and expresses a positive or negative sentiment [20].
4.22 Association Rule Mining

In [13, 24], an approach based on association rule mining and classification techniques is presented. A set of association rules is created from the corpus by grouping explicit feature-opinion pairs; these rules are then used to construct classification models that predict the appropriate target for each set of opinion words. ARM is an important data mining technique, and the two measures of support and confidence are used to prune rules.
4.23 Word Stemming, Tokenization, and Stop Word Removal

These preprocessing techniques are required before text can be used for sentiment analysis. Tokenization is the technique of finding words as tokens by separating sentences into individual words, numbers, and punctuation symbols. Stop-word removal removes stop words like "is", "our", and "at", which do not contribute to representing information. Word stemming means removing prefixes and suffixes from a word and converting it to its root form [11].
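The three preprocessing steps can be chained as in the following NLTK sketch (an assumed library choice; the required corpora must be downloaded once as noted in the comments).

```python
# Tokenization, stop-word removal, and stemming sketch (NLTK assumed).
# One-time setup: nltk.download("punkt"); nltk.download("stopwords")
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

text = "The cameras on these phones are producing amazingly detailed pictures"

tokens = word_tokenize(text.lower())                 # split the sentence into word tokens
stop_set = set(stopwords.words("english"))
filtered = [t for t in tokens
            if t.isalpha() and t not in stop_set]    # drop stop words and punctuation
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in filtered]          # reduce each word to its stem

print("filtered:", filtered)
print("stems:   ", stems)
```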
4.24 Naïve Bayes

Naïve Bayes is a collection of classification algorithms based on the Bayes theorem, relying on the assumption that each pair of features being classified is independent of the others. The model uses probabilities for the distribution of reviews into each class [11].
4.25 Co-occurrence Rules

Co-occurrence rules make it possible to identify concepts and group them if they are strongly related within a set of documents or sentences. Concepts strongly co-occur when they frequently appear together in a set of documents, and this correlates aspects with opinion words. The method gives better results on large datasets [29].
4.26 Parsing

This technique, also known as syntax analysis, breaks a sentence into components called tokens, defines their syntactic roles, and then forms patterns from the sequence of tokens. A parse tree is usually generated for representation [29].
4.27 Hierarchical Knowledge Enhancement-Based Representation Learning

This method is used to detect Chinese implicit sentiments. It is a knowledge learning method that works on character-level information, but it does not have strong reasoning ability and cannot detect hidden sentiments by itself [22].
4.28 Multi-pooling

Multi-pooling is a method used to handle multiple sentiments in text. It solves the problem of weak features by combining multiple maximum features into a high-level feature matrix. Its limitation is that it has no strong reasoning ability [22].
4.29 Word Segmentation

In this method, sentences are broken into meaningful words: word boundaries are determined in a sentence with the help of computer algorithms, and the output is a list of words [21].
4.30 Sentence Tokenization

When individual sentences are identified by splitting text, the process is called sentence tokenization. Here, each obtained sentence may be a token of a particular paragraph of text [24].
4.31 NLP Based

The main role of natural language processing is to enable computers to communicate with humans in their languages and to perform tasks such as measuring sentiments in a given text, classifying sentiments, speech recognition, text summarization, and many more. In NLP, NLTK, or the Natural Language Toolkit, is a large library of functions like stemming, tokenizing, and POS tagging, along with various other text preprocessing algorithms that can be used for building sentiment analysis models. NLTK is written in the Python language [24].
4.32 Apriori Algorithm

This algorithm is used in market basket analysis for mining frequent itemsets and, further, for discovering association rules from transaction databases. Two factors, support and confidence, are used to prune association rules, where support is the frequency of item occurrence and confidence is a conditional probability [24].
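The support and confidence computation behind Apriori-style rule pruning can be sketched directly (a plain-Python illustration on a made-up transaction database, not a full Apriori implementation).

```python
# Support and confidence for association rules (illustrative, not a full Apriori run).
transactions = [
    {"screen", "battery"},
    {"battery", "charger"},
    {"screen", "battery", "charger"},
    {"screen", "camera"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Conditional probability of the consequent given the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"battery"}))                  # 0.75
print(support({"screen", "battery"}))        # 0.5
print(confidence({"battery"}, {"charger"}))  # 0.666... -> rule battery -> charger
```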
4.33 Multi-level Semantic Fused Representation

In this technique, semantic decision tree-based convolution and neural network models are used. Semantic representations are useful in sentence modeling. The model has five layers, like a traditional CNN model. The limitation is that the model cannot make complete use of dependency relation information [25].
4.34 Multi-task Learning

In [26], the authors presented a message-passing multi-task learning model in which multiple tasks are learned in parallel such that what is learned by one task helps the other tasks to learn as well. This is achieved by sending useful information back to the model during training. Multi-task learning is a subfield of machine learning.
4.35 BERT

The study [28] presents the BERT model, which is designed to enable computers to understand language. It can read input in both directions and is a machine learning technique for NLP.
4.36 Chi-Square Test

This test is used to check whether two variables are genuinely related or whether their observed association is due to chance [3].
4.37 Normalized Google Distance (NGD)

NGD is used to find the relatedness (co-occurrence) of two words based on a web search (Google). Here, f(x, y) denotes the number of web pages containing both x and y, f(x) the number of returned pages that contain x, f(y) the number of returned pages that contain y, and N the total number of pages indexed by Google. The values returned by NGD lie between 0 and infinity: a value of 0 means that the terms x and y always co-occur together on a web page, whereas if the terms never co-occur on the same page the NGD equals infinity [7].
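The formula the description refers to did not survive extraction; the standard definition of NGD, in the notation above, is reproduced here for completeness.

```latex
% Standard Normalized Google Distance definition (reconstructed; notation as in the text above).
\[
  \mathrm{NGD}(x, y) =
  \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}
       {\log N - \min\{\log f(x), \log f(y)\}}
\]
```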
4.38 Particle Swarm Optimization

PSO is a method that optimizes the problem under study based on swarm intelligence. In sentiment analysis, PSO is used for feature selection so that the dimensionality of the features is reduced; word features are selected based on a threshold value [23].
4.39 K-Means Clustering

K-means is an unsupervised technique for grouping unlabeled data into clusters identified by numbers such as 0, 1, 2, and so on. The distance of each data point from the cluster centroids is calculated and the data point is placed in the corresponding cluster. With this technique in aspect-based sentiment analysis, explicit aspect clusters are first obtained and the implicit aspect is then extracted from a cluster of explicit aspects [23], as sketched below. The summary of techniques is presented in Table 2.
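A compact sketch of the clustering step just described (scikit-learn assumed; the data and the choice of two clusters are illustrative): explicit aspect contexts are embedded via TF-IDF and then grouped with k-means.

```python
# K-means clustering of aspect terms by their review context (scikit-learn assumed).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

aspect_contexts = [
    "battery lasts long charge power",
    "charger power battery drain",
    "camera photo picture lens",
    "photo zoom camera quality",
]
X = TfidfVectorizer().fit_transform(aspect_contexts)   # sparse TF-IDF vectors

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)   # cluster id assigned to each context, e.g. [0 0 1 1]
```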
5 Data Domain

In this section, we discuss the data domains that are used for explicit and/or implicit aspects. The data domains found in the research papers from 2017 to 2021 are listed in Table 3. The analysis shows that product review and hotel review datasets are the ones most explored by researchers investigating explicit and implicit aspect-based sentiment analysis. The product review dataset contains reviews of multiple products and is therefore suitable for aspect extraction. Two broad categories of data domains are used in this study, namely single data domains and multiple data domains: a single data domain means that only one data domain is used for the study in a paper, whereas multiple data domains means that two or more data domains are used.
6 Language Domain

In this section of our systematic review, we discuss the language of the data domains used to extract explicit and/or implicit aspects. From the tabular data collected and analyzed in Sect. 5, we observed the frequencies of languages used for aspect extraction and infer that the most commonly used language is English, the second most commonly used is Chinese, followed by Turkish and Dutch. Other languages remain unexplored for explicit and/or implicit aspect extraction and are therefore a thrust area for future work.
7 Evaluation Metrics Overview

In the process of sentiment analysis, classifying the sentiment as positive or negative, or as explicit or implicit, is a classification problem. When we opt for a set of classifiers for sentiment prediction, it becomes necessary to verify the performance of these classifiers by calculating the accuracy of prediction.
Table 3 Datasets and languages

S. No. | Data domains                                | Language | References
1      | 200 real time sentences in CSV file         | English  | [5]
2      | Hotel reviews                               | Chinese  | [3]
3      | Product reviews                             | English  | [7, 14]
4      | Mobile phone reviews                        | Chinese  | [26]
5      | Tech and architecture related post          | English  | [9]
6      | Crime dataset                               | English  | [10]
7      | Yelp dataset                                | English  | [11]
8      | Feedback of IPTV viewers                    | English  | [13]
9      | Hotel reviews                               | English  | [14, 29, 22]
10     | Product reviews                             | English  | [14, 24]
11     | Movie reviews                               | English  | [14]
12     | Restaurant reviews                          | English  | [14]
13     | 2019 SMP-ECISA Chinese implicit sentiments  | Chinese  | [15]
14     | Customer review dataset                     | English  | [17]
15     | ABSA-15                                     | English  | [17]
16     | SemEval 2015 dataset                        | English  | [18]
17     | ClipEval dataset                            | English  | [18, 26]
18     | Electronic product dataset                  | English  | [19]
19     | Restaurant dataset                          | English  | [19]
20     | Hotel review dataset                        | English  | [29]
21     | Car forum review dataset                    | Chinese  | [20]
22     | Electronic product dataset                  | English  | [21]
23     | SMP-ECISA2019 dataset                       | Chinese  | [19]
24     | Product review dataset                      | English  | [23]
25     | Airline review dataset                      | English  | [27]
26     | Product review on specific mobile phone     | Turkish  | [6]
27     | ClipEval dataset                            | English  | [26]
28     | Fact-implied implicit sentiment corpus      | English  | [25]
29     | Dutch news event dataset                    | Dutch    | [28]
Since accuracy alone sometimes cannot adequately characterize a classifier, other measures like precision, recall, and the F-measure (F1) are required to evaluate classifier performance. The most commonly used evaluation metric for sentiment analysis is the combination of these measures. True positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts are the input parameters of these measures.
7.1 Accuracy

It is a ratio of the number of records classified correctly to the total number of records. When a particular model achieves high accuracy it means that the model is trained properly and the results obtained are accurate. It is determined using the following formula:

Accuracy = (TP + TN)/(TP + FP + FN + TN)    (1)
7.2 Precision

It is a measure of exactness and is the ratio of the number of correctly classified true positive records to the total number of true and false positive records. It is determined using the following formula:

Precision = TP/(TP + FP)    (2)
7.3 Recall

This is a measure of the completeness of results and is obtained as the ratio of correctly classified true positive records to the total number of true positive and false negative records. It is determined by the following formula:

Recall = TP/(TP + FN)    (3)
7.4 F1 or F Measure

When the values of precision and recall go very high or very low, it becomes very difficult to make decisions based on these parameters alone. In that case, we verify the validity of the results using the F measure, which is the harmonic mean (weighted average) of precision and recall and is determined using the following formula:

F1 = (2 ∗ Precision ∗ Recall)/(Precision + Recall)    (4)
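The four measures above can be computed directly from the confusion-matrix counts; the following plain-Python sketch, with made-up toy counts, mirrors Eqs. (1)-(4).

```python
# Accuracy, precision, recall, and F1 from confusion-matrix counts (Eqs. 1-4).
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Toy counts for an implicit-aspect classifier evaluation (illustrative only).
acc, prec, rec, f1 = metrics(tp=70, tn=80, fp=20, fn=30)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} F1={f1:.2f}")
# accuracy=0.75 precision=0.78 recall=0.70 F1=0.74
```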
7.4.1 Evaluation Metrics Analysis
In this section, we observed that most researchers use these four evaluation measures, accuracy, precision, recall, and F-score, for measuring the performance of explicit and/or implicit aspect extraction models. From this systematic review, we can confirm that not only accuracy but the other measures as well are used to assess model performance.
8 Major Findings

In the research area of aspect-based sentiment analysis, finding explicit as well as implicit aspects and the corresponding sentiments is equally important, because implicit sentiments are hidden and, if ignored, will impact business intelligence, recommendation systems, forecasting, and the planning of entities and services for individual web users. This research area is attracting the attention of many researchers and the number of publications has risen in the last few years. The three major approaches used to explore this area are the machine learning approach, the lexicon-based approach, and the hybrid approach. Unsupervised, supervised, semi-supervised, and combined techniques are used to detect sentiments. Unsupervised techniques are the most frequently used as they work on unlabeled data and are faster than the others, but semi-supervised, supervised, or different combinations may also give better results. More work is required on implicit aspect extraction compared to explicit aspect extraction because implicit opinions are ambiguous and semantic by nature.

The study of the existing research papers investigated for this systematic review reveals that unsupervised techniques like dependency parsing, statistical, rule-based, pattern-based, and NLP-based methods are the most commonly used for extracting explicit aspects, as unsupervised approaches do not require data annotation and training, which ultimately reduces the cost of the model. Among the supervised approaches, which are second in usage for extracting explicit aspects, techniques like conditional random fields (CRF), CNN, LSTM, and backpropagation are used by most researchers; researchers rely mostly on CRF as it shows good results with a smaller number of features. RNN is a technique in the semi-supervised category for extracting explicit aspects.

As most researchers focus on extracting explicit aspects, there is limited work on the pure extraction of implicit aspects due to their ambiguous nature, and most of the techniques used are trial-based, involving all three approaches. Techniques like co-occurrence, SVM, matrix factorization, and CRF are used for the extraction of implicit aspects. The study also reveals that, when extracting combined aspects, most of the work relies on semi-supervised techniques like lexicon-based methods, RNN, and clustering, as some of the data requires annotation, especially in the case of implicit aspects due to their ambiguous nature.
354
S. Y. Thakur et al.
We also observe that, among the performance evaluation measures for models, accuracy is not the only measure: precision, recall, and the F measure can also validate the performance of a model. This study also examines data domains and language domains, indicating that product review and hotel review data are the most frequently used data domains, while other data domains are less explored or completely ignored. Similarly, in the language domain, English is the most frequently used language, Chinese is the second most used, and other languages like Spanish, Arabic, and many more are unexplored.
9 Future Research Directions

In this systematic review, research papers from 2017 to 2021 on explicit and/or implicit aspect extraction in sentiment analysis were investigated thoroughly. Based on the findings, results, analysis, and discussion, the following future research directions are formulated.

1. Implicit aspect and target sentiment extraction is becoming a promising future research direction because of its ambiguous and semantic nature.
2. Data domains other than product and hotel reviews, such as online educational services, clothes, fashion, healthcare, sports, and travel, which impact the day-to-day life of individuals, remain unexplored.
3. To avoid bias in results, datasets should be unbiased; hence standard datasets for explicit/implicit aspect and sentiment detection are required for precise evaluation.
4. English and Chinese are the most common languages used for explicit and/or implicit aspect and sentiment detection; other languages, cross-domain problems, and multilingual problems are unexplored.
5. Identifying the source and cause of implicit sentiment is required.
6. Hybridization of approaches and techniques can remove the disadvantages of individual approaches and techniques.
7. For optimal performance of a model, the right features should be selected because they play an important role in aspect and sentiment extraction; appropriate feature selection is therefore a future research direction.
10 Conclusion

Through this study, we have presented an in-depth systematic review of explicit and implicit aspect-based sentiment analysis. Nowadays, this is found to be one of the most dynamic research areas in NLP. For this systematic review, we collected 29 research papers on explicit and/or implicit sentiment analysis ranging from 2017 to 2021. Aspect extraction is the most vital stage of sentiment analysis for carrying out sentiment classification, and very few studies have been conducted in this area to identify the various research issues.
This review was conducted to investigate taxonomies, approaches, frameworks, aspect extraction techniques, data domains, language domains, architectures, and performance metrics in explicit and/or implicit aspect-based sentiment analysis. The review shows that there are three major approaches to ABSA, viz., machine learning-based, lexicon-based, and hybrid, and that three categories of techniques, viz., unsupervised, supervised, and semi-supervised, are usually used to identify aspects and corresponding sentiments. The review also shows that most of the studies cover only product review and hotel review data domains, while other data domains that are of interest and impact the day-to-day life of individuals in society are completely ignored. The review further gives details about the most common and rarely used language domains and about the performance metrics for evaluating optimal models. The future research directions listed in the review may be beneficial to upcoming researchers in this area.
References

1. Tubishat M, Idris N, Abushariah MA (2018) Implicit aspect extraction in sentiment analysis: review, taxonomy, opportunities, and open challenges. Inf Process Manage 54(4):545–563
2. Maitama JZ, Idris N, Abdi A, Shuib L, Fauzi R (2020) A systematic review on implicit and explicit aspect extraction in sentiment analysis. IEEE Access 8:194166–194191
3. Huang HH, Wang JJ, Chen HH (2017) Implicit opinion analysis: extraction and polarity labeling. J Am Soc Inf Sci 68(9):2076–2087
4. Dadhaniya B, Dhamecha M (2017) A novel approach for multiple aspect based opinion summarization using implicit features. Int J Eng Res V6(05)
5. Joshi A, Bhattacharyya P, Carman MJ (2017) Automatic sarcasm detection. ACM Comput Surv 50(5):1–22
6. Kama B et al (2017) Analyzing implicit aspects and aspect dependent sentiment polarity for aspect-based sentiment analysis on informal Turkish texts. In: Association for Computing Machinery, MEDES'17, 7–10 Nov 2017, Bangkok, Thailand
7. Tubishat M, Idris N (2018) Explicit and implicit aspect extraction using whale optimization algorithm: a hybrid approach. In: Proceedings of the 2018 international conference on industrial enterprise and system engineering (IcoIESE 2018)
8. Feng J, Cai S, Ma X (2018) Enhanced sentiment labeling and implicit aspect identification by integration of deep convolution neural network and sequential algorithm. Clust Comput 22(S3):5839–5857
9. Mushtaq H, Hayat B, Azkar S, Bin U, Shahzad M, Siddique I (2018) Implicit and explicit knowledge mining of crowdsourced communities: architectural and technology verdicts. Int J Adv Comput Sci Appl 9(1)
10. Hannach HE, Benkhalifa M (2018) WordNet-based implicit aspect sentiment analysis for crime identification from Twitter. Int J Adv Comput Sci Appl 9(12)
11. Irfan R, Khalid O, Khan MUS, Rehman F, Khan AUR, Nawaz R (2019) SocialRec: a context-aware recommendation framework with explicit sentiment analysis. IEEE Access 7:116295–116308
12. Kren M, Kos A, Sedlar U (2019) Modeling opinion of IPTV viewers based on implicit feedback and content metadata. IEEE Access 7:14455–14462
13. Lazhar F (2019) Implicit feature identification for opinion mining. Int J Bus Inf Syst 30(1):13
14. Ray P, Chakrabarti A (2020) A mixed approach of deep learning method and rule-based method to improve aspect level sentiment analysis. Appl Comput Inform 18(1/2):163–178
15. Wei J, Liao J, Yang Z, Wang S, Zhao Q (2020) BiLSTM with multi-polarity orthogonal attention for implicit sentiment analysis. Neurocomputing 383:165–173
16. Ganganwar V, Rajalakshmi R (2019) Implicit aspect extraction for sentiment analysis: a survey of recent approaches. Procedia Comput Sci 165:485–491
17. Xu Q, Zhu L, Dai T, Guo L, Cao S (2019) Non-negative matrix factorization for implicit aspect identification. J Ambient Intell Hum Comput 11(7):2683–2699
18. Xiang C, Ren Y, Ji D (2019) Identifying implicit polarity of events by using an attention-based neural network model. IEEE Access 7:133170–133177
19. El Hajar H, Mohammed B (2019) Using synonym and definition WordNet semantic relations for implicit aspect identification in sentiment analysis. In: NISS19: proceedings of the 2nd international conference on networking, information systems & security, Rabat, Morocco, 27–29 Mar 2019
20. Fang Z, Zhang Q, Tang X, Wang A, Baron C (2020) An implicit opinion analysis model based on feature-based implicit opinion patterns. Artif Intell Rev 53(6):4547–4574
21. Talpur DB, Huang G (2019) Words segmentation-based scheme for implicit aspect identification for sentiments analysis in English text. Int J Adv Comput Sci Appl 10(12)
22. Wang H, Hou M, Li F, Zhang Y (2020) Chinese implicit sentiment analysis based on hierarchical knowledge enhancement and multi-pooling. IEEE Access 8:126051–126065
23. Maylawati DH, Maharani W, Asror I (2020) Implicit aspect extraction in product reviews using FIN algorithm. In: 2020 8th international conference on information and communication technology (ICoICT), pp 1–5
24. Velmurugan T, Hemalatha B (2020) Mining implicit and explicit rules for customer data using natural language processing and Apriori algorithm. Int J Adv Sci Technol 29(9s):3155–3167
25. Liao J, Wang S, Li D (2019) Identification of fact-implied implicit sentiment based on multi-level semantic fused representation. Knowl-Based Syst 165:197–207
26. Xiang C, Zhang J, Ji D (2021) A message-passing multi-task architecture for the implicit event and polarity detection. PLoS ONE 16(3):e0247704
27. Verma K, Davis B (2021) Aspect-based opinion mining and analysis of airline industry based on user-generated reviews. SN Comput Sci 2(4)
28. Van Hee C, De Clercq O, Hoste V (2021) Exploring implicit sentiment evoked by fine-grained news events. In: Proceedings of the 11th workshop on computational approaches to subjectivity, sentiment, and social media analysis, pp 138–148
29. Setiowati Y, Djunaidy A, Siahaan DO (2019) Pair extraction of aspect and implicit opinion word based on its co-occurrence in corpus of Bahasa Indonesia. In: 2019 international seminar on research of information technology and intelligent systems (ISRITI), pp 73–78
Experimenting Different Classification Techniques to Identify Unknown Author Using Natural Language Processing

Priyanshu Naudiyal, Chaitanya Sachdeva, and Sumit Kumar Jindal
Abstract Every author has a unique style of writing and a certain fashion of using particular words, making their literature different from that of others. In today's scenario, where there is a lot of plagiarism and exploitation of articles, it becomes important to identify the author. This experiment aims at detecting authors based on their use of certain words and their writing style. An accuracy of 84% in detecting the actual authors was observed. The behavior of various mathematical models for predicting the authors was also observed, and the impact of stop words and sentence cleaning on the prediction results was analyzed.

Keywords Stylometry · Natural language processing · Naive Bayes · SVM · Decision tree · Lemmatization · Stemming
1 Introduction

Natural language processing (NLP) is a domain of artificial intelligence (AI) dealing with interactions between humans and computers. It gives machines the ability to understand human text. Human language is very complex and ambiguous, with many variations such as sentence variation, tense variation, and even the use of different words that mean the same thing. NLP breaks down these complex variations into a form that is readable by machines. There are many NLP tasks, such as language translation, speech recognition, sentiment analysis, summarizing of stories, word prediction, and many more.

One of the important tasks of NLP is authorship identification: identifying the author from his or her style of writing. In a world where everything is available online, the risk of piracy and plagiarism also increases. Since every author has a unique style, or stylometry, it becomes necessary to identify to whom a piece of writing belongs. Authors are identified based on stylometry [1].

P. Naudiyal · C. Sachdeva · S. K. Jindal (B)
School of Electronics Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_25
Stylometry, the study of linguistic style usually applied to written language, can also be beneficial for various tasks including bibliography, plagiarism detection, and book suggestions. The majority of previous research was based on stylometric features. It is sometimes difficult to decide which features to use for author identification, so two or more stylometric features can be combined to overcome this problem. Stylometric analysis can be classified into two main groups, i.e., supervised and unsupervised methods. Supervised techniques refer to methods that have labels for categorising the data. They have a very wide variety of applications and can be used in investigative fields such as forensics; for example, suspects can be identified based on their messages. Unsupervised techniques, on the other hand, can be used when there is no prior knowledge. Based on the dataset used in this work, the supervised technique fits best, as the dataset contains labels for each author with their text snippets and requires less time to execute. After classifying the dataset as supervised or unsupervised, the next step is to find the best model, i.e., the model that gives the most desired outputs. To find it, one needs to understand, analyse, and process the data. Based on these observations, various algorithms are applied and the most promising one is chosen. During the proposed work, various classification models were used to identify the author accurately. Initially, support vector machine (SVM) was used. SVM is a supervised machine learning model that can be used for both regression and classification, although it is generally used for classification. In SVM, the output is predicted by segregating two classes with a hyperplane (a plane used to separate different classes). The right hyperplane is found by measuring the distance of the plane from the center of both classes [2]. Next, the naive Bayes model was used. Naive Bayes is a classification model based on the assumption that the presence of a particular feature in a class is unrelated to the presence of any other feature; it is easy to build and suitable for large datasets. Finally, the decision tree model was used. It is also a classification model and falls under supervised machine learning. It uses a tree representation to solve problems: similar to a tree, it has leaves and a root, where each leaf represents a label of the dataset and the root node represents the entire dataset. It creates a model that predicts the target variable by making simple decisions based on rules formulated from previous data.
2 Literature Survey Mohsen et al. [3] did identification using deep learning algorithms. A stacked denoising autoencoder was used to bring out document features and then SVM is used for classification. Finally, concluding that chi-square-based feature selection outperforms frequency-based features. It required less time as well as less dimensions.
Yavanoglu [4] used data mining and artificial neural networks (ANN) and proposed an intelligent classification technique for authorship identification. The dataset consisted of around 22,000 Turkish newspaper articles belonging to different genres. The experiment indicated a success rate of 97%, achieved by a Levenberg–Marquardt based classifier. Layton et al. [5] suggested a fresh pre-processing technique for authorship identification. They gathered a corpus constructed from 14,000 Twitter users, which included usernames, dates, and contents. The suggested method, source code authorship profile (SCAP), is divided into three stages. The first stage is the division of the dataset into training data and test data. In the second stage, each author's training data is merged and the top L most frequent n-grams are calculated. In the last stage, the model is tested on the test dataset against the assigned profiles. Experimental results showed an accuracy rate of over 70%. Qian et al. [6] performed authorship identification using an article-level gated recurrent unit (GRU), with a best result of 69.1% accuracy. In their experiment, they applied the GRU at different levels. The first level was a sentence-level GRU; it showed that the larger the hidden size, the higher the accuracy, but the testing accuracy was still too low compared to the training accuracy. The next level was an article-level GRU, which showed better results than the sentence-level GRU. Napoli et al. [7] proposed a contemporary procedure for text classification using Chebyshev polynomials and holomorphic transforms of the coefficient space. This allowed an automatic analysis of the text and provided a set of coefficients called Chebyshev holomorphic projections. Aslantürk et al. [8] detected the author of a given text by using various features derived from the texts and authors. In their research, rough set-based classifiers were applied in a cascading style to achieve author attribution. It was concluded that the cascading classifier yielded good results. Rexha et al. [9] proposed a way to identify authors based on high contrast similarity. In their work they focused on how humans judge different writing styles, and this observation helped them to mimic such behaviour. They conducted two studies simultaneously to determine whether humans can identify authors or not. The first was a quantitative approach involving crowds, while the second was qualitative and executed by the authors of the cited paper. In the first experiment, the decision was made on stylometric and content features, while in the second the participants described what their judgement was based on. Dugar et al. [10] proposed deep neural network based authorship identification and highlighted the importance of hyperparameter tuning for this purpose. The results showed that a proper choice and balance of hyperparameters can improve the already established outputs. Sadman et al. [11] aimed to evaluate the importance of stylometry as a fallback authentication method. They aimed at detecting differences between writings on the same topic and tested whether these differences were enough to identify the authors. They achieved an accuracy of 74% in detecting original authors.
3 Methodology On comparing SVM and naive Bayes, the major difference between the models is that naive Bayes assumes every term to be independent, whereas SVM does not consider them independent and instead looks for a certain number of interactions. Naive Bayes is probabilistic in nature, whereas SVM relies on the geometric configuration. Furthermore, naive Bayes is found to be better for snippets, whereas SVM is good for long paragraphs. So, for the dataset obtained from the Kaggle corpus [12], naive Bayes seems to be the appropriate choice. To further strengthen this claim, the decision tree model was compared with naive Bayes. The decision tree algorithm is a flexible and easy to understand model that builds a classifier using the entropy table. However, the decision tree also has some downsides. It sometimes tends to overfit the data, giving wrong results. It is suitable for sequential data, which is not the kind of data being dealt with here, and it does not work as well with continuous target variables as with categorical ones. Finally, the model chosen was naive Bayes: it gives better results, is easy to implement, and is better at treating large numbers of classes. After deciding on the model, the next step is data processing. Data processing is the collection and manipulation of data to turn it into meaningful information, and it involves several steps. Data collection is the first step; data is taken from different surveys or websites, and it is important that the data be trustworthy. The next stage is data preparation, or data pre-processing, a technique used to transform raw data into a useful and efficient form. It consists of three steps. The first step is data cleaning, in which all redundant information is either removed or fixed; it deals with missing values, incorrect data, and noisy data. There are different ways to deal with missing values. One way is to simply remove the row or column containing the missing value, but this reduces the dataset size, which might give poor results. A second method is to fill the missing value with the mean of that column, which can also sometimes cause trouble. A third method is to use regression to predict the missing values. The choice of method changes from dataset to dataset. After filling these values, the next step is to clean the data, i.e., remove duplicate or irrelevant values. The dataset might also contain noisy data, i.e., meaningless data without any significance, which can originate from wrong data collection or faulty data entry. Noisy data can be handled by the binning method, which works well on sorted datasets: the whole dataset is divided into several sections of equal size, and the data in each segment can be replaced by its mean or boundary values. Other than replacing with the mean or boundary value, regression can be used to smooth the data by predicting the actual value. Apart from regression, clustering can also be used; this approach is similar to data clustering. If outliers are detected,
they are simply replaced by similar values in the cluster. Outliers are data points having anomalies. Since cleaned data may or may not be in the form required by the machine, data transformation is performed. Transforming data into a readable form is called data transformation, and it involves several steps. Normalization is one of them: it refers to scaling the values in the dataset to a common notion and discretion. The next step is attribute selection, which means that new attributes can be built from the existing ones. After attribute selection, discretization of the data is performed, i.e., the raw values of numerical attributes are replaced by interval levels. While working with a huge dataset, analysis becomes tough and complex. To solve this problem, data reduction is needed: the process of removing data without harming the overall dataset. The steps involved in data reduction are data cube aggregation, attribute subset selection, numerosity reduction, and dimensionality reduction. Dimensionality reduction is the most important step; it reduces the size of the data by an encoding mechanism, which can be lossy or lossless. Once the data is processed, extraction of important features comes next. To find the most important features, a correlation matrix is used. A correlation matrix represents the statistical dependence between the variables; it gives an idea of which labels are related to each other and the association among them. The most correlated features are extracted and the rest are discarded. After all this, the data needs to be visualized for further analysis. Data visualization is a field which deals with graphical representation; it presents data in a visual form which makes it easy to communicate with the data. Data visualization is important because it allows trends and patterns to be seen easily, which is especially handy when the data is large. By just looking at the trend or pattern of the data, it is easy to interpret what modifications are needed in the dataset or which model should be used for predicting the author. In the obtained dataset the important features are words, so the words used by the different authors were visualized. Figures 1, 2, and 3 show the frequent words used by the various authors, where the most frequent word is the one with the largest font size. Figure 1 shows that Edgar Ellen uses the words "process" and "afforded" the maximum number of times. Figure 2 shows that H. P. Lovecraft uses the words "never" and "fumbling" most frequently. Figure 3 shows that Mary Shelly uses the words "looked" and "lovely" the maximum number of times. After the data has been processed and visualized, the next step is dividing the data into training and testing sets; for this, the scikit-learn library was used. After dividing the dataset, cleaning the text is the next process, as it makes the text easier for machines to process. Every text has certain expressions that make text processing tough, for example, the use of different tenses, extra spaces, punctuation, and stem words (like ing, ies). All these anomalies make the analysis tough and complex [13]. So, to clean the texts, two processes, namely lemmatization and stemming, were used.
Fig. 1 Frequent words used by Edgar Ellen (author 1)
Fig. 2 Frequent words used by H. P. Lovecraft (author 2)
Fig. 3 Frequent words used by Mary Shelly (author 3)
Lemmatization and stemming are techniques used in NLP for text normalization. In stemming, the stem is the part of the word to which inflectional (changing/adding) affixes such as -ed, -s, -ize, etc., are added, so stemming a word may sometimes result in a completely different word [14]. In lemmatization, the inflected forms of a word are grouped together so that they can be analysed as a single item. Unlike stemming, lemmatization depends on correctly identifying the intended meaning of the word in the text. Lemmatization can be a lengthier and more complex procedure than stemming, but it gives far better results, which further helps in better predictions [15].
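To make the comparison concrete, a minimal sketch of how stemming and lemmatization could be applied with NLTK is given below; the sample words are illustrative and the snippet is not taken from the authors' code.

```python
# Minimal sketch: stemming vs. lemmatization with NLTK (illustrative, not the authors' code).
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)   # resources required by the WordNet lemmatizer
nltk.download("omw-1.4", quiet=True)

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

words = ["studies", "fumbling", "afforded", "lovely"]
print([stemmer.stem(w) for w in words])          # crude suffix stripping, e.g. 'studies' -> 'studi'
print([lemmatizer.lemmatize(w) for w in words])  # dictionary-based, e.g. 'studies' -> 'study'
```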
Fig. 4 Front end of WebApp
Now, after making the dataset ready for analysis, the most suitable model for the given dataset turns out to be naive Bayes, as discussed above. Since there are more than two classes, multinomial naive Bayes was used. The naive Bayes classifier is an algorithm based on Bayes' theorem; it follows the principle that every pair of features being classified is independent of each other. Naive Bayes is a supervised learning algorithm, and despite its simple nature it works well on large and complex datasets [16].

$$P(y \mid x_1, \ldots, x_n) = \frac{P(y)\,P(x_1, \ldots, x_n \mid y)}{P(x_1, \ldots, x_n)} \qquad (1)$$

Naive Bayes is represented by the formula given in Eq. (1), where y is the class, x_1, …, x_n are the features, and P(y) and P(x_1, …, x_n) are their respective probabilities. Since our data is multinomially distributed, multinomial naive Bayes is used; the multinomial naive Bayes model is an important variant of naive Bayes used in NLP [17].

$$\theta_{yi} = \frac{N_{yi} + \alpha}{N_y + \alpha n} \qquad (2)$$

In Eq. (2), α is the smoothing rate and θ_yi represents the probability of feature i occurring in a sample of class y, where N_yi is the count of feature i in class y, N_y is the total count of all features for class y, and n is the number of features. The most efficient naive Bayes model is used, and the result is predicted as well as deployed on a web server for better presentation, as shown in Fig. 4.
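As a hedged illustration of this step, the snippet below trains a multinomial naive Bayes classifier on bag-of-words counts with scikit-learn; the file name and the column names text and author are assumptions about the Kaggle dataset layout, not the authors' exact code.

```python
# Sketch: bag-of-words + multinomial naive Bayes for author prediction.
# "train.csv", "text" and "author" are assumed names, not taken from the paper.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

df = pd.read_csv("train.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["author"], test_size=0.2, random_state=42)

vectorizer = CountVectorizer(stop_words="english")   # word counts with stop words removed
model = MultinomialNB(alpha=1.0)                     # alpha is the smoothing rate of Eq. (2)
model.fit(vectorizer.fit_transform(X_train), y_train)

predictions = model.predict(vectorizer.transform(X_test))
print("accuracy:", accuracy_score(y_test, predictions))
```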
4 Results and Discussions In this study, various models were tried to get the best result. Initially, the decision tree was used.
It is evident from Fig. 5 that the accuracy of the decision tree (47.595%) was very poor. Moving to the next model, SVM was used. As shown in Fig. 6, the accuracy obtained by SVM (74.0048%) is better than that of the decision tree but still not satisfactory. As shown in Fig. 7, the accuracy obtained by naive Bayes (84.141%) is the highest among the three. After performing authorship identification, the results obtained were as follows: from Table 1, it can be concluded that the best model for author identification is naive Bayes. Continuing with naive Bayes, its performance was measured. Figure 8 shows the classification report with the various components of the performance metrics for the naive Bayes algorithm. The average precision, recall, and f1-score, as shown in Fig. 9, came out to be 0.84, showing that this NLP model has been successful in predicting the results with an accuracy of 84.4%. Furthermore, the model was deployed on a website using Streamlit and Python. Streamlit is an open-source framework which helps in turning data scripts into shareable web apps in minutes; it is coded in Python and no front-end is required. One sentence of Mary Shelly was taken to check whether the naive Bayes algorithm predicts the author correctly; as evident from Fig. 10, the prediction is correct. Next, a sentence of Edgar Ellen was tested; as evident from Fig. 11, the prediction is again correct.
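One possible shape of such a Streamlit front end is sketched below, assuming the fitted vectorizer and classifier were pickled beforehand; the file names and widget labels are illustrative and not taken from the deployed app.

```python
# app.py — minimal Streamlit front end (illustrative; file names are assumptions).
import pickle
import streamlit as st

with open("vectorizer.pkl", "rb") as f:
    vectorizer = pickle.load(f)
with open("naive_bayes.pkl", "rb") as f:
    model = pickle.load(f)

st.title("Author Identification")
sentence = st.text_area("Enter a sentence")
if st.button("Predict author"):
    author = model.predict(vectorizer.transform([sentence]))[0]
    st.write(f"Predicted author: {author}")
```

Such a script would be launched with `streamlit run app.py`.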
Fig. 5 Accuracy from decision tree
Fig. 6 Accuracy from SVM
Fig. 7 Accuracy from Naive Bayes
Table 1 Comparison of accuracy

S. No. | Model         | Accuracy (%)
1      | SVM           | 74.004
2      | Naive Bayes   | 84.14
3      | Decision tree | 47.599
Fig. 8 Classification report
Fig. 9 Confusion matrix
Fig. 10 Predicts correctly for Mary Shelly
Fig. 11 Predicts correctly for Edgar Ellen
5 Conclusion This study has proposed an intelligent and efficient solution to the authorship identification problem. The chosen model has proved its accuracy and has provided the desired results. Lemmatization proved to be more effective than stemming and helped in making the data better suited for analysis. Performance metrics such as recall, precision, and f-measure came out to be 0.84, indicating that the model performed very well. Naive Bayes predicted the result with an accuracy of 84%, whereas SVM and the decision tree predicted the results with accuracies of 74% and 47%, respectively, much lower than naive Bayes. The achieved results are therefore promising and can be used for future studies. Data Availability Statements Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
References 1. Gomez Adorno H, Posadas Duran JPF, Sidorove G (2018) Stylometric approach for detecting writing style changes in literary texts. Computacion y Sistemas 2. Zhang Y (2012) Support vector machine classification algorithm and its application. In: 2012 ICICA 3. Mohsen AM, El-Makky NM, Ghanem N (2016) Authorship identification using deep learning. In: 2016 IEEE international conference 4. Yavanoglu O (2016) Intelligent authorship identification with Turkish newspapers. In: 2016 IEEE international conference 5. Layton R, Watters P, Dazeley R, Authorship attribution for Twitter in 140 characters or less. In: 2010 Second cybercrime and trustworthy computing workshop 6. Qian C, He T, Zhang R, Deep learning based authorship identification 7. Napoli C, Tramontana E, Lo Sciuto G (2015) Authorship semantical identification using holomorphic Chebyshev projectors. In: 2015 Asia Pacific conference on computer aided system engineering
8. Aslanturk O, Sezer EA, Sever H, Raghavan V, Application of cascading rough set-based classifiers on authorship attribution. In: 2010 IEEE international conference 9. Rexha A, Kroll M, Ziak H, Kern R (2016) Authorship identification of documents with high content similarity. Springer 10. Dugar TK, Gowtham S, Chakraborty UKr (2019) Hyperparameter tuning for enchanced authorship identification using deep neural networks 11. Sadman N, Gupta KD, Haque MA, Sen S, Poudyal S (2020) Stylometry as a reliable method for fallback authentication. IEEE 12. https://www.kaggle.com/c/spooky-author-identification/data 13. Malik HH, Bhardwaj VS (2014) Automatic training data cleaning for text classification. Columbia University 14. Balakrishna V Stemming and lemmatization: a comparison of retrieval performance 15. Ozturkmenoglu O, Alpkocak A (2012) Comparison of different lemmatization approaches for information retrieval on Turkish text collection. In: 2012 International symposium on innovation in intelligent systems and applications 16. Qin F, Tang XJ, Cheng ZK Application and research of multi_label Naive Bayes classifier 17. Abbas M, Ali K, Memon S, Jamali A (2019) Multinomial Naive Bayes classification model for sentiment analysis. In: 2019 IJCSNS
Categorical Data: Need, Encoding, Selection of Encoding Method and Its Emergence in Machine Learning Models—A Practical Review Study on Heart Disease Prediction Dataset Using Pearson Correlation Nishoak Kosaraju, Sainath Reddy Sankepally, and K. Mallikharjuna Rao Abstract Data can be defined as the joint collection of facts and statistics, which yields meaningful insights on proper analysis. In general, real-world data is a combination of both categorical and numerical data. Many machine learning algorithms do not support categorical data; therefore, to utilize the data efficiently, categorical data should be converted into numerical data without any distortion of the data distribution. The process of transforming categorical data into numerical data is called "categorical encoding". Categorical encoding is one of the crucial steps in data preprocessing, as most machine learning models work better with numerical data. There are many types of categorical encoding techniques; each technique has trade-offs and a notable influence on the outcome of the analysis, so choosing an optimal technique for the situation is a challenging task. The main objective of this paper is to provide insights on choosing a technique that not only converts the categorical data into numerical data but also helps the transformed data become a much better representative of the target variable. In this paper, we analysed and implemented various encoding techniques on the heart disease prediction dataset and were attentive in selecting the best encoding technique that meets the main objectives of the paper. To the best of our knowledge, this is the first paper that focuses completely on the analysis of basic categorical encoding techniques based on their correlation with the target variable.
N. Kosaraju (B) · S. R. Sankepally · K. Mallikharjuna Rao Data Science and Artificial Intelligence, International Institute of Information Technology, Naya Raipur, India e-mail: [email protected] S. R. Sankepally e-mail: [email protected] K. Mallikharjuna Rao e-mail: [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_26
Keywords Encoding · Categorical data · One hot encoding · Label encoding · Frequency encoding · Ordinal encoding · Mean encoding · Heart disease prediction · Pearson correlation
1 Introduction According to the World Health Organization (WHO), cardiovascular diseases (CVD) are among the leading causes of death globally; it estimates that in 2019, 17.9 million people died due to cardiovascular diseases, which is 32% of all global deaths [1]. It is very important to detect cardiovascular diseases as early as possible. A machine learning model that can predict accurately, built using past data, can be very helpful in detecting and managing cardiovascular diseases. The heart disease (failure) prediction dataset available on Kaggle, created by combining different datasets already available in the UCI machine learning repository [2], consists of such past data. The main challenge is that not all attributes are numerical: many vital factors such as chest pain type, resting electrocardiogram results, etc., are categorical, which makes them unsuitable for training many machine learning models. So, as part of data preprocessing, this paper aims to convert the categorical data into numerical data using various categorical encoding techniques. Because such cases are very sensitive, this paper also aims to choose a technique such that the transformed data becomes a much better representative of the target variable, which can eventually increase the accuracy of the machine learning model. The remainder of the paper is organized as follows: Sect. 1 deals with the introduction, Sect. 2 elucidates the background work and literature, Sect. 3 exhibits the experimental setup and results, Sect. 4 provides the conclusion, and the last section details the references.
2 Background Work and Literature With the progress of technology in domains related to artificial intelligence, systems are becoming more autonomous; thus, discovering knowledge from observational sensory inputs is becoming a vital task. Causal networks, which encode a series of cause-effect relations between events, can be used to facilitate the prediction of effects from a given action and to analyse the underlying data generation mechanism. However, in practical scenarios missing data are ubiquitous. Attributes with categorical data are very frequent in real-time datasets, and usually these are of high cardinality [3]. Proper understanding of the data is very important for better analysis, and performing the various data preprocessing steps is essential before moving to the actual analysis of the data. Plenty of researchers have demonstrated the importance of handling attributes with categorical data in various research domains. Most of the machine learning and deep
[Block diagram: Input experimental dataset → Attribute selection {A1, A2, …, An} → Categorical encoding techniques (One Hot, Label, Frequency, Ordinal, Mean) → Encoded dataset]
Fig. 1 Block diagram of encoding categorical data
learning algorithms cannot support data which consists of categorical attributes or features; thus, transforming the categorical data into numerical data without any data distortion is a vital task. The study in [4] demonstrated the effect of categorical encoding on the accuracy of mass real estate valuation. Figure 1 represents the architectural view of the categorical encoding process. In general, encoding categorical data is the second step of data preprocessing, i.e., after handling missing data. The process begins with loading the complete dataset, then attribute selection, followed by choosing and applying an appropriate categorical encoding technique on the selected attribute(s). Attributes without any categorical data are the outcome of this entire process, which is generally called an encoded dataset.
2.1 One Hot Encoding [5] Categorical encoding is essential to convert categorical data into numerical form. One hot encoding is one of the most widely used encoding techniques, but this transformation comes at the expense of memory and computational concerns, which are a direct result of the increase in the cardinality of the data set [6]. In one hot encoding [7, 8], categorical data is transformed into numerical data on the principle of "identifying whether the unique category of that categorical feature (column) is present in that record (row) or not". The integers 0 and 1 are used to represent the absence and presence of the category, respectively [9]. For every unique category in a feature (column) with categorical data, a new column is introduced; in these columns, the integer 1 is used to fill the position that corresponds to that category and all remaining positions are filled with 0 (see the sketch after the advantages and disadvantages below). Advantages:
• Retains the impact of a specific category on the overall prediction without any changes.
• Comes in handy when an ordinal relation between the data does not exist.
Disadvantages:
• Increases the cardinality of the data set.
• Predictions can be biased if the occurrence of a few categories is more common than others.
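A minimal pandas sketch of one hot encoding is shown below; the file name heart.csv and the column name ChestPainType are assumptions about how the dataset is stored, not the authors' code.

```python
# Sketch: one hot encoding of the chest pain type feature with pandas (names are assumed).
import pandas as pd

df = pd.read_csv("heart.csv")
one_hot = pd.get_dummies(df["ChestPainType"], prefix="ChestPainType")  # one 0/1 column per category
df = pd.concat([df.drop(columns=["ChestPainType"]), one_hot], axis=1)
print(one_hot.head())   # columns for ASY, ATA, NAP and TA, as in Table 1
```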
2.2 Label Encoding [10] Label encoding is one of the most common categorical encoding techniques used to convert categorical data into numerical data. In label encoding, a random number is assigned to each unique category in the feature, without adding any extra columns to the data. Advantages:
• Label encoding is very quick and easy.
• It does not increase the cardinality of the data set.
Disadvantages:
• The numbers assigned to each unique category in the feature are chosen arbitrarily.
• Some categories may not occur with frequencies similar to other categories, so this method may not be efficient while working with data which has outliers.
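A hedged scikit-learn sketch of label encoding follows; the file and column names are again assumptions.

```python
# Sketch: label encoding — each category gets an integer code (names are assumed).
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("heart.csv")
df["ChestPainType_label"] = LabelEncoder().fit_transform(df["ChestPainType"])
# LabelEncoder orders categories alphabetically, e.g. ASY -> 0, ATA -> 1, NAP -> 2, TA -> 3
```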
2.3 Ordinal Encoding [8, 11] Ordinal encoding is an extended version of label encoding; it is essentially label encoding applied to ordinal data. In ordinal encoding, the labels are assigned in an appropriate order [11], which may start with 0, 1, 2, …, and so on. Advantages:
• Ordinal encoding is also very quick and easy.
• The labels are given based on an order, so they are not completely random.
Disadvantages:
• The impact of a category on the target variable, relative to another category, can be hugely altered before and after encoding.
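The sketch below shows ordinal encoding with an explicit category order; the order used here mirrors the one reported later in Table 2, and the file and column names are assumptions.

```python
# Sketch: ordinal encoding with an explicit category order (names and order are illustrative).
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.read_csv("heart.csv")
order = [["ATA", "NAP", "TA", "ASY"]]          # ATA -> 0, NAP -> 1, TA -> 2, ASY -> 3
encoder = OrdinalEncoder(categories=order)
df["ChestPainType_ordinal"] = encoder.fit_transform(df[["ChestPainType"]]).ravel()
```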
2.4 Frequency Encoding [12, 13] In this technique, each category of a categorical column is replaced with its frequency count in that column [12, 14]. Advantages:
• Quick and easy to implement.
• Comes in handy when the frequencies are correlated with the target variable.
Disadvantages:
• Less efficient when there is no visible correlation between the target variable and the frequencies of the categories.
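A short pandas sketch of frequency encoding follows, under the same naming assumptions.

```python
# Sketch: frequency encoding — each category is replaced by its count in the column.
import pandas as pd

df = pd.read_csv("heart.csv")
counts = df["ChestPainType"].value_counts()        # e.g. ASY: 496, NAP: 203, ATA: 173, TA: 46
df["ChestPainType_freq"] = df["ChestPainType"].map(counts)
```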
2.5 Mean Encoding [15] In mean encoding [16], the target variable is the basis for the conversion of categorical into numerical data; this technique works on the ratio of occurrence of the positive class in the target variable. The mean-encoded value represents the probability of occurrence of the positive class in the target variable for that unique category [17], so the unique categories are replaced by the mean of the target variable [18]. For instance, assume that X is a category in a feature with eight occurrences, of which four yield a target variable value of 0 and four yield 1; then X will be replaced by ([4(0) + 4(1)]/8), which equals 0.5. Advantages:
• Every step embodies information from the target variable.
• The transformed column (or feature) has better correlation with the target variable.
Disadvantages:
• Possibility of overfitting, as every step embodies information from the target variable.
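A minimal sketch of mean (target) encoding is given below; the column names ChestPainType and HeartDisease are assumptions about the dataset header.

```python
# Sketch: mean (target) encoding — each category is replaced by the mean of the target variable.
import pandas as pd

df = pd.read_csv("heart.csv")
means = df.groupby("ChestPainType")["HeartDisease"].mean()
df["ChestPainType_mean"] = df["ChestPainType"].map(means)
# e.g. ASY -> 0.790, NAP -> 0.355, TA -> 0.435, ATA -> 0.139 (compare Table 2 later in the paper)
```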
3 Experimental Setup and Results In this section, based on various analyses, we try to apply the most suitable technique to encode the categorical features of the real-time data set named heart disease (failure) prediction [19], available on Kaggle, which was created by combining different datasets available in the UCI Machine Learning Repository. The dataset consists of 918 records (rows) and 12 features (columns); of the 12 features, five are categorical. A pictorial representation of the dataset is given in Fig. 2, where yellow represents categorical data and violet denotes non-categorical data. It is evident that Sex, Chest pain Type, Resting ECG, Exercise Angina, and ST_Slope are the only features with categorical data.
Fig. 2 Features of heart disease dataset
Fig. 3 Distribution of data in categorical columns
The graphical representation of the distribution of categorical data in the above-mentioned features is given in Fig. 3. From Fig. 3, we can observe that chest pain type has the maximum number of categories, four, namely Typical Angina (TA), Atypical Angina (ATA), Non-Anginal Pain (NAP), and Asymptomatic (ASY); Sex has two categories, Male (M) and Female (F); Exercise Angina has two categories, Yes and No; RestingECG has three categories, Normal, ST, and LVH; and ST_Slope consists of three categories, namely up, flat, and down. The visualization of the data distribution of each category of each feature is given in Fig. 4.
3.1 Encoding Chest Pain Type Feature Let us begin by encoding the data in the chest pain type column, as it has the maximum number of categories. Moreover, the type of chest pain plays a crucial role in cardiovascular diseases. The four categories in this feature represent four different types of chest pain, namely Typical Angina (TA), Atypical Angina (ATA), Non-Anginal Pain (NAP), and Asymptomatic (ASY).
a) Sex  b) ST_Slope  c) Exercise Angina  d) Resting ECG  e) Chest Pain Type
Fig. 4 Graphical visualizations of data distribution in each feature
While dealing with these types of sensitive data, it is very important to choose an encoding technique which not only focuses on transforming the categorical data into numerical data but also ensures that the data after this transformation becomes a much better representative of the target variable. We begin with one hot encoding; as discussed earlier, this is a very powerful technique which helps in retaining the impact of a specific value on the overall prediction. The chest pain type feature is split into four new columns after performing one hot encoding, as shown in Table 1. The entries with one indicate that the category is present in that row, and all the others are filled with 0. For instance, only the 4th row has chest pain type ASY, so ASY in the 4th row is filled with one and all others are filled with 0. The main disadvantage of this technique is high cardinality: it increases the number of columns by three in our case, which is not advisable as we already have 12 columns. Now let us move on to the remaining techniques; Table 2 shows the new nomenclature of the categories after the various encoding techniques. Next, we visualize the data of the chest pain type feature after applying the above-discussed encoding techniques. Simultaneously, we also compare these with the heart disease column (target variable) to understand which technique yields the better correlation with the target variable. From Fig. 5, we can observe the data distribution of all categories in the chest pain type column after every encoding.
Table 1 First five rows of new columns after one hot encoding
ASY | ATA | NAP | TA
0   | 1   | 0   | 0
0   | 0   | 1   | 0
0   | 1   | 0   | 0
1   | 0   | 0   | 0
0   | 0   | 1   | 0
Table 2 Nomenclature of categories after encoding

Category               | After label encoding | After frequency encoding | After ordinal encoding | After mean encoding
Typical Angina (TA)    | 3                    | 46                       | 2                      | 0.434782
Atypical Angina (ATA)  | 1                    | 173                      | 0                      | 0.138728
Non-Anginal Pain (NAP) | 2                    | 203                      | 1                      | 0.354680
Asymptomatic (ASY)     | 0                    | 496                      | 3                      | 0.790323
For instance, we can see that Typical Angina (TA) in the original data has the labels 3, 46, 2, and 0.434782 after label encoding, frequency encoding, ordinal encoding, and mean encoding, respectively. Table 3 is derived from the observations drawn from Fig. 5; we can observe that there is no change in the data distribution of the features after applying the various categorical encoding techniques on the chest pain type column. Now let us analyse which technique made the transformed data a much better representative of the target variable. This paper uses the Pearson correlation coefficient to identify how closely the target variable and the encoded data are related. The mathematical expression of the Pearson correlation [20] is given in Eq. (1).
a) Before Encoding  b) After Label Encoding  c) After Frequency Encoding  d) After Ordinal Encoding  e) After Mean Encoding
Fig. 5 Graphical visualizations of data distribution before and after encoding
Table 3 Representation of data distribution with respect to the target variable after each encoding

Chest pain type  | Encoding type      | Label name after encoding | Data distribution (w.r.t target variable)
Typical Angina   | Original           | TA                        | 26 entries with heart disease as 0; 20 entries with heart disease as 1
Typical Angina   | Label encoding     | 3                         | 26 entries with heart disease as 0; 20 entries with heart disease as 1
Typical Angina   | Frequency encoding | 46                        | 26 entries with heart disease as 0; 20 entries with heart disease as 1
Typical Angina   | Ordinal encoding   | 2                         | 26 entries with heart disease as 0; 20 entries with heart disease as 1
Typical Angina   | Mean encoding      | 0.433782                  | 26 entries with heart disease as 0; 20 entries with heart disease as 1
Atypical Angina  | Original           | ATA                       | 149 entries with heart disease as 0; 24 entries with heart disease as 1
Atypical Angina  | Label encoding     | 1                         | 149 entries with heart disease as 0; 24 entries with heart disease as 1
Atypical Angina  | Frequency encoding | 173                       | 149 entries with heart disease as 0; 24 entries with heart disease as 1
Atypical Angina  | Ordinal encoding   | 0                         | 149 entries with heart disease as 0; 24 entries with heart disease as 1
Atypical Angina  | Mean encoding      | 0.138728                  | 149 entries with heart disease as 0; 24 entries with heart disease as 1
Non-Anginal Pain | Original           | NAP                       | 131 entries with heart disease as 0; 72 entries with heart disease as 1
Non-Anginal Pain | Label encoding     | 2                         | 131 entries with heart disease as 0; 72 entries with heart disease as 1
Non-Anginal Pain | Frequency encoding | 203                       | 131 entries with heart disease as 0; 72 entries with heart disease as 1
Non-Anginal Pain | Ordinal encoding   | 1                         | 131 entries with heart disease as 0; 72 entries with heart disease as 1
Non-Anginal Pain | Mean encoding      | 0.354680                  | 131 entries with heart disease as 0; 72 entries with heart disease as 1
Asymptomatic     | Original           | ASY                       | 104 entries with heart disease as 0; 392 entries with heart disease as 1
Asymptomatic     | Label encoding     | 0                         | 104 entries with heart disease as 0; 392 entries with heart disease as 1
Asymptomatic     | Frequency encoding | 496                       | 104 entries with heart disease as 0; 392 entries with heart disease as 1
Asymptomatic     | Ordinal encoding   | 3                         | 104 entries with heart disease as 0; 392 entries with heart disease as 1
Asymptomatic     | Mean encoding      | 0.790323                  | 104 entries with heart disease as 0; 392 entries with heart disease as 1
$$r = \frac{\sum_i (\alpha_i - \bar{\alpha})(\beta_i - \bar{\beta})}{\sqrt{\sum_i (\alpha_i - \bar{\alpha})^2 \sum_i (\beta_i - \bar{\beta})^2}} \qquad (1)$$

Here,
$r$ = Pearson correlation coefficient
$\alpha_i$ = values of the first attribute chosen from the set X
$\beta_i$ = values of the second attribute chosen from the set X
$\bar{\alpha}$ = mean of the values in the first attribute
$\bar{\beta}$ = mean of the values in the second attribute
where X = {Heart disease, chest pain type_label, chest pain type_freq, chest pain type_ordinal, chest pain type_mean}.
It is very clear from the above analysis that, while encoding the categorical data, the data distribution is not distorted at all. The second point we can conclude from Fig. 6 is that the magnitude of the Pearson correlation between the target variable heart disease and the chest pain type is maximum after mean encoding (0.540), followed by ordinal encoding (0.537) in second place. So, we can conclude that the data after mean encoding is a much better representative of the target variable, and we proceed by applying mean encoding as it meets our objectives. In a similar way, we can analyse and choose the most suitable technique to encode all the remaining categorical features.
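A hedged sketch of this comparison in pandas is shown below; it assumes a data frame holding the encoded columns and the target under the names used in the earlier sketches, which are not taken from the authors' code.

```python
# Sketch: Pearson correlation of each encoded chest pain column with the target variable.
# File and column names are assumptions about how the encoded frame was built.
import pandas as pd

df = pd.read_csv("heart_encoded.csv")   # frame containing the encoded columns (assumed file)
encoded_cols = ["ChestPainType_label", "ChestPainType_freq",
                "ChestPainType_ordinal", "ChestPainType_mean"]
corr = df[encoded_cols + ["HeartDisease"]].corr(method="pearson")
print(corr["HeartDisease"].abs().sort_values(ascending=False))   # mean encoding ranks highest in the paper
```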
Fig. 6 Correlation matrix
4 Conclusion Categorical encoding is one of the most important tasks in the data preprocessing domain. Many techniques are available, each with its own trade-offs, and choosing the technique which best suits a given attribute is one of the main challenges. This paper provided a complete review of the principles, underlying mechanisms, advantages, and disadvantages of five widely known encoding techniques. This paper is an attempt to show that categorical encoding is not only about the conversion of categorical data to numerical data, but also about making the transformed data a better representative of the target variable. In addition, this paper provides a practical review of the implementation of categorical encoding techniques on a health care data set, namely heart failure prediction. The experimental results show how the transformed data can become a better representative of the target variable if an appropriate technique is used. Although the mechanisms presented in this paper try to make better classifications, the point to be noted is that when features are encoded based on the target variable and embody information from the target variable, there is a very high probability of overfitting the data, and the attribute data may become biased in some circumstances; therefore, using regularisation in techniques such as mean encoding can be a future scope of research. Acknowledgements The authors thank the management of Dr. SPM IIIT Naya Raipur, the Director and Vice chancellor, Dean Academics, Dean Research and Development, and the financial approval committee for providing the financial support. Author's Contributions Categorical data plays a vital role in the construction of machine learning models, and machine learning models require numerical data to process the available data. The authors studied and summarised the significance of encoding categorical data using heart disease prediction.
References 1. https://www.who.int/en/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds) 2. https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/ 3. Dahouda MK, Joe I (2021) A deep-learned embedding technique for categorical features encoding. IEEE Access 9:114381–114391. https://doi.org/10.1109/ACCESS.2021.3104357 4. Gnat S (2021) Impact of categorical variables encoding on property mass valuation. Procedia Comput Sci 192:3542–3550 5. Von Eye A, Clogg CC (eds) (1996) Categorical variables in developmental research: methods of analysis. Elsevier 6. Lopez-Arevalo I, Aldana-Bobadilla E, Molina-Villegas A, Galeana-Zapién H, Muñiz-Sanchez V, Gausin-Valle S (2020) A memory-efficient encoding method for processing mixed-type data on machine learning. Entropy 22(12):1391. https://doi.org/10.3390/e22121391 7. Alkharusi H (2012) Categorical variables in regression analysis: a comparison of dummy and effect coding. Int J Educ 4(2):202–210 8. Hancock JT, Khoshgoftaar TM (2020) Survey on categorical data for neural networks. J Big Data 7:28. https://doi.org/10.1186/s40537-020-00305-w
382
N. Kosaraju et al.
9. Potdar K, Pardawala TS, Pai CD (2017) A comparative study of categorical variable encoding techniques for neural network classifiers. Int J Comput Appl 175(4):7–9 10. Liu C, Yang L, Qu J (2021) A structured data preprocessing method based on hybrid encoding. J Phys: Conf Ser 1738(1) 11. Seveso A et al (2020) Ordinal labels in machine learning: a user-centered approach to improve data validity in medical settings. BMC Med Inf Decis Mak 20(5):1–14 12. Baldissera F (1984) Impulse frequency encoding of the dynamic aspects of excitation. Arch Ital Biol 122:43–58 13. Greene RL, Stillwell AM (1995) Effects of encoding variability and spacing on frequency discrimination. J Mem Lang 34(4):468–476 14. Jian S et al (2018) Cure: flexible categorical data representation by hierarchical coupling learning. IEEE Trans Knowl Data Eng 31(5):853–866 15. Yu N, Li Z, Yu Z (2018) Survey on encoding schemes for genomic data representation and feature learning—From signal processing to machine learning. Big Data Min. Anal. 1(3):191– 210 16. Zheng A, Casari A (2018) Feature engineering for machine learning: principles and techniques for data scientists. “O’Reilly Media, Inc.” 17. Kunanbayev K, Temirbek I, Zollanvari A (2021) Complex encoding. In: 2021 International joint conference on neural networks (IJCNN). IEEE 18. Jo T (2021) Data encoding. In: Machine learning foundations. Springer, Cham, pp 47–68 19. Fedesoriano (2021) Heart failure prediction dataset. https://www.kaggle.com/fedesoriano/ heart-failure-prediction, Sept 2021 20. Pearson’s Correlation Coefficient (2008). In: Kirch W (eds) Encyclopedia of public health. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-5614-7_2569
Intelligent Computational Model for Accurate and Early Diagnosis of Heart Failure Isaac Kofi Nti, Owusu Nyarko-Boateng, Adebayo Felix Adekoya, Patrick Kwabena Mensah, Mighty Abra Ayidzoe, Godfred Kusi Fosu, Henrietta Adjei Pokuaa, and R. Arjun
Abstract Heart disease accounts for a sizable portion of worldwide mortality and morbidity. Using clinical data analytics to predict heart disease survival is a challenging task. However, the inception of data mining tools and platforms helps transform large amounts of unstructured data generated by the healthcare industry into relevant information that allows informed decision-making. Numerous studies have proven that appropriate feature engineering techniques aimed at essential characteristics are vital for enhancing the performance of machine learning models. This study aims to identify relevant characteristics and data mining approaches that can significantly improve the prediction of mortality from heart failure. An intelligent computational predictive model based on machine learning has been introduced to identify and diagnose heart failure. Numerous performance indicators, including accuracy, recall, F1-score, precision, AUC, and Cohen's Kappa statistic, are utilised to assess the proposed model's usefulness and strength. In the overall performance examination, the suggested methodology outperformed various state-of-the-art approaches in predicting patients' deaths owing to heart failure. Keywords Machine learning · Health informatics · Heart failure · Tree-based algorithms · Heart disease
I. K. Nti (B) · O. Nyarko-Boateng · A. F. Adekoya · P. K. Mensah · M. Abra Ayidzoe Department of Computer Science and Informatics, University of Energy and Natural Resources, Sunyani, Ghana e-mail: [email protected] G. K. Fosu · H. Adjei Pokuaa Department of Computer Science, Sunyani Technical University, Sunyani, Ghana R. Arjun School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_27
1 Introduction Heart disease (or cardiovascular disease) is the principal cause of mortality in today's world [1, 2]. The World Health Organization (WHO) revealed that about 17.9 million people die each year from heart disease [3]. Heart illness impairs blood vessel function and can result in coronary artery infections, which weaken the patient's body [4, 5]. Acute heart failure (AHF) expenditures in the USA are approaching $39 billion per year and are anticipated to almost quadruple by 2030. Likewise, cardiovascular diseases (CVDs) are increasing in prevalence in Ghana. Globalisation undoubtedly contributes to this trend by bringing about substantial behavioural changes throughout the country. Multifaceted measures are required to handle this growing load. To win the war against CVDs in Ghana, concerted efforts from all stakeholders are required, including maximising the benefits of globalisation in terms of public health [6]. Because of the high prevalence of cardiovascular diseases as a cause of death worldwide, researchers have conducted several studies on predicting heart disease [7, 8]. Thus, predicting heart disease is more critical than ever to lower the death rate. People's mortality rates rise due to a failure to detect cardiac disease early. Machine learning (ML), a subset of artificial intelligence, enables risk scoring system prediction. Based on data in an electronic health record (EHR), machine learning algorithms (MLAs) can produce clinically meaningful predictions regarding the occurrence of future events [2, 9]. Numerous traditional heart failure mortality risk assessments are based on "one-size-fits-all" scoring procedures. In addition, they are typically created from populations that do not adequately represent the variety of patients observed in real-world clinical practice [10]. In comparison, MLAs give personalised risk evaluations for patients and may be trained using data from various real-world populations [3, 11, 12]. Furthermore, MLAs can make predictions based on enormous volumes of data accessible in the EHR in ways that are impossible with tools requiring manual scoring, by including dynamic changes within variables and complicated interactions between variables [1, 13]. In addition, MLAs have been demonstrated to outperform established mortality prediction methods in other areas of cardiovascular medicine, indicating their potential value for predicting death in patients with AHF. Hence, this study adopts an intelligent and low-complexity algorithm for feature selection, and different unique tree-based ensemble classifiers, to predict mortality by heart failure. This research can enhance the healthcare system, particularly in low-income countries such as Ghana, where the doctor–patient ratio is low, and serve as a valuable tool for healthcare practitioners in predicting heart failure survival. It will also help doctors focus on the important risk factors that determine whether a person with heart failure survives. This work adds to the body of knowledge in the following areas: 1. Examine the primary risk variables (significant characteristics) associated with MLA performance to predict heart failure correctly. 2. Develop an excellent decision support system capable of accurately diagnosing heart failure patients' survival.
3. Examine the performance of several MLAs, using adaptive synthetic oversampling (ADASYN), in predicting cardiac patient survival. The remainder of this research is divided into the following sections: Sect. 2 summarises pertinent research, the methods for the research are presented in Sect. 3, Sect. 4 contains the findings and explanation of our experiments, and finally, Sect. 5 summarises the findings of the investigation.
2 Related Works Several studies have been carried out to help minimise the mortality rate of people across the globe due to heart failure. However, the literature has shown that techniques based on ML are more efficient than conventional techniques. This section presents a few pertinent studies that used ML to predict heart failure. To get an accurate prognosis of a patient's survival, [14] examined the performance of nine ML models. Among them are the decision tree (DT), the stochastic gradient classifier (SGD), the adaptive boosting classifier (AdaBoost), the random forest (RF), the logistic regression (LR), the gradient boosting classifier (GBM), the support vector machine (SVM), the extra tree classifier (ETC), and the Gaussian Naive Bayes classifier (GNB). To address the issue of class imbalance, the study used the Synthetic Minority Oversampling Technique (SMOTE). Their empirical analyses showed that the ETC outperformed the other models and achieved an accuracy of 92.62% with SMOTE in predicting heart failure patients' survival. Even though they had much success, the authors think that better feature selection methods can improve the accuracy of ML models for predicting heart failure [14]. Likewise, Deepika and Balaji proposed a novel ML predictive model based on Multi-Layer Perceptron for Enhanced Brownian Motion based on Dragonfly Algorithm (MLP-EBMDA) for predicting heart disease [8]. The proposed MLP-EBMDA obtained a prediction accuracy of 94.28%. Eighteen ML models were compared to obtain the most accurate prediction of heart failure [5]. The experimental results showed that SVM, extra trees, XGBoost, and CatBoost were the best models. However, the researcher says that when two or more models are used together, they can improve the accuracy of predictions in this field. Radhachandran et al. [15] proposed an ML-based predictive model using the Emergency Heart Failure Mortality Risk Grade (EHMRG), DT, and LR. Their predictive model recorded an area under the receiver operating characteristic curve (AUROC) of 0.84 and a sensitivity of 68.4%. Similarly, a comparison of the performance of DT, SVM, LR, and artificial neural networks (ANN) in predicting heart failure was carried out [1]. The study reported that DT outperformed LR, SVM, and ANN. The author concludes that ANN is not as competitive as the DT or SVM based on the observed results. Giridhar et al. [12] proposed an RF-based predictive model and adopted SMOTE for data imbalance treatment. The paper reported a prediction accuracy of 90%. The study concluded that further improvement is possible with appropriate and
enhanced feature selection techniques. Likewise, four (4) MLAs, including RF, LR, ANN, and GBM, were compared to an established robust algorithm to predict the first cardiovascular event over ten years [13]. The authors conclude that ML dramatically enhances the accuracy of heart risk prediction, allowing for identifying more patients who may benefit from preventive therapy while avoiding the treatment of those who do not require it. Seven (7) different ML classifiers (i.e. DT, K-NN, RF, rotation forest (ROT), NB, SVM, LR, and Bayes Network (BN)) were applied to the reduced feature subset to address chronic heart failure [16]. The authors reported accuracy (91.23%), specificity (89.62%), and sensitivity (93.83%). Ten (10) MLAs were compared to select the best model that improved accuracy [17]. The study outcome shows that efficient feature selection leads to an improvement in MLA performance. Chicco and Jurman proposed a heart failure predictive framework based on RF, DT, GBM, ANN, Naïve Bayes, SVM, and linear regression [18]. The study reported that RF outperformed all other models with an accuracy of 74% with all features and 58.5% with the two most significant features.
3 Methodology Figure 1 shows the data pipeline for this study. It consists of three (3) phases, i.e. (i) study dataset, data pre-processing, feature engineering, and data partitioning, (ii) dataset modelling, and (iii) model evaluation. The following section explains the steps involved in each of the phases.
Fig. 1 Study data pipeline (Heart dataset → Data pre-processing → Feature engineering → Data partition → Train set / Test set → ML models → Model's evaluation)

3.1 Dataset Description The Heart Failure Clinical Dataset (HFCD) [18] was downloaded from Kaggle. Our dataset contained a total of two hundred and ninety-nine (299) records with thirteen
independent (x) features and one dependent (y) feature (see Table 1). Of the 299 observations, one hundred and ninety-four (194) were male, while one hundred and five (105) were female. The dependent variable (target class) had two values: one (1) represents deceased and zero (0) stands for alive. The dataset was pre-processed to clean it of missing data and then normalised using the max–min function defined in Eq. (1). In machine learning, the quality of the input variables affects the performance of the model and the computational time. Therefore, the study adopted a random forest to perform feature engineering and identify the most significant features among the 13 input features; the aim is to rank the features from the most significant to the least significant. The dataset was balanced using adaptive synthetic oversampling (ADASYN) to attain equal class labels (1 and 0). ADASYN adopts a weighted distribution for the different minority class instances (0) according to their level of learning difficulty: more synthetic data is created for minority class instances that are more difficult to learn than for minority class instances that are simpler to learn [19]. Thus, the ADASYN approach improves learning about data distributions in two ways: (1) by reducing the bias caused by class imbalance and (2) by adaptively shifting the classification decision boundary towards the more difficult instances. We partitioned the clean data into a train set (70%) and a test set (30%); this ratio was adopted because it is believed to help prevent overfitting [20].

$$b' = \frac{b - b_{\min}}{b_{\max} - b_{\min}} \qquad (1)$$
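A minimal sketch of this pre-processing pipeline is given below; it assumes the imbalanced-learn package for ADASYN, and the file and column names are assumptions about the Kaggle copy of the dataset rather than the authors' code.

```python
# Sketch: max–min scaling (Eq. 1), ADASYN balancing and a 70/30 split (names are assumed).
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import ADASYN   # pip install imbalanced-learn

df = pd.read_csv("heart_failure_clinical_records_dataset.csv")
X = df.drop(columns=["DEATH_EVENT"])
y = df["DEATH_EVENT"]

X_scaled = MinMaxScaler().fit_transform(X)                          # (b - b_min) / (b_max - b_min)
X_bal, y_bal = ADASYN(random_state=42).fit_resample(X_scaled, y)    # balance the two class labels
X_train, X_test, y_train, y_test = train_test_split(
    X_bal, y_bal, test_size=0.30, random_state=42, stratify=y_bal)
```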
3.2 Data Modelling Several machine learning techniques could be used in this study. However, we selected six tree-based ensemble algorithms that have reported better accuracy in other areas such as health [14], energy, finance [20], education [21], network security [22], and more. These MLAs include random forest (RF), extra tree classifier (ET), adaptive boosting classifier (AdaBoost), light gradient boosting machine (LightGBM), CatBoost, and gradient boosting classifier (GBM). Random Forest (RF) is a very effective ensemble learning technique based on decision trees (DTs). RF has been widely employed in various applications and has demonstrated good performance. It extracts several samples from the original sample using the bootstrap resampling approach, models a decision tree for each bootstrap sample, and then averages the predictions of the multiple decision trees [23, 24]. Extra Tree Classifier (ET) This class provides a meta-estimator that fits many randomised DTs to several resamples of the dataset and uses averaging to increase predictive accuracy and avoid overfitting. ET is a variant of the popular random forest method and frequently outperforms it [25]. Adaptive boosting classifier (AdaBoost) This classifier is a meta-estimator that begins by fitting a classifier to the initial dataset and then fitting further copies of the classifier
Table 1 Description of study dataset features

Features | Description | Range
Age | Patient's age (measured in years) | [40–95]
Anaemia | Red blood cell or haemoglobin deficiency | [1, 0]
Creatinine phosphokinase (CPK) | CPK enzyme concentration in the blood (measured in mcg/L) | [23, …, 7861]
High blood pressure | Indicates the patient's hypertension status | [1, 0]
Diabetes | Indicates the patient's diabetes status | [1, 0]
Sex | Gender of the patient (man or woman) | [1, 0]
Smoking | Indicates whether the patient smokes or not | [1, 0]
Ejection fraction | The proportion (%) of blood which leaves the heart during each contraction | [14, …, 80]
Serum sodium | Sodium level in the blood (measured in mEq/L) | [114, …, 148]
Serum creatinine | Creatinine level in the blood (measured in mg/dL) | [0.50, …, 9.40]
Platelets | Platelet level in the blood (measured in kiloplatelets/mL) | [25.01, …, 850.00]
Time | Follow-up period in days | [4, …, 285]
(Output) death event (y) | Whether the patient died during the follow-up period | [1, 0]

mcg/L—micrograms per litre; mL—microlitre; mEq/L—milliequivalents per litre
Adaptive Boosting Classifier (AdaBoost) is a meta-estimator that begins by fitting a classifier to the initial dataset and then fits further copies of the classifier to the same dataset, but with the weights of erroneously classified examples adjusted so that succeeding classifiers focus on the more difficult cases. AdaBoost works by giving a higher weight to instances that are hard to categorise and a lower weight to those that are already well classified [26].

Extreme Gradient Boosting (XGBoost) Due to its great efficiency and considerable versatility, extreme gradient boosting (XGBoost), an enhanced supervised method within the gradient boosting framework, has been widely recognised in Kaggle machine learning contests. For instance, XGBoost's loss function incorporates a regularisation term into the objective function, which helps smooth the final learned weights and prevents overfitting [23, 24].

CatBoost and Gradient Boosting Classifier (GBM) CatBoost is a recently open-sourced gradient boosting toolkit that handles categorical features effectively and beats previously available gradient boosting implementations on a range of prominent publicly available datasets in terms of quality. Gradient boosting is an effective machine learning approach that produces state-of-the-art results for various practical problems [27].
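For illustration, the sketch below instantiates the six ensembles with their usual Python APIs (scikit-learn, LightGBM, CatBoost) and fits them on the split from the earlier pre-processing sketch; the parameter values shown only loosely follow Table 7 in the Appendix and are not the tuned configurations.

```python
# Illustrative instantiation of the six tree-based ensembles compared in this study.
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              AdaBoostClassifier, GradientBoostingClassifier)
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

models = {
    "RF": RandomForestClassifier(n_estimators=150, random_state=42),
    "ET": ExtraTreesClassifier(n_estimators=100, random_state=42),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, learning_rate=1.0),
    "GBM": GradientBoostingClassifier(n_estimators=150, learning_rate=0.1),
    "LightGBM": LGBMClassifier(n_estimators=150, learning_rate=0.1),
    "CatBoost": CatBoostClassifier(verbose=0),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)   # X_train/y_train from the pre-processing sketch
```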
3.3 Model Evaluation

We evaluate the performance of our predictive models using a combination of five well-known assessment indicators for classification performance (see Table 2): (i) precision, (ii) accuracy, (iii) recall, (iv) F1-score, and (v) area under the curve (AUC).
4 Results and Discussions

All tests were carried out in a Python environment, using various libraries (such as Scikit-learn, Pandas, Matplotlib, and others) on an HP laptop (Spectre x360) equipped with an Intel® Core™ i7 CPU and 16.0 GB of RAM. The GridSearch technique was used to determine the optimal parameters for each model; Table 7 in the Appendix gives the models' hyper-parameters. In the following sections, we present the results obtained. Firstly, we present some basic visualisation and statistical analysis of the study data. Secondly, we show how well the models performed without feature engineering (all 13 input features) and with feature engineering (the most important features only).

Table 2 Definition and explanation of evaluation metrics

S/N | Metric | Description
1. | Accuracy (ACC) = (TP + TN)/(TP + FP + TN + FN) | It expresses the percentage of correct forecasts to total predictions. TP = true positive, TN = true negative, FN = false negative, FP = false positive
2. | Precision = TP/(TP + FP) | It quantifies a classification model's ability to classify the positive class. A value closer to one (1) is better
3. | Recall = TP/(TP + FN) | It demonstrates how well a classification model performs when the actual outcome is positive. A recall value closer to one (1) is better
4. | F1-Score (FS) = (2 × P × R)/(P + R) | It shows the balance between precision (P) and recall (R), i.e. the harmonic mean of precision (P) and sensitivity (R)
5. | AUC | The "best" classifier will have AUC values approaching 1, with 0.5 being comparable to random guessing
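A minimal sketch of how the five metrics in Table 2 can be computed with scikit-learn is given below; it assumes a fitted classifier `clf` and the held-out test split from the earlier sketches.

```python
# Compute the Table 2 metrics for one fitted classifier.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]   # probability scores needed for AUC

print("ACC :", accuracy_score(y_test, y_pred))
print("Prec:", precision_score(y_test, y_pred))
print("Rec :", recall_score(y_test, y_pred))
print("F1  :", f1_score(y_test, y_pred))
print("AUC :", roc_auc_score(y_test, y_prob))
```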
4.1 Dataset Visualisation and Statistical Analysis

Data visualisation helps reveal hidden patterns within a dataset; visualising the attributes makes it possible to obtain more qualitative information about the data. Figure 2 shows the heat map of the correlations among the features of the study dataset. Correlations close to 1.0 or −1.0 indicate that two variables are strongly positively or negatively related, while correlations around zero indicate weak relationships. From Fig. 2, serum_creatinine and age are the features most positively correlated with the death event, at 0.29 and 0.25, respectively, while time and ejection_fraction are the most negatively correlated, at −0.53 and −0.27, respectively. Figure 3 shows the relationship between mortality and age. It was observed that the death rate is higher in old age. The average age of survivors is about 60 years for both sexes; however, the average age of deceased male patients is 65, slightly older than that of deceased female patients, at 60. Figure 4 shows the association between ejection_fraction (a) and serum_creatinine (b) and death. It can be seen in Fig. 4a that a considerable difference in ejection fraction exists between survivors and deceased patients: patients with a smaller quantity of blood pumped out of the heart are more likely to die, whereas those with an ejection fraction greater than 35 are more likely to survive. On the other hand, Fig. 4b shows no significant difference in serum creatinine levels between the two patient groups.
Fig. 2 Heat map distribution
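A correlation heat map like the one in Fig. 2 can be reproduced in a few lines; the sketch below assumes the pandas DataFrame `df` loaded in the pre-processing sketch and uses seaborn purely as one convenient plotting option.

```python
# Sketch: Pearson correlation heat map of all dataset features.
import seaborn as sns
import matplotlib.pyplot as plt

corr = df.corr()                                   # Pearson correlation matrix
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Feature correlation heat map")
plt.tight_layout()
plt.show()
```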
Fig. 3 Box plot of age against death
Fig. 4 Plot of ejection_fraction/serum_creatinine and death
4.2 Feature Selection

When developing a machine learning model, feature importance is determined by assigning a score to each feature to represent its relative significance. These scores shed light on the dataset, indicating which aspects are most or least important. In this study, the RF was used for this purpose; the relative significance score of each feature is obtained from the fitted RF model through its feature importance attribute. Figure 5 shows the features ranked from least important (top) to most important. It can be seen that time is the most critical factor in predicting mortality by heart failure, followed by ejection fraction, serum sodium, age, and serum creatinine. This is useful in a hospital setting, as clinicians can estimate how long a patient is likely to survive by monitoring a small number of important indicators.
Fig. 5 Feature importance ranking
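The ranking in Fig. 5 comes from the random forest's built-in importance scores; a minimal sketch, assuming the fitted RF model and the original feature names from the earlier sketches, is shown below.

```python
# Sketch: rank features by random-forest importance (as in Fig. 5).
import pandas as pd

rf = models["RF"]
importance = pd.Series(rf.feature_importances_, index=X.columns)
print(importance.sort_values(ascending=False))
```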
4.3 Models' Performance

Table 3 gives the performance of the machine learning models on all 13 features without ADASYN (imbalanced data). The random forest algorithm recorded the best performance, i.e. accuracy (81.38%), AUC (88.54%), recall (67.22%), precision (80.02%), F1-score (70.34%), and Kappa (57.5%), compared with CatBoost, LightGBM, AdaBoost, GBM, and ET. CatBoost and LightGBM were the second- and third-best models, with accuracies of 81.38% and 80.97% and F1-scores of 70.01% and 71.52%, respectively. The outcome shows that RF outperformed the five other tree-based ensemble models on the 13 features. ET was the worst-performing model for predicting mortality by heart failure, with an accuracy of 75.98% and an F1-score of 59.46%. The recall and precision values achieved by the models, compared with their accuracy values in Table 3, suggest that the models were partially biased towards the majority class.

Table 3 Models' performance on all features with the imbalanced dataset
Models | ACC | AUC | Recall | Prec | F1 | Kappa
RF | 0.8138 | 0.8854 | 0.6722 | 0.8002 | 0.7034 | 0.575
CatBoost | 0.8138 | 0.8809 | 0.6722 | 0.783 | 0.7001 | 0.5721
LightGBM | 0.8097 | 0.8503 | 0.7042 | 0.7686 | 0.7152 | 0.5748
AdaBoost | 0.797 | 0.8126 | 0.6556 | 0.76 | 0.6846 | 0.5389
GBM | 0.793 | 0.8402 | 0.6694 | 0.7449 | 0.6896 | 0.5377
ET | 0.7598 | 0.8595 | 0.5389 | 0.7494 | 0.5946 | 0.4386
Table 4 gives the performance of the machine learning models on all 13 features with ADASYN (balanced data). ADASYN is an improvement of the SMOTE algorithm, which has proven robust in the literature; we adopted it to balance the study dataset by adding synthetic data to the minority class. A careful comparison of Table 4 (balanced data) with Table 3 (imbalanced data) shows that the models' performance increased significantly on all evaluation metrics when ADASYN was introduced to balance the study data. The performance of RF, CatBoost, and GBM increased by 11% with ADASYN, while the performance of LightGBM and AdaBoost increased by 12% and 6%, respectively. Interestingly, ET, which was the worst-performing model without ADASYN (see Table 3), obtained an accuracy of 90.24% (see Table 4), an increase of 16% compared with Table 3. This large increase suggests that ET is very sensitive to the class balance of the target variable.

Table 5 gives the performance of the models using the eight most significant features and ADASYN. These features include time, ejection fraction, serum sodium, serum creatinine, age, platelets, and creatinine phosphokinase. Compared with the results obtained using all features (see Fig. 6), the significant features yield a marginal increase in prediction accuracy for CatBoost (0.3%), RF (0.7%), and ET (1.3%). However, we observed a decrease in the accuracy of LightGBM (1%) and GBM (1.4%). AdaBoost's accuracy remained unchanged between the 13 features and the eight features, as given in Table 5. The computational complexity of the models with the eight principal features is also reduced compared with all 13 features: for example, AdaBoost performed equally well in both settings, but its training time was about 38% lower with the eight most important features (Table 5) than with all 13 features (Table 4).

Table 4 Models' performance on all features with a balanced dataset
Models | Bal. ACC | AUC | Recall | Prec. | F1 | Kappa | TT (sec)
CatBoost | 0.915 | 0.966 | 0.946 | 0.901 | 0.921 | 0.829 | 2.715
RF | 0.912 | 0.981 | 0.952 | 0.894 | 0.919 | 0.822 | 0.795
ET | 0.902 | 0.981 | 0.916 | 0.901 | 0.904 | 0.805 | 0.834
LightGBM | 0.921 | 0.960 | 0.952 | 0.907 | 0.926 | 0.841 | 1.045
AdaBoost | 0.851 | 0.925 | 0.862 | 0.859 | 0.854 | 0.701 | 0.205
GBM | 0.890 | 0.940 | 0.946 | 0.866 | 0.899 | 0.780 | 0.192
Table 5 Models' performance using the eight most significant features with a balanced dataset

Models | ACC | AUC | Rec | Prec. | F1 | Kappa | TT (sec)
CatBoost | 0.918 | 0.964 | 0.952 | 0.902 | 0.924 | 0.835 | 1.671
RF | 0.918 | 0.980 | 0.964 | 0.895 | 0.925 | 0.834 | 0.489
ET | 0.915 | 0.983 | 0.940 | 0.903 | 0.919 | 0.829 | 0.513
LightGBM | 0.912 | 0.954 | 0.958 | 0.890 | 0.919 | 0.822 | 0.643
AdaBoost | 0.851 | 0.924 | 0.862 | 0.860 | 0.855 | 0.700 | 0.126
GBM | 0.878 | 0.944 | 0.940 | 0.850 | 0.888 | 0.755 | 0.118
Fig. 6 Accuracy of the six ML models using all features (13) and the nine most significant features with ADASYN (bars compare balanced accuracy without and with feature selection)
All adopted ML models were assessed on a range of performance metrics, using both the full set of 13 features and the principal features selected with RF. This study therefore makes recommendations on which principal features to use, and with which classification model, when developing an intelligent computational model for predicting mortality among patients with heart failure. Based on our experimental results, CatBoost and RF are the optimal classifiers (see Table 5 and Fig. 6). Doctors and other caregivers could use such a model to identify patients with heart disease quickly and accurately at an early stage. Table 6 gives a comparative analysis of the empirical results from this study against the literature, based on accuracy (ACC), F1-score, precision/sensitivity (Prec/Sen), and recall (Rec). As seen in Table 6, our proposed technique's performance is moderate compared with the literature.
5 Conclusions

Heart failure is a leading cause of mortality globally, with cardiovascular diseases accounting for an estimated 17.9 million fatalities each year. As a result, early detection and identification of cardiac disorders can significantly reduce the number of associated fatalities. Several MLAs have been developed in this area to perform tasks related to the management of heart failure, including diagnosis, illness categorisation, outcome prediction, and prediction of the response to therapy. A fast-growing body of research has concentrated on using machine learning to predict death in heart failure patients. This study proposed a computationally inexpensive intelligent predictive model based on six tree-based ensemble techniques to improve the accuracy of predicting patients' mortality due to heart failure. Based on our empirical results, time was the most critical factor in predicting patients' mortality by heart failure, followed by ejection fraction, serum sodium, age, and serum creatinine.
Table 6 Comparison of the proposed model with the literature

References | Optimised model | FST | DBT | ACC | F1 | Prec./Sen | Recall
[14] | ETC | RF | SMOTE | 93% | 93% | 93% | 93%
[8] | MLP-EBMDA | – | – | 94% | 96% | 96% | 96%
[18] | RF | GBM and SVM | – | 74% | 55% | NA | NA
[5] | SVM | ET, XGBoost, and CatBoost | SMOTE | 87% | 82% | NA | NA
[15] | EHMRG | NA | NA | NA | NA | 68% | NA
[1] | DT | NA | – | 80% | 71% | 79% | 65%
[12] | RF | NA | SMOTE | 90% | – | – | –
[16] | DT, K-NN, RF, NB, SVM, LR, and BN | NA | – | 91% | 94% | – | –
Our study | CatBoost | RF | ADASYN | 92% | 92% | 90.2% | 95%

FST—feature selection technique; DBT—data balancing technique; NA—not applicable
The prediction results obtained with the most significant features, compared against all features, show a marginal increase in prediction accuracy for CatBoost (0.3%), RF (0.7%), and ET (1.3%). Nevertheless, we observed a decrease in accuracy for LightGBM (1%) and GBM (1.4%), while the accuracy of AdaBoost remained unchanged between the 13 features and the eight features. CatBoost and RF outperformed all other ML models used in this study. Notwithstanding the success recorded in this study, the size of the dataset used was small; the researchers believe that with additional data the performance of the models can be enhanced beyond the current level. Hence, future research will explore the addition of medical images of heart disease to improve accuracy.
Appendix See Table 7.
Table 7 Hyperparameters of the models

MLA | Hyperparameters
RF | n_estimators = 150, criterion = 'gini', max_depth = None, min_samples_split = 2
XGBoost | loss = 'deviance', learning_rate = 0.1, n_estimators = 150, subsample = 1.0
ERT | n_estimators = 100, criterion = 'gini' or 'mse', min_samples_split = 2
DUM | strategy = 'mean' or 'prior'
AdaBoost | base_estimator = None, n_estimators = 50, learning_rate = 1.0, loss = 'linear', algorithm = 'SAMME.R'
LightGBM | loss = 'deviance' or 'ls', learning_rate = 0.1, n_estimators = 150, subsample = 1.0
GBM | loss = 'deviance', learning_rate = 0.1, n_estimators = 150
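For readers who wish to reproduce the grid search mentioned in Sect. 4, the sketch below shows one way to set it up with scikit-learn for the random forest; the grid shown is purely illustrative and does not reproduce the exact search space behind Table 7.

```python
# Hedged sketch of a hyper-parameter grid search for one of the models.
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

param_grid = {"n_estimators": [50, 100, 150],
              "criterion": ["gini", "entropy"],
              "min_samples_split": [2, 4]}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, scoring="accuracy", cv=5)
search.fit(X_train, y_train)   # split from the pre-processing sketch
print(search.best_params_)
```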
References 1. Almazroi AA (2022) Survival prediction among heart patients using machine learning techniques. Math Biosci Eng 19:134–145. https://doi.org/10.3934/mbe.2022007 2. Florio KL, Grodzinsky A (2022) Cardiovascular indexes in the era of preeclampsia: prevention or long-term outcome prediction? J Am Coll Cardiol 79:63–65. https://doi.org/10.1016/j.jacc. 2021.11.003 3. Gao J, Zhang H, Lu P, Wang Z (2019) An effective LSTM recurrent network to detect arrhythmia on imbalanced ECG dataset. J Healthc Eng 2019. https://doi.org/10.1155/2019/6320651 4. Alotaibi FS (2019) Implementation of machine learning model to predict heart failure disease. Int J Adv Comput Sci Appl 10:261–268. https://doi.org/10.14569/ijacsa.2019.0100637 5. Wang J (2021) Heart failure prediction with machine learning: a comparative study. J Phys Conf Ser 2031:0–8. https://doi.org/10.1088/1742-6596/2031/1/012068 6. Ofori-Asenso R, Garcia D (2016) Cardiovascular diseases in Ghana within the context of globalisation. Cardiovasc Diagn Ther 6:67–77. https://doi.org/10.3978/j.issn.2223-3652.2015. 09.02 7. Banerjee A, Chen S, Fatemifar G, Zeina M, Lumbers RT, Mielke J, Gill S, Kotecha D, Freitag DF, Denaxas S, Hemingway H (2021) Machine learning for subtype definition and risk prediction in heart failure, acute coronary syndromes and atrial fibrillation: systematic review of validity and clinical utility. BMC Med 19:1–14. https://doi.org/10.1186/s12916-021-01940-7 8. Deepika D, Balaji N (2022) Effective heart disease prediction using novel MLP-EBMDA approach. Biomed Signal Process Control 72:103318. https://doi.org/10.1016/j.bspc.2021. 103318 9. Kagiyama N, Shrestha S, Farjo PD, Sengupta PP (2019) Artificial intelligence: practical primer for clinical research in cardiovascular disease. J Am Heart Assoc 8. https://doi.org/10.1161/ JAHA.119.012788 10. Passantino A (2015) Predicting mortality in patients with acute heart failure: role of risk scores. World J Cardiol 7:902. https://doi.org/10.4330/wjc.v7.i12.902 11. Callahan A, Shah NH (2017) Machine learning in healthcare. In: Key advances in clinical informatics. pp 279–291. Elsevier. https://doi.org/10.1016/B978-0-12-809523-2.00019-4 12. Giridhar US, Gotad Y, Dungrani H, Deshpande A, Ambawade D (2021) Machine learning techniques for heart failure prediction: an exclusively feature selective approach. In: 2021 International conference on communication information and computing technology (ICCICT). IEEE, pp 1–5. https://doi.org/10.1109/ICCICT50803.2021.9510091 13. Weng SF, Reps J, Kai J, Garibaldi JM, Qureshi N (2017) Can machine-learning improve cardiovascular risk prediction using routine clinical data? PLoS ONE 12:e0174944. https:// doi.org/10.1371/journal.pone.0174944 14. Ishaq A, Sadiq S, Umer M, Ullah S, Mirjalili S, Rupapara V, Nappi M (2021) Improving the prediction of heart failure patients’ survival using SMOTE and effective data mining techniques. IEEE Access 9:39707–39716. https://doi.org/10.1109/ACCESS.2021.3064084
15. Radhachandran A, Garikipati A, Zelin NS, Pellegrini E, Ghandian S, Calvert J, Hoffman J, Mao Q, Das R (2021) Prediction of short-term mortality in acute heart failure patients using minimal electronic health record data. BioData Min 14:1–15. https://doi.org/10.1186/s13040021-00255-w 16. Plati DK, Tripoliti EE, Bechlioulis A, Rammos A, Dimou I, Lakkas L, Watson C, McDonald K, Ledwidge M, Pharithi R, Gallagher J, Michalis LK, Goletsis Y, Naka KK, Fotiadis DI (2021) A machine learning approach for chronic heart failure diagnosis. Diagnostics 11:1863. https:// doi.org/10.3390/diagnostics11101863 17. Muhammad Y, Tahir M, Hayat M, Chong KT (2020) Early and accurate detection and diagnosis of heart disease using intelligent computational model. Sci Rep 10:1–17. https://doi.org/10. 1038/s41598-020-76635-9 18. Chicco D, Jurman G (2020) Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMCMedical Inf Decis 5:1–16. https://doi. org/10.1186/s12911-020-1023-5 19. He H, Bai Y, Garcia EA, Li S (2008) ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence). IEEE, pp 1322–1328. https://doi.org/10.1109/ IJCNN.2008.4633969 20. Nti IK, Adekoya AF, Weyori BA (2019) A systematic review of fundamental and technical analysis of stock market predictions. Artif Intell Rev 53:3007–3057. https://doi.org/10.1007/ s10462-019-09754-z 21. Nti IK, Akyeramfo-Sam S, Bediako-Kyeremeh B, Agyemang S (2021) Prediction of social media effects on students’ academic performance using machine learning algorithms (MLAs). J Comput Educ. https://doi.org/10.1007/s40692-021-00201-z 22. Nti IK, Nyarko-Boateng O, Adekoya AF, Arjun R (2021) Network intrusion detection with StackNet: a phi coefficient based weak learner selection approach. In: 2021 22nd International Arab conference on information technology (ACIT). IEEE, pp 1–11. https://doi.org/10.1109/ ACIT53391.2021.9677338 23. Zhang W, Wu C, Zhong H, Li Y, Wang L (2021) Prediction of undrained shear strength using extreme gradient boosting and random forest based on Bayesian optimisation. Geosci Front 12:469–477. https://doi.org/10.1016/j.gsf.2020.03.007 24. Bai J, Li Y, Li J, Yang X, Jiang Y, Xia S-T (2022) Multinomial random forest. Pattern Recognit 122:108331. https://doi.org/10.1016/j.patcog.2021.108331 25. Alsariera YA, Adeyemo VE, Balogun AO, Alazzawi AK (2020) AI meta-learners and extratrees algorithm for the detection of phishing websites. IEEE Access 8:142532–142542. https:// doi.org/10.1109/ACCESS.2020.3013699 26. Kumar M, Gupta S, Gao X-Z, Singh A (2019) Plant species recognition using morphological features and adaptive boosting methodology. IEEE Access 7:163912–163918. https://doi.org/ 10.1109/ACCESS.2019.2952176 27. Sahin EK (2020) Comparative analysis of gradient boosting algorithms for landslide susceptibility mapping. Geocarto Int 1–25. https://doi.org/10.1080/10106049.2020.1831623
Genetic Algorithm-Based Clustering with Neural Network Classification for Software Fault Prediction Pushpendra Kumar Rajput , Aarti , and Raju Pal
Abstract Early prediction of faults enhances software quality and thereby affects the lower cost. However, early prediction of software faults is a crucial task and may lead to the serious issues like underestimation and overestimation. Such misunderstood situations occur when it comes to the predictions of faulty modules during the early stages of the development life cycle. In this direction, a large community of researchers proposed method-level and object-level metrics to achieve project quality. Uncertainty and lack of data availability allowed researchers to perform diverse experimental explorations. Similar project experience developed in the past has played a vital role in analogy-based approaches. Extensive research is carried out on publicly available datasets from the promise repository. In this paper, the authors propose two different methods using an analogy-based approach for fault prediction at early stages. At first, we developed a comprehensive hybrid model integrating genetic algorithm and regression. The second method employs clustering by fusing neural network features for classification with best-fitted reduced parameters that deal with uncertainty. The proposed model achieves 95% accuracy which is better than other considered methods. Experimental design and result findings show validity and phenomenal outcome of proposed methods. Keywords Software fault prediction · Object-oriented (OO) metrics · Classification · Clustering · Genetic algorithm
P. K. Rajput (B) School of Computer Science, University of Petroleum and Energy Studies, Dehradun, Uttarakhand 248007, India e-mail: [email protected] Aarti Lovely Professional University, Phagwara, Punjab 144411, India R. Pal CSE & IT Department, Jaypee Institute of Information Technology, Noida, Uttar Pradesh 201304, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_28
Nomenclature

GA — Genetic algorithm
ANN — Artificial neural network
FCM — Fuzzy C-Means
NASA — National Aeronautics and Space Administration
OO — Object-oriented
ROC — Receiver operating characteristic
1 Introduction Reducing cost and offer better quality of the product are two major responsibilities of Software Quality Assurance (SQA) in any software industry. SQA involves supervising and administrating software development process. It performs various tasks, such as formal source code scrutiny, code guidance, software checking and fault estimation throughout life cycle [1]. The techniques used for fault predictions are always aiming ease the allotment of SQA resources effectively and economically [2]. Earlier identification of the issue will prompt the faultless product and distribution of excellent software projects. Quality is correlated with defects, so fault prediction techniques help in removing faults that requires the attention of the developers at all stages of development process for assuring quality product and customer satisfaction. In last two decades, early estimation of fault proneness of code module has gained extensive consideration. Prior issue forecast investigations utilized a wide scope of learning algorithms to anticipate the issue inclination of programming modules [3, 4]. However, selection of software metrics, datasets and learning approaches are some critical aspects in fault prediction modeling. Different researchers and practitioners employed different group of software metrics in their approaches [5]. Object-oriented (OO) design metrics are one the popular group of metrics among practitioners. To deal with these metrics, the main construct for consideration is class as software module [6]. Moreover, the object-oriented metrics perform better than Line of Code (LOC) and complexity metrics in the situation of independent variables [7]. Software mining techniques allow engineers to allocate budgets, resources and efforts to those modules that are more prone to errors. Mining is efficient and adequate approach to allocate the fair number of resources during software development life cycles than testing [8]. Analogy-based software fault prediction employs historical software faults [9]. Various similarity measures and learning criteria are being used to find out error proneness in software components [10]. Numerous software researchers and practitioners used distinctive techniques such as Naive Bayes [1], Genetic Programming [11], Decision Trees [12], neural networks [13], Casebased Reasoning, fuzzy logic [14] and logistic regression [15] to predict faults. Researchers used two approaches to handle faultiness behaviors. One way is to
use bug-localization techniques to establish relationship between faults and objectoriented metric. Another way is to use characteristics of faultiness in each segment to predict faultiness behavior of new projects. Siy [16] predict faults from change history using generalized linear model and weighted time damp model with delta predictors that characterize number of changes made to system. Shepperd [15] also utilized linear regression to construct and evaluate prediction system. Menzies [17] et al. proposed border-flow clustering based on intraclass prediction mechanism by extracting dependencies among classes and presented empirical investigation on 29 releases of open-source projects from PROMISE repository by using stepwise linear regression. Capretz [18] validated OO metrics from datasets of NASA repository using neuro-fuzzy approach to deal with the heuristic and imprecise inputs. They used ANFIS structure to identify the relationship between metrics and faultiness and identified RFC, WMC and SLOC as strongest correlated metrics with fault proneness. For the implementation of ANFIS, they used K-mean clustering to clustered different project with similarity measures. They found SLOC as more effective metric as RMSE value for this metric found smaller than other metrics. Catel et al. [19] proposed clustering-based approach on datasets from Turkish white good manufacturer. They discuss about the effectiveness of metrics and selection of clusters done heuristically using threshold-based approach. Basili et al. [20] used logistic regression to know effects of OO metrics on medium-sized systems and found all metrics except LCOM as predictors for fault proneness. Raju [21] developed a support vector machine and performed fault prediction with cross-validation of historical project dataset. Alsghaier [22] proposed a hybrid mechanism fusing whale algorithm and genetic algorithm. The finding shows that proposed approach performs better for mentioned datasets. Feature selection is another perspective identified by researchers, which is playing an important role in fault prediction techniques [23]. In this article, we introduced a comprehensive hybrid model integrating genetic algorithm and regression analogy-based methods and performed empirical implementation using AR1, AR2 and AR3 datasets. We developed a comprehensive hybrid model integrating genetic algorithm and regression. It uses Fuzzy C-Means algorithm for grouping the projects at first phase and in second phase, the method generates regression equation for each cluster with optimized parameters with GA. Further, the neural network with optimized parameters is employed on it for further classification of new projects [24]. The organization of remaining part of the paper is described as follows. Section 2 revisits modeling techniques used for software fault prediction along with a description of datasets followed by a discussion about the evaluation criteria. Section 3 deliberates about the methodology used for fault proneness. Section 4 comprises a detailed description of proposed algorithm with design of experiments that demonstrate the use of proposed technique. Section 5 summarizes the results and effectiveness of proposed work. At the end, authors conclude performed work and findings in Sect. 6.
2 Background

2.1 Modeling Techniques

Genetic Algorithm Genetic algorithm (GA) is a heuristic approach based on the evolutionary process of natural selection and genetics, used to solve optimization problems. GA follows the "survival of the fittest" principle to simulate the process of evolution [25]. Analogous to genetic structures in nature, chromosomes within a population compete for mates. The population represents a set of candidate solutions encoded as chromosomes; a chromosome comprises genes, which are represented as binary strings, and survivors produce more offspring in the search space. Selection, crossover and mutation are the three major operations in GA. Selection of an individual depends on its fitness value, following the basic principle of "survival of the fittest." Crossover initiates the reproduction process, in which randomly selected strings mate; a larger fitness value gives a higher chance of being selected for reproduction. To avoid local optima, the mutation operation randomly modifies a few bits of the produced strings. This randomness provides diversity in the search space, which distinguishes GA from conventional optimization techniques. The process of GA is illustrated in Fig. 1.

Fig. 1 GA operations (population → selection → crossover → mutation → optimization of the regression parameters, repeated until the stopping criterion is met)
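To make the loop in Fig. 1 concrete, the sketch below runs a compact GA (selection, crossover, mutation) that tunes the coefficients of a linear model by minimising squared error on synthetic data; it only illustrates the idea and is not the authors' MATLAB implementation.

```python
# Toy GA that optimises linear-regression coefficients on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                      # synthetic project metrics
true_w = np.array([0.4, -0.2, 0.1, 0.0, 0.3])
y = X @ true_w + 0.05 * rng.normal(size=100)       # synthetic fault scores

def fitness(w):                                    # lower error -> higher fitness
    return -np.mean((X @ w - y) ** 2)

pop = rng.normal(size=(30, 5))                     # population of coefficient vectors
for generation in range(200):
    scores = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(scores)[-10:]]        # selection: keep the fittest
    children = []
    while len(children) < len(pop) - len(parents):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, 5)
        child = np.concatenate([a[:cut], b[cut:]]) # single-point crossover
        if rng.random() < 0.2:                     # mutation: perturb one gene
            child[rng.integers(5)] += rng.normal(scale=0.1)
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(w) for w in pop])]
print("best coefficients:", np.round(best, 2))
```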
Artificial Neural Network An artificial neural network (ANN) supports "what-if" analysis, which enhances its ability to adapt and learn. Adaptive learning and self-organisation allow the network to create its own organisation during the learning process [26]. An ANN consists of processing units (neurons) that communicate by sending signals to each other over weighted connections. An activation function and an external input (bias), together with learning rules, give the network the capability of being trained. In back-propagation, the net input of neuron j is computed as the weighted sum of its inputs, as in Eq. (3):

$$\text{net}_j = \sum_i x_i \, w_{i,j} \qquad (3)$$

Each neuron then computes its activation using the sigmoid function defined in Eq. (4):

$$x_j = \frac{1}{1 + e^{-\text{net}_j}} \qquad (4)$$

The activation function transfers the network input and the previous activation state to a new activation state. In the starting phase, the weights of each layer are chosen randomly. The output of each hidden-layer neuron is then calculated using Eq. (5):

$$x_{h_j} = f\!\left(\sum_{i=1}^{n} w_{ij} x_i - b_j\right) \qquad (5)$$

where $b_j$ is the bias term of the jth unit. Finally, the error sum of squares is calculated, and the weights are adjusted to lower the error using Eq. (6):

$$E = \frac{1}{2}\sum_{k=1}^{n} \left(y_k - \hat{y}_k\right)^2 \qquad (6)$$
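A minimal NumPy illustration of Eqs. (3)–(6) — weighted net input, sigmoid activation, a hidden layer with bias, and the squared-error term — is shown below; the sizes and values are arbitrary and serve only to make the formulas executable.

```python
# Forward pass of a tiny network following Eqs. (3)-(6).
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))              # Eq. (4)

rng = np.random.default_rng(1)
x = rng.random(4)                                   # one input pattern
W_h = rng.normal(size=(3, 4))                       # hidden-layer weights
b_h = rng.normal(size=3)                            # hidden-layer biases

net_h = W_h @ x - b_h                               # Eqs. (3)/(5): sum_i w_ij x_i - b_j
x_h = sigmoid(net_h)                                # hidden activations

w_o, b_o = rng.normal(size=3), 0.1
y_hat = sigmoid(w_o @ x_h - b_o)                    # network output
y_true = 1.0
E = 0.5 * (y_true - y_hat) ** 2                     # Eq. (6), single sample
print(float(E))
```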
Fuzzy C-Means (FCM) Clustering FCM is a soft clustering technique: instead of assigning each point to exactly one cluster, the approach assigns a probability-based membership score to every cluster [27]. The stepwise procedure of FCM is as follows.

Input: Feature-extracted dataset and the number of clusters k.
Output: k clusters with data values.

Step 1: Randomly select the initial centroids by choosing k out of the n values.
Step 2: Calculate the membership value of each data point for each cluster using Eq. (7):

$$\mu_{ij} = \frac{1}{\sum_{p=1}^{k} \left(d_{ij}/d_{ip}\right)^{2/(m-1)}} \qquad (7)$$

Step 3: Calculate the distance $d_{ij}$ using Eq. (8) for each data value and assign each data value to the cluster for which the Euclidean distance is minimum:

$$d_{ij} = \sqrt{\sum_{p=1}^{m} \left(x_{ip} - x_{jp}\right)^2} \qquad (8)$$

Step 4: Calculate the new centroid of each cluster as the membership-weighted mean of the data values, through Eq. (9):

$$v_j = \frac{\sum_{i=1}^{n} \mu_{ij}^{m} x_i}{\sum_{i=1}^{n} \mu_{ij}^{m}}, \quad \forall j = 1, 2, \ldots, k \qquad (9)$$

Step 5: Repeat steps 2–4 until the values obtained denote stable clusters.
Step 6: Stop.
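The short NumPy sketch below implements the FCM update loop of Eqs. (7)–(9) on synthetic two-dimensional data; the fuzzifier m = 2 and the data are assumptions made only to keep the example self-contained.

```python
# Minimal FCM loop following Eqs. (7)-(9).
import numpy as np

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
k, m, n = 2, 2.0, len(X)

centroids = X[rng.choice(n, k, replace=False)]       # Step 1
for _ in range(100):
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-10   # Eq. (8)
    u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)  # Eq. (7)
    new_centroids = (u.T ** m @ X) / np.sum(u.T ** m, axis=1, keepdims=True)    # Eq. (9)
    if np.allclose(new_centroids, centroids, atol=1e-6):                         # Step 5
        break
    centroids = new_centroids

labels = np.argmax(u, axis=1)                         # hard assignment for inspection
print(centroids)
```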
2.2 Datasets

The study uses well-defined datasets that are publicly available for research use in the PROMISE repository (www.Promisedata.org) of NASA-related projects. The following datasets were chosen based on their suitability for the model.

AR1: This dataset was obtained from a Turkish white-goods manufacturer and provided by the Software Research Laboratory (Softlab), Bogazici. It contains 121 projects with 30 attributes, comprising 9 fault-prone and 112 non-faulty modules. Of the 30 attributes, one attribute indicates the faultiness of the module and the remaining 29 attributes are static code parameters. The statistical description is given in Table 1.

AR3: The AR3 dataset is a collection of projects collected from http://openscience.us. It originally contains 63 projects with 30 attributes. Of the 63 projects, 8 are defect-prone, giving a defect distribution of 12.7%. The descriptive statistics of the dataset are given in Table 2.
Table 1 Descriptive statistics for the defect attribute of the AR1 dataset

Fault | Frequency | Percent | Cumulative percent
0 | 112 | 92.6 | 92.6
1 | 9 | 7.4 | 100.0
Total | 121 | 100.0 |
Table 2 Descriptive statistics for the defect attribute of the AR3 dataset
Fault | Frequency | Percent | Cumulative percent
0 | 55 | 87.3 | 87.3
1 | 8 | 12.7 | 100.0
Total | 63 | 100.0 |
Fig. 2 Proposed methodology (database of faulty data → FCM clustering → generate a regression equation for each cluster → optimize parameters using GA; a new project is fed to the trained neural network, classified into a cluster, and its fault proneness is predicted with that cluster's regression equation)
3 Proposed Methodology

The proposed methodology is based on a clustering procedure and is used for fault estimation with regression. In the first phase, the proposed technique uses the FCM algorithm to group the projects. In the second phase, the method generates a regression equation for each cluster, whose parameters are optimized using GA for each individual cluster. In the last phase, a neural network is trained in a supervised manner and employed to classify new projects into clusters. Identifying the most suitable cluster then allows the fault proneness of a new project to be estimated using the corresponding regression equation. The process of the proposed methodology, which comprises all three phases, is depicted in Fig. 2.
4 Experimental Design

To validate the proposed method, we implemented two experiments. The experiments also demonstrate the effectiveness of the proposed methodology through a comparative analysis with well-known machine learning techniques used in software fault prediction.
Fig. 3 Optimization process of GA (a random population of n chromosomes is generated; weights are extracted and used to tune the parameters; the fitness value is computed; crossover is performed and the fitness value is replaced with the maximum fitness value; the loop repeats until the stopping criterion is met, after which the result is used for testing)
Experiment 1: In the first phase of the implementation, the authors demonstrate the effectiveness of the proposed methodology using the genetic algorithm. The experiment uses two regression models for fault prediction: one without optimization and one with parameters optimized by GA. The optimization process using GA is depicted in Fig. 3. The experiment is implemented with the genetic toolbox of MATLAB 2012. The following steps give a detailed description of the experiment.

Input: Regression equation of the dataset as a whole.
Output: Optimized parameter values of the equation and prediction of faults.

Training process
Step 1: Find the regression equation for the whole dataset.
Step 2: Apply the genetic algorithm to optimize the parameters of the regression equation.
Step 3: Use the original and the optimized parameters of the regression equation for prediction.

Testing process
Step 1: Use the original parameters of the regression to predict the faultiness behavior of a new project.
Step 2: Use the optimized parameters to predict fault proneness.

Experiment 2: In this experiment, the authors address the inherent nonlinearity of the data available for software fault prediction. The relationships within the data are exploited by classifying the data into different clusters, and the FCM algorithm is used to find the clusters for the feature-extracted attributes. The steps below describe the implementation of the experiment using FCM.

Training process
Step 1: Cluster the dataset into three clusters.
Step 2: Find the regression equation for each cluster.
Step 3: Apply the genetic algorithm to optimize the parameters of each regression equation.
Step 4: Build two models using the actual and the optimized parameter values of the regression equations.
Step 5: Train the neural network using the clustered data (the inputs to the NN are the project attributes and the output is the corresponding cluster number).
Step 6: Predict the fault proneness of new projects.

Testing process
Classification: The new project is fed into the NN model to identify the cluster to which it belongs.
Prediction: Both the original and the optimized parameters are used to predict the faultiness behavior of the new project.
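As an end-to-end illustration of Experiment 2, the hedged sketch below clusters synthetic training projects (scikit-learn's KMeans is used here as a simple stand-in for FCM), fits one regression per cluster, trains a neural network to route new projects to a cluster, and then predicts fault proneness with that cluster's regression model; all data, sizes and parameters are assumptions.

```python
# Sketch of the cluster -> per-cluster regression -> NN routing pipeline.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
X_train = rng.random((120, 10))                  # project metrics (synthetic)
y_train = rng.integers(0, 2, 120)                # fault labels (synthetic)

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_train)
regressors = {c: LinearRegression().fit(X_train[clusters == c], y_train[clusters == c])
              for c in np.unique(clusters)}

router = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                       random_state=0).fit(X_train, clusters)

X_new = rng.random((5, 10))                      # incoming projects
assigned = router.predict(X_new)
pred = [regressors[c].predict(x.reshape(1, -1))[0] for c, x in zip(assigned, X_new)]
print(np.round(pred, 2))                         # values above ~0.5 suggest fault-prone
```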
5 Results and Discussion

The dataset is divided into training and validation sets, where 70% of the samples, taken randomly from the dataset, form the training set and the remaining 30% form the testing set. Therefore, for the AR1 dataset, 88 projects are used for training and 33 projects for validation. For training, univariate linear regression and logistic regression are used, and the coefficients of the regression equations are then optimized using the genetic algorithm; the parameter settings for GA are taken from the literature [14]. Owing to the large number of attributes, we used a correlation-based strategy to quantify the association between the dependent and independent attributes. The attributes are selected using the Pearson product-moment correlation coefficient, as given in Table 3. Reducing the number of attributes is the main goal when developing a prediction-based model.

Table 3 Correlation using Pearson's correlation for the AR1 dataset
Metric | Total operands | Halstead length | Total operators | Cyclomatic density | Normalized cyclomatic complexity | Halstead effort | Halstead time | Branch count
Correlation with defects | 0.10 | 0.10 | 0.14 | 0.16 | 0.16 | 0.15 | 0.16 | 0.10
Metric | Decision count | Cyclomatic complexity | Condition count | Multiple condition count | Decision density | Unique operators | Halstead difficulty | Defects
Correlation with defects | 0.16 | 0.21 | 0.27 | 0.17 | 0.18 | 0.19 | 0.29 | 1.00
Regression analysis is then used to assess the relationships indicated by the correlations for the prediction mechanism. The correlation coefficient quantifies the strength and direction of a linear relationship in the range −1 to +1, where the sign indicates the direction and the magnitude indicates the strength of the relationship. Confusion-matrix-based criteria are employed to assess the performance of the proposed model, and Receiver Operating Characteristic (ROC) curve properties are also presented. The quality of the model depends on the number of variables selected for logistic and linear regression, as measured by the ROC parameters. A genetic algorithm is applied to search for the best-suited optimized parameters of the regression equation; during the testing phase, new projects are then used to identify their faultiness behavior. The coefficients given in Table 4 provide the information needed to predict faults from the other metrics of the datasets and indicate how statistically significant the faults of historical projects are to the model. Sig. (p-value) indicates statistical significance: estimates with a p-value below 0.05 are considered statistically significant, and those above 0.05 non-significant. Std. Error indicates how far the observed values fall from the regression line, i.e. how poorly the regression model represents the response variable. The coefficient table computed using linear regression is given in Table 4. The second approach used in the first experiment is logistic regression, which predicts the probability of a dichotomous output. Linear regression is sometimes not applicable for predicting binary values because it can predict results outside the acceptable range. The coefficients obtained using logistic regression are given in Table 5. To assess the applicability of the predictions, TP (true positives), TN (true negatives), FP (false positives) and FN (false negatives) are evaluated; the higher the TP rate, the more acceptable the prediction. A descriptive summary is given in Table 6 using four evaluation criteria, namely the true negative ratio (TNR), true positive ratio (TPR), false negative ratio (FNR) and false positive ratio (FPR). The findings clearly indicate that the estimates obtained using the optimized regression parameters are superior to those of the other regression techniques: the TNR and TPR values for linear regression with optimized parameters are higher than for logistic regression, and the FNR value is lower than for both linear and logistic regression. The simulation results for the optimization of the regression coefficients are shown in Fig. 4: Fig. 4a shows the best fitness value of 1475.25 reached during optimization, and Fig. 4b shows the optimized values of the individual metrics. Regression analysis is not always a convenient and appropriate method for the data; the suitability of the model can be assessed by examining the residuals. A residual is the difference between the observed value (y) and the predicted value ŷ, as defined by Eq. (16):

$$e = y - \hat{y} \qquad (16)$$

The studentized residual and the studentized deleted residual are obtained by re-running the regression with the corresponding observation omitted; they help to assess how strongly individual observations pull the regression line towards themselves. The residual values for the linear regression are given in Table 7.
Table 4 Coefficients for linear regression (unstandardized coefficients)

Metric | B | Std. error | Sig.
(Constant) | −0.23 | 0.32 |
blank_loc | 0.07 | 0.04 | 0.06
comment_loc | 0.00 | 0.01 | 0.99
code_and_comment_loc | −0.13 | 0.11 | 0.24
executable_loc | −0.01 | 0.02 | 0.61
unique_operands | −0.05 | 0.02 | 0.03
unique_operators | 0.10 | 0.04 | 0.02
total_operands | 0.03 | 0.02 | 0.12
total_operators | −0.01 | 0.02 | 0.57
halstead_volume | 0.00 | 0.01 | 0.46
halstead_level | 0.14 | 0.23 | 0.55
halstead_difficulty | −0.05 | 0.03 | 0.14
halstead_effort | 0.00 | 0.00 | 0.92
halstead_error | −9.56 | 9.12 | 0.30
decision_count | 0.01 | 0.14 | 0.93
condition_count | 0.04 | 0.05 | 0.34
multiple_condition_count | −0.01 | 0.11 | 0.92
cyclomatic_complexity | −0.05 | 0.13 | 0.71
cyclomatic_density | −0.10 | 0.59 | 0.87
decision_density | 0.04 | 0.07 | 0.59
design_complexity | −0.01 | 0.02 | 0.56
design_density | 0.06 | 0.04 | 0.14
normalized_cyclomatic_complexity | 0.16 | 0.65 | 0.81
formal_parameters | −0.05 | 0.05 | 0.36
The quality of a diagnostic test depends on its sensitivity and specificity. The ROC curve is estimated using a statistical (regression-based) method called maximum likelihood estimation (MLE), and one of its most popular summary measures is the area under the curve (AUC); several nonparametric tests are used to evaluate the ROC. The comparison of the ROC parameters obtained using regression and genetic-based regression is given in Table 8. Our proposed technique uses fuzzy clustering with a univariate regression strategy for the classification of new projects. Table 9 compares the TPR, TNR, FPR and FNR parameters of the proposed approach with simple logistic and linear regression for the AR1 and AR3 datasets; lower values of FPR and FNR indicate more accurate estimates.
Table 5 Coefficients for logistic regression

Metric | B | S.E. | Df | Sig. | Exp(B)
total_loc | 6.53 | 8422.57 | 1.00 | 1.00 | 686.46
blank_loc | 6.09 | 7963.17 | 1.00 | 1.00 | 442.32
comment_loc | −12.32 | 25,282.18 | 1.00 | 1.00 | 0.00
code_and_comment_loc | −159.95 | 124,328.97 | 1.00 | 1.00 | 0.00
unique_operands | 5.76 | 34,391.66 | 1.00 | 1.00 | 318.55
unique_operators | 48.51 | 55,529.48 | 1.00 | 1.00 | 1.6 × 10^21
total_operands | −21.23 | 34,413.94 | 1.00 | 1.00 | 0.00
total_operators | 18.78 | 16,598.37 | 1.00 | 1.00 | 143,045,211.61
halstead_volume | 9.87 | 6461.45 | 1.00 | 1.00 | 19,326.15
halstead_level | −1017.58 | 666,145.29 | 1.00 | 1.00 | 0.00
halstead_difficulty | −29.20 | 60,528.59 | 1.00 | 1.00 | 0.00
halstead_effort | 0.01 | 28.15 | 1.00 | 1.00 | 1.01
halstead_error | −28,047.24 | 11,798,229.70 | 1.00 | 1.00 | 0.00
branch_count | −150.88 | 96,623.15 | 1.00 | 1.00 | 0.00
call_pairs | −28.33 | 18,316.13 | 1.00 | 1.00 | 0.00
condition_count | 80.99 | 62,234.42 | 1.00 | 1.00 | 1.6 × 10^35
multiple_condition_count | 215.00 | 159,262.46 | 1.00 | 1.00 | 2.3 × 10^93
cyclomatic_complexity | 152.89 | 180,983.48 | 1.00 | 1.00 | 2.5 × 10^66
cyclomatic_density | 105.06 | 1,132,561.23 | 1.00 | 1.00 | 4.2 × 10^45
decision_density | 23.43 | 73,648.41 | 1.00 | 1.00 | 14,927,092,578.43
design_density | 101.76 | 50,868.16 | 1.00 | 1.00 | 1.5 × 10^44
normalized_cyclomatic_complexity | 1200.51 | 699,912.81 | 1.00 | 1.00 | 0.00
formal_parameters | −56.42 | 17,069.68 | 1.00 | 1.00 | 0.00
Constant | −788.587 | 548,960.388 | 1 | 0.999 | 0.000
Table 6 Comparison of TNR, FPR, FNR and TPR values for the regression techniques with parameters optimized using the genetic algorithm

Technique used | TNR | FPR | FNR | TPR
Logistic regression | 0.90 | 0.09 | 0.85 | 0.14
Linear regression | 0.97 | 0.028 | 0.5 | 0.5
Linear regression with optimized coefficients | 0.97 | 0.029 | 0.33 | 0.66
Using the proposed clustering strategy to deal with heterogeneity provides better results than simple logistic and linear regression. The AR1 dataset yields better TNR and TPR values with the clustering technique than with simple linear and logistic regression. For the AR3 dataset, 44 projects are used for training and 19 projects for testing; the results for all evaluation criteria are given in Tables 9 and 10. The precision rate is 0.50 for AR1 and 0.80 for AR3, which is higher than that of the other two techniques.
Fig. 4 Simulation results for optimizing the parameters of the AR1 dataset using the genetic algorithm: a best and mean fitness values over the generations (best 1475.25, mean 2016.52), b current best individual for each of the 15 variables

Table 7 Residual values for linear regression

 | Minimum | Maximum | Mean | Std. deviation
Predicted value | −0.23 | 0.61 | 0.07 | 0.141
Std. predicted value | −2.161 | 3.776 | 0.000 | 1.000
Standard error of predicted value | 0.060 | 0.225 | 0.104 | 0.037
Adjusted predicted value | −0.40 | 1.04 | 0.07 | 0.175
Residual | −0.543 | 0.882 | 0.000 | 0.223
Std. residual | −2.194 | 3.565 | 0.000 | 0.899
Stud. residual | −3.036 | 3.728 | 0.000 | 1.026
Deleted residual | −1.040 | 0.973 | −0.001 | 0.296
Stud. deleted residual | −3.175 | 4.007 | 0.010 | 1.068
Table 8 Comparison of ROC parameters for the regression techniques with parameters optimized using the genetic algorithm

Technique used | Precision | Recall | pf | Accuracy | Specificity | Sensitivity
Logistic regression | 0.13 | 0.14 | 0.09 | 0.85 | 0.91 | 0.14
Linear regression | 0.50 | 0.50 | 0.03 | 0.95 | 0.97 | 0.50
Linear regression with optimized coefficients | 0.67 | 0.67 | 0.03 | 0.95 | 0.97 | 0.67
Table 9 Comparison of the proposed model (TNR, FPR, FNR and TPR) with and without clustering

Dataset | Technique used | TNR | FPR | FNR | TPR
AR1 | Linear regression | 0.97 | 0.028 | 0.5 | 0.5
AR1 | Logistic regression | 0.90 | 0.09 | 0.89 | 0.11
AR1 | Linear regression with FCM clustering | 0.97 | 0.03 | 0.66 | 0.33
AR3 | Linear regression | 0.85 | 0.14 | 0.2 | 0.8
AR3 | Logistic regression | 0.93 | 0.06 | 0.75 | 0.25
AR3 | Linear regression with FCM clustering | 0.92 | 0.07 | 0.2 | 0.8
The probability of false alarm also decreases for both datasets compared with the other techniques. Figure 5 shows the regression lines for training and testing: Fig. 5a, b shows regression lines with an R value of 0.68 during training and 0.94 during testing. For a better fit, the R value should be higher, and the R value in the testing phase is indeed higher than in the training phase. A feed-forward neural network is used for the classification stage of the clustering algorithm, and the best clustering performance was found at 0.6138, as shown in Fig. 5c. The comparison of the ROC parameters for both the regression-based and the clustering-based approaches is given in Table 10. The proposed prediction mechanism improves the values of precision, recall and accuracy: the accuracy, precision and sensitivity of the hybrid model are higher than those of the regression strategy, which indicates that the proposed algorithm has high prediction accuracy.

Table 10 Comparison of ROC parameters with and without the clustering strategy for regression
Dataset | Technique | Precision | Recall | Probability of false alarm (pf) | Accuracy | Specificity | Sensitivity
AR1 | Linear regression | 0.50 | 0.50 | 0.03 | 0.95 | 0.97 | 0.50
AR1 | Logistic regression | 0.08 | 0.11 | 0.10 | 0.84 | 0.90 | 0.11
AR1 | Linear regression with FCM clustering | 0.50 | 0.33 | 0.03 | 0.92 | 0.97 | 0.33
AR3 | Linear regression | 0.67 | 0.80 | 0.14 | 0.84 | 0.86 | 0.80
AR3 | Logistic regression | 0.50 | 0.25 | 0.06 | 0.80 | 0.94 | 0.25
AR3 | Linear regression with FCM clustering | 0.80 | 0.80 | 0.07 | 0.89 | 0.93 | 0.80
Fig. 5 Simulation results for the regression line for the AR1 dataset: a training, b testing and c comparison of the mean squared error for the training, testing and validation phases across epochs
6 Conclusion

Uncertainty and a lack of data availability may lead fault estimation to underestimation or overestimation, and both scenarios pose challenges for the management and development of software. To deal with the nonlinearity and uncertainty in the data, we proposed a GA-based clustering method for the prediction of fault proneness using NN classification. Historical data are used to train the model, and new projects are classified with the trained model. GA is used to optimize the regression coefficients and achieves 95% accuracy on the considered datasets. The findings of the study verify that the proposed hybrid model, consisting of GA with NN classification, performs better than the regression model and produces significant results. In the future, advanced optimization and classifier techniques can be embedded in the proposed hybrid approach, and further statistical tools and methods can be applied to test the validity of the model.
References 1. Rathore SS, Kumar S (2021) An empirical study of ensemble techniques for software fault prediction. Appl Intell 51(6):3615–3644 2. Rathore SS, Kumar S (2019) A study on software fault prediction techniques. Artif Intell Rev 51(2):255–327 3. Pandey SK, Mishra RB, Tripathi AK (2021) Machine learning based methods for software fault prediction: a survey. Expert Syst Appl 172:114595 4. Sharma D, Chandra P (2018) Software fault prediction using machine-learning techniques. In: Smart computing and informatics. Springer, Singapore, pp 541–549 5. Briand LC, Wüst J, Daly JW, Porter DV (2000) Exploring the relationships between design measures and software quality in object-oriented systems. J Syst Softw 51(3):245–273 6. Dejaeger K, Verbraken T, Baesens B (2012) Toward comprehensible software fault prediction models using bayesian network classifiers. IEEE Trans Software Eng 39(2):237–257
7. Hall T, Beecham S, Bowes D, Gray D, Counsell S (2011) A systematic literature review on fault prediction performance in software engineering. IEEE Trans Software Eng 38(6):1276–1304 8. Rhmann W, Ansari GA (2021) Ensemble techniques-based software fault prediction in an opensource project. In: Research anthology on usage and development of open source software. IGI Global, pp. 693–709 9. Kaur I, Kaur A (2021) Comparative analysis of software fault prediction using various categories of classifiers. Int J Syst Assur Eng Manag 12(3):520–535 10. Singh A, Bhatia R, Singhrova A (2018) Taxonomy of machine learning algorithms in software fault prediction using object oriented metrics. Procedia Comput Sci 132:993–1001 11. Wu Q (2011) Fuzzy fault diagnosis based on fuzzy robust v-support vector classifier and modified genetic algorithm. Expert Syst Appl 38(5):4882–4888 12. Bishnu PS, Bhattacherjee V (2011) Software fault prediction using quad tree-based k-means clustering algorithm. IEEE Trans Knowl Data Eng 24(6):1146–1150 13. Jin C, Jin SW, Ye JM (2012) Artificial neural network-based metric selection for software fault-prone prediction model. IET Softw 6(6):479–487 14. Bhattacharya S, Rungta S, Kar N (2013) Software fault prediction using fuzzy clustering & genetic algorithm. Int J Dig Appl Contemp Res 2(5) 15. Shepperd M, Cartwright M, Kadoda G (2000) On building prediction systems for software engineers. Empir Softw Eng 5(3):175–182 16. Graves TL, Karr AF, Marron JS, Siy H (2000) Predicting fault incidence using software change history. IEEE Trans Software Eng 26(7):653–661 17. Scanniello G, Gravino C, Marcus A, Menzies T (2013) Class level fault prediction using software clustering. In: 28th IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 640–645 18. Aarti, Sikka, G, Dhir R (2019) Empirical validation of object-oriented metrics on cross-projects with different severity levels. Int J Comput Syst Eng 5(5–6):304–332 19. Aarti, Sikka G, Dhir R (2020) Grey relational classification algorithm for software fault proneness with SOM clustering. Int J Data Min, Model Manag 12(1):28–64 20. Rathore SS, Kumar S (2015) Predicting number of faults in software system using genetic programming. Procedia Comput Sci 62:303–311 21. Raju KS, Murty MR, Rao MV, Satapathy SC (2018) Support vector machine with k-fold cross validation model for software fault prediction. Int J Pure Appl Math 118(20):321–334 22. Alsghaier H, Akour M (2021) Software fault prediction using Whale algorithm with genetics algorithm. Softw: Pract Experience 51(5):1121–1146 23. Cui C, Liu B, Li G (2019) A novel feature selection method for software fault prediction model. In: 2019 Annual reliability and maintainability symposium (RAMS). IEEE, pp 1–6 24. Seber GA, Lee AJ (2012) Linear regression analysis, vol 329. Wiley 25. Mirjalili S (2019) Genetic algorithm. In: Evolutionary algorithms and neural networks. Springer, Cham, pp 43–55 26. Wang SC (2003) Artificial neural network. In: Interdisciplinary computing in java programming. Springer, Boston, MA, pp 81–100 27. Bezdek JC, Ehrlich R, Full W (1984) FCM: The fuzzy c-means clustering algorithm. Comput Geosci 10(2–3):191–203
Nodule Detection and Prediction of Lung Carcinoma in CT Images: A Relative Study of Enhancement and Segmentation Methods K. A. Nyni and J. Anitha
Abstract Image enhancement and segmentation plays an indispensable role in the accurate analysis of affected nodules in lung Computed Tomography (CT) images. Computer-aided detection or diagnosis has become very crucial in the healthcare system for fast detection of lung cancer. The radiologist has a difficult time in correctly identifying the cancerous lung nodules. Because of the vast number of patients, radiologists frequently overlook malignant nodules in imaging. Many recent studies in the field of automated lung nodule diagnosis have revealed significant improvements in radiologist performance. When detecting pulmonary nodules, imaging quality must be taken into account. This has prompted us to investigate the pre-processing stage of lung CT images, which includes a contrast enhancement and segmentation stage. In this paper, different lung nodule enhancement and segmentation methods are compared. The different enhancement methods compared are Histogram Equalization (HE), Contrast Limited Adaptive Histogram Equalization (CLAHE), Image Complement (IC), Gamma Correction (GC) and Balanced Contrast Enhancement Technique (BCET). The five different segmentation methods compared are Adaptive Image Thresholding (AIT), Flood Fill Technique (FFT), Fast Marching Method (FMM), Grayscale Intensity Difference (GSID) and Watershed Segmentation (WS). Keywords Image enhancement · Segmentation · CT images · Lung nodule
K. A. Nyni (B) Department of Mechatronics Engineering, Jyothi Engineering College, Cheruthuruthy, Kerala, India e-mail: [email protected] K. A. Nyni · J. Anitha Department of Electronics & Communication Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_29
1 Introduction Cancer is the second leading cause of death worldwide, accounting for nearly 10 million deaths in 2020 [1–3]. It has a pernicious effect on human lives [4, 5]. Cancer can affect any part of the body, and one of its features is the rapid growth of abnormal cells. In cancer, normal cells are transformed into tumour cells that typically progress from benign lesions to a malignant tumour in a multistage process. The prevalence of cancer escalates with age, and if it is not identified early, treatments are far less effective. By avoiding risk factors and executing different prevention procedures, 30–50% of cancers can be prevented [6]. Treatment for cancer is more effective when the disease is diagnosed early: the mortality rate can be reduced and doctors can ensure proper, timely treatment. With the drastic improvement in technology, computer-aided detection and diagnosis (CADe & CADx) systems [7–14] play a vital role in providing screening assistance to radiologists. They help to detect small-sized nodules that would otherwise be missed or misdiagnosed, and thus the accuracy of lung nodule detection can be improved [15–17]. Nodule detection should be achieved while reducing false positives, and this should be the intention of all such work. The CADe system tries to include all nodule candidates and therefore also produces false positives that can change the result of the diagnosis. Although CADe systems aid in false-positive reduction and improve sensitivity, there is still a chance of missing some nodules. CADe systems are designed to recognize nodules by taking different characteristics into consideration [18–20]. Cancer is one of the most dangerous conditions affecting human lives [21]. As per data obtained from the Global Burden of Cancer study [2], there are 14.1 million cases [3] of cancer in the world; lung cancer is the leading cause of cancer death with a share of 19% [22], and cancer deaths amount to 8.2 million. Hence, early detection of cancerous nodules plays a cardinal role in avoiding cancer growth and spread. Previous research analysed lung cancer using clustering methods on microarray data [23, 24], and detection of lung cancer with common image processing techniques obtained good results and accuracy [25]. In this study, a relative analysis of image enhancement and segmentation techniques is carried out to investigate lung cancer using different image processing methodologies. Lung CT images are examined using different enhancement and segmentation [16, 26–31] methods for a lung cancer detection system.
2 Nodule Detection System A generalized nodule detection system is shown in Fig. 1. It consists of four stages that determine the presence of a cancerous lung nodule. Image scanning is the first stage, wherein the input lung CT image is read. In this paper, 100 images from the Luna-16 lung cancer data set are analysed [32]. Next, image enhancement, which improves the quality of the image, is applied.
Fig. 1 Stages of nodule detection system: input lung CT image → image pre-processing (image enhancement and image segmentation methods) → feature extraction → nodule detection and prediction of cancer
Different filtering methods are used to enhance different properties of the image. In the enhanced image, the nodules need to be identified, which is achieved by a segmentation method; enhancement and segmentation together form the second stage. The third stage is feature extraction, wherein the extracted features lead to the conclusion of whether the detected nodule is cancer affected or not. Computer-aided nodule detection [33] is very important because very small nodules [16, 34, 35] may be missed by radiologists, while digital systems can identify a great number of missed nodule candidates. Although the system tries to include all potential candidates in the nodule detection stage, there can also be false positives [36] that result in wrong diagnosis or mis-diagnosis [35, 37–40]. Some CAD systems can still miss a few nodules even though they achieve a low false-positive rate with high sensitivity in the false-positive reduction stage. CAD systems have difficulty detecting the full variety of nodules because pulmonary nodules have complex features [41] that include size, shape, calcification pattern and margin. Reduction of false-positive results is regarded as essential in the follow-up procedure, as it helps to eliminate faulty diagnoses.
3 Image Enhancement Techniques Image enhancement is a method that sharpens image features such as edges and boundaries and makes the image more useful for display and analysis. The image enhancement block is shown in Fig. 2. Image enhancement is a common image processing step that accentuates significant information in an image while suppressing other information, increasing the quality of identification [30, 35, 41–52]. Contrast stretching is used to improve the overall visibility of an image whose low contrast is caused by poor lighting. The histogram of an image represents the relative frequency of occurrence of its gray levels, and it is helpful for stretching the low-contrast levels of an image. Many enhancement methods are based on spatial operations in which each pixel is replaced by the average of its neighbourhood pixels; spatial averaging is used for smoothing and filtering of images. Transformation-based filtering operates on the frequency content, attenuating some frequencies to enhance others. In the pseudo-colouring method, each gray level of a black-and-white image is mapped to a particular assigned colour. Through these enhancement methods, the image is made better suited than the original image for the application. In this study, five different enhancement strategies are used; they are briefly discussed next.
Fig. 2 Block diagram for image enhancement: contrast stretching and histogram modification, spatial averaging and noise filtering, false or pseudo colouring, and transform-based filtering

3.1 Histogram Equalization (HE) In HE, the gray levels within an image are redistributed so that each gray level appears with roughly equal probability. To improve image quality, HE adjusts the brightness and contrast of dark or low-contrast images [53]. In a dark image, the histogram is skewed towards the lower end of the grayscale, so the image information is packed into the dark end. By re-distributing the gray levels, the image quality is improved and a more uniformly distributed histogram is generated. The histogram of an image with intensity levels between 0 and I − 1 is a discrete function represented by

hist(i_k) = m_k  (1)

where m_k is the number of pixels in the image with intensity i_k, and i_k is the kth intensity value. The histogram is normalized using the total number of pixels in the image. For an X × Y image, the normalized histogram gives the probability of occurrence of m_k in the image as

pro(i_k) = m_k / (X × Y)  (2)
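As a rough illustration of Eqs. (1) and (2), the short sketch below computes the normalized histogram of a grayscale CT slice and equalizes it with OpenCV; it is not the authors' implementation, and the file name is only a placeholder.

```python
import cv2
import numpy as np

# Placeholder file name; any 8-bit grayscale lung CT slice will do.
gray = cv2.imread("lung_ct_slice.png", cv2.IMREAD_GRAYSCALE)

# Normalized histogram pro(i_k) = m_k / (X * Y), Eq. (2).
pro = np.bincount(gray.ravel(), minlength=256) / gray.size

# Histogram equalization: redistribute gray levels towards a uniform histogram.
equalized = cv2.equalizeHist(gray)
```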
3.2 Contrast Limited Adaptive Histogram Equalization (CLAHE) Adaptive Histogram Equalization (AHE) is an improved HE method that applies HE to small regions of the image, enhancing the contrast of each region separately. Rather than operating on the image as a whole, it adapts the edges and contrast in each section of the image to the local distribution of pixel intensities. However, AHE may over-emphasize the noise components [53] of the image. Images enhanced by CLAHE appear more natural than HE-enhanced images; it was observed that certain regions became saturated when HE was applied to X-ray images. CLAHE uses the same approach as AHE to handle this particular issue, but within each designated region a threshold parameter limits the amount of contrast enhancement produced. First, the Red, Green and Blue (RGB) colour space of the original image is converted to the Hue, Saturation and Value (HSV) colour space. CLAHE then processes the Value component of HSV without changing the Hue and Saturation components, after which the gray levels are redistributed according to the clipped histogram. The user-defined clip limit is applied to each pixel value, and the image is transformed back from HSV to RGB colour space [27].
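A minimal CLAHE sketch following the RGB-to-HSV procedure described above is given below; the clip limit and tile grid size are assumed values rather than parameters reported in the paper, and for grayscale CT slices clahe.apply can be used on the image directly.

```python
import cv2

def clahe_enhance(rgb, clip_limit=2.0, tiles=(8, 8)):
    """Apply CLAHE to the Value channel of an RGB image and convert back."""
    hsv = cv2.cvtColor(rgb, cv2.COLOR_RGB2HSV)
    h, s, v = cv2.split(hsv)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tiles)
    v_eq = clahe.apply(v)   # contrast-limited equalization within each tile
    return cv2.cvtColor(cv2.merge((h, s, v_eq)), cv2.COLOR_HSV2RGB)
```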
3.3 Image Invert/Image Complement (II/IC) II/IC of a binary image is the process of converting zeros to ones and ones to zeros, i.e., black is reversed to white and vice versa. For an 8-bit grayscale image, the difference between the maximum intensity value of 255 and the original pixel value is calculated, and the result becomes the pixel value of the new image. The mathematical expression for the image complement is

a = 255 − b  (3)

where b and a are the original and the converted pixel intensity values, respectively. With this approach the lungs, being the region of interest, appear lighter, whereas bones appear darker. As this is a habitually utilized practice among radiologists, it may also help deep networks achieve improved classification [54]. The histogram of the complemented image is a flipped copy of the histogram of the original image.
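Equation (3) amounts to a single array operation; the brief sketch below also notes the histogram-mirroring property mentioned above.

```python
import numpy as np

def complement(gray_8bit: np.ndarray) -> np.ndarray:
    """Image complement a = 255 - b, Eq. (3), for an 8-bit grayscale image."""
    return 255 - gray_8bit

# The histogram of the complemented image is the mirror of the original one:
# np.bincount((255 - img).ravel(), minlength=256)
# equals np.bincount(img.ravel(), minlength=256)[::-1]
```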
3.4 Gamma Correction (GC) GC [55] is a nonlinear operation performed on the pixels of the image. In picture normalization, linear operations such as scalar multiplication, addition and subtraction are applied to individual pixels; in Gamma Correction, the pixel value is instead altered according to an internal mapping between the pixel value and the gamma value, improving the quality of the image. Let Pi be the set of pixel values with range [0, 255], θ the set of angular values, Γ the set of gamma values, and g the pixel's grayscale value (g ∈ Pi). If g_m is the midpoint of the range [0, 255], the linear mapping from group Pi to group θ is defined as

φ : Pi → θ, θ = {ε | ε = φ(g)}, φ(g) = πg / (2 g_m)  (4)

The mapping from θ to Γ is defined as

h : θ → Γ, Γ = {γ | γ = h(g)}  (5)

h(g) = 1 + f1(g), f1(g) = a cos(φ(g))  (6)

where a ∈ [0, 1] is a weighting factor; through this mapping, the pixel values of group Pi and group Γ can be related, and for a given gamma value an arbitrary pixel value can be calculated. Taking γ(g) = h(g), the Gamma correction function is given by

Gr(g) = 255 (g/255)^(1/γ(g))  (7)

where Gr(g) is the corrected grayscale output pixel value.
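The following sketch implements the correction of Eq. (7) with a single fixed gamma applied through a lookup table; the adaptive variant described above would instead compute γ(g) per pixel from Eqs. (4)–(6). The gamma value shown is illustrative only.

```python
import numpy as np

def gamma_correct(gray_8bit, gamma=1.5):
    """Fixed-gamma form of Eq. (7): Gr(g) = 255 * (g / 255) ** (1 / gamma)."""
    lut = (255.0 * (np.arange(256) / 255.0) ** (1.0 / gamma)).astype(np.uint8)
    return lut[gray_8bit]   # apply the lookup table pixel-wise
```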
3.5 Balancing Contrast Enhancement Technique (BCET) BCET is a method for enhancing and balancing contrast in visual data using the histogram pattern [43]. A parabolic function derived from the picture data is used to arrive at the solution, and the basic shape of the histogram is not altered. Using a and b coordinates in an XY plane, the general parabolic functional form is defined as

b = x(a − y)² + z  (8)

where x, y and z are three coefficients calculated from the minimum, maximum and mean of the input and output picture values using the following formulae:

y = [Ipmax²(Oimean − Oimin) − Iimean(Oimax − Oimin) + Iimin²(Oimax − Oimean)] / (2[Ipmax(Oimean − Oimin) − Iimean(Oimax − Oimin) + Iimin(Oimax − Oimean)])  (9)

x = (Oimax − Oimin) / [(Ipmax − Iimin)(Ipmax + Iimin − 2y)]  (10)

z = Oimin − x(Iimin − y)²  (11)

where 'Iimin' is the minimal value of the input, 'Ipmax' is the maximal value of the input, 'Iimean' is the mean value of the input, 'Oimin' is the minimal value of the output, 'Oimax' is the maximal value of the output, and 'Oimean' is the mean value of the output pictures. The differences between the various image enhancement approaches are depicted in Fig. 3, and the enhancement methods with their corresponding performance parameter, the mean square error (MSE), are given in Table 1.
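A minimal BCET sketch following the cleaned-up Eqs. (8)–(11) is shown below. The target output minimum, maximum and mean are assumed values, not settings taken from the paper.

```python
import numpy as np

def bcet(img, out_min=0.0, out_max=255.0, out_mean=110.0):
    """Balanced contrast enhancement via the parabolic mapping of Eq. (8)."""
    img = img.astype(np.float64)
    i_min, i_max, i_mean = img.min(), img.max(), img.mean()
    # Coefficient y, Eq. (9)
    num = (i_max ** 2 * (out_mean - out_min)
           - i_mean * (out_max - out_min)
           + i_min ** 2 * (out_max - out_mean))
    den = 2.0 * (i_max * (out_mean - out_min)
                 - i_mean * (out_max - out_min)
                 + i_min * (out_max - out_mean))
    y = num / den
    # Coefficients x and z, Eqs. (10) and (11)
    x = (out_max - out_min) / ((i_max - i_min) * (i_max + i_min - 2.0 * y))
    z = out_min - x * (i_min - y) ** 2
    return x * (img - y) ** 2 + z
```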
4 Image Segmentation Techniques The method of separating an image into a number of parts is called image segmentation [27, 29, 56]. Segmentation plays a very important role in medical image processing. In computer-aided systems, segmentation is performed after pre-processing: certain features are separated or segmented and then fed to the next stage, image classification.
Fig. 3 Various enhancement methods applied to five lung CT images and their histograms: (a) HE, (b) CLAHE, (c) BCET, (d) image invert and (e) Gamma Correction
Fig. 3 (continued)

Table 1 Various image enhancement methods with the corresponding performance parameter (MSE) for five different images

Enhancement methods     | MSE for image 1 | MSE for image 2 | MSE for image 3 | MSE for image 4 | MSE for image 5
HE                      | 51.23           | 50.58           | 56.3            | 46.87           | 50.52
CLAHE                   | 49.98           | 47.713          | 51.934          | 49.269          | 50.41
BCET                    | 50.12           | 49.06           | 53.42           | 49.98           | 49.96
Image invert/complement | 48.55           | 47.73           | 50.21           | 47.32           | 49.27
Gamma Correction        | 46.26           | 45.45           | 41.2            | 49.04           | 47.54
Fig. 4 Block diagram for image segmentation: template matching, texture matching, thresholding and boundary detection
Segmentation can be performed using several methods; their block diagram is shown in Fig. 4. Various segmentation methods exist to extract different regions of interest, such as tumours, lesions and boundaries, from images. Different segmentation methods [57] were used here and their accuracy levels were compared. The methods used are described below.
4.1 Adaptive Image Thresholding (AIT) The threshold for grayscale images is generated by the AIT method [58] using a locally adaptive approach: the threshold value is determined from the local mean intensity in the neighbourhood of each pixel. A sensitivity factor, a scalar in a fixed range (typically [0, 1]), indicates how readily pixels are thresholded as foreground, and the method computes the threshold from this sensitivity value [27, 59].
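The sketch below performs local-mean adaptive thresholding with OpenCV as one possible realisation of the method described above; the block size and offset are assumed values that play a role similar to the sensitivity factor.

```python
import cv2

def adaptive_threshold(gray_8bit, block_size=35, offset=5):
    """Threshold each pixel against the mean of its block_size neighbourhood."""
    return cv2.adaptiveThreshold(gray_8bit, 255,
                                 cv2.ADAPTIVE_THRESH_MEAN_C,
                                 cv2.THRESH_BINARY, block_size, offset)
```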
4.2 Flood-Fill (FF) Segmentation In this method, regions of the image with similar gray values are picked: areas of equal intensity levels are grown on the grayscale image. The row and column indices of a seed pixel specify the starting point. The function returns a binary mask, BW, indicating the pixels that are 8-connected to the seed pixel and have similar intensities. Here, a reference image is selected, the mean intensity is computed against the other images, and a scaling factor is determined from it [60].
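A hedged flood-fill sketch is given below: it grows the region of pixels 8-connected to a seed whose intensities lie within a tolerance, returning the binary mask BW mentioned above. The seed coordinates and tolerance are illustrative values.

```python
import cv2
import numpy as np

def flood_fill_region(gray_8bit, seed_rc, tol=10):
    """Return the binary mask of the region grown from seed_rc = (row, col)."""
    h, w = gray_8bit.shape
    mask = np.zeros((h + 2, w + 2), np.uint8)   # floodFill requires a padded mask
    flags = 8 | cv2.FLOODFILL_MASK_ONLY | (255 << 8)
    cv2.floodFill(gray_8bit.copy(), mask, (seed_rc[1], seed_rc[0]), 0,
                  loDiff=tol, upDiff=tol, flags=flags)
    return mask[1:-1, 1:-1]
```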
4.3 Fast Marching (FM) Approach FM approach is used for binary picture segmentation. From one pixel to another pixel, the computer selects the spot with the shortest arrival time based on ongoing
computations. The FM approach is a simplified variant of level-set evolution: the differential equation is driven by a positive speed term, so the resulting level-set contour only grows over time. In practice, this algorithm can be utilised as a powerful gateway [61].
4.4 Grayscale Intensity Difference (GSID) Based Segmentation In GSID-based segmentation, a weight is determined for each pixel of the input image. The weights are computed from the difference between the pixel intensity and a reference grayscale intensity: a small difference yields a large weight, while a large fluctuation yields a low weight [62].
4.5 Watershed Segmentation (WS) WS is performed using markers, and the image I is treated as a height function. Higher values of I indicate the presence of boundaries in the original data. Watersheds can thus be thought of as the final or intermediate step of a hybrid segmentation approach, with an edge-strength map serving as the initial segmentation. The watershed method is constrained by an adequate marking function to avoid over-segmentation [63]. Table 2 depicts the performance parameter analysed for the various segmentation methods.
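The sketch below shows one common marker-controlled watershed pipeline (distance transform plus local maxima as markers); it is an assumed realisation for illustration, not the exact marking function used in [63], and the minimum peak distance is arbitrary.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def marker_watershed(binary_mask):
    """Marker-controlled watershed on a binary nodule mask."""
    distance = ndi.distance_transform_edt(binary_mask)
    peaks = peak_local_max(distance, min_distance=7,
                           labels=binary_mask.astype(int))
    markers = np.zeros(distance.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)  # one label per peak
    return watershed(-distance, markers, mask=binary_mask)
```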
5 Performance Evaluation In computer-aided detection and diagnosis systems, performance evaluation [64] plays a very significant role. There are various performance indices, some of which are briefly described below.

5.1 Mean Squared Error (MSE) and Peak Signal to Noise Ratio (PSNR) MSE and PSNR characterize the compression quality of the image. The cumulative squared error between the processed image and the original image is represented by MSE, whereas PSNR gives the measure of the peak error.
Table 2 Various image segmentation methods for image 1 with the corresponding performance parameter

Segmentation methods            | Accuracy (%)
Adaptive image threshold        | 79.83
Flood-fill technique            | 76.03
Fast marching method            | 90.12
Gray scale intensity difference | 83.56
Watershed segmentation          | 93.02
The lower the value of MSE, the lower the error. To compute PSNR, MSE is first calculated using the equations shown next:

MSE = Σ_{x,y} [a(x, y) − b(x, y)]² / (x · y)  (12)

PSNR = 10 log₁₀(F² / MSE)  (13)

where F is the maximum possible pixel intensity of the image.
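Equations (12) and (13) translate directly into a few lines of NumPy; here F is taken as 255 for 8-bit images.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images, Eq. (12)."""
    return np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio, Eq. (13), with F = peak."""
    return 10.0 * np.log10(peak ** 2 / mse(a, b))
```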
5.2 Accuracy, Sensitivity and Specificity

5.2.1 Accuracy
Accuracy measures how correctly healthy and affected images are differentiated; it is the proportion of correctly classified cases. Mathematically, it is given by

Accuracy = (TP + TN) / (TP + TN + FP + FN)  (14)

5.2.2 Sensitivity
Sensitivity measures the ability to identify the affected cases; it is the proportion of true positives among patients. It is given mathematically as

Sensitivity = TP / (TP + FN)  (15)

5.2.3 Specificity
Specificity measures how correctly the healthy cases are identified, i.e., the proportion of true negatives among healthy cases. It is given by

Specificity = TN / (TN + FP)  (16)

where TP is true positive, FP is false positive, FN is false negative and TN is true negative.
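A small sketch computing Eqs. (14)–(16) from binary prediction and ground-truth masks is shown below (edge cases such as empty masks are not handled).

```python
import numpy as np

def segmentation_metrics(pred, target):
    """Accuracy, sensitivity and specificity from binary masks, Eqs. (14)-(16)."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.sum(pred & target)
    tn = np.sum(~pred & ~target)
    fp = np.sum(pred & ~target)
    fn = np.sum(~pred & target)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity
```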
6 Conclusions In this paper, five different enhancement methods and five different segmentation methods are compared using five lung CT images. Among the enhancement methods, Gamma Correction is observed to have the lowest mean square error of 46.26 for image 1, and among the five segmentation approaches, the watershed algorithm appears to deliver the most accurate segmentation, with an accuracy of 93.02% for image 1. Early diagnosis of lung nodules is crucial for increasing the life span of lung cancer patients. In future work, more powerful algorithms can be chosen, and this study can be extended by applying various methods to extract the features needed for lung nodule detection and for the prediction and classification of lung cancer.
References 1. Sung H et al (2021) Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer J clin 71(3):209–249 2. Pilleron S et al (2021) Estimated global cancer incidence in the oldest adults in 2018 and projections to 2050. Int J Cancer 148(3):601–608 3. Cao W et al (2021) Changing profiles of cancer burden worldwide and in China: a secondary analysis of the global cancer statistics 2020. Chin Med J 134(7):783 4. Piñeros M et al (2021) Scaling up the surveillance of childhood cancer: a global roadmap. JNCI: J Natl Cancer Inst 113(1):9–15 5. Siegel RL et al (2021) Cancer statistics, 2021. CA: A Cancer J Clin 71(1):7–33 6. Bade BC, Cruz CSD (2020) Lung cancer 2020: epidemiology, etiology, and prevention. Clin Chest Med 41(1):1–24 7. Ozdemir O, Russell RL, Berlin AA (2019) A 3D probabilistic deep learning system for detection and diagnosis of lung cancer using low-dose CT scans. IEEE Trans Med Imaging 39(5):1419– 1429 8. Sebastian K, Devi S Computer aided detection of lung cysts using convolutional neural network (CNN). Turk J Physiotherapy Rehabil 32:3 9. Jungblut L et al (2022) First performance evaluation of an artificial intelligence-based computeraided detection system for pulmonary nodule evaluation in dual-source photon-counting detector CT at different low-dose levels. Invest Radiol 57(2):108–114 10. Park S et al (2021) Computer-aided detection of subsolid nodules at chest CT: improved performance with deep learning–based CT section thickness reduction. Radiology 299(1):211– 219 11. Hsu H-H et al (2021) Performance and reading time of lung nodule identification on multidetector CT with or without an artificial intelligence-powered computer-aided detection system. Clin Radiol 12. Perl RM et al (2021) Can a novel deep neural network improve the computer-aided detection of solid pulmonary nodules and the rate of false-positive findings in comparison to an established machine learning computer-aided detection? Invest Radiol 56(2):103–108 13. Choi SY et al (2021) Evaluation of a deep learning-based computer-aided detection algorithm on chest radiographs: case–control study. Medicine 100(16) 14. Hwang EJ et al (2021) Implementation of the cloud-based computerized interpretation system in a nationwide lung cancer screening with low-dose CT: comparison with the conventional reading system. Eur Radiol 31(1):475–485
15. Minaee S et al (2020) Deep-covid: predicting covid-19 from chest x-ray images using deep transfer learning. Med Image Anal 65:101794 16. Zheng S et al (2021) Deep convolutional neural networks for multiplanar lung nodule detection: improvement in small nodule identification. Med Phys 48(2):733–744 17. Gao J et al (2021) Lung nodule detection using convolutional neural networks with transfer learning on CT images. Comb Chem High Throughput Screening 24(6):814–824 18. Abbas A, Abdelsamea MM, Gaber MM (2021) Classification of COVID-19 in chest X-ray images using DeTraC deep convolutional neural network. Appl Intell 51(2):854–864 19. Shariaty F et al (2022) Texture appearance model, a new model-based segmentation paradigm, application on the segmentation of lung nodule in the CT scan of the chest. Comput Biol Med 140:105086 20. Yar H et al (2021) Lung nodule detection and classification using 2D and 3D convolution neural networks (CNNs). Artif Intell Internet of Things 2021:365–386 21. Chenyang L, Chan S-C (2020) A joint detection and recognition approach to lung cancer diagnosis from CT images with label uncertainty. IEEE Access 8:228905–228921 22. Tavassoli F (2003) Pathology and genetics of tumours of the breast and female genital organs. International Agency for Research on Cancer, World Health Organization 23. Raza K (2014) Clustering analysis of cancerous microarray data. J Chem Pharm Res 6(9):488– 493 24. Daoud M, Mayo M (2019) A survey of neural network-based cancer prediction models from microarray data. Artif Intell Med 97:204–214 25. Abdillah B, Bustamam A, Sarwinda D (2017) Image processing based detection of lung cancer on CT scan images. J Phys: Conf Ser. IOP Publishing 26. Verma S et al (2013) Analysis of image segmentation algorithms using MATLAB. In: Proceedings of the third international conference on trends in information, telecommunication and computing. Springer 27. Indumathi R, Vasuki R (2021) Segmentation of lung cancer from CT image—A comparative analysis. Mater Today: Proc 28. Huidrom R, Chanu YJ, Singh KM (2017) A fast automated lung segmentation method for the diagnosis of lung cancer. In: TENCON 2017–2017 IEEE Region 10 conference. IEEE 29. Uzelaltinbulat S, Ugur B (2017) Lung tumor segmentation algorithm. Procedia Comput Sci 120:140–147 30. Dutande P, Baid U, Talbar S (2021) LNCDS: A 2D–3D cascaded CNN approach for lung nodule classification, detection and segmentation. Biomed Signal Process Control 67:102527 31. Tiwari L et al (2021) Detection of lung nodule and cancer using novel Mask-3 FCM and TWEDLNN algorithms. Measurement 172:108882 32. Kaggle Homepage. https://www.kaggle.com/fanbyprinciple/luna-lung-cancer-dataset 33. Gu D, Liu G, Xue Z (2021) On the performance of lung nodule detection, segmentation and classification. Comput Med Imaging Graph 89:101886 34. Lin J-S et al (1996) Reduction of false positives in lung nodule detection using a two-level neural classification. IEEE Trans Med Imaging 15(2):206–217 35. Su Y, Li D, Chen X (2021) Lung nodule detection based on faster R-CNN framework. Comput Methods Programs Biomed 200:105866 36. Rey A, Arcay B, Castro A (2021) A hybrid CAD system for lung nodule detection using CT studies based in soft computing. Expert Syst Appl 168:114259 37. Haibo L et al (2021) An improved yolov3 algorithm for pulmonary nodule detection. In: 2021 IEEE 4th Advanced information management, communicates, electronic and automation control conference (IMCEC). IEEE 38. 
Zhang Y et al (2021) Lung nodule detectability of artificial intelligence-assisted CT image reading in lung cancer screening. Artif Intell (AI) 6:7 39. Gürsoy Çoruh A et al (2021) A comparison of the fusion model of deep learning neural networks with human observation for lung nodule detection and classification. Br J Radiol 94:20210222 40. Wu L, Li X (2021) Research on early screening of lung cancer based on artificial intelligence. In: Proceedings of the 2021 international conference on bioinformatics and intelligent computing
41. Shi J et al (2021) Comparative analysis of pulmonary nodules segmentation using multiscale residual U-Net and fuzzy C-means clustering. Comput Methods Programs Biomed 209:106332 42. Tawsifur R et al (2021) Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images 43. Rahman T et al (2021) Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images. Comput Biol Med 132:104319 44. Acharya UK, Kumar S (2021) Genetic algorithm based adaptive histogram equalization (GAAHE) technique for medical image enhancement. Optik 230:166273 45. Kaur K, Jindal N, Singh K (2021) Fractional Fourier Transform based Riesz fractional derivative approach for edge detection and its application in image enhancement. Signal Process 180:107852 46. Liu J et al (2021) Artificial intelligence-based image enhancement in pet imaging: noise reduction and resolution enhancement. PET Clin 16(4):553–576 47. Khairudin NAA et al (2021) Improvising non-uniform illumination and low contrast images of soil transmitted helminths image using contrast enhancement techniques. In: Proceedings of the 11th national technical seminar on unmanned system technology 2019. Springer 48. Wu Z, Zhou Q, Wang F (2021) Coarse-to-fine lung nodule segmentation in CT images With image enhancement and dual-branch network. IEEE Access 9:7255–7262 49. Murugesan M et al (2021) A Hybrid deep learning model for effective segmentation and classification of lung nodules from CT images. J Intell Fuzzy Syst 2021(Preprint), pp 1–13 50. Liang J et al (2021) Reducing false-positives in lung nodules detection using balanced datasets. Front Public Health 9:517 51. Lai KD, Nguyen TT, Le TH (2021) Detection of lung nodules on CT images based on the convolutional neural network with attention mechanism. Ann Emerg Technol Comput (AETiC) 5(2) 52. Akter O et al (2021) Lung cancer detection using enhanced segmentation accuracy. Appl Intell 51(6):3391–3404 53. Zimmerman JB et al (1988) An evaluation of the effectiveness of adaptive histogram equalization for contrast enhancement. IEEE Trans Med Imaging 7(4):304–312 54. Rao Y, et al (2021) Global filter networks for image classification. Adv Neural Inf Process Syst 34 55. Veluchamy M, Subramani B (2019) Image contrast and color enhancement using adaptive gamma correction and histogram equalization. Optik 183:329–337 56. Shaziya, H., K. Shyamala, and R. Zaheer. Automatic lung segmentation on thoracic CT scans using U-Net convolutional network. in 2018 International conference on communication and signal processing (ICCSP). 2018. IEEE. 57. Liu X et al (2021) A review of deep-learning-based medical image segmentation methods. Sustainability 13(3):1224 58. Trivizakis E et al (2021) A neural pathomics framework for classifying colorectal cancer histopathology images based on wavelet multi-scale texture analysis. Sci Rep 11(1):1–10 59. Atya HB et al (2021) Lung cancer TTFields treatment planning sensitivity to errors in torso segmentation. AACR 60. Roychoudhury A, Missura M, Bennewitz M (2021) Plane segmentation in organized point clouds using flood fill. In: 2021 IEEE International conference on robotics and automation (ICRA). IEEE 61. Savic M et al (2021) Lung nodule segmentation with a region-based fast marching method. Sensors 21(5):1908 62. Zhang K et al (2021) Nucleus image segmentation method based on GAN network and FCN model 63. 
Nurçin FV, Imanov E (2021) Selective hole filling of red blood cells for improved markercontrolled watershed segmentation. Scanning 2021 64. Pandian AP (2021) Performance evaluation and comparison using deep learning techniques in sentiment analysis. J Soft Comput Paradigm (JSCP) 3(02):123–134
Fuzzy C-Means and Fuzzy Cheetah Chase Optimization Algorithm M. Goudhaman, S. Sasikumar, and N. Vanathi
Abstract Clustering problems with multiple classes and ambiguities have been handled by fuzzy clustering for decades in real-world applications. Among the most popular algorithms, the FCM algorithm uses fuzzy clustering techniques and is simple to implement. Due to its sensitivity to initialization, the Fuzzy C-Means method can readily become trapped in local optima. The cheetah chase algorithm (CCA) is a global optimization algorithm for solving a variety of optimization problems. In this paper, we present a hybrid method, fuzzy cheetah chase optimization for fuzzy clustering (FCCO), combined with Fuzzy C-Means to exploit the benefits of both. Compared with FCCO and FCM alone, the recommended hybrid method proves capable, and the outcomes of our studies yield motivating findings. Keywords Fuzzy Clustering · Fuzzy C Means method · Cheetah Chase Algorithm
1 Introduction Clustering is a widely used learning technique in ML that involves grouping data objects into distinct sets called clusters, such that objects within a cluster are more closely related to each other than to objects in other clusters. These approaches are used in data mining (Tan et al. [4]), machine learning (Alpaydin [1]), pattern recognition (Webb [5]) and other areas. Clustering can be hard or soft. An example of a hard clustering method is K-Means clustering, which divides data items into N clusters, where the value of N is determined by the application's
M. Goudhaman (B) CSE, Saveetha Institute of Medical and Technical Sciences—[SIMATS], Chennai, India e-mail: [email protected] S. Sasikumar Faculty of CSE, Saveetha Engineering College, Chennai, India N. Vanathi Faculty of Science and Humanity, KCG College of Technology, Chennai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_30
purpose. This method is unsuitable for real-world data sets with no clear boundaries between clusters. Following Lotfi Zadeh's presentation of fuzzy theory, researchers have focused on the implementation of fuzzy logic in clustering. Fuzzy algorithms allow data objects to be partially allocated to several clusters; the degree of membership in a fuzzy cluster is determined by calculating the difference between the data object and the cluster's center. The FCM algorithm was developed by Bezdek [2] and has been the most extensively used fuzzy clustering algorithm for the last few decades. The cheetah chase algorithm (CCA) (Goudhaman [3]) is an optimization algorithm for solving a variety of optimization issues. Here, we present FCM-FCCOA, a hybrid approach that uses the FCM and FCCO algorithms. According to the experimental results, FCM-FCCOA outperforms the FCM and FCCO algorithms on five real data sets. The paper is structured as follows: the FCM is described in the second section, the FCCO algorithm for clustering in the third section, our proposed hybrid approach in the fourth section, and the experimental results in the fifth section. The conclusion of the suggested work is presented in the last section.
2 FCM Algorithm The FCM algorithm divides a group of n items A = {a_1, a_2, a_3, …, a_n} into c (1 ≤ c ≤ n) fuzzy clusters with cluster centers Z = {z_1, z_2, …, z_c} in the R^N space. An n × c matrix α is formed with n rows and c columns, where n denotes the number of data objects and c denotes the number of clusters. The component in the ith row and jth column of α specifies the membership of the ith object in the jth cluster. The matrix α has the following properties:

α_ij ∈ [0, 1], ∀i = 1, 2, …, n, ∀j = 1, 2, …, c  (1)

Σ_{j=1}^{c} α_ij = 1, ∀i = 1, 2, …, n  (2)

0 < Σ_{i=1}^{n} α_ij < n, ∀j = 1, 2, …, c  (3)

FCM minimizes the objective function

J_m = Σ_{i=1}^{n} Σ_{j=1}^{c} α_ij^m d_ij²  (4)

d_ij = ||a_i − z_j||  (5)

where the fuzzification exponent m (m > 1) regulates the ambiguity of the generated clusters and d_ij is the Euclidean distance of object a_i to the cluster center z_j. The centroid z_j of the jth cluster is found using Eq. (6):

z_j = Σ_{i=1}^{n} α_ij^m a_i / Σ_{i=1}^{n} α_ij^m  (6)

The FCM is an iterative algorithm and works as follows (Bezdek [2]):

Fuzzy C-Means Algorithm
1. Initialize the membership values α_ij (i = 1, …, n; j = 1, …, c) for a chosen m (m > 1);
2. Find the cluster centers z_j;
3. Calculate the Euclidean distances d_ij;
4. Update the membership function:

α_ij = 1 / Σ_{k=1}^{c} (d_ij / d_ik)^{2/(m−1)}  (7)

5. If the algorithm has not converged, go to step 2. Many halting rules can be employed: the method can be stopped when there is only a small variation in the centroid values or when the objective function, Eq. (4), can no longer be minimized. The FCM method is prone to local optima because of its sensitivity to initial values.
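For concreteness, a minimal NumPy sketch of steps 1–5 (using Eqs. 6 and 7) is given below; it is a generic FCM implementation, not the code used for the experiments in Sect. 5.

```python
import numpy as np

def fcm(data, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Fuzzy C-Means on an (n, d) array; returns memberships (n, c) and centers (c, d)."""
    rng = np.random.default_rng(seed)
    alpha = rng.random((data.shape[0], c))
    alpha /= alpha.sum(axis=1, keepdims=True)          # rows sum to 1, Eq. (2)
    for _ in range(max_iter):
        am = alpha ** m
        z = am.T @ data / am.sum(axis=0)[:, None]      # cluster centers, Eq. (6)
        d = np.linalg.norm(data[:, None, :] - z[None, :, :], axis=2) + 1e-12
        new_alpha = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)
        if np.max(np.abs(new_alpha - alpha)) < tol:    # simple halting rule
            alpha = new_alpha
            break
        alpha = new_alpha
    return alpha, z
```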
3 Algorithm of Cheetah Chase Goudhaman [3] proposed the cheetah chase optimization algorithm (CCOA), an iteration-based algorithm inspired by the cheetah's chase to catch its prey using initial velocity, speed and acceleration. The algorithm operates on a population of cheetahs pursuing prey, whose positions reflect possible solutions of the considered problem, with search-space velocities set at random. The cheetahs' movement velocities and positions are updated at each iteration in order to determine the best location in the search. In addition, a fitness value is used to evaluate each cheetah's position at each iteration. The personal best position pbest and the global best position gbest are used to determine the speed of each cheetah's movement. The cheetah's velocity and position are updated using the equations below:

B(q + 1) = i·B(q) + k_1 d_1 (pbest(q) − P(q)) + k_2 d_2 (gbest(q) − P(q))  (8)

P(q + 1) = P(q) + B(q + 1)  (9)

where B and P represent the cheetah's velocity and position, respectively, d_1 and d_2 are random values in the range [0, 1], i is the inertia weight, and k_1 and k_2 are acceleration coefficients that control the impact of the pbest and gbest values of the cheetahs' hunting process on the search.

For fuzzy clustering, the fuzzy cheetah chase optimization algorithm (FCCOA) is a newly proposed approach intended to describe fuzzy relationships among cheetah location and velocity data. In the FCCO algorithm, the cheetah's position P depicts the fuzzy association between a set of data items, o = {o_1, o_2, …, o_n}, and the set of cluster centers, Z = {z_1, z_2, …, z_c}. It is represented as

X = [α_11 … α_1c; … ; α_n1 … α_nc]  (10)
X satisfies the constraints specified in Eqs. (1) and (2), where α_ij is the membership of object i in cluster j. As a result, each cheetah's position matrix has the same form as the fuzzy membership matrix in the FCM algorithm. In addition, each cheetah's velocity is expressed as a matrix with n rows and c columns, with elements in the range [−1, 1]. Through matrix operations we obtain Eqs. (11) and (12) below for the velocity and position updates of the cheetahs:

B(q + 1) = i·B(q) + k_1 d_1 (pbest(q) − P(q)) + k_2 d_2 (gbest(q) − P(q))  (11)

P(q + 1) = P(q) + B(q + 1)  (12)

Once the position matrix is updated, the constraints given in Eqs. (1) and (2) may be violated, so the position matrix must be normalized. First, the negative elements of the matrix are set to zero; rows that become all zero are re-seeded with arbitrary numbers in the range (0, 1). The matrix then undergoes the following transformation, which restores the constraints:

P_normal = [ α_ij / Σ_{j=1}^{c} α_ij ], i = 1, …, n, j = 1, …, c  (13)
K Jm
(14)
Fuzzy C-Means and Fuzzy Cheetah Chase Optimization Algorithm
435
K denotes constant and Jm denotes objective function of Eq. (4). Lesser is Jm , the effect of clustering will be better & the individual fitness f (P) will be higher. The FCCO algorithm described for the fuzzy clustering problem.
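The sketch below shows one velocity/position update of a single cheetah together with the normalization of Eq. (13) and the fitness of Eq. (14); the inertia weight and acceleration coefficients follow the settings reported in Sect. 5.1, and the rest is a hedged illustration rather than the authors' code.

```python
import numpy as np

def fcco_step(P, B, pbest, gbest, inertia=0.9, k1=2.0, k2=2.0, rng=None):
    """One update of a cheetah whose position P is an (n, c) membership matrix."""
    rng = rng if rng is not None else np.random.default_rng()
    d1, d2 = rng.random(), rng.random()
    B = inertia * B + k1 * d1 * (pbest - P) + k2 * d2 * (gbest - P)   # Eq. (11)
    P = P + B                                                          # Eq. (12)
    P = np.clip(P, 0.0, None)                       # negative entries set to zero
    dead = P.sum(axis=1) == 0                       # all-zero rows are re-seeded
    P[dead] = rng.random((dead.sum(), P.shape[1]))
    P = P / P.sum(axis=1, keepdims=True)            # normalization, Eq. (13)
    return P, B

def fitness(P, data, m=2.0, K=1.0):
    """Fitness f(P) = K / J_m, Eq. (14), with J_m the FCM objective of Eq. (4)."""
    am = P ** m
    z = am.T @ data / am.sum(axis=0)[:, None]
    d2 = np.sum((data[:, None, :] - z[None, :, :]) ** 2, axis=2)
    return K / np.sum(am * d2)
```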
4 Hybrid Fuzzy C-Means and Fuzzy Cheetah Chase Optimization Algorithm for Clustering Problem

Algorithm 2. Fuzzy Cheetah Chase Optimization for Fuzzy Clustering
A1. Initialize the parameters, including the population size N, k_1, k_2, i, and the iteration limit.
A2. Build a hunting process with N cheetahs (B, P, gbest, pbest and the n × c matrix).
A3. Initialize P, B and pbest for each cheetah and gbest for the whole hunting process.
A4. Compute the cluster centers for every cheetah using Eq. (6).
A5. Compute the fitness value of every cheetah using Eq. (14).
A6. Compute pbest for every cheetah.
A7. Compute gbest for the whole hunting process.
A8. Update the velocity matrix of each cheetah using Eq. (11).
A9. Update the position matrix of each cheetah using Eq. (12).
A10. If the ending condition is not met, go to A4. The ending condition is reaching the maximum number of iterations or no improvement in gbest over several iterations.
4.1 Hybrid FCM with FCCO Because it requires fewer function evaluations, the FCM is quicker than the FCCO, although it frequently falls into local optima. In this study, the FCM method is combined with the FCCO algorithm to generate the FCM-FCCO hybrid clustering algorithm, which retains the benefits of both the FCM and CCO algorithms. Every few generations, the FCM-FCCO algorithm applies FCM to the cheetahs in the cheetah chase hunting process, improving the fitness value of each cheetah. For the fuzzy clustering problem, the FCM-FCCO algorithm proceeds as follows:

Algorithm
Step 1: Set the FCCO and FCM parameters, including the population size N, k_1, k_2, i and m.
Step 2: With N cheetahs, create a cheetah chase hunting process (P, B, gbest, pbest and the n × c matrix).
Step 3: Set up P, B and pbest for each cheetah, along with gbest.
M. Goudhaman et al.
Step 4: The FCCO algorithm: Step 4.1: Step 4.2: Step 4.3: Step 4.4: Step 4.5: Step 4.6: Step 4.7:
Compute the centroid for each cheetah using Eq. (6). Using Eq. 14, calculate the fitness value of each cheetah. For each cheetah, calculate pbest. Calculate gbest for the entire cheetah pursuit hunt. Compute each cheetah’s velocity matrix using Eq. (11). Using Eq. 12, calculate the position matrix of each cheetah. Go to step 4 if the FCCO ending condition is not arrived.
Step 5: The FCM algorithm Step 5.1: Using Eq. 6, calculate the centroid for each cheetah Step 5.2: Using Eq. 5, calculate the Euclidian distance dij Step 5.3: For each cheetah, use Eq. 7. Find the association function lij, Step 5.4: For each cheetah, pbest value to be calculated. Step 5.5: gbest value to be calculated for the cheetah pursuit hunting process Step 5.6: Return step 5, if the FCM ending condition is not arrived. Step 6: Return step 4 if the FCM–FCCO ending state is not met.
5 Experimental Results

5.1 Setting Parameters The parameters are chosen so that the performance of FCCO and FCM–FCCO can be fine-tuned. Based on the experimental results, the following settings are used to run these algorithms: k_1 = k_2 = 2.0, N = 10, inertia weight i = 0.9–0.1.
5.2 Results of Experiment Five different UCI machine learning data sets were used to evaluate the FCM, FCCO, and FCM-FCCO methods: • 625 cases with four attributes were collected for the balance scale data set, which was created to replicate psychological testing results. • Glass Identification, which comprises 225 objects and six different types of glasses, as well as ten qualities. • With 303 items and 75 types, the heart disease data set comprises 14 properties. • The Zoo data set comprises 17 features and consists of 101 objects of three different categories.
• The Ecoli data set, which has 336 objects divided into three types based on eight characteristics. These five data sets are classified as low, medium or high readings. The algorithms are written in the Python programming language. Table 1 summarizes the experimental outcomes of FCM, FCCO and FCM-FCCO; the numbers in this table are the values of the objective function. The table illustrates that, compared to the other two approaches, the hybrid FCM–FCCO obtained better results and can escape from local optima. Similarly, the test results reveal that when the number of objects is small, FCCO outperforms FCM, but when the data set is larger, FCM outperforms FCCO.
6 Conclusion The FCM is sensitive to initialization and quickly settles into a local optimum. The cheetah chase algorithm, on the other hand, is a stochastic instrument used to address numerous optimization problems. The inadequacies of FCM are addressed here by combining it with the fuzzy cheetah chase optimization technique. The proposed hybrid FCM-FCCO method was proven to be effective and revealed inspirational outcomes on five well-known data sets: balance scale, glass identification, heart disease, zoo, and Ecoli.
Table 1 FCM, FCCO, and FCM–FCCO experimental outcomes (objective function values; Wrst = worst, Avrg = average, Bst = best)

Properties (n, c, d)        | FCM: Wrst / Avrg / Bst         | FCCO: Wrst / Avrg / Bst        | FCM-FCCO: Wrst / Avrg / Bst
Balance scale (625, 3, 4)   | 72.45 / 71.25 / 68.44          | 69.75 / 67.42 / 66.36          | 63.96 / 63.55 / 63.19
Glass Id (225, 6, 10)       | 73.78 / 72.89 / 72.36          | 87.35 / 86.95 / 86.28          | 73.22 / 72.66 / 72.33
Heart Disease (303, 75, 14) | 2236.1 / 2214.2 / 2196.9       | 2751.1 / 2724.5 / 2704.8       | 2220.2 / 2198.8 / 2182.8
Zoo (101, 3, 17)            | 12,193.8 / 11,990.5 / 11,684.5 | 12,252.2 / 11,530.8 / 11,175.5 | 11,219.2 / 10,604.0 / 10,412.2
Ecoli (336, 3, 8)           | 3550.5 / 3536.8 / 3518.8       | 4192.2 / 4096.8 / 4026.5       | 3533.5 / 3486.8 / 3416.8
References 1. Alpaydin E (2004) Introduction to machine learning. MIT Press, Cambridge 2. Bezdek J (1974) Fuzzy mathematics in pattern classification. Ph.D. thesis. Cornell University, Ithaca, NY 3. Goudhaman M (2020) Cheetah chase algorithm (CCA): a nature inspired metaheuristic algorithm. IJET, Int J Eng Technol 7(3):1804–1811. https://doi.org/10.14419/ijet.v7i3.18.14616 4. Tan PN, Steinbach M, Kumar V (2005) Introduction to data mining. Addison-Wesley, Boston 5. Webb AR (2002) Statistical pattern recognition. John Wiley & Sons, ISBNs: 0-470-84513-9 (HB); 0-470-84514-7 (PB)
PASPP Medical Transformer for Medical Image Segmentation Hong-Phuc Lai, Thi-Thao Tran, and Van-Truong Pham
Abstract Medical Transformer (MedT) has recently attracted much attention in medical segmentation because it can model the global context of the image and works well even with small datasets. However, MedT has some limitations, such as the big disparity between the information of the encoder and the decoder, the low resolution of input images it can effectively process, and the lack of ability to recognize contextual information at multiple scales. To address these issues, in this study we propose an architecture that adds progressive atrous spatial pyramid pooling (PASPP) to the MedT architecture and uses pointwise atrous convolution layers instead of the AvgPooling layers in MedT to obtain more robust pooling operations. In addition, we change the convolution stem of MedT to help the model accept a higher input resolution with the same computational complexity. The proposed model is evaluated on two medical image segmentation datasets, GlaS and Data Science Bowl 2018. Experimental results show that the proposed approach outperforms other state-of-the-art methods. Keywords Long-term dependency · Medical Transformer · Atrous spatial pyramid pooling
1 Introduction The main goal of medical image segmentation is to distinguish the valuable object concerned and the background of medical images. It has significance in medical diagnosis and disease prediction. Over the past decade, along with the remarkable evolution of convolutional neural networks and deep learning techniques [1, 2], constructing an automatic, fast, and precise medical image segmentation system [3, 4] is among the most challenging and interesting computer vision tasks. In [5], He et al. introduced H.-P. Lai · T.-T. Tran (B) · V.-T. Pham Department of Automation Engineering, School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Vietnam e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_31
the skip-connection mechanism [5, 6], which improves the gradient flow for deep neural networks and enables deeper models to learn better and converge. The ResNet [5] model with skip-connection layers achieved outstanding results in the ImageNet challenge. In particular, FCN [7] (fully convolutional network) came up with the idea of using an encoder-decoder network with the encoder as a feature extractor: the encoder produces rich semantic and spatial information from the input image, and the decoder synthesizes the high-level features to make predictions. Inspired by that, U-Net [8] and its variants are encoder-decoder network architectures [9, 10] specifically designed for medical image segmentation tasks. Owing to the skip-connection mechanism, information from the encoder is transferred to the decoder before deconvolution, which prevents the U-Net model from lacking low-level information. Convolution is a core building block for these networks and others in computer vision, and one of its most important properties is locality. Locality reduces the number of parameters and the total number of operations of a convolution. However, some works have shown that constructing long-term correlations between pixels of a feature map is beyond the capacity of convolution: a convolution incorporates only a small local group of input pixels to produce its outputs, so it tends to capture the local dependencies of those pixels rather than the non-local context of the entire image. To overcome the lack of global information, commonly used approaches are atrous convolution [11, 12], which enlarges the receptive field, or the attention mechanism [13], which forces the model to notice the important parts of the input. However, these methods have not produced truly effective results in medical image segmentation because of the size of medical datasets. Attention [14] is a mechanism first used in Natural Language Processing (NLP) applications; it helps the model focus on the noticeable parts of the input sequence when making predictions. Transformers [15] were later introduced for NLP tasks, and it was found that by stacking multiple self-attention layers the model gains the ability to capture long-term relations in the input. Following the success of transformers in NLP, several works [16] in the computer vision field exploit transformers for image classification, object detection, semantic segmentation, and more. Applying naive transformers to image inputs requires a huge amount of computational resources, so they were often used only in layers near the end of the network. Several improvements of transformers for computer vision have been suggested, such as using a small local area to produce the output of one pixel (like convolution) or using the axial attention mechanism [17]. The advantage of atrous convolution over standard convolution is its ability to extract denser features and provide a larger receptive field with the same amount of computation. To capture objects at multiple scales in an image, pyramid architectures [18] were designed, particularly for object detection tasks. Inspired by that, several works apply an atrous spatial pyramid pooling (ASPP) [19] layer to deep neural networks as an alternative to a pyramid architecture that reduces computational resources. ASPP merges several atrous convolutions, each with a different dilation rate.
With different receptive fields, ASPP helps the model to scan almost all the objects in the image, which has value for capturing global contextual information from the image.
Medical Transformer [20], which was introduced recently, is a transformer-based architecture for medical segmentation. The superiority of MedT is its potential to model the global context of the image even with small datasets. However, MedT has some limitations, such as the big disparity between the information of the encoder and the decoder, the low resolution of input images it can effectively process, and its lack of ability to recognize contextual information at multiple scales. Motivated by the above reasons, we upgrade MedT with several additions, and the proposed model achieves dramatic improvements and outperforms modern methods. We evaluate the proposed model on two medical image datasets: (1) gland segmentation on histology images [21]; (2) nuclei segmentation of divergent images [22]. To summarize, our contributions in this work are: • We replace the decoder of MedT with a stronger decoder to balance contextual information between the encoder and the decoder. We also change the convolution stem of MedT so as to help the model accept a higher input resolution with the same computational complexity. • We use atrous convolution pyramid pooling to improve the capacity of the model to apprehend the global knowledge of the image. • We replace Average Pooling with a pointwise atrous convolution layer to perform a robust pooling operation.
2 Methodology

2.1 Related Work

Self-Attention: The self-attention mechanism operates on a given input feature map x ∈ R^{C_in × H × W}, where H is the height, W is the width and C_in is the number of channels, and produces an output feature map y ∈ R^{C_out × H × W}. The output y_ij ∈ R^{C_out} at position (i, j) is computed by the following equation:

y_ij = Σ_{h=1}^{H} Σ_{w=1}^{W} softmax(q_ij^T k_hw) v_hw  (1)
where queries q = W_Q x, keys k = W_K x and values v = W_V x are all linear mappings of the input feature map x, and the matrices W_Q, W_K, W_V ∈ R^{C_out × C_in} are trainable parameters. According to Eq. (1), self-attention on image features has two main disadvantages: the computational complexity and the handling of pixel positions in the image features. The computational complexity for an input feature x ∈ R^{C_in × H × W} is proportional to H × W per output position, which becomes very expensive if the resolution of the input feature is large and makes it infeasible to apply self-attention throughout a vision architecture. The positional information, meanwhile, has its own value for representing the texture of the image.
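To make the cost of Eq. (1) concrete, the sketch below implements plain 2D self-attention with 1 × 1 convolutions in PyTorch; it is a generic illustration of the mechanism, not the gated axial layer used by MedT.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention2d(nn.Module):
    """Every output position attends to all H*W positions, as in Eq. (1)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.q = nn.Conv2d(c_in, c_out, 1)
        self.k = nn.Conv2d(c_in, c_out, 1)
        self.v = nn.Conv2d(c_in, c_out, 1)

    def forward(self, x):
        b, _, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)   # (B, HW, C_out)
        k = self.k(x).flatten(2)                   # (B, C_out, HW)
        v = self.v(x).flatten(2).transpose(1, 2)   # (B, HW, C_out)
        attn = F.softmax(q @ k, dim=-1)            # (B, HW, HW) attention map
        y = attn @ v                               # aggregate the values
        return y.transpose(1, 2).reshape(b, -1, h, w)
```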
Gated Axial Attention: To make self-attention layers feasible and efficient for vision architectures, several improvements have been proposed that embed positional information and reduce the complexity while retaining a large receptive field [17]. Specifically, axial attention introduced in [17] consecutively applies the self-attention mechanism along the height axis and the width axis of the input feature map to decrease computational complexity while still propagating information from the entire feature map. One axial attention layer has the capacity to achieve long-term dependencies because it employs two self-attention layers: the first implements self-attention along the height axis and the second along the width axis. Each width-axis module and height-axis module is similar to a one-dimensional self-attention module and is augmented by relative positional information. The relative positional encodings are used for all queries, keys and values, are learned during training, and enable the modules to capture the texture and spatial information of the image. The output of the axial attention module along the width axis is computed by the following equation (a similar equation holds for the height axis):

y_ij = Σ_{w=1}^{W} softmax(q_ij^T k_iw + q_ij^T r_iw^q + k_iw^T r_iw^k)(v_iw + r_iw^v)  (2)
where the learnable r_iw^q ∈ R^{C_out} is the relative positional encoding for queries, the learnable r_iw^k ∈ R^{C_out} is the relative positional encoding for keys, and the learnable r_iw^v ∈ R^{C_out} is the relative positional encoding for values. However, for medical image segmentation, to control the impact of the positional encodings precisely, Valanarasu et al. [20] proposed Gated Axial attention, which uses learnable gating parameters for the positional bias. Because of the small datasets, it is problematic for the model to learn the positional components exactly. When the learnable positional information is not conducive to the prediction of the model, the model adjusts these additional parameters to reduce the impact of the positional encodings. The output of the gated axial attention module along the width axis is computed by the following equation (a similar equation holds for the height axis):

y_ij = Σ_{w=1}^{W} softmax(q_ij^T k_iw + G_Q q_ij^T r_iw^q + G_K k_iw^T r_iw^k)(G_V1 v_iw + G_V2 r_iw^v)  (3)
where G_Q, G_K, G_V1, G_V2 are learnable gating parameters for queries, keys and values, respectively.

Local-Global Training Strategy: Valanarasu et al. [20] proposed a network that uses two branches, i.e., a global branch which captures global context and a local branch which focuses on local information. The local branch network divides the original image of size S × S into 16 small patches of size S/4 × S/4, and those patches are processed by a shared-weight network. The output feature maps of all patches are concatenated at their original positions to produce the output feature map of the local branch. We use the gated axial attention layer to build the global branch network. We also preserve the original size of the input image and reduce the number
of transformer layers to prevent expensive computations. Through experimentation, it has been shown that the global branch network only needs a few blocks of the gated axial transformer to accomplish long-term connections between pixels. The output feature maps of the global branch and the local branch are summed and passed through one adjustment layer to predict the segmentation mask.
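A sketch of the local branch of this local-global (LoGo) strategy is shown below: the input is split into a 4 × 4 grid of patches, each patch is processed by a shared network, and the outputs are stitched back into place. patch_net stands for any module that preserves spatial size and channel count; this is an illustration of the idea, not the MedT code.

```python
import torch

def local_branch(x, patch_net, grid=4):
    """Process a (B, C, H, W) input patch-by-patch with a shared network."""
    b, c, h, w = x.shape
    ph, pw = h // grid, w // grid
    out = torch.zeros_like(x)   # assumes patch_net keeps the channel count
    for i in range(grid):
        for j in range(grid):
            patch = x[:, :, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
            out[:, :, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = patch_net(patch)
    return out
```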
2.2 Proposed PASPP Medical Transformer Inspired by the Medical Transformer (MedT) architecture [20] and advances in deep learning techniques, we propose an approach for better segmentation performance. In more detail, to increase the ability to identify objects of different sizes, we propose adding Progressive Atrous Spatial Pyramid Pooling (PASPP) [23] layers to MedT. In addition, we use pointwise atrous convolution layers instead of the AvgPooling layers in MedT to obtain more robust pooling operations. First, to enhance the ability to identify objects of varying sizes, we use PASPP layers in our model. This stems from the fact that, in medical image segmentation tasks, the objects of interest in the image have various sizes, so the model must have different receptive fields to understand the global context of the image.
Fig. 1 Progressive atrous pyramid pooling with 4 atrous convolution layers
446
H.-P. Lai et al.
illustrated in Fig. 1. To attain this desire, we use PASPP [23] in our model in the global branch network and before the adjustment layer. We feed input feature Fin of size C × H × W into four convolutional layers with 1 × 1 kernel size in parallel to get four ouput features with size C/4 × H × W . Then each branch passes the feature through four atrous convolutional layers and should notice that the dilation rate of each layer is different and depends on the input feature map size. The current activation map is processed by two convolutional layers and skip-connection paths to produce output feature Fout . Since we use feature maps with a smaller number of channels, the amount of computation is significantly reduced compared to normal ASPP. For the pointwise atrous convolution layer, it is a combination of two consecutive convolutional layers, the atrous convolutional layer and the pointwise convolutional layer. The atrous convolutional layer with dilation rate equals 2 executes spatial convolution for each input, then pointwise convolution is used to integrate the output from the atrous convolutional layer to produce output activation maps. In this work, we proposed using pointwise atrous convolution with dilation rate of 2 and stride of 2 for the network to perform pooling operations. It has the advantage of increasing the receptive field and helping the model achieve better results. In this study, we propose improvements for the MedT architecture utilizing the PASPP layers and the pointwise atrous convolution layer described above. The structure of the proposed PASPP MedT is shown in Fig. 2. In the proposed architecture, the
Fig. 2 Demonstration of the proposed PASPP MedT. a Overview of the network architecture, which uses the LoGo training strategy. b Proposed decoder, which contains two spatial convolutional layers and dropout with a drop rate of 0.1. c Gated axial multi-head transformer blocks, the basic building block of the encoder
In the proposed architecture, the input of each branch is passed through convolutional stem blocks. Each stem block consists of three convolutional layers, and after each convolutional layer we insert a batch normalization layer and an activation function. In the encoder of the local branch we use transformer layers built from axial attention layers without positional embeddings, whereas the global branch uses the gated axial transformer. The decoder of the original MedT uses only one convolution block; to prevent an information imbalance between the encoder and decoder, we strengthen the decoder with two convolution blocks, each followed by batch normalization and activation. The encoder bottleneck comprises one 1 × 1 convolutional layer followed by a normalization layer and one gated axial multi-head transformer block that operates consecutively on the height axis and the width axis. We also replace the AvgPooling layer of the original MedT with a pointwise atrous convolutional layer with dilation rate 2 and stride 2, and we use SiLU for all activation functions in the model. The global branch of MedT includes two encoder blocks and two decoder blocks, whereas the local branch is built from five encoder blocks and five decoder blocks.
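As a companion to the description above, here is a minimal PyTorch sketch of a PASPP-style block: four parallel 1 × 1 reductions to C/4 channels, one atrous convolution per branch with branch-specific dilation rates, per-branch skip connections, and a progressive merge back to C channels. The exact wiring of PASPP in [23] (and in our model) may differ in detail; the merge layers and default dilation rates here are illustrative assumptions.

```python
import torch
import torch.nn as nn


class PASPP(nn.Module):
    """Sketch of a progressive atrous pyramid pooling block (channels divisible by 4)."""

    def __init__(self, channels, rates=(1, 6, 12, 18)):
        super().__init__()
        c4 = channels // 4
        self.reduce = nn.ModuleList(nn.Conv2d(channels, c4, 1) for _ in rates)
        self.atrous = nn.ModuleList(
            nn.Conv2d(c4, c4, 3, padding=r, dilation=r) for r in rates)
        self.merge = nn.ModuleList(nn.Conv2d(2 * c4, 2 * c4, 1) for _ in range(2))
        self.project = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        feats = []
        for red, atr in zip(self.reduce, self.atrous):
            f = red(x)                       # 1x1 reduction to C/4 channels
            feats.append(atr(f) + f)         # atrous conv plus skip connection
        # progressively fuse adjacent branches, then project back to `channels`
        m1 = self.merge[0](torch.cat(feats[0:2], dim=1))
        m2 = self.merge[1](torch.cat(feats[2:4], dim=1))
        return self.project(torch.cat([m1, m2], dim=1))


paspp = PASPP(channels=64)
y = paspp(torch.randn(1, 64, 32, 32))        # -> shape (1, 64, 32, 32)
```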
2.3 Loss Function
We use the Sorensen–Dice (Dice) loss and the binary cross-entropy (BCE) loss between the prediction mask and the target mask as the objective function, expressed as

L_{BCE}(p, y) = -\frac{1}{t} \sum_{i=1}^{t} \big( y_i \log(p_i) + (1 - y_i)\log(1 - p_i) \big),   (4)

L_{DICE}(p, y) = 1 - \frac{2\sum_{i=1}^{t} y_i p_i}{\sum_{i=1}^{t} (y_i + p_i)},   (5)

L(p, y) = \alpha_1 L_{BCE}(p, y) + \alpha_2 L_{DICE}(p, y),   (6)
where y_i and p_i are, respectively, the value of the pixel in the target mask and the confidence score of the pixel in the prediction mask at the i-th position, and t is the total number of pixels in each image. In our training phase we set \alpha_1 = \alpha_2 = 0.5.
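A minimal PyTorch sketch of the objective in Eqs. (4)–(6) is given below; the function name and the epsilon used for numerical stability are our own, and `pred` is assumed to already contain per-pixel confidence scores in (0, 1).

```python
import torch
import torch.nn.functional as F


def bce_dice_loss(pred, target, alpha1=0.5, alpha2=0.5, eps=1e-7):
    """Combined BCE + Dice loss of Eqs. (4)-(6) for binary masks."""
    bce = F.binary_cross_entropy(pred, target)                               # Eq. (4)
    inter = (pred * target).sum()
    dice = 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)     # Eq. (5)
    return alpha1 * bce + alpha2 * dice                                      # Eq. (6)


pred = torch.rand(2, 1, 256, 256)                        # per-pixel confidence scores
target = (torch.rand(2, 1, 256, 256) > 0.5).float()      # binary target mask
loss = bce_dice_loss(pred, target)
```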
3 Experiments and Results
3.1 Datasets
We use the Gland segmentation (GlaS) dataset and the 2018 Data Science Bowl dataset to evaluate our method. The GlaS dataset contains 165 microscopic images with corresponding target mask annotations; in this work we split it into 85 training images and 80 testing images. The Bowl dataset consists of a training set of 671 nuclei images along with target mask annotations for each image. Because the annotation masks for its test set are not available, we assess the performance of the proposed method on the training set only, splitting it into 534 training images and 137 testing images. Because the images in these datasets have different sizes, every image was resized to a resolution of 256 × 256 for all experiments.
3.2 Implementation Details
We implemented PASPP MedT in PyTorch on an NVIDIA Tesla P100 16 GB GPU. We use two versions of the proposed model: PASPP MedT-O, which uses two PASPP blocks, and PASPP MedT-G, which uses a PASPP block only in the global branch. We augment images with horizontal flipping, vertical flipping and random rotation. We do not use any pre-trained weights to train the proposed PASPP MedT. The model was trained for 400 epochs with the Adam optimizer [24] with a step size of 0.001; the stochastic weight averaging method [25] with a step size of 0.0001 was applied after epoch 100. We use the Dice coefficient (DSC) and intersection over union (IoU) to evaluate the performance.
Fig. 3 Prediction masks on test images from GlaS. The highlighted red area shows where the proposed methods perform more accurately than MedT in constructing long-range dependencies
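The training schedule described above (Adam with step size 0.001, then stochastic weight averaging with step size 0.0001 after epoch 100, for 400 epochs) can be sketched with PyTorch's SWA utilities as follows; the placeholder model and data are illustrative and not the actual PASPP MedT pipeline.

```python
import torch
import torch.nn as nn
from torch.optim.swa_utils import AveragedModel, SWALR

model = nn.Conv2d(3, 1, 3, padding=1)                 # placeholder for PASPP MedT
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
swa_model = AveragedModel(model)
swa_scheduler = SWALR(optimizer, swa_lr=1e-4)

images = torch.randn(4, 3, 256, 256)                  # placeholder batch
masks = (torch.rand(4, 1, 256, 256) > 0.5).float()

for epoch in range(400):
    optimizer.zero_grad()
    loss = nn.functional.binary_cross_entropy_with_logits(model(images), masks)
    loss.backward()
    optimizer.step()
    if epoch >= 100:                                  # switch on weight averaging
        swa_model.update_parameters(model)
        swa_scheduler.step()
```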
3.3 Representative Results
We provide predictions from MedT and from our proposed methods on both datasets in Figs. 3 and 4. As shown in the figures, our methods achieve better performance than MedT, providing more precise boundary maps for the predicted regions.
Fig. 4 Prediction masks on test images from Data Science Bowl 2018. The highlighted red area shows where the proposed methods perform more accurately than MedT in constructing long-range dependencies
3.4 The Importance of PASPP
We examine the effect of using receptive fields of different sizes in the transformer model on prediction quality and compare variants of atrous spatial pyramid pooling. The dilation rate of each atrous convolution layer in PASPP is set according to the input feature map size: in the proposed method the dilation rates of the PASPP block in the global branch are (1, 6, 12, 18) and the rates of the PASPP block used at the end of the model are (1, 16, 32, 48). The results in Table 1 show that incorporating PASPP significantly improves quality without adding many parameters.
3.5 Evaluation
The quantitative outcomes of the proposed method are reported in Tables 2 and 3. To assess the efficiency of PASPP MedT, we compare its prediction quality with that of different models. The results in Tables 2 and 3 show that the PASPP MedT models give the best results on both datasets: on GlaS with DSC 88.74% and IoU 79.92%, and on the 2018 Data Science Bowl with DSC 92.81% and IoU 86.62%. The proposed model with PASPP is also superior to the original MedT, which indicates that it captures global context well even on small datasets. We further observe that:
Table 1 Comparisons between the variants of atrous spatial pyramid pooling on the Gland segmentation dataset
Method               Size (M)   DSC     IoU
MedT original [20]   2.06       82.93   70.95
Using ASPP [26]      2.2        84.98   73.98
Using PASPP (Ours)   2.09       88.74   79.92
Best results are in bold
Table 2 Comparisons with various methods on the Gland segmentation dataset
Method                  Year   DSC     IoU
FCN [7]                 2015   66.61   50.84
U-Net [8]               2015   77.78   65.34
Res U-Net [27]          2020   78.83   65.95
Axial attention Unet    2020   76.30   63.03
KiU-Net [28]            2020   83.25   72.78
PASPP MedT-O            2022   88.74   79.92
PASPP MedT-G            2022   87.31   77.63
Best results are in bold
Table 3 Comparisons with various methods on the Data Science Bowl 2018 dataset
Method              Year   DSC     IoU
FCN [7]             2015   89.39   80.87
U-Net [8]           2015   75.73   91.03
Res U-Net [27]      2020   78.83   65.95
PraNet [29]         2020   83.34   71.49
TransAttUnet [30]   2021   91.62   84.98
MedT [20]           2021   91.72   84.75
PASPP MedT-O        2022   92.75   86.53
PASPP MedT-G        2022   92.81   86.62
Best results are in bold
• The transformer-based PASPP MedT performs better than the SOTA baselines on GlaS and the 2018 Data Science Bowl. These improvements can be explained by the use of PASPP in transformer-based models: PASPP lets the model use its receptive field flexibly, so it can gather information from the entire image and capture the global context.
4 Conclusion
In this study, we propose combining PASPP with a transformer architecture. Our experimental results on the GlaS and Data Science Bowl 2018 datasets show that the proposed PASPP MedT performs very well. The proposed model also demonstrates a better ability to capture complex long-range relations than other methods.
Acknowledgements This research is funded by the Hanoi University of Science and Technology (HUST) under project number T2021-PC-005.
References 1. LeCun Y, Bengio Y et al (1995) Convolutional networks for images, speech, and time series. In: The handbook of brain theory and neural networks, vol 3361 2. Krizhevsky A, Sutskever I, Hinton G (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:1097–1105 3. Pham V, Tran T, Wang P, Lo M (2021) Tympanic membrane segmentation in otoscopic images based on fully convolutional network with active contour loss. Signal Image Video Process 15:519–527 4. Trinh M, Nguyen N, Tran T, Pham V (2022) A deep learning-based approach with imagedriven active contour loss for medical image segmentation. In: Proceedings of international conference on data science and applications, pp 1–12
5. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778 6. He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. In: European conference on computer vision, pp 630–645 7. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431– 3440 8. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention, pp 234–241 9. Pham V, Tran T, Wang P, Chen P, Lo M (2021) EAR-UNet: a deep learning-based approach for segmentation of tympanic membranes from otoscopic images. Artif Intelli Med 115:102065 10. Zhou Z, Siddiquee M, Tajbakhsh N, Liang J (2018) Unet++: A nested u-net architecture for medical image segmentation. In: Deep learning in medical image analysis and multimodal learning for clinical decision support, pp 3–11 11. Chen L, Papandreou G, Kokkinos I, Murphy K, Yuille A (2014) Semantic image segmentation with deep convolutional nets and fully connected CRFs. ArXiv:1412.7062 12. Chen L, Papandreou G, Kokkinos I, Murphy K, Yuille A (2017) DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Analy Mach Intell 40:834–848 13. Huang Z, Wang X, Huang L, Huang C, Wei Y, Liu W (2019) Ccnet: Criss-cross attention for semantic segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 603–612 14. Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhudinov R, Zemel R, Bengio Y (2015) Show, attend and tell: neural image caption generation with visual attention. In: International conference on machine learning, pp 2048–2057 15. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008 16. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al (2020) An image is worth 16 × 16 words: Transformers for image recognition at scale. ArXiv:2010.11929 17. Wang H, Zhu Y, Green B, Adam H, Yuille A, Chen L (2020) Axial-DeepLab: stand-alone axial-attention for panoptic segmentation. In: European conference on computer vision. pp 108–126 18. Lin T, Dollár P, Girshick R, He K, Hariharan B, Belongie (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 2117–2125 19. Chen L, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 801–818 20. Valanarasu J, Oza P, Hacihaliloglu I, Patel V (2021) Medical transformer: gated axial-attention for medical image segmentation. ArXiv:2102.10662 21. Malìk P, Krištofìk Š, Knapová K (2020) Instance segmentation model created from three semantic segmentations of mask, boundary and centroid Pixels verified on GlaS dataset. In: 2020 15th Conference on computer science and information systems (FedCSIS), pp 569–576 22. 
Rashno A, Koozekanani D, Drayna P, Nazari B, Sadri S, Rabbani H, Parhi K (2017) Fully automated segmentation of fluid/cyst regions in optical coherence tomography images with diabetic macular edema using neutrosophic sets and graph algorithms. IEEE Trans Biomed Eng 65:989–1001 23. Yan Q, Wang B, Gong D, Luo C, Zhao W, Shen J, Shi Q, Jin S, Zhang L, You Z (2020) COVID-19 chest CT image segmentation-a deep convolutional neural network solution. ArXiv:2004.10987 24. Kingma D, Ba J (2014) Adam: a method for stochastic optimization. ArXiv:1412.6980
25. Izmailov P, Podoprikhin D, Garipov T, Vetrov D, Wilson A (2018) Averaging weights leads to wider optima and better generalization. ArXiv:1803.05407 26. Chen L, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation. ArXiv:1706.05587 27. Jha D, Smedsrud P, Riegler M, Johansen D, De Lange T, Halvorsen P, Johansen, H (2019) Resunet++: an advanced architecture for medical image segmentation. In: 2019 IEEE international symposium on multimedia (ISM), pp 225–2255 28. Valanarasu J, Sindagi V, Hacihaliloglu I, Patel V (2020) Kiu-net: towards accurate segmentation of biomedical images using over-complete representations. In: International conference on medical image computing and computer-assisted intervention, pp 363–373 29. Tomar N, Jha D, Riegler M, Johansen H, Johansen D, Rittscher J, Halvorsen P, Ali S (2021) FANet: a feedback attention network for improved biomedical image segmentation. ArXiv:2103.17235 30. Chen B, Liu Y, Zhang Z, Lu G, Zhang D (2021) TransAttUnet: multi-level attention-guided U-Net with transformer for medical image segmentation. ArXiv:2107.05274
Fuzzy Optimized Particle Swarm Algorithm for Internet of Things Based Wireless Sensor Networks S. L. Prathapa Reddy , Poli Lokeshwara Reddy , K. Divya Lakshmi, and M. Mani Kumar Reddy
Abstract The Internet of Things (IoT) permits several objects to connect with one another by means of the Internet without human interaction. A Wireless Sensor Network (WSN) denotes a set of small sensor nodes placed densely in places that cannot accommodate wired cables and is regarded as an instance of the IoT, focusing on the manner in which the sensors monitor various domains. The primary challenge of a WSN is to maintain the lifetime of the network without needing to replace the network nodes. Every sensor node contains sensing, communication and data processing components, and each node normally shares the medium with all other nodes in its communication range, which makes Medium Access Control (MAC) a very important issue for keeping the network functional. This work presents a fuzzy with Particle Swarm Optimization (PSO) algorithm to improve the performance of MAC in WSN. The results of the proposed approach reveal that it provides better efficacy compared to other approaches. Keywords Internet of Things · Wireless sensor network · Medium access control · Fuzzy · Particle swarm optimization algorithm
S. L. Prathapa Reddy (B) · K. Divya Lakshmi Department of Electronics and Communication Engineering, K.S.R.M College of Engineering, Kadapa, Andhra Pradesh, India e-mail: [email protected] M. Mani Kumar Reddy Department of Electronics and Communication Engineering, JNTUA College of Engineering Pulivendula, Pulivendula, Andhra Pradesh, India P. Lokeshwara Reddy Department of Electronics and Communication Engineering, Anurag University, Hyderabad, Telangana, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_32
1 Introduction The Internet with its magical power of connecting and unrivaled popularity has made the entire world as one global village. Recently, there has been a dramatic increase in the prevalence of smart phones that has boosted the wireless network and the Internet accessibility has gone up several notches. Also, this popularity has resulted in advancement among mobile technologies such as the Internet of Things. It includes physical objects that are connected with each other and are capable of collecting data and sharing them with each other. The objects also have several sensors that have been embedded within them thus ensuring the collection of data. At the same time using Radio Frequency Identification (RFID) devices the objects are identified and tracked automatically. Using the existing infrastructure of the Internet, these objects can interoperate and communicate with each other. Smart retails, smart industries, connected cars, smart cities, smart supply chains, smart homes, logistics and transportation are some applications that benefit from this technology. The architecture of the IoT may be considered to be a system that may either be physical or virtual or even a hybrid of both that contains several active physical objects enterprise layers, developers, users, communication layers, IoT protocols, cloud services, actuators and sensors. There are certain architectures that act to be a pivot component of infrastructure that is IoT specific and may facilitate a systematic approach made toward the components that were dissimilar thereby resulting in finding solutions to all related issues. Today, there is a well-defined type of IoT architecture that is available for the purpose of knowledge. They also use interfaces that are intelligent and can be integrated seamlessly within the information network. The IoT components were organized into four different layers. The first one was the sensing layer in which all the sensors, the RFIDs and the WSN exist. The information that was generated by the layer was gathered by means of the aggregation which is Layer 2. There are different kinds of aggregators that exist dependent on sensing devices found in the primary layer. The sensor networks which are wireless consists of one or more such nodes of sink for the purpose of data gathering and uploading the same. The sensors operating independently may broadcast data to the smart phone population or certain specialized gateways existing within their regions. The aggregators will process such data in a direct manner or forward them to the other nodes found in Layer 3. Once the data gets processed, it is uploaded to the cloud which will be used by vast users through the Internet which is Layer 4. Generally, IoT falls under a networking environment that is quite diverse in the case of devices and applications. But one very challenging issue is the heterogeneity of the IoT and this is expected to continue with a scope that is unprecedented. For instance, it has a sensing layer that is supposed to use various technologies like ZigBee and Bluetooth Low Energy (BLE). In layers 2 and 3, there are several technologies of transmission that are used like Wi-Fi or 3G/4G that ensures all persistent connectivity. For supporting such technologies of transmission, the operators may make use of components from different vendors that complicate management thereby reducing their interoperability. Also, the operators and providers have been implementing
networks and solutions to server virtualization for maximizing resource utilization that introduces significant problems of management. Furthermore, the challenges found in Layer 4 also include the manner in which they can deploy the services and components in an efficient and fast manner. WSNs will enable the development of applications along with their monitoring capacities which is a combination of sensor nodes with constrained capacity of storage and energy. The data to other nodes in network will be transmitted by the sensor nodes in wireless mode. Additionally, by executing a new and collaborative function it permits sensor nodes will represent another huge challenge in the WSN since, in most of these cases, the nodes will be deployed in the same location that has hard access which may be infeasible for replacing the batteries feeding them. The IoT and its complexity can be higher than in the case of WSNs, and this will have plenty of objects. Owing to this reason, it will have a larger scale. The concept of dynamic routing may not be suitable for large scale areas. Such sensor transmission can be affected owing to various factors such as interference, temperature and humidity of air. Thus, the WSN architectures which has dynamic routing may not be usable for networking on a bulk scale. The sensor nodes which are tiny have certain limitations like low power, less memory and limited battery energy. The sensor nodes will need information with regard to their location which has to be exchanged using dynamic routing. As a consequence, the nodes will consume additional power owing to the increase in overheads. The existence of network of systems in IoT depends on its energy consumption. Saving of energy can be a crucial aspect to the WSN [1, 2]. The sensors will spend their energy at the time of transmission or receiving of messages. The battery will act in the form of the primary supply of power to the sensor nodes. For the IoT applications, a large number of sensor nodes will be placed within the areas in which the users may not be reached easily. It may not be possible to replace batteries of such nodes that are arranged in the areas that are not within the reach of users. For the purpose of avoidance of battery drain, the sensor nodes that are energy saving will have to be considered. Therefore, an efficient scheme for saving energy is deployed. The energy-efficient IoT-based WSN will be conducted in various other ways like routing protocols, data aggregation-based clustering, device scheduling dissemination and data storage. Scheduling can be a very efficient approach to Smart Devices in making them into Disjoint Subsets (DSs) for which every DS will have to completely cover the objects. Thus, the higher the DS, the more it is able to discover the exploitation of excessing information for an IoT-based WSN, with a long time of operation. Today, fuzzy [3– 5] logic has emerged to be the consequence of a fuzzy set theory proposed by Lofti Zadeh that appeared for the first time in the article of 1965. Fuzzy logic can be called a numerous-valued logic which is extracted from theory of fuzzy set. Fuzzy logic does not require any detailed inputs and is hardy as well. But the complexity of this system will increase in a rapid manner with more outputs and inputs. This type of intelligent optimization algorithm will have optimization performance which is global. It is well-suited for different types of parallel processing and has a strict theoretical base. 
The optimal solution or its approximate optimal solution may be found at a particular time. This optimization algorithm has several modes of calculation like
the Firefly Algorithm (FA), Leapfrog Algorithm, Particle Swarm Optimization, Fruit Fly Algorithm and Genetic Algorithm (GA). For solving different complex issues, there was a large variety of modes of computing which was mixed for maximization of power of computing. In this work, the fuzzy optimized PSO algorithm is proposed for IoT-based WSN.
2 Related Work Hasan et al. [6] had made a proposal of a new bio-inspired Particle Multi-Swarm Optimization for construction and recovery of the k-disjoint paths to tolerate failure and satisfies the parameters of Quality of Service. This multi-swarm strategy will enable the determination of all optimal directions in choosing multipath routing simultaneously while exchanging of messages from several network areas. Zhang et al. [7] had proposed another novel and intelligent localization algorithm that is based on computing for the IoT. This will combine the features of both Invasive Weed Optimization (IWO) and the Simplified Quadratic Approximation (SQA). It had very strong levels of robustness for ranging errors of communication and better practicability. Hajjej et al. [8] focused on the problems of placing sensor nodes in the networks. A novel methodology based on the Multi-Objective Flower Pollination Algorithm (MOFPA) had been specified aiming to approximate all optimal trade-offs. These were the enhancement of coverage, reduction of energy dissipation of network, maximization of the lifetime of the network and finally maintenance of connectivity. Pau and Salerno [9] had made an introduction of another method that was fuzzy-based. The work was executed on the off the shelf hardware. The initial task in this was achieving practical results to reach the target and at the same time bypassing computationally expensive solutions. The results that were retrieved proved the proposed method to have outperformed all other solutions to a significant level thereby extending the lifetime of the battery powered devices with a satisfactory level of values of a ratio of throughput to workload. Raguraman and Ramasundaram [10] had made a new proposal of an approach that was fuzzy-driven which was embedded with a Dimensionality-based PSO (DPSO) algorithm. The Adaptive Neuro Fuzzy Inference System (ANFIS) had been developed for investigating such irregularities along with a distinct set of such rules that were framed in its training phase for choosing suitable attenuation exponent values. This proposed algorithm was applied to anisotropic environments as well. This model outperformed in localization instances for three different test cases that contained various trajectories in which the target node’s path was chosen randomly. There was a fuzzy logic engine that was developed for every node for decreasing the message transmissions. The consequence of this was an improvement in battery life.
3 Methodology The fuzzy logic system is applied on nodes to reduce the number of message transmissions which will help increase the battery life. In this section, the MAC protocol and fuzzy with PSO algorithm are discussed.
3.1 Medium Access Control (MAC) Protocol The primary functions of the MAC layer will be delimiting of the frame and its recognition, transferring data, addressing, protection of errors and access arbitration to mono channel that is dispensed by all the nodes. The MAC layer protocols are efficient in terms of energy for maximizing lifetime. In addition to this, the protocols will have to be scalable in accordance with the size of the network and will have to adjust to all modifications in the network. The MAC protocols are divided into two groups in accordance with the method that is used for managing its medium access. They are based on contention and schedule [11]. IEEE defined 802.15.4e MAC protocol for embedded systems used for various applications of IoT overcomes the drawbacks of 802.15.4 MAC protocol, with regard to consumption of energy, reliability and delay. It provides low power consumption, less duty cycle, frequency hopping, communications with multiple channels and dedicated slotted access. Since various IoT applications will have a low frequency of reporting like smart meters wherein 802.15.4e MAC permits a very less duty cycle which is 1% or even lower. It also supports diverse objectives [12].
3.2 Fuzzy Logic System
Fuzzy logic systems can be used for ill-defined and complex operations managed by operators who have no precise knowledge of the underlying dynamics. The concept of the Fuzzy Logic Controller (FLC) is to incorporate the expert experience of human operators into the design of a controller whose behaviour is defined by a set of fuzzy control rules [13]. A basic FLC consists of a fuzzifier, a fuzzy rule base, an inference engine and a defuzzifier. The fuzzifier transforms crisp measured data into linguistic values, and the fuzzy rule base encodes the basic knowledge about the process. The inference engine aggregates the IF–THEN rules to form an output that is still a fuzzy membership function, and the defuzzifier converts this into a new crisp value to be applied to the plant.
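To make the fuzzifier and defuzzifier roles concrete, the following is a small illustrative Python sketch (not the paper's controller): a triangular membership function for fuzzification and a centroid defuzzifier that turns an aggregated membership curve back into a crisp value. The Low/Medium/High parameters are made-up examples.

```python
import numpy as np


def tri_membership(x, a, b, c):
    """Degree to which x belongs to a triangular fuzzy set with parameters (a, b, c)."""
    return float(max(min((x - a) / (b - a + 1e-12), (c - x) / (c - b + 1e-12)), 0.0))


def centroid_defuzzify(universe, membership):
    """Centroid (centre-of-gravity) defuzzification of an aggregated fuzzy output."""
    membership = np.asarray(membership, dtype=float)
    return float(np.sum(universe * membership) / (np.sum(membership) + 1e-12))


# fuzzify a crisp reading against Low/Medium/High sets on [0, 1]
x = 0.42
sets = {"Low": (0.0, 0.0, 0.5), "Medium": (0.0, 0.5, 1.0), "High": (0.5, 1.0, 1.0)}
degrees = {name: tri_membership(x, *params) for name, params in sets.items()}

# defuzzify an aggregated output curve sampled over the same universe
universe = np.linspace(0, 1, 101)
crisp = centroid_defuzzify(universe, [tri_membership(u, 0.0, 0.5, 1.0) for u in universe])
```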
3.3 Particle Swarm Optimization (PSO) Algorithm
This algorithm simulates the social behaviour of groups, such as flocks of birds, to find an optimal or near-optimal solution of a multi-variable function over a continuous search space. It is a population-based approach in which a swarm of particles, i.e., a collection of individuals whose positions denote candidate solutions, explores the space. The performance of a particle depends on its position and is evaluated with a cost function that varies from problem to problem. The initial position of each particle is assigned randomly, and at every iteration its movement through the search space is influenced by its own best position so far, known as the personal best, and by the best position found by any particle in the swarm [14], known as the global best. Considering a single fully connected swarm of size K in an N-dimensional search space, the velocity and position of a particle are updated as

v_{k,n}(t+1) = w\, v_{k,n}(t) + c_1 r_1 \big(p_{k,n}(t) - x_{k,n}(t)\big) + c_2 r_2 \big(g_n(t) - x_{k,n}(t)\big),   (1)

x_{k,n}(t+1) = x_{k,n}(t) + v_{k,n}(t+1),   (2)

where 1 ≤ k ≤ K and 1 ≤ n ≤ N, x_k(t) and v_k(t) denote the position and velocity vectors of the k-th particle at the t-th time step, p_k(t) denotes the k-th particle's personal best position at the t-th time step, g(t) denotes the swarm's global best at the t-th time step, r_1 and r_2 are random numbers distributed uniformly in [0, 1], w is the inertia weight and c_1, c_2 are the cognitive and social coefficients. Several techniques exist for setting these parameters, although to simplify the implementation they can be fixed to constant values. According to Eq. (1), each particle's velocity is adjusted on the basis of an inertial component that carries over the current velocity, a cognitive component and a social component. Initially, both the positions and the velocities of the swarm are randomly initialized. The cost function value of every particle is assessed so that the globally best position of the swarm is found; at every iteration both quantities are updated according to Eqs. (1) and (2), and the iterations stop when the termination criterion is met. The globally best position at the last iteration is the solution.
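A compact NumPy sketch of the plain PSO update of Eqs. (1)–(2) is given below. The bounds, coefficient values and the sphere test function are illustrative assumptions, not the settings used in this work.

```python
import numpy as np


def pso(cost, dim, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5,
        lo=-1.0, hi=1.0, seed=0):
    """Minimize `cost` over a dim-dimensional box using the basic PSO update."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n_particles, dim))            # positions
    v = np.zeros((n_particles, dim))                       # velocities
    pbest, pbest_val = x.copy(), np.array([cost(p) for p in x])
    g = pbest[pbest_val.argmin()].copy()                   # global best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)   # Eq. (1)
        x = x + v                                               # Eq. (2)
        vals = np.array([cost(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[pbest_val.argmin()].copy()
    return g, pbest_val.min()


# usage: minimize the sphere function in 3 dimensions
best_x, best_f = pso(lambda p: float(np.sum(p ** 2)), dim=3)
```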
3.4 Fuzzy Optimized PSO Algorithm
In the FLC, three membership functions [15] (Low, Medium and High) are used for both the inputs and the outputs. Triangular membership functions, as shown in Fig. 1, are adopted in this work, and the goal is to optimize them using the PSO.
Fig. 1 Triangular membership functions
The membership functions are parameterized as in Fig. 1, and the optimization is kept simple so that it does not involve a large computational load. For this, the parameters a_L, b_M and c_H of the inputs and outputs are fixed; as a consequence the PSO optimizes the remaining eighteen function parameters. The general particle for the inputs and outputs is arranged as

| c_L  b_L  a_M  c_M  a_H  b_H |,   (3)

While examining Fig. 1, note that the PSO introduced in this work handles six parameters per variable, used to optimize both inputs and outputs subject to the rules

(1) a_L < c_L < b_M   (2) a_L < b_L < c_L   (3) a_L < a_M < c_L
(4) b_M < c_M < c_H   (5) b_M < a_H < c_M   (6) a_H < b_H < c_H   (4)
In the operation of this PSO algorithm, the constraints in (4) have to be checked at every iteration. At the same time, the PSO is supported by a proportional method that decreases the computational cost and improves the speed of convergence. To describe the proposed algorithm, consider the n-th coordinate of the k-th particle at the t-th iteration; at the subsequent iteration it has to satisfy

x_{k,n}(t) \in [A_{k,n}(t+1), B_{k,n}(t+1)],   (5)

where A_{k,n}(t+1) and B_{k,n}(t+1) are refreshed taking the sequence of constraints in (4) into account and can be changed if needed. The PSO algorithm then proceeds as follows:

1. If the interval [A_{k,n}(t+1), B_{k,n}(t+1)] does not contain the position x_{k,n}(t), it is remapped as

If x_{k,n}(t) < A_{k,n}(t+1) then
x_{k,n}(t) = B_{k,n}(t) + \frac{A_{k,n}(t+1) - B_{k,n}(t)}{B_{k,n}(t) - A_{k,n}(t)} \big(x_{k,n}(t) - B_{k,n}(t)\big),   (6)

Else if x_{k,n}(t) > B_{k,n}(t+1) then
x_{k,n}(t) = A_{k,n}(t) + \frac{B_{k,n}(t+1) - A_{k,n}(t)}{B_{k,n}(t) - A_{k,n}(t)} \big(x_{k,n}(t) - A_{k,n}(t)\big).   (7)

2. The velocity v_{k,n}(t) is refreshed according to Eq. (1). In this constrained setting, the velocity of the n-th coordinate of the k-th particle at the (t+1)-th iteration must satisfy

v_{k,n}(t+1) \in [v_{k,n}^{(min)}(t+1), v_{k,n}^{(max)}(t+1)],   (8)

where v_{k,n}^{(min)}(t+1) and v_{k,n}^{(max)}(t+1) are determined by

v_{k,n}^{(min)}(t+1) = w\, v_{k,n}(t) + c_1 r_1 \big(p_{k,n}(t) - B_{k,n}(t+1)\big) + c_2 r_2 \big(g_n(t) - B_{k,n}(t+1)\big),   (9)

v_{k,n}^{(max)}(t+1) = w\, v_{k,n}(t) + c_1 r_1 \big(p_{k,n}(t) - A_{k,n}(t+1)\big) + c_2 r_2 \big(g_n(t) - A_{k,n}(t+1)\big).   (10)

3. The position x_{k,n}(t) is refreshed according to Eq. (2). If the interval [A_{k,n}(t+1), B_{k,n}(t+1)] does not contain the position x_{k,n}(t+1), it is remapped as

If x_{k,n}(t+1) < A_{k,n}(t+1) then
x_{k,n}(t+1) = x_{k,n}(t+1) + \frac{v_{k,n}(t+1)}{v_{k,n}^{(min)}(t+1)} \big(A_{k,n}(t) - x_{k,n}(t)\big),   (11)

Else if x_{k,n}(t+1) > B_{k,n}(t+1) then
x_{k,n}(t+1) = x_{k,n}(t+1) + \frac{v_{k,n}(t+1)}{v_{k,n}^{(max)}(t+1)} \big(B_{k,n}(t) - x_{k,n}(t)\big).   (12)
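The remapping of Eqs. (6)–(7) can be transcribed directly into code; the following Python sketch (with our own, hypothetical variable names) shows how an out-of-interval coordinate is recomputed from the refreshed and previous interval bounds, as the equations specify.

```python
def remap_position(x, a_new, b_new, a_old, b_old):
    """Direct transcription of the coordinate remapping of Eqs. (6)-(7).

    (a_new, b_new) stands for [A_k,n(t+1), B_k,n(t+1)] and
    (a_old, b_old) for [A_k,n(t), B_k,n(t)].
    """
    if x < a_new:                                              # Eq. (6)
        return b_old + (a_new - b_old) / (b_old - a_old) * (x - b_old)
    if x > b_new:                                              # Eq. (7)
        return a_old + (b_new - a_old) / (b_old - a_old) * (x - a_old)
    return x                                                   # already inside the interval


remapped = remap_position(x=-0.8, a_new=-0.5, b_new=0.5, a_old=-1.0, b_old=1.0)
```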
4 Results and Discussion
The simulation results of the proposed fuzzy PSO methodology are reported and compared with MAC for node pause times from 10 to 60 s. The average Packet Delivery Ratio (PDR), average end-to-end delay, normalized energy consumption compared to the baseline, reduction in congestion compared to the baseline and number of cached replies are shown in Tables 1 and 2 and Figs. 2, 3, 4, 5 and 6.

Table 1 Performance analysis of MAC using various metrics
Parameter                                             Node pause time in seconds
                                                      10        20        30        40        50        60
Average packet delivery ratio                         0.6409    0.7404    0.7913    0.8282    0.8469    0.8954
Average end to end delay                              0.073606  0.010642  0.003455  0.002315  0.001066  0.000862
Normalized energy consumption compared to base line   1         1         1         1         1         1
Reduction in congestion compared to base line         1         1         1         1         1         1
Number of cached replies                              127       128       132       135       143       145
Table 2 Performance analysis of fuzzy PSO using various metrics
Parameter                                             Node pause time in seconds
                                                      10      20       30       40       50       60
Average packet delivery ratio                         0.7997  0.8391   0.8957   0.9603   0.9717   0.9749
Average end to end delay                              0.0103  0.00628  0.00104  0.00106  0.00082  0.0007
Normalized energy consumption compared to base line   0.75    0.71     0.73     0.72     0.72     0.71
Reduction in congestion compared to base line         0.49    0.58     0.54     0.52     0.49     0.4
Number of cached replies                              334     373      354      391      348      469
Fig. 2 Plot of average packet delivery ratio for MAC and fuzzy PSO
Fig. 3 Plot of average end to end delay for MAC and fuzzy PSO
Fig. 4 Plot of normalized energy consumption compared to base line for MAC and fuzzy PSO
Fig. 5 Plot of reduction in congestion compared to base line for MAC and fuzzy PSO
Fig. 6 Plot of number of cached replies for MAC and fuzzy PSO
PDR specifies the ratio of successfully received packets to the total number of packets sent by the sender, and end-to-end delay specifies the average time taken for a data packet to move from source to destination. The proposed methodology provides a reduction in congestion, i.e., it reduces the traffic in the network. From Fig. 2 it can be seen that the fuzzy PSO has a higher average PDR than MAC by 22.04%, 12.49%, 12.37%, 14.77%, 13.72% and 8.5% at node pause times of 10, 20, 30, 40, 50 and 60 s, respectively.
From Fig. 3 it can be seen that the fuzzy PSO has a lower average end-to-end delay than MAC by 150.89%, 51.55%, 107.45%, 74.37%, 26.08% and 20.74% at node pause times of 10, 20, 30, 40, 50 and 60 s, respectively. From Fig. 4, the fuzzy PSO has a lower normalized energy consumption compared to the baseline than MAC by 28.57%, 33.91%, 31.21%, 32.55%, 32.55% and 33.92% at the same pause times. From Fig. 5, the fuzzy PSO has a lower value of the reduction-in-congestion metric compared to the baseline than MAC by 68.45%, 53.16%, 59.74%, 63.15%, 68.45% and 85.71%. From Fig. 6, it can be observed that the fuzzy PSO has a higher number of cached replies than MAC by 89.8%, 97.8%, 91.35%, 97.33%, 83.5% and 105.53% at node pause times of 10, 20, 30, 40, 50 and 60 s, respectively.
5 Conclusion
The advent of the IoT has created the expectation that WSNs will grow at an exponential rate, providing a new medium through which the IoT can sense real-world environments. An increase in the number of WSN nodes deteriorates performance at the MAC layer. Fuzzy logic can make real-time decisions using incomplete information, and the PSO is used here to obtain the parameters and optimal values of the FLC. The PSO optimizes the membership functions of the fuzzy-based system by adjusting their ranges to tune the FLC. PSO is recognized as an effective heuristic technique for optimization problems in continuous, multidimensional search spaces; it can reach high-quality solutions while keeping the computational load lower than other stochastic methods such as the GA. Results show that the fuzzy PSO has a higher average PDR than MAC by 22.04%, 12.49%, 12.37%, 14.77%, 13.72% and 8.5% at node pause times of 10, 20, 30, 40, 50 and 60 s, respectively.
References 1. Lin TH, Lee CC, Chang CH (2018) WSN integrated authentication schemes based on Internet of Things. J Internet Technol 19(4):1043–1053 2. Cacciagrano D, Culmone R, Micheletti M, Mostarda L (2019) Energy-efficient clustering for wireless sensor devices in Internet of Things. In: Al Turjman F (eds) Performability in Internet of Things. EAI/Springer Innovations in Communication and Computing. Springer, Cham, pp 59–80. https://doi.org/10.1007/978-3-319-93557-7_5 3. Sharma K, Chhamunya V, Gupta PC, Sharma H, Bansal JC (2015) Fitness based particle swarm optimization. Int J Syst Assur Eng Manag 6(3):319–329
4. Jadon SS, Sharma H, Bansal JC (2013) Self adaptive acceleration factor in particle swarm optimization. In: BICTA-2012, advances in intelligent systems and computing, vol 201. Springer, pp 325–340 5. Lalwani S, Sharma H, Satapathy SC, Deep K, Bansal JC (2019) A survey on parallel particle swarm optimization algorithms. Arab J Sci Eng 44(4):2899–2923 6. Hasan MZ, Al TF (2017) Optimizing multipath routing with guaranteed fault tolerance in Internet of Things. IEEE Sens J 17(19):6463–6473 7. Zhang Y, Gan J, Liu Y (2019) A novel intelligent computing based localization algorithm for Internet of Things. In: 2019 IEEE 9th International conference on electronics information and emergency communication (ICEIEC), 2019, pp 1–4. https://doi.org/10.1109/ICEIEC.2019.878 4474 8. Hajjej F, Hamdi M, Ejbali R, Zaied M. (2019) A new optimal deployment model of Internet of Things based on wireless sensor networks. In: 2019 15th International wireless communications & mobile computing conference (IWCMC), 2019, pp 2092–2097. https://doi.org/10. 1109/IWCMC.2019.8766560 9. Pau G, Salerno V (2019) Wireless sensor networks for smart homes: a fuzzy-based solution for an energy-effective duty cycle. Electronics 8(2):131. https://doi.org/10.3390/electronics8 020131 10. Raguraman P, Ramasundaram M (2019) Dimension based localization technique in Internet of Things: a fuzzy driven approach for mobile target. J Inf Sci Eng 35(5):977–996 11. Kabara J, Calle M (2012) MAC protocols used by wireless sensor networks and a general method of performance evaluation. Int J Distrib Sens Netw 8(1):1–11. https://doi.org/10.1155/ 2012/834784 12. Kumar A, Zhao M, Wong K-J, Guan YL, Chong PHJ (2018) A comprehensive study of IoT and WSN MAC protocols: research issues, challenges and opportunities. IEEE Access 6:76228– 76262. https://doi.org/10.1109/ACCESS.2018.2883391 13. AlSbakhi MA, Elaydi HA (2017) Hybrid FLC/BFO controller for output voltage regulation of Zeta converter. J Eng Res Technol 4(2):48–60 14. Pau G, Collotta M, Maniscalco V (2017) Bluetooth 5 energy management through a fuzzy-PSO solution for mobile devices of Internet of Things. Energies 10(7):992. https://doi.org/10.3390/ en10070992 15. Pau G, Collotta M, Maniscalco V, Choo KKR (2019) A fuzzy-PSO system for indoor localization based on visible light communications. Soft Comput 23(14):5547–5557
Fuzzy TOPSIS Approaches for Multi-criteria Decision-Making Problems in Triangular Fuzzy Numbers Sandhya Priya Baral , P. K. Parida , and S. K. Sahoo
Abstract In this paper, we study the selection of the best alternative from specified alternatives, experts and criteria within a multi-criteria decision-making (MCDM) model. The process is based on the Fuzzy Analytical Hierarchy Process (FAHP) and the Fuzzy Technique for Order Performance by Similarity to Ideal Solution (FTOPSIS): FAHP is used to obtain the weight of each criterion through pairwise comparison, and FTOPSIS is applied to compute the closeness coefficients for the final ranking of the chosen alternatives. Finally, we present the closeness coefficient results and argue that the model is structured and robust. Keywords Multi-criteria decision-making (MCDM) · Analytical hierarchy process (AHP) · Fuzzy analytical hierarchy process (FAHP) · Fuzzy TOPSIS · Triangular fuzzy numbers (TFN) MSC Classification 90C29 · 90C31 · 91A35 · 91B06
S. P. Baral · P. K. Parida (B) Department of Mathematics, C.V. Raman Global University, Bhubaneswar, India e-mail: [email protected] S. K. Sahoo Institute of Mathematics & Applications, Bhubaneswar, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_33
1 Introduction
Decision-making (DM) is a process that entails recognizing the goal, acquiring the pertinent and requisite information, and weighing the alternatives in order to make a decision. Decisions can be made by individuals or by groups. A systematic decision-making process proposed by Simon [1] involves intelligence, design and choice; an implementation phase was later added to the DM process. Decision-making problems involving many criteria are called MCDM problems; with p alternatives and q criteria they can be described in matrix form, with rows indexed by the alternatives L_1, ..., L_p and columns by the criteria K_1, ..., K_q, as

M_d = \begin{bmatrix} S_{11} & S_{12} & \cdots & S_{1q} \\ S_{21} & S_{22} & \cdots & S_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ S_{p1} & S_{p2} & \cdots & S_{pq} \end{bmatrix}, \qquad T = (T_1, T_2, \cdots, T_q),   (1)
where L_1, L_2, ..., L_p are the alternatives, K_1, K_2, ..., K_q are the criteria, S_{ij} is the performance indicator of alternative L_i under criterion K_j, and T_j is the weight of criterion K_j. MCDM enables choosing the most suitable of the pre-established alternatives by evaluating them in terms of the criteria. In fuzzy multi-criteria decision-making (FMCDM) problems (Bellman and Zadeh [2], Chen [3], Chen and Hwang [4], Hsu and Chen [5], Kacprzak et al. [6], Liang [7], Tanino [8], Wang et al. [9]), the ratings and the weights of the criteria, being imprecise, subjective and vague, are usually expressed by linguistic terms and then converted into fuzzy numbers (Zadeh [10], Zimmermann [11]). The Technique for Order Performance by Similarity to Ideal Solution (TOPSIS), introduced by Hwang and Lin [12], is a well-known method built around the positive ideal solution and the negative ideal solution, and many researchers have applied it to solve FMCDM problems. In the fuzzy setting, the normalized values taken for the ideal solution and the negative ideal solution are (1, 1, 1) and (0, 0, 0), respectively; these are extreme values that may be far from the true maximum and minimum, so they need not represent the actual max and min values in the TOPSIS technique. This study is arranged as follows. Section 2 delineates the basic notions and definitions of crisp sets, fuzzy sets, normalized fuzzy sets, alpha cuts, distances of fuzzy sets, type-2 fuzzy sets, ordered triplets, triangular fuzzy numbers and distances of triangular fuzzy numbers. Section 3 reviews the AHP, fuzzy AHP and TOPSIS methods. Section 4 presents the proposed methodology and algorithm. Section 5 gives a numerical example with calculations and a sensitivity analysis. Section 6 concludes the paper.
2 Basic Notations and Definitions
Definition 1 A fuzzy set \tilde{U} in X is characterized by a membership function \mu_{\tilde{U}}(u) that associates with each point u a value in the interval [0, 1] expressing the grade of membership of u in \tilde{U}:

\tilde{U} = \{ (u, \mu_{\tilde{U}}(u)) \mid u \in X \}, \quad \mu_{\tilde{U}} : X \to [0, 1].   (2)

Definition 2 If the height of a fuzzy set is 1, it is called a normalized fuzzy set.
Definition 3 Let \tilde{U} be a fuzzy set. The \alpha-level set, or \alpha-cut, is defined by

\tilde{U}_{\alpha} = \{ u \mid \mu_{\tilde{U}}(u) \ge \alpha \}, \quad \alpha \in [0, 1].   (3)
Definition 4 Let \tilde{U} and \tilde{V} be two fuzzy sets. The distance between fuzzy sets can be defined in three different ways:
(i) Hamming distance: d(\tilde{U}, \tilde{V}) = \sum_{i=1}^{q} |\mu_{\tilde{U}}(x_i) - \mu_{\tilde{V}}(x_i)|
(ii) Euclidean distance: E(\tilde{U}, \tilde{V}) = \sqrt{\sum_{i=1}^{q} (\mu_{\tilde{U}}(x_i) - \mu_{\tilde{V}}(x_i))^2}
(iii) Minkowski distance: d_w(\tilde{U}, \tilde{V}) = \big( \sum_{i=1}^{q} |\mu_{\tilde{U}}(x_i) - \mu_{\tilde{V}}(x_i)|^w \big)^{1/w}, \; w \ge 1.
As special cases of the Minkowski distance, w = 1 and w = 2 give the Hamming distance and the Euclidean distance, respectively.
Definition 5 A type-2 fuzzy set \tilde{\tilde{U}} is a fuzzy set whose membership function is itself a type-1 fuzzy set on the interval [0, 1]:

\tilde{\tilde{U}} = \{ ((u, v), \mu_{\tilde{\tilde{U}}}(u, v)) \mid \forall u \in X, \forall v \in J_X \subseteq [0, 1] \}, \quad 0 \le \mu_{\tilde{\tilde{U}}}(u, v) \le 1,   (4)

where J_X denotes an interval in [0, 1].
Definition 6 A fuzzy set \tilde{U} may also be defined by the ordered triplet

\tilde{U} = \{ (u, \mu_{\tilde{U}}(u), \nu_{\tilde{U}}(u)) \mid u \in X \},   (5)

where \mu_{\tilde{U}}(u) : X \to [0, 1] is the degree of membership of u, \nu_{\tilde{U}}(u) : X \to [0, 1] is the degree of non-membership of u, and 0 \le \mu_{\tilde{U}}(u) + \nu_{\tilde{U}}(u) \le 1.
Definition 7 The triangular fuzzy number (TFN) \tilde{U} = [u_1, u_2, u_3] has membership function

\mu_{\tilde{U}}(u) = \begin{cases} 0 & \text{if } u < u_1 \\ \frac{u - u_1}{u_2 - u_1} & \text{if } u_1 \le u \le u_2 \\ \frac{u_3 - u}{u_3 - u_2} & \text{if } u_2 \le u \le u_3 \\ 0 & \text{if } u > u_3 \end{cases}   (6)

Definition 8 Let \tilde{u} = (u_1, u_2, u_3) and \tilde{v} = (v_1, v_2, v_3) be two TFNs. A distance measure between \tilde{u} and \tilde{v} is given by

d(\tilde{u}, \tilde{v}) = \sqrt{ \frac{1}{3} \sum_{i=1}^{3} (u_i - v_i)^2 }.   (7)
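A short Python sketch of Definitions 7 and 8 (the triangular membership function of Eq. (6) and the TFN distance of Eq. (7)) is given below; the function names are ours.

```python
import math


def tfn_membership(u, u1, u2, u3):
    """Membership value of u for the triangular fuzzy number (u1, u2, u3), Eq. (6)."""
    if u < u1 or u > u3:
        return 0.0
    if u <= u2:
        return (u - u1) / (u2 - u1)
    return (u3 - u) / (u3 - u2)


def tfn_distance(a, b):
    """Distance between TFNs a = (a1, a2, a3) and b = (b1, b2, b3), Eq. (7)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / 3.0)


mu = tfn_membership(4.5, 3, 4, 5)         # 0.5
d = tfn_distance((3, 4, 5), (5, 6, 7))    # 2.0
```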
3 Literature Review
First, we review the literature and theoretical background of the AHP and TOPSIS methods. Second, we review two further aspects: using the fuzzy AHP method to determine the weights, and using the fuzzy TOPSIS method to evaluate the alternatives against the fuzzy PIS and fuzzy NIS, which is commonly used for evaluating the ranking order.
3.1 The AHP Method
The AHP formalizes intuitive judgements about complicated problems through a hierarchical structure. Its basic idea is to enable decision-makers to structure an MADM problem as an attribute hierarchy. The AHP has the following general steps, due to Saaty [13]:
(a) Design a hierarchy for the MADM problem.
(b) Record the relative importance between the attributes through pairwise comparisons in a matrix.
(c) Record the pairwise comparisons of the alternatives with respect to each attribute in a matrix.
(d) Derive the relative weights of the individual elements of the matrices produced in steps (b) and (c).
(e) Aggregate the resulting weights vertically to score the individual alternatives against the overall target; the overall preference for each alternative is obtained by summing the products of the attribute weights and the contributions of the alternative with respect to those attributes.
The TOPSIS method consists of several steps, elaborated below.
3.2 The TOPSIS Method
Consider the problem of selecting one among p alternatives evaluated on q criteria.
Step 1 Construct the decision matrix M_d = [S_{ij}]_{p \times q}, where L_i (i = 1, ..., p) are the alternatives and K_j (j = 1, ..., q) are the criteria; S_{ij} is the original score expressing the appraisal of alternative L_i with respect to criterion K_j. The weight vector T = (T_1, T_2, ..., T_q) collects the individual weights T_j (j = 1, 2, ..., q) of the criteria.
Step 2 Compute the normalized decision matrix N_{ij} = S_{ij} / \sqrt{\sum_{i=1}^{p} S_{ij}^2} for i = 1, ..., p and j = 1, ..., q, where S_{ij} and N_{ij} are the original and normalized scores, respectively.
Step 3 Compute the weighted normalized decision matrix V_{ij} = T_j N_{ij}, where T_j is the weight of the j-th criterion and \sum_j T_j = 1.
Step 4 Determine the PIS and NIS, R^+ = (v_1^+, v_2^+, ..., v_q^+) and R^- = (v_1^-, v_2^-, ..., v_q^-), where
v_j^+ = \{ \max_i V_{ij} \mid j \in J_1 ; \; \min_i V_{ij} \mid j \in J_2 \} and v_j^- = \{ \min_i V_{ij} \mid j \in J_1 ; \; \max_i V_{ij} \mid j \in J_2 \},
with J_1 the set of benefit criteria and J_2 the set of cost criteria.
Step 5 Compute the Euclidean distances of each alternative from the PIS R^+ and the NIS R^-:
\delta_i^+ = \sqrt{ \sum_j (\Delta_{ij}^+)^2 } and \delta_i^- = \sqrt{ \sum_j (\Delta_{ij}^-)^2 }, where \Delta_{ij}^+ = v_j^+ - V_{ij} and \Delta_{ij}^- = v_j^- - V_{ij}, i = 1, 2, ..., p.
Step 6 Compute the relative closeness of each alternative with respect to R^+ as \Theta_i = \delta_i^- / (\delta_i^- + \delta_i^+), where i = 1, 2, ..., p.
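The six steps above translate almost line-by-line into NumPy; the following sketch (our own helper, with all criteria assumed to be benefit criteria unless a mask is given) returns the closeness coefficients.

```python
import numpy as np


def topsis(scores, weights, benefit=None):
    """Closeness coefficients for a p x q score matrix (Steps 1-6)."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    benefit = np.ones(scores.shape[1], bool) if benefit is None else np.asarray(benefit)
    n = scores / np.sqrt((scores ** 2).sum(axis=0))            # Step 2: vector normalization
    v = weights * n                                            # Step 3: weighted matrix
    v_plus = np.where(benefit, v.max(axis=0), v.min(axis=0))   # Step 4: PIS
    v_minus = np.where(benefit, v.min(axis=0), v.max(axis=0))  #         NIS
    d_plus = np.sqrt(((v - v_plus) ** 2).sum(axis=1))          # Step 5: distances
    d_minus = np.sqrt(((v - v_minus) ** 2).sum(axis=1))
    return d_minus / (d_minus + d_plus)                        # Step 6: closeness


cc = topsis([[7, 9, 9], [8, 7, 8], [9, 6, 8]], weights=[0.5, 0.3, 0.2])
ranking = np.argsort(-cc)                                      # best alternative first
```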
3.3 The Fuzzy AHP Method
There are different methods for obtaining the criteria weights, the AHP of Saaty [13], as established by Zavadskas and Podvezko [14], being the most widely used. Buckley [15] and Buckley et al. [16] incorporated fuzzy sets into the AHP and obtained the fuzzy AHP method. Here we use a method for determining the fuzzy weights that relies on a direct fuzzification of the method introduced by Saaty [13]. The general procedure of the fuzzy AHP method, as used by Elomda et al. [17] and Wu et al. [18], is as follows:
Step 1 Construct fuzzy pairwise comparison matrices. Each DM assigns linguistic variables, expressed by TFNs, to the pairwise comparisons between all criteria. Let \tilde{C} = [\tilde{s}_{ij}] be a q × q matrix, where \tilde{s}_{ij} is the importance of criterion K_i with respect to criterion K_j:

\tilde{C} = \begin{bmatrix} (1,1,1) & \tilde{s}_{12} & \cdots & \tilde{s}_{1q} \\ \tilde{s}_{21} & (1,1,1) & \cdots & \tilde{s}_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ \tilde{s}_{q1} & \tilde{s}_{q2} & \cdots & (1,1,1) \end{bmatrix},   (8)

where the lower-triangular entries are the reciprocals of the corresponding upper-triangular comparisons, \tilde{s}_{ji} = (1, 1, 1)/\tilde{s}_{ij}.
Step 2 Construct the fuzzy weights by normalization. The fuzzy weight of criterion K_i is represented by \tilde{T}_i = \tilde{R}_i \otimes (\tilde{R}_1 + \tilde{R}_2 + \cdots + \tilde{R}_q)^{-1}, where \tilde{R}_i = (\tilde{s}_{i1} \times \tilde{s}_{i2} \times \cdots \times \tilde{s}_{iq})^{1/q}.
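Buckley's geometric-mean weights of Step 2 can be sketched in NumPy as follows; the simplified TFN product and inverse used here are the usual component-wise approximations, and the 2 × 2 comparison matrix is a made-up example, not data from this paper.

```python
import numpy as np


def fuzzy_geometric_mean(row):
    """Component-wise geometric mean of a row of TFNs: R_i = (s_i1 x ... x s_iq)^(1/q)."""
    arr = np.asarray(row, dtype=float)                  # shape (q, 3)
    return np.prod(arr, axis=0) ** (1.0 / arr.shape[0])


def fuzzy_ahp_weights(comparisons):
    """Approximate fuzzy weights T_i = R_i (x) (R_1 + ... + R_q)^(-1)."""
    r = np.array([fuzzy_geometric_mean(row) for row in comparisons])   # (q, 3)
    total = r.sum(axis=0)                               # component-wise TFN sum
    inv = total[::-1] ** -1.0                           # (l, m, u)^(-1) ~ (1/u, 1/m, 1/l)
    return r * inv                                      # approximate TFN product


C = [  # 2 x 2 fuzzy pairwise comparison matrix, rows of TFNs (l, m, u)
    [(1, 1, 1), (2, 3, 4)],
    [(1 / 4, 1 / 3, 1 / 2), (1, 1, 1)],
]
weights = fuzzy_ahp_weights(C)                          # one TFN weight per criterion
```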
4 Proposed Methodology
Here we introduce the technique for order performance by similarity to ideal solution (TOPSIS) for fuzzy data. The fuzzy TOPSIS method is built on the notion that the chosen alternative should have the shortest distance from the PIS and the longest distance from the NIS, presented in detail in Parida and Sahoo [19] and Hwang and Lin [12]. The fuzzy TOPSIS method was suggested by Simon [1] and is a widely used technique for working out MCDM problems. Consider the decision matrix \tilde{M}_d = [\tilde{s}_{pq}], where L_1, L_2, ..., L_p are the alternatives, K_1, K_2, ..., K_q are the criteria, and the entries \tilde{s}_{pq} are fuzzy numbers rating alternative L_p with respect to criterion K_q; the weight vector is T = (T_1, T_2, ..., T_q) with \sum_j T_j = 1.
4.1 Algorithm
Step 1 Construct the fuzzy decision matrix \tilde{M}_d; each element is a triangular fuzzy number \tilde{s}_{ij} = (s_{ij}, \alpha_{ij}, \delta_{ij}).
Step 2 Calculate the normalized fuzzy decision matrix \tilde{V}_{ij}.
Step 3 For every fuzzy number \tilde{s}_{ij} = (s_{ij}, \alpha_{ij}, \delta_{ij}), form the set of \alpha-cuts (\tilde{s}_{ij})_\alpha = [(\tilde{s}_{ij})_\alpha^L, (\tilde{s}_{ij})_\alpha^U], \alpha \in [0, 1], so that each fuzzy number is changed into an interval. This interval is then transformed into the normalized interval

(\tilde{n}_{ij})_\alpha^L = (\tilde{s}_{ij})_\alpha^L \Big/ \sqrt{ \sum_{i=1}^{p} \big[ ((\tilde{s}_{ij})_\alpha^L)^2 + ((\tilde{s}_{ij})_\alpha^U)^2 \big] }, \quad j = 1, 2, ..., q,

(\tilde{n}_{ij})_\alpha^U = (\tilde{s}_{ij})_\alpha^U \Big/ \sqrt{ \sum_{i=1}^{p} \big[ ((\tilde{s}_{ij})_\alpha^L)^2 + ((\tilde{s}_{ij})_\alpha^U)^2 \big] }, \quad j = 1, 2, ..., q.

The interval [(\tilde{n}_{ij})_\alpha^L, (\tilde{n}_{ij})_\alpha^U] is the normalized counterpart of [(\tilde{s}_{ij})_\alpha^L, (\tilde{s}_{ij})_\alpha^U] and is transformed back into a fuzzy number \tilde{V}_{ij} = (n_{ij}, a_{ij}, b_{ij}). Setting \alpha = 1 gives (\tilde{n}_{ij})_{\alpha=1}^L = (\tilde{n}_{ij})_{\alpha=1}^U = n_{ij}; setting \alpha = 0 gives (\tilde{n}_{ij})_{\alpha=0}^L = n_{ij} - a_{ij} and (\tilde{n}_{ij})_{\alpha=0}^U = n_{ij} + b_{ij}, so a_{ij} = n_{ij} - (\tilde{n}_{ij})_{\alpha=0}^L and b_{ij} = (\tilde{n}_{ij})_{\alpha=0}^U - n_{ij}. Thus \tilde{V}_{ij} = (n_{ij}, a_{ij}, b_{ij}) is a normalized positive TFN.
Step 4 Construct the weighted normalized fuzzy decision matrix with \tilde{v}_{ij} = \tilde{V}_{ij} \cdot \tilde{T}_j, where \tilde{T}_j is the weight of the j-th criterion.
Step 5 Each \tilde{v}_{ij} is a normalized fuzzy number lying in [0, 1], so we take the PIS \tilde{A}^+ = (\tilde{v}_1^+, \tilde{v}_2^+, ..., \tilde{v}_q^+) and the NIS \tilde{A}^- = (\tilde{v}_1^-, \tilde{v}_2^-, ..., \tilde{v}_q^-), with \tilde{v}_j^+ = (1, 1, 1) and \tilde{v}_j^- = (0, 0, 0) for each criterion.
Step 6 Using the distance of Eq. (7), compute the distance of each alternative from the PIS and NIS as \tilde{\delta}_i^+ = \sum_{j=1}^{q} d(\tilde{v}_{ij}, \tilde{v}_j^+) and \tilde{\delta}_i^- = \sum_{j=1}^{q} d(\tilde{v}_{ij}, \tilde{v}_j^-), i = 1, 2, ..., p.
Step 7 Compute the relative closeness coefficients \tilde{C}_i = \tilde{\delta}_i^- / (\tilde{\delta}_i^+ + \tilde{\delta}_i^-), i = 1, 2, 3, ..., p.
Step 8 The alternative with the highest closeness coefficient is the best alternative.
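Steps 5–8 reduce to measuring each weighted normalized TFN against (1, 1, 1) and (0, 0, 0) with the distance of Eq. (7); a compact NumPy sketch (with our own helper names and made-up sample values) is shown below.

```python
import numpy as np


def tfn_distance(a, b):
    """Distance of Eq. (7) between two TFNs a and b."""
    return np.sqrt(((np.asarray(a, float) - np.asarray(b, float)) ** 2).mean())


def fuzzy_closeness(weighted_tfns):
    """Closeness coefficients from a p x q array of weighted normalized TFNs."""
    v = np.asarray(weighted_tfns, dtype=float)          # shape (p, q, 3), entries in [0, 1]
    pis, nis = np.ones(3), np.zeros(3)                  # A+ = (1,1,1), A- = (0,0,0)
    d_plus = np.array([[tfn_distance(t, pis) for t in row] for row in v]).sum(axis=1)
    d_minus = np.array([[tfn_distance(t, nis) for t in row] for row in v]).sum(axis=1)
    return d_minus / (d_plus + d_minus)                 # Step 7


v = [[(0.4, 0.5, 0.6), (0.6, 0.7, 0.8)],
     [(0.2, 0.3, 0.4), (0.7, 0.8, 0.9)]]
cc = fuzzy_closeness(v)                                 # highest value = best alternative (Step 8)
```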
4.2 Linguistic Variables See Tables 1 and 2.
5 Numerical Example

A house is a dream of everyone in their life. The Odisha State Housing Board (OSHB) came into existence in the year 1968 through an act of the state government legislature. Its main intention is to provide reasonable and inexpensive houses in urban and rural areas to address the drastic shortage of housing in the state of Odisha. The OSHB carries out this mandate by delivering houses to all customers of society at a reasonable price. Homeless persons in the EWS and LIG categories are supplied a house to enhance their socio-economic status and enrich the living standards of the inhabitants. In the COVID-19 financial situation, customers are ready to purchase flats at affordable prices. Five flats are ready for sale, and they are evaluated so that the one among the five that best suits the client's needs is selected on the basis of different factors like location, transport, cost, expansion possibilities, educational services, and other benefits. Each alternative depends on a selected interest rate to adjust cash flows at
Table 1 Linguistic variables and TFN

| Fuzzy numbers | Linguistic variables | TFN |
|---|---|---|
| 1.1 | Very low importance | (01, 01, 01) |
| 1.2 | Low importance | (01, 01, 02) |
| 1.3 | Low moderate importance | (01, 02, 03) |
| 1.4 | Moderate more importance | (02, 03, 04) |
| 1.5 | More importance | (03, 04, 05) |
| 1.6 | Strong more importance | (04, 05, 06) |
| 1.7 | Very strong importance | (05, 06, 07) |
| 1.8 | Very strong more importance | (06, 07, 08) |
| 1.9 | Extreme strong importance | (07, 08, 09) |

Table 2 Linguistic variables and solution rating of TFN

| Fuzzy numbers | Linguistic variables | TFN |
|---|---|---|
| 2.1 | Very cheap (VC) | (01, 02, 03) |
| 2.2 | Cheap (C) | (02, 03, 04) |
| 2.3 | Cheap average (CA) | (03, 04, 05) |
| 2.4 | Average (A) | (04, 05, 06) |
| 2.5 | Very average (VA) | (05, 06, 07) |
| 2.6 | Expensive (E) | (06, 07, 08) |
| 2.7 | Very expensive (VE) | (07, 08, 09) |
| 2.8 | Excellent expensive (EE) | (08, 09, 10) |
different points of time (Hwang and Lin [12]). Suppose customers want to choose the best flat among all proposed flats. The evaluations of three decision-makers of the cost (in millions) of the flats are made on an eight-point scale as very cheap (VC), cheap (C), cheap average (CA), average (A), very average (VA), expensive (E), very expensive (VE), and excellent expensive (EE), with the fuzzy numbers (FN) described in Tables 1 and 2. From the above data and the vector of relative weights of each criterion, the decision matrix, the normalized fuzzy decision matrix and the weighted normalized fuzzy decision matrix are given in Tables 3, 4, and 5, respectively. The closeness coefficients are defined to calculate the ranking order of all five alternatives. The fuzzy positive and negative ideal solutions are given in Table 6, and Table 7 represents the distances from the FPIS and FNIS. For the closeness coefficients, the order of the alternatives is represented in Table 8. The C̃_i values give the ordering L1 > L5 > L4 > L3 > L2, as shown in Table 8. The sensitivity analysis investigates the closeness coefficients with the introduced AHP method, TOPSIS method, and fuzzy AHP method under fuzzy TOPSIS conditions and numeric data, to argue for or authenticate the outcomes. So, we take the weights based on the fuzzy AHP method. These are used to test the criteria weights, and five experiments are tested using the fuzzy TOPSIS method; the closeness coefficient C̃_i values are given in Table 8. The DFPIS, DFNIS, and closeness coefficients are
Table 3 Decision matrix

| Al\Cr | K1 | K2 | K3 | K4 | K5 |
|---|---|---|---|---|---|
| L1 | (1.0, 2.0, 3.0) | (8.0, 9.0, 10.0) | (6.0, 7.0, 8.0) | (8.0, 9.0, 10.0) | (8.0, 9.0, 10.0) |
| L2 | (2.0, 3.0, 4.0) | (4.0, 5.0, 6.0) | (7.0, 8.0, 9.0) | (5.0, 6.0, 7.0) | (7.0, 8.0, 9.0) |
| L3 | (3.0, 4.0, 5.0) | (8.0, 9.0, 10.0) | (3.0, 4.0, 5.0) | (4.0, 5.0, 6.0) | (8.0, 9.0, 10.0) |
| L4 | (4.0, 5.0, 6.0) | (7.0, 8.0, 9.0) | (4.0, 5.0, 6.0) | (7.0, 8.0, 9.0) | (6.0, 7.0, 8.0) |
| L5 | (5.0, 6.0, 7.0) | (6.0, 7.0, 8.0) | (6.0, 7.0, 8.0) | (3.0, 4.0, 5.0) | (8.0, 9.0, 10.0) |
demonstrated through the bar graph in Fig. 1, the line graph in Fig. 2 and the scatter graph in Fig. 3, respectively. From the sensitivity analysis, we have seen that the proposed approach can produce adequate results and furnish useful information to aid decision-makers in decision-making problems.
6 Conclusion and Future Scope

This paper has presented a detailed analysis of the AHP method, fuzzy AHP method, TOPSIS method, and fuzzy TOPSIS method, respectively, in relation to the closeness coefficients. The fuzzy AHP and fuzzy TOPSIS algorithms were designed, and the tabulated values were computed using fuzzy TOPSIS software. In this paper we introduced a hybrid of the fuzzy AHP and fuzzy TOPSIS methods to determine the ranking of solutions for the best flat among the five flats through the decision-makers. The fuzzy AHP method was used to determine the weights, and the fuzzy TOPSIS method was used to obtain the ranking of the solutions. For future research, this study can be used for multi-criteria group decision-making problems involving the fuzzy TOPSIS method. Further, combining different methods such as the AHP, TOPSIS, fuzzy AHP, and fuzzy TOPSIS methods to form hybrid methods can solve different multi-criteria decision-making problems with applications in optimization.
Table 4 Normalized decision matrix

| Al\Cr | K1 | K2 | K3 | K4 | K5 |
|---|---|---|---|---|---|
| L1 | (0.143, 0.286, 0.429) | (0.400, 0.444, 0.500) | (0.667, 0.778, 0.889) | (0.300, 0.333, 0.375) | (0.800, 0.900, 1.000) |
| L2 | (0.286, 0.429, 0.571) | (0.667, 0.800, 1.000) | (0.778, 0.889, 1.000) | (0.429, 0.500, 0.600) | (0.700, 0.800, 0.900) |
| L3 | (0.429, 0.571, 0.714) | (0.400, 0.444, 0.500) | (0.333, 0.444, 0.556) | (0.500, 0.600, 0.750) | (0.800, 0.900, 1.000) |
| L4 | (0.571, 0.714, 0.857) | (0.444, 0.500, 0.571) | (0.444, 0.556, 0.667) | (0.333, 0.375, 0.429) | (0.600, 0.700, 0.800) |
| L5 | (0.714, 0.857, 1.000) | (0.500, 0.571, 0.667) | (0.667, 0.778, 0.889) | (0.600, 0.750, 1.000) | (0.800, 0.900, 1.000) |
Table 5 Weighted normalized decision matrix

| Al\Cr | K1 | K2 | K3 | K4 | K5 |
|---|---|---|---|---|---|
| L1 | (0.050, 0.100, 0.150) | (0.140, 0.156, 0.175) | (0.233, 0.272, 0.311) | (0.105, 0.117, 0.131) | (0.280, 0.315, 0.350) |
| L2 | (0.100, 0.150, 0.200) | (0.233, 0.280, 0.350) | (0.272, 0.311, 0.350) | (0.150, 0.175, 0.210) | (0.245, 0.280, 0.315) |
| L3 | (0.150, 0.200, 0.250) | (0.140, 0.156, 0.175) | (0.117, 0.156, 0.194) | (0.175, 0.210, 0.263) | (0.280, 0.315, 0.350) |
| L4 | (0.200, 0.250, 0.300) | (0.156, 0.175, 0.200) | (0.156, 0.194, 0.233) | (0.117, 0.131, 0.150) | (0.210, 0.245, 0.280) |
| L5 | (0.250, 0.300, 0.350) | (0.175, 0.200, 0.233) | (0.233, 0.272, 0.311) | (0.210, 0.263, 0.350) | (0.280, 0.315, 0.350) |
Table 6 Fuzzy positive and negative ideal solutions

| Criteria | FPIS | FNIS |
|---|---|---|
| K1 | (0.250, 0.300, 0.350) | (0.050, 0.100, 0.150) |
| K2 | (0.140, 0.156, 0.175) | (0.233, 0.280, 0.350) |
| K3 | (0.272, 0.311, 0.350) | (0.117, 0.156, 0.194) |
| K4 | (0.105, 0.117, 0.131) | (0.210, 0.263, 0.350) |
| K5 | (0.280, 0.315, 0.350) | (0.210, 0.245, 0.280) |
Table 7 Distance from fuzzy positive and negative ideal solutions

| Alternatives | L1 | L2 | L3 | L4 | L5 |
|---|---|---|---|---|---|
| DFPIS | 0.239 | 0.382 | 0.357 | 0.272 | 0.249 |
| DFNIS | 0.485 | 0.342 | 0.367 | 0.452 | 0.475 |
Table 8 Closeness coefficients and ranking order

| Alternatives | L1 | L2 | L3 | L4 | L5 |
|---|---|---|---|---|---|
| C̃_i | 0.670 | 0.472 | 0.507 | 0.624 | 0.656 |
| Rank | 1 | 5 | 4 | 3 | 2 |
Fig. 1 Performance changes in FPIS, FNIS, and C̃_i
Fig. 2 Ranking changes in sensitivity analysis
Fig. 3 Performance of FPIS, FNIS, and C̃_i
References

1. Simon HA (1977) The new science of management decision. Prentice-Hall, Englewood Cliffs, New Jersey, USA
2. Bellman RE, Zadeh LA (1970) Decision-making in a fuzzy environment. Manag Sci 17:141–164
3. Chen CT (2000) Extensions to the TOPSIS for group decision-making under fuzzy environment. Fuzzy Sets Syst 114:1–9
4. Chen SJ, Hwang CL (1992) Fuzzy multiple attribute decision making methods and application. Lecture Notes in Economics and Mathematical Systems, Springer, New York
5. Hsu HM, Chen CT (1996) Aggregation of fuzzy opinions under group decision making. Fuzzy Sets Syst 79:279–285
6. Kacprzak J, Fedrizzi M, Nurmi H (1992) Group decision making and consensus under fuzzy preferences and fuzzy majority. Fuzzy Sets Syst 49:21–31
7. Liang GS (1992) Fuzzy MCDM based on ideal and anti-ideal notions. Eur J Oper Res 112:682–691
8. Tanino T (1984) Fuzzy preference in group decision making. Fuzzy Sets Syst 12:117–131
9. Wang YJ, Lee HS, Lin K (2003) Fuzzy TOPSIS for multi-criteria decision-making. Int Math J 3:367–379
10. Zadeh LA (1965) Fuzzy sets. Inf Control 8:338–353
11. Zimmermann HJ (1987) Fuzzy set, decision making and expert system. Kluwer, Boston
12. Hwang CL, Lin MJ (1987) Group decision making under multiple criteria: methods and applications. Springer-Verlag, Heidelberg
13. Saaty TL (1980) The analytic hierarchy process: planning, priority setting, resource allocation. McGraw-Hill
14. Zavadskas EK, Podvezko V (2016) Integrated determination of objective criteria weights in MCDM. Int J Inf Technol Decis Mak 15(2):267–283
15. Buckley JJ (1985) Fuzzy hierarchical analysis. Fuzzy Sets Syst 17(3):233–247
16. Buckley JJ, Feuring T, Hayashi Y (2001) Fuzzy hierarchical analysis revisited. Eur J Oper Res 129(1):48–64
17. Elomda BM, Hefny HA, Hassan HA (2013) An extension of fuzzy decision maps for multi-criteria decision-making. Egypt Inf J 14:147–155
18. Wu HY, Tzeng GH, Chen YH (2009) A fuzzy MCDM approach for evaluating banking performance based on balanced scorecard. Expert Syst Appl 36:10135–10147
19. Parida PK, Sahoo SK (2015) Fuzzy multiple attributes decision-making models using TOPSIS techniques. Int J Appl Eng Res 10(2):1433–1442
Black–Scholes Option Pricing Using Machine Learning Shreyan Sood , Tanmay Jain , Nishant Batra , and H. C. Taneja
Abstract The main objective of this paper is to explore the effectiveness of machine learning models in predicting stock option prices benchmarked by the Black–Scholes Model. We have employed the following four machine learning models: Support Vector Machine, Extreme Gradient Boosting, Multilayer Perceptron and Long Short-Term Memory, trained using two different sets of input features, to predict option premiums based on the S&P 500 Apple stock option chain historical data from 2018 and 2019. Statistical analysis of the results shows that Long Short-Term Memory is the best out of the chosen models for pricing both Call and Put options. Further analysis of the predictions based on moneyness and maturity demonstrated consistency of results with the expected behavior, which validated the effectiveness of the prediction model. Keywords Option pricing · Option Greeks · Volatility · Black–Scholes model · Machine learning · LSTM · Moneyness
1 Introduction The Black–Scholes Model is one of the most fundamental and widely used financial models for pricing stock option premiums. However, due to the standard limitations and assumptions of the model, it is considered to be just a useful approximation tool or a robust framework for other models to build upon. Most research studies that attempted to discern the relevance of the Black–Scholes Model in real world scenarios conclude that the assumption of constant underlying volatility over the life S. Sood (B) · T. Jain · N. Batra · H. C. Taneja Department of Applied Mathematics, Delhi Technological University, Delhi, India e-mail: [email protected] T. Jain e-mail: [email protected] H. C. Taneja e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_34
of the derivative was the biggest contributing factor for the empirical inaccuracy of the model. Modifications based on the concepts of stochastic volatility and jump diffusion are widely implemented in the field of financial mathematics to correct the shortcomings of the Black–Scholes Model. The high dimensionality and flexibility of the factors upon which option premiums depend make the task of accurately predicting them extremely complex. Recently, the concept of machine learning, specifically time-series forecasting using predictive models, has found many applications in the field of finance. In our approach to provide a solution for predicting option premiums accurately, we have implemented certain machine learning models designed with the intent to effectively build upon and outperform the Black–Scholes Model while using the same set of input parameters and subsequently calculated Option Greeks. This approach of using Option Greeks as training input for option pricing prediction models is relatively unexplored, and our research contributes to the scarce amount of existing literature which has applied this approach. We compared and explored the behaviors and performance of the different models with the benchmark predictions obtained from the Black–Scholes Model. A comprehensive comparative statistical analysis of the best obtained results has been performed separately for call and put options based on the moneyness and maturity values.
2 Literature Review

Drucker et al. [1] introduced a new regression technique, SVR, based on Vapnik's concept of support vectors. Hochreiter and Schmidhuber [2] established a new type of gated RNN known as LSTM that was efficient at learning to store information over extended time periods. Chen and Guestrin [3] described an optimized tree boosting system known as XGBoost, a sophisticated machine learning algorithm that is popularly used to achieve state-of-the-art results for different problem statements. Hutchinson et al. [4] utilized a non-parametric approach to option pricing using an ANN for the first time as a more accurate and computationally efficient alternative. Gencay and Salih [5] showed how mispricing in the Black–Scholes option prices was greater for deeper out-of-the-money options compared to near out-of-the-money options when options are grouped by moneyness. Their research indicated that the mispricing worsened with an increase in volatility. They concluded that the Black–Scholes Model was not an optimal tool for pricing options with high volatility, while feedforward neural networks had a lot more success in such situations. Gençay et al. [6] dove deep into Modular Neural Networks (MNNs) and provided an insight into how they could overcome the shortcomings of the Black–Scholes Model; the MNN was used to decompose the data into modules as per moneyness and maturity, and each module was estimated independently. Palmer [7] implemented a neural network that used a novel hybrid evolutionary algorithm based on particle swarm optimization and differential evolution to solve the problem of derivative pricing. Culkin and Das [8] provided an overview of neural networks, their basic structure and their use in the field of finance, specifically for option pricing. Liu et al. [9] utilized an
artificial neural network (ANN) solver for pricing options and computing volatilities with the aim of accelerating corresponding numerical methods. Ruf and Wang [10] deeply looked at the literature on option pricing using neural networks and discussed the appropriate performance measures and input features. One key takeaway from their paper was the reason why chronological partitioning of the option dataset was better than random partitioning which might lead to data leakage. The paper is formulated as follows. Section 2 gives the literature review. In Sect. 3 we describe the dataset employed, Black–Scholes model and the four machine learning models. Sections 4 and 5 give, respectively, the analysis and results obtained. Section 6 gives the conclusion.
3 Methods

3.1 Dataset and Features

We have used the S&P 500 Apple (AAPL) stock option chain historical data from 2018 and 2019. After the necessary data cleaning and preprocessing, we compiled 2 separate chronological datasets for call and put options, with the call option data containing about 2,77,000 rows and the put option data containing about 2,42,000 rows. Each row of data contained the closing values of the option premium (C for call, P for put), underlying stock price (S), strike price (K), implied volatility (σ_I) and time to expiration in years (t), as well as the values of the following Option Greeks: Delta (δ), Gamma (γ), Theta (θ), Vega (ν) and Rho (ρ). The option premiums served as our output ground truth values whereas the rest served as our input features. The four machine learning models were trained using two different sets of input features:

1. Set one, which excluded Option Greeks, i.e., it contained only four features: S, K, t and σ_I.
2. Set two, which included Option Greeks, i.e., it contained all nine features: S, K, t, σ_I, δ, γ, θ, ν and ρ.

Since the corresponding information regarding the risk-free interest rate (R) was unavailable, it was not used as an input feature. For the purpose of calculating the Black–Scholes option premiums, the average value of the U.S. one-year treasury rate across 2018 and 2019 was taken as R. Both call and put datasets were split into a 70:30 ratio chronologically for the purpose of generating training and testing datasets.
3.2 Black–Scholes Model (BSM)

The Black–Scholes Model was introduced in 1973, and provided a straight closed-form solution for pricing European Options. However, the model is based on certain assumptions that do not hold water in real market scenarios, for instance the assumption that the underlying asset price follows a Geometric Brownian Motion and that volatility of underlying prices is constant. For our purposes, the premiums calculated using the BSM used implied volatility instead of the annualized volatility and hence formed an ideal benchmark, as the only error occurred from the unavailability of the exact risk-free interest rate values for each of the individual options. Using the Black–Scholes equation, the premium for a call option can be calculated as:

$$C = S\,N(d_1) - K e^{-rt} N(d_2) \qquad (1)$$

Similarly, the premium for a put option can be calculated as:

$$P = K e^{-rt} N(-d_2) - S\,N(-d_1) \qquad (2)$$

where
S: underlying stock price
t: time until option exercise (in years)
K: strike price of the option contract
r: risk-free interest rate available in the market
N: cumulative probability function for the standard normal distribution

and d_1, d_2 are calculated as follows:

$$d_1 = \frac{\ln\left(\frac{S}{K}\right) + \left(r + \frac{\sigma^2}{2}\right)t}{\sigma\sqrt{t}} \qquad (3)$$

$$d_2 = d_1 - \sigma\sqrt{t} \qquad (4)$$

Here σ is the annualized volatility of the stock. Option Greeks are used to label various kinds of risks involved in options. Each "Greek" is the result of a flawed assumption of the option's relation with a specific underlying variable. They are commonly used by options traders to comprehend how their profit and loss will behave, as prices vary. In this paper, we have used five Option Greeks, namely Delta (δ), Gamma (γ), Theta (θ), Vega (ν) and Rho (ρ). Delta (δ) is the price sensitivity of the option, relative to the underlying asset. Theta (θ) is the time sensitivity or the rate of change in the option price, with time. It is also known as the option's time decay. It demonstrates the amount by which the price of an option would reduce, with diminishing time to expiry. Gamma (γ), also known as the second derivative price sensitivity, is the rate of change of the option's Delta with respect
to the underlying asset’s price. Vega (ν) gives the option’s sensitivity to volatility. It shows the rate of change of option’s value with that of the underlying asset’s implied volatility. Rho (ρ) represents the sensitivity to the interest rate. It is the rate of change of the option’s value with respect to that of 1% change in interest rate. In essence, the Greeks represent gradient information for the Black–Scholes Model and when fed as additional input parameters, can help the Machine Learning Models better fit the training data and accurately capture the underlying relation.
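A minimal Python sketch of the closed-form prices (1)-(4) together with the five Greeks is given below; it uses the standard textbook Greek formulas and implied volatility as σ, mirroring the paper's setup, but it is an illustration rather than the authors' code, and the example option parameters are arbitrary.

```python
# Black-Scholes price and the five Greeks for a European option.
import math
from scipy.stats import norm

def black_scholes(S, K, t, r, sigma, kind="call"):
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    if kind == "call":
        price = S * norm.cdf(d1) - K * math.exp(-r * t) * norm.cdf(d2)   # Eq. (1)
        delta = norm.cdf(d1)
        theta = (-S * norm.pdf(d1) * sigma / (2 * math.sqrt(t))
                 - r * K * math.exp(-r * t) * norm.cdf(d2))
        rho = K * t * math.exp(-r * t) * norm.cdf(d2)
    else:
        price = K * math.exp(-r * t) * norm.cdf(-d2) - S * norm.cdf(-d1)  # Eq. (2)
        delta = norm.cdf(d1) - 1.0
        theta = (-S * norm.pdf(d1) * sigma / (2 * math.sqrt(t))
                 + r * K * math.exp(-r * t) * norm.cdf(-d2))
        rho = -K * t * math.exp(-r * t) * norm.cdf(-d2)
    gamma = norm.pdf(d1) / (S * sigma * math.sqrt(t))   # same for call and put
    vega = S * norm.pdf(d1) * math.sqrt(t)              # same for call and put
    return price, (delta, gamma, theta, vega, rho)

# Arbitrary example option: spot 180, strike 175, 3 months, 2.5% rate, 30% IV
print(black_scholes(S=180.0, K=175.0, t=0.25, r=0.025, sigma=0.30, kind="call"))
```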
3.3 Machine Learning Models

Support Vector Machine (SVM). These are supervised learning models based on the margin maximization principle and risk minimization by finding the optimal-fit hyperplane. They are typically used for non-probabilistic linear classification but can be used for regression as well. Specifically, support vector regression (SVR) is the method based on SVMs that is used for high-dimensional nonlinear regression analysis. We used SVR with a radial basis function kernel for predictions. SVR was chosen because of its high accuracy of generalization of high-dimensional data and high robustness to outliers, even though the model underperformed in cases of large datasets. Standard normalization of data was performed prior to training, and hyperparameter tuning and regularization were carried out using the grid search method to minimize overfitting.

Extreme Gradient Boosting (XGB). XGB is a software library which provides an optimized distributed gradient boosting framework that is designed to be highly efficient. Gradient boosting is a technique that produces a single strong prediction model in the form of an ensemble of weaker iterative models such as decision trees. XGB was chosen because of the high speed and flexibility with which it handled large, complex datasets, even though it was sensitive to outliers. Regularization and early stopping techniques were utilized to minimize overfitting, and hyperparameter tuning was carried out using the grid search method.

Multilayer Perceptron (MLP). MLPs are categorized as feedforward artificial neural networks (ANNs) that utilize the back-propagation technique for training. MLPs are universal function approximators and hence are ideal for generalizing mathematical models by regression analysis. In our implementation, we used a sequential MLP with five fully connected hidden layers that used a variety of different activation functions and optimized hyperparameters. To minimize overfitting, we utilized dropout and batch normalization layers throughout the dense layers and also monitored test dataset metrics for early stopping.

Long Short-Term Memory (LSTM). LSTMs are categorized as recurrent neural network (RNN) architectures which, unlike standard feedforward neural networks, have feedback connections. A common LSTM unit consists of three gates (an input gate, a forget gate, and an output gate), which are used in tandem to regulate the
Table 1 Moneyness categorization of options

| S. No. | Moneyness type | Value of S/K (Call) | Value of S/K (Put) |
|---|---|---|---|
| 1 | In-the-Money | > 1.05 | < 0.95 |
| 2 | Out-of-the-Money | < 0.95 | > 1.05 |
| 3 | At-the-Money | 0.95 ≤ S/K ≤ 1.05 | 0.95 ≤ S/K ≤ 1.05 |
flow of information to the LSTM cell which remembers the values over arbitrary time intervals. LSTMs are optimally used to process time-series data, as time-lags of unknown duration can be obtained between important events in a time-series. To minimize overfitting, we implemented dropouts, batch normalization and early stopping that monitored testing dataset loss. Activation functions for every layer and other hyperparameters were efficiently tuned in order to minimize training loss.
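As a rough illustration of the LSTM regressor described above, a Keras sketch is shown below. The exact window length, layer sizes and hyperparameters used by the authors are not reported, so the values here are assumptions, and random arrays stand in for the preprocessed option-chain features.

```python
# Hedged sketch of an LSTM option-premium regressor with dropout, batch
# normalization and early stopping; architecture and window are assumptions.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features, window = 4, 10       # set one: S, K, t, sigma_I; assumed 10-step window

model = keras.Sequential([
    layers.Input(shape=(window, n_features)),
    layers.LSTM(64, return_sequences=True),
    layers.Dropout(0.2),
    layers.LSTM(32),
    layers.BatchNormalization(),
    layers.Dense(16, activation="relu"),
    layers.Dense(1),             # predicted option premium
])
model.compile(optimizer="adam", loss="mae",
              metrics=[keras.metrics.RootMeanSquaredError()])

# Dummy data stands in for the chronologically split AAPL option chain
X_train = np.random.rand(1000, window, n_features).astype("float32")
y_train = np.random.rand(1000, 1).astype("float32")
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                           restore_best_weights=True)
model.fit(X_train, y_train, validation_split=0.3, epochs=50,
          batch_size=64, callbacks=[early_stop], verbose=0)
```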
4 Analysis We have analyzed the predictions made by the BSM on both call and put option datasets and have compared it with four regression models, each trained on the two sets of input parameters, which gave a total of nine models for comparison. Out of these nine models, we picked the model with the best results and further broke down its performance based on the following two parameters.
4.1 Moneyness Moneyness describes the inherent monetary value of an option’s premium in the market. For our purposes we divided the underlying stock price (S) by the strike price (K) and categorized the options as being in-the-money, out-of-the-money or at-the-money. Table 1 shows how moneyness for different options was defined quantitatively.
4.2 Maturity Maturity represents the time to expiration in years for the option to be exercised (t). We considered options with t < 0.1 to be short-term, 0.1 ≤ t ≤ 0.5 to be medium-term and t > 0.5 to be long-term.
5 Results

The following regression metrics were used in the evaluation of models:

1. Mean Absolute Error (MAE):

$$\text{MAE} = \frac{1}{N}\sum_{i=1}^{N} |x - y| \qquad (5)$$

2. Root Mean Squared Error (RMSE):

$$\text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x - y)^2} \qquad (6)$$
where
N: number of observations
x: actual value
y: predicted value.

A combination of the metrics was taken as the main loss function for training and testing purposes. The following two tables (Tables 2 and 3) represent the results for the metrics obtained from the testing dataset predictions of call options and put options, respectively. In the call option test dataset predictions, LSTM performed the best and gave the minimum MAE as well as RMSE. Both LSTM and MLP outperformed the benchmark BSM. In the put option test dataset predictions, LSTM performed the best and gave the minimum MAE as well as RMSE. The results from Tables 2 and 3 show that LSTM was the best model for option pricing for both call and put options.

Table 2 Call option test dataset metrics

| S. No. | Model | MAE | RMSE |
|---|---|---|---|
| 1 | BSM | 2.42 | 5.32 |
| 2 | SVM | 6.25 | 9.04 |
| 3 | SVM_Greeks | 9.58 | 11.81 |
| 4 | XGB | 8.49 | 15.37 |
| 5 | XGB_Greeks | 5.08 | 11.81 |
| 6 | MLP | 2.74 | 4.52 |
| 7 | MLP_Greeks | 3.38 | 5.35 |
| 8 | LSTM | 2.02 | 3.74 |
| 9 | LSTM_Greeks | 3.31 | 4.98 |
Table 3 Put option test dataset metrics

| S. No. | Model | MAE | RMSE |
|---|---|---|---|
| 1 | BSM | 1.96 | 5.52 |
| 2 | SVM | 3.76 | 6.30 |
| 3 | SVM_Greeks | 5.21 | 8.72 |
| 4 | XGB | 4.74 | 8.50 |
| 5 | XGB_Greeks | 4.54 | 8.54 |
| 6 | MLP | 1.68 | 2.95 |
| 7 | MLP_Greeks | 4.59 | 6.71 |
| 8 | LSTM | 1.40 | 2.77 |
| 9 | LSTM_Greeks | 1.99 | 3.65 |
SVM and XGB performed worse than BSM; that is, these models underfit and were unable to sufficiently capture the underlying relation for option premium predictions. Results also show that Option Greeks, when included as additional input features, actually produced worse final test metrics in every single model and introduced a degree of data leakage in the models. Along with LSTM, MLP also outperformed BSM. The metrics show a similar distribution and trends for both types of options, which implies that the training of the models was impartial to option type. Figure 1 shows the accuracy of LSTM in predicting the actual Call Option Prices. The plot is highly linear, which demonstrates high similarity between the predicted and the actual prices. Figure 2 shows that the majority of predicted Call Option Prices give a lesser magnitude of error in LSTM. This demonstrates the high similarity between the predicted and the actual prices.
Fig. 1 Calculated (predicted) price versus actual price plot for LSTM call option test dataset predictions
Fig. 2 Error density plot for LSTM call option test dataset predictions
Figure 3 shows the accuracy of LSTM in predicting the actual Put Option Prices. The plot is highly linear which demonstrates high similarity between the predicted and the actual prices. Figure 4 shows that the majority of predicted Put Option Prices give a lesser magnitude of error in LSTM. This demonstrates the high similarity between the predicted and the actual prices. Table 4 represents the results of the best model obtained (LSTM for both Call and Put) when the options are grouped by moneyness.
Fig. 3 Calculated (predicted) price versus actual price plot for LSTM put option test dataset predictions
Fig. 4 Error density plot for LSTM put option test dataset predictions
Table 4 LSTM test dataset metrics grouped by moneyness

| S. No. | Option type | Moneyness type | MAE | RMSE |
|---|---|---|---|---|
| 1 | Call | In-the-Money | 3.59 | 5.43 |
| 2 | Call | Out-of-the-Money | 0.55 | 0.87 |
| 3 | Call | At-the-Money | 1.06 | 1.37 |
| 4 | Put | In-the-Money | 3.93 | 6.55 |
| 5 | Put | Out-of-the-Money | 0.84 | 1.02 |
| 6 | Put | At-the-Money | 1.43 | 2.03 |
The results from Table 4 show that when grouped by moneyness, Out-of-the-Money options gave significantly better metrics, followed by At-the-Money and then In-the-Money samples. This behavior is to be expected as lower moneyness directly lowers the value of the option premium. Hence, since the magnitude of the premium value decreases as we move from In-the-Money to Out-of-the-Money options, so does the magnitude of our error metrics. Figure 5 shows that Out-of-the-Money Stock Option Prices illustrate the highest similarity between predicted and actual prices for Call Options in LSTM. Figure 6 shows that Out-of-the-Money Stock Option Prices illustrate the highest similarity between predicted and actual prices for Put Options in LSTM. Table 5 represents the results of the LSTM when the options are grouped by maturity. In both call and put option LSTM test dataset predictions, short-term options gave the minimum MAE as well as the minimum RMSE, followed by medium-term and then long-term options. The results from Table 5 show that when grouped by maturity, short-term options gave better metrics followed by medium-term and then long-term options. This
Fig. 5 Calculated price versus actual price plots for LSTM call option test predictions grouped by moneyness: (a) In-the-Money, (b) Out-of-the-Money, (c) At-the-Money
Fig. 6 Calculated price versus actual price plots for LSTM put option test predictions grouped by moneyness: (a) In-the-Money, (b) Out-of-the-Money, (c) At-the-Money
behavior is also to be expected as a lower maturity directly lowers the option premiums. Hence, the magnitude of our error metrics decreases as we move from long-term to short-term options. Figure 7 shows that Short-Term Stock Option Prices illustrate the highest similarity between predicted and actual prices for Call Options in LSTM. Figure 8 shows that Short-Term Stock Option Prices illustrate the highest similarity between predicted and actual prices for Put Options in LSTM.

Table 5 LSTM test dataset metrics grouped by maturity

| S. No. | Option type | Maturity type | MAE | RMSE |
|---|---|---|---|---|
| 1 | Call | Short-term | 1.73 | 3.57 |
| 2 | Call | Medium-term | 1.87 | 3.50 |
| 3 | Call | Long-term | 2.39 | 4.06 |
| 4 | Put | Short-term | 1.16 | 1.99 |
| 5 | Put | Medium-term | 1.33 | 2.26 |
| 6 | Put | Long-term | 1.68 | 3.67 |
Fig. 7 Calculated price versus actual price plots for LSTM call option test predictions grouped by maturity: (a) Short-Term, (b) Medium-Term, (c) Long-Term
Fig. 8 Calculated price versus actual price plots for LSTM put option test predictions grouped by maturity: (a) Short-Term, (b) Medium-Term, (c) Long-Term
6 Conclusions We conclude that LSTM was the best performing model when it came to predicting stock option prices. We found that the performance of both LSTM and MLP was superior to the benchmark Black–Scholes Model whereas SVM and XGBoost failed to outperform the benchmark. We also demonstrated the redundancy of utilizing Option Greeks as input features for the prediction models. Option Greeks introduced a certain degree of data leakage on the models which often reduced the overall performance. This was a new finding as most of the literature about using Option Greeks as input features for option pricing mostly reported an improvement in performance. The analysis of the option predictions on the basis of moneyness and maturity validated that the LSTM prediction model was able to efficiently and accurately predict option prices in a real world market scenario.
References

1. Drucker H, Christopher B, Linda K, Alex S, Vladimir NV (1996) Support vector regression machines. NIPS
2. Hochreiter S, Jürgen S (1997) Long short-term memory. Neural Comput 9:1735–1780
3. Chen T, Carlos G (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. Association for Computing Machinery, New York, NY, USA
4. James MH, Andrew WL, Tomaso P (1994) A nonparametric approach to pricing and hedging derivative securities via learning networks. J Financ 49(3):851–889
5. Ramazan G, Aslihan S (2003) Degree of mispricing with the Black-Scholes model and nonparametric cures. Ann Econ Financ 4:73–101
6. Nikola G, Ramazan G, Dragan K (2009) Option pricing with modular neural networks. IEEE Trans Neural Netw 20(4):626–637
7. Palmer S (2019) Evolutionary algorithms and computational methods for derivatives pricing
8. Robert C, Sanjiv D (2017) Machine learning in finance: the case of deep learning for option pricing. J Investment Manage 15(4):92–100
9. Liu S, Oosterlee CW, Bohte SM (2019) Pricing options and computing implied volatilities using neural networks. Risks 7(1):16
10. Ruf J, Weiguan W (2020) Neural networks for option pricing and hedging: a literature review. J Comput Financ
Literature Review on Waste Management Using Blockchain S. S. Sambare, Kalyani Khandait, Kshitij Kolage, Keyur Kolambe, and Tanvi Nimbalkar
Abstract Ever-evolving blockchains like Ethereum have wide applications in various sectors but are comparatively unknown in the sector of waste management. If the assorted steps within the management process are not carried out correctly, waste management practices can even create a health and environmental risk. Smart cities can subdue ecological issues caused by unethical waste handling to support mankind, protect marine life, and reduce contamination. Conventional technologies do not offer adequate levels of transparency and coordination among the different entities. With the tamper-proof technology introduced by blockchain, municipalities can escalate the productivity of their waste management efforts. The projected blockchain technology can connect the correct stakeholders toward combining and providing information. We review the various opportunities provided by blockchain technology in the proper handling of the various types of waste like Solid Waste, Electronic Waste, Medical Waste and Industrial Waste. In this review paper we compare the various research available in this domain related to different types of waste on the basis of vital specifications. Moreover, we present a perceptive analysis of the available case studies to focus on the feasibility of blockchain in proper waste handling. Ultimately, we put forth challenges that can lead to upcoming research. Keywords Blockchain · Ethereum · Smart contracts · Cryptography · Cloud · Hash function · IoT S. S. Sambare · K. Khandait · K. Kolage (B) · K. Kolambe · T. Nimbalkar Pimpri Chinchwad College of Engineering, Pune, India e-mail: [email protected] S. S. Sambare e-mail: [email protected] K. Khandait e-mail: [email protected] K. Kolambe e-mail: [email protected] T. Nimbalkar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_35
1 Introduction

Enormous amounts of waste have been continuously generated over the last decade, creating a problem for several urban local bodies in India for its proper management. The improper management of waste has detrimental effects on health and the environment. According to estimates, around 1.3 billion tons of solid waste per year, including industrial and E-waste, is generated, and this will increase to approximately 2 billion tons every year [1, 2]. Also, medical waste is expected to grow past 16 billion in two years at a CAGR of 3.8% [3]. Proper management of waste requires close harmonization among the concerned stakeholders like waste treatment facilities, shippers, collectors and waste generators. However, existing systems fall short in managing the waste as they fail to share information about the waste handling state with the responsible entities. The present state of affairs in garbage disposal lacks the traceability to follow the end of life (EOL) of wastes. Using blockchain, a unified platform may be used to overcome the shortfalls of the current frameworks; for instance, using blockchain we can share information among the involved entities in a transparent and secure way. As a decentralized architecture is followed by blockchain technology, it is highly fault tolerant and a trusted technology along with robustness [4, 5]. Blockchain follows a P2P architecture to store the data in a see-through manner which is secure and dependable. In blockchain technology, miners participate in the consensus process for validating and authenticating the transactions and creating new blocks accordingly. Many consensus algorithms like Proof of Work (PoW) [6], Proof of Stake (PoS) [7] and Proof of Authority (PoA) [8] protect the blockchain. PoW considers computation power, whereas Proof of Capacity (PoC), also called Proof of Space, considers capacity in terms of storage space. The PoW has to be transmitted in order to add the blocks to the blockchain. PoW is used as a consensus algorithm in blockchain, and nodes are compelled to follow the contract between the different nodes present in the blockchain. Blockchains can be classified into permissioned and permissionless. In a permissionless blockchain anyone can participate in the system, but for a permissioned blockchain participants need prior approval. Hyperledger Fabric and Quorum are permissioned platforms, and public Ethereum comes under the permissionless category. Smart contracts are simple codes stored on a blockchain, with terms and conditions agreed by the participants involved in the tasks of waste administration, that run when predetermined conditions are met. They execute automatically and activate events only after reaching the fixed benchmark with the help of an if/then (if/when) condition system. Smart contracts help stakeholders perform operations which are more secure, faster and cheaper as compared to the traditional system where third parties are required to carry out the functioning. Smart contracts can be categorized as prolific, dormant, self-destructed and active. The blockchain is still in its early years and could have more influence than cryptography alone. In order to protect the identity of users in a network, blockchain uses cryptography, thus ensuring secure transactions to protect all sorts of valuable information. We put forth the blockchain-based case studies and the related research in which blockchain technology can be used to solve the
waste management problem in the urban areas. The principal contributions are as follows:

• The practicality and crucial specifications of using blockchain in waste handling in urban areas are demonstrated and summarized.
• Various challenges, research and advancements using blockchain in waste administration are recognized and considered.
• Blockchain-based waste management case studies were studied for four different types of waste.
• Future research challenges that will help to unlock the full potential of blockchain are enumerated.

Further, the paper is organized as follows: Sect. 2 describes how blockchain solves existing problems; Sect. 3 covers blockchain for the various types of waste management; Sect. 4 contains the comparative study of the available research papers; Sect. 5 contains the generalized waste management process; and finally, the Conclusion is given in Sect. 6.
2 How Blockchain Solves Existing Problems

1. Each process included in waste management can be stored in the blockchain, right from the manufacturing of a product from raw material to its recycling stage. This means that everyone must carry out their work responsibly, as not doing so may affect the entire management procedure.
2. There will be no dispute regarding the original owner of the product when it reaches the recycling sector, because all the information of each item will be securely stored in the blockchain, including the transfer of ownership from manufacturer to retailer and then to customer.
3. Important data like the quantity of waste generated, the number of products manufactured, the stakeholders included, etc. will be safely present in the blockchain, increasing accountability.
4. Blockchain will allow every entity to come under one roof. Government organizations/agencies, customers and producers will act together, which will make the entire waste management process transparent and unchangeable.
5. The participants that are present in the blockchain receive incentives when they channelize their waste.
3 Blockchain in Different Types of Waste Management

3.1 Solid Waste Management

Major cities in the world face a common problem of solid waste disposal. Inefficient collection schedules, improper disposal, lack of funding and various other factors have led to poor waste management in India. About 620 lakh tons of waste is generated in India, of which 70% is collected and only 27.9% is treated properly. On the larger scale, approximately 200 crore tons of waste gets produced all over the world (33% of it not controlled) [9, 11]. The problems mentioned before can be solved efficiently using blockchain technology. Various transactions are carried out in a blockchain, including digital currency, as discussed before. A high level of computation is required to produce the hash of a new block before it can be added. The hash value should be below a specific threshold value. Since the contents of a block are static, its hash would always be the same; to find a hash value under the target, a variable value called the nonce is introduced. To find a proper nonce for the newly created block, miners across the world compete with each other. Newly created bitcoin is given as a reward to the first one who finds a value under the specific target. High-speed GPUs and ASICs are used by miners as they are designed to perform these compound computations. This process of miners trying to find a nonce that yields a hash under the specific target value is called mining [12, 13]. Figure 1 explains the whole architecture of how this mining and blockchain is implemented for waste management. It is divided into two parts: internal and external networks. The internal network consists of miner nodes, which are responsible for the creation of blocks and proof-of-work verification. This network has high data calculation and storage resources, while the external nodes have finite storage and low computation volume. The external nodes are mostly used for waste tracking. The large amount of data generated by external nodes from real-time monitoring is transferred to the core network. The encrypted data is piled up by these nodes to trace the solid waste and obtain functional figures. The PoW and the generated blocks are checked and validated by the miner nodes (in the internal network). Hashes are stored in the blockchain, and digital signatures are used to secure the integrity of the data stored in the network. For all this, the internal network requires high computation power and storage (refer Figs. 2 and 3). The author describes the PoW algorithm as follows. The mining process is started when a miner node receives a transaction from an external node; mining is performed in the internal node network due to the limited resources available to carry out mining on the external nodes. The mining process consists of 5 steps. Firstly, a transaction request is sent to a miner whenever new data/a transaction is received by the external node. Secondly, when this transaction request is received, a validation process is carried out to check two conditions, i.e., has the transaction been altered? Is the transaction already in the blockchain? If neither of these two conditions holds, then the miner node proceeds to the next step; if either of these two conditions is satisfied, then the miner node completely stops the process and transmits it to the internal network. In the third step,
Fig. 1 Architecture of solid waste management [10]
Fig. 2 Block diagram for hardware connections for sensors [13]
the PoW process begins as the node gets the ID. The first block in the blockchain is called the genesis block and always has the preceding ID as ZERO. Iterative hashing of the information is carried out by the miner node. In step 4, a verification process is carried out by the miner node to check the correctness of the data of all blocks after the creation of the block. In the 5th and final step, an updated blockchain is received by all external nodes, sent by the miner nodes [13]. According to the method proposed in [23], IoT is used in parallel with blockchain technologies. Hardware IoT devices are used for collection of the information, and the information is passed through various nodes of the blockchain, ensuring security and traceability; devices can be synced to the cloud servers via the gateway. A major consideration when using IoT devices along with the gateway is that they must consume less power; to achieve this, the use of the LoRa protocol along with an ATmega microcontroller is
Fig. 3 Block diagram for hardware connections for the gateway [13]
suggested by the author. An FTDI port is provided for transferring the firmware to the sensor hub. ICSP is used to program the controller, and the module is used to transfer data over Wi-Fi using the IP. A load sensor is used for measuring the amount of waste generated, and a level sensor is used for acquiring the level or quantity of waste in the dustbins. Moreover, to provide Internet availability, an ESP module is incorporated on the hardware. The ESP module has a built-in TCP/IP stack, which enables the controller to access the Wi-Fi network. As discussed earlier, data related to waste is collected through IoT gadgets. This data is attached to the blockchain via an API and then transferred to the cloud through the ESP module. In the blockchain, when new data is received through an IoT device it is considered as new data or a transaction in the blockchain. This transaction is appended through the PoW process using mining, as explained in the survey of [13]. After the successful generation of the block through this method, the data is finally sent to the cloud.
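The proof-of-work step carried out by the miner nodes can be illustrated with a minimal Python sketch; this is a generic nonce search over a SHA-256 hash rather than the cited system, and the difficulty, block fields and bin reading are assumptions.

```python
# Generic proof-of-work sketch: the miner searches for a nonce so that the
# SHA-256 hash of the block contents falls below a difficulty target.
import hashlib
import json

def mine_block(prev_hash, transactions, difficulty_bits=20):
    target = 2 ** (256 - difficulty_bits)          # hash must fall below this value
    header = {"prev_hash": prev_hash, "tx": transactions}
    nonce = 0
    while True:
        payload = json.dumps(header, sort_keys=True) + str(nonce)
        digest = hashlib.sha256(payload.encode()).hexdigest()
        if int(digest, 16) < target:               # a valid nonce has been found
            return {"header": header, "nonce": nonce, "hash": digest}
        nonce += 1

# The genesis block always has the preceding ID (previous hash) as zero
genesis = mine_block(prev_hash="0" * 64,
                     transactions=[{"bin_id": "WB-17", "fill_level": 0.8}])
print(genesis["nonce"], genesis["hash"])
```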
3.2 E-Waste Management

In the past 20 years, digital waste, i.e., electronic waste, has been the quickest developing class of unsafe waste produced. Less than 20% of the 530 lakh metric tons of electronic waste produced in 2019 was gathered and then recycled with a proper disposal process. We can endorse the evolution of the rising blockchain technology structure to overcome the structural and monetary barriers toward distributive techniques for electronic waste control, which can limit toxic exposures and their unfavorable effects on the human body and the quality of the surroundings [14]. As these digital products tend to stack up, they start constituting a full-size part of the disposal area, which leads to leakage of poisonous chemical compounds or materials that causes
Fig. 4 E-waste management process [16]
various environmental dangers like polluting the water supplies, the air and the soil itself. E-waste management includes various processes: collection of E-waste from consumers; segregation of that waste into renewable and non-renewable; the waste that is renewable may be saved for reselling, and the non-renewable can be disassembled. These elements can then be disposed of in a secure manner [15]. A blockchain-assisted 5G environment is capable of establishing accountability, provenance of information, and non-repudiation for every user. The foremost block present in a blockchain is different from the others. It does not store any transactions. This block is called the genesis block. The other blocks are authenticated transactions. Each block stores the hash value of the previous block, which is possible using cryptography. All members inside a particular blockchain network have identical copies of the transaction ledger. Hence, any changes made to the assets in the ledger can be displayed to all the members. The members of the blockchain have to give information such as the client's name, password, the type of client of each member of the framework, etc. The Products entity comprises the data related to each item, which is added in its entirety by the manufacturers and indexed with an exclusive goods code. The Track stores the data or record of every transaction carried out in the entire blockchain [16]. The main goal is to authorize effective observation and administrative exercises through the entire life cycle of electronic items (refer Fig. 4). We can consider a conventional model for this process which is separated into two parts: the first part is called the Forward Supply Chain (FSC) and the second one the Reverse Supply Chain (RSC). The forward flow of the network centers on the exchange of e-items that begins with raw material providers and finishes with purchasers/mass customers as end-clients. The stakeholders involved here are mainly the producer, wholesaler, retailer, etc. They take the essential steps for creating, importing/exporting e-products, assembling, and conveying these products to the users. The second part, the RSC of the network flow, predominantly manages e-waste collection, segregation, and conversion back to unprocessed substances. All of these processes are carried out by various agencies, for example, digital waste collection centers, mending shops and so on. We must consider both parts of this process to be important to capture every activity that happens in the life cycle of electronic items, where we start from their assembly, then their removal and finally again back to the unprocessed substances [17]. The first undertaking of all entities is to register themselves. To prevent fraud, any government-certified public identity database could be consulted to confirm the
Fig. 5 Registration process [17]
accuracy of the details submitted during the enrollment process. This can be accomplished through a tree-based verification application. The next step is associated with the collection and gathering of the electronic products. This is one of the important phases, in which there is a need to control the flow of electronic products in the whole process flow (refer Fig. 5). This step consists of different subphases as follows: 1. Registration of product; 2. Transfer of product; 3. System associated for file storage. In the first stage, i.e., registration of products, the registration is done only by the manufacturers, and for the registration of products specific details of the products are required. In addition to that, each product will consist of a unique ID. The recycling process may also create crude materials which need to go through this enrollment process again to obtain new unique identities for them. After manufacturing the product, the stakeholders are involved in the product transfer. Any item in the framework may have various supporting records, for example, purchase bills, warranty cards, transfer documents, e-waste disposal reports and so on. To connect these documents with the corresponding item during either registration process, we utilize the system associated for file storage. In the access control step, the access control strategy is deployed with the help of smart contracts [17]. Hence, we can ensure that the e-waste management process is engaging for every one of the clients, retailers and producers to complete the cycle, right from making the e-item to guaranteeing that it is disposed of appropriately.
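The registration and transfer subphases can be mocked up off-chain in a few lines of Python. This sketch only illustrates the logic described above (unique product IDs, manufacturer-only registration, and ownership transfers linked to document hashes); it is not the smart-contract code of [17], and every name in it is hypothetical.

```python
# Off-chain mock of e-product registration, ownership transfer and track records.
import hashlib
import uuid

class EWasteRegistry:
    def __init__(self):
        self.members = {}      # member_id -> role ("manufacturer", "retailer", ...)
        self.products = {}     # product_id -> {"owner": ..., "details": ..., "track": [...]}

    def register_member(self, member_id, role):
        self.members[member_id] = role

    def register_product(self, member_id, details):
        if self.members.get(member_id) != "manufacturer":
            raise PermissionError("only manufacturers may register products")
        product_id = str(uuid.uuid4())            # exclusive goods code / unique ID
        self.products[product_id] = {"owner": member_id, "details": details, "track": []}
        return product_id

    def transfer(self, product_id, from_id, to_id, document=b""):
        product = self.products[product_id]
        if product["owner"] != from_id or to_id not in self.members:
            raise PermissionError("invalid transfer")
        # Record the ownership change and the hash of any supporting document
        product["track"].append({"from": from_id, "to": to_id,
                                 "doc_hash": hashlib.sha256(document).hexdigest()})
        product["owner"] = to_id

reg = EWasteRegistry()
reg.register_member("acme", "manufacturer")
reg.register_member("shop1", "retailer")
pid = reg.register_product("acme", {"type": "laptop"})
reg.transfer(pid, "acme", "shop1", document=b"purchase invoice")
print(reg.products[pid]["track"])
```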
3.3 Medical Waste Management

According to the literature review conducted, two pieces of research that show a direction on how blockchain can be implemented for proper medical waste management are presented below. As proposed in [18], all the data associated with the amount of waste, and the information that needs to be shared among the respective stakeholders, is collected using different available communication technologies and then transferred to the blockchain for processing, decision making, and storing the data block temporarily for a predefined time. Using consensus algorithms, especially DPoS, the aggregated blocks are verified. Using DPoS, respective stakeholders can nominate a few delegates who will be responsible for securing the network for them. In DPoS the algorithm used is similar to a voting system: one responsible member or node is elected, and if at any instance of time the elected member does not function according to the rules set, it will be replaced by a new one. DPoS is preferred over PoW and PoS because it offers a higher transaction rate. Ultimately, the verified blocks are added to the blockchain and the data transfer is then carried out with the help of smart contracts [18]. There has been rapid growth in medical waste due to COVID-19. So, considering the aspect of COVID-19, the literature review based on [19] is enumerated further. In this research paper the authors have proposed a system which follows a sequence of activities. Using a registration smart contract, all the partakers are put on the blockchain. After successful enrollment of the partakers, a function is raised by the wholesaler, or the person responsible for the distribution of goods, making use of the smart contract. It will assist this person to validate or deny the order received from the hospitals. Following the successful enrollment of the order, the wholesaler makes the order request to the producers of the medical instruments. A lot of medical supplies is created by the manufacturer along with its details. Large-sized documents are stored on the IPFS system; for proper authenticity, hash values of these documents are stored on the blockchain. Further, the distributor receives a lot, and the interaction between the distributor and the hospital is considered. The placement and confirmation of the order happen in the same manner as between the distributor and the manufacturer. Here a smart contract is used to trigger an event to notify the response of the distributor to all the stakeholders. Now, considering the hospital scenario, the main goal of waste management is taken into consideration. For this, firstly, the COVID-19 testing hospitals or centers deploy the smart contract or function that is responsible for requesting the shipment process to the various waste treatment centers. These requests are placed with certified shippers. Here also the response of the shipper is notified to all the related entities. As the medical waste considered in this research is COVID-19 waste, utmost care of the waste must be ensured so that the virus does not spread through the waste dispatch. The shipper continuously stores the information regarding the dispatch and the waste cycle on the connected network with the help of the Share Sensor Data function. During the entire process, all related entities are simultaneously notified, with the help of smart contracts, about the data changed by the shipper. If the shipper does not follow the rules specified within the smart contracts, the permit of the shipper may get abandoned. The entire application can be developed using the Ethereum platform, as suggested by the authors of this paper. In total, 5 algorithms are suggested using which the blockchain technology can be implemented in the waste handling process. This process can be understood better from the flowcharts as demonstrated in the figures [19].
Fig. 6 Algorithm 1 Process [19]
are simultaneously notified about the changes data by the shipper. If the shipper does not follow the specified rules with smart contracts itself, the permit of the shipper may get abandoned. The entire application can be developed using the Ethereum platform as suggested by the authors of this paper. Total 5 algorithms are suggested using which the blockchain technology can be implemented in the waste handling process. This process can be understood better in the flowcharts as demonstrated in the figures [19]. Algorithm 1 Method to submit a request demand to the supplier of COVID-19 related clinical supplies and hardware (refer Fig. 6). Algorithm 2 Algorithm 2 features the business stream alongside functionalities acted in offering the clinical supplies to the order. It guarantees that main enrolled wholesalers and makers with a legitimate permit number can sell the clinical supplies to the order for limiting the phony medications provided from the unlicensed maker or wholesaler. Algorithm 3 Algorithm 3 features the methodology alongside framework capacities and logs to put a clinical waste shipment demand. This calculation must be set off by the COVID-19 testing habitats/clinics as displayed in sync 1 of Algorithm 3. It guarantees that the expected transporters of the COVID-19 clinical waste ought to be enlisted, they have a legitimate permit and as of late tried negative for COVID-19 to additionally limit the odds of COVID-19 spreading. Algorithm 4 Algorithm 4 features the activities performed by the transporter during transportation of the waste identified with Coronavirus clinical supplies and hardware (refer Fig. 7). Fig. 7 Algorithm 4 Process [19]
Algorithm 5 Algorithm 5 analyzes the information related to the condition of the waste during its handling and recommends penalties as well. It also allows the FDA/Owner to revoke the license of the relevant stakeholder. Likewise, it verifies and issues a penalty to the client in case there is a difference in weight between the delivered and the received waste. After successful execution, it produces an event by triggering a transaction to record the transporter EA, violation count, and transporter license ID.
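The checks in Algorithms 3 and 5 can be sketched as an off-chain Python mock; it is an illustration under stated assumptions (a registered, licensed shipper with a recent negative COVID-19 test, and an assumed weight-difference tolerance), not the smart contracts of [19].

```python
# Off-chain mock of the shipment-request and delivery-verification checks.
shippers = {
    "SHP-01": {"registered": True, "license_valid": True,
               "covid_test_negative": True, "violations": 0},
}

def request_waste_shipment(shipper_id):
    s = shippers.get(shipper_id)
    # Algorithm 3: only registered, licensed, recently negative shippers qualify
    if not s or not (s["registered"] and s["license_valid"] and s["covid_test_negative"]):
        return "request rejected"
    return "shipment request placed"            # an event would notify all stakeholders

def verify_delivery(shipper_id, shipped_kg, received_kg, tolerance_kg=0.5):
    # Algorithm 5: penalize the shipper if shipped and received weights differ
    if abs(shipped_kg - received_kg) > tolerance_kg:
        shippers[shipper_id]["violations"] += 1
        if shippers[shipper_id]["violations"] >= 3:
            shippers[shipper_id]["license_valid"] = False   # license revoked
        return "penalty issued"
    return "delivery verified"

print(request_waste_shipment("SHP-01"))
print(verify_delivery("SHP-01", shipped_kg=120.0, received_kg=118.0))
```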
3.4 Industrial Waste Management

The blockchain-based industrial wastewater management design contains four different layers, as described in Fig. 8. The first layer, i.e., the data acquisition layer, comprises diverse industrial equipment fitted with smart devices and objects connected to the Internet that can communicate with other devices. The equipment includes water storage tanks, different waste treatment plants, flowmeters, and level gauges. This equipment is capable of monitoring, sensing, and processing, and it can communicate data related to the water storage level. The data collected in the first layer is passed on to the second layer, i.e., edge computing systems, which include smart gateways, edge nodes, etc. This data transfer is carried out with the help of communication technologies such as Wi-Fi and different network coverage systems for real-time data transfer, risk management, decision analysis, and storage of the data in the form of data blocks. The other edge nodes then verify the collected data blocks via consensus and add them to the blockchain accordingly. The edge nodes of Layer 2 (edge computing) forward the confirmed data blocks to Layer 3 (the cloud computing platform), which hosts cloud-based servers. These nodes are capable of processing, storing, managing, and handling blockchain-related operations. In Layer 4, i.e., the application layer, operations such as monitoring and managing wastewater-related tasks are carried out. Industrial wastewater management involves stakeholders such as smart industries, environmental monitoring agencies, government agencies, non-governmental organizations, water suppliers, and distribution companies. These stakeholders have permission to access the data and can view only the data for which they have been granted permission [20].

A cloud-based architecture can also be considered for ensuring privacy and security in industries, empowering clients to access the common pool of manufacturing resources anywhere and at any time on demand. The proposed architecture (refer Fig. 9) has five layers in all. Starting with the sensing layer, it incorporates different sorts of sensors and at least one microcomputer with a certain computing power, and it preprocesses the gathered information [22]. The management hub layer encrypts the information, bundles it to produce blocks, and stores it in the database, while the storage layer holds the data centrally. The firm layer connects each layer. The application layer provides users with different kinds of services, such as monitoring and failure prediction.
Fig. 8 Flow diagram for industrial wastewater management [20]
In the blockchain framework, apart from the nodes that carry transactions, there is a special hub for recording blocks. To guarantee the trust of all management hubs, different consensus algorithms are applied, such as Proof of Stake (PoS), Proof of Work (PoW), and Practical Byzantine Fault Tolerance (PBFT). In this system architecture, more preference is given to resource utilization and data interaction. Statistical Process Control (SPC) is used to complete the Proof of Work (PoW): characteristic values such as averages and control limits are set, the uploaded data is analyzed, and the transactions are then authenticated. The Proof of Work needs to be carried out only once, when a new block is generated. This type of architecture supports real-time capability and scalability. Rather than setting up a public cloud, numerous management hubs are combined to form a private cloud, which permits clients to access the information. Such an architecture is a reliable solution for combining blockchain technology with industrial management [22].

Fig. 9 Architecture for smart industry [21]
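As a rough illustration of the SPC-style check described above (not the exact scheme used in [22]), the sketch below computes a mean and 3-sigma control limits from historical sensor readings and accepts a new data block only if its values stay within those limits. The threshold choice, data, and function names are assumptions made for the example.

```python
import numpy as np

def spc_limits(history, k=3.0):
    """Compute the SPC centre line and control limits from historical readings."""
    mu = np.mean(history)
    sigma = np.std(history)
    return mu, mu - k * sigma, mu + k * sigma   # centre, lower, upper

def validate_block(readings, lower, upper):
    """A block of sensor readings passes the SPC-based check only if all
    values fall inside the control limits (stand-in for the PoW step)."""
    readings = np.asarray(readings)
    return bool(np.all((readings >= lower) & (readings <= upper)))

# Illustrative usage with synthetic wastewater-level data
history = np.random.normal(loc=50.0, scale=2.0, size=500)   # past level-gauge values
centre, lo, hi = spc_limits(history)
new_block = [49.1, 51.7, 50.4, 48.9]
print("block accepted:", validate_block(new_block, lo, hi))
```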
4 Comparison of Available Research Papers

Recent research is summarized in Table 1.
5 Generalized Waste Management Process Using Blockchain

After studying different types of waste management systems, we conclude that the basic structure for implementing blockchain technology is as shown in the flow diagram (refer Fig. 10). In the first step, waste is collected. Then all the participating collaborators are added to the blockchain. Once this step is done, the waste shipment process is triggered and all the relevant data related to the waste is gathered. In the next step, the conditions on the data are checked using smart contracts. A unique hash key is then generated for each individual block of data, and finally the data is added to the blockchain. The only differences in this method across the various types of waste are the waste collection methods and the data gathering process; a minimal sketch of the generalized flow is given below.
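The following Python sketch is a hypothetical illustration of the generalized flow just described (participant registration, a smart-contract-like condition check, hashing, and appending to a chain). It is not code from any of the surveyed papers, and all field names are assumptions.

```python
import hashlib
import json
import time

chain = []            # simplified in-memory blockchain
participants = set()  # registered collaborators

def register(participant):
    participants.add(participant)

def contract_conditions_ok(record):
    """Stand-in for a smart-contract check: the collector must be
    registered and the reported waste weight must be positive."""
    return record["collector"] in participants and record["weight_kg"] > 0

def add_block(record):
    if not contract_conditions_ok(record):
        raise ValueError("smart-contract conditions not satisfied")
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps(record, sort_keys=True) + prev_hash
    block = {
        "record": record,
        "timestamp": time.time(),
        "prev_hash": prev_hash,
        "hash": hashlib.sha256(payload.encode()).hexdigest(),  # unique hash key
    }
    chain.append(block)
    return block["hash"]

# Illustrative usage
register("collector_01")
print(add_block({"collector": "collector_01", "type": "e-waste", "weight_kg": 12.5}))
```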
6 Conclusion

In this paper we have discussed the problems and challenges faced by waste management systems. We have also shown how blockchain can be used to manage waste in smart cities in a way that is transparent, trackable, secure, trustworthy, and immutable. The opportunities provided by blockchain for managing various types of waste, such as solid waste, electronic waste, biomedical waste, and industrial waste, have been surveyed, and blockchain-based research papers on waste management were highlighted along with the stakeholders included in the system flow. Insightful discussions were provided on the practicability of using blockchain in waste management. Future research challenges that will help to unlock the full potential of blockchain are:

• Automating waste segregation using existing technologies such as OpenCV or IoT-based sensors.
• Edge processing in waste management is more viable for storing temporary data because of its lower delay and rapid processing, while data is added indelibly to the blockchain; since there is a possibility of the data being altered before it is stored, securing the edge nodes can be considered as a future direction.
• To ensure correct data, suitable mechanisms or sufficient infrastructure must be in place for implementing blockchain solutions between different parties. Municipalities can guide the development, maintenance, and execution of blockchain applications and architectures, as they can mandate their use by stakeholders.

In the course of time, we look forward to coming up with an approach that addresses these research challenges.
Table 1 Comparison table of available research papers for different types of waste

| S. No. | Article No. | Type | Outcomes | Merits | Year |
|---|---|---|---|---|---|
| 1 | [11] | Solid waste | Presented a perceptive analysis of available systems to focus on the feasibility of blockchain for various types of waste | Traceability and security using IoT in smart cities | 2021 |
| 2 | [10] | Solid waste | Tracking of waste from start to end of all steps through blockchain technology | Enables reliability and transparency of waste activity | 2019 |
| 3 | [15] | E-waste | Keeps track of e-waste generated throughout its whole life cycle | An interactive desktop app used for carrying out all EWM processes | 2020 |
| 4 | [16] | E-waste | A representative model which suggests the proper disposal strategy of digital waste using blockchain | Consumers are provided with incentives when they hand over their electronic waste | 2020 |
| 5 | [18] | Medical waste | Data is collected from various available communication technologies and transferred to the blockchain using the DPoS algorithm | More efficient than PoW and PoS | 2021 |
| 6 | [19] | Medical waste | Ethereum blockchain is used along with IPFS to fetch, store, and share data securely, which assists authorities in disposing of the waste properly | Developed four smart contracts | 2021 |
| 7 | [20] | Industrial waste | Blocks of data are passed to servers and the stakeholders with permission can access them | Protects the environment from the hazardous and harmful effects of industrial waste; monitors the harmful industrial waste | 2020 |
| 8 | [21] | Industrial waste | Focuses mainly on the pharma industry; the proposed solution can detect wrong entries and unauthorized insertions and eliminate those entries from the system | A crypto-pharmaceuticals disposal tool is proposed | 2020 |
Fig. 10 Use of blockchain in waste management
References 1. Kaza S, Yao L, Bhada-Tata P, Van Woerden F (2018) What a waste 2.0. A global snapshot of solid waste management by 2050. World Bank Publications 2. Bocquier P (2005) World urbanization prospects: an alternative to the model of projection compatible with the mobility transition theory. Demogr Res 12:197–236 3. Wood L (2020) Global medical waste management market (2020 to 2030)—COVID 19 Society, vol 63, p 102364 4. Singh S, Sharma PK, Yoon B, Shojafar M, Cho GH, Ra I-H (2020) Convergence of blockchain and artificial intelligence in the IoT network for the sustainable smart city. Sustain Cities Soc 63 5. Angelis J, da Silva ER (2019) Blockchain adoption: a value driver perspective. Business Horizons
6. Ren W, Hu J, Zhu T, Ren Y, Choo KKR (2020) A flexible method to defend against computationally resourceful miners in blockchain proof of work. Inf Sci 507 7. Vashchuk O, Shuwar R (2019) Pros and cons of consensus algorithm proof of stake difference in the network safety in proof of work and proof of stake. Electron Inf Technol 9 8. Singh PK, Singh R, Nandi SK, Nandi S (2019) Managing smart home appliances with proof of authority and blockchain. In: International conference on innovations for community services. Springer, pp 221–232 9. Kaza S, Yao L, Bhada-Tata P, Van Woerden F (2018) What a waste 2.0: a global snapshot of solid waste management to 2050. World Bank Publications 10. Laouar MR, Hamad ZT, Eom S (2019) Towards blockchain based urban planning: application for waste collection management. ICIST 11. Ahmad RW, Salah K, Jayaraman R, Yaqoob I, Omar M (2021) Blockchain for waste management in smart cities: a survey. IEEE 12. Vo HT, Kundu A, Mohania MK (2018) Research directions in blockchain data management and analytics. EDBT 13. Gupta N, Bedi P (2018) E-waste management using blockchain based smart contracts. In: 2018 International conference on advances in computing communications and informatics (ICACCI) 14. Chen M, Oladele A, Ogunseitan (2021) Zero e-waste: regulatory impediments and blockchain imperatives. Front Environ Sci Eng 15. Dua A, Duttay A, Zamanz N, Kumar N (2020) Blockchain-based e-waste management in 5g smart communities. In: IEEE INFOCOM 2020—IEEE conference on computer communications workshops (INFOCOM WORKSHOPS) 16. Poongodi M, Hamdi M, Vijayakumar V (2020) An effective electronic waste management solution based on blockchain smart contract in 5G communities. In: 2020 IEEE 3rd 5G World Forum 17. Sahoo S, Halder R (2020) Blockchain-based forward and reverse supply chains for e-waste management. In: International conference on future data and security engineering 18. Kassou M, Bourekkadi S, Khoulji S, Slimani K, Chikri H, Kerkeb ML (2021) Block chain-based medical and water waste management conception. ResearchGate 19. Ahmad RW, Salah K, Jayaraman R, Yaqoob I, Omar M, Ellahham S (2021) Blockchain-based forward supply chain and waste management for COVID-19 medical equipment and supplies. IEEE 20. Hakak S, Khan WZ, Gilkar GA, Haider N, Imran M (2020) Industrial wastewater management using blockchain technology: architecture, requirements, and future directions. IEEE 21. Pahontu B-I, Arsene D-A, Mocanu M (2020) A survey about industries that blockchain can transform. IEEE 22. Wan J, Li J, Imran M, Li D, Fazal-e-Amin (2018) A blockchain-based solution for enhancing security and privacy in smart factory. IEEE 23. Akram SA, Alshamrani SS, Singh R, Rashid M, Gehlot A, AlGhamdi AS, Prashar D (2021) Blockchain enabled automatic reward system in solid waste management. Hindawi Security and Communication Network
Applicability of Klobuchar Model for STEC Estimation Over Thailand Region
Worachai Srisamoodkham, Kutubuddin Ansari, and Punyawi Jamjareegulgarn
Abstract In the current study, we investigated the total electron content (TEC) variations over the Thailand region by using five multi-constellation Global Navigation Satellite System (GNSS) signals. The STEC variations for all GNSSs were computed with the well-known Klobuchar ionospheric model and compared with the STEC values of the Global Ionospheric Map (GIM) on a storm day (May 12, 2021), as well as before and after this storm day. The results showed that most of the GIM STEC values (GSVs) were higher than the Klobuchar model STEC values (KMSVs) for all constellations except QZSS; for QZSS, the KMSVs were closest to the GSVs. Likewise, the MS-SHF-based Klobuchar model (MSSHFKM method) was able to compensate the ionospheric delay effectively compared with the original Klobuchar model, with up to 77.08% RMSE improvement during the geomagnetic storm on May 12, 2021. Keywords GNSS · Klobuchar model · MS-SHF · Thailand · TEC
W. Srisamoodkham
Faculty of Agricultural and Industrial Technology, Phetchabun Rajabhat University, Sadiang, Thailand
e-mail: [email protected]
K. Ansari
Integrated Geoinformation (IntGeo) Solution Private Limited, New Delhi, India
P. Jamjareegulgarn (B)
King Mongkut's Institute of Technology Ladkrabang, Prince of Chumphon Campus, Chumphon 86160, Thailand
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_36

1 Problem Preliminary

The Klobuchar model is an empirical ionospheric model that depends primarily on global positioning system (GPS) observations [1]. It is expected that the Klobuchar model can decrease range errors globally by around 50% [2].
Both a half-cosine function and amplitude and period variabilities are employed to model the daily ionospheric variation of the Klobuchar model with respect to geographic location and local time. For a given local time and geomagnetic location of the ionospheric pierce point (IPP), the slant TEC (STEC) can be calculated from the vertical total electron content (VTEC) [3]. The required input parameters for this conversion consist of two satellite angles, the GNSS station site, and the Klobuchar coefficients that are periodically distributed with the navigation information. It has been recognized that the Klobuchar model relies largely on approximations of both intensity and local time, and it also assumes a constant nighttime variability [4]. With the advent of multi-constellation GNSSs, the Klobuchar model can be extended to support many GNSS users for ionospheric delay correction in accurate navigation and positioning [5]. The NeQuick model is an alternative for ionospheric delay correction; it has been developed by the ICTP and the University of Graz and is employed by the European Galileo system [6]. Moreover, an enhanced Klobuchar model, namely the BDS Klobuchar model, has been developed to compensate for the ionospheric delay of the BeiDou navigation system (BDS) [5]. In the current study, the real STEC values were calculated from the observation data of GNSS stations located in the Thailand region. We used the Klobuchar algorithm on the observation data of five GNSS systems, namely Galileo (Europe), Glonass (Russia), GPS (USA), BeiDou (China), and QZSS (Japan), and evaluated the performance of the Klobuchar model STEC values (hereafter, KMSVs) of each GNSS. Recently, numerous studies have enhanced or modified the Klobuchar ionospheric model over the low-latitude region so that it can compensate for ionospheric delays and improve positioning accuracy as much as possible. For example, Tongkasem et al. [7] proposed an extended Klobuchar model by computing new Klobuchar coefficients at the KMIT, CPN, NNKI, and CMU stations in Thailand. They found that the root mean square errors (RMSEs) of their modified Klobuchar models can be improved by approximately up to 22%. Shivani and Raghunath [8] presented an enhanced Klobuchar model by adjusting the single-layer height from 350 to 500 km; the Earth angle and the slant factor were modified according to this height change. They reported that their modified Klobuchar model improved RMS range errors by up to 85.1% over low latitudes. Unfortunately, a main obstacle for the proposed single-shell ionospheric model is the gradient of the vertical electron densities and their fluctuations, especially over the magnetic equator and low latitudes. Hence, the multishell (MS) model can be a better choice for deriving a set of navigation parameters [9], including over the Thailand sector. Thus, Ratnam et al. [10] proposed another improved Klobuchar approach using a multishell spherical harmonics function (MS-SHF). They reported that the MS-SHF-based Klobuchar model (hereafter, MSSHFKM) provides better positioning improvements, with 62.69/77.08% RMSE during quiet/disturbed days. Therefore, we have applied the MS-SHF method to the Klobuchar model over the Thailand sector during an intense storm (May 12, 2021) and expect to obtain a positioning accuracy improvement of greater than 50%.
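For reference, the single-frequency broadcast Klobuchar algorithm summarized above can be written compactly as in the following Python transcription. This is a generic textbook-style sketch (not the authors' MATLAB processing code), and the coefficient and geometry values in the example call are placeholders rather than real broadcast data.

```python
import math

def klobuchar_delay(lat_u, lon_u, elev, azim, gps_sec, alpha, beta):
    """Klobuchar ionospheric delay in seconds at L1.
    lat_u, lon_u, elev, azim are in semicircles (degrees / 180);
    alpha and beta are the four broadcast coefficients each."""
    # Earth-centred angle between user and ionospheric pierce point
    psi = 0.0137 / (elev + 0.11) - 0.022
    # Sub-ionospheric latitude, clamped to +/-0.416 semicircles
    phi_i = max(min(lat_u + psi * math.cos(azim * math.pi), 0.416), -0.416)
    # Sub-ionospheric longitude and geomagnetic latitude
    lam_i = lon_u + psi * math.sin(azim * math.pi) / math.cos(phi_i * math.pi)
    phi_m = phi_i + 0.064 * math.cos((lam_i - 1.617) * math.pi)
    # Local time at the pierce point (seconds of day)
    t = (4.32e4 * lam_i + gps_sec) % 86400.0
    # Amplitude and period of the half-cosine model
    amp = max(sum(a * phi_m ** n for n, a in enumerate(alpha)), 0.0)
    per = max(sum(b * phi_m ** n for n, b in enumerate(beta)), 72000.0)
    x = 2.0 * math.pi * (t - 50400.0) / per
    # Obliquity (slant) factor converting vertical to slant delay
    f_slant = 1.0 + 16.0 * (0.53 - elev) ** 3
    if abs(x) < 1.57:
        return f_slant * (5e-9 + amp * (1.0 - x ** 2 / 2.0 + x ** 4 / 24.0))
    return f_slant * 5e-9   # constant nighttime delay of 5 ns

# Illustrative call with placeholder coefficients (not real broadcast values)
alpha = [1.2e-8, 1.5e-8, -6.0e-8, -1.2e-7]
beta = [9.6e4, 1.3e5, -1.3e5, -4.4e5]
delay_s = klobuchar_delay(0.075, 0.56, 0.35, 0.25, 45000.0, alpha, beta)
print("L1 ionospheric delay:", delay_s * 299792458.0, "m")
```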
2 Study Area

Thailand is a country in the Southeast Asian region, situated in the middle of the Indochinese region and bordered by Laos and Cambodia to the east, by Myanmar and Laos to the north, by the Gulf of Thailand and Malaysia to the south, and by the Andaman Sea and the southern part of Myanmar to the west (Fig. 1). In the present work, four GNSS receivers in Thailand, namely Chiang Mai (CHMA), Nakhonratchasima (NKRM), Bangkok (DPT9), and Suratthani (SRTN), are utilized to investigate the effect of the geomagnetic storm of May 12, 2021 on the TEC variation. We selected these stations because they are situated in four different parts of Thailand: CHMA (north), DPT9 (center), NKRM (northeast), and SRTN (south). The observation data in RINEX format were retrieved from the web site of the DPT organization in Thailand. Afterward, the RINEX datasets were processed daily with a MATLAB program. Note that an elevation-based weighting function was applied to remove the noisy observations of each GNSS receiver.

Fig. 1 Four geographic locations of GNSS receivers in the Thailand region, namely Chiang Mai (CHMA), Nakhonratchasima (NKRM), Bangkok (DPT9), and Suratthani (SRTN), employed in the present work
3 Consequences and Argument

As illustrated in Fig. 2, we chose the intense storm on May 12, 2021 and plotted the Klobuchar model STEC values (KMSVs) at the CHMA site. In Fig. 2, the KMSV plots for each GNSS are displayed in different colors: blue lines for GPS, green lines for Galileo, red lines for Glonass, yellow lines for BeiDou, and violet lines for QZSS. Meanwhile, the respective GIM STEC values (GSVs) for each GNSS are shown by black lines in Fig. 2. It is evident from these plots that the performance of the KMSVs with respect to the corresponding GIM STEC differs among the GNSSs. For a number of the satellites visible prior to the storm period, the KMSVs were less than the GSVs, i.e., an underestimation, whereas the satellites visible during the storm period showed KMSVs that were overestimated compared to the GSVs. Furthermore, we selected five continuous days (May 10–13, 2021, covering the disturbed day of May 12, 2021) of data retrieved from the NKRM site, as shown in Fig. 3. As can be seen from the plots, the GPS KMSVs on quiet days varied between 5 and 75 TECU, whereas they reached about 80 TECU on May 12, 2021. Normally, the GSVs were large on each day, although they occasionally became small. The QZSS KMSVs ranged between 10 and 62 TECU. Here, the QZSS KMSVs were very close to the QZSS GSVs, with discrepancies between −10 and 20 TECU. The main reasons are as follows: (a) QZSS was built as a complementary system to GPS in order to enhance both the reliability and the accuracy of positioning in the Asia-Oceania region, and (b) the QZSS orbits were designed particularly to keep the satellite signals at high elevation angles [11–13].
Fig. 2 The comparisons between KMSVs and GSVs at CHMA site obtained from five satellites of GPS, five satellites of Glonass, five satellites of Galileo, seven satellites of BeiDou, and three satellites of QZSS on May 12, 2021 (DOY 132)
Generally, the GSVs of all GNSSs were mainly higher than the KMSVs, except for the QZSS constellation, whose KMSVs and GSVs showed almost equal values. We estimated the correlation coefficient and the root mean square error (RMSE) between the KMSVs and the GSVs in order to better understand their relationship. We find that the KMSVs of most constellations correlated with their respective GSVs, with correlation coefficients of 0.87–0.89. For the Klobuchar model, these correlation coefficient levels can be considered very good correlations.
Fig. 3 Five comparative plots of KMSVs and GSVs, as well as their differences during five continuous days (May 10–13, 2021) at NKRM station, Thailand
Fig. 4 The comparison of ionospheric range delay (Ir) among the MSSHFKM method, the original Klobuchar modeled, and the observed data on May 12, 2021 at CHMA station
Likewise, we find that the RMSE values for most of the constellations varied between 10 and 11 TECU, except for the QZSS constellation, whose RMSEs were smaller than 10 TECU. Further, to test the performance of the MSSHFKM method, the Klobuchar parameters were enhanced by minimizing the mean square error of the ionospheric delays between the Klobuchar model and the MS-SHF model over Thailand, using the TEC values at the CHMA station on May 12, 2021. Note that the equations and details of the MSSHFKM method can be found in [10]. After applying the MSSHFKM method, the ionospheric range delay (Ir) values are plotted in Fig. 4, where "Range_new", "Range_old", and "Range_real" represent the Ir values of the MSSHFKM method, the original Klobuchar model, and the observed data, respectively. In Fig. 4, we find that the RMSE between the observed (monitored) ionospheric range delay (Ir) and the original Klobuchar model is about 2.53 m, whereas the RMSE between the monitored Ir and the modified Klobuchar model is only 0.50 m. Likewise, the average absolute error (AAE) between the monitored Ir and the original Klobuchar model is about 2.08 m, whereas the AAE between the monitored Ir and the modified Klobuchar model is only 0.47 m. Thus, the modified Klobuchar model (MSSHFKM method) outperforms the original one, with improvements of 79.31% in RMSE and 76.65% in AAE.
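The evaluation statistics used above (RMSE, AAE, correlation, and percentage improvement) can be reproduced with a few lines of NumPy. The snippet below is a generic sketch applied to synthetic arrays; it is not the authors' processing code, and the simulated values are assumptions.

```python
import numpy as np

def rmse(model, observed):
    return float(np.sqrt(np.mean((np.asarray(model) - np.asarray(observed)) ** 2)))

def aae(model, observed):
    return float(np.mean(np.abs(np.asarray(model) - np.asarray(observed))))

def improvement(old_err, new_err):
    """Percentage reduction of an error metric."""
    return 100.0 * (old_err - new_err) / old_err

# Synthetic stand-ins for the observed Ir and two model outputs (metres)
rng = np.random.default_rng(0)
observed = 5.0 + rng.normal(0.0, 1.0, 2880)          # 30-s samples over one day
range_old = observed + rng.normal(2.0, 1.5, 2880)    # original Klobuchar (biased)
range_new = observed + rng.normal(0.0, 0.5, 2880)    # refined model (closer fit)

corr = np.corrcoef(range_new, observed)[0, 1]
print("correlation:", round(corr, 3))
print("RMSE improvement %:", round(improvement(rmse(range_old, observed),
                                               rmse(range_new, observed)), 2))
print("AAE improvement %:", round(improvement(aae(range_old, observed),
                                              aae(range_new, observed)), 2))
```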
4 Conclusion

This work investigated the ionospheric total electron content (TEC) variations over four GNSS receivers situated in different parts of Thailand. The RINEX datasets
during May 10–13, 2021 were used to compute the Klobuchar model STEC values (KMSVs) and the GIM STEC values (GSVs) of each GNSS. The correlation coefficients between the KMSVs and the GSVs varied between 0.87 and 0.89. We also investigated the RMSE values between the KMSVs and the GSVs and noticed that they ranged between 10 and 11 TECU, excluding QZSS, whose RMSE was smaller than 10 TECU. Furthermore, the ionospheric range delay (Ir) values of the refined Klobuchar model (MSSHFKM method) are better than those of the original Klobuchar model, with improvements of 79.31% in RMSE and 76.65% in AAE, as expected. This shows that the Klobuchar model based on the MS-SHF method can be used to mitigate the ionospheric delays effectively over the Thailand region.

Acknowledgements This research is funded by the Broadcasting and Telecommunications Research and Development Fund for Public Interest (fund number: B2-001/6-2-63). The authors also thank the Department of Public Works and Town & Country Planning, Thailand for the RINEX files of all GNSS sites.
References 1. Klobuchar JA (1987) Ionospheric time-delay algorithm for single-frequency GPS users. IEEE Trans Aerosp Electron Syst 3:325–331. https://doi.org/10.1109/TAES.1987.310829 2. Hobiger T, Jakowski N (2017) Atmospheric signal propagation. In: Teunissen PJ, Montenbruck O (eds) Springer handbook of global navigation satellite systems. Springer, Cham, pp 165–193. https://doi.org/10.1007/978-3-319-42928-1_6 3. Newby SP (1992) Three alternative empirical ionospheric models-are they better than GPS broadcast model? In: Proceedings of the sixth international geodetic symposium on satellite positioning, pp 240–244. https://ci.nii.ac.jp/naid/10006712734/ 4. Bi T, An J, Yang J, Liu S (2017) A modified Klobuchar model for single-frequency GNSS users over the polar region. Adv Space Res 59(3):833–842. https://doi.org/10.1016/j.asr.2016. 10.029 5. Wang N, Yuan Y, Li Z, Huo X (2016) Improvement of Klobuchar model for GNSS singlefrequency ionospheric delay corrections. Adv Space Res 57(7):1555–1569. https://doi.org/10. 1016/j.asr.2016.01.010 6. Nava B, Coisson P, Radicella SM (2008) A new version of the NeQuick ionosphere electron density model. J Atmos Solar Terr Phys 70(15):1856–1862. https://doi.org/10.1016/j.jastp. 2008.01.015 7. Tongkasem N, Supnithi P, Phakphisut W, Hozumi K, Tsugawa T (2019) The comparison of Klobuchar model with GPS TEC model at the low geomagnetic latitude station, Thailand. In: 34th International technical conference on circuits/systems, computers and communications (ITC-CSCC), pp 1–4. https://doi.org/10.1109/ITC-CSCC.2019.8793336 8. Shivani B, Raghunath S (2020) Low latitude ionosphere error correction algorithms for global navigation satellite system. Int J Innov Technol Explor Eng (IJITEE), 9(3). https://doi.org/10. 35940/ijitee.C8870.019320 9. Shukla AK, Das S, Nagori N, Sivaraman MR, Bandyopadhyay K (2009) Two-shell ionospheric model for Indian region: a novel approach. IEEE Trans Geosci Remote Sens 47(8):2407–2412. https://doi.org/10.1109/TGRS.2009.2017520 10. Ratnam DV, Dabbakuti JRKK, Lakshmi NVVNJS (2018) Improvement of Indian-regional Klobuchar ionospheric model parameters for single-frequency GNSS users. IEEE Geosci Remote Sens Lett 15(7):971–975. https://doi.org/10.1109/LGRS.2018.2827081
11. Ansari K, Bae TS, Seok HW, Kim MS (2021) Multiconstellation global navigation satellite systems signal analysis over the Asia-Pacific region. Int J Satell Commun Network 39(3):280– 293. https://doi.org/10.1002/sat.1389 12. Ansari K, Park KD (2018) Multi constellation GNSS precise point positioning and prediction of propagation errors using singular spectrum analysis. Astrophys Space Sci 363(258). https:// doi.org/10.1007/s10509-018-3479-7 13. Li X, Zhang X, Ren X, Fritsche M, Wickert J, Schuh H (2015) Precise positioning with current multi-constellation global navigation satellite systems: GPS, GLONASS, Galileo and BeiDou. Sci Rep 5(1):1–14. https://doi.org/10.1038/srep08328
A Survey on Neural Networks in Cancer Research
Jerin Reji and R. Sunder
Abstract The early prediction of cancer is very useful for its subsequent treatment. An accurate prediction of tumor cells in human beings increases the chance of surviving cancer. Nowadays, machine learning employs statistical and probabilistic methods for learning about various types of tumors. Machine learning, a branch of computer science, has detection and prediction capabilities that can identify the patterns of cancer. Thus, an accurate and efficient model for cancer prediction can be built from machine learning algorithms. Neural networks are the most commonly used branch of machine learning for cancer prognosis and detection. This study focuses on new artificial neural network techniques that are used for cancer prediction. It compares the performance of several new techniques and summarizes which model is the best among them. To achieve the maximum throughput of a machine learning model, its training and validation phases must be carried out properly. The datasets and architectures of these techniques may differ, and these differences determine the performance of a particular model. Every neural network model follows the same routine for the training and validation phases, but the differences in architecture make one model better than another. Machine learning is a rapidly developing technology and is contributing greatly to improving the existing technologies for cancer prediction. Keywords Cancer · Artificial neural networks · Machine learning · Training phase · Validation · Detection · Prediction · Prognosis · Maximum throughput
J. Reji (B) · R. Sunder
Sahrdaya College of Engineering and Technology, Kodakara, Kerala 680684, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_37

1 Introduction

In cancer research, machine learning approaches are not new; they have been a part of the medical field for a long time. Machine learning, as we all know, is a vast field and it is evolving day by day. With that, the chances of improvements in cancer prognosis
and detection have also increased exponentially. So there is a need to make use of these new methods to improve the accuracy and efficiency of existing models. The artificial neural network (ANN) is a method that has been widely used in the diagnosis and prediction of cancer. Artificial neural networks may employ X-ray pictures, CT scan images, and other forms of medical imaging as input data. Microarray techniques [11, 18] were the most commonly used techniques for cancer prediction and prognosis. In this strategy, several genes are examined at the same time to evaluate their expression and alignment; the purpose of the microarray technique is to study the expression of thousands of genes at a time. A gene chip is an evaluation tool that is always in contact with the genes. The RNA and DNA components are placed on this glass chip, which consists of several tiny spots at specified locations. The technique mainly uses two types of DNA: (1) an experimental sample and (2) a reference sample. By comparing the experimental sample with the reference sample, the difference between them can be identified and the cancerous nodules can easily be detected. The distinctions between cancer cells and normal cells are investigated quickly using this approach; however, accuracy becomes a problem when the input dataset is huge. Machine learning methods are therefore now used alongside the traditional methods to improve their results. Machine learning offers different methodologies, such as prediction and classification, that can be used in cancer research. For the classification of tumors of different types, various classification algorithms are used, including SVM [17], decision trees [21], K-nearest neighbor [3], and DBSCAN [19]. There are also many algorithms relevant to cancer prediction, such as feature selection algorithms. The most powerful machine learning approaches for studying tumors and accurately predicting cancer are neural networks [16]. A neural network contains various layers for the training of input data: the input data is fed into the first layer, the output of the first layer is sent to the second layer, and so on until the last layer is reached; finally, the output layer produces the required output. This technique is used to learn a general representation of the input data, and it is now widely used in cancer research because of its ability to produce accurate results. Neural network methods can be categorized into (1) clustering methods, (2) filtering methods, and (3) prediction methods, each with its own advantages and functionalities. In the case of microarray approaches, clustering and prediction approaches are the most typically employed techniques to categorize the data and extract reliable information, whereas filtering techniques are mainly used to extract the most relevant features from a set of features for use in the training process. This study is a review of the most recent neural network techniques that can be used for cancer research. All of them have their own strengths that improve the accuracy of previous works in the same area. These techniques are not only for cancer prognosis and detection but also for tumor growth prediction. The input data is the first criterion that differentiates these techniques from one another.
Some of these approaches take images as input, while others take a table of data. Each technique shows its uniqueness in the preprocessing tools and adopted methodologies, and as a result each of them performs well in terms of precision and accuracy.
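As a hedged illustration of the classical classifiers mentioned in the introduction (SVM, decision tree, k-nearest neighbor), the sketch below trains them on scikit-learn's built-in breast cancer dataset. It is only a generic baseline, not a reproduction of any surveyed method, and the hyperparameter choices are assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale features; margin- and distance-based classifiers benefit from this
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

classifiers = {
    "SVM (RBF kernel)": SVC(kernel="rbf"),
    "Decision tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "K-nearest neighbor": KNeighborsClassifier(n_neighbors=5),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(f"{name}: test accuracy = {clf.score(X_test, y_test):.3f}")
```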
2 Background

2.1 Artificial Neural Networks (ANNs)

Artificial neural networks are machine learning models derived from the concept of biological neural networks. In computer science, they are used as a tool to support the training process of a machine, performing a mapping from inputs to particular outputs. A network consists of three types of layers. The first one is the input layer, which is made up of artificial neurons modeled on biological neurons; it takes the input data and transmits it to subsequent layers for further processing. The second type is the hidden layer. Hidden layers perform the required transformations of the data based on the weights assigned to the inputs and then direct the result into the activation function of the output layer, which is the last layer of the neural network. In the output layer, the calculations are performed based on the results arriving from the previous layers. A multilayer neural network consists of L layers numbered from 0 to L-1 and is commonly known as a multilayer perceptron (MLP); each layer applies a linear transformation to its input followed by an activation. In an MLP, the training process is completed in two phases: a forward phase and a backward phase. In the forward phase, the inputs are propagated through fixed network weights until the output is reached and the output layer generates output values. There may be differences between the true and generated output values; this difference is known as the error of the forward phase, and it is reduced in the backward phase. In the backward phase, the error is propagated back through the network layer by layer, and the weights are updated based on this error. Several ANN-based cancer prediction models are now available worldwide. All of them use the same concept, but their architectures may differ. Every model has a training and a backpropagation phase, while the algorithms and datasets of these models may differ from one another. The algorithms are similar in how they work, but there are differences in the number of neurons and the network architecture, and these differences make one system better than another.
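To make the forward and backward phases concrete, here is a minimal NumPy sketch of a one-hidden-layer MLP trained with gradient descent on synthetic data. The layer sizes, learning rate, and data are illustrative assumptions, not values from any of the surveyed papers.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                        # 200 samples, 8 features
y = (X[:, 0] + X[:, 1] > 0).astype(float)[:, None]   # synthetic binary labels

# Weights of a network with one hidden layer of 16 units
W1, b1 = rng.normal(scale=0.1, size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.1, size=(16, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.1

for epoch in range(500):
    # Forward phase: propagate the inputs through the layers
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # Error between generated and true outputs (cross-entropy gradient)
    dp = (p - y) / len(X)
    # Backward phase: propagate the error back and update the weights
    dW2, db2 = h.T @ dp, dp.sum(axis=0)
    dh = (dp @ W2.T) * (1.0 - h ** 2)                # tanh derivative
    dW1, db1 = X.T @ dh, dh.sum(axis=0)
    W1, b1, W2, b2 = W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2

accuracy = np.mean((p > 0.5) == y)
print("training accuracy:", round(float(accuracy), 3))
```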
Table 1 Performance metrics in machine learning

| Metric name | Definition |
|---|---|
| Accuracy | (TP + TN) / (TP + FP + TN + FN) |
| Sensitivity, recall | TP / (TP + FN) |
| Specificity | TN / (TN + FP) |
| Precision | TP / (TP + FP) |
| Negative prediction rate | TN / (TN + FN) |
| F1 | 2 × (precision × recall) / (precision + recall) |
2.2 Cancer Prediction Models

Cancer is a genetic problem that has a severe effect on human health. It is caused by genetic abnormalities that arise in a person for different reasons; generally, cancer is defined by mutations that occur in a person's DNA. DNA carries the instructions that control how cells grow and divide into other cells. When cancer occurs, the normal functions of the cells are changed: rapid growth of cells occurs and the ability to control this growth is lost. The cells can then be said to be cancerous. Cancer prediction uses more than one method at a time to obtain high prediction accuracy, and statistical and machine learning techniques are widely used for developing accurate prediction models. The accuracy of a cancer prediction model is critical for its future application, and the quality of the model's input determines how well it performs. Thus, the dimensionality of the input data is an essential property of a prediction model: high dimensionality of the input data can give good performance in most cases, but it may reduce performance in others. Machine learning makes use of different scores for measuring performance, including accuracy, recall, precision, and specificity; these are shown in Table 1.
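The metrics in Table 1 can be computed directly from the confusion-matrix counts. The snippet below is a small generic helper (with the usual factor of 2 in F1) applied to made-up predictions; the toy labels are assumptions for illustration only.

```python
def confusion_counts(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    npv = tn / (tn + fn)               # negative prediction rate
    f1 = 2 * precision * recall / (precision + recall)
    return dict(accuracy=accuracy, recall=recall, specificity=specificity,
                precision=precision, npv=npv, f1=f1)

# Toy example: 1 = malignant, 0 = benign
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(metrics(y_true, y_pred))
```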
2.3 Tumor Growth Prediction Models

The next most significant use of neural networks in the cancer field is tumor growth prediction. Cancer prediction is the mechanism for determining the presence of tumors in human beings, whereas tumor growth prediction aims to predict how fast the tumor cells grow. In [23], tumor growth prediction is explained with the help of two processes, named invasion and expansion, of the tumor cells. The images at time1 and time2 are used as the input to the prediction model to generate the images at time3, and the tumor images of cancer patients at time1, time2, and time3 are used as input to the training process. This step completes the learning process and generates a machine learning model. In [24], images at different time points are also used as input to the model; the input images are preprocessed and then used in the training step.
Fig. 1 Citations of each paper till June 2021
A spatiotemporal convolutional long short-term memory (ST-ConvLSTM) network is then used for the training process, and the prediction is done with high accuracy with the help of the ST-ConvLSTM. After the prediction phase, the testing phase is carried out with the help of input images at time1 and time2; using these two images, the testing phase produces predictions at time3, which are compared with the test data.
3 Neural Network-Based Cancer Prediction

In this study, a survey of neural network-based cancer prediction models was conducted based on five articles published between 2018 and 2021. All of these papers introduce a new neural network-based model for cancer research, and the selected articles cover almost all areas of cancer research, such as cancer detection, prognosis prediction, and tumor growth prediction. Figure 1 shows a graphical representation of the number of citations for each of the five chosen papers. The citations of these articles range between 12 and 67; all of them have a reasonable number of citations, and none has fewer than ten.
3.1 Dataset and Preprocessing

Images and text data related to cancer research are used as the datasets in the reviewed papers. Three of the chosen articles used a set of images for the study, and the other two used text data or a table of information as the dataset. The input data for
two papers, [4] and [20], were collected from publicly available datasets such as LIDC-IDRI and METABRIC. In the other articles, the datasets were collected from medical authorities depending on the problem. Article [15] used 513 images from the Pathology Department at Cedars-Sinai Medical Center as the input dataset. Article [23] used the images of ten patients with von Hippel-Lindau (VHL) disease, each with a pancreatic neuroendocrine tumor (PanNET). Article [24] collected the images of 33 patients from the von Hippel-Lindau (VHL) clinical trial at the National Institutes of Health.

Preprocessing is an essential step in the generation of a machine learning model. In this step, the input data is transformed to make it applicable for the study, and all of the chosen articles have a preprocessing stage. In [15], the images are divided into tiles, named tile A and tile B. After this step, the annotated image tiles are cross-validated by pathologists, and then a normalization process is applied to the images. Normalization is a part of preprocessing for reducing dimensionality: data samples with values greater than a given threshold are considered as data points in this stage, while other input samples are not considered. This paper also includes data augmentation techniques such as image flipping, rotating, and mirroring. Data augmentation techniques add new data to the existing data. In [24], four data augmentation techniques are used for data enriching: reformatting, rotating, translating, and reversing the input image slices at different time points. In [4], CT images of 1186 lung nodules are used as the input data. These images were screened by experienced radiologists, who classified them into three categories: non-nodular, nodules of radius greater than 3 mm, and nodules of radius less than 3 mm. Lung nodules of radius less than 3 mm are removed from the input set, and the remaining images are used as the input dataset. Registration and segmentation are other essential preprocessing techniques in some cases. In [23], the input images pass through these preprocessing stages for the extraction of tumors: multidimensional images at three different time points are used as the input data, the images at the various time points are registered, and then the tumors are segmented. These tumor images are the input to the neural network. Feature selection with the mRMR (minimum redundancy maximum relevance) algorithm is used in [20] for preprocessing the input data. Feature selection is the process of reducing the number of inputs to a model. The dataset in this article is high-dimensional and has three different dimensions: the gene expression profile, the CNA profile, and the clinical information. The feature selection approach is designed to avoid the difficulties caused by the input dataset's high dimensionality by reducing its dimensions: the gene expression data and the CNA profile data are reduced from 24,000 to 400 and from 26,000 to 200, respectively, and the clinical information is reduced from 27 to 25.
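Simple geometric augmentation of the kind mentioned above (flipping, rotating, and mirroring image slices) can be sketched with NumPy as below. This is a generic illustration, not the exact pipelines of [15] or [24], and the fake slice is an assumption.

```python
import numpy as np

def augment_slice(img):
    """Return simple geometric variants of a 2D image slice:
    horizontal/vertical flips (mirroring) and 90-degree rotations."""
    variants = [img]
    variants.append(np.fliplr(img))        # mirror left-right
    variants.append(np.flipud(img))        # flip up-down
    variants.extend(np.rot90(img, k) for k in (1, 2, 3))
    return variants

# Illustrative usage on a fake 4x4 "CT slice"
slice_2d = np.arange(16).reshape(4, 4)
augmented = augment_slice(slice_2d)
print(f"1 original slice -> {len(augmented)} training samples")
```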
3.2 Neural Network

After analyzing the five selected papers, we conclude that the neural networks in these papers can be classified into three groups based on their functions: (i) cancer diagnosis and prediction, (ii) cancer nodule detection, and (iii) tumor growth prediction.

Cancer diagnosis and prediction: Li et al. [15] used neural networks for the prognosis and prediction of prostate cancer, the most common type of cancer seen in men in the USA. Here the prognosis and prediction of prostate cancer are done with a region-based convolutional neural network (R-CNN) [7] and the Gleason grading system [5]. The Gleason grading system is used for predicting the grade of the cancer; this grading helps to determine the behavior of the cancer present in a person. The R-CNN is used for training and classifying the regions of interest (ROIs) into object categories, in which the network predicts 1 for images containing ROIs and 0 for images having only stromal components. Sun et al. [20] also proposed a breast cancer prognosis prediction method using a multimodal deep neural network that integrates multidimensional data. Breast cancer is one of the deadly diseases faced by females all over the world; inspired by this, a multimodal deep neural network is proposed for accurately predicting breast cancer.

The architecture of the model in [15] is shown in Fig. 2. The first stage consists of an image parser whose backbone is a residual neural network (ResNet) [10]. The image parser generates feature maps from the input images, and these feature maps are fed into two branches. On the left, there is a region proposal network that generates ROIs. Next, the ROIs are input to a grading network head (GNH) to obtain grades for the ROIs; the GNH classifies the images as benign, low grade, or high grade based on the presence of the tumor. The result is then added to the right branch of the image parser, known as the epithelial network head (ENH). The ENH detects the presence of epithelial cells in the image. If any epithelial cells are present, those images are combined with the result of the GNH; if there are no epithelial cells, the images are given directly to the next stage, known as fully connected conditional random field (CRF) postprocessing. Images with a resolution of 1200 × 1200 pixels were selected as input to the training step. Because of the GPU's limited memory, these images are cropped into patches of 512 × 512 pixels, and these image patches are given to the training stage. In the testing stage the images are also first cropped and then given as input; after testing, the patches are joined back into the original images. The training uses a two-stage mechanism. In the first stage, the GNH is trained together with the higher layers of the ResNet; a backpropagation method is used to optimize the network as in [8], and the weights are updated. The training of the GNH is repeated 25 times to obtain good training results. In the second stage, the ENH is trained. After training, the testing stage is performed: if there are specific ROIs to focus on in the input, the result is 1, and if the image contains only stromal cells, the result is 0. After the predictions are obtained from each patch, the patches are stitched back together into the original image tiles.
Fig. 2 Path R-CNN architecture
This stitching can introduce artifacts at the patch edges, which are handled as in [13]. Finally, the output image is generated in the fully connected CRF postprocessing stage, and the resulting image contains the graded tumor cells.

In [20], a multimodal deep neural network is proposed for accurate breast cancer prognosis prediction. The architecture of the multimodal deep neural network is shown in Fig. 3. Here a multidimensional breast cancer dataset, which includes a gene expression profile, a CNA profile, and clinical information, is used as the input. Some preprocessing is applied to the selected input data to reduce the dimensionality of the clinical information, gene expression, and CNA profiles and thereby improve the accuracy of the results. Generally, a deep neural network prediction model is used for a single dataset, but in this case the dataset is multidimensional, having three dimensions, which is the main problem faced in this method. The most straightforward approach would be to use one deep neural network (DNN) for all dimensions of the data; however, each of the three dimensions has a different representation, so directly combining them into a single dataset is not very efficient. A multimodal deep neural network technique is used in this study to overcome this issue.
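Returning to the patch-wise processing used by Path R-CNN above, the sketch below shows a generic crop-process-stitch round trip for a large tile. The non-overlapping tiling and zero-padding are simplifying assumptions for illustration; only the 512 × 512 patch size is taken from the text, and the "inference" step is a placeholder.

```python
import numpy as np

def crop_into_patches(image, patch=512):
    """Split a 2D image into non-overlapping patch x patch tiles,
    zero-padding the borders so the size becomes a multiple of patch."""
    h, w = image.shape
    H, W = -(-h // patch) * patch, -(-w // patch) * patch   # round up
    padded = np.zeros((H, W), dtype=image.dtype)
    padded[:h, :w] = image
    tiles = [padded[i:i + patch, j:j + patch]
             for i in range(0, H, patch) for j in range(0, W, patch)]
    return tiles, (h, w, H, W)

def stitch_patches(patches, shape, patch=512):
    """Reassemble per-patch outputs into the original image size."""
    h, w, H, W = shape
    out = np.zeros((H, W), dtype=patches[0].dtype)
    k = 0
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            out[i:i + patch, j:j + patch] = patches[k]
            k += 1
    return out[:h, :w]

# Illustrative round trip on a fake 1200 x 1200 tile
tile = np.random.rand(1200, 1200)
patches, shape = crop_into_patches(tile)
predictions = [p > 0.5 for p in patches]          # stand-in for per-patch inference
full_prediction = stitch_patches(predictions, shape)
print(full_prediction.shape)                      # (1200, 1200)
```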
Fig. 3 Multimodal deep neural network
Feature selection methods are used in the initial phase of the architecture to preprocess the multidimensional data. Next, a triple-modal DNN is introduced to extract information from the three different dimensions. As illustrated in the figure, each of the three dimensions is trained using exactly one DNN. For the fusion of the outputs of the independent models, a score-level fusion [9] is performed in the next step, and the equation in [20] is used to aggregate the results by weighted linear aggregation.

Cancer nodule detection: Lung cancer is another type of cancer that causes many deaths every year all over the world. The development of an effective nodule detection model is difficult because of the heterogeneity of lung nodules in CT images. Cao et al. [4] proposed an efficient nodule detection model named the two-stage convolutional neural network (TSCNN) for robust nodule detection in lung cancer. Computed tomography (CT) images of the lungs are given as input to the proposed model for accurately detecting lung nodules. It uses an improved U-Net segmentation [1] model for nodule detection and a new sampling strategy for segmentation training. The main contribution of [4] is the U-Net segmentation for candidate nodule detection. The technique consists of two main phases. The first phase is a detection phase in which candidate nodules are identified based on the U-Net
segmentation. It detects the candidate nodules by segmenting the suspicious nodules from the input images. The second stage focuses on the performance of the method by reducing false-positive nodules; it implements a 3D-CNN architecture to reduce the false positives among the segmented nodules. The candidate nodule detection starts with a segmentation network: in this stage, a U-Net architecture based on a residual-dense mechanism is used for candidate nodule detection. The architecture of the U-Net segmentation is shown in Fig. 4. The next step is the sampling strategy used for the segmentation of nodules. For segmenting lung nodules, the edge voxels of the nodules must be considered, so extra attention is paid to the edge voxels in order to segment the nodules properly; an edge voxel-based sampling strategy is therefore proposed. In the offline hard mining phase, a model M is designed to complete the actual training process; the nodule samples cropped in the above two steps are used as the input, and the training is done with the Adam optimizer [12]. A two-phase prediction, consisting of a rough prediction and a second-phase prediction, is performed in the next step to improve forecast accuracy. A 3D-CNN architecture is used in the second phase, which gives more importance to false-positive reduction. This phase starts with a random-mask step, which accepts the predictions from the candidate nodule detection stage. It is a data augmentation technique to avoid the imbalance problem arising from the numbers of positive and negative samples generated by the segmentation network: the positive data samples are expanded in the training phase so that the ratio between positive and negative samples becomes one, and a similar procedure is applied for negative sample expansion. The last stage is a training model for false-positive reduction, which uses the results of the random mask and the classification network as inputs.

Tumor growth prediction: Cancer originates from abnormalities that occur in the cells. At some point, an undifferentiated and uncontrollable development of undesirable cells arises from these abnormal cells, and this spread is known as cancer. Tumor growth prediction is the next category in cancer research, in which the growth patterns of a tumor and the time period of its growth are predicted. Understanding the growth rate of tumors is very helpful for knowing the chance of survival. Tumor growth prediction models are proposed in [23] and [24]. In [23], tumor growth is modeled as the result of two processes, named invasion and mass effect. Cell invasion [6] is the ability of cells to be motile and to spread into the neighboring cells, while the mass effect is the outward pushing of tumor cells. Here the images of tumors at time1 and time2 are used as input to predict the images at time3; the convolutional neural network learns the input data at its current state and makes predictions for the next state. Zhang et al. [24] proposed a new technique for tumor growth prediction that uses four-dimensional (4D) data instead of 2D data, because 2D data cannot make use of all the spatiotemporal context of the imaging. Here the fourth dimension is time.
The convolutional long short-term memory (ConvLSTM) [22] is used for extracting the static appearance of the images and capturing their dynamic changes.
Fig. 4 Two-stage CNN for lung nodule detection
Fig. 5 Convolutional invasion and expansion networks architecture [23]
The image set is preprocessed to generate the 4D longitudinal tumor dataset, and only then is the data given to the convolutional neural network for tumor growth prediction. Tumor growth prognosis is important for estimating the duration of cancer treatment for a particular patient. Zhang et al. [23] proposed an accurate tumor growth prediction technique based on the invasion and mass effect concepts. Here a deep convolutional neural network is trained using tumor regions at the current time point to learn about the invasion and mass effect of cancer cells. The complete architecture is shown in Fig. 5. Population data and personalized data are used as the training data for the network. The population data at different time points are registered in the preprocessing stage to obtain the spatiotemporal relationship of tumor development. The next stage consists of the invasion and expansion ConvNets. Three types of information (SUV, ICVF, and tumor mask) are extracted from the input images and used as a three-channel input to the invasion network, which uses a six-layer ConvNet as in [14] and is trained on the three-channel input data. In the expansion network, four-channel input data is used for training; this four-channel input consists of a growth map and an optical flow image [2]. This training is done for all pairs of time points: time1/time2, time2/time3, and (time1->time2)/time3. The next stage is the fusion of the results of the invasion and expansion networks. Different fusion techniques can be applied to fuse the results of the two networks, including late fusion, early fusion, and two-stream end-to-end fusion; these three fusion techniques can be applied to validate the performance of the population predictive model. The personalized data is then compared with the results of the population model: if the output and the personalized data match, the model is designated as a personalized prediction model; if not, adjustments are made to the model. Only the images at time1 and time2 are given as input in the prediction phase. This testing data is given to the invasion and expansion networks, and the prediction is made using the personalized predictive model; these predictions are compared with the data at time3 corresponding to the data at time1 and time2.

Two-dimensional image patch-based training is incapable of predicting all the properties of the tumor, so a 4D image patch-based model is proposed in [24]. The complete architecture is shown in Fig. 6. The model comprises two stages. In the first stage, the development of the 4D longitudinal tumor dataset begins with certain preprocessing techniques. Four-dimensional CT image data of different patients at three time points are used as the input to the proposed model. To generate the 4D data, the pre-contrast and post-contrast images at the same time points are first cropped and then registered to obtain registered post-contrast and pre-contrast CT. The post-contrast CT is segmented to generate the tumor mask, and the ICVF is calculated from the post-contrast and pre-contrast CT. The three channels (mask, ICVF, and the CT image itself) are combined, some cropping is applied to this three-channel image, and an RGB image is generated. The image data at the different time points are then divided into slices of the same time point.
These image slices are then given to the proposed ST-ConvLSTM network at time1 and time2 in the second stage of the proposed technique.
Fig. 6 Spatiotemporal convolutional LSTM architecture [24]
From the learned parameters, the prediction of the image slices at time3 is made. A conventional LSTM operates on 1D vectors to predict feature representations. Convolutional LSTM (ConvLSTM) differs from LSTM in that it can represent 2D spatiotemporal sequences, and because the 3D volume data consists of a sequence of 2D image slices, the ConvLSTM can be extended to the 3D spatial domain. In the spatiotemporal ConvLSTM, each image slice is given to a different ST-ConvLSTM unit, and each unit generates one slice. The ST-ConvLSTM unit of time2 receives the outcome of training on the time1 input images. The slices at time point3 are then predicted using the equations in [24]. After the training stage, testing is carried out using the same procedure: the slices at time1 and time2 are used as the input to the trained model to generate the predictions at time3.
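To make the ConvLSTM idea concrete, the following is a minimal sketch of a single ConvLSTM cell in PyTorch, in the spirit of [22]; the channel counts, slice size, and the loop over two time points are illustrative assumptions and are not taken from [24].

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: the LSTM gates are computed with 2D
    convolutions so the spatial structure of each slice is preserved."""
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2
        # One convolution produces all four gates (input, forget, output, candidate).
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               4 * hidden_channels, kernel_size, padding=padding)

    def forward(self, x, state):
        h, c = state                                  # previous hidden and cell states
        z = self.gates(torch.cat([x, h], dim=1))
        i, f, o, g = torch.chunk(z, 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c = f * c + i * g                             # update cell state
        h = o * torch.tanh(c)                         # update hidden state
        return h, c

# Toy usage: one 3-channel slice (e.g. SUV/ICVF/mask-style input) per time point.
cell = ConvLSTMCell(in_channels=3, hidden_channels=16)
h = torch.zeros(1, 16, 64, 64)
c = torch.zeros(1, 16, 64, 64)
for t in range(2):                                    # e.g. time1 and time2
    x_t = torch.randn(1, 3, 64, 64)
    h, c = cell(x_t, (h, c))
```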
4 Conclusion

This study reviews the most recent neural network techniques used in cancer research, considering papers published between 2018 and 2020, so the most recent techniques are compared here. The architecture, preprocessing steps, and performance of neural network techniques in cancer research are compared to identify which one performs best. It can be concluded that cancer prediction, cancer detection, and cancer growth prediction are the main fields of cancer research. Each of these areas is important and significant in its own way. All of the reviewed works employed the approach best suited to their task and, as a consequence, achieved accuracy rates of over 80%. This study finds that the techniques used in each of these papers contribute to their maximum performance. In the future, each of them can be improved or combined to obtain an even better model.
References 1. Alom MZ, Yakopcic C, Hasan M, Taha TM, Asari VK (2019) Recurrent residual u-net for medical image segmentation. J Med Imaging 6(1):014006 2. Baker S, Scharstein D, Lewis JP, Roth S, Black M, Szeliski R (2007) A database and evaluation methodology for optical flow. Int J Comput Vis 92:1–31, 01 3. Bouazza SH, Hamdi N, Zeroual A, Auhmani K (2015) Gene-expression-based cancer classification through feature selection with KNN and SVM classifiers. In: 2015 Intelligent systems and computer vision (ISCV) 4. Cao H, Liu H, Song E, Ma G, Xu X, Jin R, Liu T, Hung C-C(2020) A two-stage convolutional neural networks for lung nodule detection. IEEE J Biomed Health Inf 24(7):2006–2015 5. Epstein JI, Zelefsky MJ, Sjoberg DD, Nelson JB, Egevad L, Magi-Galluzzi C, Vickers AJ, Parwani AV, Reuter VE, Fine SW et al (2016) A contemporary prostate cancer grading system: a validated alternative to the gleason score. Eur Urol 69(3):428–435 6. Friedl P, Wolf K (2003) Tumour-cell invasion and migration: diversity and escape mechanisms. Nat Rev Cancer 3(5):362–374 7. Girshick R, Donahue J, Darrell T, Malik J (2015) Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans Pattern Anal Mach Intell 38(1):142– 158 8. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778 9. He M, Horng S-J, Fan P, Run R-S, Chen R-J, Lai J-L, Khan MK, Sentosa KO (2010) Performance evaluation of score level fusion in multimodal biometric systems. Pattern Recogn 43(5):1789–1800 10. Ismael SAA, Mohammed A, Hefny H (2020) An enhanced deep learning approach for brain cancer MRI images classification using residual networks. Artif Intell Med 102:101779 11. Istepanian RSH (2003) Microarray processing: current status and future directions. IEEE Trans Nanobiosci 2(4):173–175 12. Kingma DP, Ba J (2017) Adam: a method for stochastic optimization 13. Krähenbühl P, Koltun V (2011) Efficient inference in fully connected CRFS with Gaussian edge potentials. Adv Neural Inf Process Syst 24:109–117 14. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:1097–1105 15. Li W, Li J, Sarma KV, Ho KC, Shen S, Knudsen BS, Gertych A, Arnold CW (2019) Path R-CNN for prostate cancer diagnosis and Gleason grading of histological images. IEEE Trans Med Imaging 38(4):945–954 16. Maclin PS, Dempsey J, Brooks J, Rand J (1991) Using neural networks to diagnose cancer. J Med Syst 15(1):11–19 17. Reddy SVG, Reddy KT, Kumari VV, Varma KVSRP (2014) An SVM based approach to breast cancer classification using RBF and polynomial kernel functions with varying arguments. Int J Comput Sci Inf Technol 5(4):5901–5904 18. Sharma A, Paliwal KK (2008) Cancer classification by gradient LDA technique using microarray gene expression data. Data Knowl Eng 66(2):338–347 19. Sowjanya R, Rao KN (2018) Lung cancer with prediction using dbscan. Int J Res 5(12):3621– 3627 20. Sun D, Wang M, Li A (2018) A multimodal deep neural network for human breast cancer prognosis prediction by integrating multi-dimensional data. IEEE/ACM Trans Comput Biol Bioinf 16(3):841–850 21. Swain PH, Hauska H (1977) The decision tree classifier: design and potential. IEEE Trans Geosci Electron 15(3):142–147 22. Shi X, Chen Z, Wang H, Yeung D-Y, Wong W-K, Woo W (2015) Convolutional LSTM network: a machine learning approach for precipitation nowcasting. 
Adv Neural Inf Process Syst, pp 802–810
23. Zhang L, Lu L, Summers RM, Kebebew E, Yao J (2017) Convolutional invasion and expansion networks for tumor growth prediction. IEEE Trans Med Imaging 37(2):638–648 24. Zhang L, Lu L, Wang X, Zhu RM, Bagheri M, Summers RM, Yao J (2019) Spatio-temporal convolutional LSTMs for tumor growth prediction by learning 4D longitudinal patient data. IEEE Trans Med Imaging 39(4):1114–1126
To Optimize Google Ad Campaign Using Data Driven Technique K. Valli Priyadharshini and T. Avudaiappan
Abstract Google Ads is a powerful tool that helps marketers grow online businesses. To develop the growth of business campaigns and e-commerce sales, we have analyzed and implemented data-driven digital marketing strategies to ensure the effective positioning of marketing strategies, the organization's websites, and other online resources. The analysis process is driven by Key Performance Indicators (KPIs) computed with machine learning in Python. The main purpose of running Google Ads is to analyze the Return on Investment (ROI) and to implement the Google AdWords campaign in digital marketing. Taking all of these factors into account, we use a data-driven attribution (DDA) model to help marketers improve their business. Google Analytics plays a major role in the marketing industry. By combining data analytics (data science) and digital marketing, we can address various problems and business strategies. We can plan and provide a truly tailored experience, keeping customers satisfied while purchasing goods or joining an online community. By extracting insights from the enormous amounts of data generated, machine learning in Google Analytics can predict future trends and improve decision making. The main goal is to measure the performance of the ads in terms of reach, engagement, metrics, and reactions, to analyze the effectiveness of the Google Ad campaign in generating orders, and to identify a possible target group for the brand and promote digital branding to achieve a profitable Return on Investment (ROI). Keywords Return on investment · Digital marketing · Google Ads · Data science · Machine learning
K. Valli Priyadharshini (B) · T. Avudaiappan Department of Computer Science and Engineering, K.Ramakrishnan College of Technology, Samayapuram, Trichy, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_38
1 Introduction

Internet and online-based digital technologies, such as desktop computers, mobile phones, and other digital media platforms, are utilized in digital marketing to promote items or services [1]. It has become common to use a combination of search engine optimization (SEO), search engine marketing (SEM), content marketing (including influencer marketing), content automation, campaign marketing, and other techniques to promote a product or service via digital platforms. Mobile phones (SMS and MMS) and callback and on-hold mobile ring tones are examples of non-Internet digital marketing channels. Digital marketing differs from online marketing in that it extends to channels other than the Internet [2, 3]. Using Google Ads, one can build online ads to attract people who are interested in particular products and services [4]. Pay-per-click (PPC) advertising is used on the Google Ads platform, which means the advertiser pays every time a visitor clicks on an ad. Google Ads relies partly on cookies and partly on keywords chosen by advertisers. Ad copy is placed on pages where Google believes it will be relevant using these attributes. As a result, advertisers profit when consumers divert their surfing to click on the advertisement material [5]. Advertisements can be produced and distributed locally, nationally, or internationally. Google Analytics is a web analytics tool that tracks and reports website traffic [6]. It is currently part of the Google Marketing Platform brand, which is owned by Google; Google acquired Urchin in November 2005. As of 2019, Google Analytics is the most extensively used web analytics service on the web. There is a Google Analytics for Mobile Apps SDK that allows usage statistics to be collected from iOS and Android apps. Browsers, browser extensions, firewalls, and other tools can restrict Google Analytics. Since its start, Google Analytics has gone through several iterations and is now on its fourth, GA4. Google Analytics 4 is the rebranded version of the App + Web Property that was introduced as a beta in 2019, and it has now replaced Universal Analytics (UA) [7]. GA4 has a built-in integration with Google BigQuery, a functionality previously available only in GA 360 for enterprises. Google appears to be attempting to integrate GA and its free users into its bigger cloud offering with this change. As well as tracking website activity, Google Analytics provides information on the source of traffic [8, 9]. Users may design and assess online campaigns by analyzing landing page quality and conversions (goals) together with Google Ads. Sales, lead generation, accessing a given page, or downloading a particular file are all examples of goals that could be set. A high-level dashboard is provided to the casual user, while more detailed data is provided further inside the report set. Through tools such as funnel visualization, referrers, length of stay on site, and location, Google Analytics can identify pages that are underperforming. It also has more complex capabilities, such as visitor segmentation tailored to the individual. It is possible to track sales activity and performance with Google Analytics' e-commerce reporting tool. A site's transactions, revenue, and many other commerce-related indicators are displayed in the e-commerce report [10].
By identifying areas that require primary research, systemization, follow-up research, validation, or the gathering of empirical data, an investigation of the prospects for digital transformation in marketing can benefit both academic researchers and business practitioners in the fields of information technology (IT), information systems (IS), business, and marketing.
2 Literature Survey

Using keywords, Dumitriu et al. [11] discuss the current status of artificial intelligence in marketing operations and provide a four-step sequential approach. An extensive research study was conducted to identify new trends in digital marketing, notably the utilization of keywords and their value for a long-term business strategy. Their strategy included four processes for determining keywords and including them in SEO tactics for website visibility. Keyword research will help Google better integrate voice search, as long or unclear phrases spoken in a variety of dialects can make it difficult to locate items and services. To achieve intelligent, tailored, and automated marketing, the authors propose a sequential methodology for finding the correct keywords to be employed in SEO. Users of social e-commerce, according to Xia et al. [12], will benefit from a more comprehensive theoretical framework for data mining, as well as a theoretical basis and reference for platform and business management and operation. The article first creates a model of information flow in social e-commerce based on information ecology and information dissemination perspectives. Social network analysis is used to analyze users' networks, and a conceptual model is constructed using UTAUT, the theory of perceived risk, and the theory of trust to determine the factors that influence initial information adoption by social e-commerce users; the key influencing factors are perceived trust and perceived risk. Descriptive, content, and network analytics were used by Aswani et al. [13] to gain insights from Twitter. It has proven possible to analyze user-generated content using methods like hashtag analysis, polarity and emotion analysis, word analysis, and even topic modeling, validated by a qualitative case study of an e-marketplace. The SEM services of small firms and freelancers are less effective than those supplied by established players, and in social media and forum conversations customers have expressed dissatisfaction with these companies' services. This study shows that SEM is often not only ineffective but can even destroy value if it is not implemented effectively. Long-term benefits may also be adversely affected by transaction expenses such as agency difficulties and coordination costs, as well as the loss of non-contractible value and cost of fit. SEM and outsourcing planning will benefit from the inputs provided. The study examines Twitter's User Generated Content (UGC) to determine whether the much-hyped SEM domain is indeed gold and worth enterprises' resources in the struggle for web visibility. Using consumer
search behavior theory, Scholz et al. [14] describe a novel approach for automatically generating sponsored search phrases. Building on recent research, they extract keywords from an online store's internal search log. Using a store's internal search engine, they empirically test their approach and compare its effects to a state-of-the-art strategy. Their method raised the number of profitable keywords, increased the store's conversion rate by almost 41%, and cut the average cost per click by more than 70%. According to Trom et al. [15], there are governance problems associated with the deployment of big data in businesses. When big data technology first appeared, it created a huge buzz and a lot of upheaval in the business sector. The key reason for this disruption is the unique ability to process massive amounts of information, analyze it, and make well-informed data-driven judgments. The benefits of big data have been realized by corporations and government organizations around the world, and these organizations have begun preparing plans to deploy and extend big data technology. The lack of big data governance continues to trouble enterprises, despite the fact that these organizations are constantly experimenting with different ways to exploit big data. An emerging governance dynamic for big data is the topic of this research. Large amounts of data complicate governance procedures and rules as well as the roles and responsibilities of individuals, and new approaches, methods, controls, and policies are needed to govern them. Data quality level is a new criterion in the Big Data Governance Framework, whose emphasis is on timely, reliable, useful, and sufficient data services. A big data analytics service's goal is to determine what data qualities should be accomplished. Problems can be avoided by implementing a personal information protection policy, as well as a data disclosure and accountability strategy.
3 Proposed Methodology

The main motive of our project is to analyze the Return on Investment (ROI), which is the ratio between the net income and the investment. The analysis is carried out using Key Performance Indicators (KPIs) with machine learning in Python, with the aim of ensuring the successful deployment of marketing strategies, the organization's websites, and other online resources, increasing website traffic on social media through automation tools, and using the data-driven attribution method to address various problems and business strategies. The overall process of data analysis remains the same: first, export the data from Google Ads, import it into Python, perform the data cleaning operations, and report the insights. Figure 1 shows the block diagram of the proposed method, and Fig. 2 shows the flow diagram of the proposed model. Initially, we take the Ad group & Ad copy data and the day and time slot data. The data is uploaded into the Python module. Next, the data is cleaned by removing the empty rows and columns. The data is then explored and, at the next level, feature engineered.
Fig. 1 Block diagram of proposed method
Fig. 2 Flow diagram of proposed model
We can then visualize the data using relationship visualizations for the Ad group and Headline report and the Date & Time slot report, generating two kinds of graphs (a generic view graph and a granular view graph), and finally move to the decision-making process. The detailed view of each step is discussed below.
3.1 Dataset

In this proposed methodology, the dataset is gathered from Google PPC data. Two datasets are used, named Ad group & Ad copy and Day and time slot. The two datasets consist of different rows and columns, containing fields such as Ad group, campaign type, search keyword match type, CTR, cost, clicks, hour of the day, conversions, conversion rate, and average CPC. Each column of a table represents a variable, and each row corresponds to a record of the dataset in question.
3.2 Importing Dataset and Data Exploration

The data downloaded from Google AdWords is imported into the Python environment. The first step in data analysis is data exploration, in which users explore a large dataset in an unstructured manner to identify initial patterns, traits, and areas of interest. A combination of manual and automated methods can be used to explore the data.
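As a hedged illustration of this step, the snippet below loads the two exported reports with pandas and takes a first look at them; the CSV file names and the pandas-based workflow are assumptions, not details given in the paper.

```python
import pandas as pd

# Hypothetical file names for the two Google Ads (PPC) exports.
ad_copy = pd.read_csv("ad_group_ad_copy_report.csv")
day_time = pd.read_csv("day_time_slot_report.csv")

# Initial exploration: structure, data types, and summary statistics.
print(ad_copy.head())
print(ad_copy.info())
print(ad_copy.describe(include="all"))
print(day_time.head())
```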
3.3 Data Cleaning

Data cleaning is the process of discovering and repairing (or removing) corrupt or inaccurate records from a record set, table, or database by recognizing incomplete, erroneous, inaccurate, or irrelevant data and then replacing, updating, or deleting the dirty or coarse data.
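A minimal sketch of this cleaning step, assuming column names such as "CTR", "Clicks", "Cost", and "Conversions" (the exact headers of a Google Ads export may differ):

```python
import pandas as pd

# Drop rows/columns that are entirely empty, then remove exact duplicates.
ad_copy = ad_copy.dropna(how="all").dropna(axis=1, how="all").drop_duplicates()

# Coerce metric columns to numeric types; CTR is often exported as a percentage string.
ad_copy["CTR"] = ad_copy["CTR"].astype(str).str.rstrip("%").astype(float) / 100.0
for col in ["Clicks", "Cost", "Conversions"]:
    ad_copy[col] = pd.to_numeric(ad_copy[col], errors="coerce")

ad_copy = ad_copy.reset_index(drop=True)
```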
3.4 Feature Engineering

Feature engineering is the process of extracting features from raw data using domain knowledge and data mining techniques. These features can be exploited to boost the performance of machine learning algorithms.
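For example, the derived KPIs and the time-slot bucket used later in the report could be engineered as follows; the column names and slot boundaries are assumptions based on the dataset description above.

```python
import numpy as np
import pandas as pd

# Derived KPIs, guarding against division by zero.
clicks = ad_copy["Clicks"].replace(0, np.nan)
ad_copy["Avg_CPC"] = ad_copy["Cost"] / clicks
ad_copy["Conversion_rate"] = ad_copy["Conversions"] / clicks

# Bucket the hour of the day into the time slots used in the day/time report.
bins = [0, 4, 7, 10, 13, 17, 20, 24]
labels = ["12-4 AM", "4-7 AM", "7-10 AM", "10 AM-1 PM",
          "1-5 PM", "5-8 PM", "8 PM-12 AM"]
day_time["Time_slot"] = pd.cut(day_time["Hour of the day"],
                               bins=bins, labels=labels, right=False)
```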
3.5 Relationship Visualization

Ad scheduling is one area where the visualization has been improved. Currently, the ad schedule tab of a campaign shows a bar chart of when ads are running and a separate line graph for viewing performance data, but no day parting.
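A hedged sketch of how the generic-view and granular-view charts used later can be produced with pandas and matplotlib; the "Headline" and "Ad group" column names are assumptions.

```python
import matplotlib.pyplot as plt

# Generic view: aggregate per headline and plot one small bar chart per KPI.
agg = (ad_copy.groupby("Headline")
       .agg(Cost=("Cost", "sum"), Clicks=("Clicks", "sum"),
            CTR=("CTR", "mean"), Avg_CPC=("Avg_CPC", "mean")))

fig, axes = plt.subplots(2, 2, figsize=(12, 8))
for ax, kpi in zip(axes.ravel(), agg.columns):
    agg[kpi].plot(kind="bar", ax=ax, title=kpi)
plt.tight_layout()
plt.show()

# Granular view: the same KPIs, additionally split by ad group.
granular = ad_copy.groupby(["Ad group", "Headline"])[["Cost", "Clicks"]].sum()
```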
3.6 Testing

We carried out several testing schemes in this experiment, including system testing, unit testing, functional testing, acceptance testing, and regression testing. The fully integrated application, including external peripherals, is tested to see how the components interact with one another and with the overall system. This is also known as an end-to-end testing scenario.
4 Result and Discussion

The results are obtained using the Python platform in a Jupyter notebook, which is open source and freely available on its official website www.python.org. The hardware used for this experimentation has 64 MB RAM, a 20 GB hard disk, and an Intel(R) Pentium(R) CPU G3250 @ 3.20 GHz.
4.1 Performance Evaluation

For the analysis and insights report (visualizing and finally reporting the insights), we generate two reports, namely the Ad group and Headline report and the Date & Time slot report. The analysis is performed on the various datasets with the help of two graphs, namely the generic view graph and the granular view graph. The generic view graphs are used to find which headlines are performing well with the help of various metrics (KPIs), and the granular view graphs are used to find which ad groups are performing well with the help of the same metrics. After running various ad groups and ad sets, we analyze which ad set gives the maximum result.
4.2 Generic View

In Fig. 3, four graphs are generated with the headlines on the X-axis and the budgets on the Y-axis.
Fig. 3 Generic view graph (Ad group + Ad headline + Budget)
The four graphs correspond to the various metrics: total amount spent, total clicks, CTR, and average CPC. The headlines are Min. Investment & Max. Profit, Number 1 Franchise Opportunity, Own Franchise in Maharashtra, Franchise Opportunity @Mumbai, Profitable Franchise in Pune, Franchise Opportunity in Pune, and Profitable Franchise in Mumbai. The first graph shows the total amount spent: the budgets of the headlines Franchise Opportunity in Pune and Number 1 Franchise Opportunity are high, meaning we spent more on these headlines, and Franchise Opportunity @Mumbai has the next highest budget, followed by Profitable Franchise in Mumbai. The second graph shows the total clicks: the first headlines receive more clicks than the second, and in general the number of clicks follows the amount spent. Here we can see that the amount spent on the headline Number 1 Franchise Opportunity is high, so the amount can be reduced there. The third graph shows CTR, which is an important metric for the analysis. From the graph, CTR is high for the headlines Number 1 Franchise Opportunity and Min. Investment & Max. Profit. It is therefore clear that Min. Investment & Max. Profit is the best performing headline, because we spent less on it but receive the maximum CTR; likewise, the average CPC is also high for the Min. Investment & Max. Profit headline. The fourth graph shows the average CPC. In Fig. 4, we have three ad groups (Franchise in Pune, Franchise in Mumbai, and Chicken Franchise Pune) and the same seven headlines. In the ad group Franchise in Pune the first two headlines have the maximum clicks, and in the ad group Franchise in Mumbai the last two headlines have the maximum clicks, but the ad group Chicken Franchise Pune receives no clicks at all, so we can eliminate that ad group from the analysis. From Fig. 5, we can see that in the ad group Franchise in Pune the first two headlines have the maximum amount spent, in the ad group Franchise in Mumbai the last two headlines have the maximum amount spent, and Chicken Franchise Pune is eliminated.
Fig. 4 Granular view graph, (Ad group + Ad headline + Budget) total clicks
Fig. 5 Granular view graph (Ad group + Ad headline + Budget) total amount
From Fig. 6, we can see that the headline Min. Investment & Max. Profit in the ad group Franchise in Mumbai has a higher CTR than the others. Next, Profitable Franchise in Mumbai has a high CTR in both ad groups, followed by Number 1 Franchise Opportunity. From Fig. 7, Profitable Franchise in Pune has a lower CPC than the others. According to the combination of ad group, ad headline, and budget, Min. Investment & Max. Profit is the best headline and Franchise in Mumbai is the best ad group for profitable performance. Next, we combine the ad groups, ad headlines, and keyword match types. In Fig. 8, six graphs are generated with the match types on the X-axis and the metrics on the Y-axis, according to the various metrics such as total amount spent, total clicks, CTR, and average CPC. The match types are exact match, phrase match, and broad match. From these six graphs, it is clear that the phrase match type gives better results than the other match types.
Fig. 6 Granular view graph (Ad group + Ad headline + Budget) Average CTR
Fig. 7 Granular view graph (Ad group + Ad headline + Budget) Avg CPC
Fig. 8 Generic view graph (Ad group + Ad headline + Keyword match type)
4.3 Granular View

In Fig. 9, we have three ad groups. In the first ad group, Franchise in Pune, we spent the maximum amount on the exact and phrase match types and less on the broad match type. In the second ad group, Franchise in Mumbai, we spent the maximum amount only on the phrase match type. In the third ad group, Chicken Franchise Pune, we did not spend any amount, so we can eliminate that ad group from the analysis process. From Fig. 10, we can see that the clicks received for the exact match are high for the ad group Franchise in Pune, moderate for the phrase match, and low for the broad match. Likewise, in the ad group Franchise in Mumbai, clicks are only received for the phrase match type (refer to Fig. 11).
Fig. 9 Granular view graph (Ad group + Ad headline + Keyword match type) amount spent
Fig. 10 Granular view graph, (Ad group + Ad headline + Keyword match type) total clicks
Fig. 11 Granular view graph (Ad group + Ad headline + Keyword match type) CTR
From Fig. 12, the CPC is maximum for the ad group Franchise in Pune, and in the ad group Franchise in Mumbai the phrase match type is the highest. Conversion is an important metric for the analysis process; we have already performed a feature engineering step to obtain the conversion metric in our project. From Fig. 13, the conversions are higher for the ad group Franchise in Mumbai than for the others. In the ad group Franchise in Pune, both the exact and phrase match types have a higher conversion rate.
Fig. 12 Granular view graph (Ad group + Ad headline + Keyword match type) average CPC
Fig. 13 Granular view graph, (Ad group + Ad headline + Keyword match type) total conversions
Fig. 14 Granular view graph (Ad group + Ad headline + Keyword match type) average conversion rate
From Fig. 14, the average conversion rate is low for all the ad groups. For the ad group Chicken Franchise Pune we receive a high average conversion rate for the phrase match, but we do not consider that ad group because we did not spend any amount on it.
4.4 Day and Time Slot Report

On this dataset we examine the combined effects of day and time slot. It answers questions such as which time of day is best for conversions, so that we can increase our ads at that time, and which day of the week performs best, so that we can modify the campaign accordingly. We now combine the various days with the time slots to find which combinations give higher results and which day and time perform best and worst. In Fig. 15, six graphs are generated with time on the X-axis and the metrics on the Y-axis, according to the various metrics: total amount spent, total clicks, CTR, average CPC, total conversions, and CVR. The time slots are 4–7 AM, 7–10 AM, 10 AM–1 PM, 1–5 PM, 5–8 PM, 8 PM–12 AM, and 12–4 AM. The first graph shows the total amount spent: the budget spent for 1–5 PM is the highest, followed by 10 AM–1 PM and 5–8 PM; for the other time slots the amounts spent are moderate. The other metrics are calculated and analyzed according to the amount spent for each time slot.
Fig. 15 Generic view graph (Time slot + Ad group + Budget)
The second graph shows the total clicks: the clicks are received in proportion to the amount spent, with the 1–5 PM slot receiving the maximum clicks and the other time slots receiving clicks according to the amount spent. The third graph shows CTR, an important metric for the analysis process. The CTR is highest for the 4–7 AM slot, followed by the 7–10 AM slot and then the 12–4 AM slot. The CTR is minimum for the 1–5 PM and 10 AM–1 PM slots, even though we spent more on them; this leads to a loss for the business, so we can decrease the budget spent in the worst performing time slots. The fourth graph shows CPC. If we get a low CPC for a high-CTR slot, that slot is the best performing time slot of the day. From the fourth graph, CPC is low for the 8 PM–12 AM and 12–4 AM time slots and high for the 7–10 AM and 10 AM–1 PM slots. The fifth graph shows total conversions; the conversion rate is also an important metric for the analysis process. The conversion rate is high for the 7–10 AM, 10 AM–1 PM, 1–5 PM, 5–8 PM, and 12–4 AM slots, low for the 8 PM–12 AM slot, and moderate for 4–7 AM. The sixth graph shows CVR, which is higher for the 12–4 AM and 8 PM–12 AM slots; 4–7 AM does not have any CVR, and the other time slots have moderate CVR.
4.4.1 Granular View
In Fig. 16, there are three ad groups, namely Franchise in Mumbai, Chicken Franchise Pune, and Franchise in Pune, and the same seven time slots. From the graph, in the first ad group, Franchise in Mumbai, the maximum amount is spent on the 10 AM–1 PM and 12–4 AM time slots, while the other time slots have moderate to low amounts.
Fig. 16 Granular view graph (Effect of time slot + Ad group + Budget on KPIs) total amount spent
Fig. 17 Granular view graph of total clicks
In the second ad group, Chicken Franchise Pune, no amount is spent on any of the time slots, so we can eliminate this ad group from the analysis process. In the third ad group, Franchise in Pune, the maximum amount is spent on the 1–5 PM and 10 AM–1 PM time slots, a moderate amount on the 5–8 PM time slot, and less on the remaining time slots. From Fig. 17, the maximum clicks are received by the ad group Franchise in Pune because we spent the most on it. Within Franchise in Pune, the 1–5 PM time slot receives the most clicks, followed by the 10 AM–1 PM and 5–8 PM time slots. In the ad group Franchise in Mumbai, the 10 AM–1 PM time slot receives the most clicks, followed by the 1–5 PM and 12–4 AM time slots. We do not consider Chicken Franchise Pune because of the cost. In Fig. 18, six graphs are generated with the day of the week on the X-axis and the metrics on the Y-axis, according to the various metrics: total amount spent, total clicks, CTR, average CPC, total conversions, and CVR. The days of the week are Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday. The first graph shows the total amount spent: the budget spent for Tuesday and Wednesday is the highest, followed by Monday and Sunday, with moderate amounts spent on the other days. The other metrics are calculated and analyzed according to the amount spent on each day. The second graph shows the total clicks: the clicks follow the amount spent, with Sunday and Monday receiving the maximum clicks and the other days receiving clicks according to the amount spent.
Fig. 18 Generic view graph of (Day of week + Ad group + Ad headline + Budget)
The third graph shows CTR, an important metric for the analysis process. The CTR is highest for Saturday, Tuesday, and Monday, followed by Friday and Sunday. The CTR is minimum for Thursday and Wednesday, even though we spent more on them; this leads to a loss for the business, so we can decrease the budget spent on the worst performing days. The fourth graph shows CPC. If we get a low CPC for a high-CTR day, that day is the best performing day of the week. From the fourth graph, CPC is low for Monday, Saturday, and Sunday and high for Tuesday and Wednesday. The fifth graph shows total conversions; the conversion rate is also an important metric for the analysis process. The conversion rate is high for Monday, Wednesday, and Friday, moderate for Wednesday and Saturday, and low for Tuesday and Thursday. The sixth graph shows CVR, which is highest for Friday, with the other days having moderate CVR. In Fig. 19, we have three ad groups. In the first ad group, Franchise in Pune, we spent the maximum amount on Sunday and Wednesday and less on Saturday. In the second ad group, Franchise in Mumbai, we spent the maximum amount on Tuesday and Wednesday. From Fig. 20, we can see that in the ad group Franchise in Pune, Wednesday and Friday have the maximum conversions, and in the ad group Franchise in Mumbai, Monday has the maximum conversions; likewise, the other days have moderate to low conversion rates.
Fig. 19 Granular view graph of day of week wise total amount spent
Fig. 20 Granular view graph of day of week wise conversions
Fig. 21 Granular view graph of day of week wise CVR
From Fig. 21, we can see that in the ad group Franchise in Pune, Wednesday and Friday have the maximum CVR, and in the ad group Franchise in Mumbai, Monday and Saturday have the maximum CVR.
4.5 Final Estimation Result from Ad Group and Ad Headline Report

4.5.1 Generic View
According to this view, the best performing headline is Min. Investment & Max. Profit. So, we can increase our budget for this headline and reduce the budget for worst performing headlines such as Franchise Opportunity in Pune.
4.5.2 Granular View
According to this view as well, the best performing headline is Min. Investment & Max. Profit, so we can increase our budget for this headline and reduce the budget for worst performing headlines such as Franchise Opportunity in Pune. According to the combination of ad group, ad headline, and keyword match type, phrase match is the best keyword match type for the ad group Franchise in Mumbai and exact match is the best keyword match type for the ad group Franchise in Pune.
4.6 Result from Day and Time Slot Report

4.6.1 Generic View
According to this, the 8 PM–12 AM time slot has the best CTR, CPC, and CVR, so we can increase our budget for this time slot and reduce the budget for worst performing time slots such as 1–5 PM. The effect of Ad group + Day of week + Budget on the KPIs shows that Monday and Friday have the best metrics, so the bid can be increased for those days.
4.6.2 Granular View
For the ad group Franchise in Pune: reduce the bid significantly (by 20%) for the 4–7 AM and 1–5 PM slots, and increase the bid slightly (by 10%) for the 8 PM–12 AM slot. For the ad group Franchise in Mumbai: reduce the bid significantly (by 20%) for 4–10 AM, reduce the bid slightly (by 10%) for 10 AM–1 PM, and increase the bid slightly (by 10%) for 8 PM–12 AM. For the effect of Ad group + Day of week + Budget on the KPIs: in the ad group Franchise in Pune, reduce the bid by 20% on Tuesday and by 10% on Monday and Thursday; in the ad group Franchise in Mumbai, reduce the bid by 20% on Wednesday and Thursday and by 10% on Tuesday.
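These bid recommendations follow a simple thresholding logic on the KPI report; the sketch below shows one way such a rule could be encoded, with the thresholds, column names ("CTR", "Avg_CPC"), and segment grouping being illustrative assumptions rather than details prescribed by the paper.

```python
def bid_adjustment(row, ctr_median, cpc_median):
    """Return a percentage bid change for one time-slot or day segment."""
    if row["CTR"] >= ctr_median and row["Avg_CPC"] <= cpc_median:
        return +10   # strong segment: increase the bid slightly
    if row["CTR"] < ctr_median and row["Avg_CPC"] > cpc_median:
        return -20   # weak segment: reduce the bid significantly
    return -10       # mixed signals: reduce the bid slightly

# Example: apply the rule to an aggregated time-slot report
# (assumes day_time carries CTR and Avg_CPC columns and a Time_slot bucket).
report = day_time.groupby("Time_slot", observed=True).agg(
    CTR=("CTR", "mean"), Avg_CPC=("Avg_CPC", "mean"))
report["Bid_change_pct"] = report.apply(
    bid_adjustment, axis=1,
    ctr_median=report["CTR"].median(), cpc_median=report["Avg_CPC"].median())
```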
5 Conclusion

In the modern world, marketing strategy is crucial for the development of a business. Taking all of these factors into account, we use a data-driven attribution (DDA) model to help marketers improve their business. Google Analytics plays a major role in the marketing industry. By combining data analytics (data science) and digital marketing, we can address various problems and business strategies. The analysis process is driven by Key Performance Indicators (KPIs) computed with machine learning in Python. The main purpose of running Google Ads is to analyze the Return on Investment (ROI) and to implement the Google AdWords campaign in digital marketing. The current system can perform data analytics on a Google Ad campaign to optimize the growth of the business ads. It can analyze the insight reports from the consumer PPC data to remarket the campaign and meet the Return on Investment (ROI). It uses data-driven attribution operations to optimize the analytical data from the insight reports and compares the competitors of a particular ad campaign to maximize website traffic for the users. In the future, the present system can be improved with more advanced data-driven attribution techniques in Python to further optimize the campaigns and make the decision-making process more automated.
References 1. Yogesh S, Sharaha N, Roopan S (2019) Digital marketing and its analysis. Int J Innov Res Comput Commun Eng 5(7):201957007 2. Saura JR, Palos-Sanchez PR, Correia MB (2019) Digital marketing strategies based on the e-business model: literature review and future directions. In: Organizational transformation and managing innovation in the fourth industrial revolution, pp 86–103 3. Desai V (2019) Digital marketing: a review. Int J Trend Sci Res Dev 196–200 4. Saxenaa K, Mittalb S (2019) An analytical study of digital advertising strategies and measuring their effectiveness. Int J Trade Commerce 5. Chaffey D, Edmundson-Bird D, Hemphill T (2019) Digital business and e-commerce management. Pearson UK
6. Pohjanen R (2019) The benefits of search engine optimization in google for businesses. Unpublished Master’s Thesis, University of Oulu. http://jultika.oulu.fi/files/nbnfioulu-201910112963. pdf 7. Thushara Y, Ramesh V (2016) A study of web mining application on e-commerce using Google Analytics tool. Int J Comput Appl 149(11):975–8887 8. Ahmed H, Jilani TA, Haider W, Abbasi MA, Nand S, Kamran S (2017) Establishing standard rules for choosing best KPIs for an e-commerce business based on google analytics and machine learning technique. Int J Adv Comput Sci Appl 8(5):12–24 9. Anderson C (2015) Creating a data-driven organization: practical advice from the trenches. O’Reilly Media, Inc. Booth D (2019) Marketing analytics in the age of machine learning. Appl Market Anal 4(3):214–221 10. Micheaux A, Bosio B (2019) Customer journey mapping as a new way to teach data-driven marketing as a service. J Mark Educ 41(2):127–140 11. Dumitriu D, Ana-Maria Popescu M (2019) Artificial intelligence solution for digital marketing. In: International conference interdisciplinarity in engineering 12. Xia C, Guha S, Muthukrishnan S (2016) Targeting algorithms for online social advertising markets. In: IEEE/ACM International conference on advances in social networks analysis and mining (ASONAM) 13. Aswani R, Kar AK, Ilavarasan PV, Dwivedi YK (2018) Search engine marketing is not all gold: insights from Twitter and SEOClerks. Int J Inf Manage 38(1):107–116 14. Scholz M, Brenner C, Hinz O (2019) AKEGIS: automatic keyword generation for sponsored search advertising in online retailing. Decis Support Syst 119:96–106 15. Trom L, Cronje J (2019) Analysis of data governance implications on big data. In: Future of information and communication conference. Springer, Cham, pp 645–654
Identification and Detecting COVID-19 from X-Ray Images Using Convolutional Neural Networks Rahman Farhat Lamisa and Md. Rownak Islam
Abstract The sudden increase in COVID-19 patients is alarming, and it requires quick diagnosis in a short time. PCR testing is one of the most widely used methods to test for and diagnose COVID, but it is time-consuming. In this paper, we present an end-to-end technique that can detect COVID-19 using chest X-ray scans. We have trained and optimized a convolutional neural network (ConvNet) on a large COVID-19 dataset. We performed a series of experiments on a number of different architectures, chose the best performing network architecture, and then carried out a series of additional experiments to find the optimal set of hyper-parameters; we also show and justify a number of data augmentation strategies that allowed us to greatly enhance our performance on the test set. Our final trained ConvNet obtained a test accuracy of 97.89%. This high accuracy and very fast test speed can be beneficial for obtaining quick COVID test results for further treatment. Keywords COVID · X-ray · ConvNet · Deep learning
1 Introduction

COVID-19 is a respiratory disease caused by the severe acute respiratory syndrome coronavirus, and it was declared a global pandemic on March 11, 2020, by the World Health Organization (WHO) [1]. COVID-19 spreads rapidly, and the number of COVID-19 cases is increasing continuously. Although RT-PCR is the most widely used detection technology for COVID-19, it still has some limitations: it is costly and time-consuming [2, 3]. Also, there might be limited time for diagnosis for more vulnerable people, i.e., elderly people. These people might need an earlier indication of their health risks
before they get back their test results. On top of that, human judgment is not error-free, and an automated system can assist human doctors and can be operated by anyone who has access to a computer. For all these reasons, it has become a top priority to build a system that is time-efficient and equally accurate, if not more so, for COVID diagnosis [4, 5]. In this paper, we propose a method by which COVID can be diagnosed reliably in a short time. To justify its reliability, we compared several deep learning architectures to find the best-fitted model. Moreover, we also experimented with tuning the parameters and hyper-parameters to make the model perform better. The main objective of this research is to make use of lung X-ray scans to detect whether a person is affected by COVID-19. More specifically, we use a convolutional neural network (ConvNet), trained, optimized, and tested thoroughly to distinguish the lung X-ray scans of COVID positive and negative patients. Our proposed solution can be divided into two main parts:
• In the first part, a first stage of experiments was performed to find the best performing network architecture on a fixed set of settings.
• In the second part, after selecting the best performing network architecture, we analyze the results to find the best selection of hyper-parameters and other settings.
This paper is arranged in a few sections. To begin with, some of the recent works on this topic are discussed and compared. Then we go over the dataset and how the preprocessing, splits, etc., were done. In the methodology section, the various experiments and their setups are explained. The results acquired are discussed in the results and analysis section. Then the best performing model and the justification behind its selection are discussed in the decision on the network and justification section. After selecting the best model, different hyper-parameters and settings were set up to get the best outcome, and this is discussed in the results section. Finally, we conclude with some ideas for future improvements.
1.1 Related Work

Relatively little research has been conducted on identifying COVID-infected cases from X-rays using deep learning methods. A transfer learning framework for deep learning can be used to detect COVID-19 using image classification and a deep learning model [6]. The imaging modality can be X-ray, ultrasound, or CT scan, and the accuracy level can be changed by selecting different types of models and changing their parameters. Deep learning methods to automatically analyze chest X-rays have been used in research [7] with the hope of bringing precision tools to health professionals for screening for COVID-19 and diagnosing confirmed patients. The authors experimented with numerous chest X-ray images collected from various sources [8]. A transfer learning paradigm was then used to develop COVID-CXNet from the well-known CheXNet model. According to the authors, based on relevant and meaningful features and precise localization, this powerful model is capable of
detecting the novel coronavirus pneumonia. COVID-CXNet is an important step toward developing a fully automated and robust COVID-19 detection system [8]. A new concept named domain extension transfer learning (DETL) was proposed by Sanhita et al. [9]. They deployed DETL with a pre-trained deep convolutional neural network on a chest X-ray dataset tuned for classifying between four classes: normal, pneumonia, other disease, and COVID. They performed a fivefold cross-validation to estimate the feasibility of using chest X-rays to diagnose COVID, and the overall accuracy was around 90.13%. They also provided another concept for understanding COVID detection transparency, named gradient class activation map (Grad-CAM). Grad-CAM helps to detect the regions where the model paid more attention during classification, and experts found that this had a substantial correlation with clinical findings. Another research report [11] used deep feature extraction, fine-tuning of pre-trained convolutional neural networks (CNNs), and end-to-end training of a developed CNN model to classify COVID-19 and normal chest X-ray images. They used ResNet18 [11], ResNet50 [11], ResNet101 [11], VGG16 [12], and VGG19 [13] for the feature extraction and fine-tuning procedure. Support vector machine (SVM) classifiers with various kernel functions, namely linear, quadratic, cubic, and Gaussian, were used for classification, and a new CNN model with end-to-end training was proposed in their study. The ResNet50 model with an SVM classifier using the linear kernel function gave 94.7% accuracy, the highest obtained result; the ResNet50 model after fine-tuning showed 92.6% accuracy, while end-to-end training of the developed CNN model gave 91.6% accuracy. They also used various local texture descriptors and SVM classifiers to compare performance against the deep approaches, and their results showed that the deep approaches are more efficient than local texture descriptors for detecting COVID-19 from chest X-ray images. A similar research report [14] tried to find a deep neural network-based model to sort patients for appropriate testing; their model gave 90.5% accuracy with 100% sensitivity (recall) for COVID-19 infections [15]. In [16], the authors analyzed healthy and COVID-infected people in parallel with the help of computed tomography images and chest X-ray images. The authors aimed to compare earlier works in the field and explore prospective task models that might be evaluated further to demonstrate their utility in real-world circumstances. They applied augmentation to their collected data, since they did not have enough data to prevent overfitting. Three models were implemented, and their performance was compared in terms of accuracy for better analysis. They concluded that the Xception net performed the best, even though the high accuracy may have occurred due to overfitting. In [17], the authors proposed a method consisting of several phases, viz., data augmentation and stage-I and stage-II deep network model design. Using chest X-ray images, they designed a deep network implementation in two stages to differentiate COVID-induced pneumonia from healthy cases and from bacterial and other virus-induced pneumonia, and they performed a comprehensive evaluation to demonstrate the effectiveness of the proposed method.
2 Dataset

One of the most important prerequisites for any project that relies on deep learning is data availability, and for a supervised learning task the associated labels are required too. Since COVID-19 is a new disease, a large dataset was not readily available. We leveraged a dataset that had X-ray scan images of COVID positive and negative patients [18]. Figure 1a shows an X-ray of a lung affected by COVID, and Fig. 1b shows a normal lung X-ray; both images are from our collected dataset. One problem with this dataset was class imbalance. Ideally, a dataset should have an equal distribution of samples across all its classes; however, the dataset used in this project was not balanced. To resolve this, a random sampling method was used. This does not have any negative impact on the later stages, because the negative samples do not show much variation from one image to the other, whereas the positive class images vary from image to image. On the contrary, it helps the ConvNet generalize better and not favor one class over the other. After this sampling, there were a total of 3616 images of the positive class and 4445 images of the negative class. The resolution of the images in the dataset was 299 × 299 pixels. We split the dataset into three partitions for training, validation, and testing, which had 70, 20, and 10% of the total data, respectively. The validation set was used for evaluation and for the experimentation. The test set was held out and kept separate until the very end to avoid making any changes to the system based on feedback from the test images.
Fig. 1 a Lung affected by COVID; b Normal lung
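A minimal sketch of the 70-20-10 split described above; the directory layout, file extension, and random seed are assumptions made for illustration.

```python
import random
from pathlib import Path

def split_dataset(image_dir, seed=42):
    """Shuffle one class's image paths and split them 70/20/10 into
    train / validation / test, the ratios used in this work."""
    paths = sorted(Path(image_dir).glob("*.png"))   # assumed file layout
    random.Random(seed).shuffle(paths)
    n_train = int(0.7 * len(paths))
    n_val = int(0.2 * len(paths))
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])

train_pos, val_pos, test_pos = split_dataset("dataset/COVID")
train_neg, val_neg, test_neg = split_dataset("dataset/Normal")
```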
3 Methodology

Since the objective of our task is to identify COVID positive and negative cases, it becomes what is commonly known as a binary classification problem; in this case, our two classes are COVID and Non-COVID. At first, the task was to decide on a particular network architecture and use it as the baseline. Currently, research is continuously being done on novel architectures specialized for various deep learning tasks. Therefore, to decide which network we should choose, we performed a number of experiments, and based on their results we chose the model architecture; this is explained extensively in the next few sections. Once a decision had been made on the network model, additional experiments were performed on the chosen network to find the optimal choices of hyper-parameters and a few other factors; this is also covered extensively with a number of plots and figures in the latter part of this paper. In our work, AlexNet, ResNet18, ResNet34, EfficientNet, and DenseNet121 were used. These models were chosen so that they would cover different types of networks, from simpler ones (AlexNet) to more complex and deeper ones (ResNet) and on to more recent, efficient but not as deep ones (EfficientNet).
3.1 Experimental Setup for Comparison of the Different Networks

For our first series of experiments, a few parameters and hyper-parameters were fixed and kept constant across the different models. These settings involved several factors, such as the learning rate, the choice of optimizer, the batch size, the train-validation-test ratio, and the choice and settings of the various augmentations. Keeping these settings standard and consistent across the different architectures allows a proper evaluation of which model outperforms the rest on our test set. For the dataset distribution, we kept the 70-20-10 split for train, validation, and test, respectively. All the data were shuffled during every epoch. The number of epochs was set to 100; however, overfitting was avoided by only writing out the checkpoint that performed best on the validation set. A constant learning rate of 0.001 was used and was not changed throughout the iterations. Since it is a classification problem, binary cross-entropy was used as the loss function. Several transformations/data augmentations were applied, such as horizontal/vertical flipping, random rotation, random sharpness, and contrast adjustment. These allowed us to properly utilize our data and to make the models robust against insignificant changes in detail; they also help counter any probable overfitting or sensitivity to noise. Finally, as the optimizer we used ADAM with a weight decay of 0.01.
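The fixed experimental setup described above can be sketched in PyTorch roughly as follows; the exact augmentation magnitudes are not reported in the paper, so the values below are assumptions, and ResNet34 stands in for whichever backbone is being compared.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Augmentations and ImageNet normalization (magnitudes are illustrative).
train_tf = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomAdjustSharpness(sharpness_factor=1.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# One-logit head for the binary COVID / Non-COVID problem.
model = models.resnet34(weights=None)
model.fc = nn.Linear(model.fc.in_features, 1)

criterion = nn.BCEWithLogitsLoss()   # binary cross-entropy on raw logits
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.01)
```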
4 Results and Discussions

4.1 Decision on the Network and Justification

At the first stage of our experiments, our aim was to find the best performing model among the models we experimented with. There is a large number of network architectures that can be used to solve our binary classification problem. To reduce the architecture search space, a few models were chosen that span different factors. Firstly, the AlexNet [19] model was picked to see if a simpler model with a smaller number of learnable parameters is sufficient to fit the data. Beyond that, we picked two variants of the ResNet architecture to see if we can perform better using deeper models that also leverage skip connections. ResNet [11] models use residual blocks containing skip connections, which ensure better gradient flow between the shallow and deep layers. The DenseNet [20] models serve a similar purpose; however, the DenseNet architecture is also deeper and uses the concept of dense blocks. Finally, the other model we used is a fairly recent one, the EfficientNet [21]. This model is one of the architectures of the EfficientNet family, in which the authors showed state-of-the-art results using a relatively simple model designed with neural architecture search methods. Afterwards, the different augmentation methods were chosen. Basic horizontal flipping was used to make the model invariant to horizontal changes, and random rotations within a fixed angle were added to make it rotation invariant. On top of that, random contrast, sharpness, and brightness augmentations ensured the models are robust against visual changes and artifacts. The images were normalized using the ImageNet mean and standard deviation, which allows a smoother loss surface and a faster training process. Finally, all the models were trained for 100 epochs; none of the parameters, settings, or hyper-parameters were different or changed for any model throughout this process. In each epoch, along with the training loss (binary cross-entropy loss) and training accuracy, the validation loss (binary cross-entropy loss) and validation accuracy were calculated. Once the training was completed, the performance of each trained model was evaluated on our test set, which consisted of 712 images, of which 362 are from the positive class and 350 from the negative class. As the evaluation metrics we used accuracy, precision, recall, and F1 score. While accuracy shows how many predictions were correct, precision and recall also take into account the number of false positives and false negatives produced by the models; the F1 score takes all of these into account since it uses both precision and recall in its calculation. The validation accuracy is given in Table 1 to show whether the result obtained in the validation phase (the best one across all epochs) was close to the test result. A close match would indicate there was no overfitting or underfitting; it would also indicate that the data distribution was uniform and there was no bias in either subset of the images. Table 1 shows the validation accuracy and test accuracy that we obtained for each model in the experiments.
Table 1 Validation accuracy and test accuracy as obtained for each model from the experiments

Model         Validation accuracy   Test accuracy   Precision   Recall   F1 score
AlexNet       0.5051                0.4915          0.49157     1.0000   0.6591
ResNet18      0.9815                0.9733          0.9688      0.9771   0.9729
ResNet34      0.9794                0.9831          0.9884      0.9771   0.9827
EfficientNet  0.9616                0.9606          0.9653      0.9542   0.9597
DenseNet121   0.9733                0.9775          0.9826      0.9714   0.9770
From the table, it can be seen that ResNet34 has the best test accuracy, closely followed by DenseNet121. The test accuracy for ResNet34 is 98.31% and for DenseNet121 it is 97.75%. AlexNet has the lowest test accuracy at 49.15%, which agrees with the graphs presented earlier. The accuracy achieved by ResNet18 and EfficientNet is also fairly good, at 97.33% and 96.06%, respectively. Figure 2 shows the training and validation accuracy over the epochs for the different networks to further validate this point, and Fig. 3 illustrates a comparison of performance using different optimizers. Figure 2a, b, c, and d represent the training accuracy, validation accuracy, training loss, and validation loss, respectively, for AlexNet, ResNet18, ResNet34, EfficientNet, and DenseNet121.
Fig. 2 a Training accuracy; b validation accuracy; c training loss d validation loss
Fig. 3 a Training accuracy; b validation accuracy; c training loss; d validation loss
From these results and discussions, and from the graphs and numbers, we decided to select the ResNet34 model for our project. Even though DenseNet121 achieves almost the same test accuracy (still slightly behind ResNet34), it took considerably longer to complete the task. Therefore, we chose ResNet34 for the second stage of experiments in the project, which is discussed in the next few sections.
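As an aside, the metrics in Table 1 can be reproduced from raw predictions with a few lines of scikit-learn; this is only an illustrative sketch (the arrays `y_true` and `y_pred` are placeholders for the test labels and thresholded model outputs), not code from the paper.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# y_true: ground-truth labels (0 = negative, 1 = positive), y_pred: thresholded model outputs
y_true = [1, 0, 1, 1, 0]          # placeholder values
y_pred = [1, 0, 1, 0, 0]          # placeholder values

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("F1 score :", f1_score(y_true, y_pred))          # harmonic mean of precision and recall
```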
4.2 Second Stage Experiments on ResNet34 and Comparisons Based on our previous results, this section goes over some more thorough experiments performed on our chosen architecture, the ResNet34. Essentially, the results will be used to justify the design choices. The comparisons will also demonstrate how the results would have been if different options were chosen.
4.2.1 Choosing the Right Optimizer
To select the most optimal optimizer and its associated parameters, we performed a series of experiments. Firstly, Adam and SGD optimizers both were used while
keeping every other hyper-parameter fixed, to identify the impact of each on the loss value. Figure 3a and b show the training accuracy and validation accuracy, whereas Fig. 3c and d show the training loss and validation loss, respectively, for the ResNet34 model using the Adam and SGD optimizers. Here, the blue line corresponds to the ADAM optimizer and the orange line to the SGD optimizer. From the graphs in Fig. 3c, d it is apparent that the SGD optimizer produces a slightly higher loss value than the ADAM optimizer in both training and validation. We achieved a test accuracy of 98.75% using Adam, whereas SGD gave 97.75%. Since the validation accuracy for both optimizers is almost the same and the test accuracy is higher for Adam, we conclude that the Adam optimizer performs better for our task and the network architecture setup we will be using.
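A hedged sketch of how such a comparison could be set up in PyTorch is shown below: only the optimizer object changes between the two runs, while the model and every other hyper-parameter stay fixed. The `train_and_evaluate` helper, the data loaders, and the SGD momentum value are assumptions for illustration, not details from the paper.

```python
import torch
from torchvision import models

def make_resnet34():
    model = models.resnet34(pretrained=True)
    model.fc = torch.nn.Linear(model.fc.in_features, 1)
    return model

results = {}
for name in ["adam", "sgd"]:
    model = make_resnet34()
    if name == "adam":
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.01)
    else:
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    # train_and_evaluate(): assumed helper running the fixed training loop, returns test accuracy
    results[name] = train_and_evaluate(model, optimizer, train_loader, test_loader)

print(results)   # e.g. {"adam": 0.9875, "sgd": 0.9775} as reported in the text
```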
4.2.2 ResNet34 Using Different Learning Rates
Changing the learning rate affects the way weights are updated in a network. A higher learning rate can, in theory, increase the speed of reaching the global minimum and make the model train faster; however, it can bring problems such as the model never converging, or oscillating back and forth inside a local optimum. On the other hand, a small learning rate can often take us to a global minimum, but this can take a very long time if the value is very small, and the optimization may also get stuck in a local optimum if the steps are not big enough to climb out of it. So we needed to pick a value such that the gradients are updated properly and the losses keep decreasing. Figure 4 illustrates the impact of different learning rates on the ResNet34 model: a learning rate of 0.01 is used in the curve named ResNet and a value of 0.001 in the curve named ResNet_Extra. Figure 4a and b show the training accuracy and validation accuracy, whereas Fig. 4c and d show the training loss and validation loss, respectively, for the ResNet34 model using the different learning rates. In these figures, the orange line indicates a learning rate of 0.001 and the gray line a learning rate of 0.01. Since training the entire system end to end is computationally expensive, we restricted our observation to these two values. With a learning rate of 0.01, the test accuracy is 97.75%, and with the lower learning rate of 0.001 we get 97.89%, so the difference in accuracy is not very significant. We nevertheless preferred the lower learning rate, since it is the safer choice given our high number of epochs: the optimization is less likely to overshoot and fall out of the global minimum.
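To make the trade-off concrete, the toy example below (not from the paper) runs plain gradient descent on a one-dimensional quadratic with the two learning rates discussed above, showing how the larger step size reaches the minimum in fewer iterations while the smaller one moves more cautiously.

```python
def gradient_descent(lr, steps=200, w0=5.0):
    """Minimize f(w) = w**2 with a fixed learning rate lr."""
    w = w0
    for _ in range(steps):
        grad = 2 * w          # derivative of w**2
        w = w - lr * grad     # weight update scaled by the learning rate
    return w

for lr in (0.01, 0.001):
    print(f"lr={lr}: w after 200 steps = {gradient_descent(lr):.4f}")
# The larger learning rate gets much closer to the minimum (w = 0) in the
# same number of steps; very small rates need many more iterations.
```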
Fig. 4 a Training accuracy; b validation accuracy; c training loss; d validation loss
4.2.3 ResNet34 and Its Augmentations
Any deep learning technique requires a lot of data for training, and the performance of the model can be increased by using more data. However, it is difficult to collect and manage a large dataset that also comes with labels. Lack of data can limit the model's generalization capacity, resulting in poor performance on the test set, and can also lead to overfitting. To counter this, we leveraged image augmentation techniques instead of manually collecting more data. Image augmentation is the process of generating new images by adding artificial modifications or noise while keeping the content of the original image mostly unchanged; the Compose class of the augmentation library lets us chain several such transformations together. Figure 5a and b illustrate the training accuracy and validation accuracy, while Fig. 5c and d show the training loss and validation loss, respectively, for the ResNet34 architecture. The red line indicates the training and validation of a network trained with augmentation, whereas the orange line shows the same process without any augmentations applied. For data augmentation, several domain-related modification techniques were applied to the training set. The augmentation strategies were picked so that they would introduce random modifications to the data without ruining their inherent characteristics. To achieve that, horizontal and vertical flipping were applied with a probability of 50%, so that each image has a 50% chance of being horizontally and/or vertically flipped.
Fig. 5 a Training accuracy; b validation accuracy; c training loss; d validation loss
In addition to that, a random rotation was used, applying an arbitrary rotation of up to 45° to some images, and the sharpness of the images was randomly enhanced or reduced. The ImageNet mean and standard deviation were used to normalize the images in the training, validation, and test sets. No augmentation was applied during the validation and test passes, since we always want to know how well the model has learnt to recognize real COVID images; hence we opted out of using augmentation techniques during these phases. Figure 5 compares the training and validation accuracy of the model trained with data augmentation against the model trained without it. We preferred the augmented model for our project even though the validation accuracy is slightly higher for the non-augmented model: the images were transformed in many ways and the augmented model was still able to identify them correctly, so it is expected to give good results later on datasets containing noise. Moreover, the difference in accuracy is small enough to be ignored. As the graphs show, not using augmentation has no significant impact on the training and validation loss, so the augmented model is the better choice: it performs almost the same as the non-augmented one while being more robust to variations and distortions in future test images.
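A minimal torchvision-style sketch of such an augmentation pipeline is shown below, assuming the transforms described above (50% flips, up to 45° rotation, random sharpness) and ImageNet normalization; the exact classes and parameter values used by the authors are not given in the paper, so these are illustrative choices.

```python
from torchvision import transforms

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomRotation(degrees=45),
    transforms.RandomAdjustSharpness(sharpness_factor=2, p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])

# Validation/test images are only converted and normalized; no augmentation.
eval_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])
```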
5 Conclusion

In this project, we reformulated the challenge of detecting COVID-19 cases from chest X-ray images as a binary classification problem and attempted to solve it using state-of-the-art deep learning methods. To achieve this, we leveraged a convolutional neural network as our classifier and used a dataset of annotated COVID-positive and COVID-negative images. First, to choose the network, we performed a series of fixed experiments over a number of different ConvNet architectures, and based on their performance on several metrics we chose the ResNet34 architecture. This architecture was then used for a number of extended experiments, through which we identified a suitable set of parameters and hyper-parameters. We also analyzed and used the set of data augmentation techniques that best helped us utilize the dataset, improve generalization capacity, and prevent the model from overfitting. As future work, this study can be extended by localizing the areas in the lungs where COVID-19 has impacted the most, and by further classifying the COVID-positive cases into more specific categories based on risk, severity, etc.
References

1. WHO Director-General's opening remarks at the media briefing on COVID-19—11 March 2020 (2020) World Health Organization. https://www.who.int/director-general/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-covid-19---11-march-2020
2. Episode #14—COVID-19—Tests (2020) World Health Organization. https://www.who.int/emergencies/diseases/novel-coronavirus-2019/media-resources/science-in-5/episode-14---covid-19---tests
3. MacMillan C (2021) Which COVID-19 test should you use? Yale Medicine. https://www.yalemedicine.org/news/which-covid-test-is-accurate
4. Chowdhury MEH, Rahman T, Khandakar A, Mazhar R, Kadir MA, Mahbub ZB, Islam KR, Khan MS, Iqbal A, Al-Emadi N, Reaz MBI, Islam MT (2020) Can AI help in screening viral and COVID-19 pneumonia? IEEE Access 8:132665–132676
5. Rahman T, Khandakar A, Qiblawey Y, Tahir A, Kiranyaz S, Kashem SBA, Islam MT, Maadeed SA, Zughaier SM, Khan MS, Chowdhury ME (2020) Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images. arXiv preprint arXiv:2012.02238
6. Horry MJ, Chakraborty S, Paul P, Ulhaq A, Pradhan B, Saha M, Shukla N (2020) COVID-19 detection through transfer learning using multimodal imaging data. IEEE. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9167243
7. Hammoudi K (2020) Deep learning on chest X-ray images to detect and evaluate. https://arxiv.org/abs/2004.03399
8. Haghanifar A (2020) COVID-CXNet: detecting COVID-19 in frontal chest X-ray images. https://arxiv.org/abs/2006.13807
9. Basu S, Mitra S, Saha N (2020) Deep learning for screening COVID-19 using chest X-ray images. In: 2020 IEEE symposium series on computational intelligence (SSCI), pp 2521–2527. https://doi.org/10.1109/SSCI47803.2020.9308571
10. ScienceDirect (2021) Deep learning approaches for COVID-19 detection based on chest X-ray images. https://www.sciencedirect.com/science/article/pii/S0957417420308198
11. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778. https://doi.org/10.1109/CVPR.2016.90
12. Qassim H, Verma A, Feinzimer D (2018) Compressed residual-VGG16 CNN model for big data places image recognition. In: 2018 IEEE 8th Annual computing and communication workshop and conference (CCWC), pp 169–175. https://doi.org/10.1109/CCWC.2018.8301729
13. Wen L, Li X, Li X, Gao L (2019) A new transfer learning based on VGG-19 network for fault diagnosis. In: 2019 IEEE 23rd International conference on computer supported cooperative work in design (CSCWD), pp 205–209. https://doi.org/10.1109/CSCWD.2019.8791884
14. Mangal A (2020) CovidAID: COVID-19 detection using chest X-ray. https://arxiv.org/abs/2004.09803
15. Ayan E, Ünver HM (2019) Diagnosis of pneumonia from chest X-ray images using deep learning. In: 2019 Scientific meeting on electrical-electronics & biomedical engineering and computer science (EBBT), pp 1–5. https://doi.org/10.1109/EBBT.2019.8741582
16. Jain R, Gupta M, Taneja S et al (2021) Deep learning based detection and analysis of COVID-19 on chest X-ray images. Appl Intell 51:1690–1700. https://doi.org/10.1007/s10489-020-01902-1
17. Jain G, Mittal D, Thakur D, Mittal MK (2020) A deep learning approach to detect Covid-19 coronavirus with X-Ray images. Biocybern Biomed Eng 1391–1405. https://doi.org/10.1016/j.bbe.2020.08.008
18. COVID-19 Radiography Database (2021) Kaggle. https://www.kaggle.com/tawsifurrahman/covid19-radiography-database
19. Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 84–90. https://doi.org/10.1145/3065386
20. Huang G, Liu Z, Pleiss G, van der Maaten L, Weinberger K (2019) Convolutional networks with dense connectivity. IEEE Trans Pattern Anal Mach Intell 1. https://doi.org/10.1109/tpami.2019.2918284
21. Tan M, Le QV (2020) EfficientNet: rethinking model scaling for convolutional neural networks. arXiv:1905.11946v5. https://arxiv.org/pdf/1905.11946.pdf
A Novel Metaheuristic with Optimal Deep Learning-Based Network Slicing in IoT-Enabled Clustered Wireless Sensor Networks in 5G Systems B. Gracelin Sheena and N. Snehalatha
Abstract In recent times, high rate of 5G networks enables the Internet of Things (IoT) and wireless sensor networks (WSN) to effectively gather the data from the deployed environment. Due to the limitations of energy and network slicing process, the efficiency of the IoT-enabled WSN is considerably affected. Therefore, this study introduces a novel fruit fly optimization-based clustering (FFOC) with optimal gated recurrent unit (OGRU)-based network slicing for IoT-enabled WSN in 5G networks. The proposed FFOC–OGRU technique initially constructs clusters and selects cluster heads using FFOC technique. Besides, the OGRU technique is employed for network slicing process, and the hyperparameters involved in the GRU model are optimally adjusted by the use of Bayesian optimization technique, which results in enhanced performance. For inspecting the improved performance of the FFOC–OGRU technique, a comprehensive experimental analysis is carried out and the outcomes are examined in several aspects. The experimental outcome showcased the betterment of the FFOC–OGRU technique in terms of several measures. Keywords Gated recurrent unit · Bayesian optimization · Clustering · 5G networks · Metaheuristic algorithms · Network slicing
1 Introduction Communication becomes a most important part of their everyday life. 5G becomes one of the major technologies in assisting a broad spectrum of sustainable developments, from access to a sustainable environment, and good health to energy efficiency. Nowadays, several kinds of communication are accessible involving Peer to Peer (P2P), Machine to Machine (M2M), and Human to Machine (H2M) computing B. Gracelin Sheena (B) · N. Snehalatha Department of Computational Intelligence, SRM Institute of Science and Technology Kattankulathur, Chennai, Tamil Nadu, India e-mail: [email protected] N. Snehalatha e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_40
[1]. The effective integration of 5G, smart platforms, IoT, and artificial intelligence will change the world, providing sustainable, smart connectivity to different service-oriented applications such as infrastructures, platforms, and applications. The primary goal of the 5G network framework is to offer an opportunity for generating novel business strategies and services for the communication network [2]. Wireless sensor networks (WSN) consist of distributed micro-devices embedded with various sensing capabilities (named sensors) that are utilized for monitoring the environment and communicating information to a central Base Station (BS) [3]. Generally, a WSN includes a huge number of sensor nodes equipped with a constrained energy resource, yet required to function without replacing or recharging the battery for a long period. Energy-efficient transmission solutions for WSN have been emphasized by several authors [4]. To extend the network lifetime, clustering techniques have been presented for achieving energy-efficient communication among sensors. A clustering method divides the sensor nodes into distinct groups or clusters. In each cluster, a Cluster Head (CH) is selected to be responsible for making a communication schedule, collecting information from each sensor in the cluster, and sending the collected information back to the BS [5]. The CH might communicate with the BS directly or through other CHs. The routing among sensor nodes within the same cluster is known as intra-cluster routing [6]. The routing can be either single hop or multi-hop, depending on various aspects such as the transmission capacity of the sensors and the objective of the clustering method. Network slicing is a vital technology for providing network services in sustainable 5G environments [7, 8]. It has the benefit of virtualizing infrastructure, which offers flexibility, effective consumption of constrained resources, and maximal service, basically for mobile and IoT, among others. Still, it has some constraints associated with high latency, centralization, load balancing, as well as the processing of huge amounts of information [9]. This study introduces a novel fruit fly optimization-based clustering (FFOC) with optimal gated recurrent unit (OGRU)-based network slicing for IoT-enabled WSN in 5G networks. The proposed FFOC–OGRU technique first constructs clusters and selects cluster heads using the FFOC technique. In addition, the OGRU technique is employed for the network slicing process, and the hyperparameters involved in the GRU model are optimally adjusted by the use of a Bayesian optimization technique, which results in enhanced performance. For inspecting the improved performance of the FFOC–OGRU technique, a comprehensive experimental analysis is carried out and the outcomes are examined in numerous aspects.
2 Literature Review Le et al. [10] incorporated several ML methods, NFV, big data, and SDN for building an experimental framework and complete structure for the network slicing and upcoming SONs. At last, depending on this architecture, they effectively executed an earlier state traffic classification and network slicing for mobile broadband traffic
application performed at BML, National Chiao Tung University (NCTU). Casado-Vara et al. [11] developed a new method to process heterogeneous temperature data gathered through an IoT network in smart buildings and transform it into homogeneous data, which is used as input to monitoring and control models in smart buildings, improving their performance. The presented method, named IoT slicing, combines complex clusters and networks to reduce the input error and enhance the monitoring and control of a smart building. Casado-Vara et al. [12] proposed a novel method which permits transforming heterogeneous into homogeneous data; this method is also known as IoT slicing. It comprises building graphs from the measurements of the IoT network and virtualizing layers based on the clustering of the graph. To validate the efficacy of this novel method, they present the results of an analysis with a smart building temperature control system. Vinodha and Mary Anita [13] proposed a secure data aggregation system that slices the data produced by the sensors placed in a layered topology and allows en-route aggregation. Zhang et al. [14] proposed a TDSM method for improving the performance of WSN. By introducing a trust evaluation method, TDSM calculates the trust value of all the nodes in the respective time interval, performs CH selection based on this trust value, and prevents untrusted nodes from being chosen as CHs. Ghosal et al. [15] addressed the problem of lifespan maximization in WSN by developing a new clustering method wherein the clusters are dynamically formed. In particular, they examine the network lifespan maximization problem by balancing the energy utilization among CHs. Based on this analysis, they offer an optimal clustering algorithm wherein the cluster radius is evaluated by the alternating direction method of multipliers. Later, they presented a new OPTIC model for WSN.
3 The Proposed Model

In this study, a new FFOC–OGRU technique is developed to accomplish energy efficiency and network slicing in IoT-enabled WSN. The FFOC–OGRU technique operates at two major levels: FFOC-based clustering and OGRU-based network slicing. The detailed working of these processes is offered in the succeeding sections.
3.1 Process Involved in FFOC Technique

The FFO algorithm is a swarm intelligence algorithm based on the foraging behavior of fruit flies and belongs to the class of interactive evolutionary computations. The fruit fly feeds on rotting plants and fruit that broadly exist in tropical and temperate climate regions worldwide, and it has superior olfactory and visual senses compared to other species. The food searching behavior of the fruit flies can be summarized in the following steps: (1) first, smell the food source
with the help of the olfactory organ and fly toward that position; (2) next, get closer to the food position using their sensitive vision; and (3) at last, follow another fruit fly flocking toward that position and fly in that direction. Based on the food searching characteristics of the fruit fly swarm, the FOA is separated into the following seven steps.

Step 1. Parameter initialization: the major parameters of the FOA are the population size pop, the total number of evolutions, and the initial fruit fly swarm position $(X_0, Y_0)$.

Step 2. Population initialization:
$$X_i = X_0 + \mathrm{rand}, \quad Y_i = Y_0 + \mathrm{rand} \qquad (1)$$

Step 3. Calculation of the distance $D_i$ and smell concentration $S_i$:
$$D_i = \sqrt{X_i^2 + Y_i^2}, \quad S_i = \frac{1}{D_i} \qquad (2)$$

Step 4. Calculation of the fitness function $f_i$:
$$f_i = f(S_i) \qquad (3)$$

Step 5. Discover the individual fruit fly with the optimal (minimal) fitness function value $f_b$ among the fruit fly swarm [16]:
$$[\mathrm{bestX}, \mathrm{bestindex}] = \min(f(S_i)) \qquad (4)$$

Step 6. Selection process: keep the optimal fitness function value and the corresponding coordinates $(X_b, Y_b)$; the fruit fly swarm then flies toward that position through vision:
$$f_b = \mathrm{bestX}, \quad X_b = X(\mathrm{bestindex}), \quad Y_b = Y(\mathrm{bestindex}) \qquad (5)$$

Step 7. Determine whether the ending criteria are fulfilled. If not, proceed to Step 2; otherwise, end the process.

Let a WSN of n sensors be established arbitrarily. For CH selection, the FFOC technique evolves the fruit fly population, which is used to form suitable clusters and keep the power consumption of the model low.
Consider that $X = (X_1, X_2, \ldots, X_n)$ represents the population vector of a WSN with $n$ sensors, where $X_i(j) \in \{0, 1\}$; CHs and normal nodes are represented as one and zero, respectively. The initial population of NP solutions is initialized arbitrarily with 0s and 1s as:
$$X_i(j) = \begin{cases} 1, & \text{if } \mathrm{rand} \le p_{\mathrm{opt}} \\ 0, & \text{otherwise} \end{cases} \qquad (6)$$
where $p_{\mathrm{opt}}$ signifies the recommended percentage of CHs, and rand refers to a uniform random value in $[0, 1]$. The arbitrarily located sensor nodes are arranged into $K$ clusters $C_1, C_2, \ldots, C_K$. The CH election is responsible for decreasing the cost of the fitness function (FF). Therefore, the FF for CH election is defined as:
$$f_{\mathrm{obj\_CH}} = \sum_{i=1}^{2} w_i \times f_i \qquad (7)$$
with $\sum_{i=1}^{2} w_i = 1$. A maximal stability period is accomplished by decreasing the standard deviation (SD) of the residual energy (RE) of the nodes as the main concern. The SD ($\sigma_{\mathrm{RE}}$) is therefore appropriate to measure how uniformly the load is distributed among the sensor nodes and is expressed as:
$$f_1 = \sigma_{\mathrm{RE}} = \sqrt{\frac{1}{n} \sum_{j=1}^{n} \left(\mu_{\mathrm{RE}} - E(\mathrm{node}_j)\right)^2} \qquad (8)$$
where $\mu_{\mathrm{RE}} = \frac{1}{n}\sum_{i=1}^{n} E(\mathrm{node}_i)$, $E(\mathrm{node}_i)$ signifies the RE of the $i$th node, and $n$ is the node count. The second objective depends on clustering quality, for which functions of cluster separation and cohesion are used: if the ratio of cohesion to separation is minimal, optimal clustering has been achieved. This is accomplished by using as FF the ratio of the total Euclidean distance from CHs to their cluster members (CM) and the minimum Euclidean distance between two different CHs:
$$f_2 = Q_c = \frac{\sum_{k=1}^{K} \sum_{\forall \mathrm{node}_j \in C_k} d(\mathrm{node}_j, \mathrm{CH}_k)}{\min_{\forall C_c, C_k,\, C_c \ne C_k} d(\mathrm{CH}_c, \mathrm{CH}_k)} \qquad (9)$$
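The paper gives only the mathematical description above; the following Python sketch is an illustrative, simplified implementation of a fruit-fly-style search over binary CH assignments using the two-term fitness of Eqs. (7)–(9). All names, the random perturbation scheme, and the weights are assumptions for illustration, not the authors' code.

```python
import random
import math

def fitness(assignment, energies, positions, w1=0.5, w2=0.5):
    """Weighted fitness of a binary CH assignment (Eqs. 7-9, simplified)."""
    n = len(energies)
    mu = sum(energies) / n
    f1 = math.sqrt(sum((mu - e) ** 2 for e in energies) / n)      # SD of residual energy

    chs = [i for i, bit in enumerate(assignment) if bit == 1]
    if len(chs) < 2:
        return float("inf")                                        # need at least two CHs
    dist = lambda a, b: math.dist(positions[a], positions[b])
    cohesion = sum(min(dist(i, c) for c in chs) for i in range(n) if i not in chs)
    separation = min(dist(a, b) for a in chs for b in chs if a != b)
    return w1 * f1 + w2 * (cohesion / separation)

def ffoc(energies, positions, p_opt=0.1, pop=20, iters=100):
    n = len(energies)
    best = [1 if random.random() <= p_opt else 0 for _ in range(n)]   # Eq. (6)
    best_fit = fitness(best, energies, positions)
    for _ in range(iters):
        for _ in range(pop):
            cand = best[:]                                  # "smell" search: perturb the best
            j = random.randrange(n)
            cand[j] = 1 - cand[j]
            f = fitness(cand, energies, positions)
            if f < best_fit:                                # "vision": fly to the best position
                best, best_fit = cand, f
    return best, best_fit
```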
3.2 Process Involved in OGRU Technique In the network slicing procedure, the feature extraction technique has been implemented and the contained features are user device variety, duration, packet loss rate, packet delay, bandwidth, delay rate, speed, jitter, and modulation type. Assume
$F_{Fr} = F_1, F_2, \ldots, F_n$ as the normalized features, in which $n$ refers to the feature length. The weighted features are then calculated using Eq. (10):
$$NF_{Fr} = F_{Fr} \times W_{Fr} \qquad (10)$$
where $NF_{Fr}$ refers to the new features and $W_{Fr}$ implies the weight function for the feature scaling procedure. Next, these features are fed into the OGRU technique for network slicing classification. The GRU is a variant of the LSTM and is utilized for solving the gradient vanishing problem in standard RNNs, thus enhancing the learning of long-term dependencies in the network. The GRU block uses the tanh and sigmoid functions for calculating the essential values. There is no separate forget gate in these blocks, and the update (input) gate is accountable for controlling the data flow. Owing to this difference between the two architectures, the GRU has a simpler design and fewer parameters, which eventually makes it computationally easier and more efficient to train. Every gate and the candidate activation have their own bias and weights, and the previous activation value and the current block input are used as inputs to calculate these values [17]. The sigmoid function is used for calculating the values of the gates. Equations (11)–(14) give the mathematical model of the GRU:
$$u_t = \sigma(W_{xu} x_t + W_{hu} h_{t-1} + b_u) \qquad (11)$$
$$r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1} + b_r) \qquad (12)$$
$$\tilde{h}_t = \tanh\left(W_{xh} x_t + W_{hh}(r_t \odot h_{t-1}) + b_{\tilde{h}}\right) \qquad (13)$$
$$h_t = u_t \odot \tilde{h}_t + (1 - u_t) \odot h_{t-1} \qquad (14)$$
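Since the recurrent models are stated below to be implemented with Keras and trained with an MSE loss, a minimal sketch of such a GRU-based network-slicing classifier might look as follows; the layer sizes, feature dimension, and number of slice classes are assumptions, not values reported by the authors.

```python
from tensorflow import keras
from tensorflow.keras import layers

n_features = 9        # assumed: number of extracted features per sample
n_slices = 3          # assumed: number of network-slice classes

model = keras.Sequential([
    layers.Input(shape=(1, n_features)),   # each sample treated as a length-1 sequence
    layers.GRU(64),                        # GRU block implementing Eqs. (11)-(14)
    layers.Dense(n_slices, activation="softmax"),
])

# The paper reports MSE as the loss function for its Keras models.
model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])
model.summary()
```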
In this work, the GRU, Vanilla LSTM, BiLSTM, and Stacked LSTM are implemented with the Keras package (a publicly available software library which provides a Python interface for artificial neural networks), and MSE is used as the loss function. Typically, there are two types of hyperparameter optimization approaches, namely automatic and manual search techniques. Manual hyperparameter optimization is difficult to reproduce, as it depends on repeated trial and error. Grid search is not scalable to high dimensions. Random search acts as a greedy algorithm and settles for local optima, and hence does not attain the global optimum. Other evolutionary optimization algorithms need a great number of training cycles and can be noisy. Bayesian optimization is an approach for finding the extrema of functions that are computationally costly to evaluate. The essential components of the optimization algorithm are:
• An acquisition function a(x) depends on the Gaussian method of f that are maximized for determining the upcoming points x for estimation. With this method, it is determined where the function obtains an optimum value, therefore, maximizing the model accuracy and decreasing the loss [18]. As previously mentioned, the optimization aim is to detect the minimal values of the loss at the sample point for an unknown function f : xopt = arg min f (x), x∈D
(15)
whereas D represents the search space of x. Gaussian method work in a manner that expects output to be equivalent to the input, and therefore assumed that statistical method of the function. P(M|E)∞P(E|M)P(M).
(16)
Equation (15) reproduces the concept of Bayesian optimization. Analyzing sample information E, prior probability P(ME) of a M models are proportionate to the P(EM) probability of observing E provided M models multiplied with the posterior probability of P(M). Then, the prior data is utilized for finding wherein the function f (x) is minimalized by the condition value. The condition is denoted as an acquisition/utility function a. When xbest the position of the minimum posterior is mean and μ Q (xbest ) is the minimum values of the prior mean than the expected improvements are E I (x, Q) = E Q max 0, μ Q (xbest ) − f (x)
(17)
In another word, when the enhancement of function values is lesser than the expected values afterward the process is implemented, the present optimum value point might be the local optimum solution, and the process would detect the optimal value point in another position. Seeking the sample areas includes exploitation (sample from the highest values) and exploration (sample from the area of higher uncertainty) that assist in decreasing the amount of samplings. At last, the efficiency would be enhanced once the function has many local maximal. Besides the sample data, Bayesian optimization is based upon the posterior distribution of the function f , i.e., essential part from the statistical inference of the prior distribution of functions f. The fundamental step in the optimization is given below: 1. For present iteration t. 2. Calculate yi = f (xi ) for determining quantity of points xi considered at arbitrary within the parameter bound. 3. Upgrade the Gaussian method of f (x) to attain a prior distribution over function Q( f |xi , yi fori = 1, . . . , t). 4. Detect the novel point x which maximizes the acquisition function a(x).
The procedure ends after the set number of iterations or when the time limit is reached.
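A compact, self-contained sketch of this loop (Gaussian process surrogate plus expected-improvement acquisition, as in steps 1–4 above) is given below using scikit-learn and SciPy; the objective function, search bounds, and candidate-sampling strategy are placeholders rather than the authors' actual hyperparameter search.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(x):
    # Placeholder for the expensive function, e.g. validation loss of a GRU
    return np.sin(3 * x) + 0.1 * x ** 2

rng = np.random.default_rng(0)
bounds = (-2.0, 2.0)
X = rng.uniform(*bounds, size=(3, 1))              # a few random initial evaluations (step 2)
y = np.array([objective(float(xi[0])) for xi in X])

for _ in range(15):
    gp = GaussianProcessRegressor().fit(X, y)      # step 3: update the GP surrogate
    cand = rng.uniform(*bounds, size=(200, 1))     # random candidate points
    mu, std = gp.predict(cand, return_std=True)
    imp = y.min() - mu                             # expected improvement, Eq. (17)
    z = imp / np.maximum(std, 1e-9)
    ei = imp * norm.cdf(z) + std * norm.pdf(z)
    x_next = cand[np.argmax(ei)]                   # step 4: maximize the acquisition
    X = np.vstack([X, x_next.reshape(1, 1)])
    y = np.append(y, objective(float(x_next[0])))

print("best x:", float(X[np.argmin(y)][0]), "best f:", float(y.min()))
```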
4 Performance Validation

This section investigates the performance of the FFOC–OGRU method in terms of different aspects. Figure 1 demonstrates the ARE analysis of the FFOC–OGRU against existing methods over various rounds. The figure shows that the MO-PSO algorithm obtained poor outcomes with the minimal ARE, whereas the FUCHAR and F5NUCP techniques attained moderate ARE. At the same time, the HMBCR technique resulted in near-optimal ARE. However, the presented FFOC–OGRU technique accomplished the maximum performance with the minimum ARE.

Figure 2 depicts the PLR analysis of the FFOC–OGRU against recent approaches over different rounds. The figure shows that the MO-PSO technique attained the worst results with the maximal PLR, whereas the FUCHAR and F5NUCP methods gave moderate PLR. Simultaneously, the HMBCR system resulted in near-optimal PLR, while the presented FFOC–OGRU methodology accomplished effective performance with the lowest PLR. For instance, with 1000 rounds, the FFOC–OGRU approach reached a minimal PLR of 21.73%, whereas the HMBCR, FUCHAR, F5NUCP, and MO-PSO algorithms resulted in higher PLRs of 26.03%, 35.89%, 51.78%, and 58.35%, respectively.

Table 1 demonstrates the network slicing performance of the FFOC–OGRU against existing techniques [19–22]. The experimental values show that the SVM approach resulted in ineffective performance, with accuracy, sensitivity, specificity, precision, NPV, F1-score, and MCC of 68.711%, 14.533%, 95.800%, 63.372%, 95.800%, 23.644%, and 18.333%, respectively.

Fig. 1 ARE analysis of FFOC–OGRU model with existing approaches
Fig. 2 PLR analysis of FFOC–OGRU model with existing approaches
Table 1 Comparative analysis of FFOC–OGRU model with different measures

Methods      Accuracy  Sensitivity  Specificity  Precision  NPV     F1-Score  MCC
SVM          68.711    14.533       95.800       63.372     95.800  23.644    18.333
DBN          87.822    81.733       90.867       81.733     90.867  81.733    72.600
PSONN-DBN    93.867    85.200       98.200       95.946     98.200  90.254    86.125
FFOC–OGRU    95.102    87.393       99.25        96.547     99.059  91.774    88.549
Following that, the DBN model gained slightly enhanced outcomes, with accuracy, sensitivity, specificity, precision, NPV, F1-score, and MCC of 87.822%, 81.733%, 90.867%, 81.733%, 90.867%, 81.733%, and 72.600%, respectively. Though the PSONN-DBN model achieved reasonable performance, the proposed model outperformed the others with accuracy, sensitivity, specificity, precision, NPV, F1-score, and MCC of 95.102%, 87.393%, 99.25%, 96.547%, 99.059%, 91.774%, and 88.549%, respectively.
5 Conclusion

This study has introduced a novel FFOC with OGRU-based network slicing for IoT-enabled WSN in 5G networks. The proposed FFOC–OGRU technique first constructs clusters and selects cluster heads using the FFOC technique. In addition, the OGRU technique is employed for the network slicing process, and the hyperparameters involved in the GRU model are optimally adjusted by the use of a Bayesian optimization technique, which results in enhanced performance. For inspecting the improved performance of the FFOC–OGRU technique, a comprehensive experimental analysis was carried out and the outcomes were examined in numerous aspects.
References 1. Singh SK, Salim MM, Cha J, Pan Y, Park JH (2020) Machine learning-based network subslicing framework in a sustainable 5G environment. Sustainability 12(15):6250 2. Arfaoui G, Bisson P, Blom R, Borgaonkar R, Englund H, Félix E, Zahariev A (2018) A security architecture for 5G networks. IEEE Access 6:22466–22479 3. Xu L, O’Hare GM, Collier R (2017) A smart and balanced energy-efficient multihop clustering algorithm (smart-beem) for mimo IoT systems in future networks. Sensors 17(7):1574 4. Akyildiz IF, Su W, Sankarasubramaniam Y, Cayirci E (2002) Wireless sensor networks: a survey. Comput Netw 38(4):393–422 5. Mekikis PV, Antonopoulos A, Kartsakli E, Alonso L, Verikoukis C (2016) Connectivity analysis in wireless-powered sensor networks with battery-less devices. In: 2016 IEEE Global communications conference (GLOBECOM). IEEE, pp 1–6 6. Mekikis PV, Lalos AS, Antonopoulos A, Alonso L, Verikoukis C (2014) Wireless energy harvesting in two-way network coded cooperative communications: a stochastic approach for large scale networks. IEEE Commun Lett 18(6):1011–1014 7. Danial SN, Smith J, Veitch B, Khan F (2019) On the realization of the recognition-primed decision model for artificial agents. Human-Centric Comput Inf Sci 9(1):1–38 8. Kotulski Z, Nowak TW, Sepczuk M, Tunia M, Artych R, Bocianiak K, Wary JP (2018) Towards constructive approach to end-to-end slice isolation in 5G networks. EURASIP J Inf Secur 2018(1):1-23 9. Mamolar AS, Pervez Z, Calero JMA, Khattak AM (2018) Towards the transversal detection of DDoS network attacks in 5G multi-tenant overlay networks. Comput Secur 79:132–147 10. Le LV, Lin BSP, Tung LP, Sinh D (2018) SDN/NFV, machine learning, and big data driven network slicing for 5G. In: 2018 IEEE 5G World Forum (5GWF). IEEE, pp 20–25 11. Casado-Vara R, Martin-del Rey A, Affes S, Prieto J, Corchado JM (2020) IoT network slicing on virtual layers of homogeneous data for improved algorithm operation in smart buildings. Future Gener Comput Syst 102:965–977 12. Casado-Vara R, De la Prieta F, Prieto J, Corchado JM (2019) Improving temperature control in smart buildings based in IoT network slicing technique. In: 2019 IEEE Global communications conference (GLOBECOM) (pp 1–6). IEEE 13. Vinodha D, Mary Anita EA (2021) Discrete integrity assuring slice-based secured data aggregation scheme for wireless sensor network (DIA-SSDAS). Wirel Commun Mobile Comput 2021 14. Zhang Q, Liu X, Yu J, Qi X (2020) A trust-based dynamic slicing mechanism for wireless sensor networks. Procedia Comput Sci 174:572–577 15. Ghosal A, Halder S, Das SK (2020) Distributed on-demand clustering algorithm for lifetime optimization in wireless sensor networks. J Parallel Distrib Comput 141:129–142 16. Xiao C, Hao K, Ding Y (2015) An improved fruit fly optimization algorithm inspired from cell communication mechanism. Math Probl Eng 2015 17. Liao Z, Lan P, Fan X, Kelly B, Innes A, Liao Z (2021) SIRVD-DL: a COVID-19 deep learning prediction model based on time-dependent SIRVD. Comput Biol Med 138:104868 18. Kolar D, Lisjak D, Paj˛ak M, Gudlin M (2021) Intelligent fault diagnosis of rotary machinery by convolutional neural network with automatic hyper-parameters tuning using Bayesian optimization. Sensors 21(7):2411 19. Al-Otaibi ST, Al-Rasheed A, Mansour RF, Yang E, Joshi GP, Cho W (2021) Hybridization of metaheuristic algorithm for dynamic cluster-based routing protocol in wireless sensor networks. IEEE Access 20. 
Arjunan S, Pothula S, Ponnurangam D (2018) F5N-based unequal clustering protocol (F5NUCP) for wireless sensor networks. Int J Commun Syst 31(17):e3811
21. Arjunan S, Sujatha P (2018) Lifetime maximization of wireless sensor network using fuzzy based unequal clustering and ACO based routing hybrid protocol. Appl Intell 48(8):2229–2246 22. Aouedi O, Piamrat K, Hamma S, Perera JK (2021) Network traffic analysis using machine learning: an unsupervised approach to understand and slice your network. Ann Telecommun 1–13
Student Attention Base Facial Emotion State Recognition Using Convolutional Neural Network Md. Mahmodul Hassan , Khandaker Tabin Hasan , Idi Amin Tonoy , Md. Mahmudur Rahman , and Md. Asif Bin Abedin
Abstract E-learning system advancements give students new opportunities to better their academic performance and access e-learning education. Because it provides benefits over traditional learning, e-learning is becoming more popular. The coronavirus disease pandemic situation has caused educational institution cancelations all across the world. Around all over the world, more than a billion students are not attending educational institutions. As a result, learning criteria have taken on significant growth in e-learning, such as online and digital platform-based instruction. This study focuses on this issue and provides learners with a facial emotion recognition model. The CNN model is trained to assess images and detect facial expressions. This research is working on an approach that can see real-time facial emotions by demonstrating students’ expressions. The phases of our technique are face detection using Haar cascades and emotion identification using CNN with classification on the FER 2013 datasets with seven different emotions. This research is showing real-time facial expression recognition and help teachers adapt their presentations to their student’s emotional state. As a result, this research detects that emotions’ mood achieves 62% accuracy, higher than the state-of-the-art accuracy while requiring less processing. Keywords E-learning · Facial expression · Convolutional neural networks (CNN) · Intelligent education management system
1 Introduction

E-learning systems are becoming increasingly popular with the advancement of technology, and many students are opting to learn via e-learning. Students can learn with an e-learning system from any location as long as there is a connected network.

Md. M. Hassan (B) · K. T. Hasan · I. A. Tonoy · Md. M. Rahman · Md. A. B. Abedin Department of Computer Science and Engineering, American International University-Bangladesh, Dhaka, Bangladesh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_41
As a result, using an e-learning system to learn is becoming increasingly popular. The absence of an emotional state is one of the distinctions between e-learning and face-to-face education. Facial emotion has generally been established to play a significant role in the learning process: if the student is happy, learning efficiency will be better; if the learner is unhappy, learning efficiency will be lower. In a classroom, the institution allows teachers to monitor students' facial expressions and adapt their teaching techniques as needed, and in this scenario it is easy for teachers to sense emotions and recognize issues. In an e-learning environment, however, teachers need a way to identify students' emotions in real time. This study focuses on automatically detecting students' learning state in real time based on their facial expressions. We address emotion recognition in an education platform by developing an automated system that analyzes students' facial expressions using convolutional neural networks (CNNs), a deep learning model extensively used in image classification, which applies several stages of image processing to extract feature representations. Face detection and emotion recognition are the training and testing phases of our system, which can detect seven emotions: neutral, anger, fear, sadness, happiness, surprise, and disgust. This research models students' facial emotions captured during online learning and performs real-time facial feature recognition to generate the overall emotional state of the students.
2 Related Work

The central aspect of emotional analytics is facial expression recognition, and these emotions are divided into several categories and subcategories [1]. Zhu Aiqin et al. [2] proposed a comparable facial expression detection system that employs a simple set-theory-based algorithm and classifies expressions using template-based classification; rule-based, ANN-based, HMM-based, Bayesian, and SVM-based classifiers are also utilized. When the student's emotional state shifts, teaching techniques are adjusted correspondingly. The classification tree's primary flaw is that it fails to recognize all needed emotions. Jacob Whitehill et al. [3] developed automated identification of student engagement from facial expressions using binary classification and machine learning approaches; the authors report that their technique outperforms state-of-the-art ECG-based emotion recognition on two openly available datasets, SWELL [7] and AMIGOS [8]. Al-Alwani et al. [4] worked to increase students' learning inclination in e-learning systems and accomplished mood extraction using a neural network that identifies facial expressions according to facial characteristics. Magdin et al. [5] created software that uses a camera to assess the user's emotional state; the technology employs neural networks and operates in real time. These studies motivate us to build a facial emotion recognition system that can recognize students' emotional states and motivation levels in real-time e-learning systems. This convolutional neural network communicates the students' emotional states to the instructor in real time to create a more engaging educational environment. Our research backs earlier studies that
show that in e-learning systems, it is feasible to detect the motivation level of both the individual and the virtual classroom. The weights of the convolutional layers of this network are transferred to the emotion recognition network.
3 Proposed Model

Figure 1 shows our proposed method: a convolutional neural network (CNN) classifier is used to analyze students' facial expressions. First, the system recognizes faces in the input image, which are then cropped and normalized to a size of 48 * 48 pixels. These facial images are then passed to the CNN classifier. Face location determination, feature extraction, and emotion categorization are the three steps of emotion identification. After a face detection method locates the face, image processing techniques and knowledge of the symmetry and structure of the face are utilized to analyze the face region and identify the feature positions. Finally, the result of facial expression recognition is the output (anger, happiness, sadness, disgust, surprise, or neutral).
4 CNN Working Method Procedure

Figure 2 shows that the first layer used to extract features from an input picture is the convolution layer. In a convolutional neural network, the primary goal of convolution is to extract features from the input picture. Convolution retains the spatial connection between pixels by learning facial emotion characteristics over small squares of input data: it computes the dot product of two matrices, the picture patch and a kernel. The features extracted with the CNN model are then classified into emotions. The network has three convolutional layers, two pooling layers, a fully connected layer, and a softmax layer with the emotion classes. The input picture is a grayscale facial image of size 48 * 48 pixels. We used max-pooling layers with kernels of stride 2, and we employed the rectified linear unit (ReLU), the most commonly used activation function, to incorporate nonlinearity into the model. In this model, the output size $N$ of each convolutional layer can be formulated as
$$N = \frac{I - F + 2P}{S} + 1 \qquad (1)$$
where $I$, $F$, $P$, and $S$ indicate the input dimension, kernel size, padding size, and stride, respectively. I represents the input volume, in our case 128, and F repre-
Fig. 1 Emotion recognize workflow diagram
Fig. 2 CNN working diagram
583
sents the Kernel size—in our case 5 and P represents the padding—in our case 0. S stands for stride, which haven’t specified. As a result, we enter the following into the formula: Output = (128−5+0)/1+1 Output = (124, 124, 40)
4.1 Feature Preprocessing There are several phases: Because color information does not assist us in detecting key edges or other characteristics, the pictures are first converted to grayscale. Face recognition in OpenCV using Haar cascades is a valuable object detection approach proposed by the CNN model. This method is based on a machine learning methodology. A cascade function is trained on many positive (facial pictures) and negative (non-face images) images before detecting faces. This figure depicts three Haar features, edge features, linear features, and four-rectangle features. Each segment comprises a black rectangle and a white rectangle; we get a single value for the part by subtracting the sum of all pixels in the white rectangle from the sum of all pixels in the black rectangle. The goal of these features is to indicate the presence of specific characters in the image and then distinguish the non-face and face portions. Cropping the faces that have been identified and resizing the pictures to a set size (48, 48). The reprocessing training and validation procedure is shown in Fig. 3.
Fig. 3 Training and validation reprocessing histogram
584
M. M. Hassan et al.
4.2 Convolution Layers Local patterns in small two-dimensional windows are the focus of these hidden layers. They are layers that identify visual characteristics like lines, edges, and color dips in pictures. These layers allow one to learn an image attribute at a particular location and remember it anyplace in one’s vision. Convolution layers are capable of learning more complicated components based on what was known in the preceding layer. For example, a layer may learn different sorts of lines that exist in the image, and the next layer would be able to understand visual elements made up of the pipes known by the preceding layer. These networks’ ability to do so helps them acquire visual ideas that are becoming increasingly complex in a more efficient manner. Convolutional neural networks work on three dimensions: width, height, and channel. The channel in the input might be 1 for a grayscale image or 3 for an RGB color image, one for each color (red, green, and blue). The depth of the network is generally more than 3. Not all input neurons will be linked to all convolution layer neurons. It is done by storing the picture pixels in small, localized regions of the input neurons’ space. The kernel filter was applied to the original picture to produce the output. The outcome is the product and sum of the kernel matrix used for each slide’s original image. Depending on the kernel size, the output size changes. The figure, for example, depicts a filter for identifying horizontal edges. In the first slide, each output value is the result of the product, and some of the original pictures with the filter: 5 + 1 + 1 − 1 − 5 − 3 = − 2 (1) Because the 3 × 3 filter may be placed in 4 × 4 locations in the 6 × 6 input picture, the output is 4 × 4. The matrix starts in the upper left and proceeds to the right, passing through all neurons. Continue with the row below once you’ve completed a row. It is possible to identify various characteristics by using different filters (Fig. 4).
Fig. 4 Convolution layer filter diagram
Student Attention Base Facial Emotion State Recognition …
585
Fig. 5 Pooling layer block diagram
4.3 Pooling Layer The pooling layer can demonstrate a lower dimensional of each feature map while retaining the most relevant data. There are primary forms of pooling: maximum pooling, average pooling, and some pooling. Pooling’s goal is to gradually shrink the input representation’s spatial dimension and make the convolutional neural network insensitive to minor transformations, distortions, and translations in the input picture. The fully connected layer used the block’s maximum as the only output to the pooling layer in those research. In each max-pooling layer, the padding size is 0, and the output size can also be expressed as I−F (2) N= S+1 In the convolutional layers and max-pooling layers, rectified linear units (ReLUs) are used as the activation function to prevent gradient explosion and provide quicker convergence speed during the back-propagation operation which may be defined (Fig. 5).
4.4 Fully Connected Layer

A fully connected layer is a conventional multilayer perceptron that employs an activation function at the output layer. The phrase "fully connected" means that each neuron in the preceding layer is connected to every neuron in the next layer. The objective of this fully connected layer is to categorize the picture into the different classes of the training dataset using the output of the convolution and pooling layers. The pooling layers therefore function as feature
Fig. 6 Real-time emotion detection
extractors from the input (the seven types of facial expression), while the fully connected layers function as the classifier:
$$Z = f\left(\sum_{i=1}^{N} X_i \cdot W_i + b\right) \qquad (3)$$
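Putting the layers described in Sects. 4.2–4.4 together, a Keras sketch of such a network could look as follows; the filter counts, dense size, optimizer, and loss are assumptions (the paper specifies only three convolution layers, two max-pooling layers, one fully connected layer, ReLU activations, a softmax over seven emotions, and 48 × 48 grayscale input).

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(48, 48, 1)),                      # 48x48 grayscale face
    layers.Conv2D(40, (5, 5), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2), strides=2),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2), strides=2),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),                 # fully connected layer
    layers.Dense(7, activation="softmax"),                # seven emotion classes
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```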
4.5 Facial Emotion Analysis

The general CNN model reaches a training mAP of 62.4%, with a real-time average recognition time of 0.78 s. A modified CNN model is then implemented, which combines the removal of the fully connected layer with deeper convolution layers and residual modules. After 50 training steps, this model can detect facial expressions in real time in 0.22 s, with a mAP of 62.1% on the test set. The classification results for facial expressions randomly picked from the test set are shown in the figure. In this research, the findings of the various experiments are demonstrated as line graphs and real-time emotion outputs in this chapter. However, it is essential to highlight that almost all pictures in common databases are of individuals acting, whereas the FER database uses spontaneous emotions, and spontaneous emotions are more challenging to recognize. Consequently, the accuracy results are almost 62%; for a real-world implementation of these learning architectures, the training and validation accuracy would likely be close to this (Figs. 6 and 7).
5 Experiment and Results

This research gathers the FER 2013 dataset from the Kaggle website and feeds it into the CNN classification model to assess the
Fig. 7 Training and testing accuracy
performance of the proposed approach in practical applications. In the demonstration, real-time video and camera input of pupils is handled satisfactorily. A total of 28,709 faces are tagged with the seven emotion types. A distribution histogram of emotion probabilities lets one intuitively see the general feelings and assess the class's emotional condition: each facial expression is labeled with its most likely emotion, while for an image with multiple faces the overall expression is chosen from the sum of the expression scores of the individual faces. As a result, this research achieves 62% accuracy in detecting emotional mood, higher than the traditional random forest classifier while requiring less processing. Overall, the experimental results are favorable for the deep model's performance when applied in natural environments. The confusion matrix shows how many predictions were made and which were correct or incorrect, with the values shown as percentages; the loss lets us see which samples were predicted incorrectly. At the end of the 50 epochs, we reached a 62% accuracy rate. We computed a confusion matrix and a line graph to assess the efficiency and quality of our suggested technique, as shown in Fig. 8. Our algorithm excels at predicting happy, surprised, and neutral expressions; however, it predicts scared faces poorly because it mixes them up with disgusted faces. The confusion matrix and classification accuracy were chosen for this demonstration since they are the most popular measures among traditional classifiers.
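As an illustration (not the authors' code), a confusion matrix and per-class report like the one in Fig. 8 can be produced from test-set predictions with scikit-learn; `y_true` and `y_pred` below are placeholders for the ground-truth labels and the argmax of the CNN's softmax outputs.

```python
from sklearn.metrics import confusion_matrix, classification_report

emotions = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

cm = confusion_matrix(y_true, y_pred, normalize="true") * 100   # percentages per class
print(cm.round(1))
print(classification_report(y_true, y_pred, target_names=emotions))
```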
6 Conclusion and Future Work

In this study, we proposed a convolutional neural network model to recognize the facial expressions of students. The proposed model consists of three convolution layers, max-pooling, and one fully connected layer. Using the Haar cascade frontal face detector,
Fig. 8 Confusion metrics for all facial expression
the system identifies the faces of the students in the pictures and classifies them into seven facial expressions: surprise, fear, disgust, sad, happy, angry, and neutral. With the FER 2013 data source, the suggested model achieves a precision of 62%. Our technique for identifying facial expressions can assist the instructor in understanding the students' activities, and we will concentrate on further improving the convolutional model in our future work. Our next objective is to develop an API to integrate our system into software such as Microsoft Teams and Zoom; by doing so, any video conferencing software could use our system to analyze real-time emotional states and get dynamic feedback without accessing webcam images.
References

1. Daradoumis T, Arguedas M, Xhafa F (2013) Current trends in emotional e-learning: new perspectives for enhancing emotional intelligence. In: 2013 7th International conference on complex, intelligent, and software intensive systems (CISIS). IEEE
2. Aiqin Z, Luo Q (2006) Study on E-learning system model based on affective computing. In: International conference on information and automation, 2006, ICIA. IEEE
3. Whitehill J et al (2014) The faces of engagement: automatic recognition of student engagement from facial expressions. IEEE Trans Affect Comput 5(1):86–98
4. Al-Awni A (2016) Mood extraction using facial features to improve learning curves of students in E-learning systems. Int J Adv Comput Sci Appl 7(11):444–453
5. Magdin M, Turcani M, Hudec L (2016) Evaluating the emotional state of a user using a webcam. Special Issue on Artificial Intelligence Underpinning 4(1):61–68
6. El Hammoumi O, Benmarrakchi F, Ouherrou N, El Kafi J, El Hore A (2018) Emotion recognition in E-learning systems. In: 2018 6th International conference on multimedia computing
Student Attention Base Facial Emotion State Recognition …
589
and systems (ICMCS), 2018, pp 1–6. https://doi.org/10.1109/ICMCS.2018.8525872. 7. Koldijk S, Sappelli M, Verberne S, Neerincx MA, Kraaij W (2014) The swell knowledge work dataset for stress and user modeling research. In: Proceedings of the 16th international conference on multimodal interaction, ACM, 2014, pp 291–298 8. Miranda Correa JA, Abadi MK, Sebe N, Patras I (2018) Amigos: A dataset for affect, personality and mood research on individuals and groups. IEEE Trans Affect Comput pp 1–1
Detection, Depth Estimation and Locating Stem Coordinates of Agricultural Produce Using Neural Networks and Stereo Vision
R. Nimalan Karthik, S. Manishankar, Srikar Tondapu, and A. A. Nippun Kumaar
R. Nimalan Karthik (B) · S. Manishankar · S. Tondapu · A. A. Nippun Kumaar Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India e-mail: [email protected]
Abstract In recent years, with the rise in deep learning frameworks, the number of fruit and vegetable detection systems has also increased. These frameworks support the automation of the harvesting process. With the harvesting process automated, a significant amount of harvest and post-harvest waste can be prevented compared to manual harvesting. In this paper, we propose an economical and efficient algorithm to detect the produce and identify the stem coordinates of the appropriate produce to harvest by distinguishing between foreground and background produce using stereo vision and finding the distance from the harvesting system to the produce. The motivation behind using stereo images is to facilitate the process mentioned above in real time and economically. The input is a live video feed from which a pair of stereo images is extracted at a particular sampling rate. Produce is detected using a modified version of the You Only Look Once algorithm, and a depth map is generated using triangulation to find the distance and coordinates. The model's output shows bounding boxes around detected produce with 91.5% accuracy, and the system finds the distance of the stem coordinates with high accuracy and a low RMS error of 0.13.
Keywords Deep learning · Produce harvesting · YOLO · Stereovision · Depth map · Machine learning · Neural network · Stem detection · Object detection
1 Introduction India has a rich association with agriculture, with almost 60.4% of its land used for agriculture. Moreover, about 50% of Indians work in the agriculture sector, contributing a significant portion of the country's GDP. According to data published in 2016 by the Ministry of Food Processing Industries, the harvest and post-harvest loss in primary agricultural produce in India is estimated to be Rs. 92,651 crore ($13 billion). Produce needs to be harvested at a particular period based
on various factors like length, colour, hardness, size, and other features. As farmers cannot cover such vast acres of land in a short time, harvest-ready produce needs to be identified at the right time. For this reason, produce expertise is required, which is scarce nowadays. With the help of artificial intelligence-based harvesting systems, efficient harvesting can be achieved, as the system can identify harvest readiness and harvest the produce faster than a human. The proposed system uses cucumber as the produce of interest. It aims to efficiently identify the stem coordinates of the appropriate produce using YOLO for stem identification and stereo vision for stem depth identification. Several algorithms are used for object detection, based on architectures like convolutional neural networks (CNNs), R-CNNs, etc. YOLO is a convolutional neural network (CNN) for real-time object detection. It partitions the picture into different regions, predicts the probability of objects in each region, and draws bounding boxes around them, weighted by the probabilities. It produces results in real time with high accuracy and is also extremely fast, hence its popularity. The proposed system is implemented using YOLOv3, which is incrementally better than the previous versions of YOLO. For depth detection, we have used stereo vision to calculate the depth of the detected cucumber, as it is highly economical compared to a 3D camera or any other alternative method. It also provides reliable results for our use case. The balance between cost and accuracy is ideal for our objective. The extraction of depth information from digital images can be done with the help of computer stereo vision. Depth information can be retrieved by comparing data about a scene from two viewpoints and analysing the spatial correlation of objects in the two views. Epipolar geometry is used to generate the disparity map of the view of the cameras. Finally, triangulation is used to determine the distance of each object in the camera's view. Section 2 of the paper gives a brief overview of the motivation and related works. Section 3 explains the You Only Look Once (YOLO) architecture and depth estimation using stereo vision, followed by results and analysis in Sect. 4. Conclusion and future scope are discussed in Sect. 5.
2 Related Works In [1], the authors proposed a detection algorithm for cylindrical and spherical fruits that uses depth, shape, and colour information to guide harvesting robots. A global point cloud descriptor (GPCD) based on form, colour, and angle is constructed to generate a feature vector for a complete point cloud. A support vector machine (SVM) classifier is trained on the retrieved features. Datasets for pepper, eggplant, and guava are utilized in the procedure. The detection accuracy was 0.864, 0.886, and 0.888 for the pepper, eggplant, and guava datasets, respectively, while the recall was 0.889, 0.762, and 0.812.
In [2], the authors proposed Faster-Yolo for accurate and faster object detection. A majority of the object detection algorithms are based on deep convolutional neural networks (DCNNS). These convolutional neural networks can achieve good performance and accurately detect objects. However, during the training phase of these neural networks, due to some parameters in the hidden layers, it took much more time due to human intervention and repeated iterations. These issues are taken into consideration in the proposed approach. Faster-Yolo is faster than YOLO v2 and is approximately two times faster than YOLO v3. In [3], the authors discussed the You Only Look Once (YOLO) algorithm implemented for use cases where faster object Detection is crucial. The algorithm splits the entire image into S × S grids, each with B boundary boxes to predict. If the centre of an object falls within a grid cell, that grid cell is in charge of detecting that object. It processes all the cells simultaneously, resulting in a quicker processing speed. In [4], the authors discuss and compare the most popular deep learning object detection algorithms for object detection in intelligent spaces. Their advantages and limitations are discussed and are evaluated according to a given evaluation metric and time constraints. After comparison and evaluation with the available deep learning algorithms, which are efficient to an acceptable degree in intelligent spaces, it is concluded that The You Only Look Once (YOLO) algorithm and its variants perform the best. The use of convolutional neural networks (CNN) is explained in [5] to recognize one of three fruits (apples, bananas, and cherry fruit) used in the evaluation and to detect the infection rate in that particular fruit. The proposed method emphasizes early disease Detection to prevent later crop losses in which a user-friendly tool is introduced to detect any early-stage infection in the recognized fruit. It is accomplished using the Inception model V3. In [6], the authors demonstrate the use of YOLO, a deep-CNN architecture with a design dependent on single-stage locators. Utilizing this method takes out the requirement for explicitly hard-coded features. This design resulted in an accuracy of over 90%. 10 FPS processing was good enough to get the results quickly and accurately. A drawback of this approach is the high computing power, which is difficult to find in most mobile harvesting robots. The You Only Look Once (YOLO) algorithm in [7] is the classification of the produce of varying sizes. Different types of vegetable images are captured, and OpenCv is used to pre-process the images by placing the produce inside bounding boxes for each image. These processed images are then used to train the network. It is observed that the results for test images achieved prediction of the product class with the correctness of approximately 62%. An advanced method of fruit classification is discussed in [8]. Sample images of apples and bananas are used for testing and training. After converting the RGB image to the HSI image, the resulting image undergoes segmentation by utilizing Otsu’s thresholding method. Feature extraction of the interested area by applying Haar’s filter is passed as an input to the SVM classifier for classification. The method of fruit classification discussed in this paper achieved 100% accuracy.
The You Only Look Once (YOLO) algorithm [9] introduces convolutional neural networks for object identification. With appropriate parameter changes to the network framework for the desired output, a far better background processing can be achieved, resulting in better performance. By improvising the neural network layers, the loss of data for the focussed region can also be minimized to a great extent. In the field of real-time object detection, there are a few popular neural networks like region-based convolutional neural networks (R-CNN), You Only Look Once (YOLO), and single-shot detectors (SSD). However, there is a trade-off between speed and accuracy. Most of the neural networks that give better performance are slower, and most fast processing neural networks give comparatively poor performance. In [10], an algorithm for real-time object detection is proposed, which detects the objects in less time and gives an acceptable performance based on the single-shot detector algorithm. The You Only Look Once (YOLO) algorithm for the detection of vacant spots in a parking lot is proposed in [11]. The problem with traditional image processing techniques for vacant lot detection is that the images are not intense enough, or the clarity of the captured images is not up to the mark. Implementing a neural network takes care of these issues and provides a robust approach to finding vacant spots. A regional convolutional neural network is proposed in [12] that is computationally lighter and fast detecting objects. In the proposed regional convolutional network, different improvement frameworks are implemented to modify the existing framework of the base VGG16-Net to achieve better performance in detecting the small and dense objects captured in the images taken by the satellite. The time taken in detecting the objects in the image is much lesser than other existing algorithms, and it also takes lesser storage space than other algorithms for object detection. There are different object detection algorithms available that are either computationally fast or have higher accuracy rates. Article [13] introduces a general approach that depends on the desired output. The user can choose whether the focus is more on precision or speed. The image input can contain objects of different sizes and shapes. In [14], the authors proposed a multi-scale convolutional neural network (MSCNN) to effectively detect objects of varying scales. A particular reason for implementing the detection in multiple layers is to detect objects of different scales in all these layers, which is later combined to give a high-performance neural network capable of detecting objects with a range of scales. The combined network is trained to minimize the combined loss of all the layers. The performance of the proposed multi-scale convolutional neural network is observed up to 15 FPS. A hybrid segmentation algorithm is described in [15], which works on the combined principles of the Belief Propagation and K-Means algorithm using a pair of stereo images as input to obtain a refined resultant disparity map. The underlying principle is of 3D reconstruction with the use of stereo images. The proposed approach works with the images captured from standard cameras. A wide range of utilization of this approach is possible in the medical field, for example, to detect and identify tumours in organs and has lots of application in other fields.
A robust real-time object detection algorithm is discussed in [16], which implements a concept of integral image which drives the fast detection of objects. The algorithm used in this robust approach is based on AdaBoost, which is responsible for the high accuracy. A process known as cascade is utilized to support the detection algorithm based on AdaBoost for detecting objects in less time. The algorithm is tested with a wide variety of datasets with different image properties like scale, intensity, etc., and achieved up to a detection rate of 15 FPS. In [17], the authors introduced an algorithm based on a pixel for estimating depth and its associated uncertainty for each pixel, which subsequently gets refined. The depth estimation is done basically on the principles of Kalman filtering. A smoothing technique based on regularization is used to trim the noise. Image warping extrapolates the current map to the next frame. The results indicate that the proposed algorithm has a comparatively gradual convergence rate. The YOLO algorithm is one of the most efficient algorithms for real-time object Detection as it is computationally fast and accurate. In most of the implementations of YOLO, it is used for only object detection. Also, in existing papers, depth detection using a disparity map is used for already captured images and not for objects detected in real-time. In this paper, we are trying to combine YOLO with the approach of finding depth using stereo vision to achieve real-time depth detection with acceptable accuracy.
3 Proposed Approach The project is implemented in three phases: (1) YOLO-based custom object detection of cucumber, (2) real-time disparity map generation using stereo cameras, and (3) interpolating the stem of the cucumber using geometry and disparity values. The basic theory, along with implementation details for each phase, is explained in the following subsections.
3.1 You Only Look Once Algorithm (YOLO) As stated earlier, YOLO (You Only Look Once) is a convolutional neural network (CNN) for detecting objects in real time. The third version, YOLOv3, was used for its improved performance [18]. It uses a variation of Darknet, which by default has a 53-layer network trained on ImageNet. An additional 53 layers are added to it for object detection, giving a 106-layer fully convolutional architecture. Because YOLOv3 is a one-stage detector, a single neural network handles all aspects of object detection. YOLO splits the entire image into S × S grids, with each grid estimating B bounding boxes. If the centre of an item falls within a grid
cell, then detecting that object is the responsibility of that cell. It processes all the cells simultaneously, giving it a quicker processing speed. The size of each of these detection kernels is given by 1 × 1 × (B × (5 + C)). The number of classes, C, is 1 in our use case. YOLOv3 is used to detect cucumbers based on a given input image/real-time camera feed in the proposed system. YOLOv3 was chosen for its relatively light and fast predictions and simplicity in implementation. We trained an implementation of YOLOv3 based on Darknet [19] to specifically detect only cucumbers by training it with cucumber data, as detecting cucumbers is our primary objective. This reduces the network's computation power and time since the classes are reduced from 80 to 1. This phase involves the following sub-phases. Dataset To train our custom YOLO object detector to detect cucumbers, we needed a cucumber dataset. The data used in this work come from two different sources combined to have sufficient training data. The first is the Open Images Dataset V6, an open-source image dataset library from Google. The images were handpicked from here and downloaded as per requirement. Due to insufficient training data, another dataset from Kaggle, called the image localization dataset (for cucumbers), was used along with these images. The dataset was split in the ratio of 80:20 for train and test data. Training A total of 500+ images were used as input data for the custom YOLOv3 model to detect cucumbers. A total of 1700 epochs were completed, and it took around 4 h to train in Google Colab on an NVIDIA Tesla K80 graphics card. Figure 1 represents the loss function versus the number of iterations. The model was trained until the average loss function value was 0.408, calculated using Eq. 1 below. In YOLOv3, the loss function was updated to use binary cross-entropy (BCE loss). The class predictions and object confidence are calculated through logistic regression [20]:
$$\mathrm{Loss}_{\mathrm{yolo}} = \alpha_{\mathrm{coord}} \sum_{i=0}^{S \times S} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}} \left[ \left(x_i - \hat{x}_j\right)^2 + \left(y_i - \hat{y}_j\right)^2 \right] + \alpha_{\mathrm{coord}} \sum_{i=0}^{S \times S} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_j}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_j}\right)^2 \right] + \mathrm{BCE}_{\mathrm{loss}} \qquad (1)$$
Further training was not done to avoid overfitting and due to limited training data.
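For readers who want to reproduce the detection step, the following is a minimal sketch (not the authors' exact code) of running a Darknet-trained YOLOv3 cucumber detector through OpenCV's dnn module; the file names, input size, and thresholds are illustrative assumptions.

```python
import cv2
import numpy as np

# Assumed file names for the custom single-class (cucumber) model.
net = cv2.dnn.readNetFromDarknet("yolov3-cucumber.cfg", "yolov3-cucumber.weights")
layer_names = net.getUnconnectedOutLayersNames()

def detect_cucumbers(frame, conf_thresh=0.5, nms_thresh=0.4):
    h, w = frame.shape[:2]
    # YOLOv3 expects a square, normalized blob (416 x 416 is the usual input size).
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(layer_names)

    boxes, confidences = [], []
    for output in outputs:
        for det in output:                   # det = [cx, cy, bw, bh, objectness, class score]
            score = det[4] * det[5]          # single class: objectness * class probability
            if score > conf_thresh:
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                confidences.append(float(score))

    # Non-maximum suppression removes duplicate boxes for the same cucumber.
    keep = cv2.dnn.NMSBoxes(boxes, confidences, conf_thresh, nms_thresh)
    return [boxes[i] for i in np.array(keep).flatten()]
```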
3.2 Stereo Vision In stereo imaging, the following steps are carried out:
Fig. 1 Yolov3 Cucumber detector model loss function
Camera calibration and stereo rectification: This gives us an undistorted and accurate measure of disparity. Find the same feature in the right and left images: This gives us a disparity map that shows the differences between the images on the X-axis. Triangulation: The disparity map is transformed into distances by triangulation. Camera calibration and stereo rectification: To determine the coordinate of a pixel in an image, we need to know two sets of parameters. The first set of parameters is the inner parameters of the camera (intrinsic parameters), which are the camera’s optical centre and focal length of the camera, which is indicated in Fig. 2. Intrinsic parameters are fixed for the camera. The second set of parameters is the position and rotation vectors describing the camera in the 3D world (extrinsic parameters). Extrinsic parameters change for every example image we have. Camera calibration is an optimization process that tries to find the best parameters, i.e., give minimum reprojection error. This information is later used to calibrate the two cameras, i.e., stereo and distortion calibration. The calibration of the cameras represents the arrangement of the stereo setup in geometric terms. The purpose of performing stereo rectification is to project the images captured by the two cameras to be precisely in the same plane and to precisely
Fig. 2 Stereo camera calibration
align with the rows of pixels so that the epipolar lines (the lines of intersection between the image and the epipolar plane) become horizontal which enables us to be able to find a point match in two pictures more randomly. The result of this process of alignment in both the images produced eight different terms, which implies four for each camera: Firstly, we have, the rectified camera matrix represented as M-rect. The nonrectified camera matrix is represented as M-ing. The rotation matrix is represented as R-rect, and finally, the Distortion vector. We use the Bouguet algorithm to compute these terms, which can be implemented using OpenCV. Find the same feature in the right and left images: The three primary processes necessary to identify identical features in both images are described in the following section. Epipolar Geometry: Figure 3 represents a typical stereo camera consisting of two identical pinhole cameras. The intersection between the projected plane’s centres [Ol, Or] with the projecting planes themselves creates the epipolar points El and Er. The line segment joining the points Pl with El and Pr with Er is known as epipolar lines. The image of all possible points on the planning plane is an epipolar line lying on another image plane and passing through the epipolar point and the desired point; this enables the ability to limit the search to a point in a single dimension rather than the entire plane. The entire group of 3D points in the periphery of a camera is within the epipolar plane. A characteristic feature in one given plane must also be on the corresponding epipolar line of the other plane. This rule needs to be satisfied according to the epipolar condition. With epipolar geometric knowledge, the search task for the corresponding feature becomes much more straightforward by reducing the search space from two dimensions to a single dimension. Then, the sequence of the points is preserved, i.e., the two points A and B lie on the epipolar line of one level in the same order as on the second level. [21] Essential and Fundamental Matrices: The information regarding how the two cameras, which are a part of the stereo set up, are physically arranged with respect
Fig. 3 Epipolar geometry
to each other, is maintained in the essential matrix 'E'. With the help of the rotational and translatory parameters, the essential matrix describes the localization of one camera with respect to the other. These parameters cannot be used directly in the matrix since they are obtained during the configuration process. The fundamental matrix 'F', defined by Eq. 2, combines the intrinsic parameters of the cameras, represented by the matrix 'M', with the essential matrix E, and therefore carries the details about the position and arrangement of the stereo camera setup.

$$F = \left(M_r^{-1}\right)^{T} E \, M_l^{-1} \qquad (2)$$

Equation 3 defines the relationship between the projected point on the left image 'p_l' and the one on the right image 'p_r':

$$p_r^{T} E \, p_l = 0 \qquad (3)$$

The 3 × 3 matrix E in Eq. 3 is of rank two and represents the essential matrix; as a result, this expression is a straight-line equation. The intrinsic characteristics must be considered to characterize the relationship between the two points completely:

$$q = M Q, \qquad q = \begin{bmatrix} x \\ y \\ z \end{bmatrix}, \quad M = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}, \quad Q = \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \qquad (4)$$

and,
$$x_{\mathrm{screen}} = f_x \frac{X}{Z} - c_x, \qquad y_{\mathrm{screen}} = f_y \frac{Y}{Z} - c_y$$

with the intrinsic matrix M. Substituting the point expressed through Eq. 4 into Eq. 3 gives Eq. 5:

$$q_r^{T} \left(M_r^{-1}\right)^{T} E \, M_l^{-1} q_l = 0 \qquad (5)$$

Substituting Eq. 2 into Eq. 5, we obtain Eq. 6:

$$q_r^{T} F q_l = 0 \qquad (6)$$
The stereo vision system is calibrated using Eq. 2 and Eq. 6. Rotation matrix and translation vector: 'Rl' and 'Tl' (for the left camera) and 'Rr' and 'Tr' (for the right camera) describe the rotation and translation from the respective camera to the point in the coordinate space of the environment. Pl and Pr denote the position of the point in the coordinate spaces of the left and right cameras, respectively. R and T are the rotation and translation that bring the coordinate system of the right camera into the coordinate system of the left camera. The two equations in Eq. 7 yield the exact coordinates of the point P in relation to the left and right cameras:

$$P_l = R_l P + T_l, \qquad P_r = R_r P + T_r \qquad (7)$$

The equation for P in the three-dimensional coordinate system is given by:

$$P_l = R^{T} (P_r - T) \qquad (8)$$

Combining Eqs. 7 and 8, we finally get the stereo rotation matrix and the stereo translation vector, which are given by Eq. 9 and Eq. 10, respectively:

$$R = R_r R_l^{T} \qquad (9)$$

$$T = T_r - R T_l \qquad (10)$$
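In practice, this calibration and the Bouguet rectification mentioned above are typically carried out with OpenCV. The sketch below is a rough illustration under the assumption that chessboard corner correspondences and per-camera intrinsics have already been collected; it is not the authors' implementation.

```python
import cv2

# obj_pts: list of (N, 3) chessboard corner positions in world units;
# img_pts_l / img_pts_r: matching (N, 1, 2) corner detections in each camera;
# K_l, d_l, K_r, d_r: intrinsics from single-camera calibration; size = (width, height).
def calibrate_and_rectify(obj_pts, img_pts_l, img_pts_r, K_l, d_l, K_r, d_r, size):
    # Stereo calibration estimates R and T (Eqs. 9-10) plus the essential/fundamental matrices.
    ret, K_l, d_l, K_r, d_r, R, T, E, F = cv2.stereoCalibrate(
        obj_pts, img_pts_l, img_pts_r, K_l, d_l, K_r, d_r, size,
        flags=cv2.CALIB_FIX_INTRINSIC)

    # Bouguet rectification: rotates both image planes so the epipolar lines become horizontal.
    R1, R2, P1, P2, Q, roi_l, roi_r = cv2.stereoRectify(K_l, d_l, K_r, d_r, size, R, T)

    # Pixel remapping tables used to warp each incoming frame into the rectified geometry.
    map_l = cv2.initUndistortRectifyMap(K_l, d_l, R1, P1, size, cv2.CV_32FC1)
    map_r = cv2.initUndistortRectifyMap(K_r, d_r, R2, P2, size, cv2.CV_32FC1)
    return map_l, map_r, Q
```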
The two equations Eqs. 9 and 10 are used to calibrate the camera. Triangulation: In the last step, the triangulation, it is presumed that the horizontal row of pixels on the right image is aligned with the corresponding horizontal row of pixels in the left image and that both projection images are coplanar (refer Fig. 4). The point P is in the environment and represented by ‘pl ’ and ‘pr ’ on the left and right pictures, respectively, with the coordinates ‘x l ’ and ‘x r .’ As a result, the disparity can be introduced: d = x l − x r . It can be observed that the size d decreases
Fig. 4 Triangulation
as one moves away from point P. Therefore, the distance is inversely proportional to the disparity (d). The formula in Eq. 11 can be used to determine the distance:

$$Z = \frac{f \cdot T}{x_l - x_r} \qquad (11)$$
In Eq. 11, the nonlinear relation between distance and disparity can be seen. When the disparity is close to zero, slight disparities result in significant distance variations. If the discrepancy is great, the opposite is true. Slight variations in disparity do not result in excessive distance variations. It concludes that a high depth resolution is possible with the help of that stereo vision, but only for objects closer to the camera. This approach is only practical if the stereo camera’s setup is perfect. However, this may not be possible. As a result, the pictures on the left and right are parallel, mathematically. Therefore, the cameras must be physically in an approximately parallel position. Experimental Setup: Two similar cameras, the Logitech C270 webcams, were used in this project for their low price and HD image/video capture capabilities. Both the cameras were chosen as the same model to avoid any difference in optical metrics and minimize manual calibration. The camera configuration is shown in Fig. 5. The two cameras were taped together with the help of a measuring scale to avoid any miscalculation or error resulting from a change in position of the cameras. The setup is depicted in Fig. 5. The distance between the two camera lenses is 10 cm. Implementation: A program [22] was used to achieve this, which was coded in Python and the OpenCV library. The code was split into two modules. The first module takes images of black and white boxes for camera calibration, as shown in the image in Fig. 2. The second module and thus the main module is used to calibrate the cameras with the recorded images, generating a disparity map from which we can measure
Fig. 5 Camera used, the setup and its configuration
Fig. 6 Final detection along with the real-time disparity map generated
the distance for each pixel; this is depicted in Fig. 6. A WLS filter is used at the end to better recognize the edges of objects.
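A rough sketch of the disparity and depth computation described above is given below, assuming rectified frames from the calibration step and the opencv-contrib package for the WLS filter; the SGBM parameters, focal length, and baseline handling are illustrative values, not the exact settings used in the paper.

```python
import cv2

# Left/right matchers plus a WLS filter, used to smooth and sharpen the disparity map edges.
left_matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=96, blockSize=5)
right_matcher = cv2.ximgproc.createRightMatcher(left_matcher)
wls = cv2.ximgproc.createDisparityWLSFilter(left_matcher)
wls.setLambda(8000.0)
wls.setSigmaColor(1.5)

FOCAL_PX = 700.0    # focal length in pixels (assumed, taken from calibration)
BASELINE_CM = 10.0  # distance between the two lenses, as in the setup above

def depth_map(rect_left, rect_right):
    gray_l = cv2.cvtColor(rect_left, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(rect_right, cv2.COLOR_BGR2GRAY)
    disp_l = left_matcher.compute(gray_l, gray_r)
    disp_r = right_matcher.compute(gray_r, gray_l)
    # WLS filtering preserves object boundaries; SGBM disparities are scaled by 16.
    disp = wls.filter(disp_l, gray_l, disparity_map_right=disp_r).astype("float32") / 16.0
    disp[disp <= 0] = 0.1                      # avoid division by zero
    return FOCAL_PX * BASELINE_CM / disp       # Eq. 11: Z = f * T / (x_l - x_r)
```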
3.3 Estimation of the Stem To find the approximate location of the stem of the cucumber, we tried two approaches to automate the process entirely without human intervention. The first approach was to create a semantic segmentation mask of the detected cucumber and use its topmost coordinate as the stem, which is valid for most cucumbers. The GrabCut algorithm was the natural choice for this approach, but the results produced were not practically viable: achieving segmentation without manual intervention is challenging since the colours of the plant, stem, leaves, and the cucumber itself are almost identical. The second approach was to locate the stem using the midpoint disparity matrix method, implemented as follows. First, the midpoint of the YOLO object detection box is obtained using Eq. 12, and then we construct two 5 × 5 matrices containing 25 values each. The first matrix, matrix A, has the coordinates of this midpoint as its 13th element, as in Fig. 7; the remaining points are spaced 5 pixels apart. The second matrix, matrix B, holds the disparity values corresponding to the coordinates in matrix A. Matrix B is divided into five major regions as shown in Fig. 7, each region given by:
Fig. 7 Stem detection using Midpoint disparity matrix method, the numbers are placeholders for the disparity value
Region 1 (x_R1i): [x1, x2, x6, x7], Region 2 (x_R2i): [x4, x5, x9, x10], Region 3 (x_R3i): [x16, x17, x21, x22], Region 4 (x_R4i): [x19, x20, x24, x25] and Region 5 (x_R5i): [x8, x12, x13, x14, x18].

$$A_{\mathrm{Midpoint}} = \left( X_{\min} + \frac{|X_{\max} - X_{\min}|}{2},\; Y_{\min} + \frac{|Y_{\max} - Y_{\min}|}{2} \right) \qquad (12)$$
The disparity value of each pixel coordinate in Region 1 is compared to its corresponding value in Region 2 by using the function in Eq. 13:

$$S_n = \begin{cases} 1, & \text{if } |x_{R1i} - x_{R2i}| \le \theta \\ 0, & \text{if } |x_{R1i} - x_{R2i}| > \theta \end{cases}, \qquad S_{\mathrm{final}} = \sum_{n=1}^{4} S_n \qquad (13)$$
Suppose the values are within the predefined threshold value theta (θ). In that case, the final score variable (S_final) is incremented by 1; this variable can have a maximum value of 4 and a minimum of 0. At every iteration, S_final is reset and re-calculated. Based on the S_final value, the following cases may arise. Case 1. If S_final is equal to 0, and if the absolute difference between Region 1 and Region 5 is greater than the absolute difference between the average disparity value of Region 2 and Region 5, we shift the matrix right following Eq. 15. Else, if it is less, matrix A is shifted left, i.e., each element of matrix A is updated using Eq. 16 and matrix B is updated accordingly. We then enter the next iteration by performing the comparison given by Eq. 13. Case 2. If S_final is equal to 4, we check whether the absolute difference between Region 1 and Region 5 is less than θ and whether the absolute difference between the average
disparity value of Region 2 and Region 5 is less than θ. If both conditions are satisfied, we can conclude that the stem coordinate is given by the value in the 13th element of matrix A, and its depth can be obtained from the value in the 13th element of matrix B. If the conditions are not satisfied, we continue and shift the matrix up, i.e., update each element of matrix A following Eq. 14, and then update matrix B accordingly. We then enter the next iteration by performing the comparison given by Eq. 13. Case 3. If S_final is equal to either 1, 2, or 3, we shift the matrix up, i.e., update each element of matrix A following Eq. 14, after which we calculate the corresponding values for matrix B. We enter the next iteration by performing the comparison given by Eq. 13.

$$\mathrm{Up} \rightarrow (x_i,\; y_j - 5) \qquad (14)$$

$$\mathrm{Right} \rightarrow (x_i + 5,\; y_j) \qquad (15)$$

$$\mathrm{Left} \rightarrow (x_i - 5,\; y_j) \qquad (16)$$
The geometric estimate worked for most of the detections instead of the use of the GrabCut algorithm.
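For concreteness, the midpoint disparity matrix search can be sketched as follows; the data layout (disparity map as a NumPy array, YOLO box in pixel coordinates), the threshold value, and the iteration cap are our own assumptions rather than the authors' implementation.

```python
import numpy as np

STEP, THETA, MAX_ITERS = 5, 4.0, 200   # pixel step, threshold θ and iteration cap (assumed values)
R1, R2, R5 = [0, 1, 5, 6], [3, 4, 8, 9], [7, 11, 12, 13, 17]   # flattened 5x5 indices of Fig. 7 regions

def find_stem(disparity, box):
    """box = (x_min, y_min, x_max, y_max) from YOLO; disparity is a float HxW array."""
    x_min, y_min, x_max, y_max = box
    cx = x_min + abs(x_max - x_min) // 2           # Eq. 12 midpoint
    cy = y_min + abs(y_max - y_min) // 2

    for _ in range(MAX_ITERS):
        # Matrix A: 5x5 grid of coordinates centred on (cx, cy); matrix B: their disparities.
        xs = np.arange(cx - 2 * STEP, cx + 2 * STEP + 1, STEP)
        ys = np.arange(cy - 2 * STEP, cy + 2 * STEP + 1, STEP)
        A = np.array([(x, y) for y in ys for x in xs])
        B = np.array([disparity[y, x] for x, y in A])

        s_final = int(np.sum(np.abs(B[R1] - B[R2]) <= THETA))      # Eq. 13
        d15 = abs(B[R1].mean() - B[R5].mean())
        d25 = abs(B[R2].mean() - B[R5].mean())

        if s_final == 4 and d15 < THETA and d25 < THETA:
            return tuple(A[12]), B[12]             # 13th element: stem coordinate and its disparity
        if s_final == 0:
            cx += STEP if d15 > d25 else -STEP     # shift right (Eq. 15) or left (Eq. 16)
        else:
            cy -= STEP                             # shift up (Eq. 14)
    return (cx, cy), disparity[cy, cx]             # fall back to the last midpoint examined
```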
4 Result and Analysis The proposed system’s three main phases are tested and analysed using a real-time video feed. Due to various limitations, a flower pot with a green plant and cucumber is used to emulate the actual agricultural field. The analysed data has been split into three types: Type 1: A single target cucumber plant placed at varying distances from the camera setup. Type 2: Two target cucumber plants placed at equal distance from the camera setup. Type 3: Two target cucumber plants placed at varying distances from the camera setup.
4.1 Cucumber Detection Through YOLO A Bounding box is generated for the target object (cucumber in this case) by YOLOv3 while giving a live input through the camera setup, explained in the previous section. After the object is detected, the disparity map is generated with the help of stereo images captured, and accordingly, the depth is calculated for that object.
Table 1 Measures table

Measure              | Value  | Derivatives
True Positive (TP)   | 662    | TP
False Positive (FP)  | 170    | FP
False Negative (FN)  | 63     | FN
Sensitivity/Recall   | 0.9131 | TPR = TP/(TP + FN)
Precision            | 0.7957 | PPV = TP/(TP + FP)
F1 Score             | 0.8504 | F1 = 2TP/(2TP + FP + FN)
mAP@0.5              | 0.9153 |
Fig. 8 a Type-1: YOLO detection of cucumber at a distance of 150 cm, b Type-1: YOLO detection of cucumber at a distance of 120 cm, c Type-1: YOLO detection of cucumber at a distance of 90 cm, d Type-1: YOLO detection of cucumber at a distance of 60 cm
The detector successfully detected cucumbers with a 0.9153 score as per the mean average precision (mAP) metric, which is built into the Darknet-based YOLOv3 object detector, as shown in Table 1. The model was able to detect most of the cucumbers presented as input from the test data. The mAP value of the test detections, along with the recall, precision, and F1 score, can be inferred from Table 1. The detection output of the YOLO cucumber detector for the same object placed at varying distances (Type 1) is shown in Fig. 8; the detections for different objects placed at the same distance (Type 2) and at varying distances (Type 3) are shown in Fig. 9a and Fig. 10a, respectively.
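As a quick sanity check of the derivations in Table 1, the reported recall, precision, and F1 score can be recomputed from the raw counts; the snippet below is our own illustration, not part of the original evaluation code.

```python
TP, FP, FN = 662, 170, 63                    # counts from Table 1

recall = TP / (TP + FN)                      # sensitivity / TPR
precision = TP / (TP + FP)                   # PPV
f1 = 2 * TP / (2 * TP + FP + FN)

print(f"recall={recall:.4f} precision={precision:.4f} f1={f1:.4f}")
# -> recall=0.9131 precision=0.7957 f1=0.8504, matching Table 1
```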
4.2 Disparity Map The disparity map generated gives the difference between stereo image pixels. The disparity map is transformed into a distance map for each pixel through triangulation. The final depth of the target object is evaluated using Eq. 11. The output disparity map for Type 1 is depicted in Fig. 11. The corresponding disparity maps of Type 2 and Type 3 are shown in figures Fig. 9b and Fig. 10b, respectively. The distance calculated using the disparity map and the actual distance between the stereo camera setup and the cucumber plant shows a proportional relationship.
Fig. 9 a Type-2: YOLO Detection and b disparity map of two cucumbers both at a distance of 70 cm from the camera
Fig. 10 Type-3: a YOLO Detection and b disparity map of two cucumbers at a distance of 75 cm and 125 cm from the camera
Fig. 11 a Disparity map of the object at a distance of 150 cm, b Disparity map of the object at a distance of 120 cm, c Disparity map of the object at a distance of 90 cm, d Disparity map of the object at a distance of 60 cm
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left(X_{\mathrm{obs},i} - X_{\mathrm{model},i}\right)^2}{n}} \qquad (17)$$
Root mean square error is used to measure the disparity map model's accuracy and deviation. The root mean square error (RMSE) is widely used to estimate the deviation between the values observed from the environment and the values predicted by the model. It is calculated using Eq. 17. Using this method, the estimated RMSE for the proposed model is 0.13. Figure 12 depicts the actual distance versus the estimated distance from the cameras; the result obtained is practically viable due to the low error in distance estimation, as shown in Table 2.
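A short NumPy version of Eq. 17 is shown below purely to illustrate the formula; it will not reproduce the reported 0.13 figure, which was computed on the authors' own evaluation data.

```python
import numpy as np

def rmse(observed, predicted):
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    return np.sqrt(np.mean((observed - predicted) ** 2))   # Eq. 17
```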
Fig. 12 Graph showing the variance in actual versus predicted distance using disparity map
Table 2 Object distance measurement for 12 points with varying distance

S. No. | Actual distance (in cm) | Predicted distance (in cm) | Deviation/% error
1      | 50  | 52  | 3.8
2      | 60  | 58  | 3.4
3      | 70  | 75  | 6.6
4      | 80  | 82  | 2.4
5      | 90  | 91  | 1.0
6      | 100 | 105 | 4.7
7      | 110 | 112 | 1.7
8      | 120 | 123 | 2.4
9      | 130 | 138 | 5.7
10     | 140 | 150 | 6.6
11     | 150 | 160 | 6.2
12     | 160 | 172 | 4.6
Fig. 13 a Stem detection of cucumber at a distance of 150 cm, b Stem detection of cucumber at a distance of 120 cm, c Stem detection of cucumber at a distance of 90 cm, d Stem detection of cucumber at a distance of 60 cm
4.3 Stem Detection Once the depth is detected, both the output of YOLO (the bounding box) and the disparity map are used to find the stem coordinates, as explained in Sect. 3.3. The images in Fig. 13 show the stem coordinates detected using the proposed approach.
5 Conclusion and Future Scope This paper proposes an approach to efficiently identify the stem coordinates of plant produce using object detection and stereo vision concepts. An approach for calculating the distance of a target object from real-time input is presented. YOLO efficiently detects the target object in the camera input, and the images captured by the stereo camera setup are used to generate the disparity map. The desired distance is calculated using the principles of stereo vision. Finally, the midpoint disparity matrix method is used to find the coordinates of the stem. The proposed system can be used in the automated harvesting of produce to reduce unnecessary wastage caused by various manual labour limitations. Efficient harvesting can be achieved through this approach since the identification of foreground/background produce is done in real time, and the distance of the produce from the camera is calculated simultaneously, swiftly, and with 91.5% accuracy. Thus, this model can be recommended for real-time object detection and distance estimation by harvesting robots moving within 3 m of the produce, with reliable results expected in that range. This system can be further enhanced by incorporating real-time depth estimation of crops taken directly from the field as input data, so that a harvesting robot can harvest efficiently without human intervention. The accuracy and precision of the cucumber detection model can be improved to detect the exact coordinates of the cucumber better, which would help in better estimation of the stem coordinates.
References 1. Lin G, Tang Y, Zou X, Xiong J, Fang Y (2020) Color-, depth-, and shape-based 3D fruit detection. Precision Agric 21(1):1–17 2. Yin Y, Li H, Fu W (2020) Faster-YOLO: An accurate and faster object detection method. Digital Signal Process 102:102756 3. Lohit GV, Sampath N (2020) Multiple object detection mechanism using YOLO. In: Data engineering and communication technology. Springer, Singapore, pp 577–587 4. Subbiah U, Kumar DK, Thangavel SK, Parameswaran L (2020) An extensive study and comparison of the various approaches to object detection using deep learning. In: 2020 International conference on smart electronics and communication (ICOSEC). IEEE, pp 183–194 5. Nikhitha M, Sri SR, Maheswari BU (2019) Fruit recognition and grade of disease detection using inception v3 model. In: 2019 3rd International conference on electronics, communication and aerospace technology (ICECA). IEEE, pp 1040–1043 6. Bresilla K, Perulli GD, Boini A, Morandi B, Corelli Grappadelli L, Manfrini L (2019) Singleshot convolution neural networks for real-time fruit detection within the tree. Front Plant Sci 10:611 7. Sachin C, Manasa N, Sharma V, AA NK (2019) Vegetable classification using you only look once algorithm. In: 2019 International conference on cutting-edge technologies in engineering (ICon-CuTE). IEEE, pp 101–107 8. Chithra PL, Henila M (2019) Fruits classification using image processing techniques. Int J Comput Sci Eng 7(5):131–135 9. Mittal N, Vaidya A, Kapoor S (2019) Object detection and classification using Yolo. Int J Sci Res Eng Trends 5:562–565 10. Chandan G, Jain A, Jain H (2018) Real time object detection and tracking using deep learning and OpenCV. In: 2018 international conference on inventive research in computing applications (ICIRCA). IEEE, pp 1305–1308 11. Jose EK, Veni S (2018) YOLO classification with multiple object tracking for vacant parking lot detection. J Adv Res Dyn Control Syst 10(3):683–689 12. Ding P, Zhang Y, Deng WJ, Jia P, Kuijper A (2018) A light and faster regional convolutional neural network for object detection in optical remote sensing images. ISPRS J Photogramm Remote Sens 141:208–218 13. Alexander A, Dharmana MM (2017) Object detection algorithm for segregating similar coloured objects and database formation. In: 2017 International conference on circuit, power and computing technologies (ICCPCT). IEEE, pp 1–5 14. Cai Z, Fan Q, Feris RS, Vasconcelos N (2016) A unified multi-scale deep convolutional neural network for fast object detection. In: European conference on computer vision. Springer, Cham, pp 354–370 15. Kamencay P, Breznan M, Jarina R, Lukac P, Zachariasova M (2012) Improved depth map estimation from stereo images based on hybrid method. Radioengineering 21(1) 16. Viola P, Jones M (2001) Robust real-time object detection. Int J Comput Vision 4(34–47):4 17. Matthies LH, Szeliski R, Kanade T (1988) Incremental estimation of dense depth maps from image sequences. In: CVPR, pp 366–374 18. Farhadi A, Redmon J (2018) Yolov3: an incremental improvement. In: Computer vision and pattern recognition. Springer, Berlin/Heidelberg, Germany, 1804-2767 19. YOLOv3 implementation. https://github.com/AlexeyAB/darknet. Accessed on June 2021 20. YOLOv3 improvements. https://towardsdatascience.com/yolo-v3-object-detection-53fb7d 3bfe6b. Accessed on March 2021 21. Epipolar Geometry. https://learnopencv.com/introduction-to-epipolar-geometry-and-stereovision/. Accessed on March 2021 22. Depth Estimation. https://github.com/LearnTechWithUs/Stereo-Vision. 
Accessed on June 2021
Psychosomatic Study of Criminal Inclinations with Profanity on Social Media: Twitter
Angelo Baby, Jinsi Jose, and Akshay Raj
A. Baby (B) · J. Jose Rajagiri College of Social Sciences, Kalamassery, Kerala 683104, India e-mail: [email protected]
A. Raj Smater Codes, Bengaluru, Karnataka 560064, India
Abstract The World Wide Web (WWW) holds a plethora of data related to individual opinions on many topics involving personal and social issues. Using social media and various microblogging sites, users share their views and feelings and echo their observations and decisions in their daily interactions, which can be their sentiments or opinions regarding an event or a thing. These data are growing tremendously and are a rich source of information for any decision-making process. Mind-mapping data with an emotional quotient are voiced unknowingly by the users. Sentiment analysis has emerged to automate the analysis of such data. 'Twitter', being a microblogging site that limits the number of characters in each tweet, makes the process of analysis easier. With the upsurge of useful information, there is also an increase in offensive content and obnoxious words sent across as messages and comments. This drives the need to analyze the sentiment of each user or page and to judge whether abusive or criminal activity or intent lies behind a comment or tweet. The tweets are analyzed for user sentiment and for the intensity of hate content and targeted speech in the tweets. This helps in unceasingly monitoring a person or a page for criminal activities.
Keywords Crime detection · Multilayer perceptron · Neural network · Sentiment analysis · Social media · Twitter
1 Introduction Sentiment analysis is also termed 'opinion mining'. It is the procedure of automated discovery of opinions embodied in text. There is a drastic increase in the number of social media users and in the time spent by a user each day to update their views
about a variety of topics that include politics, product reviews, and social issues. These statistics are used in various sectors for scrutiny of different parameters. Companies use them to assess customer satisfaction and study market trends. Several online applications or portals display the analysis to their stakeholders. Public emotions on issues, reviews, concerns, etc., are summarized, and these indications signal an overall view both individually and collectively. Social media, as a platform for millions of users to express their views and opinions, is a stage both for the exchange of useful information and for cyberbullying and hate content to spread radically. There are incidents where a community or a person is continually harassed by a group of people with abusive or targeted speech. In such a situation, sentiment analysis along with foul-language detection comes in handy, as the monitoring of such activity can be automated and reported for effective use of the crime analysis feature. Sentiment analysis is made possible by machine learning methods. In machine learning, there are supervised, semi-supervised, and unsupervised approaches. The supervised approach deals with classifier algorithms such as decision tree, linear, probabilistic, and rule-based classifiers, while the unsupervised approach deals with clustering techniques of the partition, hierarchical, density, and hybrid types. The semi-supervised approach is a blend of both supervised and unsupervised learning. Twitter, a microblogging platform for users to express their views regarding various topics, is a rich source of sentiment-oriented data. Tweets have a limited length and are a good source for sentiment analysis. Studies ascertain that more than 60,000 tweets are posted on Twitter per second on varied topics and emotions [1]. This can be a boon, providing large amounts of data when looking for crime-related words. This work tries to predict and monitor cybercrimes by analyzing abusive words, hate content, etc., that exceed a threshold in tweeted data. Through the continuous monitoring of tweets, the proposed system can be used to track a person or a specific page to find out whether any criminal activity exceeds a limit. An improved support vector classification (SVC) is used for foul language and targeted speech detection, and a deep learning algorithm (multilayer perceptron) is used for sentiment analysis.
2 Related Work In recent years, researchers have used different approaches to identify crime on social media. An improved analysis on sentiments played in microblogs was by mapping the relationship between emotion words and modifiers, in contrast to the breaking of words and super sounding an emotion to each word and interpreting it with larger individual bias [2]. Another study [3] explains a new method to identify the abusive language in Arabic social media based on text. The researchers used Arabic tweets and classified them into three categories: obscene, offensive, and clean. The proposed
method is an automated method to create and expand vulgar words. The main limitation is the testing and setting of limited words with linguistic coherence as pressstud confined to geographical boundaries. Other work [4] proposed a framework for automated cyberbullying detection. The proposed model is motivated by sociological and psychological findings with emotions. To evaluate the model, researchers used Twitter and MySpace datasets. In another study [5], Yoo et al. worked on a framework for analyzing and predicting users’ sentimental orientation for events based on real-time data of massive social media contents. The proposed sentimental analysis and prediction were made with deep learning techniques. The authors used convolutional neural network for sentimental analysis and long short-term memory used for prediction. In [6], researchers proposed a new algorithm to analyze the personality changes of social media users. This new algorithm used unsupervised learning to categorize a person’s regular behavior change and prevent unlucky incidents. Prakruthi et al. [7] proposed a framework for sentiment analysis of real-time tweets, to identify whether people’s opinions are negative, positive, or neutral. To visualize the result, the researchers used a histogram and pie chart. The existing work introduced clubbing possibilistic fuzzy c-means with SVM that improved the results on the movie review dataset. A new set of feature extraction methods with machine learning techniques and hybrid dictionaries and lexicons improved the accuracy [8]. Analysis of multiple individual users’ sentiment for a particular crime improved the combined public perspective on different types of crimes [9]. The study regarding the automated search for hate speech in social media proved that Naïve Bayes model with TF–IDF is the best classifier with a large dataset [10]. In [11], the authors proposed an ontology-based framework for criminal intension classification (OFCIC) and the execution of the framework carried out by ontology of criminal expressions (OntoCexp). The tweet classification is performed by the artificial neural network, support vector classifier, and random forest classifier. Another study was carried out in [12] regarding crime in social media and fear of crime. Researchers used 70 days of data from 18 countries to analyze crime and fear using the Spanish language in Latin America. The result shows that social media is not highly correlated to crime, but the fear of crime is high. To identify social media crime in India, a study was carried out by Vo et al., which is explained in [13]. In this study, authors used sentimental analysis to analyze users’ behavior and psychology to track the possibility of criminal action by using the Markov model and Brown clustering. Twitter data were collected from seven different locations and identified as areas that had high crime influences over the last seven days of the data collection. Jacob and Vijayakumar [14] analyzed tweets using a clustering-based machine learning technique. The authors identified the positive or negative tweets and model evaluated based on precision, recall, and f-score. In [15], authors surveyed sentimental analysis of the combination of terrorist detection on Twitter. Based on the result, the authors suggested that AdaBoost, support vector machine, maximum entropy, Naive Bayes, decision tree algorithms are reasonable, based on data. In another study [16] proposed a model to predict crime patterns from social media. The
research was carried out by bi-directional long short-term memory and feed-forward neural network.
3 Proposed Model The process of training the system is divided into two procedures: ascertaining the degree of sentiment polarity, and calculating the amount of foul content along with hate speech and targeted speech. Training and testing of the neural network for sentiment analysis were done with a combination of a movie review dataset, an Amazon review dataset, and an airline feedback dataset. Foul language detection and hate speech analysis used the "hate speech and offensive language" dataset by Tom Davidson, the Kaggle Bad Word dataset, and the Hate and Abusive Speech on Twitter data from Large-Scale Crowdsourcing and Characterization of Twitter Abusive Behavior [17, 18].
3.1 Tweets Retrieval In this work, experiments are conducted on real-time data obtained using the Twitter APIs. The Stream API was used to extract real-time tweets, and the Search API was used to extract tweets related to keywords and usernames. Tweets concerning trending topics such as newly launched gadgets, vehicles, and social disputes were extracted for the experimentation procedure. Figure 1 shows the process of tweet retrieval using the Twitter API and data preprocessing.
Fig. 1 Tweet retrieval
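As a rough illustration of this retrieval step (not the authors' code), the snippet below uses the Tweepy library to pull tweets for a keyword or a username; the credential placeholders are assumptions, and the exact endpoint names vary between Tweepy and Twitter API versions.

```python
import csv
import tweepy

# Placeholder credentials -- obtained from the Twitter developer portal.
auth = tweepy.OAuthHandler("API_KEY", "API_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

def collect_tweets(query, limit=500, out_file="tweets_raw.csv"):
    """Search-API style retrieval of recent tweets matching a keyword or account query."""
    with open(out_file, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["user", "created_at", "text"])
        for status in tweepy.Cursor(api.search_tweets, q=query,
                                    tweet_mode="extended", lang="en").items(limit):
            writer.writerow([status.user.screen_name, status.created_at,
                             status.full_text.replace("\n", " ")])

collect_tweets("#newgadget")        # keyword / trending topic
collect_tweets("from:some_user")    # tweets from a specific account
```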
3.2 Preprocessing The whole procedure of mining Twitter data (Tweets) is challenging; the raw data collected can contain notations such as hashtags, emoticons, URLs, mentions, acronyms, retweet (RT), and several stop-words that can hinder the performance and accuracy of the classifier [19]. The task of preprocessing involves the removal of stopwords and notations. Python NLTK packages ‘punkt’ and ‘stopwords’ were used for stop-word removal. Punkt tokenizer uses an unsupervised algorithm to divide raw text into meaningful sentences, and the package ‘stopwords’ removes the unsolicited text from the sentences. The special symbols are analyzed to check whether they represent any pictorial emoticon; if so, the emoticons or symbols are replaced by words that match the degree of emotions. The symbols that do not represent any emoticons are eliminated. The extracted words are matched with the dictionary; the words not present in the dictionary are replaced with their synonyms which helps the improvement of the precision of the classifier. Four-gram method is used to identify the meanings of the phrase precisely. N-gram is the continuous sequence of n items from the phrase or sentence. A combination of unigram, bigram, trigram, and four-gram is implemented for the phrase meaning identification. The accuracy in classification and prediction is upgraded by obtaining the phrase meaning rather than word meaning, for, e.g., “Phones aren’t that bad” infers a neutral review rather than negative. Application of NLTK packages Lancaster Stemmer and Word Net Lemmatizer is used for stemming and lemmatization of the text. Stemming condenses the words to their base form (word stems); it uses the Porter algorithm implemented in NLTK. It reduces the text size up to 30 percent by eliminating repetitive features. Lemmatization performs context-based conversion of the words to its significant base form; Lemmatizer is used here with part-of-speech (POS) tag which improves the accuracy in conversion of words.
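A condensed sketch of the preprocessing pipeline just described, covering tokenization, stop-word removal, POS-aware lemmatization, and Lancaster stemming with NLTK, is given below; the emoticon table is a small illustrative stand-in for the full replacement dictionary.

```python
import re
import nltk
from nltk.corpus import stopwords, wordnet
from nltk.stem import LancasterStemmer, WordNetLemmatizer

# One-time downloads needed: punkt, stopwords, wordnet, averaged_perceptron_tagger
STOP = set(stopwords.words("english"))
EMOTICONS = {":)": "happy", ":(": "sad", ":D": "very happy"}   # illustrative subset
stemmer, lemmatizer = LancasterStemmer(), WordNetLemmatizer()

def wordnet_pos(tag):
    """Map Penn Treebank POS tags to WordNet POS tags for the lemmatizer."""
    return {"J": wordnet.ADJ, "V": wordnet.VERB, "R": wordnet.ADV}.get(tag[0], wordnet.NOUN)

def preprocess(tweet):
    for emo, word in EMOTICONS.items():                    # replace pictorial emoticons with words
        tweet = tweet.replace(emo, " " + word + " ")
    tweet = re.sub(r"http\S+|@\w+|#|\bRT\b", " ", tweet)   # drop URLs, mentions, hashtags, RT
    tokens = [t.lower() for t in nltk.word_tokenize(tweet) if t.isalpha()]
    tokens = [t for t in tokens if t not in STOP]          # stop-word removal
    tagged = nltk.pos_tag(tokens)
    lemmas = [lemmatizer.lemmatize(t, wordnet_pos(p)) for t, p in tagged]
    return [stemmer.stem(t) for t in lemmas]               # Lancaster stemming of the lemmas

print(preprocess("RT @user Phones aren't that bad :) http://t.co/x"))
```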
3.3 Classification Six different classification models were tested for the foul language, hate speech, and targeted speech analysis: the multinomial Naïve Bayes classifier, K-nearest neighbors classifier, support vector machine (SVM), decision tree classifier, random forest classifier, and logistic regression classifier, each trained and tested with the training and testing datasets, respectively. Considering the training accuracy, test accuracy, and confusion matrix obtained for each model, the SVM classifier yielded the highest accuracy and was used for the implementation. Figure 2 depicts the workflow of a classifier. SVM employs a supervised learning method that requires a pre-classified dataset for training the algorithm. The trained system is then used to classify the data that are input to the system.
Fig. 2 Overview of classifier
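The model-selection step can be sketched with scikit-learn as follows, using TF–IDF features (Sect. 3.6) and the six classifier families named above; the dataset loading and hyper-parameters are assumptions made only for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

def compare_classifiers(texts, labels):
    """texts: cleaned tweets; labels: e.g. 0 = clean, 1 = offensive, 2 = hate/targeted (assumed coding)."""
    X = TfidfVectorizer().fit_transform(texts)
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=42)

    models = {
        "naive_bayes": MultinomialNB(),
        "knn": KNeighborsClassifier(),
        "svm": SVC(kernel="linear"),
        "decision_tree": DecisionTreeClassifier(),
        "random_forest": RandomForestClassifier(),
        "logistic_regression": LogisticRegression(max_iter=1000),
    }
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        print(name, accuracy_score(y_te, pred))
        print(confusion_matrix(y_te, pred))
```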
3.4 Clustering Clustering is an unsupervised learning method that helps in the grouping of similar data; in this work, possibilistic fuzzy c-means clustering (PFCM) has been used. The text data are converted to data points, and the model is applied to the data points to produce the cluster centres, typicality, and membership matrices as output. The objective function of PFCM is as follows:

$$\min_{U, T, V} \; J(Z; U, T, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} \left( a\,u_{ik}^{m} + b\,t_{ik}^{\eta} \right) \left\| z_k - v_i \right\|_A^2 + \sum_{i=1}^{c} \gamma_i \sum_{k=1}^{N} \left( 1 - t_{ik} \right)^{\eta} \qquad (1)$$

where $0 \le u_{ik}, t_{ik} \le 1$, $m, \eta > 1$, $a, b > 0$, and $\gamma_i > 0$. The primary goal is to reduce the value of J. The cluster centres and the membership matrix can be calculated by the following functions:

$$v_i = \frac{\sum_{k=1}^{N} \left( u_{ik}^{m} + t_{ik}^{\eta} \right) z_k}{\sum_{k=1}^{N} \left( u_{ik}^{m} + t_{ik}^{\eta} \right)}, \quad 1 \le i \le c \qquad (2)$$

$$u_{ik} = \left( \sum_{j=1}^{c} \left( \frac{D_{ikA}}{D_{jkA}} \right)^{\frac{2}{m-1}} \right)^{-1}, \quad 1 \le i \le c, \; 1 \le k \le N \qquad (3)$$

Here D_{ikA} denotes the distance between the kth data point z_k and the ith cluster centre. The typicality value t_{ik} is given by

$$t_{ik} = \left( 1 + \left( \frac{b\, D_{ikA}^2}{\gamma_i} \right)^{\frac{1}{\eta-1}} \right)^{-1}, \quad 1 \le i \le c, \; 1 \le k \le N \qquad (4)$$

where

$$v_i = \frac{\sum_{k=1}^{N} \left( a\,u_{ik}^{m} + b\,t_{ik}^{\eta} \right) z_k}{\sum_{k=1}^{N} \left( a\,u_{ik}^{m} + b\,t_{ik}^{\eta} \right)}, \quad 1 \le i \le c \qquad (5)$$
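A compact NumPy sketch of the PFCM update loop defined by Eqs. 2–5 is shown below; the initialization and stopping rule are our own assumptions, and the text data are presumed to be already vectorized into the matrix Z.

```python
import numpy as np

def pfcm(Z, c, m=2.0, eta=2.0, a=1.0, b=1.0, gamma=None, n_iter=100, tol=1e-6):
    """Possibilistic fuzzy c-means on data Z of shape (N, d); returns centres V,
    membership matrix U and typicality matrix T (both of shape (c, N))."""
    N, _ = Z.shape
    rng = np.random.default_rng(0)
    V = Z[rng.choice(N, size=c, replace=False)]          # initial cluster centres
    gamma = np.ones(c) if gamma is None else np.asarray(gamma, float)

    for _ in range(n_iter):
        D2 = ((Z[None, :, :] - V[:, None, :]) ** 2).sum(axis=2) + 1e-12   # squared distances (c, N)

        # Membership update, Eq. (3): u_ik = 1 / sum_j (D_ik / D_jk)^(2/(m-1))
        p = 1.0 / (m - 1.0)
        U = (D2 ** -p) / (D2 ** -p).sum(axis=0, keepdims=True)

        # Typicality update, Eq. (4): t_ik = 1 / (1 + (b * D_ik^2 / gamma_i)^(1/(eta-1)))
        T = 1.0 / (1.0 + (b * D2 / gamma[:, None]) ** (1.0 / (eta - 1.0)))

        # Centre update, Eq. (5): weighted mean with weights a*u^m + b*t^eta
        W = a * U ** m + b * T ** eta
        V_new = (W @ Z) / W.sum(axis=1, keepdims=True)
        if np.linalg.norm(V_new - V) < tol:
            V = V_new
            break
        V = V_new
    return V, U, T
```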
3.5 Neural Network The artificial neural network used for sentiment analysis is a multilayer perceptron (MLP), which performs multi-class classification into positive, negative, and neutral sentiments. With the use of the Adam optimizer, an improvement in accuracy and speed is achieved through an adaptive learning rate. Adaptive moment estimation (ADAM) is a stochastic optimization method that requires only first-order gradients and little memory. It combines the heuristics of AdaGrad [20], which performs well with sparse gradients, and RMSProp, which excels in online and non-stationary settings [21]. The deep learning model used three layers: the first layer, being the input layer, consists of 500 neurons to accommodate the whole sentence and its features and uses the ReLU activation function. The second layer also uses the ReLU activation function and has the same size as the first layer. The third layer is the output layer with 3 neurons for the output classes positive, neutral, and negative; it uses the Softmax activation function. ADAM stores exponentially decaying averages of the past squared gradients and the past gradients, represented as:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t \qquad (6)$$

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \qquad (7)$$

where $v_t$ and $m_t$ are the decaying averages of the past squared gradients and the past gradients; $m_t$ and $v_t$ are estimates of the mean and the uncentred variance of the gradients. During the initial time steps, and with small decay rates, these values are biased towards zero. To eliminate this, bias correction is employed. After the bias correction scheme, $v_t$ can be written as:

$$v_t = (1 - \beta_2) \sum_{i=1}^{t} \beta_2^{\,t-i} \, g_i^2 \qquad (8)$$
Using these estimates to update the parameters produces the Adam update rule:

\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t  (9)

The default value of β₁ is 0.9, β₂ is 0.999, and ε is 10⁻⁸. Empirically, Adam compares favourably with most other optimizers.
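As a sketch of the network described above, the following Keras snippet builds a three-layer MLP with 500-unit ReLU layers and a 3-unit Softmax output, compiled with Adam using the default β₁, β₂, and ε from the text; the loss function, epoch count, and data names are assumptions.

```python
import tensorflow as tf

# Sketch of the described MLP: 500-unit ReLU layers and a 3-class softmax output,
# trained with the Adam optimizer of Eqs. (6)-(9).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(500,)),
    tf.keras.layers.Dense(500, activation="relu"),
    tf.keras.layers.Dense(500, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),   # positive / neutral / negative
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(beta_1=0.9, beta_2=0.999, epsilon=1e-8),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(X_binary, y_onehot, validation_split=0.2, epochs=20)  # data names assumed
```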
3.6 Extraction of Features and Vector Construction In this work, the term frequency–inverse document frequency (TF–IDF) vectorization method has been used, which makes the classification simpler and more efficient [10]. The vectorization is applied to the cleaned data, which eliminates the error of calculating the frequencies of stop-words that are irrelevant for classification. For the offensive language detection and targeted speech analysis, the cleaned data are vectorized and input to the classifier. The TF–IDF performs the calculation using the following parameters. Term Frequency: measures the frequency of occurrence of a word in a document. The TF calculation is done for all the terms in the input using the following formula:

tf_{i,j} = \frac{t_{i,j}}{\sum_{k} t_{k,j}}  (10)

where t_{i,j} is the number of times word i appears in document j and the denominator sums over the k words of the document. Inverse Document Frequency: measures the importance of a word. It is used to scale up rare words and weigh down frequent words. The IDF value is calculated as the log of the number of documents divided by the number of documents that contain the word:

idf(w) = \log \frac{T}{df_t}  (11)

where w is the word whose idf value is calculated, T is the total number of documents, and df_t is the number of documents containing the word. The TF–IDF weight is simply the TF multiplied by the IDF:

w_{i,j} = tf_{i,j} \times \log \frac{T}{df_i}  (12)

where w_{i,j} is the TF–IDF weight and tf_{i,j} is the term frequency. For the MLP used for sentiment analysis, one-hot encoding is performed on the text using Sklearn's MultiLabelBinarizer. The categorical array of data is converted into its binary format for easier processing. The binary array format of
data makes testing and training of the neural network easier; it also improves the efficiency of the network.
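The following scikit-learn sketch illustrates both encodings side by side; the example tweets are illustrative, and the n-gram range follows Sect. 3.2.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MultiLabelBinarizer

cleaned_tweets = ["phones are not that bad", "worst phone ever", "great camera quality"]

# TF-IDF vectors (Eqs. 10-12) for the SVM classifier
tfidf = TfidfVectorizer(ngram_range=(1, 4))      # unigrams up to four-grams, as in Sect. 3.2
X_tfidf = tfidf.fit_transform(cleaned_tweets)

# Binary (one-hot style) token encoding for the MLP, via MultiLabelBinarizer
mlb = MultiLabelBinarizer()
X_binary = mlb.fit_transform([t.split() for t in cleaned_tweets])
print(X_tfidf.shape, X_binary.shape)
```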
3.7 Implementation of the Proposed Model In this paper, we propose a model for crime detection in Twitter using sentiment analysis along with foul language, hate speech, and targeted speech examination. Figure 3 shows the framework of the proposed model. Tweets on various events that involve public interest, latest products, and movie reviews are obtained using the Twitter API. The obtained tweets are preprocessed and saved to a CSV file. The cleaned tweets are extracted from the CSV file, vectorized using the TF–IDF vectorizer, and input to the SVM classification algorithm to calculate the measure of abusive content, hate speech, and targeted speech. The cleaned tweets from the CSV file are then one-hot encoded and input to the trained MLP algorithm for sentiment calculation. The results obtained from both algorithms can be used in the detection of crime and analysis of the nature of crime based on the measures of abusive, hate speech, and targeted speech contents. The following algorithm elucidates the functioning of the proposed model:
Step 1: Start.
Step 2: Tweet retrieval using Twitter API.
Step 3: Perform text preprocessing.
Step 4: Save cleaned text to CSV file.
Step 5: Create copy of tweets from CSV file and perform one-hot encoding.
Fig. 3 Proposed framework
Step 6: Feed the one-hot encoded data as input to the trained network.
Step 7: Store the polarity of sentiment analyzed.
Step 8: Create copy of tweets from CSV file and perform TF–IDF vectorization.
Step 9: Perform clustering on the vectorized data.
Step 10: Apply SVM classification algorithm.
Step 11: Combine and analyze both the measures to produce the result.
Step 12: End.
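A hypothetical glue-code sketch of these steps is shown below; the cleaning step is a placeholder for the preprocessing of Sect. 3.2, and the model and vectorizer objects are assumed to be the components sketched in the previous subsections.

```python
import csv

def analyze_tweets(raw_tweets, tfidf, mlb, svm_model, mlp_model, csv_path="tweets.csv"):
    """Hypothetical orchestration of Steps 2-11 of the proposed model."""
    cleaned = [t.lower() for t in raw_tweets]            # placeholder for the preprocessing of Sect. 3.2
    with open(csv_path, "w", newline="") as f:           # Step 4: save cleaned text to CSV
        csv.writer(f).writerows([[c] for c in cleaned])
    sentiment = mlp_model.predict(mlb.transform([c.split() for c in cleaned]))   # Steps 5-7
    crime = svm_model.predict(tfidf.transform(cleaned))                          # Steps 8-10
    return sentiment, crime                                                      # Step 11: combined downstream
```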
4 Implementation Results The objective of this experiment is to detect law-breaking activities on Twitter based on sentiment analysis using MLP, along with hate speech, targeted speech, and foul language detection using classification algorithms. Different classifiers were scrutinized for their accuracy, recall, precision, and F-score. For the examination of the classifiers, a combined dataset of 55,638 tweets was used. Table 1 shows the partition of the dataset into test and train datasets; Table 2 shows the classes in the dataset. Table 3 contains the results obtained from evaluating all the algorithms mentioned in Sect. 3.3. The results in Table 3 are the average values of the parameters when the classifier was tested with the individual datasets. The accuracy of the system is the measure of classes that are properly recognized; it is calculated as (TP + TN)/(TP + TN + FP + FN), where TP is true positive and TN is true negative. Precision identifies the correct positive classes out of all the classes predicted as positive; it is calculated as TP/(TP + FP), where FP is false positive. Recall is the ratio of correctly classified positive classes against actual positive classes; it is calculated as TP/(TP + FN), where FN is false negative. F-score is the harmonic mean of precision and recall, calculated as 2 × (Precision × Recall)/(Precision + Recall). Considering the accuracy and F-score, SVM was chosen as the classification algorithm to build the system. Table 1 Partitioning of the combined dataset
Dataset type       Number of sentences
Training dataset   44,510
Testing dataset    11,128

Table 2 Classes of the dataset

Class           Support
Hate speech     5691
Offensive       23,667
Non-offensive   9082
Neutral         4405
Table 3 Evaluation metrics of different classifiers

Classifier            Accuracy   Precision   Recall   F-Score
Naïve Bayes           0.880      0.990       0.884    0.934
KNN                   0.876      0.990       0.880    0.932
SVM                   0.884      1           0.884    0.938
Decision Tree         0.828      0.918       0.890    0.904
Random Forest         0.876      0.995       0.879    0.933
Logistic Regression   0.876      0.990       0.882    0.933
The MLP algorithm is trained and tested using 10,000 sentences. Table 4 shows the partition of the dataset into test and train samples. Table 5 represents the different classes available in the dataset; the sentiment dataset has 3 categories: positive, negative, and neutral. The MLP yielded an accuracy of 93.7%. Figure 4 shows the validation accuracy against the epoch. The model was tested and trained by splitting the dataset as 80% for training and 20% for testing. The accuracy of the algorithm could be further improved by increasing the size of the training dataset. Since this work concentrates on both sentiment analysis and several other parameters, the enlargement of the dataset to achieve more accuracy was stopped after it had reached the threshold. The proposed model was tested with live tweets for finding the accuracy of the system. Using the tweet retrieval API mentioned in Sect. 3.1, 700 tweets in English were collected from Twitter between October 4, 2019, and October 13, 2019. These tweets were manually classified for their sentiment and other parameters like hate speech, targeted speech, and foul language. The manually classified tweets were combined with the sentences from the test dataset and input to the proposed model for testing the accuracy. The description of the contents in each test sample is portrayed in Table 6. The results generated by the model for each test sample are given in Table 7. The accuracy of the proposed model is analyzed based on the comparison of the manually classified values with the values generated by the proposed model. The
Table 4 Partitioning of the dataset
Dataset type       Number of comments
Training dataset   49,072
Testing dataset    12,268

Table 5 Sentiment dataset

Class      Support
Negative   21,911
Positive   19,893
Neutral    19,536
Fig. 4 Validation accuracy of MLP training (validation accuracy plotted against epochs 1–20)
comparison between the percentages of text classified manually and the results generated by the proposed model is displayed in the following figures (Figs. 5, 6 and 7). Figure 5 shows the percentage of positive tweets analyzed manually and the predictions made by the proposed model for the test samples. In each of the test samples, the manual calculation is the expected result percentage, and the proposed system value is the percentage obtained as the result from the system. Figure 6 shows the percentage of tweets analyzed for neutral sentiment by manual calculation and by the proposed system. The expected percentage of neutral tweets is mapped against the results. Figure 7 maps the percentage of expected negative tweets against the calculation made by the model. Figure 8 displays the average values of the crime parameters used in our work. The manual classification results are mapped against the results obtained from the proposed system. The average results of the four test samples are displayed in the graph. The classifier results are the polarity values obtained by the classification algorithm. The figures (Figs. 5, 6 and 7) show the accuracy of the proposed model in sentiment analysis and crime parameter detection. The results indicate that the system has attained 91.05% accuracy with the data samples mentioned in Table 6 and the train and test results. The model proposed in this work combines the sentiment polarity with foul language, hate speech, and targeted speech detection, which makes the process of cybercrime identification easier. Consider the scenario of test set 4 from Table 7, where the negative sentiment polarity is 0.272, the highest of all the sets in the table, hate speech is 0.223, targeted speech is 0.19, and the abusive content polarity is 0.21. This infers that, out of all the test sets, set 4 is the most targeted toward a person or community in a negative manner, which leads to the conclusion that set 4 has to be marked as containing cybercrime content. Table 8 shows the comparison of the proposed model with other traditional models based on the accuracy of sentiment analysis.
Table 6 Description of test data (No. of sentences split into live tweets and dataset sentences; sentiment counts are Positive/Negative/Neutral)

Test set No.   Classification scheme   Live tweet   Dataset   Positive   Negative   Neutral   Foul language   Hate speech   Targeted speech   Plain text
1              Manual                  200          100       103        42         155       26              47            76                151
2              Manual                  150          150       93         61         146       31              53            72                144
3              Manual                  250          150       146        92         162       63              51            37                249
4              Manual                  100          100       48         54         98        42              49            38                71
Table 7 Results from proposed model (Positive, Negative, and Neutral columns give the sentiment polarity)

Test set No.   Positive   Negative   Neutral   Foul language detection   Hate speech detection   Targeted speech detection   Neutral text detection
1              0.336      0.141      0.523     0.083                     0.15                    0.026                       0.503
2              0.312      0.214      0.474     0.103                     0.173                   0.22                        0.46
3              0.354      0.252      0.394     0.153                     0.127                   0.091                       0.599
4              0.253      0.272      0.485     0.21                      0.223                   0.19                        0.315
Fig. 5 Positive sentiment result (percentage of positive tweets, proposed system vs. manual calculation, for test samples 1–4)
Fig. 6 Neutral sentiment result (percentage of neutral tweets, proposed system vs. manual calculation, for test samples 1–4)
5 Conclusion and Future Enhancement Our proposed model is an approach intended to improve the measure of cybercrime detection based on the results obtained by combining the sentiment polarity with the polarity values of foul language, hate speech, and targeted speech detection. The purpose of the model is
Fig. 7 Negative sentiment result (percentage of negative tweets, proposed system vs. manual calculation, for test samples 1–4)
Fig. 8 Polarity average of crime parameters (average obtained result vs. expected result for foul language, hate speech, targeted speech, and neutral text detection)
Table 8 Comparison of proposed model with other models
Model                          Accuracy (%)
Random Forest [11]             52.0
ANN [11]                       54.0
Support Vector Machine [11]    56.0
Naïve Bayes [8]                66.6
Logistic Reg. BoL1 [10]        69.0
Naïve Bayes BoW [10]           70.9
Naïve Bayes TF–IDF [10]        73.9
Support Vector Machine [8]     83.3
BiLSTM [16]                    84.7
Proposed model                 91.05
to analyze a sentence for possible cybercrime content, based on the crime parameters and the sentiment associated with the issue. This study presents an extension to the current prima facie approach of detecting cybercrime on social media. By combining sentiment analysis with three further crime detection parameters, we can quantify and qualify the measure of targeted abuse detection and the avenues associated with it. The union of four parameters for detection opens an avenue for improvement over the existing systems, which performed the analysis based on only one of the parameters. Future work will be centered on the improvement of the system's accuracy and execution time. An in-depth study on the integration of the parameters will be initiated to overhaul the detection process in use now. Developing evaluation metrics for different polarity ranges of the parameters will be fundamental for further improvements and evolutions as the study progresses.
References 1. Gupta P, Goel A, Lin J, Sharma A, Wang D, Zadeh R (2013) WTF: the who to follow service at Twitter. In: International world wide web conference committee (IW3C2). Rio de Janeiro, Brazil 2. Li J, Qiu L (2017) A sentiment analysis method of short texts in microblog. In: 2017 IEEE International conference on computational science and engineering (CSE) and IEEE International conference on embedded and ubiquitous computing (EUC). IEEE, Guangzhou, China 3. Mubarak H, Darwish K, Magdy W (2017) Abusive language detection on Arabic social media. In: Proceedings of the first workshop on abusive language online, pp 52–56 4. Dani H, Liu H, Tong H (2017) Sentiment informed cyberbullying detection in social media. Lecture Notes in computer science (including subseries Lecture notes in artificial intelligence and Lecture notes in bioinformatics) 10534 LNAI, pp 52–67 5. Yoo S, Song JI, Jeong OR (2018) Social media contents based sentiment analysis and prediction system. Expert Syst Appl 105:102–111 6. Jindala S, Sharmab K (2018) Intend to analyze social media feeds to detect behavioral trends of individuals to proactively act against social threats. In: International conference on computational intelligence and data science (ICCIDS 2018) Proceedings on Procedia Computer Science, vol 132, pp 218–225. Elsevier 7. Prakruthi V, Sindhu D, Kumar A (2018) Real time sentiment analysis of Twitter posts. In: Proceedings 2018 3rd International conference on computational systems and information technology for sustainable solutions, CSITSS 2018. IEEE, Bengaluru, India, pp 29–34 8. Desai RD (2019) Sentiment analysis of Twitter data. In: 2nd International conference on intelligent computing and control systems (ICICCS 2018), Part Number: CFP18K74-ART 9. Prathap B, Ramesh K (2019) Twitter sentiment for analysing different types of crimes. In: Proceedings on 2018 International conference on communication, computing and internet of things (IC3IoT). IEEE, Chennai, India, pp 438–488 10. Ruwandika NDT, Weerasinghe AR (2019) Identification of hate speech in social media. In: 2018 18th International conference on advances in ICT for emerging regions (ICTer). IEEE, Colombo, Sri Lanka, pp 273–278 11. Mendonça R, Britto D, Rosa F, Reis J, Bonacin R (2020) A framework for detecting intentions of criminal acts in social media: a case study on Twitter. Information 11(3):154–194 12. Curiel R, Cresci S, Muntean C, Bishop SR (2020) Crime and its fear in social media Rafael. Palgrave Commun 6(1):1–12
13. Vo T, Sharma R, Kumar R, Son LH (2020) Crime rate detection using social media of different crime locations and Twitter part-of-speech tagger with Brown clustering. J Intell Fuzzy Syst 38(4):4287–4299 14. Jacob S, Vijayakumar R (2021) Sentimental analysis over twitter data using clustering based machine learning algorithm. J Ambient Intell Hum Comput 1(1):1–12 15. Najjar E, Al-augby S (2021) Sentiment analysis combination in terrorist detection on Twitter: a brief survey of approaches and techniques. In: Kumar R, Quang NH, Kumar Solanki V, Cardona M, Pattnaik PK (eds) Research in intelligent and computing in engineering. Advances in intelligent systems and computing. Springer, Singapore. https://doi.org/10.1007/978-98115-7527-3_23 16. Mahajan R, Mansotra V (2021) Correlating crime and social media: using semantic sentiment analysis. (IJACSA) Int J Adv Comput Sci Appl 12(3):309–316 17. Antigoni-Maria F, Constantinos D, Despoina C, Ilias L, Jeremy B, Gianluca S, Athena V, Sirivianos M, Kourtellis N (2018) Large scale crowdsourcing and characterization of Twitter abusive behavior. In: International AAAI conference on web and social media (ICWSM) 18. DataCite Homepage. https://doi.org/10.5072/FK2/ZDTEMN. Last accessed 19 Sept 2021 19. Roshan F, D’Souza R (2016) Analysis of product Twitter data though opinion mining. In: 2016 IEEE Annual India Conference (INDICON). IEEE, Bangalore, pp 1–5 20. Duchi J, Elad H, Yoram S (2011) Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 12(7):2121–2159 21. Tieleman T, Hinton G (2012) Lecture 6.5-rmsprop: divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning
Grading of Diabetic Retinopathy Using Machine Learning Techniques H. Asha Gnana Priya and J. Anitha
Abstract Diabetic retinopathy (DR) is a vision-threatening eye disease caused by blood vessel damage. Diabetes patients are commonly affected by DR, and early detection is essential to avoid vision loss. The proposed system uses the Indian diabetic retinopathy image dataset (IDRiD) and enhances it using a Partial Differential Equation (PDE). Morphological operations are used to detect lesions like microaneurysms, exudates, and haemorrhages, and the clinical features, including the area of blood vessels and the area and count of the lesions, are extracted. The Grey Level Co-occurrence Matrix (GLCM) is used to extract statistical features. The extracted 7 clinical and 11 statistical features are fed into machine learning classifiers such as feed forward neural network (FFNN), support vector machine (SVM), and k-nearest neural network (KNN) for DR classification with two and five classes. The performance metrics sensitivity, specificity, accuracy, negative predictive value, and positive predictive value are calculated, and the accuracies obtained for two-class classification for FFNN, SVM, and KNN are 95, 95, and 90%. Keywords Diabetic retinopathy · Partial differential equation · Exudates · Microaneurysms · Haemorrhages · Feed forward neural network
1 Introduction The human eye is the most beautiful creation of God; it is affected by diseases such as diabetic retinopathy (DR), Glaucoma, Central Retinal Vein Occlusion (CRVO), Macular Edema (ME), Central Serous Retinopathy (CSR), Choroidal Neo-Vascularization Membrane (CNVM), etc. DR mostly affects people having diabetes, especially the working-age population. This is caused by an increase in blood glucose levels, which can lead H. Asha Gnana Priya (B) · J. Anitha Department of Electronics and Communication Engineering, Karunya Institute of Technology and Sciences, Coimbatore, India e-mail: [email protected] J. Anitha e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_44
to blood vessel damage. Vision once lost cannot be regained, but further vision loss can be controlled [1]. Non-proliferative diabetic retinopathy (NPDR) and proliferative diabetic retinopathy (PDR) are the two different types of DR. NPDR includes mild, moderate, and severe stages [2]. The clinical features for detecting DR are microaneurysms, exudates, haemorrhages, cotton wool spots, and neovascularization. Based on the features present, the images are classified into different stages [3]. At the age of 50, one in 200 people and at the age of 80, one in 10 people are affected by Glaucoma [4]. The neuro-retinal nerve from the brain to the eye is damaged, and it cannot be detected till the final stage, resulting in total blindness, so early diagnosis of Glaucoma is needed to prevent vision loss [5]. The optic disc (OD) region is enhanced and the blood vessels are separated using a low pass unit impulse response filter. Dilation and median filtering operations are used to segment the OD region, and the computing complexity is minimal. The accuracies obtained for these methods are 100, 96.92, 98.99, and 100% using the images from the databases DRIVE, DIRATEDB0, DIRATEB1, and DRIONS [6]. Without using a background mask and blood vessels, the optic disc is automatically detected from colour retinal fundus images; the approach is applied on six public databases and is tested using the circle operator method to obtain good accuracy [7]. Diagnosis in modern ophthalmology is supported by retinal image segmentation, and the morphology of the blood vessels and optic disc is used for diagnosis of various eye diseases. A graph cut method is used for extraction of the retinal vascular tree, and the blood vessel segmentation is used to detect the optic disc location [8]. During the pre-processing stage, different colour channels are combined for segmenting the optic disc from the retinal fundus image. The Canny edge detection method and morphological operations are used to segment the optic disc. This method is used to test the RIM-ONE database, and the accuracy obtained is 94.90% [9]. The segmentation of retinal veins and arteries can be done by mathematical morphological operations and the wavelet transform. The tubular characteristics of blood vessels are explored, and the landmarks for image registration are the optic disc and vascular structure. Without using a manual try-and-error method to select the best parameter, a genetic algorithm is used to sequence the generation cross-over. To overcome the limitations, Dijkstra's shortest path algorithm is used by the graph-based approach [10]. Glaucoma can be diagnosed using measures such as the Cup to Disc Ratio (CDR), Disc Damage Likelihood Scale (DDLS), Inferior Superior Nasal Temporal (ISNT) rule, and Glaucoma Risk Index (GRI). Colour variation from orange to pink is an indication of the disease [11]. The optic disc and cup boundaries are detected using a novel adaptive region-based edge smoothing method. Pixel-level multi-dimension feature space-based detection by a region classification model of the initial optimum object boundary and by minimizing an energy function is obtained with iterative force field calculation with contours. Added value detection and boundaries of other objects are provided in the medical image processing and analysis field [12]. A heuristic algorithm-based hybrid approach is used on the RIM-ONE benchmark retinal database. Pre-processing is done by the Jaya algorithm and distance regularized level set post-processing, and segmentation is done by watershed and Chan-Vese procedures [13].
The cup to disc ratio is calculated as part of the structural diagnosis of the
disease. The optic disc and cup are segmented using the U-Net convolutional neural network. DRIONS-DB, RIM-ONE v.3, and DRISHTI-GS are some of the databases that the methods are compared [14]. Patients with glaucoma have a high cup to disc ratio. The optic disc is segmented using a fuzzy approach, and the texture is evaluated. The cluster method employs optic disc segmentation, which is used to review edge noise and vessels. The pre-post method’s performance was compared and 94% is obtained [15]. A multimodal technique based on spectral domain coherence tomography (SD-OCT) and segmentation of the cup borders, and optic disc is proposed to exploit the complimentary information. The three in region cost functions are designed by three regions corresponding to the cup, rim, and backdrop using a machine learning theoretical graph-based method and a random forest classifier. The cost functions are compared using the cost of a single model region, the cost of a multimodal region, and the cost of a disc boundary. As a result, when it comes to segmenting the optic disc and cup, the multimodel technique outperforms the uni-model approach [16]. High-resolution radial spectral domain optical coherence tomography (SDOCTB) scans centred on the Optic Nerve Head (ONH) are analysed at each clock hour, and the elschnig border tissue is classified for obliqueness and Bruch’s membrane for each scan with customized software optic disc stereo photographs [17]. Joint segmentation of optic disc and cup can be done by fully convolutional networks (FCNs) and feature enhancement does not require complicated pre-processing techniques. Fully convolutional and adversarial networks use the retinal images and the segmentation mapping between them, and 159 images from RIM database is used for the comparison [18]. The features extracted for detecting different stages of DR are given to support vector machine (SVM) with the dataset of 400 retinal images gives sensitivity of 95% and predictive capacity of 94% [19]. Automatic diagnosis method reduces the time and the cost for different stages classification. The localization of the affected lesions on retinal surface is given to the convolutional neural network to classify five stages and the achieved accuracy is 89% [20]. In this paper, the image is enhanced using a Partial Differential Equation (PDE), and the features extracted from the enhanced image are given to the feed forward neural network, which is then compared to a support vector machine and a k-nearest neural network. The performance metrics sensitivity, specificity, accuracy, positive predictive value, and negative predictive value are compared for three classifiers based on different cases and classes.
2 Proposed System The block diagram for grading of DR using computer aided diagnosis (CAD) system is shown in (Fig. 1).
Fig. 1 Block diagram for detecting diabetic retinopathy
Fig. 2 Input retinal fundus image
2.1 Input Image The input retinal fundus image is taken from the Indian diabetic retinopathy image dataset (IDRiD) given by ISBI challenge 2018. It is the first retinal fundus image from India, collected at an eye clinic in Nanded, Maharashtra. This dataset contains 516 images of training and testing sets with 168 normal images and 348 DR images including Mild NPDR, Moderate NPDR, Severe NPDR, and PDR. The DR input retinal image used in the proposed system is shown in (Fig. 2).
2.2 Image Enhancement During the image acquisition process, factors such as media opacity, camera nonalignment, small pupils, camera focusing problems, and noise diminish the quality of the retinal fundus image. Image enhancement is essential in computer-aided diagnosis (CAD) approaches to increase image quality due to the above-mentioned factors affecting illumination, colour, and contrast. Image enhancement reduces the workload of the process and aids in quality improvement. Because the classifier can be slowed by different image sizes, the images are resized to the same size to speed up the process. Feature localization is possible by splitting the RGB image into red, green, and blue channels. The red channel, which has the lowest contrast and is the
Fig. 3 Image enhancement: a resized image, b green channel, c greyscaled image, and d PDE image
brightest colour channel, is used to collect information about the optic disc and blood vessels. This channel is used to detect diseases like glaucoma and is not suitable for detecting DR due to the high level of noise. The green channel contains information on the retinal blood vessels and the clinical factors that are used in DR grading. Because of the low contrast level in the blue channel, it cannot be used to diagnose any disease. To identify the edges or other features and to reduce memory size, the colour image is converted to greyscale. Morphological operations and segmentation problems are simply solved because of this single-layered image. The PDE is applied to the greyscaled image for enhancing the image. The Partial Differential Equation of the smoothing function is given by

\frac{\partial p}{\partial t} = -\nabla^2 \left[ c(\nabla^2 p)\, \nabla^2 p \right]  (1)

where ∇p is the image gradient, c is the diffusion coefficient given by c(\nabla p) = \exp\!\left(-(\nabla p / k)^2\right), and k is the gradient threshold (Fig. 3).
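One possible realization of a diffusion-type PDE of the form in Eq. (1) is sketched below with NumPy and SciPy; the time step, gradient threshold, and iteration count are illustrative choices, not values reported in the paper.

```python
import numpy as np
from scipy.ndimage import laplace

def pde_enhance(img, k=0.1, dt=0.05, iterations=30):
    """Diffusion-type smoothing following Eq. (1): p_t = -lap[ c(lap p) * lap p ]."""
    p = img.astype(float)
    for _ in range(iterations):
        lap = laplace(p)                          # discrete Laplacian of the image
        c = np.exp(-(lap / k) ** 2)               # diffusion coefficient, k = gradient threshold
        p = p - dt * laplace(c * lap)             # explicit Euler update of the PDE
    return p

# Example: smoothed = pde_enhance(green_channel / 255.0)
```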
2.3 Feature Extraction Clinical and statistical features are the extracted features in the proposed system. Clinical features include blood vessel area, count and area of exudates, microaneurysms,
and haemorrhages. For extracting exudates, first optic disc should be identified and removed. The PDE filtered image is inverted and is subtracted from the greyscaled image. The resulting image is binarized and complemented, then subtracted from the PDE image to remove the OD. In the OD removed image, an opening operation with a disc-shaped structuring element is applied, followed by adaptive threshold-based binarization. The binarized image is subjected to Canny edge detection, and the holes are filled. The image with exudates is obtained by subtracting the holes filled image from the Canny image. Microaneurysms is detected from the optic disc removed image. Canny edge detection is applied in the optic disc removed image and holes are filled. The holes filled image is subtracted from the Canny edge detected image for detecting microaneurysms. Blood vessel should be detected and removed for extracting haemorrhages. The steps for haemorrhages detection are given as follows, Adaptive histogram equalization is applied in the PDE image for enhancing the blood vessels. With the equalized image, the average special filter is applied and subtracted. In both the subtracted and PDE images, a grey level threshold is applied. The connected components are removed by area opening in the threshold image from the subtracted image. The threshold image from the PDE image is inverted and subtracted with the area open image to obtain the blood vessel. The haemorrhages are obtained by subtracting the resultant image from the threshold image and filling the holes. The output images of exudates, microaneurysms, and haemorrhages are shown in (Fig. 4). Statistical features are extracted from the enhanced image. Grey Level Co-occurrence Matrix (GLCM) is formulated to obtain statistical features. Number of texture features can be extracted from GLCM. The GLCM features extracted from the filtered image is mean, variance, standard deviation, skewness, kurtosis, energy, entropy, contrast, inverse difference moment, correlation, and homogeneity.
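A sketch of the statistical (GLCM) feature extraction is shown below using scikit-image (graycomatrix/graycoprops in recent releases, greycomatrix/greycoprops in older ones); the distance, angle, and the choice of a subset of the listed features are assumptions for illustration.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_uint8):
    """A subset of the statistical texture features from the grey-level co-occurrence matrix."""
    glcm = graycomatrix(gray_uint8, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)          # expects an 8-bit greyscale image
    feats = {prop: graycoprops(glcm, prop)[0, 0]
             for prop in ("contrast", "homogeneity", "energy", "correlation")}
    feats["mean"] = gray_uint8.mean()                         # first-order statistics
    feats["std"] = gray_uint8.std()
    feats["entropy"] = -np.sum(glcm * np.log2(glcm + 1e-12))
    return feats
```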
2.4 Classifiers Grading of DR in the proposed system uses the machine learning techniques like feed forward neural network (FFNN), support vector machine (SVM) and k-nearest neural network (KNN). The classifier input includes clinical features (7) and statistical features (11). Back propagation neural network is another name for the feed forward neural network. The network’s initial weights are set, and the number of features obtained (18) are assigned as input neurons. The proposed system uses 10 hidden neurons, but this number can vary depending on the process. Target (2) is set in the output layer for bilevel classification. To improve the network until it is trained to perform the task, iterative, recursive, and efficient methods for updating the weights are used. The feed forward neural network is represented in (Fig. 5). The same network with target (5) is used for multi-level classification.
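The following scikit-learn sketch stands in for the three classifiers applied to the 18 extracted features; MLPClassifier with 10 hidden neurons approximates the described backpropagation FFNN, and the random feature matrix and labels are placeholders for the real clinical and statistical features.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Stand-in data: 18 features per image (7 clinical + 11 statistical), labels 0 = normal, 1 = DR.
rng = np.random.default_rng(0)
X = rng.random((200, 18))
y = rng.integers(0, 2, 200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

classifiers = {
    "FFNN": MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000),  # 10 hidden neurons, as described
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, accuracy_score(y_test, clf.predict(X_test)))
```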
Fig. 4 Clinical Feature Detection: a Inverted image subtracted from greyscale image, b binarized image c optic disc removed image, d Canny edge detection with holes filled in binarized image, e exudates detected image, f Canny edge detection with holes filled in optic disc removed image, g microaneurysms detected image, h blood vessels detected image, i background removal from the blood vessel image, and j haemorrhages detected image
Fig. 5 Feed forward neural network diagram
2.5 Performance Metrics Specificity (SP), sensitivity (SE), accuracy (AC), negative predictive value (NPV), and positive predictive value (PPV) are the performance metrics calculated in the proposed system. All the performance metrics are calculated from the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) values. The mathematical equations for all the performance metrics are given below.

AC = \frac{TP + TN}{TP + TN + FP + FN}  (2)

SE = \frac{TP}{TP + FN}  (3)

SP = \frac{TN}{TN + FP}  (4)

PPV = \frac{TP}{TP + FP}  (5)

NPV = \frac{TN}{TN + FN}  (6)
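A small sketch of how Eqs. (2)–(6) can be evaluated per class from a confusion matrix (one-vs-rest, which also covers the multilevel case) is given below; scikit-learn is assumed only for building the confusion matrix, and every class is assumed to appear at least once.

```python
from sklearn.metrics import confusion_matrix

def per_class_metrics(y_true, y_pred):
    """AC, SE, SP, PPV, NPV per class (Eqs. 2-6), one-vs-rest from the confusion matrix."""
    cm = confusion_matrix(y_true, y_pred)
    total = cm.sum()
    out = {}
    for i in range(cm.shape[0]):
        TP = cm[i, i]
        FN = cm[i, :].sum() - TP
        FP = cm[:, i].sum() - TP
        TN = total - TP - FN - FP
        out[i] = {
            "AC": (TP + TN) / total,
            "SE": TP / (TP + FN),
            "SP": TN / (TN + FP),
            "PPV": TP / (TP + FP),
            "NPV": TN / (TN + FN),
        }
    return out
```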
The performance of three different cases with bilevel and multilevel class classification is analysed. Clinical features alone given to the classifier are considered the first case, statistical features alone given to the classifier are considered the second case, and clinical features combined with statistical features are considered the third case. The performance metrics of the bilevel and multilevel classes for all 3 cases are tabulated in Tables 1 and 2. Compared with multilevel classification (normal, mild NPDR, moderate NPDR, severe NPDR, and PDR), bilevel (Normal and DR) classification with all three cases of features as input provides the best performance level. Multilevel DR grading is more complicated than bilevel grading. Improved multilevel grading requires fine-tuning of the machine learning techniques as well as an increased number of features.
3 Conclusion In this paper, the original resized image is enhanced by a Partial Differential Equation to highlight the blood vessels and optic disc. Exudates, microaneurysms, and haemorrhages are the lesions detected. The blood vessel and optic disc must be removed in order to detect the clinical features like area of blood vessels and area and count of the detected lesion. Clinical and statistical features are fed into classifiers with 2 and 5 class classification. The proposed system calculates sensitivity, specificity, accuracy,
Table 1 Bilevel classification (Normal and DR)

Feature extraction         Metric   FFNN   SVM    KNN
Statistical and clinical   AC       95     95     90
                           SE       80     80     100
                           SP       100    100    86.7
                           PPV      100    100    71.4
                           NPV      93.8   93.8   100
Statistical                AC       90     90     85
                           SE       100    80     62.5
                           SP       88.2   93.3   100
                           PPV      60     80     100
                           NPV      100    93.3   80
Clinical                   AC       90     95     85
                           SE       60     83.3   66.6
                           SP       100    100    92.8
                           PPV      100    100    80
                           NPV      88.2   93.3   86.6
Table 2 Multilevel classification (normal, mild NPDR, moderate NPDR, severe NPDR, and PDR)

Feature extraction         Metric   FFNN   SVM    KNN
Statistical and clinical   AC       90     90     85
                           SE       85.7   100    60
                           SP       98.7   86.7   93.3
                           PPV      97.7   71.4   75
                           NPV      96.2   100    87.5
Statistical                AC       85     80     85
                           SE       80     40     100
                           SP       86.7   93.3   80
                           PPV      66.7   66.7   62.5
                           NPV      92.2   82.4   100
Clinical                   AC       85     85     80
                           SE       80     66.6   57.1
                           SP       86.7   82.8   92.8
                           PPV      66.7   80     80
                           NPV      92.9   86.6   86.6
positive predictive rate, and negative predictive rate for two different classes with different feature numbers, and the accuracy for feed forward neural network is 95% when compared to support vector machine and k-nearest neural network. Increasing the number of images in the dataset, fine-tuning the extraction of clinical features and implementing deep neural networks can help in improving the performance of the network.
References 1. Acharya U, Kim C, Ng E, Tamura T (2009) Computer-based detection of diabetes retinopathy stages using digital fundus images. J Eng Med 223(5):545–553 2. Saranya P, Umamaheswari, Sivaram, Jain, Bagchi (2021) Classification of different stages of diabetic retinopathy using convolutional neural networks. In: 2nd International proceedings on computation, automation and knowledge management (ICCAKM). IEEE, Dubai, pp 59–64 3. Wang, Lu, Wang, Chen (2018) Diabetic retinopathy stage classification using convolutional neural networks. In: International proceedings on information reuse and integration (IRI). IEEE, USA, pp. 465–471 4. Anitha J, Priyadharshini M (2014) A region growing method of optic disc segmentation in retinal images. In: International proceedings on electronics and communication systems. IEEE, Coimbatore, pp 1–5 5. Dey, Roy, Das, Chaudhuri (2012) Optic cup to disc ratio measurement for glaucoma diagnosis using harris corner. In: 3rd International proceedings on computing, communication, and networking technologies. IEEE, Coimbatore, pp 1–5 6. Bharkad S (2017) Automatic segmentation of optic disc in color fundus retinal images. Biomed Signal Process Control 31:483–498 7. Rezia, Nahid M (2018) Automatic detection of optic disc in color fundus retinal images using circle operator. Biomed Signal Process Control 45:274–283 8. Kulkarni, Annadate (2017) Optic disc segmentation using graph cut technique. In: 3rd international proceedings on sensing, signal processing and security. IEEE, Chennai, pp 124–127 9. Pal, Chatterjee (2017) Mathematical morphology aided optic disk segmentation from retinal images. In: 3rd international proceedings on condition assessment techniques in electrical systems. IEEE, Rupnagar, pp 380–385 10. Rodrigues, Marangoni (2017) Segmentation of optic disc and blood vessels in retinal images using wavelets, mathematical morphology and Hessian-based multi-scale filtering. Biomed Signal Process Control 36:39–49 11. Thakur, Juneja (2018) Survey on segmentation and classification approaches of optic cup and optic disc for diagnosis of glaucoma. Biomed Signal Process Control 42:162–189 12. Haleem, Han, Hemert, Li, Fleming, Pasquale, Song (2018) A novel adaptive deformable model for automated optic disc and cup segmentation to aid glaucoma diagnosis. J Med Syst 42(1):20 13. Shree, Revanth K, Raja N, Rajinikanth V (2018) A hybrid image processing approach to examine abnormality in retinal optic disc. Procedia Comput Sci 125:157–164 14. Sevastopolsky, Artem (2017) Optic disc and cup segmentation methods for glaucoma detection with modification of U-Net convolutional neural network. Pattern Recogn Image Anal 27(3):618–624 15. Sun, Kuan, Hanhui (2015) Optic disc segmentation by ballon snake with texture from color fundus images. J Biomed Imaging 4:1–14 16. Miri, Abramoff, Lee, Niemeijer, Wang, Kwon, Garvin (2015) Multimodal segmentation of optic disc and cup from SD-OCT and color fundus photographs using a machine-learning graph-based approach. IEEE Trans Med Imaging 34(9):1854–1866
17. Reis, Sharpe, Yang, Nicolela, Burgoyne, Chauhan (2012) Optic disc margin anatomy in patients with glaucoma and normal controls with spectral domain optical coherence tomography. Ophthalmology 119(4):738–747 18. Shankaranarayana, Ram, Mitra, Sivaprakasam (2017) Joint optic disc and cup segmentation using fully convolutional and adversarial networks. In: International proceedings on fetal, infant and ophthalmic medical image analysis. Springer, Cham, pp 168–176 19. Enrique, Gonzalez, Carrera (2017) Automated detection of diabetic retinopathy using SVM. In: 14th international proceedings on electronics, electrical engineering and computing. IEEE, Peru, pp 1–4 20. Alyoubi, Abulkhair, Shalash (2021) Diabetic retinopathy fundus image classification and lesions localization system using deep learning. Sensors 21(11):1–22
Cognitive Radio Networks Implementation for Optimum Spectrum Utilization Through Cascade Forward and Elman Backpropagation Neural Networks Rahul Gupta and P. C. Gupta Abstract The wireless network users were not able to utilize the radio frequency spectrum efficiently, which wasted a lot of bandwidth of the spectrum and resulted in the underutilization of the spectrum. The cognitive radio network (CRN) emerged as a potential solution for efficient utilization of the spectrum. CRN is an intelligent wireless radio frequency network, which consists of two kinds of users: primary users, which uses the license band of the spectrum allocated to them and secondary users, which uses the license band of the primary users when they are not using it. The primary users when want to use their license frequency band preempt the secondary users. Secondary users detect the presence of primary users with the help of various spectrum sensing techniques. When the CRN is implemented with artificial neural network (ANN), it enhances the capability of the secondary users to efficiently sense the spectrum. It learns the parameters of spectrum environment and take decision to detect the presence of primary users. This article discusses various aspects of the cognitive radio networks and its implementation by using artificial neural networks. In this article, the CRN simulation model is developed, which implements the partial cooperative spectrum sensing technique. The CRN simulation model uses the cascade forward backpropagation ANN and Elman backpropagation ANN along with various training functions to enhance the sensing, learning, and decision-making capabilities of the secondary users. Keywords ANN (Artificial Neural Networks) · CRN (Cognitive Radio Networks) · CFBP (Cascade Forward Backpropagation) · RF (Radio Frequency) · RP (Resilient Backpropagation) · PU (Primary Users) · SU (Secondary Users)
R. Gupta (B) · P. C. Gupta Department of Computer Science and Informatics, University of Kota, Kota, India e-mail: [email protected] P. C. Gupta e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_45
1 Introduction Wireless radio networks suffer from the issue of spectrum underutilization. A lot of bandwidth remains unutilized in wireless networks due to slow speed of its devices, protocols, and architecture. Nadine et al. [1] suggested that cognitive radio networks [2] are intelligent wireless radio networks, which utilize the spectrum efficiently due to the learning capabilities of its users. CRN has two kinds of users: primary users, which uses the license band of the spectrum allocated to them and secondary users, which uses the license band of PU when they are not using it. The SU of CRN has sensing and learning capabilities to detect the presence of PU in a particular channel of the spectrum. Figure 1 shows that the primary users are connected with other users via primary base station, and secondary users are connected with other users via cognitive radio (CR) base station as discussed by Shaat and Badar [3]. The IEEE gave 802.22 standard to CRN [4] The major advantage of CRN is that they could coexist with the current RF spectrum like 3G, 4G, and with the upcoming spectrums like 5G, 6G, and 7G. These capabilities motivated the researchers like Beibei and Liu [5] to understand the network protocols, architecture, devices, and communication mechanism of CRN. Enhancement of sensing and learning capabilities of secondary users became a challenge
Fig. 1 Cognitive radio networks [3]
in front of researchers. To detect the presence of PU in a particular channel of the spectrum, the researchers Shewangi and Garg [6], Zeng et al. [7] analyzed various spectrum sensing techniques categorized into non-cooperative and cooperative spectrum sensing. In non-cooperative spectrum sensing techniques like energy detection, spectrum hole prediction, match filter detection presented by kockaya and Develi [8], cyclo-stationary feature detection by Aparna and Jayasheela [9] etc., the SU does not require to share their sensing information with each other. Lu et al. [10] and Ian et al. [11] presented that in cooperative spectrum sensing technique the SU share their sensing information with each other to detect the presence of PU in the frequency band of the spectrum. The researchers Wang et al. [12] categorized the cooperative spectrum sensing as centralized, partial, and distributive cooperative spectrum sensing. In the centralized cooperative spectrum sensing, there is a controlling device called fusion center, which collects the information from local secondary users and detects the presence and absence of PU. In the partial cooperative spectrum sensing, the local SU detects the presence or absence of PU and sends the decision to FC, which informs the other SU. The fusion center is not required in distributive CSS. The researchers Mohammadi et al. [13] worked on and proposed the implementation of artificial neural networks in CRN to enhance the sensing, learning, and decisionmaking capabilities of the SU. In this article, the authors created a CRN simulation model, which implements the partial cooperative spectrum sensing. CRN simulation uses cascade forward backpropagation and Elman backpropagation artificial neural networks along with various training functions like Bayesian regularization (BR) and resilient backpropagation (RP). SU of this CRN simulation model could efficiently detect the presence of PU in a particular channel of the spectrum. This article is divided into various sections. Section 2 discusses cooperative spectrum sensing and its types. Section 3 discusses cascade forward backpropagation and Elman backpropagation networks. Section 4 discusses CRN simulation model designed and developed which uses the cascade forward backpropagation and Elman backpropagation ANN for the optimum performance. Section 5 discusses the data collection, working, and feature extraction. Section 6 discusses the experiment performed, results and graphs obtained after successful execution of the CRN simulation model using cascade forward backpropagation and Elman backpropagation ANN with various training functions. Section 7 discusses the limitations of the proposed work. Section 8 concludes the article.
2 Cooperative Spectrum Sensing In the cooperative spectrum sensing, the secondary users share the local spectrum sensing information with each other for better detection of primary user in the spectrum. There is a central station called fusion center to control the process of CSS, which is executed in three steps: local sensing, reporting, and data fusion. In the first step, the FC selects particular channels in the spectrum and asks all secondary users to perform local sensing. In the second step, the local SU performs sensing
by receiving the information of energy received from the signal of the primary user. They also obtain the information about the energy of signals of PU from the other secondary users. They combine the information and forward the sensing result to fusion center through the reporting channel. In the third step, the fusion center collects the information from all the local secondary users, combines them and detects whether the primary user is present or not in the particular channel of the spectrum. It sends the result back to all the secondary users. The CSS is classified as centralized cooperative spectrum sensing, partial cooperative spectrum sensing, and distributive cooperative spectrum sensing.
2.1 Centralized Cooperative Spectrum Sensing In this technique, the secondary users sense the spectrum and obtain the information of energy from the signal of primary user and combines the information with the information received from other local secondary users and sends the combined information to the fusion center. The fusion center collects the information from all the SU, combines the information, and detects the presence or absence of the PU in a particular channel.
2.2 Partial Cooperative Spectrum Sensing In this technique, the secondary user independently senses the channel, detects the presence of PU, and sends the information to the fusion center. This reduces load on the fusion center. The fusion center informs all the other local SU.
2.3 Distributive Cooperative Spectrum Sensing In this spectrum sensing, there is no need of the fusion center as the secondary users share their information with each other and could select which part of the spectrum they want to use. This sensing does not require any backbone infrastructure.
3 Cascade Forward Backpropagation and Elman Backpropagation ANN In this article, the cascade forward and Elman backpropagation neural networks are used along with various learning functions. Johnson [14, 15] and Al- Masr [16]
discussed that backpropagation is a method to fine-tune the weights of the neural network based on the error rate obtained in the previous iteration (epoch). The error is obtained by calculating the difference between the desired output and the actual output obtained. Proper tuning of the weights reduces the error rate, thus making the network more reliable. Backpropagation calculates the gradient (derivative) of the error with respect to the weights in the neural network. The feed forward backpropagation has two passes, a forward pass and a backward pass.
(a) Forward Pass: In the forward pass, the input arrives at the input layer. The input layer is connected to the hidden layer, and every connection link has associated weights, which are randomly initialized. The activation function calculates the output of the input layer nodes, which acts as the input to the hidden layer nodes. The activation function F calculates the output of every node by using the weighted sum of inputs and the bias:

F = b + \sum_{i} W_i X_i  (1)

When the output of every input layer node is calculated, the output of the nodes of the hidden layer is calculated, which becomes the input for the output layer to get the actual output using the activation function.
(b) Backward Pass: When the output is obtained, the error is calculated by matching the actual output obtained with the desired output:

E = \frac{1}{2}\,(\text{target output} - \text{actual output})^2  (2)

The gradient (partial derivative) of the error is calculated with respect to every weight:

\frac{\partial E}{\partial W_i} = \frac{\partial E}{\partial O} \cdot \frac{\partial O}{\partial F} \cdot \frac{\partial F}{\partial W_i}  (3)

The weights are updated using the equation

W_i(\text{new}) = W_i(\text{old}) - \alpha \frac{\partial E}{\partial W_i}  (4)

where α is the learning rate. The weights of every node of the hidden layer are updated, and then the information is passed backward to the input layer, and the weights of every node of the input layer are updated. The process continues till we get a minimum gradient ∂E/∂W so that the error is minimized and the actual output is equal to the desired or theoretical output.
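A compact NumPy illustration of one forward and backward pass with the update rule of Eq. (4) is shown below; the sigmoid activation, single hidden layer, and squared-error loss are assumptions chosen to keep the sketch small.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, target, W1, b1, W2, b2, alpha=0.1):
    """One forward/backward pass with the update rule W <- W - alpha * dE/dW (Eq. 4)."""
    h = sigmoid(W1 @ x + b1)                  # forward pass: hidden activations
    o = sigmoid(W2 @ h + b2)                  # forward pass: actual output
    err = o - target                          # from E = 1/2 (target - output)^2 (Eq. 2)
    delta_o = err * o * (1 - o)               # chain rule at the output layer (Eq. 3)
    delta_h = (W2.T @ delta_o) * h * (1 - h)  # error propagated back to the hidden layer
    W2 -= alpha * np.outer(delta_o, h)        # Eq. 4 applied to output-layer weights
    b2 -= alpha * delta_o
    W1 -= alpha * np.outer(delta_h, x)        # Eq. 4 applied to input-layer weights
    b1 -= alpha * delta_h
    return W1, b1, W2, b2
```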
Fig. 2 Cascade forward backpropagation ANN
3.1 Cascade Forward Backpropagation ANN The researchers Narada and Chavnav [17] presented that cascade forward neural network [18] is ANN which is similar to feed forward ANN, but it includes the connection from input layer and every previous layer with the following hidden and output layers. The main advantage is that each layer is connected with every previous layer thus making the network more efficient in learning complicated relationships. It uses the backpropagation algorithm for updating the weights. Figure 2 shows a cascade forward backpropagation neural network containing one input layer, two hidden layers, and one output layer. The input layer and every previous layer is connected with the following hidden and output layers. Both the hidden layers have 5 neurons each.
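As a sketch of the cascade-forward topology of Fig. 2 (the input and every earlier layer also feed each later layer), the following Keras functional-API snippet can be used; the input width, activations, and loss are assumptions, and it does not reproduce the MATLAB training functions discussed later.

```python
import tensorflow as tf

n_inputs = 3  # e.g., energy, decision, and bias as in Sect. 5 (assumed here)

# Cascade-forward topology: the input and every previous layer feed each later layer.
inp = tf.keras.Input(shape=(n_inputs,))
h1 = tf.keras.layers.Dense(5, activation="tanh")(inp)
h2 = tf.keras.layers.Dense(5, activation="tanh")(tf.keras.layers.Concatenate()([inp, h1]))
out = tf.keras.layers.Dense(1)(tf.keras.layers.Concatenate()([inp, h1, h2]))

model = tf.keras.Model(inp, out)
model.compile(optimizer="adam", loss="mse")   # backpropagation-based training
```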
3.2 Elman Backpropagation ANN Elman [19] proposed Elman neural network which in addition to the input layer, one or more hidden layers and output layer, consists of an additional layer called context or undertake layer as presented by Mohamed et al. [20]. The context layer input comes from the output of hidden layers. The context layer stores the output of hidden layer which could be used as input to the hidden layer next time. This ANN takes the feedback from output of every hidden layer as input to every corresponding layer which helps in detecting, creating, and modifying patterns every time. The advantage of such ANN is that it is capable to understand the complex relation between future values and past values even if it has to learn them and generalize. Figure 3 shows an Elman backpropagation neural network containing one input layer, two context layers, and one output layer. Both the context layers have 5 neurons each.
Fig. 3 Elman backpropagation ANN
4 Simulation Model This section discusses CRN simulation model, which is trained with the help of cascade forward backpropagation and Elman backpropagation neural networks using various training functions. The different training functions for cascade forward backpropagation ANN and Elman backpropagation ANN are shown in Tables 1 and 2, respectively.
4.1 CRN Simulation Model Gupta and Gupta [21] and Dongy et al. [22] suggested that MATLAB is best suited to code cognitive radio networks; therefore, the CRN simulation model has been designed and coded in MATLAB [23–25] for performing experiments such that the SU could detect the presence of PU using the partial cooperative spectrum sensing. In the CRN simulation model, the spectrum has the frequency range of 54–698 MHz, which is divided into 43 channels. The PU has the license to use any of these channels as per requirement. The PU is connected to the frequency channel via an Additive White Gaussian Noise (AWGN) channel. The SU is connected with the fusion center and with the frequency channel via the AWGN channel. The SU detects
Trainbr
Trainlm
Trainrp
Trainoss
Table 2 Training functions for Elman backpropagation ANN Training function name
Bayesian regularization (BR)
Levenberg–Marquardt (LM)
Training function
Trainbr
TRAINLM
648
R. Gupta and P. C. Gupta
Fig. 4 CRN simulation model
which channels are used by the PU and which channels are available. As per requirement, up to 8 SU could be connected in the same frequency band. The SU is also connected with the fusion center as shown in Fig. 4.
4.2 Cascade Forward Backpropagation and Elman Backpropagation ANN For the training of the CRN simulation model, cascade forward ANN and Elman ANN has been designed. Both the ANN uses backpropagation technique along with various training functions. Considering the complexity of the relation of the input and output data, the cascade forward backpropagation ANN has one input layer, 3 hidden layers, and one output layer. The number of neurons in first, second, and third hidden layers is 10, 10, and 5, respectively, as shown in Fig. 5. The Elman backpropagation ANN, being feedback ANN, required one input layer, one hidden layer, one context layer, and one output layer. The number of neurons in the hidden layer is 10. The context layer has 5 neurons as shown in Fig. 6.
Fig. 5 Cascade forward backpropagation ANN in CRN Simulation Model
Fig. 6 Elman backpropagation ANN in CRN Simulation Model
5 Data Collection and Feature Extraction In a particular trial run, 8 SU are connected with the fusion center and also with the frequency band of the spectrum. The PU of the CRN is also connected with the frequency band of the spectrum. The sample data collected for a particular trial run of the CRN simulation model are shown in various tables. In the sample run, the data of the energy received by the SU at 8 different channels are taken. Table 3 shows the values of energy of the signals of the PU received by the SU along with the decision parameters and the bias. The SU sends the energy, decision parameter, and bias as the input to the cascade forward backpropagation ANN and the Elman backpropagation ANN. The CRN simulation model uses both the ANN separately to generate different outputs. Table 3 shows that when the energy received by the SU is greater than a particular threshold, value of decision parameter = 1, else its value = −1. Table 3 Energy received, decision parameter, and the bias taken as input Channel
Channel    1         2         3         4         5         6         7         8
Energy     1.254706  1.605816  6.749677  1.110689  7.560217  0.648645  0.931396  0.511277
Decision   −1        −1        1         −1        1         −1        −1        −1
Bias       2         2         2         2         2         2         2         2
The cascade forward backpropagation ANN was trained with five different training functions. Table 4 shows the final decision parameter, which is the output generated by the CRN simulation model using the cascade forward backpropagation ANN with these five training functions. The SU detects the presence of the PU when the value of the final decision parameter is greater than zero in a particular channel. For example, in Table 4, channel 3 and channel 5 have values greater than zero, which clearly indicates that the SU detected the presence of the PU in these channels with the help of the cascade forward backpropagation neural network trained with the five training functions. In the remaining channels, the output shows negative values, which indicates that the SU did not detect the presence of the PU and that these channels are available for data communication of the SU devices. The Elman backpropagation ANN was trained with two different training functions. Table 5 shows the final decision parameter, which is the output generated by the CRN simulation model using the Elman backpropagation ANN with these two training functions. Again, channel 3 and channel 5 have values greater than zero, indicating that the SU detected the presence of the PU in these channels, while the remaining channels show negative values and are therefore available for data communication of the SU devices.
During a particular sample run, one primary user occupied some channels of the spectrum. The number of secondary users was 8, and they detect the presence or absence of the PU in a particular channel. The energy received by the SU from the signal of the primary user in eight different channels is shown in Table 3. When the energy received is greater than a particular threshold, the value of the decision parameter is 1; otherwise, it is −1. The bias is 2 for the activation functions of both ANNs. Both ANNs take the energy, decision, and bias as input and generate the final decision parameter as the output using the various training functions, as shown in Tables 4 and 5. The CRN simulation model calculates the final decision parameter (fd) using the formula:

fd = sim(ANN_Name, [er(j, k), dc(j, k), b(j, k)]) + N − 1.15     (5)
Here, er(j, k) is the energy received by the jth user in the kth channel, dc(j, k) is the decision value, and b(j, k) is the bias. ANN_Name is replaced by the name of the trained ANN together with its training function. For example, if the cascade forward backpropagation ANN is trained with the Bayesian regularization training function, Eq. 5 becomes:

fd = sim(cascadeforwardbpbr1, [er(j, k), dc(j, k), b(j, k)]) + N − 1.15     (6)
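As a minimal illustration of this decision logic (the paper evaluates the trained networks with MATLAB's sim function), the sketch below assumes a generic trained model exposing a predict method; the threshold value and the noise term are assumptions, while the thresholding rule and the offset of −1.15 come from Eq. 5.

```python
import numpy as np

def decision_parameter(energy, threshold):
    """Per the text: decision = 1 if the received energy exceeds the threshold,
    otherwise -1."""
    return np.where(energy > threshold, 1, -1)

def final_decision(model, energy, decision, bias, noise):
    """Final decision parameter fd = model output on [energy, decision, bias]
    plus a noise term, minus 1.15 (mirroring Eq. 5); `model` is assumed to
    expose a predict() method."""
    features = np.column_stack([energy, decision, bias])
    return model.predict(features).ravel() + noise - 1.15

# Example with the Table 3 inputs (the threshold of 3.0 is an assumed value):
energy = np.array([1.254706, 1.605816, 6.749677, 1.110689,
                   7.560217, 0.648645, 0.931396, 0.511277])
decision = decision_parameter(energy, threshold=3.0)  # channels 3 and 5 give 1
bias = np.full(8, 2)
```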
Table 4 Output of cascade forward backpropagation ANN with 5 training functions: the final decision parameter for channels 1–8 as produced theoretically and by the networks trained with Trainbfg, Trainbr, Trainlm, Trainrp, and Trainoss. Channels 3 and 5 take positive values (approximately 1.90 and 2.71, respectively), while channels 1, 2, 4, 6, 7, and 8 take negative values (approximately −6.10, −6.46, −5.96, −5.50, −5.78, and −5.36, respectively); the outputs of all five trained networks lie very close to the corresponding theoretical values.
Table 5 Output for Elman backpropagation ANN with 2 training functions: the final decision parameter for channels 1–8 as produced theoretically and by the networks trained with Trainlm and Trainbr. Channels 3 and 5 take positive values (approximately 1.90 and 2.71), the remaining channels take negative values (approximately −6.10, −6.46, −5.96, −5.50, −5.78, and −5.36), and both trained networks match the theoretical values almost exactly.
Similarly, if the Elman backpropagation ANN is trained with the Levenberg–Marquardt training function, then Eq. 5 becomes:

fd = sim(elmanbplm, [er(j, k), dc(j, k), b(j, k)]) + N − 1.15     (7)
6 Experiments and Results
In a particular sample run of the CRN simulation model, the 8 secondary users are connected with the fusion center and the frequency band. The primary user, which has the allocated license, is connected with the frequency band. The PU signals are sent through the additive white Gaussian noise (AWGN) channel [26] so that the signal of the primary user combines with the noise. The sampling frequency for the primary user signal is 1400 MHz. The cosine function discussed by Gupta and Gupta [27] was used to generate the signals of the secondary users:

x = cos(2*pi*1000*t)     (8)
The following amplitude modulation function [28, 29] was used to plot the power spectral density of the primary user and the secondary users:

PUS = ammod(x, PUF, Fs)     (9)
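The paper generates these signals with MATLAB's cos and ammod functions; the NumPy sketch below mirrors the same two operations (a 1 kHz cosine and double-sideband amplitude modulation onto a carrier) purely for illustration. The signal duration and the exact carrier frequency are assumptions.

```python
import numpy as np

Fs = 1400e6                      # sampling frequency from the text (1400 MHz)
t = np.arange(0, 1e-3, 1 / Fs)   # 1 ms of samples (duration is an assumed value)

# Secondary-user baseband signal, as in Eq. (8): x = cos(2*pi*1000*t)
x = np.cos(2 * np.pi * 1000 * t)

# Amplitude modulation onto the primary-user carrier, mirroring Eq. (9),
# PUS = ammod(x, PUF, Fs): suppressed-carrier DSB AM is x * cos(2*pi*Fc*t).
PUF = 475e6                      # carrier inside the 450-500 MHz slot (assumed value)
PUS = x * np.cos(2 * np.pi * PUF * t)
```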
As shown in Fig. 7a, the PU is active and has occupied a slot in the frequency range of around 450–500 MHz. The power spectral density graph shows that the power of the PU signal peaks in this frequency range. The energy received from the PU signals is detected by the SU of the CRN simulation model using the cascade forward backpropagation ANN and the Elman ANN. The energy detection graph of Fig. 7b shows that there were two instances when the energy of the PU signal received by the local SU was greater than the particular threshold, which raised the decision parameter to 1, and therefore the SU was able to detect the presence of the PU. Table 3 of Sect. 5 shows that in channel 3 and channel 5 the SU was able to detect the presence of the PU, which raised the decision parameter to 1. In the remaining channels, the SU detected the absence of the PU, and hence these channels were available for communication by the SU. Figure 7c shows the presence of the primary user with a red X; this graph shows the positions of the PU and the SUs when the SU detects the presence of the PU. To detect the presence or absence of the PU by the SU, the CRN simulation model used the final decision parameter (fd), which was calculated using the cascade forward backpropagation ANN with the BFGS Quasi-Newton (BFG) training function on the basis of Eq. 10:

fd = sim(cascadeforwardbpbfg1, [er(j, k), dc(j, k), b(j, k)]) + N − 1.15     (10)
Fig. 7 a Slot occupied by PU in license band, b energy decision graph when SU detected presence of PU. c Secondary users when detected presence of PU
Similarly, for the remaining four training functions shown in Table 6, the ANN_NAME cascadeforwardbpbfg1 can be replaced with the other ANN_NAMEs of Table 6. The SU of the CRN simulation model receives the energy from the signals of the PU. The SU sends the energy, decision parameter, and bias as the input to the CRN simulation model using the cascade forward backpropagation ANN trained with the various training functions shown in Table 6. The R values shown in Table 6 indicate that the best value, R = 1, for the cascade forward backpropagation ANN is obtained with the BFG and BR training functions, which shows that the ANN is perfectly trained. The ANN trained with the LM and RP training functions has R = 0.99999, an excellent value showing that the ANN has been very well trained. The OSS training function yields R = 0.99998, a very good value that also shows the ANN has been well trained. In general, the results obtained by any trained ANN with R = 0.9 or greater are acceptable, and such an ANN can be applied to complex problems. The OSS-trained cascade forward ANN has the lowest value, R = 0.99998, which is sufficient to say that the ANN has been very well trained, and the output generated by the CRN simulation model using the cascade forward backpropagation ANN with any of these training functions will match the theoretical output, as desired.
Table 6 Coefficient of Correlation (R) value of cascade forward backpropagation ANN trained with different training functions
S. No.  ANN_NAME              Training function  Training function name          (R) value  No. of epochs (iterations) for minimum gradient
1       Cascadeforwardbpbfg1  Trainbfg           BFGS Quasi-Newton (BFG)         1.0        5668
2       Cascadeforwardbpbr2   Trainbr            Bayesian Regularization (BR)    1.0        8332
3       Cascadeforwardbplm1   Trainlm            Levenberg–Marquardt (LM)        0.99999    2999
4       Cascadeforwardbprp1   Trainrp            Resilient Backpropagation (RP)  0.99999    4553
5       Cascadeforwardbposs1  Trainoss           One Step Secant (OSS)           0.99998    3699
In the CRN simulation model, the SU also detects the presence of the PU by calculating the value of the final decision parameter fd using the Elman backpropagation model with the LM training function, on the basis of Eq. 11:

fd = sim(elmanbplm, [er(j, k), dc(j, k), b(j, k)]) + N − 1.15     (11)
Similarly, for the other training function in Table 7, the ANN_NAME elmanbplm is replaced with the other ANN_NAME. The SU also sends the energy, decision parameter, and bias as input to the CRN simulation model using the Elman backpropagation ANN trained with the two training functions shown in Table 7. The R values in Table 7 show that the best value, R = 1, for the Elman backpropagation ANN is obtained with both the LM and BR training functions, which indicates that the ANN is perfectly trained.
Table 7 Coefficient of Correlation (R) value of Elman backpropagation ANN trained with different training functions
S. No.  ANN_NAME   Training function  Training function name        (R) value  No. of epochs (iterations) for minimum gradient
1       Elmanbplm  Trainlm            Levenberg–Marquardt (LM)      1.0        7942
2       Elmanbpbr  Trainbr            Bayesian regularization (BR)  1.0        5175
The output generated by the CRN simulation model using the Elman backpropagation ANN with either of the two training functions will therefore match the theoretical output, as desired. The comparison of the actual output of the final decision parameter (fd) with the theoretical output of the CRN simulation model after applying the cascade forward backpropagation ANN with the five training functions is shown in Fig. 8a–c. Figure 8a shows the output generated by the CRN simulation model using the cascade forward backpropagation ANN with the training functions BFG and BR. The points fall on a straight line at an angle of 45 degrees, which suggests that the theoretical output and the actual output are exactly the same, as desired. Figure 8b shows the output generated using the cascade forward backpropagation ANN with the training functions LM and RP. The 45-degree line is slightly thicker, which suggests that the theoretical output and the actual output are nearly the same. Similarly, Fig. 8c shows the output generated using the cascade forward backpropagation ANN with the training function OSS; the graph suggests that the theoretical output is nearly the same as the actual output. Likewise, the comparison of the actual output of fd with the theoretical output after applying the Elman backpropagation ANN with its two training functions is shown in Fig. 8d, where the points again fall on a straight line at an angle of 45 degrees, which suggests that the theoretical output and the actual output are exactly the same, as desired.
7 Limitations of the Proposed CRN Simulation Model
The CRN simulation model was implemented using the cascade forward backpropagation ANN and the Elman backpropagation ANN, which showed very effective results as the SU was easily able to detect the presence of the PU in a particular RF channel of the spectrum. This section discusses the limitations of the proposed CRN simulation model and the scope for improvement on which the authors are working in their further research.
(1) The article focuses on cascade forward backpropagation and Elman backpropagation only. There are other types of artificial neural networks, such as feed forward backpropagation, layer recurrent ANN, and NARX ANN, with which the CRN simulation model could be implemented. The authors are working on an implementation of the CRN using the above-mentioned ANNs.
(2) The CRN simulation model was implemented using the Elman backpropagation neural network with only the two training functions LM and BR, because MATLAB supports only these two training functions for the Elman backpropagation neural network.
Fig. 8 a Actual output versus theoretical output for the cascade forward backpropagation ANN with the training functions BFG, BR. b Actual output versus theoretical output for the cascade forward backpropagation ANN with the training functions LM, RP. c Actual output versus theoretical output for cascade forward backpropagation ANN with the training function OSS. d Actual output versus theoretical output for Elman backpropagation ANN with the training functions LM, BR
(3) An R value of 1.0 was difficult to obtain when the cascade forward backpropagation ANN was trained with the LM, RP, and OSS training functions. The best R values obtained were 0.99999 for the LM and RP training functions and 0.99998 for the OSS training function, which are excellent and highly acceptable values.
(4) The CRN simulation model was coded in MATLAB with a maximum of 8 secondary users.
8 Conclusion
This article discussed cognitive radio networks, their various aspects, their importance in the efficient utilization of the spectrum, and the various types of cooperative spectrum sensing used by secondary users of the CRN. The secondary users of the
CRN could enhance their sensing, learning, and decision-making skills by using artificial neural networks. In this article, a CRN simulation model was designed and developed to implement partial cooperative spectrum sensing. The secondary users of the CRN simulation model detected the presence of the primary user with the help of two specifically designed artificial neural networks: cascade forward backpropagation and Elman backpropagation. The results and output graphs showed that the CRN simulation model generated an actual output exactly the same as the desired output when using the cascade forward backpropagation ANN with the BFG and BR training functions and the Elman backpropagation ANN with the BR and LM training functions, as their coefficient of correlation value (R = 1) was the best. The actual output generated using the cascade forward backpropagation ANN with the RP and LM training functions (R = 0.99999) and the OSS training function (R = 0.99998) was also nearly the same as the desired theoretical output. This article concludes that the ability of the SU of the CRN simulation model to detect the presence of the PU was enhanced by using the cascade forward backpropagation ANN and the Elman backpropagation ANN with various training functions.
References
1. Nadine A, Youseef N, Karim A (2015) Recent advances on artificial intelligence and learning techniques in cognitive radio networks. Springer
2. Cognitive Radio Networks, Wikipedia. https://en.wikipedia.org/wiki/Cognitive_radio
3. Shaat M, Bader F (2010) Computationally efficient power allocation algorithm in multicarrier-based cognitive radio networks: OFDM and FBMC systems. EURASIP J Adv Signal Process
4. IEEE 802.22, Wikipedia. https://en.wikipedia.org/wiki/IEEE_802.22
5. Beibei W, Liu KJ (2011) Advances in cognitive radio networks: a survey. IEEE J Select Topics Signal Process 5
6. Shewangi, Garg R (2017) Review of cooperative sensing and non-cooperative sensing in cognitive. Int J Eng Technol Sci Res 4(5):229–234
7. Zeng Y, Liang YC, Hoang AT, Zang R (2010) A review on spectrum sensing for cognitive radio: challenges and solutions. EURASIP J Adv Signal Process
8. Kockaya K, Develi I (2020) Spectrum sensing in cognitive radio networks: threshold optimization and analysis. Springer
9. Aparna PS, Jayasheela M (2012) Cyclostationary feature detection in cognitive radio using different modulation schemes. Int J Comput Appl 47(21)
10. Lu Y, Wang D, Fattouche M (2016) Cooperative spectrum-sensing algorithm in cognitive radio by simultaneous sensing and BER measurements. EURASIP J Wireless Commun Network
11. Akyildiz IF, Brandon F, Balakrishnan R (2010) Cooperative spectrum sensing in cognitive radio networks: a survey. Elsevier
12. Wang J, Wu Q, Zheng X, Chen J (2009) Cooperative spectrum sensing. https://www.intechopen.com/chapters/8831
13. Mohammadi FS, Enaami HH, Kwasinski A (2021) Neural network cognitive engine for autonomous and distributed underlay dynamic spectrum access. IEEE Open J Commun Soc 2
14. Johnson D. Back propagation neural network: what is backpropagation algorithm in machine learning? https://www.guru99.com/backpropogation-neural-network.html
15. Backpropagation, Wikipedia. https://en.wikipedia.org/wiki/Backpropagation
16. Al-Masr A. How does back propagation in Artificial Neural Network work? https://towardsdatascience.com/how-does-back-propagation-in-artificial-neural-networks-work-c7cad873ea7
17. Narada S, Chavanb P (2016) Cascade forward back-propagation neural network based group authentication using (n, n) secret sharing scheme. Elsevier
18. https://in.mathworks.com/help/deeplearning/ref/cascadeforwardnet.html
19. Elman J, Wikipedia. https://en.wikipedia.org/wiki/Jeffrey_Elman
20. Elshamy M, Tiraturyan AN, Uglova EV, Elgengy MZ (2021) Comparison of feed-forward, cascade-forward, and Elman algorithms models for determination of the elastic modulus of pavement layers. ACM
21. Gupta R, Gupta PC (2019) A comparative study of various network simulators available for cognitive radio networks. Vindhya Bharti Res J I(17)
22. Dongy Q, Cheny Y, Liy X, Zengz K (2018) A survey on simulation tools and test beds for cognitive radio networks study. arXiv:1808.09858v1 [cs.NI]
23. MATLAB tutorial. https://in.mathworks.com/support/learn-with-matlab-tutorials.html
24. MATLAB tutorial. https://www.tutorialspoint.com/matlab/index.htm
25. Matrices and arrays in MATLAB. https://www.javatpoint.com/matrices-and-arrays-inmatlab
26. AWGN channel, Wikipedia. https://en.wikipedia.org/wiki/Additive_white_Gaussian_noise
27. Gupta R, Gupta PC (2019) Cognitive radio network implementation for spectrum utilization in Hadoti (Rajasthan) Region. Vindhya Bharti Res J II(18)
28. Amplitude modulation, MathWorks. https://www.mathworks.com/help/comm/ref/ammod.html
29. Amplitude modulation, Wikipedia. https://en.wikipedia.org/wiki/Amplitude_modulation
Tracking Digital Device Utilization from Screenshot Analysis Using Deep Learning Bilkis Jamal Ferdosi, Mohaomd Sadi, Niamul Hasan, and Md Abdur Rahman
Abstract Due to recent circumstances, almost everyone is forced to stay at home, and all the educational and professional activities are done via digital devices. Students are attending online classes from their homes. So, more often than not students are using their digital devices unsupervised. It is not ideal for under-aged students as they are at risk of gravitating toward various unwanted content on the Internet. It is important to figure out how the device is being used to prevent misuse of time and resources. To solve this problem, we attempt to track the user’s activity in two steps: first, by collecting information of the used applications from the operating system, including the duration of the usage, and then by analyzing the screenshots of the user’s activities in the device. In this work, we experimented with a few popularly used web applications categorized into five classes: Entertainment YouTube, Educational YouTube, Educational Classroom, Educational Coursera, and Educational Programming. We utilized three deep learning models: pre-trained VGG-16, ResNet-50, and Inception V3 to compare the performance. Accuracy of all three models ranges from 82 to 95% on randomly collected test data and the VGG-16 achieves the highest F1-score of 98%. Keywords Human–computer interaction · Human activity detection · Deep learning · Convolutional neural network
B. J. Ferdosi (B) · M. Sadi · N. Hasan · M. A. Rahman
University of Asia Pacific, Dhaka, Bangladesh
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_46
1 Introduction
Nowadays everyone uses some form of a digital device for work. Due to the COVID-19 pandemic, almost everyone is compelled to stay at home and all educational as well as professional activities are conducted via digital devices. Especially, students are attending online classes from their homes. As a result, more often than not students are using their digital devices unsupervised. It is not ideal for the under-aged students
as they are at risk of gravitating toward various unwanted content on the Internet. According to the latest Kids Online Safety Statistics, 70% of kids encounter sexual or violent content online while doing homework research, and 17% of teenagers have received an online message with photos that made them feel uncomfortable, with only 7% of parents aware of this [1]. It is important for parents to figure out how the device is being used to prevent misuse of time and resources. In order to help parents supervise their kids, they need some form of 'Content Analysis and Detection' system, which will allow parents to observe the online activities of their children.
The paper by Krieter and Breiter proposed an approach that utilizes computer vision and machine learning methods to create event-based log files by automatically analyzing screen recordings [2]. Their detection method operates frame by frame over recorded video files in a certain order, and every frame is checked against the list of events. However, analyzing video files to detect user activity can hamper the device's performance substantially. There are many third-party applications and software packages like InterGuard1, ActivTrak2, etc., that use some combination of screenshot capture, email, website, and chat monitoring, and file tracking to determine active and idle time. These systems work on the basis of the input of a person who will monitor the user. One of the many uses of such employee monitoring software is how easily it allows businesses to monitor their employees' use of social media during work hours. However, this type of program does not determine whether the user is using social media for educational or entertainment purposes. Furthermore, this software is not open source, and people have to pay a subscription fee to use its benefits.
In this paper, we propose a system that identifies user activities in two ways: first, it collects information about the applications used from the operating system; then, it analyzes a screenshot of the whole screen to identify what type of activity the user is doing. For example, if the device user is watching YouTube, the content can be educational or entertainment, and our system identifies which it is. In this work, we consider a few popularly used web applications categorized into five classes, which can easily be scaled up in the future. We experimented with three deep learning models: pre-trained VGG-16, ResNet-50, and Inception V3 to compare the performance. The performance of the proposed system using a pre-trained VGG-16 model excels compared to the other two models.
The rest of the paper is organized as follows: in Sect. 2, we discuss the related works. In Sect. 3, we describe the datasets used. In Sect. 4, we explain the proposed system. Section 5 discusses the analysis of the results. In Sect. 6, we draw a conclusion and outline the future direction of this paper.
1 https://www.interguardsoftware.com/
2 https://www.activtrak.com/
2 Related Work
There are several approaches in the literature for analyzing user activity on computers, and several professional software packages exist to monitor users; the primary objective of this software is to track productivity and time spent on the computer. Imler and Eichelberger proposed a novel method for capturing and analyzing human–computer interaction [3]. They recommended using eight forms of screen recording setups to record user screens to track user activity and behavior, with the Snapz Pro screen capture technology used to capture the screen. Here, the screen recorder software can be initiated manually by the user or by someone who wants to monitor the user's work. Once the screen recording is complete, the person who wants to monitor the activity has to analyze the whole video manually. In our work, in contrast, we have used a deep learning model to automate the analysis part.
Krieter and Breiter proposed an approach that utilizes computer vision and machine learning methods to create event-based log files by automatically analyzing screen recordings [4]. Their detection method operates in a certain order frame by frame, reducing computing effort. Every frame is checked against the list of events, and a frame can contain multiple events or no events. The algorithm finds text using OCR, images using template matching, and fixed areas using a perceptual hash function; it then generates log files for every interaction. On the other hand, we employed screenshots instead of video files in our research and detected user activity by analyzing the screenshots using a deep learning model.
Malisa et al. used a small obfuscation tool to extract features for identifying frames that have no visible effect on the user experience [5]. They also used a perceptual hashing algorithm to provide a compact representation of an image that maintains its main visual characteristics. Dynamic code analysis was used to extract user interfaces via screenshots from mobile apps to detect impersonation, with the detection based on the visual appearance of the application as seen by the user. They used hash values to analyze the images, and locality-sensitive hashing (LSH) is leveraged to enable the analysis of large numbers of screenshots. On the contrary, we utilized a deep learning model to examine the screenshots in our research.
Several professional software products exist in the market, such as InterGuard, ActivTrak, etc., that may be used to track user activity. One of the numerous benefits of such employee monitoring software is that it makes it simple for businesses to monitor their employees' use of social media during work hours. However, such a program makes no distinction as to whether a student is using social media for educational or entertainment purposes. Other than that, these programs do not detect adult content on the screen, which is dangerous for a student under the age of 18. Furthermore, this software is not free of cost; to use its benefits, people must pay a fee, and in most cases these fees are not justified for parents who just want to keep track of their children's Internet or computer screen activity.
3 Datasets
In our work, the main intention is to help parents monitor their under-aged children. As a result, we must determine what type of application or software the user is running at any given time on their screen. We created a dataset by using a screenshot-taking library function in Python named 'pyscreenshot' to grab screenshots of the full screen while we were using our computers as we usually do. We collected screenshots of popularly used web applications categorized into five classes:
• Educational Coursera
• Educational Classroom
• Educational Programming
• Educational YouTube
• Entertainment YouTube
In this dataset, each class contains 800 screenshots. Figure 1 shows a few sample screenshots of the dataset. The screenshot in Fig. 1a is an example of the Educational Coursera class; Coursera is a website that provides online skill development courses, and this screenshot was taken while a tutorial video was playing on the Coursera website. Figure 1b shows an example screenshot of the Educational Classroom class, taken while using Google Classroom. Figure 1d is an example screenshot of the Educational YouTube class; YouTube can be used for educational purposes, and this class contains screenshots of YouTube while educational content was playing. On the other hand, Fig. 1e represents the Entertainment YouTube class, with the screenshot taken while using YouTube for entertainment (watching news, cartoons, music videos, etc.). Finally, Fig. 1c is a sample screenshot of the Educational Programming class, taken while an integrated development environment (IDE) was open on the computer screen for computer programming. All the images were taken in JPG format with a resolution of 1080 × 1920 pixels. We also applied data augmentation using rescaling, shear, zoom, and horizontal flips. Therefore, after data augmentation, the size of the dataset is around 16,000 screenshots.
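A minimal sketch of the augmentation step described above, assuming Keras' ImageDataGenerator; only the use of rescaling, shear, zoom, horizontal flips, and the 90/10 train/validation split of Sect. 5 come from the text, while the numeric ranges, image size, and directory layout are assumptions.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale, shear, zoom, and horizontal flip, as described for the dataset.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    shear_range=0.2,         # assumed value
    zoom_range=0.2,          # assumed value
    horizontal_flip=True,
    validation_split=0.1,    # 90/10 train/validation split used in Sect. 5
)

train_gen = datagen.flow_from_directory(
    "screenshots/",          # hypothetical directory with one folder per class
    target_size=(224, 224),  # assumed input size for the pre-trained models
    batch_size=32,
    class_mode="categorical",
    subset="training",
)
```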
4 Proposed System
In this paper, we propose a system that identifies user activity in two steps; Fig. 2 illustrates the basic workflow of our system. We collect information about the applications used from the operating system and record the time duration of each application used by the user to identify application usage. Our system also captures screenshots of the whole screen and analyzes the screenshots with the help of deep learning models to track the type of activity.
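As an illustration of these two data-collection steps, the sketch below uses the pyscreenshot library mentioned in Sect. 3; querying the foreground application is platform specific, so get_active_window_title() is a hypothetical placeholder rather than an actual API, and the capture interval is an assumption.

```python
import time
import pyscreenshot

def get_active_window_title():
    """Hypothetical placeholder: querying the foreground application requires
    platform-specific OS APIs and is not shown here."""
    raise NotImplementedError

def track(interval_s=30, shots_dir="shots"):
    usage = {}                               # application name -> seconds of use
    while True:
        app = get_active_window_title()
        usage[app] = usage.get(app, 0) + interval_s
        img = pyscreenshot.grab()            # full-screen screenshot (PIL image)
        img.save(f"{shots_dir}/{int(time.time())}.jpg")  # assumes shots_dir exists
        time.sleep(interval_s)
```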
Fig. 1 Few screenshots of the dataset: a tutorial video on Coursera, b Google Classroom, c PyCharm IDE, d tutorial video on YouTube, e entertainment video on YouTube
In our system, we experimented with three convolutional neural network (CNN) models to assess the performance of the system: pre-trained VGG-16 [6], Residual Network (ResNet-50) [7], and Inception V3 [8]. VGG-16 is one of the best-performing architectures, achieving 92.7% test accuracy on the ImageNet dataset [9], one of the largest image datasets, containing over 14 million images belonging to 1000 classes. However, VGG-16 is a computationally demanding transfer learning model [10]; we reused the knowledge of a VGG-16 model already trained on the ImageNet dataset. ResNet-50 can work with a deeper network by leveraging the advantage of skip connections. By utilizing transfer learning from the ImageNet dataset, the pre-trained ResNet-50 reduces the computation time for training; its overall training, testing, and prediction time is relatively lower than that of the VGG-16 model. Inception V3 is another popular image classification model, known for its smaller weights and lower error rate compared to VGG-16 and ResNet-50.
Fig. 2 Proposed system
We trained the final layers of the models to get an optimized result for our work; in the final stage, we used one flatten layer and three dense layers with the 'Adam' optimizer. In our system, we collect information about the applications from the operating system, using a Python module, to obtain the names of the applications used and the duration of usage. Then, we trained the models on the dataset to categorize the screenshots into five categories. In the second step, we obtain a categorization depending on the type of content; for example, if a user is using YouTube, the model predicts whether the user is using YouTube for educational or for entertainment purposes. The trained model is then validated, and finally we save the trained model and use it to test random inputs. We implemented our model using Keras, a neural network library built on Python [11].
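A sketch of this transfer-learning setup, assuming the Keras applications API for the pre-trained VGG-16 base; the frozen ImageNet base, the flatten layer, the three dense layers, and the Adam optimizer come from the text, while the dense-layer widths and the input size are assumptions.

```python
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                      # reuse ImageNet features; train only the head

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),   # assumed width
    layers.Dense(128, activation="relu"),   # assumed width
    layers.Dense(5, activation="softmax"),  # five screenshot classes
])
model.compile(optimizer=optimizers.Adam(),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```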
5 Result and Discussion
To assess the performance of the models in classifying screenshots into five categories, we used 90% of the data for training and 10% for validation. We also collected 100 random screenshots of each class, 500 screenshots in total, for test purposes. VGG-16 obtained 100% training accuracy, 93.8% validation accuracy, and 94% test accuracy (see Fig. 3a and Table 1). ResNet-50 obtained a training accuracy of 100%, a validation accuracy of 95.4%, and a test accuracy of 95%. Inception V3 obtained a training accuracy of 100%, a validation accuracy of 95.4%, and a test accuracy of 82%. In terms of performance on randomly collected test data, VGG-16 and ResNet-50 provide competitive results. The classification performances of the models are summarized in the confusion matrices depicted in Fig. 4. Inception V3 seems to struggle to distinguish between YouTube Entertainment and YouTube Educational, and between Educational Google Classroom
Fig. 3 Performance graph of (a) VGG-16 (b) ResNet-50 (c) Inception V3
and Educational Programming. On the other hand, ResNet-50 and VGG-16 seem to perform better in distinguishing the classes. In Table 1, we summarize the precision, recall, and F1-score of the models: VGG-16 achieved an F1-score of 98%, ResNet-50 obtained an F1-score of 96%, and Inception V3 obtained an F1-score of 83%. In terms of F1-score, VGG-16 proved itself a better classifier compared to the other two models. The results obtained are quite encouraging, but there are several challenges that need to be addressed as well. For example, distinguishing activity while using YouTube is very tricky, since this platform can be used for educational as well as entertainment purposes. In our experiment, we worked with two classes of YouTube, one educational and one entertainment. In the training samples for Educational YouTube, most of the samples have text-based content, whereas in the training samples for Entertainment YouTube, most of the samples have image-based content. Thus, during testing, we found that if an Educational YouTube screenshot contains image-based content, it is classified as Entertainment YouTube (see Fig. 5b).
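The per-class precision, recall, and F1 values of Table 1 and the confusion matrices of Fig. 4 can be produced, for example, with scikit-learn; the helper below is only a sketch, and its variable names are assumptions.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

CLASS_NAMES = ["Educational Coursera", "Educational Classroom",
               "Educational Programming", "Educational YouTube",
               "Entertainment YouTube"]

def report(model, test_images, y_true):
    """Print per-class precision/recall/F1 and the confusion matrix for a
    trained Keras classifier evaluated on the held-out test screenshots."""
    y_pred = np.argmax(model.predict(test_images), axis=1)
    print(classification_report(y_true, y_pred, target_names=CLASS_NAMES))
    print(confusion_matrix(y_true, y_pred))
```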
Table 1 Comparing precision, recall, and F1-score of the models for the test dataset: per-class precision, recall, F1-score, and support (100 test screenshots per class, 500 in total) for VGG-16, ResNet-50, and Inception V3 over the five classes, together with accuracy, macro-average, and weighted-average rows. VGG-16 and ResNet-50 reach averaged scores of about 0.94–0.95, while Inception V3 averages about 0.82–0.85.
Fig. 4 Confusion matrix of (a) VGG-16 (b) ResNet-50 and (c) Inception V3
6 Conclusions and Future Work
In this paper, we proposed a method using deep learning models along with system information to detect the digital device activity of a user. Nowadays, almost every underage student frequently uses digital gadgets without any supervision, and they are at risk of gravitating toward numerous types of undesirable content on the Internet. So, it is important to figure out how the device is being used in order to prevent misuse. We experimented with VGG-16, ResNet-50, and Inception V3 for our system; among the three, the VGG-16 model seems to be the best-performing architecture. In the future, we will focus on collecting more diverse data from different users to make the model perform better in the real world. We are also working on developing an application for the end user.
Fig. 5 a YouTube Educational is classified as YouTube Educational, b YouTube Educational is classified as YouTube Entertainment, and c YouTube Entertainment is classified as YouTube Entertainment
References
1. Safeatlast.co: Kids online safety-internet safety for kids (2021). https://safeatlast.co/blog/kidsonline-safety
2. Krieter P, Breiter A (2018) Track every move of your students: log files for learning analytics from mobile screen recordings, pp 231–242
3. Imler B, Eichelberger M (2011) Using screen capture to study user research behavior. Libr Hi Tech 29:446–454
4. Krieter P, Breiter A (2018) Track every move of your students: log files for learning analytics from mobile screen recordings, pp 231–242
5. Malisa L, Kostiainen K, Och M, Capkun S (2016) Mobile application impersonation detection using dynamic user interface extraction, pp 217–237
6. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition
7. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR)
8. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2015) Rethinking the inception architecture for computer vision. CoRR abs/1512.00567. http://arxiv.org/abs/1512.00567
9. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE, pp 248–255
10. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. The MIT Press
11. Chollet F et al (2015) Keras. https://github.com/fchollet/keras
A Simulative Analysis of Prominent Unicast Routing Protocols over Multi-hop Wireless Mobile Ad-hoc Networks
Lavanya Poluboyina, Ch. S. V. Maruthi Rao, S. P. V. Subba Rao, and G. Prasad Acharya
Abstract Mobile ad-hoc networks (MANETs) have the potential to become one of the most widely deployed wireless networks due to the widespread use of wireless hand-held devices in both personal and professional lives. These networks have various application areas, ranging from a conference hall to a calamitous area. However, due to the mobile nature of the nodes, which results in time-varying network topologies, routing the data packets in these networks is a challenging task. In spite of the difficulties, a wide set of unicast routing protocols for MANETs is available in the literature. In this paper, four basic unicast routing protocols, DSDV, OLSR, DSR, and AODV, have been chosen because of their popularity. The performance of the chosen routing protocols is evaluated in terms of network performance metrics such as packet delivery ratio, average end-to-end delay, average jitter, throughput, and normalized routing load. Essentially, each routing protocol's suitability to mobile ad-hoc environments is analyzed for diverse node mobilities under different CBR traffic loads.
Keywords AODV · DSDV · DSR · MANET · Mobile Ad-hoc network · Network simulator · OLSR
L. Poluboyina (B) · S. P. V. Subba Rao · G. Prasad Acharya Department of Electronics and Communication Engineering, Sreenidhi Institute of Science and Technology, Hyderabad, Telangana, India e-mail: [email protected] S. P. V. Subba Rao e-mail: [email protected] G. Prasad Acharya e-mail: [email protected] Ch. S. V. M. Rao Department of Electronics and Communication Engineering, Sreyas Institute of Engineering and Technology, Hyderabad, Telangana, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_47
Abbreviations
AEED    Average End-to-End Delay
AJ      Average Jitter
AODV    Ad-hoc On-demand Distance Vector
BPS     Bits Per Second
CBR     Constant Bit Rate
CBRP    Cluster Based Routing Protocol
DSDV    Destination Sequenced Distance Vector
DSR     Dynamic Source Routing
HM      Half-Minute
HT      High-Traffic
IEEE    Institute of Electrical and Electronics Engineers
LT      Low-Traffic
M       Minute
MAC     Media Access Control
MANETs  Mobile Ad-hoc Networks
MPRs    Multi-Point Relays
NP      No-Pause
NRL     Normalized Routing Load
OLSR    Optimized Link State Routing
PAODV   Pre-emptive AODV
PDR     Packet Delivery Ratio
QoS     Quality of Service
RS      Running-Speeds
RSHM    Running-Speeds Half-Minute
RSM     Running-Speeds Minute
RSNP    Running-Speeds No-Pause
RSS     Running-Speeds Stationary
RS5M    Running-Speeds 5-min
S       Stationary
TC      Topology Control
TORA    Temporally Ordered Routing Algorithm
TP      Throughput
TS      Tank-Speeds
TSHM    Tank-Speeds Half-Minute
TSM     Tank-Speeds Minute
TSNP    Tank-Speeds No-Pause
TSS     Tank-Speeds Stationary
TS5M    Tank-Speeds 5-min
TTL     Time-To-Live
WS      Walking-Speeds
WSHM    Walking-Speeds Half-Minute
WSNP    Walking-Speeds No-Pause
WSM     Walking-Speeds Minute
WSS     Walking-Speeds Stationary
WS5M    Walking-Speeds 5-min
5M      5-Minutes
1 Introduction
For the past two decades, wireless ad-hoc networks have been drawing great attention from scientists as well as researchers because of their vast and promising application areas. Due to the wide availability of, as well as the increased dependency on, wireless mobile devices in daily life, mobile ad-hoc networks (MANETs), one class of wireless ad-hoc networks, have also become popular [1]. The main applications of these networks include battlefields, conferencing, disaster recovery operations, and emergency handling situations. Moreover, certain MANET applications like conferencing, emergency search-and-rescue operations, and military communication require sharing of information within a group of nodes, which calls for multicast routing in these networks. Further, in addition to multicast routing, such applications require multimedia traffic support for effective communication. Specifically, some attractive applications of MANETs in the near future are the disaster area wireless network (DAWN) for handling natural or man-made disastrous events, heterogeneous ad-hoc networks (HANETs), formed by connecting devices with different capabilities, and I-MANETs, where Internet access for the mobile nodes is provided by connecting them to Internet gateways [2]. However, the successful deployment and exploitation of these variants of MANETs in practice is a formidable task, since the required knowledge encompasses a whole range of topics, viz., network complexity, routing optimization issues, QoS, scalability, heterogeneity, clustering, security, reliability, bandwidth management, mobility management, etc.
A MANET [3] is basically a wireless multi-hop network comprising a set of mobile ad-hoc nodes whose abilities differ from those of a regular wireless mobile node. A node in a MANET is capable of creating and maintaining the network on its own; to achieve this, it depends neither on any centralized infrastructure nor on any routing hardware. The communication between any two distant nodes takes place with support from intermediate nodes. Any node can initiate a data transmission if it has a valid route to the intended destination. However, the discovered route has to be maintained till the end of the data transmission, which requires a lot of routing overhead; this, in turn, depends on the employed routing protocol's working principle. The peculiar characteristics of MANETs, such as mobile ad-hoc nodes with constrained resources, infrastructure-less and decentralized operation, network resource constraints, and the inherent problems of wireless channels, pose multiple problems for the routing of data packets. Clearly, a link or a node failure is quite common
in these environments, which results in dynamic network topologies and makes routing a challenging task. Though a variety of routing protocols have been proposed and discussed in the literature, in this paper four well-known unicast routing protocols are taken and examined for their suitability to the dynamic environments presented by MANETs under constant bit rate (CBR) traffic.
2 Multi-hop Wireless Mobile Ad-Hoc Routing Protocols
Based on the route discovery and maintenance strategies, the routing protocols for MANETs are divided into proactive and reactive types [4–6]. In the proactive category, each node maintains routes to all other nodes of the network in a routing table in advance. In contrast, in the reactive type, the route to an intended destination is discovered (by the source node) only when it is required. The benefit of proactive routing is that there is no delay in initiating a data transfer; however, maintaining fresh enough routes to all possible destinations consumes the network resources greatly. On the other hand, the on-demand nature of reactive routing utilizes the network resources efficiently while introducing latency into the data transfer.
2.1 Prominent Unicast Routing Protocols Opted for Simulation
The four protocols considered for the simulation work are: destination sequenced distance vector (DSDV) and optimized link state routing (OLSR) from the proactive category, and dynamic source routing (DSR) and ad-hoc on-demand distance vector (AODV) from the reactive category [7]. The basic and main features of the protocols are presented in brief hereunder:
DSDV. It [8] is a distributed routing protocol based on the distance vector routing protocol. With the insertion of destination sequence numbers, the limitations of distance vector routing are overcome in DSDV. It is a simple proactive routing protocol in which the routes are maintained in routing table form at each node. The neighboring nodes exchange routing information periodically in order to have an up-to-date view of the network. Failure to receive a periodic update indicates a change in the network topology, and the necessary steps are initiated by the upstream node to repair the route breakage.
OLSR. It [9] is based on link state routing with multiple optimizations in order to support mobile ad-hoc environments. It is a relatively complex proactive hop-by-hop routing protocol that provides routes in advance. Each node of the network is aware of the complete topology of the network. Not all nodes are uniform, in the sense that some of them act as multi-point relays (MPRs). MPRs are the only nodes that relay packets onward to distant nodes in the network. This helps in reducing the
routing overhead generated by link state routing. Local hello messages are used to maintain information about the neighboring nodes. Each node broadcasts (globally) a topology control (TC) message periodically and holds three routing tables for keeping an up-to-date view of the network topology.
DSR. It [10] is the first source routing reactive protocol and is a modified version of the distance vector routing protocol. If any node wants to initiate a data transfer, it first checks its route cache for a valid route and initiates the route discovery process only if it does not have one. During the route discovery process, the nodes that receive the related route request message are not required to maintain a reverse route to the source; they are only required to update the received message by adding their own address and to update their route cache. The source node itself appends the complete route from itself to the destination to each data packet. Hence, a data packet carries more routing overhead than in the hop-by-hop routing technique. A source node can maintain multiple routes to the same destination, which allows it to respond immediately to node or link failures.
AODV. It [11] is an improved distance vector based reactive routing protocol with a hop-by-hop routing technique. Like DSR, only the node that wishes to transfer data has to take care of route finding. However, in the route-finding process, the neighboring nodes are required to maintain a reverse route to the source so that the reply to the source's route request is directed back to the source. The address field of a data packet contains only the destination node and next-hop addresses, not the complete route as in DSR. Small hello packets are used for route maintenance. A missed hello packet indicates a connection failure; the upstream node then either performs a local route repair or conveys the failure information to the source node.
3 Related Works
In the literature, a variety of routing protocols targeted at mobile ad-hoc environments have been made available. In this section, the significant works done on MANETs using the routing protocols DSDV, DSR, AODV, and OLSR are reviewed. RFCs are available for the routing protocols DSR, AODV, and OLSR; AODV is a standardized reactive routing protocol in the IETF with experimental RFC 3561. However, only those works have been considered in which the routing protocol's evaluation was done for at least one of the quality of service (QoS) evaluation parameters such as packet delivery ratio (PDR), average end-to-end delay (AEED), average jitter (AJ), throughput (TP), and normalized routing load (NRL). Moreover, only the research works that had furnished the simulation details either fully, or at least details such as node speed, number of connections, packet rate, and packet size in addition to the minimum required parameters, have been taken into consideration. This is required to make sure that the network considered for the protocol's evaluation was truly a MANET. Further, the contributions considered are the ones that had employed at least two of the undertaken unicast routing protocols while considering CBR data traffic. Finally, only those works are accounted for where
each sample point of the graphs was the average of the simulation results taken for at least three network topology-scenarios. In Table 1, the research works meeting the above-specified criteria, and in which the experimentation was conducted using the network simulation tool NS-2 and its versions, are tabulated. The authors in [12, 16, 18] concluded that AODV performed well compared to the other protocols under consideration for highly dynamic network scenarios in terms of the performance evaluation metrics PDR, AEED, TP, jitter, and routing overhead. In [13–15], it was discussed that no undertaken protocol fits all network scenarios, and that the relevant protocol is to be chosen based on the application. The authors in [17] presented that the proactive routing protocol OLSR suits military applications better than AODV. Finally, only the authors in [18] had chosen all four protocols: DSDV, OLSR, DSR, and AODV; from their conclusions, it is observed that the chosen protocols had not been tested under varying node speeds, which is required to check the suitability of a protocol for dynamic environments like MANETs. Hence, in this work, the behavior of the protocols is evaluated and analyzed for varying node speeds.
4 Methodology
The tool utilized for these simulation works is NS-2.34 [19, 20], and the network parameters used for the simulation are tabulated in Table 2. While generating node movement patterns using setdest (which basically uses the random waypoint mobility model), the speed and pause types selected are '1', indicating uniform speed and constant pause times, respectively. From the table, No-Pause (NP), Half-Minute (HM), Minute (M), 5-min (5M), and Stationary (S) represent the various pause times considered for the simulation. A pause time of 0 s constitutes continuous motion of the nodes, whereas a pause time of 900 s (the simulation time) constitutes a stationary case. The speeds of the nodes have been classified into three types: Walking-Speeds (WS) ranging from 1.2 to 2.5 m/s, Running-Speeds (RS) from 5 to 10 m/s, and Tank-Speeds (TS) from 15 to 20 m/s. With the varying pause times together with the node speeds, a total of 15 mobility patterns are generated and listed in Table 3.
Two different CBR traffic loads are identified for the analysis of the routing protocols: low-traffic (LT) and high-traffic (HT), tabulated in Table 4. In the generated traffic-scenarios, 17 nodes acted as sources. All the communications start randomly, though all stop 1 min before the end of the simulation time to avoid unnecessary packet drops due to the end of the simulation.
To ensure direct and fair comparisons among the routing protocols, the same network parameter values are maintained throughout the simulation. Moreover, only one parameter is varied at a time while the other parameters are kept constant. Further, to get consistent results in environments like MANETs, each sample point in the graphs shown in Figs. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 and 30 is the mean taken over 10 different topology-scenarios. Clearly, 50 topology-scenarios are generated
Table 1 Literature survey table of unicast routing protocols. For each surveyed work, the table lists the protocols used (DSDV, DSR, AODV, TORA [12]; DSR, AODV [13]; DSR, AODV, OLSR [14]; DSDV, DSR, AODV, CBRP, PAODV [15]; DSDV, DSR, AODV, TORA [16]; AODV, OLSR [17]; DSDV, DSR, AODV, OLSR [18]), the network parameters*, the varied parameters (pause time, network load, node mobility, number of connections, number of nodes, transmission rate, and transmission range), and the performance metrics** evaluated (including PDF/PDR, end-to-end delay, jitter, throughput, routing overhead/NRL, routing load, path optimality, route length, control traffic overhead, and connectivity).
* Network parameters: simulation area; simulation time; number of nodes; transport layer protocol; MAC layer; physical layer; propagation model; channel bandwidth; node transmission range; mobility model; node speed; pause time; number of connections; packet rate; packet size
** AD-Average Delay; CTO-Control Traffic Overhead; EED-End-to-End Delay; PDF-Packet Delivery Fraction; PDRa-Packet Delivery Rate; PLR-Packet Loss Ratio; NRO-Normalized Routing Overhead; RO-Routing Overhead; RL-Routing Load
678 L. Poluboyina et al.
A Simulative Analysis of Prominent Unicast Routing Protocols …
679
Table 2 Network parameters for unicast routing protocols Network parameter
Value
Simulation area
1000m × 1000m
Number of nodes
50 (randomly placed)
Simulation duration
15 min
Transport layer protocol
UDP
Network layer protocols
DSDV, DSR, AODV, OLSR
MAC protocol
IEEE 802.11
Radio propagation model
Two-ray ground
Network channel interface queue type and length
Drop Tail/PriQueue & 50
Wireless channel bandwidth
2 Mbps
Node transmission range
250 m
Pause time in seconds
0 (NP), 30 (HM), 60 (M), 300 (5M), 900 (S)
Table 3 Movement model Mobility pattern
Description
WSNP (Walking-Speeds No-Pause)
Node moves in: 1.2–2.5 m/s with pause time 0 s
WSHM (Walking-Speeds Half-Minute)
Node moves in: 1.2–2.5 m/s with pause time 30 s
WSM (Walking-Speeds Minute)
Node moves in: 1.2–2.5 m/s with pause time 60 s
WS5M (Walking-Speeds 5-min)
Node moves in: 1.2–2.5 m/s with pause time 300 s
WSS (Walking-Speeds Stationary)
Node moves in: 1.2–2.5 m/s with pause time 900 s
RSNP (Running-Speeds No-Pause)
Node moves in: 5–10 m/s with pause time 0 s
RSHM (Running-Speeds Half-Minute)
Node moves in: 5–10 m/s with pause time 30 s
RSM (Running-Speeds Minute)
Node moves in: 5–10 m/s with pause time 60 s
RS5M (Running-Speeds 5-min)
Node moves in: 5–10 m/s with pause time 300 s
RSS (Running-Speeds Stationary)
Node moves in: 5–10 m/s with pause time 900 s
TSNP (Tank-Speeds No-Pause)
Node moves in: 15–20 m/s with pause time 0 s
TSHM (Tank-Speeds Half-Minute)
Node moves in: 15–20 m/s with pause time 30 s
TSM (Tank-Speeds Minute)
Node moves in: 15–20 m/s with pause time 60 s
TS5M (Tank-Speeds 5-min)
Node moves in: 15–20 m/s with pause time 300 s
TSS (Tank-Speeds Stationary)
Node moves in: 15–20 m/s with pause time 900 s
Table 4 Traffic model Traffic pattern
Description
LT
25 random connections started at random times, 4 packets/s, 64 byte packet
HT
25 random connections started at random times, 12 packets/s, 512 byte packet
Clearly, 50 topology-scenarios are generated for five values of pause times at WS, 10 for each pause time. Similarly, 100 topology-scenarios are created for RS and TS. Hence, there are a total of 150 topology-scenarios and thereby 1200 simulation runs on the whole, which means 300 runs for each protocol, i.e., 150 runs for each traffic type.
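A small script can automate the generation of the 150 topology-scenarios and the two traffic connection files described above. The sketch below is illustrative only: the exact flag names are assumed to match the setdest (version 2) and cbrgen.tcl tools bundled with ns-2.34 and should be verified against the local installation, and the file names are hypothetical.

```python
# Hypothetical helper for generating the mobility and traffic scenarios of this study.
# Flag names are assumptions based on the setdest/cbrgen tools shipped with ns-2.34.
import subprocess

SPEEDS = {"WS": (1.2, 2.5), "RS": (5, 10), "TS": (15, 20)}   # m/s
PAUSES = {"NP": 0, "HM": 30, "M": 60, "5M": 300, "S": 900}   # s

def make_mobility(setdest="./setdest", runs=10, sim_time=900, x=1000, y=1000, nodes=50):
    """Generate 10 random topologies for each of the 15 mobility patterns (150 in total)."""
    for sname, (vmin, vmax) in SPEEDS.items():
        for pname, pause in PAUSES.items():
            for run in range(1, runs + 1):
                out = f"scen-{sname}{pname}-{run}"
                cmd = [setdest, "-v", "2", "-n", str(nodes),
                       "-s", "1", "-m", str(vmin), "-M", str(vmax),  # speed type 1: uniform
                       "-P", "1", "-p", str(pause),                  # pause type 1: constant
                       "-t", str(sim_time), "-x", str(x), "-y", str(y)]
                with open(out, "w") as f:
                    subprocess.run(cmd, stdout=f, check=True)

def make_traffic(ns="ns", cbrgen="cbrgen.tcl", nodes=50, conns=25):
    """Generate the LT (4 pkts/s) and HT (12 pkts/s) CBR connection files."""
    for name, rate in {"lt": 4.0, "ht": 12.0}.items():
        with open(f"cbr-{name}", "w") as f:
            subprocess.run([ns, cbrgen, "-type", "cbr", "-nn", str(nodes),
                            "-seed", "1", "-mc", str(conns), "-rate", str(rate)],
                           stdout=f, check=True)
```

Note that the packet sizes of the two traffic patterns (64 bytes for LT, 512 bytes for HT) would still have to be set in the generated connection files or in the main simulation script, since cbrgen.tcl controls only the packet rate.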
5 Simulation Results and Discussion The outcomes of the simulation works are presented in the form of graphs shown in Figs. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 and 30. The QoS metrics considered for evaluating the performance of protocols are PDR in %, AEED in ms, AJ in ms, TP in bits per second (BPS) and NRL in %. To provide a good QoS in MANETs: high values of PDR, low values of AEED, low values of AJ and low values of NRL are recommended. In this work, • PDR is defined as the ratio of the number of data packets received at the transport layer of the destination nodes to the number of data packets delivered from the transport layer of the source nodes. • AEED is the average of the difference between the data packet delivered and arrival times of successfully received data packets. • AJ is expressed as the variations in data packet delay times. • TP represents how efficiently the communication channel is utilized in transporting the data packets from source to destination. • NRL is the ratio of the total number of routing packets generated to the total number of data packets received successfully during the simulation.
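For reference, a minimal sketch of how the five metrics could be computed once per-run counters have been extracted from the ns-2 trace files is given below. The counter names are assumptions (they are not ns-2 trace fields), and the jitter formula uses the mean absolute deviation of packet delays as one possible reading of "variations in data packet delay times".

```python
# Illustrative metric computation from aggregate counters of one simulation run.
# Counter names (sent, received, routing_pkts, ...) are assumed inputs from a trace parser.
def qos_metrics(sent, received, routing_pkts, delays_ms, recv_bits, sim_time_s):
    pdr = 100.0 * received / sent                                   # PDR in %
    aeed = sum(delays_ms) / len(delays_ms)                          # AEED in ms
    aj = sum(abs(d - aeed) for d in delays_ms) / len(delays_ms)     # AJ in ms (delay variation)
    tp = recv_bits / sim_time_s                                     # TP in bits per second
    nrl = 100.0 * routing_pkts / received                           # NRL in %
    return {"PDR": pdr, "AEED": aeed, "AJ": aj, "TP": tp, "NRL": nrl}

# Example: averaging one metric over the 10 topology-scenarios behind a sample point
runs = [qos_metrics(2500, 2450, 300, [35.0, 42.0, 38.5], 1.2e7, 840) for _ in range(10)]
mean_pdr = sum(r["PDR"] for r in runs) / len(runs)
```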
5.1 PDR Analysis Figures 1, 2 and 3 represent a comparison of PDR on the basis of mobility patterns for LT pattern while in Figs. 4, 5 and 6 for HT pattern. From the plots, it is observed that the DSDV performance is superior to OLSR at all mobility patterns and quite better than DSR at high mobility patterns of TS. However, its performance is always inferior to the reactive routing protocols under both LT and HT patterns. Except at zero pause times, irrespective of node speeds, the produced PDR values are always less than 92%. The performance of OLSR is poor and it has come completely out from the race. However, it has really worked well producing PDRs of above 97% with zero pause times irrespective of node speeds. OLSR’s performance is degrading with increasing node speeds and decreasing pause time. It has performed almost in the same way irrespective of LT and HT patterns. DSR has performed well (generating PDRs above 90%) for all mobility patterns except TSNP, TSHM and TSM under both the loads. Moreover, it has worked well for RS at HT loads (PDRs of 97–98%) compared to LT loads. AODV has shown consistent behavior at all node speeds producing PDR values of above 90% at LT
Fig. 1 PDR in % versus pause time in sec for WS at LT
load. In fact, it has achieved PDR values of above 98% at WS (Fig. 1) and above 95% at RS. The compatible behavior of AODV with the dynamic environments has continued even for HT load producing the PDRs above 90% except at mobility patterns of TSNP and TSHM where the PDR values have dropped just below it. It is working greatly for WS irrespective of node pause times and network traffic loads. In DSDV, a node computes routes in prior and updates its routing table periodically which involves some processing delay. When network topology changes are more, large numbers of triggered routing updates are generated and the delay further increases. During this time, the data packets placed at the node interface queue may be dropped; in fact, this is also a reason for reduced PDRs. Moreover, for frequent network topology changes, there is a possibility of selecting not fresh enough routes that may result in packet drops. In DSDV, the wireless channel is being shared by periodic routing updates and data packets in addition to the triggered routing updates due to increased node mobility which results in network congestion.
Fig. 2 PDR in % versus pause time in sec for RS at LT
Fig. 3 PDR in % versus pause time in sec for TS at LT
Fig. 4 PDR in % versus pause time in sec for WS at HT
Fig. 5 PDR in % versus pause time in sec for RS at HT
Fig. 6 PDR in % versus pause time in sec for TS at HT
In OLSR, the route from a source to a destination completely depends upon the MPR sets. At the nodes, the MPR sets are recalculated whenever a change in the neighborhood or bi-directional links is detected. This is the reason for OLSR’s degraded performance for increasing node speeds (at WS and RS) and decreasing pause times (from 5 M to NP). Moreover, the two types of periodic messages (local hello and global TC) have made the network fully congested that has led to packet drops at buffers. Further, the constrained TC message broadcasting after a route break is detected also has played its role for OLSR’s poor performance. Since the DSR utilizes the multiple paths that exist in its route cache, it is able to produce good values of PDR at WS and RS. However, at TS, the faster and frequent network topology changes resulted in broken links that have made the nodes to use the stale routes in their caches that have produced low PDRs. Moreover, the data packet processing time is more in DSR as the packet carries the complete route information that makes the node to keep the received packets in queue. This may result in buffer full condition and thereby packets loss. In AODV, the increasing node mobility has caused frequent topology changes in the network that have resulted in more route breaks making the packets to travel long paths which has led to packet drops due to expiration of TTL values of the data packets. Moreover, the broken links call for additional route recovery or discovery processes that have led to a decrease in PDR values. At HT loads, the network congestion due to increased number of data packets into the network has decreased the PDR values. Usually, a node in reactive routing protocols buffers the data packets received from upper layer till it gets the valid route. If no route received in prescribed time limit, the node simply drops the buffered packets that have increased the packet loss for DSR and AODV. Moreover, the local repair mechanism of AODV makes the packets get buffered at the node that may result in packet drops if the attempt is not successful.
5.2 AEED Analysis AEED plotted as a function of mobility and traffic patterns is depicted in Figs. 7, 8, 9, 10, 11 and 12. It is obvious that the increase in node mobilities results in low values of delay. From the graphs, it is noted that for all protocols low values of AEED have been recorded at zero pause times irrespective of node speeds and traffic loads. For all routing protocols, the values of AEED have increased with increasing node speeds and from LT to HT loads. At HT loads, the network congestion due to increased number of data packets into the network, has increased the delays of all routing protocols further. Being proactive in nature and having routes in advance, DSDV and OLSR have kept their mark on AEED producing low values. However, the increasing delay times are noticed with increasing node speeds and decreasing pause times irrespective of traffic load. When only proactive routing protocols are compared, the OLSR has produced high values of delay.
Fig. 7 AEED in ms versus pause time in sec for WS at LT
Fig. 8 AEED in ms versus pause time in sec for RS at LT
Fig. 9 AEED in ms versus pause time in sec for TS at LT
Irrespective of mobility or traffic patterns, the DSR has shown poor performance out of the four protocols except at zero pause times. It is realized that the DSR has produced relatively high values at TS for HT compared to LT. Out of all, AODV has performed consistently well and produced AEED values of less than 200 ms for all mobility patterns under LT and HT loads. Moreover, the performance of AODV is relatively good at zero pause times compared to HM and M pause times, showing its suitability to dynamic environments. At HT loads, compared to LT loads, the decreased values of delays are due to small PDRs at those traffic patterns. In proactive routing protocols, when a route break happens in the network, the triggered routing updates (for DSDV) and the TC messages (for OLSR) are generated. And, it takes some time for these messages to travel (propagation delay) and for the nodes to update their routing tables after receiving (processing delay) them. Both these delays are responsible for the increased AEEDs in DSDV and OLSR. In addition, a proactive routing protocol may keep the routes with high hop counts in its routing table that could result in more packet delays. Moreover, in OLSR, the
Fig. 10 AEED in ms versus pause time in sec for WS at HT
Fig. 11 AEED in ms versus pause time in sec for RS at HT
two periodic routing updates (hello and TC) and dependency of network functioning on MPR sets (which change with network topology) have resulted in high values of AEED compared to DSDV. However, with their proactive nature, having all routes in prior, both DSDV and OLSR have resulted in low AEED values under low mobility environments which can be clearly seen at the zero pause times regardless of the node speeds and network traffic. The increasing node mobilities may change the topology of the network which increases the probability of link failures. In reactive routing protocols, these link breakages trigger the route recovery (in AODV only) or route discovery methods that have led to an increase in AEED of packets for AODV and DSR. In DSR, since each data packet carries complete route information from source to destination, it takes time for the nodes to process the packets. This increases the delay and is the reason for high values of AEED in DSR. Moreover, at increased dynamicity of the network, the multi path nature of DSR may use its stale routes for rebuilding the
Fig. 12 AEED in ms versus pause time in sec for TS at HT
broken links which may lead to packet retransmissions and thereby further increase in delays. It is obvious that the reactive routing protocols exhibit more AEED values since the routes are determined only when the node receives a data transfer request from the upper layer. Till the route is discovered, the data packets have to wait in the queue which results in increased delay values. However, due to its source-initiated route discovery process and having a table-driven route maintenance mechanism, the AODV has exhibited better delay characteristics in between proactive (DSDV, OLSR) and reactive (DSR) protocols.
5.3 AJ Analysis The graphs in Figs. 13, 14, 15, 16, 17 and 18 show AJ as a function of mobility and traffic patterns. From the plots, it is observed that the values of AJ have risen with growing node speeds under both LT and HT loads, for all routing protocols. All the protocols have produced low values of AJ at zero pause times irrespective of node speeds and network data traffic. Compared to their delay performance, the jitter performance of the proactive routing protocols is not good. Similar to AEED, regardless of mobility and traffic patterns, DSR has exhibited the poorest jitter performance of the four except at zero pause times. AODV has outperformed the others and produced AJ values of less than 169 ms for all mobility patterns and under both traffic loads. Similarly, its performance is relatively better at zero pause times when compared with the HM and M pause times. In AODV, lower jitter values are noted at HT loads compared to low traffic, and this is due to the small PDRs at those traffic patterns. In a MANET, the link failures due to increased node mobilities disturb the normal operation of the routing protocols, which results in route correcting/updating
Fig. 13 AJ in ms versus pause time in sec for WS at LT
Fig. 14 AJ in ms versus pause time in sec for RS at LT
processes being generated in the network, which cause the packets to wait in queues. The handling of route breakages brings variations in the packet waiting periods at the queues, thereby making the packets reach the destination nodes over a wide span of time, which results in jitter. Moreover, a data packet’s processing time and waiting periods change based on the type of protocol. In DSR, since the source node is required to put the complete route details in each data packet before placing it on the link, apart from performing its other assigned duties like forwarding packets on behalf of others, the time it devotes to data packets is not uniform, which has increased the jitter values. All protocols have low jitter values under LT; however, at HT loads, the increased number of data packets injected into the network has made the network congested, which further increases the jitter values. In fact, for all four routing protocols, the reasons that cause the increased delays of data packets also form the basis for the increasing jitter values.
Fig. 15 AJ in ms versus pause time in sec for TS at LT
Fig. 16 AJ in ms versus pause time in sec for WS at HT
Fig. 17 AJ in ms versus pause time in sec for RS at HT
Fig. 18 AJ in ms versus pause time in sec for TS at HT
5.4 TP Analysis The graphs in Figs. 19, 20 and 21 show TP on the basis of node mobility for LT, while in Figs. 22, 23 and 24 the plots for the HT load are depicted. It is concluded from the graphs that the TP performance of the protocols is directly proportional to the achieved PDR values. The lower values of TP for DSDV and DSR are a consequence of their low PDRs. At zero pause times, the protocols DSDV and OLSR have worked as well as AODV. Though the PDR values of DSR are comparable to AODV’s values at WS and RS, its throughput performance is inferior to the AODV protocol. AODV has displayed the most superior performance of all. From the figures, it is clear that AODV has outperformed all other protocols by utilizing the channel efficiently under LT (1.63%) and HT (3.13%) conditions. The reduced TP values of DSR at WS and RS are due to its source routing nature, where the data packets carry the entire route from source to destination, and thereby each data packet has considerable routing overhead.
Fig. 19 TP in KBPS versus pause time in sec for WS at LT
Fig. 20 TP in KBPS versus pause time in sec for RS at LT
Fig. 21 TP in KBPS versus pause time in sec for TS at LT
Fig. 22 TP in KBPS versus pause time in sec for WS at HT
Fig. 23 TP in KBPS versus pause time in sec for RS at HT
Fig. 24 TP in KBPS versus pause time in sec for TS at HT
5.5 NRL Analysis The NRL plotted as a function of mobility patterns under LT is depicted in Figs. 25, 26 and 27, and for the HT pattern, the plots are presented in Figs. 28, 29 and 30. From the graphs, it is observed that with increased data traffic loads, the values of NRL have risen regardless of the routing protocol. The routing overhead increases with decreasing pause times and increasing node speeds. However, the small decreases in the gaps from WSHM to WSNP and from WSM to WSHM are evidence of the uncertain behavior of MANETs. Out of all, OLSR has performed very poorly, generating huge values of NRL under both LT and HT patterns. DSDV has shown better performance than AODV at RS and TS for both LT and HT patterns. DSR has worked really well at zero pause times under both LT and HT patterns. Moreover, it has outperformed all the others at WS and RS, except for the TSNP and TSHM mobility patterns, under both LT and HT loads. AODV exhibits high values of NRL since it generates a larger number of
Fig. 25 NRL in % versus pause time in sec for WS at LT
routing packets to determine a fresh enough route to a possible destination whenever required. It is obvious that routing overhead generated by proactive routing protocols is usually greater than the reactive protocols since they rely on periodic broadcasts for maintaining up-to-date routes in their routing tables. Moreover, the increased node mobilities will also trigger route updating packets in proactive routing protocols; this is the reason for increased NRL values of DSDV and OLSR at RS and TS. Further, the two periodic broadcast messages (hello and TC) of OLSR have increased its NRL values. However, in reactive routing protocols, with increasing dynamicity of the network, it is required to initiate the route discovery frequently that results in increasing routing overhead in the network. DSR has the lowest NRL values compared to other routing protocols since it does not depend on any periodic broadcast messages. It works on source routing where the source node only takes care of the route from source
Fig. 26 NRL in % versus pause time in sec for RS at LT
Fig. 27 NRL in % versus pause time in sec for TS at LT
Fig. 28 NRL in % versus pause time in sec for WS at HT
to destination, and the intermediate nodes have nothing to do with the routing of the packets except passively relaying them to the next hop. Although DSR, like AODV, is a reactive routing protocol and depends on route request broadcasting, due to its aggressive caching the alternate routes are readily available and hence it seldom requires the route discovery process. The features of AODV such as the route request–reply mechanism, local route repair and periodic broadcasting of small hello messages for maintaining local connectivity with the neighbors have led to a higher routing overhead than DSDV. In this work, only the number of routing packets has been taken for calculating NRL. However, if the routing overhead is considered in terms of bytes rather than packets, then the routing overhead generated by AODV is smaller than that of DSDV and DSR.
Fig. 29 NRL in % versus pause time in sec for RS at HT
Fig. 30 NRL in % versus pause time in sec for TS at HT
6 Conclusions The work has presented the performance comparison of the chosen unicast routing protocols DSDV, OLSR, AODV, and DSR under mobile ad-hoc environments for various mobility and two traffic patterns in terms of QoS metrics such as PDR, AEED, AJ, TP, and NRL. It is concluded that though the performance of DSR is good at walking (WS) and running speeds (RS) in terms of PDR and NRL values, its high delay and jitter characteristics have made it unsuitable for multimedia applications. Compared to OLSR, the performance of DSDV is better with respect to all metrics. With respect to PDR and NRL values, OLSR has come completely out of the race. From the simulation results, it is concluded that the reactive routing protocols have performed really well and shown their suitability to mobile ad-hoc environments for both LT and HT loads. Finally, it is concluded that since the overall performance of AODV is really good and it works well under highly mobile conditions, which are very common environments for MANETs, AODV is the best protocol for supporting multimedia traffic in MANETs.
References 1. Conti M, Giordano S (2014) Mobile ad hoc networking: milestones, challenges, and new research directions. IEEE Commun Mag 52(1):85–96 2. Gunaratna G, Jayarathna P, Sandamini S, De Silva D (2015) Implementing wireless ad hoc networks for disaster relief communication. In 8th international conference on Ubi-media computing (UMEDIA). Colombo, Sri Lanka, pp 66–71 3. Hoebeke J, Moerman I, Dhoedt B, Demeester P (2004) An overview of mobile ad hoc networks: applications and challenges. J Commun Netw 3(3):60–66 4. Murthy CSR, Manoj BS (2004) Ad hoc wireless networks: architectures and protocols. Portable Doc Pearson Educ
5. Abolhasan M, Wysocki T, Dutkiewicz E (2004) A review of routing protocols for mobile ad hoc networks. Ad Hoc Netw 2(1):1–22 6. Boukerche A, Turgut B, Aydin N, Ahmad MZ, Boloni L, Turgut D (2011) Routing protocols in ad hoc networks: a survey. Comput Netw 55(13):3032–3080 7. Saini TK, Sharma SC (2019) Prominent unicast routing protocols for mobile ad hoc networks: criterion, classification, and key attributes. Ad Hoc Netw 89:58–77 8. He G (2002) Destination-sequenced distance vector (DSDV) protocol. Networking Laboratory, Helsinki University of Technology, 1–9 9. Jacquet P, Muhlethaler P, Clausen T, Laouiti A, Qayyum A, Viennot L (2001) Optimized link state routing protocol for ad hoc networks. In: IEEE international multi topic conference. IEEE, pp 62–68 10. Johnson DB, Maltz DA, Broch J (2001) DSR: the dynamic source routing protocol for multi-hop wireless ad hoc networks. Ad hoc Network 5(1):139–172 11. Perkins CE, Royer EM (1999) Ad-hoc on-demand distance vector routing. In: Second IEEE workshop on mobile computing systems and applications. IEEE, pp 90–100 12. Broch J, Maltz DA, Johnson DB, Hu YC, Jetcheva J (1998) A performance comparison of multihop wireless ad hoc network routing protocols. In: 4th ACM/IEEE international conference on Mobile computing and networking. Dallas, Texas, USA, pp 85–97 13. Perkins CE, Royer EM, Das SR, Marina MK (2001) Performance comparison of two on-demand routing protocols for ad hoc networks. IEEE Pers Commun 8(1):16–28 14. Clausen T, Jacquet P, Viennot L (2002) Comparative study of routing protocols for mobile ad hoc networks. In: Med-hoc-Net. Sardegna, Italy 15. Boukerche A (2004) Performance evaluation of routing protocols for ad hoc wireless networks. Mobile Networks Appl 9(4):333–342 16. Layuan L, Chunlin L, Peiyan Y (2006) Performance evaluation and simulations of routing protocols in ad hoc networks. Comput Commun 30(8):1890–1898 17. Katiyar S, Gujral R, Mallick B (2015) Comparative performance analysis of MANET routing protocols in military operation using NS2. In: 2015 international conference on green computing and internet of things (ICGCIoT), pp 603–609 18. Rao KG, Babu CS, Rao BB, Venkatesulu D (2016) Simulation based performance evaluation of various routing protocols in MANETs. IOSR J Mobile Comput Appl 3(4):23–39 19. Network Simulator 2, http://www.isi.edu/nsnam/ns/ 20. Johnson DB, Broch J, Hu YC, Jetcheva J, Maltz DA (1998) The CMU Monarch project’s wireless and mobility extensions to ns. 42nd Internet Engineering Task Force
A Secured Novel Classifier for Effective Classification Using Gradient Boosting Algorithms E. Baburaj, R. Barona, and G. Arulkumaran
Abstract This research provides an anomaly detection model based on the Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting (XGBoost), and Decision Tree (DT) classification methodologies. The RSA algorithm is used to improve the privacy and security of data in a cloud setting. To ensure security, the RSA algorithm is employed for the key generation process. Feature selection is critical for effective classification. To choose the best features, the Recursive Feature Elimination (RFE) technique is used. Gradient boosting algorithms are used to improve the speed and precision of classification and regression tasks. These algorithms greatly improve the performance of the process owing to characteristics such as fast training and reduced overfitting. This has been demonstrated in numerous applications, particularly anti-money laundering in cryptocurrencies, where they are applied at the system level as well as at the account and transaction levels. The performance of the gradient boosting algorithms has been tested on standard existing datasets. To evaluate the proposed approach, we conducted experiments on the NSL-KDD dataset. The results show that the LightGBM algorithm performs better than the other two methods in terms of accuracy, precision, and recall, followed by XGBoost with the histogram strategy. Keywords Light gradient boosting machine · eXtreme gradient boosting · Anomaly · Classification · Recursive feature elimination · Decision tree
1 Introduction Cloud computing technology [1] is one of the quickest, most adaptive, and widely used innovations in the Information Technology (IT) field in today’s electronic E. Baburaj · G. Arulkumaran Bule Hora University, Bule Hora, Ethiopia R. Barona (B) St.Xavier’s Catholic College of Engineering, Chunkankadai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_48
processing world. Clients can use cloud computing for a variety of on-demand services. The pay-as-you-go model [2] determines the cost of usage based on the services obtained by the client. The cost of gaining access to these services through the cloud is lower. Cloud computing entails networking, storage, and other computing assets that are pooled together as a cloud resource pool. It allows several clients to share resources, allowing for more efficient use of various resources [3]. Cloud services include platform services for developing various software, infrastructure services for gaining access to hardware assets, and software services for building various client applications. According to a recent Gartner report, the use of cloud computing in businesses is growing at a rate of 40% each year. The rate is expected to increase by more than 25% per year, resulting in annual revenue of $55 billion [4]. Storage services are viewed as significant services since the vast majority of clients move their huge volumes of data from their local systems to the cloud systems. Additionally, when performing computations on the local system, it is not capable of dealing with such a huge amount of data, and the computation time required is greater. Therefore, to make the computation fast and smooth, clients have started utilizing cloud services. Even though the cloud offers a wide range of important services, cloud security remains a challenge in accessing cloud services. The two most important security concerns in the cloud are, first, that the service provider ought to guarantee that the cloud services are delivered and that the framework is secure and protected, and, second, assuring the clients that their information and data are secure [5, 6]. The client’s willingness to store sensitive information in the cloud is harmed by these security concerns. The most significant barriers to cloud storage adoption are data confidentiality, data integrity, and privacy preservation [7]. As a result of data privacy and security concerns, analysts are promoting several encryption methods. By converting raw or original data into an incomprehensible format known as ciphertext, cryptographic methods improve data security and protection. Cryptographic algorithms are divided into two categories: symmetric key cryptography and asymmetric key cryptography. In symmetric key cryptography, a single key is shared among all of the authorized clients. This single key is known as the secret key, and it is shared among the clients. Both encryption and decryption use the same common single key. The encryption process is used to convert the data into a different text format referred to as the ciphertext, which will be stored in the cloud. The decryption process converts the ciphertext to plain text while recovering data from the cloud. Since a single secret key is shared among several clients, security is again a significant issue, since an unauthorized client can easily hack the key to get access to the original information. Hence, asymmetric key cryptography or public-key cryptography appeared. There are two keys in this scheme: public and private keys. The public key is used in the encryption process, while the private key is used in the decryption process. The secret key is the private key in this case.
The original information can be decrypted and viewed only by the client who holds the private key. With this technique, the security of the information is greatly enhanced. The asymmetric
key cryptography method is thus more powerful than symmetric key cryptography. Among the public-key cryptographic algorithms, the most notable is the RSA algorithm, because the generated key size is larger, making the key more difficult to break. The RSA algorithm is therefore used in the proposed approach for data protection, preservation, and security. The term ‘data anomaly’ refers to the unusual behavior of data. There are several approaches to identifying anomalies in data. Machine learning algorithms are now widely used in anomaly detection frameworks [8]. In the anomaly detection process, both types of machine learning algorithms, namely supervised and unsupervised techniques, are used. These strategies employ classification and clustering approaches, respectively. Semi-supervised algorithms are also used in some applications. The LightGBM algorithm is used in our work to classify anomalies. When preparing enormous datasets to detect anomalies, feature selection is an important task for choosing the necessary features from a large number of available ones. Effective feature selection plays a fundamental role in improving the accuracy of the anomaly detection model. To select the significant and required features, we used the RFE algorithm in conjunction with logistic regression. The amount of training time required varies greatly depending on the number of features chosen. Varying the features may change the classification precision as well. A standard dataset is always used for the experimentation of the proposed model, since the dataset plays a significant role in obtaining exact validation results. As a result, we used the NSL-KDD dataset in our research, which is a typical dataset for intrusion detection. The remaining portions of the paper are structured as follows: the related work on cloud-based anomaly detection using machine learning is discussed in Sect. 2. The system model of the proposed work is presented in Sect. 3. Section 4 presents the privacy preservation and security of the data using the RSA algorithm. Feature selection using the RFE algorithm is discussed in Sect. 5. Anomaly detection using the LightGBM algorithm is discussed in Sect. 6. Anomaly detection using the XGBoost algorithm is presented in Sect. 7. Section 8 presents the results and discussions. Finally, Sect. 9 presents the conclusion and the future work.
2 Related Work In [9], the researchers combined two supervised machine learning algorithms, namely the Naive Bayesian classifier and Iterative Dichotomiser 3 (ID3), and proposed a hybrid model to analyze the network data and the intricate properties of network attacks. The detection speed and precision are improved. Huang et al. [10] stated that the number of observed features is reduced to a smaller set that is sufficient to capture the information relevant to the objectives of the anomaly detection strategy. Feature selection strategies are used to reduce the dimensionality of the datasets. Thottan et al. [11] reviewed the anomaly detection techniques that utilized artificial intelligence, machine learning, and state machine learning.
Garg et al. [12] proposed an ensemble-based grouping model for network inconsistency discovery in huge datasets and expressed that information with countless highlights cause overfitting issues and the general presentation of the model get decreased. Thus, optimal feature selection is required to reduce the computational overhead and overfitting problems. Xu et al. [13] proposed a dynamic extreme learning machine for the classification of patterns present in the continuous data stream. Their proposed technique was faster, but the accuracy rate was less. Moshtaghi et al. [14] proposed a model for anomaly detection in data streams, and fuzzy rule-based methods were used to learn the incoming samples. Columbus [15] stated that 49% of businesses are deferring cloud utilization due to cybersecurity problems. Hence, privacy and security are important factors to be considered in cloud technology. Recently, anomaly detection emerged as the main research area among researchers [16–19]. The eXtreme Gradient Boosting (XGBoost) yields very competitive results in terms of generalization capability and training for the applications such as credits scoring, bioactive molecule prediction, and analysis on sentiments [20]. It also confers promising improvements in different applications like the prediction of electronic health record data and is important to develop models that can be positively dealt with a huge volume of complex data [21]. Many diverse and adaptive solutions to the problem, which mainly rely on randomization techniques have been proposed by applying the gradient boosting algorithms to improve the performance of the multiple ongoing models [22–24]. After evaluating the two gradient boosting variants (LightGBM, GBoost) with the NSL-KDD dataset, the study is extended to Decision Tree which can be considered as the basis of all these algorithms; concerning the accuracy, precision, and recall, the algorithms found to be suitable for practitioners and researchers of different fields of Machine Learning and Gradient Boost Decision Tree algorithms are the best performers among the number of Machine Learning algorithms in use. When these algorithms have experimented with different classification and regression problems, LightGBM and XGBoost have shown stellar performance over other algorithms in terms of accuracy, speed, and reliability [25]. A deep semi-supervised anomaly detection scheme [26] is a deep end-to-end methodology for general semi-supervised anomaly detection. An information-theoretic framework is further introduced for deep anomaly detection based on the idea that the entropy of the latent distribution for normal data should be lower than the entropy of the anomalous distribution, which can serve as a theoretical interpretation for the method. In extensive experiments on Modified National Institute of Standards and Technology (MNIST) Fashion-MNIST and CIFAR-10 (Canadian Institute For Advanced Research) along with other anomaly detection benchmark datasets, the method demonstrated is on par or outperforms the shallow, hybrid, and deep competitors, yielding palpable performance improvements even when provided with only little labeled data. In privacy-preserving real-time anomaly detection [27] model, the problem of a privacy-preserving real-time anomaly detection service on sensitive, time series, streaming data is investigated. 
The developed privacy-preserving framework enables efficient anomaly detection on encrypted data by leveraging a lightweight and aggregate optimized encryption scheme to encrypt the data before off-loading the data to the edge. The anomaly detection is performed by a windowed
Gaussian anomaly detector while the Edge Service Provider (ESP) only sees the encrypted data, and the performance of the solution is evaluated in terms of the obtained model privacy, accuracy, latency, and communication cost. This model results in a latency of 145.52 s and a communication cost of 30.9 MB while preserving privacy and satisfying both the storage and accuracy requirements simultaneously. However, the model is not capable of handling malicious ESPs. To sum up, the progressions in the cloud and machine learning algorithms give an approach to fostering a robust model for inconsistency detection strategies. We have followed these mechanical progressions to foster a robust, reliable, and flexible anomaly detection model.
3 System Model Client, Trusted Authority, and Computation Engine are the components of the proposed framework. The Trusted Authority and Computation Engine components are hosted in the cloud. (1) Trusted Authority (TA) and (2) Computation Engine (CE) are the two main components of the framework model in Fig. 1. The TA has four main responsibilities: (i) validating the client, (ii) issuing a public key for encryption, (iii) decryption, and (iv) returning the classification result to the client. The CE is in charge of (i) feature selection, (ii) classification (anomaly detection), and (iii) returning the analysis report. The cloud environment is divided into two components: the TA and the CE. For the work, the client requests the public key. The public key is used to encrypt the data before sending it to the cloud. For each task, the TA generates a key pair (Pp, Pr), where Pp and Pr are the public and private keys, respectively. These keys are used for encryption as well as decryption. The client is approved by the TA based on the credentials provided by the client, and the public key (Pp) is issued to the client. The username and password are the credentials. The client encrypts the information with the Pp key and sends it to the TA. The client then issues a request to the TA for anomaly detection. The CE is initiated by the TA in response to the client’s demand. The encrypted information from the client is received by the TA and decrypted using the Pr key. After the information has been decrypted, it is sent to the CE, where the RFE algorithm determines the optimal number of features to improve classification accuracy. As a result, the framework’s algorithmic overhead is significantly reduced. Upon feature determination, the anomaly detection is performed by utilizing the LightGBM, XGBoost, and Decision Tree classification algorithms. After finishing the classification job, the end result is sent to the actual client from the computation engine through the Trusted Authority.
Fig. 1 System model
4 Privacy Preservation and Security Using the RSA Algorithm Our proposed cloud-based privacy-preserving model adopting the RSA algorithm can provide end-to-end privacy and protection. In RSA key generation, the public key is an integer value generated by combining two randomly chosen prime numbers (p and q). The original information is encrypted with the help of the public key. Algorithm 1 depicts the generation of an RSA key pair. Decryption is accomplished through the use of the private key. The TA is responsible for both keys.
Algorithm 1: Key Generation Using RSA Algorithm
Start
Pick two large prime numbers, p and q.
n = p × q
φ(n) = (p − 1) · (q − 1)
e = random value, such that 1 < e < φ(n) and gcd(e, φ(n)) = 1
Determine d = e^(−1) mod φ(n).
Public key = (e, n), i.e., e is the public exponent; Private key = (d)
Stop
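A runnable Python sketch of Algorithm 1 is given below, extended with the corresponding encryption and decryption steps for completeness. The tiny primes and the integer message are illustrative assumptions only; a production deployment would use a vetted RSA library with large random primes and proper padding.

```python
# Minimal RSA key generation (Algorithm 1) plus encryption/decryption, for illustration only.
# Tiny primes keep the example readable; real keys need large random primes and padding.
from math import gcd
import random

def rsa_keygen(p, q):
    n = p * q
    phi = (p - 1) * (q - 1)
    e = random.randrange(2, phi)
    while gcd(e, phi) != 1:        # e must be coprime with phi(n)
        e = random.randrange(2, phi)
    d = pow(e, -1, phi)            # modular inverse, Python 3.8+
    return (e, n), d               # public key (e, n), private key d

def encrypt(m, pub):
    e, n = pub
    return pow(m, e, n)

def decrypt(c, d, pub):
    _, n = pub
    return pow(c, d, n)

pub, priv = rsa_keygen(61, 53)     # illustrative primes only
cipher = encrypt(42, pub)          # 42 stands in for an encoded data block smaller than n
assert decrypt(cipher, priv, pub) == 42
```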
5 Feature Selection Feature selection [28, 29] has become critical for selecting the best and most suitable features, and it is applied in several fields such as the medical field, science, economics, manufacturing,
image processing, and production. The selection of features is an important part of the classification task. The Recursive Feature Elimination (RFE) algorithm chooses the best features and eliminates the worst features based on the ranking of the features. The logistic regression strategy is used for ranking. RFE with logistic regression evaluates the features and selects the best ones based on the ranking score. Logistic regression determines the ranking of features by utilizing Eq. 1:
ln(P(Y)/(1 − P(Y))) = β0 + β1 x1 + β2 x2 + … + βn xn   (1)
where Y = target (dependent variable), x1, x2, …, xn = features (independent factors), β0 = constant value, and β1, …, βn = coefficient values. The likelihood or rank value for each feature is determined by utilizing Y and the features x1, x2, …, xn as inputs. It is calculated using Eq. 2:
P = 1/(1 + e^(−(β0 + β1 x1 + β2 x2 + … + βn xn)))   (2)
where P = probability or rank value. Hence, a rank is assigned to each feature. The value ‘N’ represents the total number of features in the dataset in Fig. 2. Every feature is listed in the Feature Set (FS), and the features having rank attributes are listed in the Feature Rank (FR) set. The feature with the highest rank is chosen as the best feature and the lowest-ranking features are eliminated. The chosen features are given to the LightGBM, XGBoost, and DT algorithms for the anomaly detection process.
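A hedged scikit-learn sketch of this feature-selection step is shown below. The retained feature count of 50 follows the experiments reported later, X and y stand for the already encoded NSL-KDD feature matrix and labels (the label/one-hot encoding is not reproduced), and the max_iter setting is an assumption rather than a value taken from the paper.

```python
# Recursive Feature Elimination with a logistic-regression ranker, as described above.
# X, y are assumed to be the encoded NSL-KDD feature matrix (DataFrame) and labels.
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

def select_features(X: pd.DataFrame, y, n_features=50):
    ranker = LogisticRegression(max_iter=1000)           # ranking model (Eqs. 1-2)
    rfe = RFE(estimator=ranker, n_features_to_select=n_features)
    rfe.fit(X, y)
    selected = X.columns[rfe.support_]                   # features kept after ranking (FR set)
    return X[selected], list(selected)
```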
6 Detection Using LIGHTGBM Algorithm LightGBM [30] is a gradient boosting framework and a powerful algorithm. It is a tree-based learning technique. The computation speed of this algorithm is fast, hence the name light. It is able to handle an enormous volume of data, and it requires a smaller amount of memory. LightGBM performs classification by utilizing the chosen features. Algorithm 2 shows the LightGBM algorithm. LightGBM’s main parameters [31–33] for implementing the proposed model are as follows: number of leaves—the total number of leaves on each tree. learning rate—the algorithm’s learning rate.
Fig. 2 Feature selection using RFE algorithm
maximum_depth—the maximum depth of the algorithm. boosting type—it specifies the algorithm’s type. minimum data—the leaf node’s smallest amount of data. feature_fraction—a value ranging from 0 to 1. It refers to the proportion of a specific feature to the total number of features. Our implementation’s parameter values are as follows:
learning_rate = 0.003
boosting_types = ‘gbdt’
objective_function = ‘binary’
metric = ‘binary_logloss’
sub_features = 0.5
no.of_leaves = 70
min_data = 50
max_depth = 70
Algorithm 2: LightGBM
Input: T = training data
Input: F = no. of iterations
Input: u = big gradient data sampling ratio
Input: v = small gradient data sampling ratio
Input: loss_fn = loss function, L_weak = weak learner
Model ← {}, factr ← (1 − u)/v
topn ← u × len(T), randn ← v × len(T)
for i = 1 to F do
  pred ← Model.predict(T)
  k ← loss_fn(T, pred)
  w ← {1, 1, …}
  sorted ← GetSortedIndices(abs(k))
  topset ← sorted[1:topn]
  randset ← RandomPick(sorted[topn:len(T)], randn)
  usedset ← topset + randset
  w[randset] ← factr
  newml ← L_weak(T[usedset], −k[usedset], w[usedset])
  Model.append(newml)
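The following sketch shows how the parameter values listed above could map onto the lightgbm Python API. The mapping of the paper's names (sub_features, no.of_leaves, min_data) to the canonical lightgbm names, the number of boosting rounds, and the X_train/y_train variables are all assumptions on my part, not values taken from the paper.

```python
# Training sketch with the quoted parameter values (canonical lightgbm parameter names assumed).
import lightgbm as lgb

params = {
    "objective": "binary",
    "boosting_type": "gbdt",
    "metric": "binary_logloss",
    "learning_rate": 0.003,
    "num_leaves": 70,            # no.of_leaves in the text
    "feature_fraction": 0.5,     # sub_features in the text
    "min_data_in_leaf": 50,      # min_data in the text
    "max_depth": 70,
}

def train_lightgbm(X_train, y_train, X_val, y_val, rounds=500):
    train_set = lgb.Dataset(X_train, label=y_train)
    val_set = lgb.Dataset(X_val, label=y_val, reference=train_set)
    # rounds=500 is a placeholder; the paper does not report the boosting-round count
    return lgb.train(params, train_set, num_boost_round=rounds, valid_sets=[val_set])
```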
7 Anomaly Detection Using XGBoost Algorithm XGBoost is a machine learning algorithm of the gradient boosting family, which splits the dataset to perform either classification or prediction based on the selected features. The major idea behind such an algorithm is simple: multiple decision trees are trained first, and their predictions are then added to obtain the final prediction. Based on this, the dataset is split multiple times to build a tree. If the number of splits is small, the algorithm executes faster. The algorithm follows the steps given in Fig. 3. 1. Tree Growth The XGBoost algorithm uses a leaf-wise growth principle. It deploys different approaches at different levels, which helps the trees maintain balance among themselves and regularizes the training. 2. Selection of the best split Best split selection is a challenging task in the algorithmic process, since the algorithm has to check every feature of every data point. XGBoost has methods to approximate the best split. It uses a greedy algorithm for finding the best split, and it is applied
Fig. 3 Steps in XGBoost algorithm
over the sorted data to improve the efficiency. The histogram method combines the features into different sets. The algorithm then uses the sets, rather than using the features, to split. It reduces the complexity of the algorithm and increases the training speed. The sets generated at the beginning are used for the entire training process. The best split may be determined based on the tree structure. The following steps are used to find the above: 1. The initial value may be estimated from the previous data observed and the same is a probability value. 2. Calculate the residual value. Residual value is the difference between the observed value and the estimated value. 3. From this, the algorithm determines the value of Mean Square Error (MSE) as
MSE = Σ(Observed value − Estimated value)² / n
4. Calculate the odds ratio. The odds ratio is the ratio between the events count and the actual estimation. 5. The probability is calculated from the odds ratio. 6. Finally, the best split is determined using the following relation:
Σ_{i=1}^{n} (Residual_i / Probability_xi)
where ‘x’ is the iteration number in which the tree is built. From these best split values, the first tree may be built. 7. Calculate the new residual to build the second tree. 8. Repeat steps 1–7 for all the values in the set. 3. Sparse input processing Most of the data used are in textual form and sparse by nature. The sparse values will be treated as 0. XGBoost ignores all such sparse values during the split operation and considers the data values only. This process reduces the loss enormously and also improves the training process. The algorithm considers all the missing values as 0. Finally, it processes the numerical features using the ‘max’ and ‘min’ modes. 4. Dealing with categorical values XGBoost has no special operation to process categorical data values. It processes only numerical values as input features. Thus, methods like label encoding are used to encode the categorical values before they are fed into the algorithm’s input.
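For comparison, a minimal XGBoost counterpart using the histogram-based split finding mentioned above is sketched below. The hyperparameter values are placeholders of my own, since the paper does not list the XGBoost settings, and X_train/y_train are assumed to be the label-encoded NSL-KDD features and labels.

```python
# Illustrative XGBoost classifier using histogram-based split finding (tree_method="hist").
# Hyperparameter values are placeholders; the paper does not report the exact settings.
from xgboost import XGBClassifier

def train_xgboost(X_train, y_train):
    model = XGBClassifier(
        tree_method="hist",      # bins features into sets before splitting, as described above
        n_estimators=300,
        max_depth=6,
        learning_rate=0.1,
        eval_metric="logloss",
    )
    model.fit(X_train, y_train)
    return model
```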
8 Discussions and Findings The standard NSL-KDD dataset is used for experimental validation. It has 42 characteristics or features. The RFE algorithm is used to select only the most important features from among these. These selected features are used by LightGBM for classification. The following parameters are used for evaluation: (i) Accuracy, (ii) Precision, and (iii) Recall.
(i) Accuracy
Accuracy is the ratio of successfully predicted values to total values. The accuracy is determined by utilizing Eq. 3.
Accuracy = (TP + TN)/(TP + TN + FP + FN)   (3)
Fig. 4 Accuracy value of LightGBM, XGBoost, and decision tree
where TP is the total number of true positive values, TN is the total number of true negative values, FP is the total number of false positive values, and FN is the total number of false negative values. As shown in Fig. 4, the classification accuracy of LightGBM is 0.998, XGBoost is 0.989, and Decision Tree is 0.976. Experimentation is carried out by varying the number of features selected. The total number of features obtained after performing the label and one-hot encoding schemes is 122. We performed the experimentation by choosing the number of features ranging from 10 to 90. Based on the results, we found that the performance was at an exceedingly low level when the number of features was extremely low, i.e., 10 features in our case. When the number of features was 50, the highest accuracy was observed. (ii) Precision Precision is a measure of the percentage of correct positive predictions made. It is calculated by using Eq. 4.
Precision = TP/(TP + FP)   (4)
Figure 5 shows the precision values achieved for the proposed model’s LightGBM, XGBoost, and Decision Tree classifier. The precision value is calculated by varying the number of features. It can be seen that LightGBM has a higher precision value than XGBoost and Decision Tree classifier. When the number of features selected was greater than 50, the precision value obtained by LightGBM was 0.998, XGBoost was 0.989, and Decision Tree was 0.98, and the precision value remained the same for the remaining number of features. (iii) Recall The recall is defined as the ratio of true positive to the sum of true positive and false negative values. It is calculated using Eq. 5.
Fig. 5 Precision value of LightGBM, XGBoost, and decision tree
Recall = TP/(TP + FN)   (5)
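The three evaluation measures in Eqs. 3–5 can be obtained directly from scikit-learn, as in the short sketch below; the predicted scores and test labels are assumed to come from the training sketches given earlier, and the 0.5 threshold is an assumption for converting probabilities to class labels.

```python
# Computing accuracy, precision and recall (Eqs. 3-5) for a fitted binary classifier.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

def evaluate(scores, y_test, threshold=0.5):
    """scores: predicted probabilities (e.g. from lgb.train's model.predict) or hard 0/1 labels."""
    y_pred = (np.asarray(scores) >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_test, y_pred),    # Eq. 3: (TP+TN)/(TP+TN+FP+FN)
        "precision": precision_score(y_test, y_pred),  # Eq. 4: TP/(TP+FP)
        "recall": recall_score(y_test, y_pred),        # Eq. 5: TP/(TP+FN)
    }
```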
The recall values are calculated by altering the range of features. It can be seen that LightGBM has a higher recall value than the Decision Tree classifier. When the number of selected features exceeds 50, the recall value for LightGBM is 0.992, XGBoost is 0.98, and for Decision Tree is 0.975 and remains the same for the remaining number of features as in Fig. 6. We also compared the performance of the LightGBM algorithm with the XGBoost and Decision Tree, and we discovered that LightGBM performs slightly better than XGBoost and Decision Tree. The accuracy of the Decision Tree was observed as 0.99, XGBoost was 0.996, and LightGBM’s performance was 0.9983, as seen in
Fig. 6 Recall value of LightGBM, XGBoost, and decision tree
Fig. 7 Comparison of XGBoost, LightGBM, and decision tree classifier
Fig. 7. As a result, the LightGBM classifier outperforms the XGBoost and Decision Tree classifiers.
9 Conclusion The machine learning techniques LightGBM and XGBoost are used in this research to propose an anomaly detection model based on the cloud environment, with RFE for feature selection. Furthermore, the RSA method is used to protect the cloud data’s privacy and security. The Trusted Authority is in charge of issuing keys to the user and the computation engine in the cloud environment. The computation engine is in charge of deciding which features to use and how to classify them. The NSL-KDD dataset is used for experimental evaluation of the anomaly detection process. The results show the effectiveness of the LightGBM algorithm in terms of accuracy, precision, and recall with excellent speed. Also, it handles categorical data acceptably. XGBoost has also provided very comparable results in detecting data anomalies. In the future, we will investigate our model with hyperparameter optimization and data sampling techniques, tuning the most suitable parameters and addressing issues such as potential memory usage.
References 1. Chiba Z, Abghour N, Moussaid K, El Omri A, Rida M (2019) New anomaly network intrusion detection system in cloud environment based on optimized back propagation neural network using improved genetic algorithm. Int J Commun Netw Inf Security (IJCNIS) 11(1):61–84 2. Panigrahi B, Hoda M, Sharma V, Goel S (2018) Nature inspired computing. Adv Intell Syst Comput Springer, Singapore 652:165–175
E-Learning Acceptance in Higher Education in Response to Outbreak of COVID-19: TAM2 Based Approach Amarpreet Singh Virdi and Akansha Mer
Abstract With the onset of COVID-19, technological tools have come robustly to the forefront in the form of e-learning. In light of the accelerated shift to online classes by educational institutions in response to the outbreak of the pandemic, this study applies the technology acceptance model (TAM2) to examine the impact of students' social influence, perceived usefulness, and perceived ease of use on e-learning intention in higher education during COVID-19. Data collected from students of different universities in the northern region of India were analysed using structural equation modelling. The results reveal that significant relationships exist between social influence, perceived usefulness, and e-learning behavioural intention. The study also confirms that perceived usefulness mediates the relationship between social influence and e-learning behavioural intention, as well as the relationship between perceived ease of use and e-learning behavioural intention. Social influence and perceived usefulness emerge as the key factors for the acceptance of e-learning in higher education during the COVID-19 pandemic. The main contribution of the study is a better understanding of the variables that influence the adoption and use of e-learning, with special reference to the outbreak of COVID-19. Keywords ICT · TAM · E-learning acceptance · Perceived usefulness · Social influence · Perceived ease of use · Behavioural intention · COVID-19
1 Introduction The use of technological tools has risen to the mainstream with the advent of COVID-19, for the purpose of e-learning and other applications. Jenkins and Hanson [1] A. S. Virdi Department of Management Studies, Kumaun University, Bhimtal (Nainital), Uttarakhand, India A. Mer (B) Department of Commerce and Management, Banasthali Vidyapith, Banasthali, Rajasthan, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_49
indicated that the use of technological tools aids e-learning or online learning. In pre-COVID-19 times, online education in India was usually imparted by open universities, and until the pandemic, e-learning or online learning accounted for only a small percentage of higher education. Considering how important education is in influencing economies around the world, it was necessary to find a way to avoid interruptions in education, and technology was instrumental in reducing the pandemic's influence on student education. To fill the vacuum created in education during COVID-19, various educational institutions turned to e-learning. E-learning required students and academicians to embrace and adapt to new technologies such as Google Meet, Zoom, and MS Teams. The pandemic thus brought the opportunity for academicians and students to work with these technologies, which otherwise had an obscure role in traditional education. According to a report by UNESCO (2020), the studies of some 32 crore students have been affected by closures of educational institutions in India. The implementation of online education and e-learning was pushed forward by COVID-19, which was an unexpected shift for both instructors and students. The move from offline to online learning therefore brought several challenges, especially in the Indian education scenario. Initially, through the lockdown period, instructors and students adopted the most easily available digital means for e-learning, namely WhatsApp, telephone conversations, and electronic mail, for conducting classes, submitting assignments, and clearing doubts. The digital devices used were smartphones and laptops, with smartphone usage on the higher side. Students and teachers were not acquainted with online learning conducted through smartphones and laptops, which led to disinterest, demotivation, and stressful situations around e-learning. Most of the time, the regular timetable was followed for the classes, which also contributed to the students' disinterest in online learning. Students located in remote areas with poor Internet infrastructure faced many connectivity issues. The study conducted by Mishra et al. [2] observed that "there were 32% of teachers using Google classroom and 45% teachers using Zoom/Cisco WebEx/Google Meet/Skype platform for taking online classes, but the recipient students were found only 20% and 15%, respectively". As a result, engaging students in education, instruction, and the learning process remains a momentous problem for the acceptance of online learning. Thus, the factor(s) that will ensure participation in and adoption of e-learning/online learning throughout the epidemic should be determined. Studies have shown that interpersonal interactions influence an individual's behaviour. As per Cooper and Zmud [3], individuals are swayed by observing others and obtaining knowledge from other people, especially for the reduction of any implied risk and uncertainty. This phenomenon is termed "Social Influence". E-learning is a social process, as indicated by Punnoose [4]. The current study is a notional extension of the technology acceptance model, i.e. TAM2 (Venkatesh and Davis [5]). Several studies have been conducted on technology adoption, and TAM has emerged as a robust model. The reason for adopting this model is that during the lockdown, when course completion was the need of
the hour, university management and faculty members circulated information about the conduct of online classes to the students and encouraged them to participate. Similarly, family members were more concerned about their children's studies during the lockdown and therefore influenced them to adopt the online mode of learning. The students were thus influenced and encouraged to attend online classes. This implies that social influence has a positive association with students' attitudes towards online learning. Social influence is encompassed in TAM2. Davis [6] propounded the theory and model of TAM, which theorises that "perceived usefulness and ease of use are two primarily factors affecting linkages of users' attitudes, intention, and actual adoption behaviour". When an individual exhibits technology acceptance behaviour, social influence is a common occurrence for specific technologies and plays an important role in the acquisition of products and the adoption of IT.
2 Literature Review 2.1 Technology Acceptance Model (TAM) TAM is a well-known model for understanding the acceptance and usage of technology by individuals. Empirical studies indicate that "perceived ease of use and perceived usefulness have a significant effect on the behavioural intention to use the technological system". Davis [6] indicated that TAM hypothesises two imperative cognitive convictions, perceived ease of use (PEOU) and perceived usefulness (PU), that affect adoption behaviour. As per Beldona et al. [7], TAM has been developed through research investigations to anticipate technology adoption, and it has been used in different countries and in different research contexts. Rosaline and Wesley [8] investigated the adoption of technological tools for online learning in higher education using TAM and inferred a strong positive correlation between social influence and behavioural intention to adopt. Decman [9] concluded that social influence has a favourable positive association with the behavioural intention to adopt technology. The original TAM model was extended by Venkatesh and Davis [5] to explain "perceived usefulness and usage intention in terms of social influence (e.g., subjective norms, voluntariness) and cognitive instrumental processes (e.g., job relevance, output quality)"; TAM2 stands for this extended model. Merhi [10] found that behavioural intention-based research studies propose that perceived ease of use (PEOU) and perceived usefulness (PU) affect and persuade the behavioural intention in the adoption of technology. Studies indicate that TAM2 is far better at elucidating user adoption of technology.
2.2 E-learning Pre-COVID-19 Revythi [11], using an extended TAM, found that self-efficacy, social norms, and system access are all positively associated with behavioural intention to utilise e-learning among students in higher education. Samsudeen [12] utilised UTAUT2 and indicated that "performance expectancy, effort expectancy, social influence, work-life quality, hedonic motivation, Internet experience, and facilitating condition" are positively associated with the behavioural intention of students in Sri Lanka to employ e-learning. Rodrigues et al. [13] suggested that the requirements for students' acceptance of e-learning are a sound digital infrastructure, acceptance of change, and digital skills; the study also pointed out that learning styles, learning motivation, collaboration with peers, learning behavioural types, and prior knowledge affect e-learning. Ali [14] found that work-life quality, Internet experience, and subjective norm are all positively correlated with behavioural intention, that facilitating condition has a significant correlation with actual use, and that behavioural intention and perceived ease of use are positively associated with actual use. Tarhini et al. [15] in their study suggest that "performance expectancy, social influence, habit, hedonic motivation, self-efficacy, effort expectancy, and trust exert a positive effect on behavioural intention to use e-learning" by university students in England. Rosaline et al. [8] discovered a strong association between social influence and behavioural intention to adopt e-learning in higher education in India in a study based on UTAUT; perceived usefulness and attitude towards use exert a direct effect on e-learning adoption. According to Boateng et al. [16], perceived usefulness and ease of use are strongly linked to attitudes towards use. Students having a strong sense of social identity in the classroom are more likely to use e-learning (Chu & Chen [17]). According to Ratna and Mehra [18], there is a positive correlation between perceived ease of use, perceived utility, attitude, behavioural intention to use, and actual usage of e-learning. Bock et al. [19] found that individual adoption of e-learning has a positive association with social influence, with the individual attempting to improve his or her image.
2.3 E-learning During COVID-19 Sukendro et al. [20] used an extended TAM to show that, in higher education in Indonesia, facilitating conditions influence perceived ease of use; perceived ease of use, in turn, predicted attitude and perceived usefulness, and attitude and behavioural intention were predicted by perceived usefulness. Tandon [21] used UTAUT to demonstrate that, whereas performance expectancy, facilitating conditions, and social influence are all positively associated with behavioural intention to use e-banking, social influence has little bearing on attitude. Raza et al. [22] employed UTAUT and concluded that performance expectancy, effort expectancy, and social influence all determine learning management system behavioural intentions. The learning
management system's behavioural intention was positively influenced by social isolation. Ho et al. [23] used TAM and revealed that computer self-efficacy is positively associated with perceived ease of use, and that system interactivity and perceived usefulness are also positively associated; on the other hand, social factors directly affected students' attitudes. The current study is undertaken to examine whether TAM extended with social influence has been successful in explaining the acceptance of online learning in the Indian context during COVID-19. The students and instructors in most higher education institutions were not familiar with online learning. The obstacles to adopting online learning, particularly for students and teachers in rural, semi-urban, and small-town areas of India, were exacerbated by reluctance towards a novel method of learning and by the financial costs (smart gadgets and Internet connectivity). The behavioural and cognitive aspects were also affected while students had to study in seclusion due to the constraints of COVID-19. The study, therefore, becomes important for understanding the cognitive and behavioural aspects influencing the adoption of e-learning using TAM. According to a review of the literature, many studies based on the original TAM have been conducted, but there is a scarcity of research on social influence-based extended TAM (TAM2), notably on the adoption of online learning. Furthermore, to the best of the researchers' knowledge, no study on the use of e-learning in higher education in India during the COVID-19 outbreak has been conducted.
3 Proposed Model and Hypothesis Development The proposed model is an extension of TAM and is used to determine the parameters influencing e-learning acceptance among Indian university students during the COVID-19 period. Szajna [24] found that a variety of external factors indirectly influence technology acceptability through perceived usefulness and perceived ease of use. The proposed model is founded on TAM2, which relates the constructs of perceived usefulness (PU), perceived ease of use (PEOU), and behavioural intention (BI). In this study, social influence (SI) is expected to be one of the external elements influencing e-learning acceptability in higher education. Figure 1 depicts the proposed model.
3.1 Social Influence (SI) Social influence, as described by Venkatesh et al. [6], is an individual's ability to influence others and the importance he or she attaches to his or her image in a group. Bock et al. [19] established that an individual's adoption of e-learning depends on social influence, where the individual is keen to enhance his or her image. Previous research on technology adoption has revealed a strong correlation between social
Fig. 1 Proposed model. Source Authors’ own compilation
influence and behavioural intention. The sudden change from imparting education in the traditional face-to-face setting to the electronic form became a massive challenge in embracing the situation that erupted out of COVID-19. In the context of Indian students, social influence plays a significant role in technological adoption, particularly in a pandemic situation like COVID-19, since it is the parents' pressure upon their children to study, which is itself influenced by comparison with peer students. Venkatesh et al. [25] highlighted that students' behavioural intention to use ICT might well be influenced by seniors, teachers, and peers. Chu and Chen [17] indicated that, in the classroom, a student with a stronger social identity will be more likely to employ e-learning. Tajfel and Turner [26] divulged that, as a result of social influence, people look for similarities between people and a given group to establish a sense of belongingness, which is known as the social categorization process. Legris et al. [27] highlighted that social influence has been concluded to be an important factor, studied under subjective norms, and that TAM was considered to be an appropriate model. According to Al-Fraihat et al. [28], the influence of instructors, parents, and peer groups on e-learning adoption is substantial, and perceived usefulness and perceived ease of use have a favourable impact on e-learning adoption. H1: Social influence of the students' reference groups is positively associated with their online learning during the outbreak of COVID-19.
3.2 Perceived Ease of Use (PEOU) Perceived ease of use refers to how easy an individual believes it will be to utilise a particular technology. Venkatesh et al. [25] indicated that concepts representing the notion of perceived ease of use include complexity and effort expectancy. As per Davis [6], perceived ease of use can aid performance and perceived usefulness in the short term, while a lack of it can cause discomfort, which might hamper the acceptance of an innovation. Cheung and Vogel [29] divulged that users' attitudes towards technology
adoption are influenced by perceived ease of use and utility of technology. This study looked into the effect of social influence on individual e-learning technology adoption intentions. H2: During the COVID-19 outbreak, students’ perceived ease of use is positively related to their perceived usefulness of online learning. H3: The perceived ease of use of online learning is positively related to students’ intention to use it during COVID-19.
3.3 Perceived Usefulness (PU) Perceived usefulness, according to Davis [6], relates to how much a person believes that utilising a specific system will help him or her perform better at work. Research on information system adoption by Nysveen et al. [30] highlighted that technology that does not aid employees in accomplishing their jobs is unlikely to be well adopted. The perceived usefulness of a technology thus has an impact on its adoption, while several factors, such as network issues and the availability of devices supporting the latest distance-communication technologies, may act as constraints. It is well established that the perceived usefulness of an innovation has a significant positive impact on users' intentions to use e-learning. H4: The perceived usefulness of online learning is significantly associated with students' intentions to use it during the COVID-19 epidemic.
3.4 Behavioural Intention (BI) Venkatesh et al. [25] define behavioural intention as an individual's willingness to perform a given behaviour. Tosuntas et al. [31] proposed that individuals with a positive attitude towards e-learning have higher behavioural intentions. Behavioural intention is an important and effective factor for predicting individual behaviour. Van Raaij and Schepers [32] added subjective norm, personal innovativeness, and computer anxiety to TAM2 and studied virtual learning among MBA students. They found that subjective norm and perceived ease of use had an indirect effect, while perceived usefulness had a direct effect, on virtual learning; they also discovered that personal innovativeness and computer anxiety have a direct impact on perceived ease of use. Social pressure/influence acts as a catalyst in the acceptance of products, services, and technology, and group leaders have the ability to influence the perception and behaviour of group members. H5: During COVID-19, perceived usefulness mediates the relationship between social influence and the intention to use online learning. H6: Perceived usefulness mediates the relation between perceived ease of use and intention to use online learning during COVID-19.
4 Method 4.1 Participants During April–May 2020, the authors conducted research to test the hypotheses. A total of 500 surveys were mailed to students from various universities. Only 432 of the 500 surveys issued were returned (response rate = 86.4%). The participants responded to each measure on a 5-point Likert scale ranging from 1 (completely disagree) to 5 (completely agree). Among the 432 students, 403 (93.3%) were female and 29 (6.7%) were male. The majority of students (60.2%) were from management studies, followed by commerce students (35.4%); the rest comprised law, arts, finance, and medicine students.
4.2 Measures In this study, the survey instrument was an online questionnaire specifically designed for e-learning in India. The questionnaire was built from measurement scales validated by their respective authors in earlier studies. A Google Forms questionnaire was used to obtain the primary data for testing the hypotheses. A total of 432 valid questionnaires were collected. Non-probability sampling, specifically judgement sampling, was used to obtain the sample.
4.2.1 Social Influence
Social influence was measured using a six-item scale adapted from the modified scale of Taylor and Todd [33]. It contains items covering teachers' influence, peer-group influence, and family influence. A sample item is "I did what my teachers thought I should do during COVID-19". A high score suggests that social influence plays a significant role in e-learning adoption.
4.2.2 Perceived Usefulness
Perceived usefulness (PU) was measured with a three-item scale from the validated scale of Davis [6]. A sample item is "online classes have improved my academic performance during outbreak of COVID-19". A high score suggests that students will benefit from e-learning.
4.2.3 Perceived Ease of Use
Perceived ease of use (PEOU) was measured with a three-item scale from the validated scale of Davis [6]. A sample item is "I find using online learning system easy to use". A high score indicates that e-learning technology is easy for the students to use.
4.2.4 Behavioural Intention
Behavioural intention (BI) was measured with a three-item scale from the validated scale of Venkatesh and Davis [25]. A sample item is "I plan to use online classes in future when required". A high score suggests that students intend to use e-learning or online learning in the future.
4.3 Procedure for Analysis AMOS version 20.0 and SPSS version 20.0 were used to scrutinise the data. The measurement model was examined first, and the structural model was then evaluated. A variety of fit measures were used to test the measurement model. The model fit was evaluated using indices such as the root mean square error of approximation (RMSEA), the Tucker–Lewis index (TLI), and the comparative fit index (CFI). The CFI and TLI indicate acceptable fit at a threshold of 0.90 and exceptional fit at 0.95, whereas a lower RMSEA value indicates a better fit (Kline [34]). The construct reliability (CR) and average variance extracted (AVE) were determined to establish the variables' reliability and validity. The mediating effects were tested using the bootstrapping method in AMOS.
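As a side note on the reliability and validity checks mentioned above, CR and AVE can be computed directly from standardized factor loadings using the standard Fornell–Larcker formulas. The sketch below uses purely hypothetical loadings for one construct; the actual values come from the AMOS output and are not reproduced here.

import numpy as np

# Hypothetical standardized loadings for one construct (e.g., Social Influence)
loadings = np.array([0.78, 0.81, 0.74, 0.69, 0.72, 0.80])

# Composite reliability: CR = (sum(lambda))^2 / ((sum(lambda))^2 + sum(1 - lambda^2))
error_var = 1 - loadings ** 2
cr = loadings.sum() ** 2 / (loadings.sum() ** 2 + error_var.sum())

# Average variance extracted: AVE = mean(lambda^2)
ave = np.mean(loadings ** 2)

print(f"CR  = {cr:.3f}")   # values above 0.70 are commonly taken as acceptable
print(f"AVE = {ave:.3f}")  # values above 0.50 are commonly taken as acceptable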
5 Data Analysis and Results AMOS version 20.0 and SPSS version 20.0 were used to analyse the data. The hypotheses were investigated using structural equation modelling.
5.1 The Measurement Model The measurement model, according to Hair et al. [35], assesses the measures’ reliability and validity. Figure 2 depicts the measurement model created for the investigation.
Fig. 2 Measurement model. Source Authors’ own compilation
Confirmatory factor analysis uses a set of indices to confirm that the model fits correctly, and it follows a recommended level of fit. These indices are discussed as follows: the values of the measurement model are χ2 = 3697.951 with 105 degrees of freedom (p

arr -> np.array(nums).reshape(n, m)        # converts created list to a numpy array
with open('file.txt', 'r+') as myfile:     # opens file to read and write the array
    data -> myfile.read()
    myfile.seek(0)
    np.savetxt(myfile, arr, fmt="%s")      # save array to the file
    myfile.truncate()                      # deletes all previous data from file
    myfile.close()                         # close the file

## Client side
ts_3 -> int(time.time())                   # generates timestamp
SET file TO open("file.txt", "r")          # read the file with OTPs from server
file1 -> open("file1.txt", "w")            # opens another file to write
arr -> load file data to array
arr2 -> save OTP set of this device to an array using array indexing
ls1 -> arr2, ts_3                          # adds timestamp to array of OTPs
file1.write(ls1)                           # writes back to file along with time stamp
# close both the files
file1.close()
file.close()

## Server side
from pathlib IMPORT Path
txt -> Path('file1.txt').read()            # saves data of file to txt
c -> txt[0]                                # saves array of OTPs received from client side to c
Fig. 2 Flowchart for the proposed approach
IF txt[1] == ts:                           # compares IF timestamp of OTP generation and OTP authentication is the same
    arr1 = np.array(c)                     # converts received OTP to numpy array
    SET equal_arrays TO comparison.all()   # gives boolean result
    IF equal_arrays == true:
        OUTPUT("Authenticated")
    ELSE:
        OUTPUT("Not Authenticated")
file.close
The flow of our simulation code, presented as Algorithm 1, is shown graphically in Fig. 2.
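For readers who prefer plain Python over the pseudocode of Algorithm 1, the following is a minimal, self-contained sketch of the same generate–transfer–verify flow. The file names, OTP format, number of devices, and device index are illustrative assumptions, and the timestamp check is a simplification of the check described in Algorithm 1.

import time
import secrets
import numpy as np

N_DEVICES, OTPS_PER_SET = 4, 3  # assumed sizes; 3 OTPs per set as in the simulation

# --- Server: generate a fresh batch of OTP sets and overwrite the shared file ---
ts = int(time.time())                                   # timestamp of this batch
otps = np.array([[secrets.randbelow(10**6) for _ in range(OTPS_PER_SET)]
                 for _ in range(N_DEVICES)])            # one row of OTPs per device
np.savetxt("file.txt", otps, fmt="%06d")                # previous OTPs are discarded

# --- Client (device dev_id): pick its OTP set and send it back with a timestamp ---
dev_id = 2
ts_3 = int(time.time())
my_otps = np.loadtxt("file.txt", dtype=int)[dev_id]     # array indexing selects this device's set
with open("file1.txt", "w") as f:
    f.write(" ".join(map(str, my_otps)) + f" {ts_3}")

# --- Server: verify the returned OTP set and timestamp ---
*received, ts_back = map(int, open("file1.txt").read().split())
if ts_back >= ts and np.array_equal(np.array(received), otps[dev_id]):
    print("Authenticated")
else:
    print("Not Authenticated")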
4 Results Simulation results obtained while executing our code are presented in Fig. 3, where four arbitrarily chosen input sizes have been considered: 20 (Fig. 3a), 35 (Fig. 3b), 50 (Fig. 3c), and 500 (Fig. 3d).
Fig. 3 Screenshots for our presented code at different input sizes
To estimate the time complexity of the proposed algorithm, it was put through rigorous analysis and testing with different inputs, where the input is the number of sets of OTPs to be generated (i.e., the number of devices to be connected). To simplify the calculation and simulation to some extent, the number of OTPs in each set is kept constant at 3 (arbitrarily chosen). Figure 4 gives the input versus time graph for the above-mentioned algorithm.
Fig. 4 Input versus time graph
From the plot in Fig. 4, we observe a nearly linear relation between input size and time, implying that the time complexity of the proposed lightweight authentication implementation is O(n); the same can be deduced from the proposed algorithm as well. The algorithm uses the timestamp and array indexing to differentiate between devices, hence the chance of cross-authentication is almost negligible. To keep a check on space complexity, the authentication implementation clears the previous set of OTPs before issuing a fresh batch for a newly requested session. This ensures that the fog devices do not need a large memory for data authentication.
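The input-versus-time behaviour summarised above can be reproduced with a simple timing harness such as the sketch below; generate_otp_sets is a stand-in for the batch-generation step of Algorithm 1, not the authors' exact code, and the measured times will of course depend on the machine used.

import time
import secrets

def generate_otp_sets(n_devices, otps_per_set=3):
    """Stand-in for the server-side batch-generation step (3 OTPs per set)."""
    return [[secrets.randbelow(10**6) for _ in range(otps_per_set)]
            for _ in range(n_devices)]

for n in (20, 35, 50, 500):                 # the input sizes used in Fig. 3
    start = time.perf_counter()
    generate_otp_sets(n)
    elapsed = time.perf_counter() - start
    print(f"n = {n:4d}  time = {elapsed * 1e3:.3f} ms")  # grows roughly linearly, i.e. O(n)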
5 Conclusion The lightweight authentication implementation presented in this paper addresses the data-security problem that arises for these resource-constrained devices. As the algorithm has a time complexity of O(n), which is better than that of many proposed authentication systems, it also addresses the time constraints involved in establishing a connection between small fog computing units and reduces the chance of data delay while ensuring data security. The proposed algorithm also clears the file before saving the set of new OTPs, which serves as a solution to the space constraints of such devices: it discards all the previous session's OTPs, which are of no use once a new session is requested. This drastically reduces the space requirement of the algorithm, making it better suited for a lightweight authentication system. Therefore, looking at the merits of the implementation, it may be concluded that the proposed OTP-based lightweight data authentication implementation has several inherent advantages for possible uses in data security in "fog computing architecture and has further potential for easy training of reservoir computers toward uses in Internet of Things (IoT)-based multiple industry applications" [4]. Acknowledgements The authors sincerely acknowledge that the constructive and helpful reviews received from the anonymous reviewers have helped to enhance the quality of this research manuscript. Declaration of Interests The authors declare that a primary idea for this research was orally presented at the second research day of SRM University-AP, India, held at SRM University-AP, India, in August 2021, but no abstract/full version of the research paper was published. The authors also acknowledge the support and help from SRM University-AP, India, in filing a patent based on this research (Patent Application No. 202241044903 dated August 05, 2022 in the Patent Office Journal, The Govt. of India). Hence, the authors declare no conflict of interests.
References
1. Saha S, Mitra A (2019) Towards exploration of green computing in energy efficient optimized algorithm for uses in fog computing. In: 2019 international conference on intelligent computing and communication technologies (ICICCT 2019). Springer, Singapore, pp 628–636
2. Saha S, Mitra A (2021) An energy-efficient data routing in weight-balanced tree-based fog network. In: 2019 international conference on intelligent and cloud computing (ICICC 2019). Springer, Singapore, pp 3–11
3. Mitra A, Saha S (2021) A design towards an energy-efficient and lightweight data security model in fog networks. In: 2019 international conference on intelligent and cloud computing (ICICC 2019). Springer Nature, Singapore, pp 227–236
4. Mitra A, Saha S (2021) An investigation for cellular automata-based lightweight data security model towards possible uses in fog networks. In: Examining the impact of deep learning and IoT on multi-industry applications. IGI Global, pp 209–226
5. Khakimov, Muthanna A, Muthanna MSA (2018) Study of fog computing structure. In: IEEE conference of Russian young researchers in electrical and electronic engineering (EIConRus). IEEE, pp 51–54
6. Bonomi F, Milito R, Zhu J, Addepalli S (2012) Fog computing and its role in the internet of things. In: Proceedings of the first edition of the MCC workshop on mobile cloud computing, pp 13–16
7. Chiang M, Zhang T (2016) Fog and IoT: an overview of research opportunities. IEEE Internet Things J 3(6):854–864
8. Shen J, Yang H, Wang A, Zhou T, Wang C (2019) Lightweight authentication and matrix-based key agreement scheme for healthcare in fog computing. Peer-to-Peer Network Appl 12(4):924–933
9. Shahidinejad, Ghobaei-Arani M, Souri A, Shojafar M, Kumari S (2021) Light-edge: a lightweight authentication protocol for IoT devices in an edge-cloud environment. IEEE Consumer Electron Mag 11(2):57–63
10. Verma U, Bhardwaj D (2020) Design of lightweight authentication protocol for fog enabled internet of things—a centralized authentication framework. Int J Commun Netw Inf Secur 12(2):162–167
11. Murugesan, Saminathan B, Al-Turjman F, Kumar RL (2021) A lightweight authentication and secure data access between fog and IoT user. Int J Electron Bus 16(1):77–87
12. Singh S, Chaurasiya VK (2021) Mutual authentication scheme of IoT devices in fog computing environment. Clust Comput 24(3):1643–1657
13. Lee JY, Lin WC, Huang YH (2014) A lightweight authentication protocol for internet of things. In: International Symposium on Next-Generation Electronics (ISNE). IEEE, pp 1–2
14. Mantas EG, Matischek R, Saghezchi FB, Rodriguez J, Bicaku A, Maksuti S, Bastos J (2017) A lightweight authentication mechanism for M2M communications in industrial IoT environment. IEEE Internet Things J 6(1):288–296
15. Das K, Kalam S, Sahar N, Sinha D (2020) LASF—a lightweight authentication scheme for fog-assisted IoT network. In: International conference on modelling, simulation and intelligent computing. Springer, Singapore, pp 246–254
16. Pilgrim M, Willison S (2009) Dive into python 3, vol 2. Apress, New York, NY, USA
Face Mask Detection Using YOLOv3 Devesh Singh, Himanshu Kumar, and Shweta Meena
Abstract COVID-19 is an unprecedented crisis that has resulted in several security issues and a large number of casualties. People frequently use masks to protect themselves against the transmission of the coronavirus. Given that specific aspects of the face are obscured, facial identification becomes extremely difficult. During the ongoing coronavirus pandemic, researchers' primary focus has been to come up with rapid and efficient solutions to this problem, as mask detection is required in the current scenario, whether in public or in institutions such as offices and other workplaces. Merely detecting whether a person wears a mask is not enough: the mask must also be worn properly, so that it covers all the required portion of the face and there is no exposure to the virus. To address this, we propose a reliable technique based on image classification and object localization, accomplished using YOLOv3 object detection. Keywords Face mask detection · Object detection · YOLO v3
1 Introduction Since 2020, the world has been facing a new virus called COVID-19, which can easily be transmitted through the air and physical contact. The virus has caused a pandemic that has restricted people to their houses and forced them to distance themselves from others. As the virus is transmitting rapidly in all parts of the world, several preventive measures have to be taken. Wearing a mask is one of the essential preventive measures which one should follow at any cost. This makes mask detection necessary in the current scenario, whether in public or in institutions like offices and other workplaces. Various research and studies have been carried out in the past to solve the problem of identifying whether a person is wearing a mask or not using image classification [1–4] and object detection [5–8] techniques. But every other model D. Singh (B) · H. Kumar · S. Meena Department of Software Engineering, Delhi Technological University, New Delhi 110042, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_55
Fig. 1 Process of object detection
does not focus on the importance of wearing the mask in a proper manner, i.e., completely covering the mouth as well as the nose. This research paper focuses on the importance of wearing the mask properly so as to ensure there is no transmission of the virus, which directly helps in getting rid of corona or any other upcoming diseases. To achieve this, we define three labels, namely mask, without mask, and bad mask, instead of only mask and without mask. Bad mask refers to a person who wears the mask improperly, for example a mask covering only the mouth but not the nose. To classify and locate the mentioned labels in an image, object detection is an efficient approach, as it is a methodology which allows us to locate and identify objects in a video or image. It includes two sub-tasks: target localization and classification. This approach is implemented using YOLOv3 [9] to detect the objects (face with mask, face without mask, and face with improper positioning of mask) in real time. This study will find application in security unlock systems in laboratories, hospitals, institutions, and other public places. YOLO [10] stands for "You Only Look Once". It involves a single deep convolutional neural network. It is accurate and extremely fast compared to other object detection models and performs real-time detection, which is most important in the current scenario. Compared to the previous version of YOLO, YOLOv3 introduces a stronger feature extractor network and multi-scale detection, along with some changes in the loss function and the addition of some extra layers. For better understanding, the whole model is divided into two parts: the feature extractor and the detector (see Fig. 1). Feature extraction is done through a single convolutional neural network at three different scales, and these features are passed to the detector, where the image is divided into regions and bounding boxes and probabilities are predicted for each region. On the basis of these probabilities, class information and the final detections are obtained. The mean average precision (mAP) observed at 0.5 IoU for YOLOv3 is on a level with Focal Loss-based detectors, while being about four times faster, as represented in Fig. 2. YOLOv3 is highly accurate and extremely fast compared to other object detection models when trained on the COCO dataset [10]. The remaining course of the paper is laid out as follows. Section 2 examines past research and works related to object detection and face mask detection. Section 3 contains a detailed description of the dataset and the model. Section 4 highlights the proposed model and its architecture in detail. Section 5 analyses the model and experimental results, and Sect. 6 illustrates the future scope of improvement and the conclusion.
Fig. 2 Comparison between RetinaNet-101, RetinaNet-50, YOLOv3
2 Related Work In this section, the existing studies regarding face mask detection systems are briefly discussed. Due to the rapid spread of COVID-19 over the world, wearing a face mask became essential, which makes a face mask detection system important for getting control over this virus; it is also important in many laboratories, hospitals, and workplaces to have a check over people to ensure health and safety. Various contributions have been made in this regard using image classification techniques. Abdulmajeed et al. [1] constructed a software tool to detect faces with and without masks; it was implemented in two phases—feature extraction using the Histogram of Oriented Gradients (HOG) and classification using a Support Vector Machine (SVM). This model was trained over the Real-world Masked Face Dataset (RWMFD), the Simulated Masked Face Recognition Dataset (SMFDB), the Masked Face Recognition Dataset (MFRD), and Masked Faces (MAFA), with accuracies of 97.00%, 100.0%, 97.50%, and 95.0%, respectively. Although the accuracy is acceptable, the dataset in each case is very small; if the dataset were increased, the outcome might not be as desirable. Similarly, Catindoy [2] used the same machine learning techniques, HOG and SVM, but focused more on the proper and improper wearing of a mask and achieved an accuracy of 99% for 30 epochs of training. Although this ensures proper wearing of a mask, it lacked the detection of faces without a mask. A CNN-based model was described by Md. Shahriar et al. [3], which works in three stages: preprocessing the images, cropping the images, and classifying the images. Leaky ReLU was used as the activation function and a softmax layer as the output layer, providing an accuracy of 0.98, with validation loss and validation accuracy of 0.85 and 0.96, respectively. Similarly, Kaur et al. [4] also proposed a CNN model and introduced a detection feature where the resulting images come out with a box around the face along with an accuracy rate; however, in their study, parameters and performance measures are missing.
The limitations of image classification led research towards the more reliable approach of object detection to identify and localize the faces. Two models were put forward by Addagarla et al. [5]: one using YOLOv3 and the other using a combination of ResNet, SSD, and NASNet Mobile algorithms. On comparing both models, the combined model performed better than YOLOv3, achieving a recall rate of 98%, whereas the YOLOv3 model achieved a recall rate of 91.7% on real-time video. A customized model was proposed by Peishu et al. [6], in which a modified Res2Net structure was used for feature extraction and EnPAN, which includes YOLO residual block, SPP, CoordConv, and SE layer blocks, was used for the detection process. All these layers involved in EnPAN enhanced the detection, resulting in an mAP of 0.575. Although all the performance measures mAP, AP50, AP75, mAR, and AR50 are achieved well, the complexity of the model is high. Another approach, using Faster R-CNN with Inception ResNet V2 for detection, was proposed by Razavi et al. [7] and provides an accuracy of 99.8%. Cheng et al. [8] proposed a lightweight model using YOLOv3-tiny; compared with others it is fast and works well in real time, but the accuracy achieved is noisy and exact numbers are not mentioned. From the above discussion, it is observed that both image classification and object detection can work for face mask detection, but object detection is the better option for real-time use, being more accurate and faster than the image classification models. In most of the studies, the dataset is very limited [1, 5, 8], the focus is only on faces with a mask and without a mask, not including the proper way of wearing a mask [1, 3, 4], and performance measures other than accuracy are not reported [4]. Models with high accuracy are complex, and customized models have better accuracy [7].
3 Experimental Design The detailed dataset description and dataset preparation along with various variables, performance measures and methods used for training are discussed in this section.
3.1 Dataset Description The face mask dataset contains 3416 images. It was formed by collecting images from the RMFD [11], FMD [12], and IMFD [13] datasets, in which we have detected 4984 faces (Fig. 3). The dataset has three classes:
• Mask (1784 faces)
• Without mask (1655 faces)
• Bad mask (1545 faces).
Fig. 3 Sample images
3.2 Data Annotation Data annotation is the process in which we form bounding boxes around the targeted objects in the images and then label them according to their respective classes. We have used the Visual Object Tagging Tool (VoTT) [14] software to create the annotations (see Fig. 4). The annotation file consists of the image names, the bounding box coordinates "x_min, y_min, x_max, y_max", and the class label. A bounding box is shown for visualization purposes in Fig. 5.
Fig. 4 Microsoft/VoTT
Fig. 5 Bounding box
In the dataset of 3416 images, 1784 tags belong to the mask class, 1655 tags to the without mask class, and 1545 tags to the bad mask class, and these annotations are in CSV format. We rescaled all our images to 416 × 416 while training and detecting.
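A minimal sketch of loading and inspecting such a CSV annotation file is given below; the file name, column names, and the rescaling helper are assumptions for illustration and may differ from the exact layout of the VoTT export.

import pandas as pd

# Assumed column layout of the exported annotation CSV
cols = ["image", "xmin", "ymin", "xmax", "ymax", "label"]
ann = pd.read_csv("annotations.csv", names=cols)

print(ann["label"].value_counts())  # expect roughly 1784 mask, 1655 without mask, 1545 bad mask

# Rescale box coordinates when an image of size (orig_w, orig_h) is resized to 416 x 416
def rescale_box(row, orig_w, orig_h, target=416):
    sx, sy = target / orig_w, target / orig_h
    return (row.xmin * sx, row.ymin * sy, row.xmax * sx, row.ymax * sy)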
3.3 Independent and Dependent Variables Independent variables in our model are number of images (3416), number of annotations, orientation of required label, channels in the images (3 initially). Dependent variables are feature map after each convolution layer, class confidence (probability of each class after each scale), coordinates of bounding boxes, objectness score (possibility of an object to be inside a bounding box).
3.4 Performance Measures Performance measures were used to check the correctness of our model. The measures used are the average precision and recall for the three classes at each confidence threshold varying from 0.01 to 0.99. Precision, Recall, mAP, and F1-score are also calculated [15].
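For illustration, the per-class metrics follow directly from the true-positive, false-positive, and false-negative counts at a given IoU and confidence threshold; the sketch below uses made-up counts, not the paper's actual confusion values.

def precision_recall_f1(tp, fp, fn):
    """Per-class detection metrics from TP/FP/FN counts at a fixed IoU and confidence threshold."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Illustrative counts only
print(precision_recall_f1(tp=93, fp=7, fn=5))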
3.5 Validation Method The train/test split method was used for model validation by splitting the dataset into two parts: one for training the model and the other for validating it. The training and validation set, which contains a total of 3075 images, was divided 9:1, where 90% of the images belong to the training set and the remaining 10% to the validation set.
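A minimal sketch of such a 9:1 split, assuming scikit-learn's train_test_split and placeholder file names and labels, is shown below.

from sklearn.model_selection import train_test_split

# Placeholders for the 3075 annotated training/validation images and their labels
image_paths = [f"img_{i:04d}.jpg" for i in range(3075)]
labels = ["mask"] * 3075  # placeholder class labels

train_x, val_x, train_y, val_y = train_test_split(
    image_paths, labels, test_size=0.1, random_state=42)  # 9:1 split as in the paper

print(len(train_x), len(val_x))  # ~2767 training images, ~308 validation images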
4 Model Architecture YOLOv3 model. The first step is to build an input layer, which takes a batch of 128 images as input; the shape of each image is 416 × 416. Initially, out of the 252 layers [16], 249 layers are frozen and training is done for 51 epochs. After this, all the layers are unfrozen and training is started again for the same number of epochs. The input is passed to a convolution layer. This layer creates a convolution kernel, a convolution matrix that can be used for embossing, blurring, sharpening, edge detection, etc., by performing a convolution between the kernel and the image. The output then passes to a batch normalization layer (BatchNormalization), where the input is transformed so that it is standardized, i.e., has a mean of 0 and a standard deviation of 1. Table 1 illustrates the layer distribution. A Leaky ReLU layer is used as the activation function (see Fig. 6). For negative values, it has a small slope instead of being exactly zero; this not only fixes the dying ReLU problem but also helps speed up the training process [17]. After the 1st convolution layer, the image is passed to the 2nd convolution layer, where the same procedure is repeated. A feature map is obtained, which is further passed to the detection layers. In the YOLOv3 network, detection is done at three different scales on feature maps of three different sizes, having strides of 32, 16, and 8, respectively. An input image of size 416 × 416 is thus scaled to 13 × 13, 26 × 26, and 52 × 52 at the three scales (see Fig. 7). These detections are performed at 2D convolutional layers 58, 66, and 74 [9]. Before sending the image to the detection layer at each scale, an upsampling layer is applied. The output of the first detection layer is sent into the convolution network for the second detection, which helps prevent the loss of low-level features. With the help of a concatenation layer, the outputs of the first and second detection layers are concatenated, and the same procedure is repeated for the further detection layer. We use 9 anchor boxes (pre-defined bounding boxes which help in the prediction of bounding boxes), 3 for each scale. At each scale, 3 bounding boxes per cell are predicted with

Table 1 Layers distribution
Layer                #
InputLayer           1
Conv2D               75
BatchNormalization   72
LeakyReLU            72
ZeroPadding2D        5
Add                  23
UpSampling2D         2
Concatenate          2
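The Conv2D → BatchNormalization → LeakyReLU pattern counted in Table 1 is the basic Darknet building block. A minimal Keras sketch of one such block is given below; the filter counts and the LeakyReLU slope of 0.1 follow the common YOLOv3 reference implementation and are assumptions here rather than values reported by the authors.

import tensorflow as tf

def darknet_conv_block(x, filters, kernel_size, strides=1):
    """One Conv2D + BatchNormalization + LeakyReLU unit, as counted in Table 1."""
    x = tf.keras.layers.Conv2D(filters, kernel_size, strides=strides,
                               padding="same", use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.LeakyReLU(alpha=0.1)(x)

inputs = tf.keras.Input(shape=(416, 416, 3))
x = darknet_conv_block(inputs, 32, 3)
x = darknet_conv_block(x, 64, 3, strides=2)   # downsampling block
model = tf.keras.Model(inputs, x)
model.summary()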
Fig. 6 Leaky ReLU activation function
Fig. 7 Detection process of YOLOv3
the help of the three anchor boxes. The bounding box with the higher intersection over union (IoU) with the ground truth is considered. IoU is defined as the ratio of the intersection area to the union area of the bounding box predicted by the model and the actual bounding box (see Fig. 8). The 13 × 13 layer takes care of identifying bigger objects, the 52 × 52 layer takes care of detecting smaller objects, whereas the 26 × 26 layer takes care of detecting medium-sized objects. The shape of the detection kernel can be formulated as:

1 × 1 × (B × (5 + C))    (1)
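A short sketch of the IoU computation and the detection-kernel depth from Eq. 1 is given below; the box coordinates are illustrative only.

def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((10, 10, 110, 110), (50, 50, 150, 150)))  # approximately 0.22

B, C = 3, 3
print(f"detection kernel shape: 1 x 1 x {B * (5 + C)}")  # 1 x 1 x 24, as given by Eq. 1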
Fig. 8 Intersection over Union (IoU)
where B is the number of bounding boxes that can be predicted by a single cell, the number "5" represents one object confidence score and 4 bounding box attributes, and C is the number of classes. In this model, B = 3 and C = 3; therefore, using Eq. 1, the shape of the detection kernel is 1 × 1 × 24. During prediction, the class confidence is calculated, and the identified object within a bounding box is considered to belong to the class with the higher probability. During the training process, the parameters of the model are adjusted continuously so that the value of the loss function is optimized to a minimum. The last layer of the model participates in the computation of the loss: the loss function "yolo_loss" is encapsulated in a custom Lambda loss layer. We have used the Adam optimizer with a learning rate of 0.001, which is further reduced by a factor of 0.1 after every 3 epochs if the loss remains constant. Parameters and Hyperparameters. The parameters used for training are the weights and biases of our model; the total number of trainable parameters is 61,529,119 (see Fig. 9). The hyperparameters include the learning rate, the number of epochs, the kernel size, the batch size, the activation function, and the number of layers in the architecture; their values are 51 epochs, 252 layers, batch sizes of 128 and 16, and Leaky ReLU as the activation function.
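A hedged sketch of this training configuration (a custom loss wrapped as "yolo_loss", Adam at 0.001, and a 0.1 learning-rate reduction after 3 stagnant epochs) is given below; yolo_loss here is a simple placeholder, since the actual YOLOv3 loss lives inside the implementation used by the authors, and the model itself is not built in this snippet.

import tensorflow as tf

def yolo_loss(y_true, y_pred):
    """Placeholder for the custom YOLOv3 loss computed in the model's last layer."""
    return tf.reduce_mean(tf.square(y_true - y_pred))

# model = ...  # the 252-layer YOLOv3 network would be constructed here
# model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss=yolo_loss)

callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=3),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
]
# model.fit(train_data, validation_data=val_data, epochs=51, callbacks=callbacks)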
5 Result The model was trained twice for 51 epochs, with batch sizes of 128 and 16, respectively. Initially, 249 of the 252 layers were frozen; then all layers were unfrozen and the model was trained again. The stopping condition for the training was set to monitor the validation loss. The losses calculated on the training and validation sets are 14.3942 and 16.1282, respectively (see Fig. 10). The model was fed with 341 images for testing. Bounding boxes were predicted for the test images and then used for evaluation: these predicted bounding boxes were saved as detections and compared with the ground-truth values of those images (Fig. 11).
Fig. 9 YOLOv3 architecture snapshot
IoU was calculated using the detections and ground truths. The IoU value formed the basis for determining the true positives, true negatives, false positives, and false negatives. Precision and Recall values were calculated for each class at confidence thresholds starting from 0.01 and going up to 0.99. The Precision-Recall curves, for confidence thresholds from 0.01 to 0.99, are plotted in Figs. 12, 13, and 14; the larger the area under this curve, the higher the precision and recall [18]. The average precision for the "mask" class was calculated to be 0.94 and the recall for the same was found to be 0.95. The average precision of "without_mask" was 0.93 and its recall was 0.96. The Precision-Recall curve for the "bad_mask" class has a considerably larger area under the curve than "without_mask", and hence its average precision was calculated to be 0.95 and its recall was 0.98 (refer Table 2).
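For reference, the average precision reported here is the area under each precision-recall curve, and mAP is the mean of the per-class values; the sketch below uses illustrative precision-recall pairs rather than the actual curves of Figs. 12–14.

import numpy as np

# Illustrative precision/recall pairs swept over confidence thresholds (0.01 ... 0.99)
recall = np.array([0.00, 0.20, 0.40, 0.60, 0.80, 0.95])
precision = np.array([1.00, 0.99, 0.98, 0.97, 0.95, 0.94])

ap = np.trapz(precision, recall)          # area under the precision-recall curve
print(f"AP ≈ {ap:.3f}")

def mean_average_precision(per_class_ap):
    """mAP is the mean of the per-class average precisions."""
    return sum(per_class_ap.values()) / len(per_class_ap)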
Fig. 10 Loss—no. of epochs curve on training and validation dataset
Fig. 11 Predicted bounding boxes for the three classes with respective confidence scores
After calculation, the overall Precision was found to be 0.93; this means that if there were 100 predictions, 93 were correct. Recall was 0.95, which means that if there were 100 masks in an image, the model found 95 of them (refer Table 3). During analysis, the model was found to detect a higher number of bad mask objects in a single image (95 out of 100) than mask objects (94 out of 100), and it registered the lowest number of objects in a single image for the without mask class (93 out of 100). The slight differences between the precision and recall of the three classes are due to the quality of the dataset for the different classes and to the differences in features between the mask, bad mask, and without mask classes.
Fig. 12 Precision-recall curve for mask class
Fig. 13 Precision-recall curve for without mask class
The mAP then calculated was 0.93, which is well above that of the original YOLOv3 model trained on the COCO dataset, which was around 0.51–0.58 [10].
Fig. 14 Precision-recall curve for bad mask class
Table 2 Evaluation metrics

                 Mask    Without mask    Bad mask
Av. precision    0.94    0.93            0.95
Recall           0.95    0.96            0.98

Table 3 Results

Metrics      Value
Precision    0.93
Recall       0.95
F1-score     0.94
mAP          0.93
6 Conclusion and Future Scope In this research, an approach based on the YOLOv3 object detection model for recognizing faces with a mask, without a mask, and with an improperly worn mask has been proposed. This study makes use of recent and popular deep learning algorithms and computer vision; YOLOv3, used in this model, is one of the most widely used object detection techniques. Microsoft VoTT was used to build a custom dataset with three classes: faces with mask, without mask, and bad mask, i.e., an improper way of wearing the mask. The use of the Adam optimizer sped up the learning process, so the training was swift. Changing the batch size from 128 to 16 in the second round of training turned out to be a successful move, as the loss drastically reduced and
became constant. Precision and recall came out well for this model, i.e., 0.93 and 0.95, despite the small dataset. To further improve the accuracy and other performance measures, the model must be trained on a larger dataset; the size of the dataset can be increased through data augmentation techniques. In future, we can extend the study to make detections in video clips so that it works as a real-time detection model. It can then detect faces with a mask, without a mask, or with a bad mask in videos as well as on a webcam with high precision. This will serve as a software solution for security unlock systems, automatic violator detection systems, and others.
References
1. Abdulmajeed AA, Tawfeeq TM, Al-jawaherry MA (2021) Constructing a software tool for detecting face mask-wearing by machine learning. Baghdad Sci J 19(3):6–42
2. Catindoy LJ (2022) Ensure the proper wearing of face masks using machine learning to fight Covid-19 Virus. In: Proceedings of the international halal science and technology conference 2022, IHSATEC, vol 14, no 1, pp 79–83
3. Islam MS, Moon EH, Shaikat MA, Alam MJ (2020) A novel approach to detect face mask using CNN. In: 3rd international conference on intelligent sustainable systems 2020, ICISS, pp 800–806
4. Kaur G, Sinha R, Tiwari PK, Yadav SK, Pandey P, Raj R, Vashisth A, Rakhra M (2022) Face mask recognition system using CNN model. Eurosci Inf 2(3):2772–5286
5. Addagarla SK, Chakravarthi GK, Anitha P (2020) Real time multi-scale facial mask detection and classification using deep transfer learning techniques. Int J Adv Trends Comput Sci Eng 9(4):4402–4408
6. Wu P, Li H, Zeng N, Li F (2022) FMD-Yolo: an efficient face mask detection method for COVID-19 prevention and control in public. Image Vis Comput 117:0262–8856
7. Razavi M, Alikhani H, Janfaza V (2022) An automatic system to monitor the physical distance and face mask wearing of construction workers in COVID-19 pandemic. SN Comput Sci 3(27)
8. Cheng G, Li S, Zhang Y, Zhou R (2020) A mask detection system based on Yolov3-Tiny. Front Soc Sci Technol 2(11):33–41
9. How to Implement a YOLO (v3) Object Detector from Scratch in PyTorch: Part 1, https://www.kdnuggets.com/2018/05/implement-yolo-v3-object-detector-pytorch-part-1.html. Last accessed 26 Dec 2021
10. Yolo: Real-time object detection, https://pjreddie.com/darknet/yolo/. Last accessed 26 Nov 2021
11. Real-World-Masked-Face-Dataset (RMFD), https://github.com/X-zhangyang/Real-World-Masked-Face-Dataset. Last accessed 4 April 2021
12. Face Mask Detection Kaggle, https://www.kaggle.com/andrewmvd/face-mask-detection. Last accessed 4 April 2021
13. Incorrectly Masked Face Dataset (IMFD), https://github.com/cabani/MaskedFace-Net. Last accessed 4 April 2021
14. GitHub—microsoft/VoTT: Visual Object Tagging Tool. https://github.com/microsoft/VoTT. Last accessed 7 Dec 2021
15. An introduction to evaluation metrics for object detection, https://blog.zenggyu.com/en/post/2018-12-16/an-introduction-to-evaluation-metrics-for-object-detection/. Last accessed 22 Jan 2022
16. Target detection algorithm YOLOV3, https://programmerall.com/article/74231497354/. Last accessed 27 Dec 2021
17. A Practical Guide to ReLU, https://medium.com/@danqing/a-practical-guide-to-relu-b83ca804f1f7. Last accessed 22 Jan 2022
18. Precision-Recall—scikit-learn 1.0.2 documentation, https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html. Last accessed 22 Jan 2022
Review on Facial Recognition System: Past, Present, and Future Manu Shree, Amita Dev, and A. K. Mohapatra
Abstract Facial recognition is one of the most efficient mechanisms to identify an individual using facial attributes. Artificial intelligence has been widely used to recognize facial attributes in different circumstances, and most facial recognition systems can analyze and compare various attributes of facial patterns to verify the identity of a person efficiently. Facial recognition is a component of a biometric identification system. Many algorithms have been proposed to extract facial features, such as Eigenfaces and Fisherfaces, which extract principal components and separate one identity from another. In the present pandemic scenario, people need to wear a mask, which is a challenge because some facial features are not visible; however, the application of DNNs still helps to identify a person with the desired accuracy. If a person is wearing a mask or eyeglasses, very few facial features are visible, such as the eyebrows, forehead, and face shape. Many technologies help to identify images based on their features, color, shape, pose variation, expression changes, 2-dimensional and 3-dimensional images, and RGB and black-and-white images. In this paper, we present various feature extraction and classification techniques and compare the accuracy of various facial recognition methods on different databases. Keywords Feature extraction methods · Holistic and local approaches · Classification methods · Support vector machine (SVM) · Convolutional neural network (CNN) · AdaBoost · Neural networks
M. Shree (B) · A. Dev · A. K. Mohapatra Indira Gandhi Delhi Technical University for Women (IGDTUW), New Delhi, India e-mail: [email protected] A. Dev e-mail: [email protected] A. K. Mohapatra e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_56
1 Introduction A semi-automated facial recognition system was proposed in 1960 [1]. During that period facial recognition was a very complex task, and face geometry was used for recognition. The face is the most important component of the human body; as different literature reviews note, the face can also speak through different expressions, conveying emotions without words. We live in a society where the face is the identity of the individual, and in much of the corporate sector the face is used as an authentication key [2]. It is an increasingly popular, safe, reliable, and secure technology, and in the current scenario it is used effectively by many private and government organizations because of its strong security, reliability, and privacy [3]. Facial recognition includes detection of facial attributes, face angle, recognition of identity, preprocessing of images, and so on. The face detection algorithm locates the X, Y coordinates in a 2-dimensional image and the X, Y, Z coordinates in a 3-dimensional image; this is the process of determining whether a candidate region is a face or not. The facial coordinate region can be a square, rectangle, triangle, etc. [4]. Face recognition has been one of the most important biometric traits, with many applications such as data security, medical imaging systems, marketing and finance, image filtration, human–computer interaction, and forensics [5]. It is very challenging to identify a covered face, eyeglasses, head pose, illumination, age, a face with plastic surgery, infrared images, angled face images, partial visibility, moles and birthmarks, 3-dimensional and 2-dimensional images, injuries on the face, underwater images, foggy images, black-and-white images, and facial-expression-related variations. Sometimes changes in appearance are caused by make-up, facial hair, or cosmetic accessories worn by men and women, and there is very little chance of two faces being truly similar.
2 Literature Survey Facial recognition is a broad area of image-based study, and many researchers have worked in diverse areas of facial recognition. The face contains many small elements, such as the nose, eyebrows, and iris, and most of the identifying elements on which authentication can be based. Facial recognition is also used in many sectors such as medicine, bioengineering, entertainment, banking, and forensics. In the present scenario, many face camera filters have also been launched to give a proper shape and enhancement to face images. Table 1 summarizes the findings of different research studies.
3 Facial Recognition Process Facial recognition is a process where we recognize a face based on its facial properties. In many cases the entire face is not visible, and we then need to extract a specific area of the face comprising shape, color, and texture.
Table 1 Literature review
S. No. | Paper titled | Description | Outcome
1 | Future of face recognition: a review, Arya et al. [1] | Limitations: earlier work used the visible spectrum for FR, which is challenging because of object illumination, many pose variations, and facial expressions; as per the study, no method had been proposed for an IR-based FR technique | The researchers focus on the infrared spectrum (IRS) in humans; two methods are discussed for facial recognition: (1) persistent physiological features and (2) multi-modal fusion (VS and IRS)
2 | A deep facial recognition system using computational intelligent algorithms, Salama Abd ELminaam et al. [2] | Limitations: variation in facial expressions, accessories such as hats and glasses, and changes of brightness levels. Datasets: SDUMLA-HMT. Image classification accuracy: DT 94%, SVM 97%, KNN 96%, Adaptive CNN 99% | The study comprises three steps to recognize the face: preprocessing, where the human face image is used as input; feature extraction with AlexNet; and identification of the feature vector, with fog and cloud computing used for face recognition
3 | A review of face recognition technology, Li et al. [4] | Limitations: the researchers suggest that a special camera for FR could be used to improve image quality and for image filtration to reconstruct the image | The paper covers face feature extraction and classification technology
4 | Face recognition: past, present and future, Taskiran et al. [5] | Limitations: as per the paper, facial recognition has more limitations in video-based identification. Datasets: MoBo and Honda datasets, accuracy 100% | Many methodologies are discussed for feature extraction, such as PCA and LDA, with a focus on video-based and image-based facial recognition; a broad review of many datasets and methods
5 | An efficient approach for face recognition based on common eigenvalues, Gaidhane [6] | Limitations: evaluating the scatter matrix is comparatively tough because of its big size and the small number of training images. Datasets: ORL, Yale B and FERET, compared with PCA, LDA-based, ERE and DCV methods | An image-matrix approach is used; the time taken for FR is comparatively less than for the PCA, LDA-ERE, and DCV methods
6 | A novel periocular biometrics solution for authentication during the Covid-19 pandemic situation, Kumari et al. [7] | Limitations: the main challenge is the use of a face mask. Datasets: ORL, UBIPr; Probabilistic Deformation Model (PDM) and M-SIFT on the FOCS and UBIPr datasets; recognition accuracy of 84.14% and 78.59% for the left and right eye, respectively, on UBIPr | A fusion approach is used with handcrafted and non-handcrafted descriptors for feature extraction and classification
7 | Automatic 3D face detection, normalization and recognition, Mian et al. [8] | Limitations: distance between extracted features in the 3D picture. Datasets: FRGC v2.0 | The proposed approach identifies the nose tip and corrects the pose of the 3D picture using the Hotelling transform
8 | Human face recognition, Sabharwal et al. [9] | Limitations: a self-stored database of images saved in .pgm format is used | SIFT is used to extract facial features for recognition
9 | A theory of semantic information, Zhong [10] | Limitations: a semantic information study about different signs | A different concept of semantic information, a dualistic notion of signs
10 | Support vector networks, Cortes et al. [11] | Datasets: Postal Service database, NIST | SVM and its uses are defined with the hyperplane; the soft-margin hyperplane is formed on two datasets
11 | Handcrafted versus non-handcrafted features for computer vision classification, Nanni et al. [12] | Datasets: PS, VI, CH, SM, BR, for comparison | Handcrafted and non-handcrafted methods are used for implementing computer vision systems; CBP, PCAN and CNN are used as non-handcrafted features
12 | Past, present, and future of face recognition: a review, Adjabi et al. [13] | Datasets: ORL, FERET, AR, BANCA, LFW, FRGC; Gabor filters and many other CNN-based techniques are proposed | The paper highlights facial recognition in the 3D sector; 2D images need constant alignment, and if this is not achieved the face is not captured properly
13 | Medoid-based model for face recognition using Eigen and fisher faces, Bhat et al. [14] | Limitations: the medoid is used instead of the mean as the statistic when calculating Eigenfaces and Fisherfaces. Datasets: JAFFE database | Eigenface, Fisherface, PCA, LDA, medoid
14 | Semantic image analysis for intelligent image retrieval, Khodaskar et al. [15] | Datasets: 3D + t datasets are used to analyze the image | A knowledge-driven approach is used for image retrieval
15 | Spontaneous facial expression recognition: a part-based approach, Perveen et al. [16] | Datasets: AVEC 2011/2012, CK and CK+, MMI, with a 75% recognition rate | Deep CNN-based technology is used for feature extraction and SVM is used as the classification technique
16 | Research on face recognition based on deep learning, Xiao et al. [17] | Datasets: DeepID 97.45%, DeepID2 99.15%, DeepID2+ 99.47% | Zero-phase component analysis is used for whitening the pre-trained data and a linear decoder extracts the local features of the face image, where the convolution kernel converts the facial images and the characteristics of the image
17 | Image-based face detection and recognition: state of the art, Ahmad et al. [18] | Datasets: (A) Face Recognition Data, University of Essex; (B) Psychological Image Collection at Stirling (PICS) | For face detection SVM is used with HOG, and for classification of the image AdaBoost is used with LBP; the authors use the LBP spatial histogram H_{i,j} = Σ_{x,y} I(f_l(x, y) = i) · I((x, y) ∈ T_j), where i = 0, …, L − 1 and j = 0, …, N − 1, as an enhanced feature describing the local texture and the global shape
18 | Face detection and recognition using machine learning algorithm, Ranjan et al. [19] | Limitations: a real-time dataset is used in which various contrast gradient levels are modified for the test data, which increases the time complexity | The histogram of oriented gradients is used to detect the image; contrast gradients are calculated with the help of image blocks, and SVM is then used to classify the image based on the received images; this has been tested on some real-time images
19 | Human face detection techniques: a comprehensive review and future research directions, Hasan et al. [20] | Challenges: 2D and 3D-based system techniques can be elaborated | A comprehensive study on various facial recognition techniques
20 | Intelligent surveillance camera using PCA [21] | Limitations: only PCA and LDA-based methods are developed; much better classifiers such as SVM and AdaBoost could be used | A study based on the PCA and LDA techniques
An individual face has many properties, such as the eyebrows, periocular region, iris, and retina, by which it can be identified. Face recognition is a process-based technique to identify the human face, and the overall facial recognition process is shown in Fig. 1. A human face has many hidden features as well as some features that are easily visible and identifiable. The facial recognition process can be defined in the following steps.
3.1 Detect the Face Face detection is the major task in the FR system, where the system needs to identify the exact face image among different objects. In most scenarios we are surrounded by many objects, such as trees and chairs; in this case detection of the face is difficult because many shapes look alike (oval or round), so the system needs to differentiate them before facial feature extraction. Detection of the face is the first process in the FR system, and deep-learning-based algorithms can be used for detecting the face.
Fig. 1 Facial recognition process (Self-compiled by the author)
3.2 Face Retrieval Once the face is detected, we need to retrieve only the face objects from the image; the other objects are discarded for the identification process. When retrieval of the face object is done, the face is recognized based on facial features such as the eyes, shape, and color.
3.3 Feature Extraction Feature extraction is the most important step in the facial recognition process; it extracts the facial features of the face. A human face has multiple features that are used to recognize the identity of the person. The retina, periocular region, shape, and color are some common features that can be extracted to recognize the facial identity. Handcrafted and non-handcrafted techniques are used for feature extraction.
3.4 Feature Classification We need to classify the features that were extracted in the previous phase. The features can be classified using SVM, a supervised learning algorithm and one of the most powerful classification algorithms. It finds the maximum-margin hyperplane, which is used to divide the dataset into classes.
3.5 Feature Matching In the last phase, the extracted features are matched against the stored database. This phase produces the outcome of the applied algorithm. For a more accurate and correct result, the process should be applied to different image databases. In much research a specific database has been used according to the facial feature; for example, if recognition is based on the periocular region, the UBIPr database can be used.
4 Facial Recognition Methodologies Facial recognition is a broad area, and many researchers have investigated it in diverse sectors from different points of view and using different algorithms.
4.1 Eigen Faces and Fisher Face Methodology The Eigenface and Fisherface methodology [6] proposed a method to find common eigenvalues of matrices based on the lower Hessenberg matrix reduction method; it is difficult to calculate the lower Hessenberg matrices under the given conditions. Consequently, a companion matrix is suggested to overcome the computational complexity. Based on the companion matrix and the common-eigenvalue approach, a simple and efficient method can be proposed for facial recognition [6]. The basic use of the Eigenface algorithm is to extract features, where PCA is combined with facial attributes and the K-nearest neighbor algorithm [4]. Eigenfaces are implemented with the help of PCA. Using this technique we can identify a face with reasonable precision; however, we can improve the performance by using LDA instead of PCA, which generates Fisherfaces analogous to the Eigenfaces generated by the PCA method [14].
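To make the Eigenface/Fisherface idea above concrete, the hedged sketch below builds both pipelines with scikit-learn; the random arrays merely stand in for flattened grayscale face images, and the component counts are illustrative rather than taken from the cited papers.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_subjects, imgs_per_subject, h, w = 10, 8, 32, 32
X = rng.random((n_subjects * imgs_per_subject, h * w))   # flattened grayscale "face" images
y = np.repeat(np.arange(n_subjects), imgs_per_subject)   # subject labels

# Eigenfaces: project faces onto the top principal components, then match with 1-NN.
eigenfaces = make_pipeline(PCA(n_components=20, whiten=True),
                           KNeighborsClassifier(n_neighbors=1))
eigenfaces.fit(X, y)

# Fisherfaces: PCA first (to avoid a singular scatter matrix), then LDA to separate identities.
fisherfaces = make_pipeline(PCA(n_components=20),
                            LinearDiscriminantAnalysis(),
                            KNeighborsClassifier(n_neighbors=1))
fisherfaces.fit(X, y)

print("Eigenface  training accuracy:", eigenfaces.score(X, y))
print("Fisherface training accuracy:", fisherfaces.score(X, y))
```

On a real face dataset the Fisherface pipeline typically separates identities better because LDA maximizes between-class scatter, which is exactly the improvement over PCA described above.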
4.2 Principal Component Analysis (PCA) Principal component analysis (PCA) is a multivariate data-analysis method based on projection. In facial recognition, PCA [4] is used as a feature extraction method and to pre-process the dataset before any analysis of the image. It removes repetitive information and noise while preserving the required properties of image-based data; in some cases the image dimension can also be reduced, which enhances processing speed and decreases time and cost. PCA is now used to provide a higher recognition rate and is easy to compute; it relies on scale-variation techniques and linear assumptions [19].
4.3 Linear Discriminant Analysis (LDA) LDA is a dimensionality reduction technique used for supervised classification. PCA works better in some cases where the number of samples per class is small, whereas LDA works better on large datasets having more than one class.
4.4 CNN Based Model A real-time face recognition approach using a CNN has been proposed in [20]. Images are used as input data; the CNN-based model first detects the face, trains on the image features, and is then tested. In this study the [INPUT–CONV–RELU–POOL–FC] pipeline is used, where INPUT is the database from which the images are extracted, CONV filters the images, RELU activates the elements and assigns 0 to inactive hidden units, POOL is used for down-sampling and dimension reduction, and the FC layer processes the received input. An experiment on the AT&T database using the proposed CNN-based model achieved a 98.75% accuracy level. A CNN-based model called MTCNN has also been used [21]; it has three cascaded networks, works on candidate boxes plus classifiers, is trained for 50 faces, and uses the peak signal-to-noise ratio (PSNR).
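The following minimal PyTorch sketch mirrors the [INPUT–CONV–RELU–POOL–FC] pipeline described above; the layer sizes, image resolution, and class count are assumptions for illustration, not the configuration used in the cited AT&T experiment.

```python
import torch
import torch.nn as nn

class SmallFaceCNN(nn.Module):
    def __init__(self, n_classes: int = 40):            # 40 subjects, as in AT&T/ORL (assumed)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # CONV: filter the image
            nn.ReLU(),                                   # RELU: non-linear activation
            nn.MaxPool2d(2),                             # POOL: down-sample by 2
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, n_classes)  # FC: final decision layer

    def forward(self, x):
        x = self.features(x)                             # INPUT -> CONV/RELU/POOL stack
        return self.classifier(torch.flatten(x, 1))      # flatten -> FC

model = SmallFaceCNN()
dummy_faces = torch.randn(4, 1, 64, 64)   # a batch of 4 grayscale 64x64 face crops
print(model(dummy_faces).shape)           # torch.Size([4, 40])
```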
4.5 Open CV-Based Model OpenCV is used as a methodology for detecting the face [22]; it extracts Haar features of the face over a large sample set, and most face detection algorithms blur the background, which improves accuracy. A mathematical expression is used
to work on the feature extraction of the face: for a detection window of size m × m and a rectangular feature of width w and height h, the number of feature instances can be written as

m_{|w,h|} = (m/w + (m−1)/w + ··· + (w+1)/w + 1) · (m/h + (m−1)/h + ··· + (h+1)/h + 1),

where w and h are the width and height of the rectangular feature. We can see that the number of features is large; OpenCV solves this with the integral image method. Suppose f is an arbitrary image and g is the integral image of that image. Then the pixel value g(x, y) can be defined as

g(x, y) = Σ_{x′ ≤ x, y′ ≤ y} f(x′, y′)
For the experiment, the FDDB dataset is used with OpenCV to simulate the result.
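As an illustration of the OpenCV workflow discussed above, the hedged sketch below computes an integral image and runs the Haar-cascade face detector shipped with opencv-python; the image path is a placeholder, and the parameter values are common defaults rather than those of the cited study.

```python
import cv2
import numpy as np

img = cv2.imread("face_sample.jpg")                      # placeholder path
if img is None:                                          # fall back to a blank frame so the sketch runs
    img = np.zeros((480, 640, 3), dtype=np.uint8)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Integral image: g(x, y) = sum of f(x', y') over all x' <= x, y' <= y
integral = cv2.integral(gray)

# Haar-cascade face detector shipped with opencv-python
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
print("integral image shape:", integral.shape, "- faces detected:", len(faces))
```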
4.6 Computational Intelligent Algorithms There are three steps to recognize the face [23]: in the first step, preprocessing, the human face image is used as input; in the second step AlexNet is used to extract the features; and the third step is the identification of the feature vector, where fog and cloud computing are used for face recognition. The experiment is done on the Yale and AT&T databases.
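A hedged sketch of this three-step pipeline is given below: a torchvision AlexNet with its last fully connected layer removed supplies the feature vectors, and a separate SVM performs the identification. The dummy tensors stand in for preprocessed Yale/AT&T faces, and weights=None (torchvision ≥ 0.13 API) is used only to avoid a download here; a pretrained model would normally be loaded.

```python
import torch
from torchvision import models
from sklearn.svm import SVC

# Step 2: AlexNet as a fixed feature extractor (last FC layer dropped -> 4096-d output)
alexnet = models.alexnet(weights=None)          # pretrained weights would be used in practice
alexnet.classifier = alexnet.classifier[:-1]
alexnet.eval()

def extract_features(batch: torch.Tensor) -> torch.Tensor:
    """batch: (N, 3, 224, 224) preprocessed face images -> (N, 4096) feature vectors."""
    with torch.no_grad():
        return alexnet(batch)

# Step 1 stand-in: dummy tensors in place of resized Yale/AT&T face images, plus identity labels.
faces = torch.randn(20, 3, 224, 224)
labels = [i % 5 for i in range(20)]

# Step 3: identify the feature vector with a separate classifier.
features = extract_features(faces).numpy()
classifier = SVC(kernel="linear").fit(features, labels)
print("training accuracy:", classifier.score(features, labels))
```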
4.7 Visual Attention Mechanism Guidance Model in Unrestricted Posture The method proposed in [24] was trained on the LFW training set, and cross-dataset experiments were performed on the CMUFD database. The proposed model is based on PyTorch deep learning and uses the ResNet50 network.
4.8 Deep Learning A deep-learning-based model is used in [17] to extract local features. It uses zero-phase component analysis to whiten the pre-trained data and a linear decoder to extract the local features of the face image, where the convolution kernel
converts the facial images and the characteristics of the image. They also used a softmax classifier to classify the convolutional characteristics and recognize the face. The experiment achieved 97.45% accuracy on the DeepID dataset, 99.15% on DeepID2, and 99.47% on DeepID2+.
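The zero-phase component analysis (ZCA) whitening step mentioned above can be sketched in a few lines of NumPy, as below; the data is a random stand-in for flattened face patches and the epsilon value is an assumed regularizer.

```python
import numpy as np

def zca_whiten(X: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """X: (n_samples, n_features) flattened image patches -> ZCA-whitened data."""
    X = X - X.mean(axis=0)                         # centre each feature
    cov = np.cov(X, rowvar=False)                  # feature covariance matrix
    U, S, _ = np.linalg.svd(cov)                   # eigen-decomposition of the covariance
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T  # zero-phase whitening matrix
    return X @ W

patches = np.random.rand(300, 64)                  # 300 flattened 8x8 face patches (stand-in data)
white = zca_whiten(patches)
print(np.round(np.cov(white, rowvar=False)[:3, :3], 2))   # approximately the identity matrix
```

Unlike PCA whitening, ZCA keeps the whitened data as close as possible to the original pixel space, which is why it is commonly used before learning local filters.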
5 Facial Recognition Approaches Facial recognition approaches can be categorized into two parts [8]: the holistic approach and the local approach. The holistic approach uses global information for facial recognition: a small number of features are derived directly from the pixel information of the whole face image. Holistic methods use the entire face as input, and the face image is seen as a single high-dimensional vector obtained by combining the gray values of all pixels in the face. In local approaches, a set of individual points or regions of the face image is used, and the classification patterns are obtained from restricted regions of the face image. Figure 2 shows that PCA and LDA are part of the holistic approach.
5.1 Comparison of Facial Recognition Approaches A large number of studies have been proposed in the area of automatic facial recognition [9], based on the two approaches, holistic and local, as shown in Tables 2 and 3. Each method has some advantages in favorable environments and some disadvantages in unfavorable ones.
6 Feature Extraction Methodology The process of converting pixels into some higher-level representation [1] of shape, texture, motion, or color is called feature extraction; through it we obtain the best information about the image or its pattern. Periocular biometrics [7] uses a small region near the eye area and does not require any physical touch for identification. In much of the corporate sector, fingerprints are still used to identify a person, and in some cases criminal activities are also identified through biometrics, but there is a chance of discrepancy if people are wearing face masks; if we can identify them using the periocular region, that would be a useful approach. This has been tested on the ORL dataset with HOG and a binary SVM classifier with 90% accuracy, on the ORL dataset with BOF and a multiclass SVM classifier with 85% accuracy, and on the ORL dataset with LBP and a binary SVM classifier with 89% accuracy.
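A hedged sketch of the kind of HOG-plus-SVM pipeline evaluated above is given below, using scikit-image for the descriptor and scikit-learn for the binary SVM; the random images stand in for ORL/UBIPr eye-region crops, so the reported accuracy is not meaningful.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
images = rng.random((40, 64, 64))          # 40 grayscale eye-region crops (stand-in data)
labels = np.repeat([0, 1], 20)             # two identities -> a binary problem

# Histogram of Oriented Gradients descriptor for each crop
X = np.array([hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
              for img in images])

clf = LinearSVC(max_iter=5000).fit(X, labels)      # binary linear SVM classifier
print("HOG length:", X.shape[1], "- training accuracy:", clf.score(X, labels))
```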
Fig. 2 Facial recognition approaches (Self-compiled by the author)
It is of interest to extract the periocular region because it is the smallest area around the eyes. The proposed approach is based on feature fusion: it uses handcrafted features such as HOG and non-handcrafted feature extraction techniques such as pre-trained CNN models, and it extracts gender-related features using a five-layer CNN model. Among the many handcrafted local descriptors shown in Fig. 1 [9], the scale-invariant feature transform (SIFT) is a feature extraction algorithm originally used for object recognition and pattern recognition. It is now used for facial feature extraction and recognition; fuzzy clustering is first used to create clusters so that the area for feature detection can be set easily.
Table 2 Advantages and disadvantages of the holistic approach
Approach | Advantage | Disadvantage
Eigenfaces | Simple and fast | Sensitive to proper pixel alignment; cannot capture the image variances in some environments
Fisher faces | Maximizes the separation between different identities | Sensitive to proper pixel alignment; linear classes cannot adequately represent pose variations
LEM | Simple to implement; no training data or facial element detection required | Sensitive to the edge distortions caused by various pose variations
DCP | Comparatively fast; no training data or facial element detection required | Sensitive to the edge distortions caused by various pose variations
Table 3 Advantages and disadvantages of the local approach
Approach | Advantage | Disadvantage
Template matching | Pose variation is handled easily by using local and simple facial component regions | Sensitive to proper pixel alignment in sub-image regions, which depends upon facial component detection
Modular PCA | Pose variation is handled easily by using local and simple facial component regions | Sensitive to proper pixel alignment in sub-image regions, which depends upon facial component detection
EBGM | Local regions around facial components and Gabor wavelets give pose tolerance | Slow; distortions surrounding the local regions are not treated
LBP | Simple; a histogram is used over local regions of the face | Sensitive to proper pixel alignment; image division is problematic under pose variation
Clustering is the process of dividing the data into groups; here, for image clustering, the groups can also be determined over the pixels of the image. SIFT features are extracted in the following four stages:
1. Scale-space extrema detection
2. Key point localization
3. Orientation assignment
4. Key point descriptor.
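The short OpenCV sketch below runs these four SIFT stages, which cv2.SIFT_create (available in OpenCV ≥ 4.4) performs internally; the image path is a placeholder, and a noise image is substituted if it is missing.

```python
import cv2
import numpy as np

gray = cv2.imread("face_sample.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path
if gray is None:                                             # noise image so the sketch still runs
    gray = np.random.randint(0, 256, (256, 256), dtype=np.uint8)

sift = cv2.SIFT_create()                                     # performs all four stages internally
keypoints, descriptors = sift.detectAndCompute(gray, None)

print("keypoints found:", len(keypoints))
print("descriptor shape:", None if descriptors is None else descriptors.shape)  # (N, 128)
```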
The global descriptor is used to extract facial features such as shape, color, and texture. Shape-based descriptors contain essential semantic information because humans can recognize objects by their shape. These descriptors are categorized into three types: region-based, contour-based, and shape-based descriptors for 2D images and 3D volumes.
Table 4 Facial recognition methodology and databases accuracy
S. No. | Facial recognition method | Dataset | Accuracy
1 | DCNN algorithm used for FR, called AlexNet [2] | SDUMLA-HMT | Accuracy with different classification techniques: DT 94%, SVM 97%, KNN 96%, Adaptive CNN 99%
2 | DeepFace using a DNN and the Labeled Faces in the Wild (LFW) database [25] | LFW dataset | FR rate 97.35%
3 | DeepFace with AlexNet as the network architecture and the softmax loss function; test performance [25] | Facebook dataset | 97.35%
4 | DeepID2 uses AlexNet as the network architecture and contrastive loss [26] | CelebFaces+ dataset | 99.15%
5 | DeepID3 uses VGG-Net10 instead of AlexNet as the network architecture [27] | CelebFaces+ dataset | 99.53%
6 | FaceNet uses GoogleNet-24 as the network architecture along with the triplet loss function and a Google database for training [28] | LFW dataset | 99.63%
7 | VGGFace uses the VGGNet-16 network architecture, the triplet loss function and the VGGFace dataset for training [29] | LFW dataset | 98.95%
8 | MS-Celeb-1M dataset for training and the congenerous cosine (CoCo) loss function [30] | LFW dataset | 99.86%
9 | ResNet architecture, CosFace loss function and the CASIA-WebFace dataset [31] | CASIA-WebFace dataset | 99.33%
10 | ResNet-100 network architecture with the ArcFace loss function and the MS-Celeb-1M dataset as training data [32] | LFW dataset | 99.83%
11 | KNN cluster algorithm [33] | LFW, YTF datasets | 99.62 and 96.5%
12 | Intraspectrum discrimination and interspectrum correlation analysis deep network (IDICN) [34] | Multi-spectral face image datasets | 97–100%
14 | Triangular Coil Pattern of Local Radius of Gyration Face (TCPLRGF) method [35] | AR, CMU-PIE, Extended Yale B datasets | 100%, 98.27%, 96.35%
Color-based descriptors have five tools. The first three show the color distribution, and the last ones describe the color relation between sequences and groups of images:
1. Dominant color descriptor (DCD)
2. Scalable color descriptor (SCD)
3. Color structure descriptor (CSD)
4. Color layout descriptor (CLD)
5. Group of frames (GoF) or group of pictures (GoP).
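As a rough illustration of the dominant color descriptor idea, the sketch below clusters the pixels of an RGB image with k-means and reports the cluster centres as dominant colors; this is a simplification for illustration, not an MPEG-7-compliant implementation, and the image is a random stand-in.

```python
import numpy as np
from sklearn.cluster import KMeans

image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)   # stand-in RGB image
pixels = image.reshape(-1, 3).astype(float)

# Cluster the pixels; the cluster centres act as the "dominant colors"
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(pixels)
dominant = kmeans.cluster_centers_.astype(int)
share = np.bincount(kmeans.labels_) / len(kmeans.labels_)

for colour, p in zip(dominant, share):
    print(f"RGB {tuple(colour)} covers {p:.1%} of the image")
```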
Texture-based descriptors are used for image texture and regions. They are used to observe regions with a similar appearance (homogeneity) and also work on the histograms of region borders. Texture-based descriptors can be categorized into the homogeneous texture descriptor (HTD), the texture browsing descriptor (TBD), and the edge histogram descriptor (EHD). For non-handcrafted features, three approaches are widely used for feature extraction [12]: deep transfer-learning features based on convolutional neural networks (CNN), which use deep learning; the principal component analysis network (PCAN), a holistic approach; and the compact binary descriptor (CBD). Deep neural networks and convolutional neural networks are also used as layered approaches for facial recognition. Deep learning can train a model to learn face features, which helps to extract complex features easily and to learn the hidden features of the face. In semantic information image analysis [15], meaningful information is extracted from the image after processing; image analysis and image processing are different terms. Some researchers work to identify 3D faces [8] through automatic 3D face detection, normalization and recognition; in that study an approach was proposed that automatically detects the nose tip and corrects the pose using the Hotelling transform. An SFR low-cost rejection classifier is also used for 3D images.
In [36], three face recognition algorithms have been used: local feature analysis (LFA), the line-based algorithm (LBA), and kernel direct discriminant analysis (KDDA), combined as a fusion algorithm. The accuracy level of the fusion algorithm is 95%.
7 Image Classification Methods The process of categorizing and dividing groups of pixels or groups of vectors of an image is called image classification. Image classification can be done in automatic or manual mode. Automatic image classification is further divided into two categories, unsupervised classification and supervised classification. Unsupervised classification is a fully automated process that does not require a training dataset [2]; with the help of an image classification algorithm, image characteristics can be detected systematically during classification. In supervised classification, the training sample data must be selected manually and assigned to pre-selected categories such as roads, buildings, and water bodies, and statistical measures must be applied to the whole image (Fig. 3).
Fig. 3 Image classification method
Fig. 4 Support vector machine
7.1 Support Vector Machine (SVM) The SVM was developed by Vapnik and Cortes in 1995. It is used for small sample datasets and for high-dimensional image recognition [11]. For facial recognition we need the extracted face features, and SVM is used to find the hyperplane separating different faces. The dimension of the hyperplane depends on the number of extracted facial features: the hyperplane is only a line if the number of input features is two, and if the number of input features is three, the hyperplane becomes a 2D plane. In a 2D area with more training data, SVM will find a set of straight lines for classifying the training data, and it becomes difficult to visualize when the number of features grows. From Fig. 4 it is clearly understood that there are multiple lines (here the hyperplane is a line because we are considering only two input features, x1 and x2) that divide our data points, i.e., perform a classification between the red and blue circles.
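The two-feature case described above can be reproduced with the short scikit-learn sketch below, where the learned hyperplane is a line separating two synthetic point clouds that stand in for two face classes.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
red = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(30, 2))   # class 0 ("red circles")
blue = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(30, 2))  # class 1 ("blue circles")
X = np.vstack([red, blue])
y = np.array([0] * 30 + [1] * 30)

clf = SVC(kernel="linear").fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]                      # separating line w1*x1 + w2*x2 + b = 0
print(f"hyperplane: {w[0]:.2f}*x1 + {w[1]:.2f}*x2 + {b:.2f} = 0")
print("number of support vectors:", len(clf.support_vectors_))
```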
7.2 AdaBoost Schapire proposed this algorithm. It is used for facial detection and to improve the learning of other classification algorithms: different classifiers are combined into a strong classifier with the help of a simple rule so that the performance is increased [37]. Two problems had been identified: first, how to make changes in the training dataset, and second, how to join weak classifiers into one strong classifier. AdaBoost resolves these problems and is a strong, robust, and practical boosting algorithm for face recognition.
824
M. Shree et al.
Fig. 5 AdaBoost is used to adjust the sample weights. In (a), the misclassified sample is circled in red; in (b), the classifier is retrained after adjusting the weight of the misclassified sample from (a) [4] (compiled from A Review of Face Recognition Technology, Lixiang Li, Xiaohui Mu, Siying Li, and Haipeng Peng)
It uses weighted training data instead of selecting random data from the sample, and the main focus of AdaBoost is on the relatively large and difficult training samples. Instead of an average voting mechanism, it uses a weighted voting mechanism, which gives more influence to weak classifiers with a good classification effect. Figure 5 helps to understand the AdaBoost classifier further. It takes an input value x and returns the value G(x). In the AdaBoost classifier, multiple weak classifiers G_i are combined into a strong classifier, and every weak classifier has a weight w_i, as shown below:

G(x) = sign( Σ_{i=1}^{n} w_i · G_i(x) )
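A minimal scikit-learn sketch of this weighted combination is given below; the default weak learner is a depth-1 decision stump (the G_i above), and estimator_weights_ holds the learned weights w_i. The data is synthetic.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(3)
X = rng.random((200, 5))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)      # a simple target for the weak learners

# The default weak learner is a depth-1 decision stump (one G_i per boosting round).
ada = AdaBoostClassifier(n_estimators=25, random_state=0).fit(X, y)

print("first weak-classifier weights w_i:", np.round(ada.estimator_weights_[:5], 3))
print("training accuracy of the combined strong classifier:", ada.score(X, y))
```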
7.3 Small Samples In the small-sample problem, the number of training samples is very small for facial recognition, due to which most face recognition algorithms fail to achieve good performance.
7.4 Neural Networks A neural network is one of the most robust and powerful image classification techniques; it is used for prediction on known datasets and also on unknown data. It works for both linearly and non-linearly separable datasets. Neural networks have been used in many areas, such as interpreting visual scenes, speech recognition systems, facial recognition, fingerprint and hand-gesture recognition, and iris recognition [38]. Neural networks can be classified into several types: ANN, CNN, RNN, and feed-forward networks (DNN). A neural network is divided into three layers: the input layer, the hidden layer, and the output layer. In Fig. 6 we can see that the neural network is a mesh of decisive outputs, and every input is processed through the weighted hidden layer to generate the output. Input nodes receive the input in the form of numeric values. The information supplied by the user is stored in the neurons of the hidden layer for processing, where every node is assigned a number; nodes with higher numbers have greater activation. The information is then passed through the network's neurons, and, depending on the connection strengths (weights), inhibition or excitation, and transfer functions, the activation value is passed from one node to another. Each node sums the activation values it receives and then modifies the value based on its transfer function. The activation travels through the network, through the hidden layers, until it reaches the output nodes, which then reflect the meaningful result. Many concepts relate directly to the neural network in this way; deep learning is one of the most important components of machine learning. Without manual feature extraction, it automatically extracts the features required for classification and collects more efficient features of different faces. Face recognition has been completely transformed with the help of deep learning. The first deep facial recognition methods use convolutional neural networks (CNN). A CNN works on local sets of data, simulating local perception areas, shared weights, and down-sampling of face images; a CNN is very similar to a neural network and has neurons with trained weights and bias values.
Fig. 6 Neural networks
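The input/hidden/output structure sketched in Fig. 6 can be illustrated with scikit-learn's MLPClassifier, as below; the 64-dimensional feature vectors and labels are synthetic stand-ins for extracted face features.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)
X = rng.random((120, 64))                        # 64-d face-feature vectors (input layer)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)        # two identities (output layer)

mlp = MLPClassifier(hidden_layer_sizes=(32,),    # one hidden layer of 32 neurons
                    activation="relu",
                    max_iter=1000,
                    random_state=0)
mlp.fit(X, y)

print("weight-matrix shapes between layers:", [c.shape for c in mlp.coefs_])
print("training accuracy:", mlp.score(X, y))
```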
8 Comparison of Image Databases Different image databases contain thousands of images for testing and training; Table 4 compares the performance of the databases and methodologies.
9 Future Scope Some technical gaps encountered during the review study are elaborated below.
9.1 Facial Recognition with Face Mask We do not know when the pandemic will end, but we can develop a system that identifies people wearing a face mask. This is the need of the hour, as it is very hard to identify who is behind the mask; some eye masks are also available, and although there have been studies that work on the iris, if the iris is also not visible it is very difficult to recognize the face. A facial edge detection method can be proposed to identify the person based on the facial edges and forehead lines.
9.2 Fusion Based Algorithm We can maximize the use of fusion-based algorithms, which increase the throughput of the system. Many boosting algorithms can be used to provide a robust feature descriptor and minimize the noise in the image. The feature descriptor can be combined with classifiers such as KNN and SVM.
9.3 Real-Time Camera Many tools used by digital platforms can detect images, such as Google Photos and Apple Photos, and nowadays many mobile phones have an inbuilt camera that is AI-enabled and can identify people and objects. Work can be proposed on IoT-based devices for facial recognition.
9.4 Face Adaptive Dataset An adaptive dataset can be implemented that does not require training. In most cases we first need to train the data model and then apply the algorithms; a system could be developed that does not require a training dataset, which would minimize time and increase the accuracy level.
9.5 Fastest Facial Recognition The support vector machine and the Viola–Jones algorithm are considered among the fastest algorithms. A method can be developed for facial edge detection for the fastest facial recognition, or many face properties such as the lips, nose, and eyes can be used for edge detection. A method can also be proposed based on the nodal points of the face, which improves both speed and accuracy.
9.6 One 2D and 3D Face System A study can be proposed that works on identifying both 2D and 3D faces, as well as grayscale and color images.
10 Conclusion Many researchers have been using new loss functions, and this will certainly increase computational performance; we are currently achieving good accuracy, as shown in Table 4, and about 15 different loss functions are used in the literature for deep-learning-based face recognition. The more such loss functions are used, the more easily we can work on facial recognition, and it will also boost progress. To enhance accuracy, we can use datasets that have multiple images from different domains such as infrared, RGB, different resolutions, pose variations, faces with accessories, faces with glasses, and faces with masks. Facial recognition is the leading authentication technique in the new tech era, where everything is switching to fast networks. Human life has changed rapidly with the impact of Covid-19, where people need to wear a mask; with a mask, about 70% of the face is hidden, and it is a very tough task to identify the person.
Much research has appeared in the area of iris and periocular-region detection so that a person can be identified based on these facial features. This paper is a review study of facial recognition and its methodologies. As this technology can be used in many areas, such as identifying criminal records, authenticating gate entry, and protecting sensitive data, the facial recognition system can be utilized for error-free work and reliable control of criminal identification and security systems. Facial recognition is a broad area of research; in this paper we have discussed different feature extraction and classification methods. A face has multiple attributes, which can be categorized into different entities, and the face image can be captured from multiple angles, which makes it difficult to identify the person. Because of that, a large dataset with different images is needed for training and testing the algorithms. The ORL, Yale, and FERET databases provide better recognition rates in comparison with many other facial recognition datasets.
References 1. Arya S, Pratap N, Bhatia K (2015) Future of face recognition: a review. Procedia Comput Sci 58:578–585 2. Salama AbdELminaam D, Almansori AM, Taha M, Badr E (2020) A deep facial recognition system using computational intelligent algorithms. Plos One 15:e0242269 3. Deng J, Guo J, Xue N, Zafeiriou S (2018) Arcface: additive angular margin loss for deep face recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition 4. Li L, Mu X, Li S, Peng H (2020) A review of face recognition technology. IEEE Open Access 5. Taskiran M, Kahraman N, Erdem CE (2020) Face recognition: past, present and future (a review). Digital Signal Processing, 102809 6. Gaidhane VH, Hote YV, Singh V (2014) An efficient approach for face recognition based on common eigenvalues. Pattern Recogn 47:1869–1879 7. Kumari P, Seeja KR (2021) A novel periocular biometrics solution for authentication during Covid-19 pandemic situation. J Ambient Intell Human Comput, 1–17 8. Mian A, Bennamoun M, Owens R (2006) Automatic 3D face detection, normalization and recognition. IEEE 9. Sabharwal H, Tayal A (2014) Human face recognition. Int J Comput Appl 104 10. Zhong Y (2017) A theory of semantic information. China Commun 14:1–17 11. Cortes C, Vapnik V (1995) Support-vector networks machine learning, vol 20. Kluwer Academic Publisher, Boston, MA, pp 237–297 12. Nanni L, Ghidoni S, Sheryl B (2017) Handcrafted vs. non-handcrafted features for computer vision classification. Elsevier 13. Adjabi I, Ouahabi A, Benzaoui A, Taleb-Ahmed A (2020) Past, present, and future of face recognition: a review. Multidisciplinary Digital Publishing Institute 14. Bhat A (2013) Medoid based model for face recognition using Eigen and fisher faces. Available at SSRN 3584107 15. Ladhake S et al (2015) Semantic image analysis for intelligent image retrieval. Procedia Comput Sci 48:192–197 16. Perveen N, Singh D, Mohan CK (2016) Spontaneous facial expression recognition: a part based approach. In: 2016 15th IEEE international conference on machine learning and applications (ICMLA)
17. Han X, Du Q (2018) Research on face recognition based on deep learning. In: 2018 sixth international conference on digital information, networking, and wireless communications (DINWC). IEEE 18. Ahmad F, Najam A, Ahmed Z (2013) Image-based face detection and recognition: state of the art. arXiv preprint arXiv:1302.6379 19. Nath RR, Kakoty K, Bora DJ (2021) Face detection and recognition using machine learning algorithm. UGC Care 20. Hasan MK, Ahsan MS, Abdullah-Al-Mamun, Newaz SHS, Lee GM (2021) Human face detection techniques: a comprehensive review and future research directions. Electronics 10:2354 21. Naz F, Hassan SZ, Zahoor A, Tayyeb M, Kamal T, Khan MA, Riaz U (2019) Intelligent surveillance camera using PCA. IEEE, pp 1–5 22. Devi NS, Hemachandran K (2013) Automatic face recognition system using pattern recognition techniques: a survey. Int J Comput Appl 83 23. Pranav KB, Manikandan J (2020) Design and evaluation of a real-time face recognition system using convolutional neural networks. Procedia Comput Sci 171:1651–1659 24. Ge H, Dai Y, Zhu Z, Wang B (2021) Robust face recognition based on multi-task convolutional neural network. Math Biosci Eng 18:6638–6651 25. Lu D, Weng Q (2007) A survey of image classification methods and techniques for improving classification performance. Int J Remote Sens 28:823–870 26. Freund Y, Iyer R, Schapire RE, Singer Y (1998) An efficient boosting algorithm for combining preferences. In: International conference on machine learning, Madison, WI 27. Kasar MM, Bhattacharyya D, Kim TH (2016) Face recognition using neural network: a review. Int J Secur Its Appl 10:81–100 28. Taigman Y, Yang M, Ranzato MA, Wolf L (2014) Deepface: closing the gap to human-level performance in face verification. In: Proceedings of the IEEE conference on computer vision and pattern recognition 29. Sun Y (2015) Deep learning face representation by joint identification-verification. The Chinese University of Hong Kong, Hong Kong 30. Sun Y (2015) Deepid3: face recognition with very deep neural networks. The Chinese University of Hong Kong, Hong Kong 31. Schroff F, Kalenichenko D, Philbin J (2015) Facenet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition 32. Parkhi OM, Vedaldi A, Zisserman A (2015) Deep face recognition. British Machine Vision Association 33. Haamer RE, Kulkarni K, Imanpour N, Haque MA, Avots E, Breisch M, Nasrollahi K, Escalera S, Ozcinar C, Baro X, Naghsh-Nilchi AR (2018) Changes in facial expression as biometric: a database and benchmarks of identification. IEEE 34. Wang H, Wang Y, Zhou Z, Ji X, Gong D, Zhou J, Li Z, Liu W (2018) Cosface: large margin cosine loss for deep face recognition. IEEE 35. Ge S, Zhao S, Li C, Li J (2018) Low-resolution face recognition in the wild via selective knowledge distillation. IEEE 36. Lu D, Yan L (2021) Face detection and recognition algorithm in digital image based on computer vision sensor. J Sens 2021 37. Yuan Z (2020) Face detection and recognition based on visual attention mechanism guidance model in unrestricted posture. Sci Program 2020 38. Vatsa M, Singh R, Gupta P (2004) Face recognition using multiple recognizers. In: 2004 IEEE international conference on systems, man and cybernetics (IEEE Cat. No. 04CH37583)
Analysis of Wormhole Attack Detection in Customized Ad Hoc Network Soumya Shrivastava and Punit Kumar Johari
Abstract In contrast to wired networks, a wireless sensor network has widely dispersed nodes in an unguided and unsupervised environment, making an attack by an adversary much more likely than in the former. As a result, protecting these sensor nodes from harm is of paramount importance. Because a single sensor node's transmission range is restricted and it cannot transfer packets over a long distance, a partnership between sensor nodes is necessary to allow packets to be sent and communicated. Neighbour discovery is the method through which a node determines its next-door neighbour. A connection is then built between nodes to transfer the packet over a one-hop distance once communication has been established; once the packets have arrived at their destination, the procedure is repeated. Malicious nodes may connect with each other over low-latency links if they can attach themselves to a real node. This paper focuses mainly on malicious attacks such as the wormhole, tested on a 25-node set-up in an NS2 simulation. The findings of the paper are the evaluation of parameters such as the E2E delay and the throughput of the communication network. According to the findings, the performance of the network is disrupted in every area of communication during the wormhole attack. When using an ad hoc network, it is necessary to pay close attention in order to prevent being targeted by malicious nodes and activities. This work draws attention to the security issues of the network and the malicious activities of intruders who steal data and degrade network performance. Keywords Wormhole attack · Malicious attacks · Ad hoc network
S. Shrivastava (B) · P. K. Johari Madhav Institute of Technology and Science, Gwalior, MP 474009, India e-mail: [email protected] P. K. Johari e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_57
1 Introduction Because of their intrinsic vulnerabilities, such as their physical size, network sensors pose a significant security risk. A routing protocol is necessary since network sensors do not have routers; therefore, all nodes must aid each other in transmitting packets. Also, because of its dynamic topology, such a network is prone to all types of security breaches, making it a security risk. It is difficult to identify and prevent assaults like wormholes [1], which are among the most dangerous. For the monitoring and recording of conditions at multiple places, a transducer embedded in a communication infrastructure is needed. Temperature, pressure, wind direction and, more importantly, the critical functions of the human body are all examples of such environmental variables. Each node in a sensor network should be independent and have a communication range, a source of power, and bandwidth. Sensors, a CPU, a transceiver, and a battery are the four essential components of a network sensor. A physical quantity is used to create an electrical signal in the transducer, and this sensor output is processed and stored by a microprocessor. In addition to the processing, the transceiver receives instructions for further data transfer from a central computer. A battery powers every step of this operation. Detecting and preventing attacks in network and communication infrastructures is a highly complex and continuous research area that attracts the interest of a broad variety of academics [2–4].
1.1 Wormhole Attack In a wormhole attack [5, 6], a malicious node in one location sends a routing message to a malicious node in another location through a hidden tunnel. These two malicious nodes, despite their vast separation, appear to be near one another. Because the hop count of a route travelling through the malicious nodes is likely to be lower than that of a route through normal nodes, it is quite likely that the malicious nodes capture the data transmission path and can eavesdrop on or discard data packets. The hidden channel can be either an out-of-band channel or an encapsulated packet (in-band) channel. Using a physical connection or a long-distance, high-power transmission, two malicious nodes may build a hidden tunnel between them, as depicted in Fig. 1b. In Fig. 1, m1 and m2 are referred to as the tunnel ingress and egress nodes, respectively; alternatively, if the path goes from node d to node s, then m2 is the tunnel entry point and m1 is the tunnel exit point. A malicious node can act as either an entry or an exit node of a tunnel. The article also explains how an intrusion detection system (IDS) can be used in a MANET to identify and isolate wormhole nodes [7–9]. A wormhole attack occurs when two attackers place themselves at crucial locations in a network; the intruders then continue to listen in on the network and capture whatever wireless data they can find. Figure 2 illustrates that the two attackers are situated at a strategic location in the network.
833
(b) Out-of-band channel
Fig. 1 Two methods of wormhole attacks [31]
Fig. 2 Wormhole attack [31]
Wormhole attacks, as we previously said, put the attackers in a very advantageous position in the network. They take advantage of their placement, which gives them the shortest path between the nodes. Because they want the other network nodes to believe that they have the quickest way to convey data, they advertise their route. Using a tunnel, wormhole attackers are able to capture all network traffic and send it to a different part of the network for analysis. When the attacker nodes form a direct connection in the network, the wormhole attacker accepts packets and sends them to the opposite side of the network; out-of-band wormhole attacks occur when the attackers are in this scenario. Overlaying a tunnel on top of the wireless medium
is another kind of wormhole attack, known as an in-band wormhole attack; as a result, the attacker is more likely to use this kind of assault [10, 11]. The figure above demonstrates how the attacker works from node one to node eight through the tunnel: the wormhole attacker takes packets and transfers them to the other side of the network when the attacker nodes make a direct link in the network.
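The hop-count effect described above can be illustrated with a small graph model (this is only an illustration, not the paper's NS2 set-up): adding one hidden link between two distant malicious nodes sharply shortens the route between the regions they sit in, so normal nodes are drawn to route through the tunnel.

```python
import networkx as nx

G = nx.grid_2d_graph(5, 5)                     # 25 sensor nodes; grid neighbours are one hop apart
src, dst = (0, 0), (4, 4)
print("hops without wormhole:", nx.shortest_path_length(G, src, dst))   # 8

m1, m2 = (0, 1), (4, 3)                        # two distant malicious nodes
G.add_edge(m1, m2)                             # the hidden low-latency tunnel between them
print("hops with wormhole:   ", nx.shortest_path_length(G, src, dst))   # 3, so routes prefer the tunnel
```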
2 Literature Review In [12], the goal is the detection and prevention of black hole and wormhole assaults in wireless sensor networks (WSN). Phases such as node assignment, data collection, black hole and wormhole attack detection, and optimal-route communication prevention are all contained here. To the authors' knowledge, WSN black hole and wormhole assaults are detected and prevented together for the first time. In [13], sensitive data such as this must be protected against interception in wireless sensor networks; the open medium makes it easier for attackers to get access to the network and launch a variety of attacks aimed at intercepting or modifying genuine data. For network sensors to help each other with packet transport, the same routing protocol must be shared across all nodes, which makes the network vulnerable to all kinds of security threats in a complicated, ill-defined topology. Wormhole assaults, which are difficult to detect and block, are a common type of attack. In this study, a novel routing strategy is proposed
network (CNN) are some of the machine learning approaches used in the classification (CNN) [5]. In [17], by changing routing protocols and draining network resources, a wormhole attack impairs the regular operation of the network. Using an artificial neural network, the research provides a method for detecting wormhole attacks in wireless sensor networks (WSN) (ANN). As a detecting feature, the suggested method makes use of information about the connection between any two sensor nodes. The suggested method has been applied in the wireless sensor network region under uniform, Poisson, Gaussian, exponential, gamma, and beta probability distributions for the sensor node placement. No extra hardware resources are needed for the suggested method, which has a high detection accuracy %. Described in [18], without a centralized infrastructure, an ad hoc network is made up of a collection of mobile nodes. Mobile nodes are both hosts and routers for all other mobile nodes. In addition, it sends packets to additional mobile nodes in the network that are not directly connected to the main network. A novel MANET defence against wormhole assaults is described in this work. To identify attacks, current technologies make advantage of quality of service (QoS) throughout the whole network. To identify both active and passive assaults, they employ a combination of the packet delivery ratio and the round-trip time for each node in their system. As a result, the suggested approach makes it feasible to identify a wormhole assault in its entirety. In paper [19], the findings indicate how the AODV routing protocol is influenced by wormhole in terms of performance and efficiency. Wormhole attacks are studied in terms of throughput, packer delivery ratio, and end-to-end latency in this research article. Using the multipath notion, a method is also put forward to identify and avoid wormhole attacks in VANET across a real landscape with varying vehicle densities. Mobile ad hoc networks are a fascinating field of study that is drawing academics because of the wide range of improvements and evolutions that are possible. These networks do not have a predetermined structure and are self-organizing. The routing principles of the ad hoc network make it an ideal alternative for transmitting data. The location-based geo-casting and forwarding (LGF) protocol, which belongs to the position-based group of routing protocols, is their primary emphasis in [20]. Wireless ad hoc networks are used by most people throughout the globe. So, reducing the wireless network’s susceptibility is now the most important thing to do. Wireless networks are vulnerable to a wide range of assaults, but the most serious is the wormhole attack. Weaknesses in cryptographic protection are not enough to stop wormhole assaults, since the intruders do not alter packet contents; they merely repeat them. The wormhole may cause a substantial disruption in communication if it is placed in the wrong spot. An investigation of current methods for detecting and thwarting wormhole attacks is described in this article [21]. MANET is vulnerable to a variety of threats, including black holes, wormholes, jellyfish, and more. In order to launch a wormhole assault, the attacking nodes may simply replicate a shorter path inside the network. 
Using regression analysis, a machine learning methodology, they estimate the pattern of the relation between packets that are dropped and packets that are sent via a wormhole attack, using the method of least squares (MLS) and least absolute deviations (LAD). As a result, the time required to determine the ratio of packets discarded to packets supplied through a wormhole attack must be calculated. By using MATLAB's wormhole attack simulator, it can be shown that linear regression analysis performs better at estimating the patterns of lost packets [22]. The unattended nature of deployment in an untrusted environment, restricted network resources, simple network access, and radio broadcast range make a wireless sensor network (WSN) vulnerable to a wide variety of threats and attacks. For example, an attacker may set up a low-latency connection between two sensor nodes in order to confuse the sensors and drain network resources by getting access to important information. The wormhole attack on WSN is described in detail in this document, along with a discussion of the various defences against it. To mitigate the wormhole attack in WSN, this study briefly compares and analyses several wormhole detection approaches and preventive procedures [23]. For wormhole attack detection in ad hoc networks, they utilize a machine learning approach with three main parts, simulating attacks launched on the network via several different tunnels. The paper shows that their technique outperforms the other methods tested, and the improvements in the statistical metrics precision, F-measure, MCC, and accuracy were found to be statistically significant [24]. Mobile ad hoc networks are vulnerable to wormhole attacks, in which hostile nodes manipulate the network architecture in order to steal sensitive data. Wormhole detection has been suggested using a variety of approaches based on round-trip time, packet traversal time, and hop count. In dealing with high-speed node mobility, fluctuating tunnel lengths, and false information from malevolent nodes, these techniques were only partly effective. A new multi-level authentication model and protocol (MLAMAN) is proposed in this work in order to successfully identify and avoid wormhole attacks [25]. The inherent nature of wireless sensor networks necessitates security. Sinkhole, wormhole, Sybil, selective forwarding, black hole, and other such attacks are all feasible in wireless sensor networks, and many more attacks can be unleashed via the wormhole. It is simple to initiate such an attack but very difficult to detect one, and the attack does not need knowledge of the network's cryptographic material or procedures. When a rogue node intercepts traffic and diverts it to an unintended destination, it disrupts the whole routing mechanism. Wormhole attack detection for wireless sensor networks has been described in this research [26]. Nodes in wireless sensor networks (WSNs) are self-governed and resource-constrained, making them vulnerable to a variety of attacks, including wormhole attacks. An attack via a wormhole puts WSNs in grave danger and is difficult to detect, since it causes private tunnels to route data incorrectly and damages WSNs through data leakage, dropping, and delays in delivery [27]. Any physical phenomenon may be detected with a WSN, which is a network of sensors dispersed across an area. The placement of these nodes may be left to chance or done by hand; typically, randomized deployment is used in remote places, whereas manual deployment is used in more accessible locations [28]. Wireless sensor and actuator networks (WSANs) will be enabled by the Internet of Things in the not-too-distant future.
The Routing Protocol for Low-Power and Lossy Networks (RPL) is the protocol of choice for WSANs because of its features. Securing these networks from assaults is crucial because they often transport very sensitive or life-threatening data. They
explore the numerous countermeasures offered in the literature and try one of them. According to them, traffic eavesdropping and selective packet dropping may be the simplest technique to fight a wormhole attack in a WSAN [29].
2.1 Problem Statement The literature shows a lag in the detection of malicious activities and attacks, especially in customized networks. The analysis of typical attacks such as black hole, wormhole, and others is still under investigation in ongoing research.
3 Methodology Wormhole attacks are also called tunnel attacks. In a tunnel attack, two or more colluding nodes short-circuit existing data paths by capturing messages at one point of the network and replaying them at another, which allows the intruders to divert the normal message flow through themselves. In the figure presented below, M1 and M2 are two malicious nodes that tunnel packets and spoof the path length (refer Fig. 3). Suppose source S wishes to discover a route to destination D. When M1 receives the RREQ from S, it forwards the RREQ over the tunnel to M2 instead of over the normal data path. When the RREQ reaches D via M2, it has actually traversed {M1 -> A -> B -> C -> M2}, but the hop counts recorded by M1 and M2 are not updated. After route discovery, the destination therefore sees two candidate paths whose advertised lengths are 5 and 4, respectively, and the shorter, tunnelled route is selected. A metric that measures the length of the tunnelled path can thus help distinguish it from routes through trusted intermediate nodes, and this path-length comparison is the basis of the detection technique used here. Fig. 3 Path length spoofed by tunnelling [30]
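The path-length effect can be made concrete with a small sketch (a hypothetical five-node topology and node names, not the paper's simulation scenario): it compares the hop count of the legitimate route with the hop count seen when two malicious nodes advertise a tunnel.

```python
from collections import deque

def hop_count(graph, src, dst):
    """Breadth-first search returning the number of hops on the shortest path."""
    visited = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return visited[node]
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited[neighbour] = visited[node] + 1
                queue.append(neighbour)
    return None  # unreachable

# Legitimate topology: S - A - B - C - D (4 hops from S to D).
honest = {
    "S": ["A"], "A": ["S", "B"], "B": ["A", "C"],
    "C": ["B", "D"], "D": ["C"],
}

# Same topology, but malicious nodes M1 and M2 tunnel packets between the
# two ends, so the route via the wormhole advertises fewer hops than it uses.
attacked = {
    "S": ["A", "M1"], "A": ["S", "B"], "B": ["A", "C"],
    "C": ["B", "D"], "D": ["C", "M2"],
    "M1": ["S", "M2"], "M2": ["M1", "D"],  # tunnel hides the real distance
}

print("hops without wormhole:", hop_count(honest, "S", "D"))    # 4
print("hops with wormhole   :", hop_count(attacked, "S", "D"))  # 3 (via M1-M2)
```

A path-length-based detector flags routes whose advertised hop count is implausibly small compared with the physical distance they span.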
4 Simulation Set-Up A total of 25 communication nodes have been deployed in the NS2 simulation using the AODV protocol, and the wormhole patch was integrated during the communication. Figure 4 shows the graphical layout of the simulated nodes constructed in NS2, with all 25 nodes included in the arena; the execution is at a very early stage, where the packet data is ready to be sent. Once communication has started, the nodes are encircled: Fig. 5 shows the rings that reflect the transmission of the communication signal. Figure 6 depicts the attacker in its active mode, with the simulation layout at the last stage of the communication time. Further, the data trace file has been used to analyse both conditions, before the attack and during the attack, in terms of E2E delay, packet delivery ratio, and throughput. As Tables 1, 2 and 3 show, a comparison of these metrics for the same network arena reveals that the end-to-end delay is significantly higher during the attack, while PDR and throughput drop. So it can be concluded that the wormhole attack affects all three evaluated parameters: E2E delay, throughput, and PDR.
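The metrics reported in Tables 1, 2 and 3 can be extracted from an NS2 trace file roughly as in the sketch below. This is a hedged sketch: the field positions assume the old wireless trace format (event, time, node, layer, flags, packet id, type, size), and the file names are placeholders, so both must be adapted to the actual scenario and trace format.

```python
def analyse_trace(path, packet_type="cbr"):
    """Compute average E2E delay (s), PDR (%) and throughput (kbps) from a trace."""
    sent, recv = {}, {}                  # packet id -> timestamp
    recv_bytes = 0.0
    with open(path) as trace:
        for line in trace:
            f = line.split()
            if len(f) < 8 or f[3] != "AGT" or f[6] != packet_type:
                continue                 # keep only application-layer data packets
            event, time, pkt_id, size = f[0], float(f[1]), f[5], int(f[7])
            if event == "s":
                sent.setdefault(pkt_id, time)
            elif event == "r":
                recv[pkt_id] = time
                recv_bytes += size
    delays = [recv[p] - sent[p] for p in recv if p in sent]
    pdr = 100.0 * len(recv) / max(len(sent), 1)
    duration = max(recv.values()) - min(sent.values()) if recv else 0.0
    throughput_kbps = (recv_bytes * 8 / 1000.0) / duration if duration else 0.0
    avg_delay = sum(delays) / len(delays) if delays else 0.0
    return avg_delay, pdr, throughput_kbps

# Hypothetical trace file names for the two runs compared in Tables 1-3:
# print(analyse_trace("wormhole_attack.tr"))
# print(analyse_trace("no_attack.tr"))
```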
Fig. 4 NS2 simulation set-up for 25 communication nodes. Source Captured from NS2
Fig. 5 NS2 simulation set-up for 25 communication nodes (attack initialized). Source Captured from NS2
Fig. 6 NS2 simulation set-up for 25 communication nodes (attack recognized). Source Captured from NS2

Table 1 Comparative analysis of E2E delay during the wormhole attack and before the wormhole attack

E2E           | Node 25
During attack | 12.8303
Before attack | 9.9188

Source Noted from trace file generated from NS2
Table 2 Comparative analysis of packet delivery ratio (PDR) during the wormhole attack and before the wormhole attack

PDR           | Node 25
During attack | 99.84
Before attack | 99.91

Source Noted from trace file generated from NS2
Table 3 Comparative analysis of throughput during the wormhole attack and before the wormhole attack

Throughput    | Node 25
During attack | 51.3
Before attack | 92.33

Source Noted from trace file generated from NS2
5 Conclusion Detecting a wormhole is very hard in an ad hoc network. A malicious node attracts traffic from one part of the network and tunnels it to another malicious node located in a different area. While ad hoc communication is running, it is possible to break the security and affect the data packets. We recommend detecting the wormhole attack in the wireless sensor network; wormhole-type attacks must be recognized so that the network services provided through the ad hoc network can be sustained for a long time. When compared to wired networks, a wireless sensor network contains nodes that are widely spread and operate in an unguided and unsupervised environment, which makes an attack by an opponent much more likely than in the former. As a consequence, it is critical to ensure that these sensor nodes are kept safe from harm. Because the transmission range of a single sensor node is limited and it is not possible to send packets over a long distance, cooperation between sensor nodes is required in order for packets to be delivered and communicated. The process by which a node identifies its neighbours is referred to as neighbour discovery. Once communication has been established between the nodes, a link is created between them in order to send the packet across a one-hop distance, and the operation is repeated until the packets arrive at their destination. If malicious nodes are able to attach themselves to a legitimate node, they may be able to communicate with one another across low-latency links. This article primarily focuses on malevolent attacks such as wormholes, which were tested on a network of 25 nodes in the NS2 simulation. The paper's conclusions include an examination of characteristics such as E2E delay, PDR, and throughput of the communication network. The paper found that during the wormhole attack the performance of the network is disturbed in every aspect of communication, so keen attention is required to avoid malicious nodes and activities when using an ad hoc network.
6 Future Scope This work can be extended by deploying the same set-up and testing it on various virtual networks and IoT networks. Human activity is increasingly dependent on networks, so security and malicious activities are a growing concern, and the exploration of other types of attacks remains a wide open research area.
References 1. Giri D, Borah S, Pradhan R (2018) Approaches and measures to detect wormhole attack in wireless sensor networks: a survey. In: Advances in communication, devices and networking. Springer, Singapore, pp 855–864 2. Tiruvakadu DSK, Pallapa V (2018) Confirmation of wormhole attack in MANETs using honeypot. Comput Secur 76:32–49 3. Patel MA, Patel MM (2018) Wormhole attack detection in wireless sensor network. In: 2018 international conference on inventive research in computing applications (ICIRCA). IEEE, pp 269–274 4. Patel M, Aggarwal A, Chaubey N (2018) Analysis of wormhole attacks in wireless sensor networks. In: Recent findings in intelligent computing techniques. Springer, Singapore, pp 33–42 5. Verma R, Sharma R, Singh U (2017) New approach through detection and prevention of wormhole attack in MANET. In: 2017 international conference of electronics, communication and aerospace technology (ICECA), vol 2. IEEE, pp 526–531 6. Rana R, Shekhar J (2017) Consequences and measures of wormhole attack in MANET. In: International conference on recent trends in engineering & technology (ICRTET2012) 7. Devi BR, Kaylyan Chafravarthy NS, Faruk MN (2017) Analysis of Manet routing protocol in presence of worm-hole attack using Anova tool. Int J Pure Appl Math 117(15):1043–1054 8. Kumar G, Rai MK, Saha R (2017) Securing range free localization against wormhole attack using distance estimation and maximum likelihood estimation in wireless sensor networks. J Netw Comput Appl 99:10–16 9. Karlsson J (2017) A unified wormhole attack detection framework for mobile ad hoc networks. Doctoral dissertation, The Open University 10. Jamali S, Fotohi R (2017) DAWA: defending against wormhole attack in MANETs by using fuzzy logic and artificial immune system. J Supercomput 73(12):5173–5196 11. Patel M, Aggarwal A, Chaubey N (2020) Experimental analysis of measuring neighbourhood change in the presence of wormhole in mobile wireless sensor. In: Data science and intelligent applications: proceedings of ICDSIA 2020, vol 52, p 339 12. Pawar MV, Anuradha J (2021) Detection and prevention of black-hole and wormhole attacks in wireless sensor network using optimized LSTM. Int J Pervasive Comput Commun 13. Singh S, Saini HS (2021) Intelligent ad-hoc-on demand multipath distance vector for wormhole attack in clustered WSN. Wirel Personal Commun, 1–23 14. Shukla M, Joshi BK (2021) A trust based approach to mitigate wormhole attacks in mobile adhoc networks. In: 2021 10th IEEE international conference on communication systems and network technologies (CSNT), pp 776–782. IEEE 15. Patel M, Aggarwal A, Chaubey N (2021) Experimental analysis of measuring neighbourhood change in the presence of wormhole in mobile wireless sensor networks. In: Data science and intelligent applications. Springer, Singapore, pp 339–344 16. Abdan M, Seno SAH (2021) Machine learning methods for intrusive detection of wormhole attack in mobile ad-hoc network (MANET)
17. Singh MM, Dutta N, Singh TR, Nandi U (2021) A technique to detect wormhole attack in wireless sensor network using artificial neural network. In: Evolutionary computing and mobile sustainable networks. Springer, Singapore, pp 297–307 18. Sankara Narayanan S, Murugaboopathi G (2020) Modified secure AODV protocol to prevent wormhole attack in MANET. Concurr Comput Pract Exp 32(4):e5017 19. Ali S, Nand P, Tiwari S (2020) Impact of wormhole attack on AODV routing protocol in vehicular ad-hoc network over real map with detection and prevention approach. Int J Veh Inf Commun Syst 5(3):354–373 20. Spurthy K, Shankar TN (2020) An efficient cluster-based approach to thwart wormhole attack in adhoc networks. Int J Adv Comput Sci Appl 11(9):312–316 21. Sharma N, Sharma M, Sharma DP (2020) A review of proposed solutions for wormhole attack in MANET 22. Majumder S, Bhattacharyya D (2020) Relation estimation of packets dropped by wormhole attack to packets sent using regression analysis. In: Emerging technology in modelling and graphics. Springer, Singapore, pp 557–566 23. Dutta N, Singh MM (2019) Wormhole attack in wireless sensor networks: a critical review. Adv Comput Commun Technol, 147–161 24. Prasad M, Tripathi S, Dahal K (2019) Wormhole attack detection in ad hoc network using machine learning technique. In: 2019 10th international conference on computing, communication and networking technologies (ICCCNT). IEEE, pp 1–7 25. Vo TT, Luong NT, Hoang D (2019) MLAMAN: a novel multi-level authentication model and protocol for preventing wormhole attack in mobile ad hoc network. Wireless Netw 25(7):4115– 4132 26. Patel M, Aggarwal A, Chaubey N (2019) Analysis of wormhole detection features in wireless sensor networks. In: International conference on internet of things and connected technologies. Springer, Cham, pp 22–29 27. Luo X, Chen Y, Li M, Luo Q, Xue K, Liu S, Chen L (2019) CREDND: a novel secure neighbor discovery algorithm for wormhole attack. IEEE Access 7:18194–18205 28. Kumar Dwivedi R, Sharma P, Kumar R (2018) A scheme for detection of high transmission power based wormhole attack in WSN. In: 2018 5th IEEE Uttar Pradesh section international conference on electrical, electronics and computer engineering (UPCON). IEEE, pp 1–6 29. Perazzo P, Vallati C, Varano D, Anastasi G, Dini G (2018) Implementation of a wormhole attack against a RPL network: Challenges and effects. In: 2018 14th annual conference on wireless on-demand network systems and services (WONS). IEEE, pp 95–102 30. ASROP: AD HOC Secure Routing Protocol—Scientific Figure on ResearchGate. Available from: https://www.researchgate.net/figure/Example-of-Wormhole-attack_fig1_3354 63265. Accessed 18 March 2022 31. A Security Scheme against Wormhole Attack in MAC Layer for Delay Sensitive Wireless Sensor Networks—Scientific Figure on ResearchGate. Available from: https://www.researchg ate.net/figure/CL-MAC-Wormhole-attack-A-Wormhole-joining-two-disjoined-paths-B-Wor mhole-in-the_fig4_277682524. Accessed 18 March 2022
Software Fault Prediction Using Particle Swarm Optimization and Random Forest Samudrala Santhosh, Kiran Khatter , and Devanjali Relan
Abstract Software fault prediction deals with the identification of software faults in the early phases of the software development cycle. It is essential to detect the defects in software as early as possible for the smooth functioning of the software and to reduce the resources and time taken for its maintenance in the future. It can be done either manually, or by using automatic predictors. As the complexity of the software increases, it becomes hard to identify the faults manually. So, to deal with the faults in a timely and more accurate manner, there are many automatic predictors already in use, and various new ones are also being proposed by several researchers. In this study, a method to improve the fault prediction rate in software fault prediction is proposed by combining particle swarm optimization (PSO) with the random forest (RF) classifier. NASA MDP Datasets, which are considered large-scale datasets, are utilized to test the proposed model. The findings show that PSO with RF classifier increases performance when applied on the NASA MDP Datasets and overcomes prior research constraints. Keywords Software fault prediction (SFP) · Particle swarm optimization (PSO) · Class imbalancing · Random forest
1 Introduction Software fault prediction is an essential part of software testing in the software development life cycle (SDLC). Software systems are an integral part of our life and are used in various places such as manufacturing, security, finance, and business
or in more sophisticated places such as aircraft, space shuttles, and a lot more. When an organization is producing software for a client according to their needs and requirements, they have an obligation to provide software of the highest quality. The software has to be reliable in order for an organization to put it into use in a system. To make it reliable, companies spend a lot of time, money, and other resources [1] in checking for the possibility of software failures and correcting them manually. Software failures occur mainly because of faulty modules in the software. A fault is an incorrect step, code, process, data definition, or physical defect that occurs in hardware or software [2]. Software fault prediction is a machine learning classification task aimed at distinguishing fault-prone modules from non-fault-prone modules based on static code characteristics [3]. Manual defect prediction is not a viable solution for organizations to detect faults in their software when there are large modules present: it is tedious, time-consuming, and expensive. Due to this, most organizations utilize automated predictors to focus on faulty modules, allowing the software developer to analyze the defective portions of the software in more detail more quickly [3]. There are various types of automated predictors that help in detecting faults more accurately in software using machine learning algorithms such as random forest (RF), Naïve Bayes (NB), support vector machine (SVM), neural networks (NN), etc. Using parameter optimization methods, the parameters of the estimation techniques used in building prediction models are optimized in order to find the set of parameters that makes the model more accurate in predicting the faults. Some of the methods used for parameter optimization of prediction models are gradient descent, which is a general optimization technique used in machine learning models, and PSO [4] and the genetic algorithm (GA), which are nature-inspired algorithms that have proved to be better optimizers when combined with machine learning algorithms [5]. PSO is a population-based evolutionary algorithm developed by James Kennedy and Russell Eberhart in 1995; it is based on natural phenomena such as flocks of birds or schools of fish. Particle swarm algorithms have been used to tackle a variety of complicated problems, including software testing, work sequencing issues, and the traveling salesman problem [2, 6]. In this paper, a model is proposed to optimize the parameters of the random forest classifier, namely n_estimators, max_features, max_depth, min_samples_split, min_samples_leaf, and criterion, using PSO to achieve the best fault prediction accuracy. Then, the results produced in other researchers' work [2, 3, 7, 8] are compared with the proposed model. In [2], PSO is combined with SVM, and in [3] the authors used ensemble predictors. Banga and Bansal [7] and Singh and Chug [8] use classifiers with baseline parameters (estimation techniques that are trained with their default values rather than using an algorithm for parameter optimization). We have used the National Aeronautics and Space Administration (NASA) Metrics Data Program (MDP) Software Defects datasets [9] to evaluate the performance of the proposed model. The performance metrics taken into account are the accuracy score and receiver operating characteristics (ROC).
The remaining sections of the paper are organized as follows: Sect. 2 presents the related works in the field of software fault prediction, nature-inspired algorithms, and how combining these nature-inspired algorithms helps in predicting the faults more accurately. Sections 3, 4, 5 and 6 introduce the methodology proposed for this study, a detailed explanation of the concepts used in building the model, the datasets used, the combined approach of the concepts, and the process of effectively detecting the faults. In Sect. 7, the results of the proposed model are compared with the results of existing models in terms of accuracy and AUC scores. Section 8 presents the final concluding remarks.
2 Related Work Jaberipour et al. [10] presented an approach for solving a system of nonlinear equations using the PSO algorithm by converting the system of nonlinear equations into a minimization problem and solving it. Some standard problems were also presented to demonstrate the efficiency of their newly proposed PSO algorithm. Rodriguez et al. [11] examined several machine learning techniques and performance metrics for dealing with unbalanced data in software defect prediction, taking into consideration the quality of the data, which comprises many duplicate occurrences. They took two versions of the NASA MDP Datasets, one with imbalanced datasets and the other where all the imbalances were removed using pre-processing, to find out on which of these the algorithms would perform better. They concluded that even though the cleaned datasets performed better, it is not certain that duplicate data must be removed in pre-processing if the data is collected properly, which can only be verified when the source code is available. Chinnathambi and Lakshmi [12] proposed a methodology to overcome the issue of class imbalance in software defect prediction datasets; class imbalance is a threat to the learning model which results in overfitting and a high false alarm rate. They overcame this issue by using an evolutionary approach to generate diversity-based synthetic samples, which ensures that the synthesized samples reside in the cluster boundary and eliminates the outliers in the distribution measure. This method outperforms other models in terms of a better recall value and a lower false alarm rate. Alsghaier and Akour [2] combined PSO and GA with SVM and compared the results of the hybrid model with the SVM classifier. They concluded that the integration between SVM and optimization algorithms improves the prediction performance. The author in [13] conducted a systematic literature review in order to analyze and assess the performance of machine learning techniques in software fault prediction. He systematically collected a large number of papers, did a detailed assessment of the machine learning algorithms, and compared them with other machine learning models, including Naïve Bayes, multilayer perceptron, support vector machines, random forest, etc. He found that the
average values of AUC range from 0.7 to 0.83, the average value of accuracy ranges from 75 to 85%, and random forest outperformed all other ML models in all studies. In [8], the main research aim was to find out which common metrics are mostly used in software fault prediction studies. The paper also explores what different types of datasets are used by different authors, which techniques and algorithms are used most for software fault prediction, and which algorithm performs better. The authors found that static code metrics, hybrid metrics, procedure metrics, NASA datasets, decision trees, artificial neural networks, and support vector machines were the most used. They concluded that the linear class algorithm, neural networks, and decision trees were better performers than any other algorithm presented in their work. It is evident from the literature survey that decision trees and random forest performed better than all other classifiers, and when one of these classifiers is applied along with parameter optimization, the classification performance is enhanced. This was the motivation behind this study to explore the role of a classifier and the particle swarm optimization algorithm in predicting faults in software.
3 Particle Swarm Optimization (PSO) PSO is a stochastic population-based meta-heuristic algorithm inspired by a flock of birds or a school of fish in nature. Each particle represents a possible solution of the function that is to be optimized. Each particle moves around an N-dimensional problem space with a position $x_i^k = (x_{i1}^k, x_{i2}^k, \ldots, x_{iN}^k)$ and a velocity $v_i^k = (v_{i1}^k, v_{i2}^k, \ldots, v_{iN}^k)$. The particles are initialized randomly and move in the N-dimensional problem space iteratively with the help of the personal best ($pbest$) of each particle and the global best ($gbest$) of all particles combined, both updated after every iteration. After the completion of each iteration, all the particles are moved toward the optimal solution. The velocity and position of each particle are updated with the help of the following two equations:

$$v_i^{k+1} = w \times v_i^k + c_1 \times r_1 \times (pbest_i^k - x_i^k) + c_2 \times r_2 \times (gbest^k - x_i^k)$$

$$x_i^{k+1} = x_i^k + v_i^{k+1}$$

The inertia weight $w$ controls the velocity vector; $r_1$ and $r_2$ are random numbers uniformly distributed between 0 and 1; $v_i^k$ is the velocity of the $i$-th particle at the $k$-th iteration; $x_i^k$ is the current solution (or location) of the $i$-th particle at the $k$-th iteration; and $c_1$, $c_2$ are positive constants also known as constriction factors. $v_{max}$ is a velocity limit that prevents particles from rapidly traveling from one section of the search space to another, perhaps missing a promising solution; it is set up as a function of the problem's range. There are three components in the velocity update: the inertia component $w \times v_i^k$, the cognitive component $c_1 \times r_1 \times (pbest_i^k - x_i^k)$, and the social component $c_2 \times r_2 \times (gbest^k - x_i^k)$. The inertia weight $w$ is used to balance the exploration and exploitation of the particle, and it typically varies from 0.9 down to 0.4 or 0.2; it determines the contribution of the previous velocity of each particle to the current iteration. Since PSO often converges to the global optimum but can suffer from convergence problems, constriction factors are used to control the convergence of PSO; the constriction factors $c_1$ and $c_2$ help in controlling the cognitive and social components of a particle's velocity. As per the literature survey conducted, inertia is preferred over constriction factors to balance the exploration and exploitation of the particles [4, 10].
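A minimal sketch of these update rules is shown below; the inertia weight and acceleration constants are illustrative values, not the settings used in this paper.

```python
import random

def pso_minimise(objective, bounds, n_particles=30, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5):
    """Minimal PSO minimiser implementing the velocity/position updates above."""
    dim = len(bounds)
    X = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_val = [objective(x) for x in X]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # velocity update: inertia + cognitive + social components
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (pbest[i][d] - X[i][d])
                           + c2 * r2 * (gbest[d] - X[i][d]))
                # position update, clamped to the search bounds
                X[i][d] = min(max(X[i][d] + V[i][d], bounds[d][0]), bounds[d][1])
            val = objective(X[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = X[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = X[i][:], val
    return gbest, gbest_val

# Sanity check on a simple convex function (minimum at the origin):
best, value = pso_minimise(lambda x: sum(v * v for v in x), [(-5, 5)] * 3)
```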
4 Random Forest (RF) Random forest is an ensemble method that consists of multiple decision trees. The individual decision trees each give out a prediction and depending on the problem, the majority or median/mean of the predictions made by the decision trees is considered and given as the result of the prediction. A random forest can solve both regression as well as classification problems, and the simplicity of the algorithm makes it easy to use and most effective. In the classification problem, the majority of the predictions made by decision trees are considered and in the regression problem, the median/mean of the continuous value predictions made by the decision tree is taken into account for giving the final output. It also has the ability to handle large datasets with many different feature types and high dimensionality. Random forest parameters and their accepted search space are shown in Table 1.
5 Datasets The National Aeronautics and Space Administration (NASA) Metrics Data Program (MDP) [14] Software Defects datasets (refer to Table 2) are used to build our software fault prediction model. Each dataset of a NASA software system or subsystem contains static code metrics for each comprising module; a module can be a method, a function, or a procedure [2]. Each row in a dataset represents a module, and each column except the last one represents an attribute of the software. The last column specifies whether that module is faulty or not. Hence, it is a classification task, where the modules are classified into non-faulty modules or faulty modules.
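As an illustration of this layout, the following sketch loads one dataset and separates the metrics from the label; the file name CM1.csv is a placeholder, since the NASA MDP data is distributed in several formats.

```python
import pandas as pd

# Hypothetical file name; adapt to the format in which the dataset is obtained.
df = pd.read_csv("CM1.csv")

X = df.iloc[:, :-1]        # static code metrics of each module
y = df.iloc[:, -1]         # last column: faulty / non-faulty label

print(X.shape)             # modules x metrics
print(y.value_counts())    # class distribution (cf. Table 2)
```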
Table 1 Random forest parameters, their explanation and search space

Parameters        | Explanation                                                                  | Parameters search space
n_estimators      | Number of decision trees in the RF                                           | [10, 100]
max_features      | Maximum number of features considered when splitting a node in each tree     | [1, n_features*] (*n_features depends on the number of features in each individual dataset)
max_depth         | Maximum depth allowed for each decision tree                                 | [5, 50]
min_samples_split | Minimum number of samples required to split an internal node                 | [2, 11]
min_samples_leaf  | Minimum number of samples required at a leaf node                            | [1, 11]
criterion         | Function that determines the quality of a split                              | [0, 1]
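The parameter names in Table 1 match those of scikit-learn's RandomForestClassifier, so a candidate solution can be decoded as in the sketch below; the use of scikit-learn and the gini/entropy encoding of criterion are assumptions, since the paper does not name its implementation.

```python
from sklearn.ensemble import RandomForestClassifier

def build_forest(params, n_features):
    """Decode one candidate vector (one PSO particle) into a random forest.

    Continuous values are rounded to integers and criterion in [0, 1] is mapped
    to gini/entropy; this encoding is an assumption consistent with Table 1.
    """
    return RandomForestClassifier(
        n_estimators=int(round(params[0])),                            # [10, 100]
        max_features=max(1, min(int(round(params[1])), n_features)),   # [1, n_features]
        max_depth=int(round(params[2])),                               # [5, 50]
        min_samples_split=int(round(params[3])),                       # [2, 11]
        min_samples_leaf=int(round(params[4])),                        # [1, 11]
        criterion="gini" if params[5] < 0.5 else "entropy",            # [0, 1]
        random_state=0,
    )
```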
Table 2 Description of the NASA MDP datasets used

Dataset | Number of attributes | Number of instances | Non-faulty instances | Faulty instances
CM1     | 22                   | 498                 | 449                  | 49
KC1     | 22                   | 1183                | 869                  | 314
KC2     | 22                   | 522                 | 415                  | 107
MC1     | 40                   | 9466                | 9398                 | 68
MC2     | 41                   | 161                 | 109                  | 52
PC1     | 22                   | 549                 | 479                  | 70
PC2     | 38                   | 5589                | 5566                 | 23
PC3     | 39                   | 1563                | 1403                 | 160
PC4     | 39                   | 1458                | 1280                 | 178
6 PSO with Random Forest Model (PSO–RF) As observed from Table 1, there are six different parameters, n_estimators, max_features, max_depth, min_samples_split, min_samples_leaf, and criterion, that can affect the accuracy of the random forest. It is necessary that the correct combination of parameters is identified so that the accuracy of the model can be increased. In PSO, each parameter is considered a dimension along which particles are randomly deployed in the search space that has been created. So, to optimize the six parameters of the random forest, all the particles are initialized with random values (within the range of their search spaces) in a six-dimensional space. Then, with the help of the PSO equations given in Sect. 3, the pbest of each particle and the gbest of all the particles are updated after every iteration, and after a set of iterations the particles converge to a minimum of the objective function that is chosen; the coordinates of that minimum are considered to be the best set of parameters for predicting software faults more accurately. From Table 2, it can be observed that the number of faulty instances is far smaller than the number of non-faulty instances, which implies that most of the NASA MDP datasets have the problem of class imbalance. The technique chosen in this study to tackle the class imbalance is oversampling, where the minority class instances are duplicated to adjust the class distribution of the dataset until it is balanced with the majority class. The data obtained after adjusting the class distribution is pre-processed and made ready for use as training, validation, and test datasets in the model. A threefold cross-validation technique is used to divide the dataset into training data and validation data, which helps in building the model effectively. Further, the objective function is defined, and the random forest classifier is used with PSO for the classification of faulty and non-faulty software modules.
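A condensed sketch of this pipeline is given below. It reuses build_forest and pso_minimise from the earlier sketches; the simple random-oversampling routine and the use of negative threefold cross-validated accuracy as the particle fitness are assumptions consistent with the description above, not the authors' exact code.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.utils import resample

# Search space from Table 1; the max_features bound depends on the dataset.
SEARCH_SPACE = [(10, 100), (1, 21), (5, 50), (2, 11), (1, 11), (0, 1)]

def oversample_minority(X, y):
    """Duplicate minority-class rows until both classes have equal counts."""
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    minority, majority = classes[np.argmin(counts)], counts.max()
    X_min, y_min = X[y == minority], y[y == minority]
    X_up, y_up = resample(X_min, y_min, replace=True,
                          n_samples=majority - len(y_min), random_state=0)
    return np.vstack([X, X_up]), np.concatenate([y, y_up])

def fitness(params, X, y):
    """Fitness of one particle: negative 3-fold CV accuracy, so that the
    PSO minimiser from the earlier sketch can be reused unchanged."""
    clf = build_forest(params, n_features=X.shape[1])
    return -cross_val_score(clf, X, y, cv=3, scoring="accuracy").mean()

# X_bal, y_bal = oversample_minority(X, y)
# best_params, best_val = pso_minimise(lambda p: fitness(p, X_bal, y_bal),
#                                      SEARCH_SPACE, n_particles=20, n_iter=30)
```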
7 Results and Discussions After the optimization of the parameters using PSO, the random forest classifier is used as the inductive algorithm; the performance measures taken into account are the accuracy scores and AUC values (refer to Table 3). The accuracy scores of the proposed model on different datasets are compared with the accuracy scores of the models used in [2, 7, 8], and the comparison results are illustrated in Table 4. In [3], Yucalar et al. use ensemble predictors such as random forest and multilayer perceptron to detect faults in software, with the area under the curve score as the performance metric. The proposed model performed better than all the ensemble predictors used in [3]; the performance of PSO–RF is compared with these ensemble predictors in Table 5.

Table 3 Accuracy and AUC scores of the proposed model

Datasets | PSO–RF Accuracy | AUC
CM1      | 96.02           | 0.96
KC1      | 90.01           | 0.91
KC2      | 87.89           | 0.88
MC1      | 99.85           | 0.99
MC2      | 85.51           | 0.88
PC1      | 96.10           | 0.97
PC2      | 99.94           | 0.99
PC3      | 96.48           | 0.97
PC4      | 96.27           | 0.96
Table 4 Comparison of proposed model with other models [2, 7, 8]

Author name, year of publication [reference] | Datasets checked                            | Average accuracy of existing work | Proposed model's average accuracy (PSO + RF)
Alsghaier and Akour (2020) [2]               | CM1, KC1, KC2, MC1, MC2, PC1, PC2, PC3, PC4 | PSOGA–SVM = 88.27                 | 94.23
Banga and Bansal (2020) [7]                  | CM1, KC1, MC2, PC1, PC2, PC3, PC4           | KNN = 71.74, SVM = 64.07          | 94.33
Singh and Chug (2017) [8]                    | CM1, KC1, KC2, PC1                          | NB = 83.84, NN = 86.36            | 92.50
Table 5 Comparison of proposed model with ensemble predictors used in [3]

Datasets | AUC values of proposed model | AUC values of [3] (max of all ensemble predictors)
CM1      | 0.96                         | 0.77
KC1      | 0.91                         | 0.87
MC2      | 0.88                         | 0.76
PC1      | 0.97                         | 0.86
PC3      | 0.97                         | 0.83
It can be observed from Tables 4 and 5 that the proposed PSO–RF model outperforms all the other classifiers with baseline parameters and also with the model where PSO is combined with SVM.
8 Conclusion and Future Works Software fault prediction is an essential component for any company because, with automatic predictors instead of manual prediction, it greatly reduces the time, resources, manpower, and money required to detect faults in software. Automatic predictors make the software more reliable: producing software takes time and maintaining it manually over a long period is a tedious process, but with the help of these predictors it is easy to identify faults quickly and to rectify them in the earlier stages of development. In this paper, the random forest classifier parameters are optimized using PSO. PSO is used as the search algorithm, and random forest is used as the inductive algorithm, with accuracy and area under the curve scores as the evaluation metrics. It is observed that the proposed model outperforms all the other classifiers with baseline parameters and also the model where PSO is combined with the support vector machine. The accuracy scores on the datasets used range from 88 to 99%, and the AUC scores
vary from 0.88 to 0.99, so it is evident from these ranges that the proposed model has high classification and prediction accuracy compared with other models in [2, 3, 7, 8]. Future works will focus on using other optimization algorithms with other classifiers for predicting software faults.
References 1. Turhan B, Kocak G, Bener A (2009) Data mining source code for locating software bugs: a case study in telecommunication industry. Expert Syst Appl 36(6):9986–9990 2. Alsghaier H, Akour M (2020) Software fault prediction using particle swarm algorithm with genetic algorithm and support vector machine classifier. Softw Pract Exp 50(4):407–427 3. Yucalar F, Ozcift A, Borandag E, Kilinc D (2020) Multiple-classifiers in software quality engineering: Combining predictors to improve software fault prediction ability. Eng Sci Technol Int J 23(4):938–950 4. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN’95international conference on neural networks, vol 4. IEEE, pp 1942–1948 5. Ali A, Gravino C (2020) Bio-inspired algorithms in software fault prediction: a systematic literature review. In: 2020 14th International Conference on Open Source Systems and Technologies (ICOSST). IEEE, pp 1–8 6. Kumar S, Ranjan P (2017) A comprehensive analysis for software fault detection and prediction using computational intelligence techniques. Int J Comput Intell Res 13(1):65–78 7. Banga M, Bansal A (2020) Proposed software faults detection using hybrid approach. Secur Privacy, e103 8. Singh PD, Chug A (2017) Software defect prediction analysis using machine learning algorithms. In: 2017 7th international conference on cloud computing, data science & engineering-confluence. IEEE, pp 775–781 9. NASA MDP Datasets. https://www.kaggle.com/aczy156/software-defect-prediction-nasa 10. Jaberipour M, Khorram E, Karimi B (2011) Particle swarm algorithm for solving systems of nonlinear equations. Comput Math Appl 62(2):566–576 11. Rodriguez D, Herraiz I, Harrison R, Dolado J, Riquelme JC (2014) Preliminary comparison of techniques for dealing with imbalance in software defect prediction. In: Proceedings of the 18th international conference on evaluation and assessment in software engineering, pp 1–10 12. Chinnathambi A, Lakshmi C (2021) Genetic algorithm based oversampling approach to prune the class imbalance issue in software defect prediction 13. Malhotra R (2015) A systematic review of machine learning techniques for software fault prediction. Appl Soft Comput 27:504–518 14. Nassif AB, Azzeh M, Idri A, Abran A (2019) Software development effort estimation using regression fuzzy models. Comput Intell Neurosci 2019
Cardinality Constrained Portfolio Selection Strategy Based on Hybrid Metaheuristic Optimization Algorithm Faisal Ahmad, Faraz Hasan, Mohammad Shahid, Jahangir Chauhan, and Mohammad Imran
Abstract Investment is one of the necessary economic activities, and stock market investment attracts investors due to its profitability, transparency, and liquidity. But the market offers a variety of securities with diverse returns and risk factors. The experts in the field always tried to analyze the stock market securities to select the best combination of securities. Over time, several innovative techniques have been developed to optimize investment activity by minimizing the risk and increasing the return. This paper proposes a hybrid portfolio selection strategy using ant lion optimization (ALO) and cuckoo search (CS) to minimize portfolio risk and maximize mean return. An experimental study evaluates the proposed strategy by comparing it with state-of-the-art methods on the standard benchmark dataset of the German stock exchange (100 stocks). The study exhibits the proposed strategy’s better performance among GA, SA, and TS-based solution approaches. Keywords Portfolio optimization · Mean–variance model · Cardinality constraints · Ant lion optimizer · Cuckoo search
1 Introduction In the past, many innovations have taken place in the field of investment, but the period of the early fifties has been a landmark in the history of investment. Harry Markowitz published a paper with the title 'Portfolio Selection' in 1952. In this paper, Markowitz explained the portfolio optimization technique scientifically by emphasizing the basic principles of diversification, and portfolio diversification was explained by using various statistical tools. After his work was published, the proposed model became popular worldwide within a short time. The popularity of this model drew the attention of people associated with the field to inquire into and investigate the model and to develop other portfolio optimization models. With the increasing complexities due to various portfolio constraints, the construction of a portfolio gradually became very tough. Today, it has become a challenging task to manage portfolio constraints like boundary constraints, cardinality constraints, quantity constraints, etc., to reap the benefits of an optimum portfolio, i.e., obtaining maximum return with minimum possible risk [1, 2]. The field of investment has also attracted experts from IT, engineering, and mathematics, who have started applying their techniques to portfolio optimization. In today's world, the conventional methods and techniques of portfolio optimization do not give complete solutions to portfolio selection problems. Investment has gradually become a profession due to the requirement of specialized knowledge for analysis. Due to the financial attraction of this area, professionals in other fields like IT, engineering, and mathematics have also started working to develop metaheuristic-based portfolio optimization models. These methods have been applied in risk budgeting and portfolio optimization by considering various constraints [2, 3]. This paper proposes a hybrid portfolio selection strategy using ant lion optimization (ALO) [4] and cuckoo search (CS) [5] to minimize portfolio risk and maximize mean return. Both optimization algorithms are applied randomly to the same population and try to improve the objective in successive generations. The experimental study evaluates the proposed strategy by comparing its performance with state-of-the-art methods (GA, SA, and TS) [6] from the literature on the standard benchmark dataset of the German stock exchange (100 stocks). The study reveals the superior results of the proposed hybrid strategy. The rest of the paper is organized as follows: Sect. 2 deals with the review of literature related to the pertinent subject matter to identify the research gap. The formulation of the portfolio selection problem is presented in Sect. 3. Section 4 is exclusively meant for giving the solution to the formulated problem; the ALO and cuckoo search models are also elaborated in this section. The comparative analysis and findings of the proposed model are placed in Sect. 5. Finally, the conclusion is discussed in Sect. 6, which is the last section of this paper.
2 Related Work The development of computational resources has given rise to many new portfolio optimization techniques. These computational intelligence-based techniques have given innovative ways and methods of portfolio optimization. Due to the complexities of the large data used in portfolio optimization and the various constraints, the previously developed techniques have not achieved overall optimum results in all cases. This has opened the way for newer techniques such as evolutionary algorithms [7–10], swarm intelligence algorithms [11–18], and nature-inspired algorithms [19–22] to solve the portfolio selection problem. One of the core candidates among evolutionary methods is the genetic algorithm. Chang et al. proposed genetic algorithm (GA), simulated annealing (SA), and Tabu search (TS) approaches to optimize the portfolio selection objectives with cardinality constraints [6, 7]. Stochastic fractal search-based approaches are presented, including an unconstrained selection model [8] and a risk budgeting constraint [9, 10], maximizing the Sharpe ratio. Due to the results shown by swarm intelligence algorithms, they have been well recognized and widely accepted in terms of popularity and application. Later on, this class of techniques was modified in various forms, popularly known as particle swarm optimization [11–13], ant colony optimization [14], artificial bee colony optimization [15, 16], the firefly algorithm [17], and gray wolf optimization [18]. Based on applying these modified optimization techniques, the results were obtained and compared with the results of previously developed models by different researchers. The metaheuristic particle swarm optimization technique was applied to portfolio optimization; PSO obtains the solution by gradually moving the particles toward the best particle found so far. The obtained results of PSO have been found to be more efficient in terms of speedy computation and memory [11, 12]. A two-stage hybrid PSO is proposed in [13] for portfolio selection. Deng et al. presented an ant colony algorithm for portfolio optimization [14]. Artificial bee colony-based solution approaches are also presented for the constrained portfolio [15] and with infeasibility toleration procedures [16]. Yang developed a swarm-based algorithm inspired by the firefly's behavior, and the firefly algorithm has been applied to portfolio selection, reporting significantly better results [17]. A gray wolf optimizer is proposed to solve portfolio selection by optimizing the Sharpe ratio [18]. Invasive weed optimization (IWO), inspired by how weeds grow and adapt to their environment, has been used with a mean–variance model in solving multi-objective portfolio optimization [19]. The invasive weed optimization technique has also been applied in risk budgeting portfolio optimization, improving the Sharpe ratio [20]. Unconstrained portfolio selection strategies have been proposed using a golden eagle optimizer [21] and a gradient-based optimizer [23]. Dhaini and Mansour have reported portfolio selection using squirrel search with a cardinality constraint [22].
3 The Problem Statement The portfolio selection problem and its formulation have been discussed in this section. A portfolio (P) combining securities is designed, P = (S_1, S_2, S_3, …, S_N). Their respective weights in the portfolio are (W_1, W_2, W_3, …, W_N), and the expected returns of the portfolio's securities are (R_1, R_2, R_3, …, R_N). The risk and expected return of the portfolio can be calculated by using the following formulas:

$$\text{Risk}_P = \sum_{i=1}^{N} \sum_{j=1}^{N} W_i \times W_j \times \text{CoV}_{ij} \quad (1)$$

$$\text{Return}_P = \sum_{i=1}^{N} W_i \times R_i \quad (2)$$
In these formulas, $W_i$ represents the respective weight of the selected security $S_i$ in the constructed portfolio P, and the interactive risk between $S_i$ and $S_j$ in the form of covariance is denoted by $\text{CoV}_{ij}$. The objective of the portfolio is to achieve the best objective value, i.e., maximizing return while minimizing risk. The risk premium is zero in the case of equity-based securities. The weighted sum of $\text{Risk}_P$ and $\text{Return}_P$ is estimated for the studied problem as

$$\min(Z) = \lambda \times \text{Risk}_P - (1 - \lambda) \times \text{Return}_P \quad (3)$$

subject to the constraints

(i) $\sum_{i=1}^{N} w_i = 1$
(ii) $w_i \geq 0$
(iii) $0 \leq w_i \leq 1$
(iv) $\sum_{i=1}^{N} z_i = K$

where (i) represents the fully invested constraint and (ii) represents the constraint that restricts short selling. (iii) is the boundary constraint for the weights. (iv) is the cardinality constraint that restricts the number of selected securities to K out of N in the portfolio. In the above problem, the constraints are linear with a feasible convex region. A repair method is used to handle these constraints. Whenever the lower or upper bound is violated, the respective weight is replaced by the lower or upper bound value. The normalization approach is used to maintain the budget constraint (i.e., the weights sum to 1), in which each stock weight is divided by the sum of the total weights of the portfolio. For the cardinality constraint, securities are included or excluded if the total number of selected securities is less or more than K, respectively.
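A minimal sketch of the repair method and the weighted-sum objective follows; keeping the K largest weights to satisfy the cardinality constraint is an assumed rule, since the text only states that securities are included or excluded until exactly K remain.

```python
import numpy as np

def repair(weights, K, lb=0.0, ub=1.0):
    """Repair a candidate portfolio: enforce the bounds, keep the K largest
    holdings (assumed cardinality rule), and renormalise so the weights sum
    to one (budget constraint)."""
    w = np.clip(np.asarray(weights, dtype=float), lb, ub)
    keep = np.argsort(w)[-K:]                 # indices of the K largest weights
    mask = np.zeros_like(w, dtype=bool)
    mask[keep] = True
    w[~mask] = 0.0
    total = w.sum()
    return w / total if total > 0 else np.where(mask, 1.0 / K, 0.0)

def objective(w, mean_returns, cov, lam):
    """Weighted-sum objective of Eq. (3): lambda * risk - (1 - lambda) * return."""
    risk = w @ cov @ w                        # Eq. (1)
    ret = w @ mean_returns                    # Eq. (2)
    return lam * risk - (1.0 - lam) * ret
```

In the hybrid metaheuristic, every candidate produced by the ALO or CS operators would be passed through repair before its objective value is evaluated.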
4 Hybrid Solution Approach In this section, the proposed hybrid metaheuristic-based solution approach is explained in detail. Here, the ant lion optimizer (ALO) [4] and cuckoo search (CS) [5] algorithms have been hybridized to attain a near-optimum portfolio, and they are presented in the coming subsections.
4.1 Ant Lion Optimizer (ALO) The ant lion optimization method mimics the interaction between ant lions and ants in a trap. This behaviour has been transformed into a model in which ants move randomly within the search space and ant lions hunt them, becoming fitter by building better traps. It is natural for the ant to move in such a random way while searching for food. A random walk is selected to model the movements, formed in the following way:

$$X(t) = [0, \mathrm{cumsum}(2r(t_1) - 1), \mathrm{cumsum}(2r(t_2) - 1), \ldots, \mathrm{cumsum}(2r(t_n) - 1)] \quad (4)$$

Here, cumsum represents the cumulative sum, $n$ is the number of iterations, $t$ denotes the step of the random walk, and $r(t)$ is a stochastic function defined as

$$r(t) = \begin{cases} 1 & \text{if } rand > 0.5 \\ 0 & \text{if } rand \leq 0.5 \end{cases} \quad (5)$$

where $rand$ is a random number in $[0, 1]$.

• Random walks of ants. The random walks are based on Eq. (4). At every step of the optimization, ants adjust their positions within the given search space by a random walk. However, Eq. (4) cannot be used directly to update the positions, because every variable has its own boundary (range of values). Therefore, the following min–max normalization is used to keep the random walks inside the solution space:

$$X_i^t = \frac{(X_i^t - a_i) \times (d_i - c_i^t)}{d_i^t - a_i} + c_i \quad (6)$$

Here, $a_i$ and $b_i$ represent the minimum and maximum of the random walk of the $i$-th variable, respectively, and $c_i^t$ and $d_i^t$ are the minimum and maximum of the $i$-th variable at the $t$-th iteration, respectively. Equation (6) guarantees that the random walks stay inside the solution space in each iteration.

• Sliding ants toward the ant lion. In the ALO model, the ant lions build traps in proportion to their fitness, and ants move randomly within the search space. Once an ant lion realizes that an ant is in its trap, it shoots sand outward from the centre of the pit, so that the trapped ant trying to escape slides down. Mathematically, the radius of the ants' random-walk hyper-sphere is reduced adaptively, which is expressed by Eqs. (7) and (8):

$$c^t = \frac{c^t}{I} \quad (7)$$

$$d^t = \frac{d^t}{I} \quad (8)$$

where $I$ is a ratio that grows with the iteration count and thereby shrinks the bounds.

• Re-building the pit and catching the prey. The last stage of the hunt takes place when the ant reaches the bottom of the pit and is caught in the ant lion's jaw; the ant lion then pulls the ant inside the sand and consumes its body. Thereafter, the ant lion updates its position to the position of the hunted ant, improving its chance of catching new prey. The hunting process is expressed as

$$\text{Antlion}_j^t = \text{Ant}_i^t \quad \text{if } f(\text{Ant}_i^t) > f(\text{Antlion}_j^t) \quad (9)$$

Here, $t$ is the current iteration, $\text{Antlion}_j^t$ shows the position of the selected $j$-th ant lion at the $t$-th iteration, and $\text{Ant}_i^t$ shows the position of the $i$-th ant at the $t$-th iteration.

• Elitism. Elitism is one of the core features of evolutionary approaches, allowing the best solution achieved at any generation to be passed to the next generation. Here, the best ant lion position obtained in each iteration is saved and considered the elite. The fittest ant lion, the elite, influences the movement of all ants during the iterations. Thus, every ant randomly walks around an ant lion selected by the roulette wheel and around the elite simultaneously, and its new position is obtained as

$$\text{Ant}_i^t = \frac{R_A^t + R_E^t}{2} \quad (10)$$

Here, $R_A^t$ and $R_E^t$ are the random walks around the ant lion selected by the roulette wheel and around the elite at the $t$-th iteration, respectively, and $\text{Ant}_i^t$ is the position of the $i$-th ant at the $t$-th iteration.
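For intuition, the following sketch generates one ant's random walk (Eqs. 4–5) and rescales it into the current variable bounds. It uses the usual min–max implementation of the normalization step, dividing by the walk's own range, which is how this step is commonly coded rather than a literal transcription of the printed formula.

```python
import numpy as np

def normalised_random_walk(n_iter, c, d, rng):
    """One ant's random walk (Eq. 4) rescaled into the interval [c, d]."""
    steps = np.where(rng.random(n_iter) > 0.5, 1.0, -1.0)   # 2*r(t) - 1, Eq. (5)
    walk = np.concatenate(([0.0], np.cumsum(steps)))
    a, b = walk.min(), walk.max()
    if b == a:                                              # degenerate walk
        return np.full_like(walk, (c + d) / 2.0)
    return (walk - a) * (d - c) / (b - a) + c               # min-max normalisation

rng = np.random.default_rng(0)
walk = normalised_random_walk(100, c=0.0, d=1.0, rng=rng)   # stays inside [0, 1]
```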
4.2 Cuckoo Search (CS) In cuckoo search, three idealized rules are used:

(i) Every cuckoo lays exactly one egg at a time and deposits it in a randomly selected nest;
(ii) The nests with the highest-quality eggs are carried forward to the next generation;
(iii) The number of nests is fixed, and the probability that the host bird discovers the egg laid by a cuckoo is $p_a \in [0, 1]$. In this situation, the host bird has two options: either throw the egg away or abandon the current nest and build a completely new one. To simplify, this last assumption is approximated by replacing a fraction $p_a$ of the current nests with new random solutions (nests). For an optimization problem, a solution's fitness or quality can be judged in proportion to the value of the objective function.

In the proposed hybrid solution approach, ALO and CS are applied alternately at random: in each generation a random number randn between 0 and 1 is generated, and this value directs which algorithm (ALO or CS) is run on the current population. If randn is less than 0.5, ALO is executed; otherwise, CS is executed.
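The random switching between the two metaheuristics can be sketched as below; alo_step and cs_step are placeholder functions standing in for the full ALO and CS generation updates, and the toy perturbation operator is only there to make the sketch runnable.

```python
import random

def hybrid_alo_cs(population, objective, n_iter, alo_step, cs_step):
    """Each generation, a random number in [0, 1) decides whether the ALO
    operators or the CS operators update the population."""
    best = min(population, key=objective)
    for _ in range(n_iter):
        step = alo_step if random.random() < 0.5 else cs_step
        population = step(population, objective)
        candidate = min(population, key=objective)
        if objective(candidate) < objective(best):
            best = candidate
    return best

# Toy demonstration with placeholder update operators (each just perturbs the
# solutions slightly); the real ALO and CS operators would replace these.
def perturb(population, objective):
    return [[w + random.uniform(-0.05, 0.05) for w in sol] for sol in population]

pop = [[random.random() for _ in range(5)] for _ in range(10)]
best = hybrid_alo_cs(pop, objective=lambda s: sum(x * x for x in s),
                     n_iter=50, alo_step=perturb, cs_step=perturb)
```

In the portfolio setting, each solution would be a weight vector passed through the repair step of Sect. 3 before evaluating Eq. (3).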
5 Experimental Results In this section, an experimental study of the portfolio selection problem is carried out using the proposed hybrid approach to optimize the portfolio's fitness function. The approach has been implemented in MATLAB on a ThinkPad with an Intel(R) Core i7 processor and 16 GB of RAM. A repair method is used for constraint satisfaction, as mentioned in Sect. 3. In the proposed model, the parameters are set as follows: the initial population size is 100, and the stopping criterion of the algorithm is a maximum of 100 iterations. In order to obtain the best combination of mean return and variance of return (risk) using the proposed solution approach, 20 different runs have been executed for each of 51 different values of the risk aversion parameter (λ); the results are shown in Fig. 1 along with the standard interpolated efficient frontier. The objective value, return, and risk corresponding to selected values of λ are shown in Table 1. The proposed solution approach has also been evaluated by computing the mean percentage error and the median percentage error. It is clear from Table 2 that the proposed solution approach performs better than the GA, SA, and TS approaches on both mean percentage error and median percentage error, as per the standard benchmark proposed by Chang et al. [6].
Fig. 1 Cardinality constraint efficient frontiers and standard UEF on DAX 100 dataset
(Axes: Variance of Return ×10⁻³ vs. Mean of Return ×10⁻³; curves: UEF and ALO-CS with K = 10.)
Table 1 Results of objective function values on varying risk aversion parameters (λ)

S. No.  λ    Objective value (Z)  Return       Risk
1       0    −0.009187945         0.009187945  0.002383037
2       0.1  −0.008058601         0.009204887  0.002257967
3       0.2  −0.006968991         0.009306766  0.002382107
4       0.3  −0.005701465         0.009013737  0.002027169
5       0.4  −0.004606772         0.009225674  0.002321581
6       0.5  −0.003603206         0.008392931  0.001186518
7       0.6  −0.002624744         0.0081454    0.001055694
8       0.7  −0.001832771         0.007496679  0.000594619
9       0.8  −0.001016714         0.006837462  0.000438473
10      0.9  −0.000314171         0.005708712  0.000285223
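The tabulated objective values appear consistent with a risk-aversion-weighted objective of the form Z = λ·risk − (1 − λ)·return (the exact model is defined in Sect. 3 and not reproduced here); the short check below, using the first two rows of Table 1, is only an illustrative sanity check under that assumption.

# Assumed objective: Z = lambda_ * risk - (1 - lambda_) * mean_return
rows = [
    (0.0, 0.009187945, 0.002383037),   # (lambda, return, risk) from Table 1, row 1
    (0.1, 0.009204887, 0.002257967),   # row 2
]
for lam, ret, risk in rows:
    z = lam * risk - (1 - lam) * ret
    print(f"lambda={lam}: Z = {z:.9f}")
# Expected from Table 1: -0.009187945 and -0.008058601
# (agreement up to rounding of the tabulated return and risk values)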
Table 2 Comparative results of the proposed algorithm as per the Chang et al. [6] benchmark

Measures                  GA [6]   TS [6]   SA [6]   ALO-CS
Mean percentage error     2.5424   3.3049   2.9297   0.01364524
Median percentage error   2.5466   2.6380   2.5661   0.017832371
6 Conclusion In this paper, the hybrid antlion optimizer and cuckoo search (ALO-CS) algorithm has been applied to solve the portfolio selection problem. The main objective of the proposed solution approach was to improve the objective value of the constructed portfolio. A repair method was used to handle the various constraints of the constructed portfolio. The proposed solution approach was developed as a model to attain the objective parameters at the optimum level. The study results show
that the proposed approach has an edge over the GA, SA, and TS algorithms in solving the portfolio optimization problem on the standard benchmark problem, DAX 100.
References

1. Markowitz HM (1952) Portfolio selection. J Finance 7(1):77–91
2. Markowitz H (1991) Portfolio selection: efficient diversification of investments. Cambridge, MA
3. Di Tollo G, Roli A (2008) Metaheuristics for the portfolio selection problem. Int J Oper Res 5(1):13–35
4. Mirjalili S (2015) The ant lion optimizer. Adv Eng Softw 83:80–98
5. Yang XS, Deb S (2009) Cuckoo search via Lévy flights. In: 2009 World congress on nature & biologically inspired computing (NaBIC). IEEE, pp 210–214
6. Chang TJ, Meade N, Beasley JE, Sharaiha YM (2000) Heuristics for cardinality constrained portfolio optimisation. Comput Oper Res, 1271–1302
7. Chang TJ, Yang SC, Chang KJ (2009) Portfolio optimization problems in different risk measures using genetic algorithm. Expert Syst Appl 36(7):10529–10537
8. Shahid M, Shamim M, Ashraf Z, Ansari MS (2022) A novel evolutionary optimization algorithm based solution approach for portfolio selection problem. IAES Int J Artif Intell (IJ-AI) 11(3):847–850
9. Shahid M, Ansari MS, Shamim M, Ashraf Z (2022) A stochastic fractal search based approach to solve portfolio selection problem. In: Gunjan VK, Zurada JM (eds) Proceedings of the 2nd international conference on recent trends in machine learning, IoT, smart cities, and applications. Lecture notes in networks and systems, vol 237. Springer, Singapore. https://doi.org/10.1007/978-981-16-6407-6_41
10. Shahid M, Ashraf Z, Shamim M, Ansari MS (2022) Solving constrained portfolio optimization model using stochastic fractal search approach. Int J Intell Comput Cybern. https://doi.org/10.1108/IJICC-03-2022-0086
11. Wang W, Wang H, Wu Z, Dai H (2009) A simple and fast particle swarm optimization and its application on portfolio selection. In: 2009 international workshop on intelligent systems and applications. IEEE, pp 1–4
12. Zhu H, Wang Y, Wang K, Chen Y (2011) Particle swarm optimization (PSO) for the constrained portfolio optimization problem. Expert Syst Appl 38(8):10161–10169
13. Zaheer KB, Abd Aziz MIB, Kashif AN, Raza SMM (2018) Two stage portfolio selection and optimization model with the hybrid particle swarm optimization. MATEMATIKA: Malaysian J Ind Appl Math, 125–141
14. Deng GF, Lin WT (2010) Ant colony optimization for Markowitz mean-variance portfolio model. In: International conference on swarm, evolutionary, and memetic computing. Springer, Berlin, Heidelberg, pp 238–245
15. Gao W, Sheng H, Wang J, Wang S (2018) Artificial bee colony algorithm based on novel mechanism for fuzzy portfolio selection. IEEE Trans Fuzzy Syst 27(5):966–978
16. Kalayci CB, Ertenlice O, Akyer H, Aygoren H (2017) An artificial bee colony algorithm with feasibility enforcement and infeasibility toleration procedures for cardinality constrained portfolio optimization. Expert Syst Appl 85:61–75
17. Tuba M, Bacanin N (2014) Artificial bee colony algorithm hybridized with firefly algorithm for cardinality constrained mean-variance portfolio selection problem. Appl Math Inf Sci 8(6):2831
18. Imran M, Hasan F, Ahmad F, Shahid M, Abidin S (2023) Grey wolf based portfolio optimization model optimizing shape ratio of the portfolio constructed from Bombay stock exchange. In: 4th international conference on machine intelligence and signal processing (MISP 2022) (In press)
19. Pouya AR, Solimanpur M, Rezaee MJ (2016) Solving multi-objective portfolio optimization problem using invasive weed optimization. Swarm Evol Comput, 42–57
20. Shahid M, Ansari MS, Shamim M, Ashraf Z (2022) A risk-budgeted portfolio selection strategy using invasive weed optimization. In: Tiwari R, Mishra A, Yadav N, Pavone M (eds) Proceedings of international conference on computational intelligence. Algorithms for intelligent systems. Springer, Singapore, pp 363–371. https://doi.org/10.1007/978-981-16-3802-2_30
21. Hasan F, Ahmad F, Imran M, Shahid M, Shamim Ansari M (2023) Portfolio selection using golden eagle optimizer in Bombay stock exchange. In: 4th international conference on machine intelligence and signal processing (MISP 2022). In press
22. Dhaini M, Mansour N (2021) Squirrel search algorithm for portfolio optimization. Expert Syst Appl 178:114968
23. Shahid M, Ashraf Z, Shamim M, Ansari MS (2022) A novel portfolio selection strategy using gradient-based optimizer. In: Proceedings of international conference on data science and applications. Springer, Singapore, pp 287–297
Homograph Language Identification Using Machine Learning Techniques Mohd Zeeshan Ansari, Tanvir Ahmad, Sunubia Khan, Faria Mabood, and Mohd Faizan
Abstract Multilingual systems often have to deal with base language texts containing inclusions of multiple other languages in the form of phrases, words, or even parts of words. In such settings, the system should be capable of classifying the language of each piece of text effectively. This task becomes more complex for mixed-lingual sentences in which a mono-scripted corpus contains a considerable number of foreign language words. Present language identification systems are accurate in identifying the language of mixed-lingual texts; however, they lack the robustness to identify the language of homographs. In this work, we present a supervised learning framework that treats language identification as word sense disambiguation in order to deal effectively with homographs. A Twitter dataset is prepared around significant homographs, and a supervised learning framework based on word sense disambiguation is proposed to distinguish their language. The framework is evaluated using various classification methods, and the results show that the random forest algorithm performs best among all, with an F1-measure of 97.94%. Keywords Mixed-lingual text · Language identification · Homographs · Word sense disambiguation
1 Introduction Language identification or classification is the task of automatically recognizing the language(s) existing in a textual document based on the content of the document. Language identification techniques predominantly assume that every piece of text is written in one of a closed set of known languages for which there is training data, and it is thus formulated as the task of selecting the most likely language from the set of M. Z. Ansari (B) · T. Ahmad · S. Khan · F. Mabood Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India e-mail: [email protected] M. Faizan Department of Electronics and Communication, Jamia Millia Islamia, New Delhi, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_60
training languages [1]. In natural language processing (NLP), language identification is the problem of determining in which natural language the given content is spoken or written. Some words are spelled the same but are not necessarily pronounced the same and have different meanings and origins; these are called homographs [2]. They induce interlingual ambiguity in Hindi-English mixed-lingual text when the base script is predominantly the Roman alphabet. Language identification is significant for several NLP applications to work robustly, since trained models usually learn their parameters using data from a single language. If a model is trained on English text and subsequently used for prediction on Hinglish, i.e. Hindi written with the Roman alphabet, we usually see a significant decrease in performance. In applications such as information extraction, sentiment analysis, and machine translation, language identification is used as a preliminary step, and based on the detected language, appropriate trained models are applied for the underlying subtasks. For this work, we selected the Hindi-English language pair, as this language set is of particular interest to researchers in light of the fact that a large part of the general population in India communicates in mixed Hindi-English. This language pair is difficult to process because Hindi is written using the Roman script rather than its usual Devanagari script, due to which homographs appear in the text, i.e. words that share their spelling but do not share their meaning. These homographs promote active bilingual conversation in mixed-lingual settings; however, they create ambiguity in language processing tasks. For example, the word 'door' in English means an entrance or a gate, whereas in the context of Hindi the Devanagari word दूर means far or distant and is spelled 'door' when written in Roman script. We observed that there has been no research work on homographs for Hindi-English text written in Roman script. To address this issue, we develop a homograph language identifier that predicts the language of homographs using the word sense disambiguation framework. Word sense disambiguation is a task in natural language processing that involves identifying which sense (meaning) of a word is triggered by its usage in a specific context. We exploit the context of the homograph in focus using language models for training the homograph language classifier. The overall paper is organized as follows. Section 2 presents the related work on various techniques for the language identification task and its impact on subsequent tasks. Section 3 defines the word sense disambiguation framework and briefly reviews its significant recent works. Section 4 presents the homograph language identifier as a word sense disambiguation problem, its data preparation, and the overall methodology. Section 5 specifies the experimental setting and presents a discussion of the results, and Sect. 6 concludes the work.
2 Related Work Traditional machine learning methods for automated language recognition relied heavily on the distinctive and consistent patterns found in a language's words and sentences. To represent specific language features, Lui and Baldwin used orthographic characteristics and character-set mappings to classify language in multilingual documents [3]. King and Abney investigated word-level language recognition and found that using N-gram models combined with logistic regression enhanced the results [4]. Ensemble classifier-based language identification was utilized by Malmasi and Dras [5]. Using a dataset of common English and transliterated Hindi words, Ansari et al. investigated the impact of POS tags on language label learning [6]. Several deep neural networks, such as gated recurrent units (GRU), long short-term memory (LSTM), and convolutional neural networks (CNN), have been used in recent research to treat language detection as a sequence labeling problem [7]. A bidirectional LSTM was used by Jaech et al. to map the sequence of CNN-learnt character vectors to the language label [8]. The impact of word and character embeddings using LSTM for Spanish–English language pairs from the second shared task on language identification in code-switched data was investigated by Samih et al. [9]. An enormous amount of effort is being made to handle common mixed-language natural language understanding issues for language identification [10–12]. Language identification is also a prerequisite for a number of other applications; for instance, a sentiment analysis method was developed for the Mixed Script Information Retrieval 2015 mixed-lingual texts of the English–Tamil, English–Telugu, English–Hindi, and English–Bengali datasets [13]. The English–Bengali–Hindi corpus was tagged for parts of speech, including the language identification stage [14]. Another emotion identification method was created using machine learning and teaching-learning-based optimization approaches for Hindi-English data [15].
3 Word Sense Disambiguation Semantics, or the sense of a given text or speech, is an important characteristic in communication and data acquisition. A word has one or more distinct senses attached to it, and each sense of a word is represented by a definition, a list of synonyms, and an example of the sense's usage. The process of assigning the appropriate definition, or the appropriate sense, to the word that is under consideration for disambiguation is called word sense disambiguation (WSD). For example, the term plant is ambiguous. As a noun it may be used in the meaning of 'botanical life form' or 'an industrial building'. It also has uncommon meanings like 'an actor who pretends to be a member of an audience' and numerous linguistic connotations like 'to bury'. However, whenever the language input to an automated system contains different meanings of a target word, the system needs a mechanism for using the contextual
information to determine the meaning, which depends on the semantic interpretation. This problem is resolved by word sense disambiguation algorithms. The majority of WSD techniques are knowledge-based methods, supervised learning methods, and unsupervised learning methods. The supervised methods present the issue as a classification or sequence learning task, in which either a target word or all content words in a sequence must be tagged with a probable meaning [16, 17], whereas the knowledge-based methods make use of graph algorithms on knowledge bases. On the one hand, knowledge-based models have shown more versatility when it comes to disambiguating uncommon words and texts in low-resource languages, despite the absence of statistical evidence of lexical context. On the other hand, supervised models typically outperform unsupervised models in English word sense disambiguation, albeit at the expense of less flexibility and inferior outcomes when scaling to other languages. Recent research has focused on novel strategies for minimizing the consequences of the knowledge acquisition bottleneck via the automated generation of high-quality, sense-annotated training corpora [18].
4 Homograph Language Identification Using Word Sense Disambiguation In order to resolve the problem of language identification for homographs present in a Hindi–English mixed-lingual text written in Roman script, we applied supervised learning techniques for word sense disambiguation. We prepare the Twitter corpus for the task, in which each homograph in every sentence is annotated according to its language sense label. The overall work is described as follows.
4.1 Data Preparation We prepared a dataset of Hindi-English mixed language from social media content on the Internet. For the purpose of our work, we gathered the data from the tweets of popular handles such that the target homograph words are present in them. In order to identify candidate homographs, we manually scanned the Hindi–English dataset of Ansari et al. [19] and selected two high frequency homographs for this preliminary work. These selected homographs, along with both their Roman and Devanagari versions, are shown in Table 1. Preprocessing. The approximately 500 tweets collected for both homographs were preprocessed thoroughly in order to ensure that the data is clean and in clear form. We subsequently removed smileys, URLs, mentions, and special symbols from the tweets.
Table 1 Linguistic structures of multilingual homographs

Multilingual focus word   English sense                           Hindi sense
                          Orthography (Roman)   Meaning(s)        Orthography (Devanagari)   Meaning(s)
Door                      Door                  Gate, entrance    दूर                        Far, distant
Pass                      Pass                  Move, progress    पास                        Near, close
Table 2 Excerpts from dataset

Homograph language   Sentence
English              Left my balcony door open last night and now it is 53 degrees in here
                     So the chaos came to my door step
                     Pass on our best wishes
                     Hopefully he will pass on the message
Hindi                Jab koi door hota ha esa lgta ha jesy sub khatam ho gaya ha
                     Na tum door jaana na hum door jayenge
                     Pass aaye duriyaan fir bhi kam na hui
                     Bhagwan ka diya hua sab kuch hai mere pass
Table 3 Dataset statistics

Homograph   Sample   Number of tweets for English sense   Number of tweets for Hindi sense
Door        Train    357                                  311
            Test     153                                  134
Pass        Train    337                                  342
            Test     144                                  147
Annotation. Each tweet is manually annotated with the language sense label, Hindi or English, according to the homograph in it. Excerpts from the final prepared dataset for the homographs are shown in Table 2, where the homographs are in italics. Repeating the process for all the homograph words selected for the study (see Table 1), we obtain the complete dataset. A summary of the collected dataset is given in Table 3.
4.2 N-Grams N-gram language modeling is a powerful probabilistic method for extracting contextual characteristics from a text sample. For example, a 2-gram or bi-gram is a sequence
of two adjacent elements from a stream of tokens, which are typically characters, syllables, or words. As the name suggests, the bi-gram model approximates the probability of a word given all the previous words by using only the conditional probability of the one preceding word. In other words, the probability is given by P(x_t | x_{t−1}), where x_t is the current token and x_{t−1} is the preceding token. The probability calculation can be extended to higher values of N, leading to the encapsulation of N context tokens. We generated N-grams from the tweets for N = 1–7, for both word and character N-grams separately.
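A minimal sketch of extracting such word and character N-gram features (N = 1–7) with scikit-learn is shown below; the vectorizer settings and the use of raw counts are illustrative assumptions, not the authors' exact configuration. The two example tweets are taken from Table 2.

from sklearn.feature_extraction.text import CountVectorizer

tweets = [
    "So the chaos came to my door step",          # English sense of 'door'
    "Na tum door jaana na hum door jayenge",      # Hindi sense of 'door'
]

# Word N-grams for N = 1..7
word_vectorizer = CountVectorizer(analyzer="word", ngram_range=(1, 7))
X_word = word_vectorizer.fit_transform(tweets)

# Character N-grams for N = 1..7 (restricted to word boundaries)
char_vectorizer = CountVectorizer(analyzer="char_wb", ngram_range=(1, 7))
X_char = char_vectorizer.fit_transform(tweets)

print(X_word.shape, X_char.shape)   # (n_tweets, n_word_ngrams), (n_tweets, n_char_ngrams)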
4.3 Classification Algorithms The classification algorithms used in this work include probabilistic models, ensemble methods, kernel methods, and other approaches to classification. Support Vector Machine. The support vector machine (SVM) classifier is a supervised learning algorithm that may be used to classify data, perform regression, and identify outliers. The SVM is among the state-of-the-art classical machine learning classification methods and is utilized in a wide range of applications. Random Forest. The random forest (RF) is a classification algorithm composed of a group of tree-structured learners. It is an ensembling technique that works by building several decision trees during training and returning the class label that best fits the input, typically by majority vote. AdaBoost. Boosting is a common method of improving the effectiveness of a learning system by merging weak classifiers. The boosting approach helps to decrease the number of mistakes that are made by weak learners. AdaBoost iterates the weak learner several times in a row and, as a result, reduces the weights associated with correctly predicted sample points while increasing them for incorrectly predicted sample points. Once this has been accomplished, the moderately precise learners are merged to produce a single highly accurate learner. XGBoost. XGBoost is an ensemble learning method based on decision trees that employs a gradient boosting technique. Gradient boosting is a technique that generates newer models in order to correct the mistakes of previous models; the final classification is obtained by adding these residuals together. Because it uses a gradient descent approach to reduce loss while bringing in new models, it is referred to as gradient boosting.
4.4 Methodology The homograph language identification process begins by tokenization of words present in corpus text. The text after preprocessing is tokenized to form N-gram language models. We considered both word and char N-grams for N = 1–7. The resulting language model is able to generate the input feature representation for
Homograph Language Identification Using Machine Learning Techniques
Homograph discovery
Mixed Language Dataset
search
High frequency Homograph selection
N-gram Feature extraction
Training data
869
Training Algorithm
Homograph Dataset N-gram Feature extraction
Test data
Twitter
Predicted Homograph Language
Model
Fig. 1 Proposed framework for language identification of homographs
each tweet. This representation is given as input to the classification algorithm for learning the model. Finally, the trained model may be utilized to detect the homograph language for a new tweet. This process is carried out separately for each homograph considered in this study. The overall proposed framework is presented in Fig. 1.
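Under the assumptions above, the end-to-end framework for one homograph can be sketched as a standard scikit-learn pipeline. The feature weighting, hyper-parameters, and tiny in-line dataset below are illustrative stand-ins (the real prepared dataset and tuned settings are not reproduced here); the 7:3 split follows Sect. 5.

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Stand-ins for the prepared dataset: preprocessed tweets containing the
# homograph 'door' and their annotated language sense labels (Table 2).
tweets = [
    "Left my balcony door open last night and now it is 53 degrees in here",
    "So the chaos came to my door step",
    "Jab koi door hota ha esa lgta ha jesy sub khatam ho gaya ha",
    "Na tum door jaana na hum door jayenge",
]
labels = ["english", "english", "hindi", "hindi"]

X_train, X_test, y_train, y_test = train_test_split(
    tweets, labels, test_size=0.3, random_state=42, stratify=labels)  # 7:3 split

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 7)),  # char N-gram features
    RandomForestClassifier(n_estimators=200, random_state=42),
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))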
5 Experiments and Results We conducted the experiments on the two homograph words door and pass by dividing the dataset into a train-test ratio of 7:3. We tuned all the classifiers with their optimal hyper-parameters and recorded the results. The macro averaged precision, recall, and F1-measure are given in Table 4.

Table 4 Macro averaged precision, recall, and F1-measure

           Homograph = door                        Homograph = pass
           Word N-gram        Char N-gram          Word N-gram        Char N-gram
           P     R     F      P     R     F        P     R     F      P     R     F
SVM        94.48 94.20 94.15  96.22 96.22 96.22    86.25 81.90 82.15  92.51 91.65 91.89
RF         96.12 95.84 95.87  97.91 97.97 97.94    91.37 90.62 90.83  95.77 95.94 95.81
AdaBoost   94.08 93.86 93.81  95.56 95.52 95.53    91.51 90.58 90.82  94.07 94.03 94.05
XGBoost    92.80 92.49 92.43  96.94 96.90 96.91    92.80 92.02 92.24  93.18 92.86 92.18
5.1 Evaluation Metrics We use precision, recall, and F1-measure to evaluate the performance of the homograph language identification system. Precision quantifies the quality of the predictions made by the system; it is expressed as the ratio of predicted labels that are correct responses to the total number of labels predicted by the proposed model. Recall quantifies the extent to which the proposed model can recover the actual responses in the test data; it is defined as the number of predicted labels that correspond to the right answer divided by the number of true responses in the test data. The F1-score is defined as the harmonic mean of precision and recall.
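For reference, with TP, FP, and FN denoting the true positives, false positives, and false negatives for a given sense label, these standard definitions can be written as

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2\,\text{Precision}\cdot\text{Recall}}{\text{Precision} + \text{Recall}}.$$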
5.2 Analysis of Results Table 4 shows that the random forest classifier performs best in all cases except with the word N-gram language model of the homograph pass, for which the XGBoost classifier shows the best performance. The best precision, recall, and F1 are observed for random forest with char N-grams, with values of 97.91%, 97.97%, and 97.94%, respectively. Moreover, when we compare the performance of the word N-gram and char N-gram language models, we observe that for all classifiers char N-grams outperform word N-grams in every case, see Fig. 2.

Fig. 2 Performance comparison of word and char N-grams

Table 5 shows the confusion matrix of the random forest for both chosen homographs, door and pass. For the homograph door, we observe that only one sentence
Table 5 Confusion matrix of random forest classifier

Homograph = door                               Homograph = pass
True            Predicted                      True            Predicted
                English sense   Hindi sense                    English sense   Hindi sense
English sense   133             11             English sense   146             7
Hindi sense     1               146            Hindi sense     19              115
is wrongly predicted as the English sense instead of Hindi; on the other hand, for the homograph pass, 19 such labels are predicted wrongly. In the case of the English sense, 11 and seven labels are predicted incorrectly for door and pass, respectively. Conclusively, for door we observe a high recall rate of 99.3% for the Hindi sense as compared to 92.36% for the English sense; however, the same does not hold for pass. The recall rate for the Hindi sense of pass is 85.82%, which is quite dissimilar to the recall rate for door. The recall rate for the English sense of pass, on the other hand, is rather high at 96.68%.
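For instance, the door recall rates quoted above follow directly from the Table 5 counts:

$$R_{\text{Hindi}} = \frac{146}{146 + 1} \approx 99.3\%, \qquad R_{\text{English}} = \frac{133}{133 + 11} \approx 92.36\%.$$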
5.3 Performance Comparison The proposed homograph language identification models are trained on the newly prepared dataset for the two homographs door and pass. The average test F1-measure obtained over both homographs with the random forest classifier is 94.38%, which is the best average across the classifiers considered. In order to compare the suggested model with prior supervised state-of-the-art models, we take into consideration the benchmark SemEval-2015 word sense disambiguation dataset; however, since we prepared our own dataset, this comparison should be considered indicative rather than direct. The recent techniques include sense vocabulary compression (SVC) [20], extended WSD incorporating sense embeddings (EWISE) [21], layer weighting and gated linear unit (LWGLU) [22], and GlossBERT [23], which are put together with the proposed model in Table 6. Table 6 Performance comparison of proposed model with existing models
Model            Dataset        F1-measure
Proposed         Homograph      94.3
GlossBERT 2020   SemEval-2015   76.1
SVC 2019         SemEval-2015   78.3
EWISE 2019       SemEval-2015   69.4
LWGLU 2019       SemEval-2015   71.1
6 Conclusion This work proposes a novel framework for identifying the language of homographs using a supervised word sense disambiguation approach, which helps to improve the performance of the generic language identification task. We selected two high frequency homographs for the study and subsequently prepared and annotated the required dataset. This study uses machine learning approaches to develop a distinct classifier for each homograph. Word and character level N-gram language models are employed for the transformation of text into the numerical input representation. The performance of the developed framework is evaluated using four machine learning classification models, namely the support vector machine classifier, the random forest classifier, the AdaBoost classifier, and the XGBoost classifier, on the basis of precision, recall, and F1-measure computed from the confusion matrix. Finally, the empirical outcomes indicate that the random forest classifier combined with a char N-gram language model outperformed the other suggested classifiers with an F1-measure of 97.94%. For future work, more homographs may be discovered and incorporated into the framework. Moreover, an integrated data approach may also be more beneficial if the frequency of distinct homographs reaches large values.
References

1. Hughes CE, Shaunessy ES, Brice AR et al (2006) Code switching among bilingual and limited English proficient students: possible indicators of giftedness. J Educ Gifted 30:7–28. https://doi.org/10.1177/016235320603000102
2. Jurafsky D, Martin JH (2013) Speech and language processing, 2nd edn. Pearson
3. Lui M, Lau JH, Baldwin T (2014) Automatic detection and language identification of multilingual documents. Trans Assoc Comput Linguist 2:27–40
4. King B, Abney S (2013) Labeling the languages of words in mixed-language documents using weakly supervised methods. In: Proceedings of the 2013 conference of the North American chapter of the Association for Computational Linguistics: human language technologies, pp 1110–1119
5. Malmasi S, Dras M (2017) Multilingual native language identification. Nat Lang Eng 23:163–215. https://doi.org/10.1017/S1351324915000406
6. Ansari MZ, Khan S, Amani T et al (2020) Analysis of part of speech tags in language identification of code-mixed text. In: Sharma H, Govindan K, Poonia RC et al (eds) Advances in computing and intelligent systems. Springer, Singapore, pp 417–425
7. Bhattu N, Krishna N, Somayajulu D, Pradhan B (2020) Improving code-mixed POS tagging using code-mixed embeddings. ACM Trans Asian Lang Inf Process 19:1–31. https://doi.org/10.1145/3380967
8. Jaech A, Mulcaire G, Hathi S et al (2016) Hierarchical character-word models for language identification. In: Proceedings of the fourth international workshop on natural language processing for social media. Association for Computational Linguistics, Austin, TX, USA, pp 84–93
9. Samih Y, Maharjan S, Attia M et al (2016) Multilingual code-switching identification via LSTM recurrent neural networks. In: Proceedings of the second workshop on computational approaches to code switching. Association for Computational Linguistics, Austin, Texas, pp 50–59
10. Singh K, Sen I, Kumaraguru P (2018) Language identification and named entity recognition in Hinglish code mixed tweets. In: Proceedings of ACL 2018, student research workshop. Association for Computational Linguistics, Melbourne, Australia, pp 52–58
11. Ramanarayanan V, Pugh R, Qian Y, Suendermann-Oeft D (2019) Automatic turn-level language identification for code-switched Spanish–English dialog. In: D'Haro LF, Banchs RE, Li H (eds) 9th international workshop on spoken dialogue system technology. Springer Singapore, Singapore, pp 51–61
12. Shekhar S, Sharma DK, Beg MS (2020) Language identification framework in code-mixed social media text based on quantum LSTM—the word belongs to which language? Mod Phys Lett B 34:2050086
13. Bhargava R, Sharma Y, Sharma S (2016) Sentiment analysis for mixed script Indic sentences. In: 2016 international conference on advances in computing, communications and informatics (ICACCI). IEEE, pp 524–529
14. Barman U, Wagner J, Foster J (2016) Part-of-speech tagging of code-mixed social media content: pipeline, stacking and joint modelling. In: Proceedings of the second workshop on computational approaches to code switching, pp 30–39
15. Sharma S, Srinivas P, Balabantaray R (2016) Emotion detection using online machine learning method and TLBO on mixed script. In: Language resources and evaluation conference, pp 47–51
16. Zhong Z, Ng HT (2010) It makes sense: a wide-coverage word sense disambiguation system for free text. In: Proceedings of the ACL 2010 system demonstrations, pp 78–83
17. Raganato A, Bovi CD, Navigli R (2017) Neural sequence learning models for word sense disambiguation. In: Proceedings of the 2017 conference on empirical methods in natural language processing, pp 1156–1167
18. Scarlini B, Pasini T, Navigli R (2019) Just "OneSeC" for producing multilingual sense-annotated data. In: Proceedings of the 57th annual meeting of the Association for Computational Linguistics, pp 699–709
19. Ansari MZ, Beg MMS, Ahmad T et al (2021) Language identification of Hindi-English tweets using code-mixed BERT. arXiv:210701202 [cs]
20. Vial L, Lecouteux B, Schwab D (2019) Sense vocabulary compression through the semantic knowledge of wordnet for neural word sense disambiguation. arXiv:190505677 [cs]
21. Kumar S, Jat S, Saxena K, Talukdar P (2019) Zero-shot word sense disambiguation using sense definition embeddings. In: Proceedings of the 57th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, pp 5670–5681
22. Hadiwinoto C, Ng HT, Gan WC (2019) Improved word sense disambiguation using pre-trained contextualized word representations. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, pp 5297–5306
23. Blevins T, Zettlemoyer L (2020) Moving down the long tail of word sense disambiguation with gloss informed bi-encoders. In: Proceedings of the 58th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, pp 1006–1017
A Review on Data-Driven Approach Applied for Smart Sustainable City: Future Studies Rosmy Antony and R. Sunder
Abstract Unprecedented patterns of global growth in energy and wealth have led the globe into a period of peak urbanisation. The goal of sustainable development is to transform how people live, work, and get around by using a knowledge-based approach. Data-driven urbanism is the primary mode of production for smart cities. The purpose of this article is to examine current trends and technologies for smart cities from an information-driven perspective so that future cities can be made more efficient and sustainable. Because of its immense potential to enhance sustainability, the Internet of Things has evolved to become an important part of smart city ICT facilities. Future research on sustainability will increasingly use backcasting. Big Data and analytics help utilities achieve operational efficiency in data-driven smart cities. The suggested framework will aid academics in assessing backcasting methods for constructing future smart sustainable urbanisation models, with support from IoT and Big Data. Finally, we conclude that a framework for designing smart and livable communities based on these contemporary technologies has strategic relevance in addressing many of the complicated issues and concerns raised by environmentalism and urbanisation, and in accelerating sustainable development. Keywords IoT · Data-driven urbanism · Big Data · Smart cities · Backcasting · Smart sustainable cities · ICT infrastructure
1 Introduction The usage of technological services and applications to create smart cities has revealed incredible development requirements for the future. It is estimated that by 2050, 66% of the world's population will live in cities, which means that the world's urban population might increase by 2.4 billion people. A smart city is one R. Antony (B) · R. Sunder Sahrdaya College of Engineering and Technology, Kodakara, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_61
that can become smarter in its attitude to conservation in order to improve the comfort of its residents in a cost-effective, egalitarian, and livable way. Smart cities use technology to help people and are designed with them in mind. They begin with an information network aimed at maximising resources such as energy, water, and food security. Communities can be energy-neutral, with energy-efficient smart buildings, and sustainable smart communities make careful use of natural resources. We can observe how digital technology can assist us in better monitoring and utilisation of municipal infrastructure [1]. As we all know, the IoT and low-cost sensors and actuators exist in the physical world; in recent years, more than 25 billion such devices have been deployed. This is due to technological downsizing and the ability to integrate pervasive wireless communication into devices and sensors in physical infrastructure. After gathering the necessary information about the cities, the following stage is to ensure that the cities are livable. Over the last 58 years, there has been a lot of discussion about the future and smart cities; however, it is necessary to determine what is actually required for the best of both smart and habitable cities. Livable cities are designed in a way that is both environmentally and socially beneficial; the aim is to integrate planning, urbanisation, and natural system conservation to develop a long-term built and natural environment [2]. Cities are becoming smarter, more livable, and more responsible, and we are only seeing the beginning of what technology can do for the urban environment in the future. Smart technology has recently been viewed by city administrators as one of the most important tools for improving efficiency behind the scenes, and technology is now more immediately infused into the lives of residents. Big Data is becoming a more important source of evidence for making high-quality decisions. Big Data computing is the analysis of extraordinarily large data sets, by examining current or previous situations, in order to evaluate patterns, trends, and associations, particularly in relation to human behaviour and relationships [3]. Smart cities are described in modern parlance as government bodies that collect digital data from inhabitants, infrastructure items, and electronic gadgets for a variety of administrative and backcasting-based management purposes in order to fulfil future visions. Goal-oriented backcasting techniques declare long-term strategies for achieving future goals. Futures research provides viable methods for putting together smart and sustainable cities, especially in difficult conditions where the issue is complex and considerable change is required; backcasting is a widely used methodology and can be used in any reasonably sophisticated planning process. In terms of its practical uses, backcasting as a structural transformation is progressively being used in scientific investigations in sectors relevant to environmental regeneration. The analysis will be carried out by researching cases utilising a backcasting technique, such as renewable city and intelligent city policies, together with a fresh approach and innovations, with a clear visual definition of the basic principles of sustainable urbanisation [4].
2 Related Work In this part, we present a quick review of the Internet of Things, as well as some features of IoT architecture, Big Data, and backcasting as applied to building a sustainable smart city.
2.1 An Emergence of Featured Cities In recent years, techniques for smart sustainable cities have been developed whose approaches to urbanisation and development are founded on a revolutionary data-driven model. The advantages of future cities are clearly described, as are their strengths and crucial points; the primary aim is to plan, analyse, and construct the vision of the future. The eco-city will provide green infrastructure, a sustainable energy system, a sustainable garbage management system, and green technological development as important environmental and economic benefits [5]. Smart cities can transform into sustainable cities that undergo significant changes over their entire life cycles. The most frequently asked question concerning urbanism is where to go with sustainable urbanisation, and there could be new paths to long-term development. A set of six design principles blends society-based, complexity-led, and landscape-driven design into the creation of a sustainable city [6].
2.2 IoT Paradigm Builds the Smart Environment The Internet of Things is undergoing rapid changes as a result of the large amount of data created by devices and sensors; these gadgets are connected to the Internet, which allows for the rapid development of intelligent systems and applications in a variety of industries [7]. The paradigm mainly concerns three layers: the sensor (perception) layer collects packets and transforms them into digital signals; in the network layer, interconnected devices share resources; and the application layer creates intelligent environments such as smart buildings, smart homes, and smart health, and guarantees integrity (Fig. 1). A deep understanding of the IoT architecture gives the ability to relate these layers to one another. The IoT paradigm, which has recently expanded, is being used to create smart surroundings; work in this area focuses mostly on security issues and proposes intrusion detection technologies created to solve security problems and to detect the exploitation of security flaws [8].
Fig. 1 IoT Paradigm based on related work
2.3 Big Data and IoT Adaptive Solutions The IoT architecture provides adaptable solutions on the basis of hardware and communication capabilities. The IoT is a five-layer architecture with three domains: the application domain, the network domain, and the physical domain. To increase service quality, the development of network-enabled sensors and AI algorithms is recommended, and there are a variety of human-centred smart systems, such as smart healthcare and autonomous driving [9] (Fig. 2). In today's smart city design and implementation, real-time data processing and analysis has become a critical component. The focus here is on smart city architecture that incorporates Big Data analytics; the scheme's major purpose is to improve the quality of real-time decision making by utilising Big Data efficiently. Resources, traffic
Fig. 2 Big Data with IoT in smart cities
control, parking management, and air pollution measures are all handled in a smart city. Big Data approaches are used to classify network-enabled devices and examine possible services, and Big Data offers a wide range of possibilities: all daily operations are translated into data, and each transaction is converted into large data that may be analysed and acted upon [10].
2.4 Digital Transition of ICT The integration of telecommunications and computers, together with the necessary enterprise software and storage that enable users to access, store, transmit, understand, and manipulate information, is known as information and communications technology (ICT). The ICT framework is an intelligent network of connected objects and machines transmitting data using wireless technology and the cloud. Each city has incorporated information processing and networking technology to enable creative sustainable city solutions, and to meet the difficulties of sustainability, a new digital transition in ICT will be implemented [11]. Analysing energy policy used to be quite difficult; as an alternative to existing forecasting methodologies, a method known as energy backcasting will be implemented, which establishes policy objectives and discusses how they can be achieved [12]. By the year 2030, Canada aims to implement sustainable cities; they make use of energy-efficient resources, and a soft energy path analysis has been carried out in Canadian cities. Trends in sustainability are based on backcasting and depict four non-overlapping sustainability concepts. Future research will employ strategic smart sustainable city development as a research and planning approach [13].
3 Future Review Backcasting is seen as a long-term, goal-oriented approach to operationalising sustainable development. Futures that are possible, probable, and preferable are all considered.
3.1 A Smart City Network Design-IoT The IoT is an arrangement of physical things that are integrated with sensors, actuators, and other technologies that allow them to communicate data with other devices. An IoT infrastructure can handle massive amounts of data and analyse it quickly enough to derive profound insights from it utilising cognitive computing, and millions of sensors are used to collect the data. The application layer, network layer, and perception layer are the sole layers in the IoT architecture. The application layer is
responsible for creating smart environments such as smart homes and smart health, as well as ensuring data probity, veracity, and secrecy. The network layer takes data from the bottom layer, known as the perception layer, and connects different devices so that they can share information. A smart city cognitive framework strives to build or improve the long-term viability of cities by operating with cognitive capacities; a flexible smart city can be created by implementing a cognitive IoT-based network architecture. In this cognitive computing approach, several resources can be used to collect data from a variety of sensors in support of different applications. The main components of the CIoT architecture are sensor components, machine learning, and semantic models: sensors are utilised to collect data from the source, machine learning programmes allow for algorithm optimization, which enhances performance, and semantic modelling is a cognitive procedure in which observed information is used to generate semantic rules. Together these form a scalable and adaptable cognitive computing framework. The smart city platform, the IoT, data, and cognitive computing layers, and the service layer are all supported by the CIoT architecture. The smart platform, together with a number of sub-layers, is used to create a flexible smart city, which includes smart homes, buildings, transportation, and energy, among other things. This is an era of massive data generation, which includes both structured and unstructured data. The IoT layer covers data collected from sensors in a variety of devices and provides a continuous real-time view. The data layer is responsible for a variety of sensor data, including brain activity, facial expressions, the environment, social media, and gesture detection. The cognitive computing layer's algorithms include data preparation, data analysis, extraction of cognitive traits, and machine learning, which allows for the creation of more customised solutions. The final layer, the service layer, exposes the various cognitive computing applications within the smart city framework, including municipal government, medical care, driverless vehicles, firefighting, and the media sector, all of which help to improve analytical capability. Cognitive computing and Big Data are used as enabling technologies. Traditional machine learning has a limited range of benefits, receives limited input data, and cannot learn new knowledge; the implementation of the CIoT architecture is made possible by deep learning and reinforcement learning. A significant amount of data is required to train deep learning models that map data to cognitive systems providing individualised responses. Constant learning and improvement of the technique refines the strategy through an incentive (reward) system, which is similar to how humans learn. Figure 3 shows the future-enabled smart city. It provides an all-in-one solution for managing all resources in an efficient manner and for evaluating each of the resources that matter in daily life. Smart parking, electric vehicle charging, water quality monitoring, smart homes, and other Internet of Things services together make up a smart, sustainable living strategy. The importance of Big Data comes from IoT sensors and devices, which generate massive amounts of structured and unstructured data. IoT, natural language processing, deep learning, machine learning, and Big Data are all enablers of cognitive computing.
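The layered flow described above (IoT/data layer feeding a cognitive computing layer that in turn serves city applications) is only sketched conceptually in this review; the toy example below, with entirely hypothetical function names, data, and a simple rule standing in for learned inference, illustrates one way such a pipeline could be wired together.

from dataclasses import dataclass
from typing import List

@dataclass
class SensorReading:            # data layer: raw observation from an IoT device
    device_id: str
    kind: str                   # e.g. 'air_quality', 'traffic', 'energy'
    value: float

def iot_layer() -> List[SensorReading]:
    # Perception/IoT layer: in a real deployment these would stream from sensors.
    return [SensorReading("aq-01", "air_quality", 142.0),
            SensorReading("tr-07", "traffic", 0.83)]

def cognitive_layer(readings: List[SensorReading]) -> List[str]:
    # Cognitive computing layer: data preparation, analysis and (here) a toy rule
    # standing in for machine-learning-based inference.
    alerts = []
    for r in readings:
        if r.kind == "air_quality" and r.value > 100:
            alerts.append(f"{r.device_id}: poor air quality, consider rerouting traffic")
    return alerts

def service_layer(alerts: List[str]) -> None:
    # Service layer: deliver decisions to city applications (dashboards, control systems).
    for a in alerts:
        print("ACTION:", a)

service_layer(cognitive_layer(iot_layer()))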
Deep learning and reinforcement learning both require a large amount of data, which cognitive computing may help with. Scalability, natural interactivity,
Fig. 3 Future-enabled smart city
and dynamism are all advantages of using cognitive computing with large data. It saves time and effort by reducing the amount of duplicate computation, and cognitive systems gain the ability to make data-driven decisions. The backcasting approach, in turn, identifies the scale of the concerns, challenges, and problems faced in the sustainability of cities and proposes solutions for sustainability in the future. The technology side, IoT, Big Data, and related applications, captures the relationship between sustainability and intelligent cities. A case study is a research technique that allows for the analysis and comprehension of complicated phenomena in their natural setting; case studies are well-established in a variety of scientific and technological domains and are used to investigate causation in order to discover the underlying principles. The result is a sound research method that is applicable through the examination of rich individual data, with a wide range of concerns evaluated, and it leads to significant advances in the diversity of approaches. Such contributions have played a significant role in developing and strengthening case research.
4 Case Research The case study is a research technique that allows for the analysis and understanding of complicated phenomena in their natural setting. Case studies are well-known in a variety of scientific and technological domains, and they are used to investigate
causation in order to discover the fundamental principles. This has resulted in a sound study strategy that allows us to examine a wide range of unique instances in depth and to comprehend the diversity of challenges, leading to significant advances in the diversity of approaches. These contributions greatly aided in developing and strengthening case study research (Farquhar 2012).
4.1 Case Study Based on Relevance and Strategy Descriptive research entails outlining, analysing, and interpreting the current nature, propositions, and processes of data-driven cities, with the focus on the relevant conditions, that is, how these cities behave in terms of their realisation and how they develop plans based on the chosen criteria and technologies. Furthermore, because urbanisation supports both scenarios, the smart city solution incorporates a number of indications of an integrated city system in operation, which requires research in order to acquire a comprehensive and detailed understanding of the system.
4.2 Case Study Based on Backcasting In line with the core of futures research, the backcasting approach plays an essential part in case studies, supported by planning and assessing the circumstances. The goal of evaluating two dense cities [3] and two eco-friendly cities is to see how well they work together [1–3]. A case study review also provides researchers with a good platform for a variety of studies that may reveal new insights into a number of themes; with the ability to ask why, what, and how of complicated topics in context and to find solutions, a wide range of inquiries arises.
4.3 Results The results show the average analysis of each area and the various developments across the cities. The emerging technologies give more confidence that they will make up a good city. Here we compare the two cities London and Barcelona, which also compete with cities in other countries.
4.3.1 Comparative Study of London and Barcelona
In the early 2010s, the first European cities to embrace data-driven smart technology to improve the quality of their services were London and Barcelona. To better the lives of their citizens, they
Fig. 4 Comparison between London & Barcelona. Source Nikitin et al. (2016)
are primarily focusing on Big Data, especially in smart cities (London City 2018). They primarily invest in ICT infrastructure, which includes IoT devices for various stages of data collection, such as transportation, energy, environment, security, and healthcare. Both cities rely heavily on datafication. For a better life for their inhabitants, there is a wide range of modern technologies drawing on a variety of sources. Barcelona has had a significant role to play since 2015, with a desire to serve the people through emerging technologies (Bibri 2020a). Among the first four main cities analysed, a comparative study of London and Barcelona based on urban management is presented; Fig. 4 shows the results, in which the legend and the expert analysis are both represented graphically. London and Barcelona are places where world-leading technology and decisions are put into action, and London is regarded as one of the world's most technologically advanced cities. Table 1 lists the important indicators and data-driven technologies involved in the comparative investigation, as well as the readiness and technologies of the two cities. London's municipal system is ready to take up the new technologies in a distributed manner, and its creative industries provide real-time geographical services. Barcelona already has a huge collection of technologies, and improving their implementation improves efficiency.
Table 1 Implementation and readiness for data-driven technology

Cities      Preparedness indicators
London      The introduction of technologies into municipal systems
            A model of decentralised systems
            The influence of creative thinking and developments
            Distribution of activities across the city
            Services that are connected to certain geographic regions
Barcelona   Barcelona stands at a high level of technology development
            The technologies are now being put into action
            Barcelona is relying heavily on data-driven solutions
            Makes use of modelling, efficiency, and optimizations
Table 2 Energy monitoring

Type                                   No. of devices   Frequency of information
Metre for electricity                  28               Every 1 and 15 min
The external environment               7                Every 1 and 15 min
A metre for gas                        1                Every 1 and 15 min
Internal conditions (ambient)          41               Every 1 and 15 min
Analyzer of networks                   421              Every 1 and 15 min
Installation of solar thermal energy   5                Every 1 and 15 min
Temperature                            36               Every 15 min
Barcelona's smart city strategy for energy monitoring: performance can be analysed and energy consumption approached with smart metering devices and software, with IoT devices coordinating the supply based on the energy market and consumption. The MONICA project was introduced, which establishes real-time and urgent information about quality and safety. According to Barcelona, the mission of a 100% renewable supply is to be achieved through smart energy, and the installation of solar energy panels has been established as part of the distribution throughout the city. There are two types of energy monitoring systems used in solar thermal installations. The solar thermal energy produced and used, as well as the solar thermal installation's benefits to the city, are examined using data on energy usage acquired from sensors installed in municipal buildings. The frequency of information mainly concerns how often details are sent and uploaded. The type, number of devices, and frequency of information are illustrated in Table 2.
4.3.2 Smart, Sustainable Grids and Buildings-New Policy
This section considers the development of smart grids, which both cities have taken up across a wide spectrum of research and development. There are numerous
Fig. 5 Controlling the source: Perera et al. (2017)
advantages to analysing, monitoring, and controlling energy efficiency, conserving and maximising the energy supply chain, and lowering expenses. By 2022, smart grid development is expected to save 14 billion in annual energy expenditures, up from 3.4 billion in 2017; smart grid development will also increase grid reliability and efficiency (Fig. 5). Smart metre rollouts, energy-saving strategies, and sensing technology have produced positive results. The benefits are primarily concerned with supporting smart grid technology projects, while obstacles such as energy security and privacy protection remain the main impediments to smart grid development.
4.3.3 Urban Infrastructure Management
The urban infrastructure is associated with Big Data and IoT, which monitor, control, and automate the operations of tunnels, tracks, and roads. Deteriorating structural conditions of the infrastructure increase risk and compromise safety. IoT devices are used to monitor waste collection completion times and improve incident management. As a result, London and Barcelona are engaged in a smart competition to increase the use of natural resources in order to achieve a clean city infrastructure and to implement a greater number of collection solutions.
Fig. 6 Urban infrastructure
Fig. 7 Smart energy solutions based on data in SRS. Source Bibri (2020b)
London's core field of data measurement is the London Energy and Greenhouse Gas Inventory, which substantially promotes low-cost investment in smarter electrical heating, waste, and water management, including smart garbage disposal and smart energy management. London's public services are not yet fully integrated to implement data-driven solutions, but they are in the process of becoming so. Barcelona's smart waste system (Fig. 6) uses ultrasonic sensors to measure the degree of filling of waste bins and to plan collection routes properly; workers plan their duties from the data they receive, and the system reduces time and effort. The garbage is removed by suction through pipes, which decreases noise, and the energy obtained from garbage incineration is generally associated with smart eco-cities, which provide data-driven solutions. Essential information must be transmitted in real time, as shown in Fig. 7: sensors give access to consumption and production data, and transmission, distribution, and communication technologies operate across the electricity networks. In Fig. 7, number 1 represents home automation with customer support and real-time usage. Number 2 represents smart meters, which record electrical energy readings for consumption and billing and support two-way digital communication between the central system and the meter, providing usage in real time. Number 3 illustrates the functions handled by the power supplier, concerned with data-driven applications as well as grid applications, accurate monitoring and balancing, and enhancing the capability of the central system. Number 4 relates to distributed and renewable energy: the use of distributed solar and wind power stations boosts renewable energy generation.
5 New Challenges and Possibilities

Urban academics and planners hope to build a model for data-driven sustainable cities, which is becoming a prominent trend, and recent technologies will bring numerous opportunities.
5.1 Improved Efficiency

Both cities show progressive environmental performance, robust environmental policies and defined environmental goals, together with high air quality scores among the world's top five most environmentally friendly countries. Datafication makes daily activities faster and improves public health, since applications can help prevent disease through smart monitoring. Smart cities rely on ICT to provide operational efficiency as well as greater safety, and energy resources are used efficiently and analysed in order to supply better or improved efficiency [14].
5.2 Decision Making

Backcasting methodology defines future goals through scenarios and explores the long run. Its qualitative and quantitative processes create opportunities, guide actions, and enhance decision-making; future outcomes depend on the decisions made and actions taken in the present, so backcasting deals with the present, the past, and the future together. The proposed framework contains four groups of tools, five stages, and the methods required within it [15]. Participatory backcasting was utilised to achieve various goals in the Novel Protein Foods project of the Sustainable Technology Programme in the Netherlands and in the nutrition case research of the Sustainable Households (SusHouse) project. Pluralistic backcasting is proposed to support decision making and strategic planning; its goal is to bridge the gap between scientific understanding and policymakers. The method depicts various conceivable scenarios and multiple future visions for CO2 emissions in Finland's transportation sector up to the year 2050. Future studies are assisted by this higher cognitive process, and present-day decisions may play an important role [16].
5.3 Real-Time Feedback

Urban metabolism is employed to understand the flow of energy in urban environments by facilitating their description and analysis. The technical and socio-economic processes in cities lead to growth and the production of energy, and urban metabolism reduces waste arising from the duplication of energy resources by improving the efficient use of resources [17].
5.4 Civic Security

CCTV is already deployed to detect gatherings and deliver notifications in smart cities such as London and Barcelona, where it is extensively employed through smart policies and apps. Smartphone applications have been developed that allow residents to send messages about events, and platforms based on IoT and numerous sensors collect the data used for building smart cities, with advanced pattern recognition keeping an eye on public safety and responding quickly to reports of criminal activity [3]. Although some open research challenges have already been mentioned, this section discusses them and elaborates on several possible research questions that can address these issues. Planning, design, operation, and management are all significant elements that must be addressed in order to create data-driven, sustainable smart cities. Such obstacles and issues confronting the development of a sustainable environment, as well as the complicated questions that arise, must be examined by experts in the field of smart sustainable environments [18]. IoT data has a large impact on the digital world because it allows real-time choices, improved quality, and a good user experience; consider, for example, a parking system whose customers use the real-time service on a day-to-day or season-to-season basis. Solutions based on new technologies can be time intensive or fail to put data to immediate use. Complex patterns and the need for real-time data are common in many business applications, and recognising such patterns can help improve operational efficiency; data management systems should therefore be used at the top development level. Data governance and regulation of IoT data, and its use by different entities, also matter: there is a privacy concern about how data are accessed and about whether private owners or citizens have given permission to share data from sensors installed in public places for monitoring purposes.
6 Conclusion and Future Work

The complex problems of sustainability and urbanisation are addressed by building solutions that allow smart cities to become sustainable ones. Many smart and sustainable cities around the world have already begun to use Big Data technologies to their advantage. We are on the cusp of a new era in which IoT architecture combined with Big Data and backcasting is fundamentally altering the way smart sustainable cities are built. The ultimate goal is to use effective and advanced methods to improve cities and enable their contribution to sustainability. The strategic development of city sustainability is associated with backcasting, which supports better living conditions; digitised technologies for smart cities improve quality of life and decision making. Integrating advanced Big Data studies informs the design of urban systems and hence urban planning: IoT enables better design and Big Data contributes huge amounts of data. They have become major domains in sustainable urbanism, and the next generation of technologies will be focused on smart sustainable cities, with give-and-take policies suited to a faster lifestyle. The development of data analysis tools for monitoring and assessing urban infrastructure is an important part of urbanisation. Furthermore, there are various obstacles and problems in this new area of technology in relation to smart sustainable cities that must be addressed in order to achieve the intended objectives. In future work, we want to consider resource use among other aspects, take processing guarantees and fault tolerance into account, and add other sources of information.
References

1. Bibri SE, Krogstie J (2020) Smart eco-city strategies and solutions: the cases of royal seaport, Stockholm, and western Harbor, Malmö, Sweden. Urban Sci 4(1):1–42
2. Bibri SE, Krogstie J (2020d) Data-driven smart sustainable cities of the future: a novel model of urbanism and its core dimensions, strategies, and solutions. J Futures Stud (in press)
3. Bibri SE, Krogstie J (2020b) The emerging data-driven smart city and its innovative applied solutions for sustainability: the cases of London and Barcelona. Energy Inform 3:5. https://doi.org/10.1186/s42162-020-00108-6
4. Bibri SE, Krogstie J (2019) A scholarly backcasting approach to a novel model for smart sustainable cities of the future: strategic problem orientation. City, Territory, Architect 6(3):1–27
5. Bibri SE (2021) A novel model for data-driven smart sustainable cities of the future: the institutional transformations required for balancing and advancing the three goals of sustainability. Energy Inform 4:4. https://doi.org/10.1186/s42162-021-00138-8
6. Ahvenniemi H, Huovila A, Pinto-Seppä I, Airaksinen M (2017) What are the differences between sustainable and smart cities? Cities 60:234–245
7. Sheth A (2016) Internet of things to smart IoT through semantic, cognitive, and perceptual computing. IEEE Intell Syst 31(2):108–112
8. Elrawy M, Awad A, Hamed H (2017) Intrusion detection systems for IoT-based smart environments: a survey. J Cloud Comput 7:21. https://doi.org/10.1186/s13677-018-0123-6
9. Bandyopadhyay D, Sen J (2011) Internet of things: applications and challenges in technology and standardization. Wirel Pers Commun 58(1):49–69
10. Chen M, Herrera F, Hwang K (2018) Cognitive computing: architecture, technologies and intelligent applications. IEEE Access 6:19774–19783
11. Ameer S, Shah MA (2018) Exploiting big data analytics for smart urban planning. In: IEEE 88th Vehicular technology conference (VTC-Fall). Chicago, IL, USA, pp 1–5. https://doi.org/10.1109/VTCFall.2018.8691036
12. Ahmed E, Yaqoob I, Hashem IAT, Khan I, Ahmed AIA, Imran M, Vasilakos AV (2017) The role of big data analytics in the internet of things. J Comput Netw 129:459–471
13. Bibri SE, Krogstie J (2017b) The core enabling technologies of big data analytics and context-aware computing for smart sustainable cities: a review and synthesis. J Big Data 4(38). https://doi.org/10.1186/s40537-017-0091-6
14. Robinson J (1982) Energy backcasting—a proposed method of policy analysis. Energy Policy 12(1982):337–344
15. Tinker J (1996) From 'Introduction' ix–xv. In: Robinson JB et al (eds) Life in 2030: exploring a sustainable future for Canada. University of British Columbia Press, Vancouver
16. Bibri SE (2020a) Advances in the leading paradigms of urbanism and their amalgamation: compact cities, eco-cities, and data-driven smart cities, vol 2020
17. Bibri SE (2020b) The eco-city and its core environmental dimension of sustainability: green energy technologies and their integration with data-driven smart solutions. Energy Inform 3(4). https://doi.org/10.1186/s42162-020-00107-7
18. Bibri SE (2019) Big data science and analytics for smart sustainable urbanism: unprecedented paradigmatic shifts and practical advancements. Springer, Berlin, Germany
Sensor Fusion Methodologies for Landmine Detection

Parag Narkhede, Rahee Walambe, and Ketan Kotecha
Abstract Landmines are among the most deadly warfare weapons in use in the current century. Technological advancements are driving the use of sensory systems for landmine detection operations: by sensing the different characteristics of a landmine, its presence can be predicted. Multi-sensor landmine detection systems have shown significant improvement in false alarms and target misses compared with single-sensor systems. This paper discusses widely used multi-sensor fusion techniques for landmine detection applications, and sensor fusion methodologies that can be extended to landmine detection are also proposed. Experimental results for the Bayesian inference method and the Kalman filter are also presented for the landmine detection application.

Keywords Kalman filter · Landmine detection · Probability · Sensor fusion
P. Narkhede (B) Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune 412115, India. e-mail: [email protected]
R. Walambe · K. Kotecha Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International (Deemed University), Pune 412115, India. e-mail: [email protected]
K. Kotecha e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_62

1 Introduction

A landmine can be considered a device composed of explosives under high pressure. According to the International Mine Action Standards (IMAS), a landmine is designed to be exploded by the presence, proximity or contact of a person or a vehicle [1]. A landmine is typically hidden or buried underground so that, after the explosion, it injures or kills a person or damages a vehicle. According to worldwide surveys, at present more than 100 million landmines are buried underground, and the number is increasing drastically day by day. Detecting a buried landmine plays an important role in military operations as well as for civilians, since most landmines are buried in civilian and agricultural land. Animals such as dogs and rats are traditionally used in landmine search operations. However, with the advent of technology, many researchers are adopting sensor-based automatic detection systems to solve the problem of landmine detection. These automated systems save the costs of training and maintaining the animals along with their human operators, and they use the different characteristics of landmines to predict their presence.

Metal detectors (MD) and ground-penetrating radar (GPR) are the most widely used sensors in landmine detection. An MD works on the principle of electromagnetic induction: the sensor consists of a coil, and the presence of a metallic object affects the coil's inductance. GPR works by transmitting signals into the ground and analysing the signals returned from the boundaries of buried objects [2]. A landmine retains or releases heat at a different rate than the surrounding area, and these thermal characteristics are exploited for detection in the infrared detector (IR) based technique of [3]. Along with these popular sensing techniques, other techniques such as acoustic waves [4], nuclear quadrupole resonance [5], millimetre waves [6], gas-based detection [7] and biological methods have also been explored for landmine detection.

In the context of landmine detection, two terms are frequently used: false alarm and target miss. If the sensor system raises an alarm for the presence of a landmine after sensing an unrelated object, it is a false positive (false alarm). In contrast, if the system fails to detect a landmine, it is a target miss (false negative). Automatic detection systems aim to detect landmines while reducing the probability of both false alarms and target misses. However, researchers have found that the probability of false alarms and target misses is high when a detection system is designed with a single sensor. To counteract this, multiple sensors are used in a single system. Combining the information from two or more sensors is known as sensor fusion [8]. Sensor fusion provides more accurate and precise information and improves the temporal and spatial coverage of the measurement; its purpose is to improve the system's performance compared with individual sensors. Solving the problem of landmine detection using sensor fusion has been studied in the literature [9–11]. Prado and Marques discussed a method of reducing false positives via a sensor fusion methodology [12]. Sensor fusion of a metal detector and ground-penetrating radar is studied in [13–15]. An artificial-intelligence-based decision-making algorithm for landmine detection is studied by Lozano et al. in [16]. Although sensor fusion provides better results, it has not been widely explored for landmine detection; the limited availability of large datasets in the public domain may be one of the major reasons for this.
As reported in the literature, researchers have used probabilistic techniques such as the Bayesian formulation, the Dempster-Shafer algorithm, the Naive Bayes approach, the voting method and rule-based methods for sensor fusion in landmine detection applications. The literature also discusses the use of artificial intelligence and fuzzy inference based methods for sensor fusion in landmine detection systems. This paper discusses the different sensor fusion methodologies reported in the literature for landmine detection. In addition to the above-listed fusion methodologies, the Kalman Filter (KF) is widely used for sensor fusion in robotics and navigation applications; hence the KF is also discussed and its use for landmine detection is proposed. The remainder of the paper is organised as follows. Section 2 discusses the popular sensor fusion methodologies used by researchers in landmine detection systems. Section 3 presents experimental work and the results obtained for the landmine detection algorithms, and Sect. 4 discusses the landmine detection methodologies and concludes the paper.
2 Sensor Fusion Methodologies

Sensor fusion is the process of combining multiple data from the same sensor, or data derived from multiple sensors, so that the resultant information is in some sense better than the individual sensor information [8]. Sensor fusion can be of three types: data fusion, feature fusion and decision fusion. Data-based fusion techniques fuse the raw data obtained from multiple sensors. Feature-based techniques fuse features extracted from the sensory data. Decision-level fusion techniques combine the decisions taken from the individual sensor measurements, and they have shown remarkable performance compared with the other two types [9]. The following sections discuss the different multi-sensor fusion techniques employed for the detection of landmines. These fusion techniques basically consider the confidence level of each sensor when predicting the final decision. The confidence level of a sensor is its output in the presence of the target object: a higher confidence level implies a higher probability of object detection, although the scale may not be linear. If the confidence level of a sensor with measurement x is given by a discriminant function f(x), then:

If f(x) >= threshold, the object (i.e. a landmine) is present.
If f(x) < threshold, the measurement belongs to the non-object region (i.e. background).
2.1 Statistical Fusion Technique Statistical Fusion Technique is based on the probability and cost for the detection of landmines. In this technique, the decision is to be made for each sensor measurement at sample location regarding the presence (M) or absence (B) of landmine, i.e. presence of background. For each combination of decisions, the cost is calculated as:
C_{B/B}: cost of the correct decision of identifying background; C_{B/M}: cost of identifying a landmine as background (a missed landmine); C_{M/B}: cost of identifying background as a landmine (a false alarm); C_{M/M}: cost of the correct decision of identifying a landmine. Based on these costs, the expected cost of deciding background is

$\bar{C}(B/x) = C_{B/B}\,P(B/x) + C_{B/M}\,P(M/x)$    (1)

and the expected cost of deciding landmine is

$\bar{C}(M/x) = C_{M/B}\,P(B/x) + C_{M/M}\,P(M/x)$    (2)

where P(B/x) and P(M/x) are the posterior probabilities, computed using Bayes' rule as

$P(B/x) = \dfrac{P(B)\,P(x/B)}{P(x)}$    (3)

$P(M/x) = \dfrac{P(M)\,P(x/M)}{P(x)}$    (4)

Here P(x/M) and P(x/B) are the conditional probabilities, whereas P(M) and P(B) are the prior probabilities of landmine and background. The final prediction is made by comparing the expected costs: if the cost of deciding landmine is less than that of deciding background, the decision is that a landmine is present,

$\bar{C}(M/x) < \bar{C}(B/x) \;\Rightarrow\; \text{Landmine}$    (5)

The major disadvantage and challenge of this method is the estimation of the conditional probabilities. The discriminant function f(x) can be written as

$f(x) = \dfrac{P(x/M)}{P(x/B)}$    (6)

$f(x) \geq \dfrac{C_{M/B} - C_{B/B}}{C_{B/M} - C_{M/M}} = \text{threshold}$    (7)

This ratio is known as the likelihood ratio, and by changing the threshold value, i.e. by changing the different costs, the trade-off between landmine detection and false alarms can be tuned. Other methods, such as the Naive Bayes approach, the Bayesian approach and the Dempster-Shafer approach, have also been explored by researchers for multi-sensor decision-level fusion; these methods are primarily based on optimisation of the discriminant function.
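To make this rule concrete, the following Python sketch implements the cost-based likelihood-ratio test of Eqs. (6)-(7). It is only an illustration: the class-conditional densities p(x/M) and p(x/B) are modelled here as Gaussians with assumed parameters, and the costs are placeholders rather than values from the paper.

```python
# Minimal sketch of the cost-based (likelihood-ratio) decision rule of Eqs. (1)-(7).
# The class-conditional densities p(x|M) and p(x|B) are assumed known; here they
# are modelled as Gaussians purely for illustration.
from scipy.stats import norm

# Illustrative costs: correct decisions cost 0, a missed mine costs 10, a false alarm 1.
C_BB, C_BM, C_MB, C_MM = 0.0, 10.0, 1.0, 0.0

def likelihood_ratio(x, mine_mean=0.8, mine_std=0.1, bg_mean=0.2, bg_std=0.1):
    """f(x) = p(x|M) / p(x|B) for a scalar confidence measurement x."""
    return norm.pdf(x, mine_mean, mine_std) / norm.pdf(x, bg_mean, bg_std)

def decide(x):
    """Declare a landmine when the likelihood ratio exceeds the cost threshold (Eq. 7)."""
    threshold = (C_MB - C_BB) / (C_BM - C_MM)
    return "Landmine" if likelihood_ratio(x) >= threshold else "Background"

print(decide(0.75))  # high-confidence measurement -> "Landmine"
print(decide(0.25))  # low-confidence measurement  -> "Background"
```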
2.2 Bayesian Approach

The Bayesian inference technique is a probability-based tool for detecting the presence of a landmine under uncertain conditions [17]. The Bayesian formulation considers information in two stages, past information and current information: it combines the belief in the probability of the presence of a landmine at the current time and location with the prior information from sensor observations. The Bayesian fusion process is represented diagrammatically in Fig. 1. The probability of landmine detection depends on the prior probability of each sensor. For simplicity, all sensor measurements are considered independent, so that one sensor's measurement does not affect the measurement of another. If the most recent sensor measurement is $x_m$, then the probability of the presence of a landmine at the current location can be computed using Bayes' rule as

$P(M/x_m) = \dfrac{\prod_{i=1}^{N} P(x_i/M)\,P(M)}{P(x_m)}$    (8)

where $P(x_i/M)$ represents the conditional probability, and $P(M)$ and $P(x_m)$ represent the prior probabilities of the landmine and of the most recent measurement, respectively. The Bayesian approach is extensively studied for sensor fusion in landmine detection problems by Prado et al. [18]. The disadvantage of this approach is the requirement of conditional probabilities, as this information is not readily available and may not be directly estimated from past observations. The method is best suited to systems where prior knowledge is available and conditional dependency can be easily modelled [19].
Fig. 1 Bayesian sensor fusion (block diagram: the measurements from Sensor 1 and Sensor 2 are assigned probabilities using each sensor's confidence level and combined by Bayesian inference into a probability of presence)
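A minimal sketch of the fusion step of Eq. (8) is given below. The per-sensor likelihood values and the prior are illustrative assumptions, and the normalising term P(x_m) is expanded over the two hypotheses (landmine and background).

```python
# Hedged sketch of the Bayesian fusion of Eq. (8): the per-sensor likelihoods are
# multiplied with the prior and normalised against the background hypothesis.
# The likelihood values below are illustrative placeholders, not values from the paper.

def bayesian_fusion(lik_mine, lik_bg, prior_mine=0.5):
    """Combine per-sensor likelihoods P(x_i|M), P(x_i|B) into P(M|x_1..x_N)."""
    p_mine, p_bg = prior_mine, 1.0 - prior_mine
    for lm, lb in zip(lik_mine, lik_bg):
        p_mine *= lm
        p_bg *= lb
    return p_mine / (p_mine + p_bg)   # P(x_m) expanded over the two hypotheses

# Example: MD and GPR both report measurements that are likelier under "mine".
print(bayesian_fusion(lik_mine=[0.7, 0.6], lik_bg=[0.2, 0.3]))  # ~0.875
```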
2.3 Naive Bayes Approach

The Naive Bayes approach is based on a naive implementation of the likelihood ratio. It uses Bayes' theorem with the assumption that the sensor confidence levels are independent, so the conditional probabilities can be approximated as

$P(x/M) \approx \prod_{i=1}^{N} P(x_i/M)$    (9)

$P(x/B) \approx \prod_{i=1}^{N} P(x_i/B)$    (10)

Therefore the discriminant function f(x) can be represented as

$f(x) = \prod_{i=1}^{N} f_i(x_i) = \prod_{i=1}^{N} \dfrac{P(x_i/M)}{P(x_i/B)}$    (11)
Here, f i (xi ) is known as a marginal likelihood ratio. Naive Baye’s classifiers require very little training data and can be extremely fast compared to other methods of classification [20]. This approach is employed for landmine detection by [17].
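The discriminant of Eq. (11) can be sketched as follows; the per-sensor likelihoods are placeholders, and the product is evaluated in log space only for numerical stability.

```python
# Sketch of the Naive Bayes discriminant of Eqs. (9)-(11): the overall likelihood
# ratio is the product of per-sensor marginal likelihood ratios f_i(x_i).
import math

def naive_bayes_ratio(lik_mine, lik_bg):
    """f(x) = prod_i P(x_i|M) / P(x_i|B), computed in log space for stability."""
    log_f = sum(math.log(lm) - math.log(lb) for lm, lb in zip(lik_mine, lik_bg))
    return math.exp(log_f)

# Two sensors whose measurements each favour the landmine hypothesis.
f = naive_bayes_ratio([0.7, 0.6], [0.2, 0.3])
print("Landmine" if f >= 1.0 else "Background", f)  # f = 7.0
```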
2.4 Dempster-Shafer Approach

A study carried out by Dempster on multivalued mappings [21] can be applied to solve the problem of sensor fusion. The Dempster-Shafer approach can be considered an approximation, or a special case, of the Bayesian approach; the Dempster-Shafer fusion technique is extensively studied and used to solve sensor fusion problems in [22]. For the application of this method, three inputs per sensor are required: a probability mass assigned to the landmine, m(M); a probability mass assigned to the background, m(B); and an unassigned probability mass, m(M ∪ B). The probability mass for the landmine signifies the faith in the presence of a landmine, whereas the unassigned probability mass signifies the uncertainty of the sensor. These masses sum to unity,

$m(M) + m(B) + m(M \cup B) = 1$    (12)

The unity sum ensures that only the landmine and background hypotheses are under consideration. The confidence level $c_i$ of each sensor at each location needs to be mapped to the three probability masses. The mapping is done by a mapping parameter $u_i$ with range [0, 1]. The confidence level of sensor i is mapped to the probability masses as

$m_i(M) = (1 - u_i)\,c_i, \quad m_i(B) = (1 - u_i)(1 - c_i), \quad m_i(M \cup B) = u_i$    (13)

At sensor confidence level zero, $m_i(M) = 0$ and $m_i(B) = 1 - u_i$, whereas at confidence level one, $m_i(M) = 1 - u_i$ and $m_i(B) = 0$. If the probability masses of two sensors, $m_1$ and $m_2$, are combined, the resulting normalised probability masses are

$m_{1,2}(M) = \dfrac{m_1(M)\,m_2(M) + m_1(M)\,m_2(M \cup B) + m_2(M)\,m_1(M \cup B)}{k}$    (14)

$m_{1,2}(M \cup B) = \dfrac{m_1(M \cup B)\,m_2(M \cup B)}{k}$    (15)

where $k = 1 - m_1(M)\,m_2(B) - m_1(B)\,m_2(M)$. The output of the Dempster-Shafer technique is obtained by combining the probability mass assigned to the landmine with the uncertainty; the resultant discriminant function is

$f(x_1, x_2) = m_{1,2}(M) + m_{1,2}(M \cup B)$    (16)
The use of Dempster-Shafer theory for landmine detection showed a very high landmine detection rate and reduced false alarm rate [19].
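The sketch below illustrates Eqs. (12)-(16). Because the printed combination formulas are partly garbled by extraction, the code uses the standard Dempster combination rule for this two-hypothesis frame; the confidence values c and mapping parameters u are assumed for illustration.

```python
# Sketch of the Dempster-Shafer combination of Eqs. (12)-(16) for the two-class
# frame {mine M, background B} with the unassigned mass m(M u B).
# The mapping parameters u_i and confidence values are illustrative assumptions.

def masses_from_confidence(c, u):
    """Eq. (13): map a sensor confidence c in [0,1] to (m(M), m(B), m(M u B))."""
    return (1 - u) * c, (1 - u) * (1 - c), u

def dempster_combine(m1, m2):
    """Combine two mass triples; k renormalises away the conflicting mass."""
    m1M, m1B, m1U = m1
    m2M, m2B, m2U = m2
    k = 1 - m1M * m2B - m1B * m2M
    mM = (m1M * m2M + m1M * m2U + m2M * m1U) / k
    mB = (m1B * m2B + m1B * m2U + m2B * m1U) / k
    mU = (m1U * m2U) / k
    return mM, mB, mU

m_md  = masses_from_confidence(c=0.8, u=0.2)   # metal detector
m_gpr = masses_from_confidence(c=0.6, u=0.3)   # ground-penetrating radar
mM, mB, mU = dempster_combine(m_md, m_gpr)
print(mM + mU)   # Eq. (16): discriminant f = m(M) + m(M u B)
```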
2.5 Rule-Based Fusion

Rule-based techniques form a flexible approach to decision-level sensor fusion. In this method the decision thresholds can be set by intuition, and hence prior knowledge of the environment can be easily incorporated into the system. Den Breejen et al. [23] applied a rule-based method for decision fusion in landmine detection. The detection of landmines depends on different environmental factors such as soil type and weather conditions; in such cases the rule-based fusion method provides the flexibility of defining different rules for different situations. Mathematically, the rule-based method can be represented as

$(c_1 > t_1) \wedge (c_2 > t_2) \wedge \dots \wedge (c_N > t_N) \;\Rightarrow\; \text{Landmine}$    (17)

This expression consists of a conjunction (∧) over the individual decisions $c_i > t_i$, where $c_i$ is the confidence level of sensor i and $t_i$ is the corresponding threshold. A particular sensor can be ignored within a rule by setting its threshold to zero. If there exists more than one rule set, a disjunction (∨) operation can be used for the decision, which can be represented as

$\big[(c_1 > t_{1,1}) \wedge (c_2 > t_{2,1}) \wedge \dots \wedge (c_N > t_{N,1})\big] \vee \big[(c_1 > t_{1,2}) \wedge (c_2 > t_{2,2}) \wedge \dots \wedge (c_N > t_{N,2})\big] \;\Rightarrow\; \text{Landmine}$    (18)

Here $t_{i,j}$ is the threshold for sensor i in rule j. A rule set is derived from the training data such that each sensor is covered by at least one rule. The resolution of the rule-based method depends on the number of rules and the thresholds considered in the system.
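A small sketch of the rule-based decision of Eqs. (17)-(18) follows; the rule sets and thresholds are illustrative assumptions, not values from the paper.

```python
# Sketch of the rule-based fusion of Eqs. (17)-(18): a rule fires when every sensor
# confidence exceeds its threshold, and any firing rule declares a landmine.

def rule_based_fusion(confidences, rule_sets):
    """confidences: list of c_i; rule_sets: list of threshold lists [t_1j .. t_Nj]."""
    for thresholds in rule_sets:                                  # disjunction over rules
        if all(c > t for c, t in zip(confidences, thresholds)):   # conjunction within a rule
            return "Landmine"
    return "Background"

rules = [
    [0.7, 0.6],   # rule 1: both MD and GPR must be fairly confident
    [0.0, 0.9],   # rule 2: MD ignored (threshold 0), GPR alone very confident
]
print(rule_based_fusion([0.75, 0.65], rules))  # "Landmine" via rule 1
print(rule_based_fusion([0.20, 0.85], rules))  # "Background"
```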
2.6 Voting Fusion

Voting fusion is a decision-level sensor fusion methodology. This simple technique was studied for landmine detection by [23]. The approach uses N + 1 thresholds, where N is the total number of sensors in the system: N thresholds are used for the N sensors, and the remaining threshold is used to decide on the required number of votes. A vote given by a sensor is counted only if its confidence level is greater than its corresponding threshold. The discriminating function for the voting fusion method can be represented as: if f(x) = 1 then x is considered a landmine, and if f(x) = 0 then x is considered background, where f(x) is given by

$f(x) = F\!\left(\sum_{i=1}^{N} F(c_i, t_i),\; t_{N+1}\right)$    (19)

$F(c, t) = \begin{cases} 1, & c > t \\ 0, & c \le t \end{cases}$    (20)
The optimal threshold for a number of votes is selected based on the required landmine detection rate.
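The voting scheme of Eqs. (19)-(20) can be sketched as below, with assumed per-sensor thresholds and an assumed vote-count threshold.

```python
# Sketch of the voting fusion of Eqs. (19)-(20): each sensor casts a vote when its
# confidence exceeds its own threshold, and the vote count is compared with a final
# vote-count threshold t_{N+1}. Thresholds are illustrative assumptions.

def step(c, t):
    """F(c, t) of Eq. (20)."""
    return 1 if c > t else 0

def voting_fusion(confidences, thresholds, vote_threshold):
    votes = sum(step(c, t) for c, t in zip(confidences, thresholds))
    return "Landmine" if step(votes, vote_threshold) == 1 else "Background"

# Two sensors, each with its own threshold; require more than one vote.
print(voting_fusion([0.8, 0.7], [0.6, 0.5], vote_threshold=1))  # "Landmine"
print(voting_fusion([0.8, 0.3], [0.6, 0.5], vote_threshold=1))  # "Background"
```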
2.7 Fuzzy Logic

Fuzzy logic is a reasoning approach based on the degree of truth of an event. Each event is assigned a value between 0 and 1, known as its membership value: a membership value of 0 signifies the lowest and a value of 1 the highest degree of truth. Although fuzzy logic and probability theory look similar, they address different kinds of uncertainty: fuzzy logic is based on membership functions, whereas probability uses the likelihood of a condition. Fuzzy probabilities can be employed to incorporate the uncertainty. The fuzzy membership function is instrumental in defining the uncertainty in a sensor measurement by assigning probabilities to it. By taking the intersection of the different sensor measurements and finding its minimum/maximum, the decision about the presence or absence of a landmine can be made. The resultant probability obtained from fuzzification can then be defuzzified to obtain the confidence level of the particular sensor. Gaussian membership is one of the popular ways of implementing fuzzy probabilities; it is expressed as

$\mu_i(x, C_i) = \sigma_i \, G(x; C_i, \sigma_i)$    (21)

where $C_i$ is the confidence level, taken as the mean of the Gaussian curve, and the width $\sigma_i$ of the curve represents the uncertainty of the measurement (a smaller width means a larger influence). The aggregation of the sensors is done using the minimum intersection,

$\mu(x, \bar{C}) = \min\big(\mu_1(x, C_1), \dots, \mu_N(x, C_N)\big)$    (22)

and the final confidence/decision is made using a median-based defuzzification method,

$F(\bar{C}) = \operatorname{median}\big(\mu(x, \bar{C})\big)$    (23)
Meitzler et al. [24] developed the algorithms for the detection of landmines. They considered two infrared cameras to generate the image of the minefield. The obtained two grayscale images are applied as an input to the fuzzy inference system. Using the developed membership functions, fuzzification is carried to obtain a single image. The resultant image is used for the detection of landmines. On a similar line, [25] fused the data from an array of GPR sensors to detect the landmine. The disadvantage of the fuzzy logic method is the predefinition of a membership function. However, it becomes difficult to consider a particular membership function for sensor fusion as there exists a number of different kinds of membership functions, as well as the fusion methods [24].
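The following sketch illustrates the Gaussian membership and minimum-intersection aggregation of Eqs. (21)-(22). The defuzzification step of Eq. (23) is ambiguous as printed, so the sketch simply reads off the peak of the aggregated curve as a stand-in; the centres and widths are assumed values.

```python
# Sketch of the fuzzy aggregation of Eqs. (21)-(22): each sensor contributes a
# Gaussian membership curve centred on its confidence level, the curves are
# aggregated with a minimum intersection, and a single fused confidence is read
# off the aggregate (peak used here instead of the printed median rule).
import numpy as np

def membership(x, centre, width):
    """Eq. (21): width * Gaussian(x; centre, width)."""
    return width * np.exp(-0.5 * ((x - centre) / width) ** 2) / (width * np.sqrt(2 * np.pi))

x = np.linspace(0.0, 1.0, 201)                   # candidate confidence values
mu_md  = membership(x, centre=0.8, width=0.10)   # metal detector
mu_gpr = membership(x, centre=0.7, width=0.15)   # ground-penetrating radar

aggregate = np.minimum(mu_md, mu_gpr)            # Eq. (22): minimum intersection
fused = x[np.argmax(aggregate)]                  # simplified defuzzification
print(round(float(fused), 3))                    # ~0.76
```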
2.8 Kalman Filter

The Kalman Filter is a recursive algorithm that works on the principle of prediction and correction (Fig. 2). When the KF is used for sensor fusion, a process model is formulated from the measurements of one sensor, whereas the measurement model is formulated with the help of the measurements from another sensor. The combined result of both sensors is then obtained from the state estimate equation of the KF.

Fig. 2 Process of the Kalman filter (block diagram: the measurements and the initial states and filter parameters feed a prediction step and a correction step, which output the state estimates)

In this work, the KF is modelled as

Process model: $x_k = e1_k$    (24)
Measurement model: $y_k = e2_k$    (25)

Here, e1 and e2 are the measurements of the two sensors. The Kalman Filter works satisfactorily when the process model and the measurement model contain white Gaussian noise. Here the sensor measurements are used directly to define the process and measurement models, and these measurements are assumed to contain inherent zero-mean white Gaussian noise; hence no extra noise parameter is considered when modelling the KF. Since the presence of a landmine causes a change in both sensor measurements, a linear relationship is assumed between the process model and the measurement model, and a linear discrete Kalman Filter is therefore designed to perform the sensor fusion. If the process model and the measurement model possess a non-linear relationship, the non-linear versions of the KF, i.e. the Extended Kalman Filter or the Unscented Kalman Filter, would have to be considered. Since the KF is modelled directly from the sensor measurements and no external input is applied, the input parameter in the state estimation equation of the prediction step is set to zero, i.e. $u_k = 0$; and since no state transition exists, the state transition and observation matrices are set to unity, i.e. $A = [1]_{1 \times 1}$ and $C = [1]_{1 \times 1}$. The state estimation equations can then be written as follows.

Step I: Prediction

State estimation: $\hat{x}_k^- = e1_k$    (26)
Error covariance estimation: $P_k^- = P_{k-1} + Q$    (27)

Here, Q is the 1x1 matrix representing the noise covariance of the process model. In the next step (the correction step), the Kalman gain and the corrected state estimates are computed.

Step II: Correction

Kalman gain: $K_k = P_k^- \left(P_k^- + R\right)^{-1}$    (28)
Correct/update state estimate: $\hat{x}_k = \hat{x}_k^- + K_k\,(y_k - \hat{x}_k^-)$    (29)
Update error covariance: $P_k = (1 - K_k)\,P_k^-$    (30)
The Kalman gain signifies that, as the measurement error covariance approaches zero, the trust placed in the measurement y increases. When performing sensor fusion with the Kalman filter, both sensor measurements must be on the same measurement scale; hence the measurements from both sensors are normalised to [0, 1]. As the inputs to the KF are normalised values, the state estimated by the KF, $\hat{x}_k$, also lies in [0, 1]. This value between 0 and 1 can be considered a probability, providing the combined probability of the presence of a landmine based on both sensor measurements.
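A scalar sketch of the fusion filter of Eqs. (24)-(30) is given below; the noise covariances Q and R and the scan values are illustrative assumptions, not parameters from the paper.

```python
# Sketch of the scalar Kalman-filter fusion of Eqs. (24)-(30): the normalised MD
# measurement drives the process model and the normalised GPR measurement acts as
# the observation. Noise covariances Q and R are illustrative assumptions.

def kalman_fuse(md, gpr, Q=0.01, R=0.01):
    """md, gpr: lists of normalised measurements in [0, 1]; returns fused estimates."""
    P = 1.0                     # initial error covariance
    fused = []
    for e1, e2 in zip(md, gpr):
        # Prediction (Eqs. 26-27): the MD measurement is taken as the predicted state.
        x_pred = e1
        P_pred = P + Q
        # Correction (Eqs. 28-30): update with the GPR measurement.
        K = P_pred / (P_pred + R)
        x = x_pred + K * (e2 - x_pred)
        P = (1 - K) * P_pred
        fused.append(x)
    return fused

md_scan  = [0.10, 0.30, 0.90, 0.35, 0.10]   # normalised MD confidences, locations A..E
gpr_scan = [0.20, 0.40, 0.85, 0.40, 0.15]   # normalised GPR confidences, locations A..E
print([round(v, 3) for v in kalman_fuse(md_scan, gpr_scan)])
```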
3 Experimental Work and Results

The sensor fusion algorithms for fusing the measurements from two sensors were discussed in the previous section. Landmines generally contain metal parts and often have an outer metal casing; hence the metal detector (MD) is chosen as the primary sensor for fusion. However, a metal detector is unable to generate the required signatures when a landmine is made of non-metallic materials such as plastic or polycarbonate. Additionally, metal content in the ground may be due to other objects, so there is a higher possibility of false positives. Hence another sensor output is fused to support or refute the outcome of the MD, which leads to better prediction of the presence of a landmine during demining operations. Ground-penetrating radar (GPR) is also one of the widely used sensors for landmine detection. GPR contains a transmitter and a receiver module: high-frequency signals are transmitted towards the ground, i.e. towards the landmine; some of these signals are absorbed by the landmine material, and the remaining signals are reflected back and received by the receiver. The strength of the received signal then identifies the presence of a landmine. Since the strength of the reflected signals varies with the material from which they are reflected, GPR can identify metallic as well as non-metallic landmines.

For validation of the fusion algorithms described above, real data from live landmines would be required. However, due to safety and confidentiality reasons, such data was not made available. Based on interactions with domain scientists, we identified the range of the sensor data in the presence and absence of mines, and carefully generated a set of synthetic data that can be used effectively for validation. Based on the properties of the individual sensors, namely MD and GPR, synthetic data is generated and used for algorithm validation. The MD measures the magnetic field strength, which is generally a digital value assumed here to be 10 bits, and the signal strength measured by the GPR is represented in decibels.
Bayesian fusion works on probability values. In the literature, different techniques such as fuzzy logic are used for assigning probability values to sensor measurements; however, manual probability assignment is the most widely adopted method. That method is static and requires knowledge of the complete system to assign a probability value to a sensor measurement. In this work, a simple normalisation-based methodology is used instead. The measurements from the MD and the GPR are on different scales, and the Kalman Filter requires the measurements to be on the same scale; hence both sensor measurements are brought to the same scale using normalisation. To assign probability values to the sensor measurements for Bayesian fusion, and to bring both sensor measurements onto the same scale, the measurements from the MD and the GPR are normalised to [0, 1] using

$\text{normalized\_value} = \dfrac{s - \min}{\max - \min}$    (31)

Here, s is the sensor measurement, and max and min are the maximum and minimum values of the sensor measurement, denoting its complete range of measurement values.
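A small sketch of Eq. (31) applied to the two sensors is shown below; the MD and GPR ranges are assumptions consistent with the description above (a 10-bit MD value and a GPR signal strength in decibels).

```python
# Sketch of the min-max normalisation of Eq. (31) used to bring MD and GPR
# measurements onto a common [0, 1] scale. The measurement ranges are assumed.

def normalise(s, s_min, s_max):
    return (s - s_min) / (s_max - s_min)

md_raw  = 640                      # 10-bit magnetic field reading, assumed range 0..1023
gpr_raw = -42.0                    # received signal strength in dB (assumed range)
print(normalise(md_raw, 0, 1023))          # ~0.626
print(normalise(gpr_raw, -80.0, -20.0))    # ~0.633
```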
The sensor fusion results obtained with the Bayesian fusion and the Kalman filter fusion are provided in Figs. 3 and 4. Figure 3 shows the sensor fusion results when the considered landmine is metallic, whereas Fig. 4 depicts the results when the landmine is non-metallic, i.e. of plastic type. In these figures the x-axis gives the landmine scan location and the y-axis the corresponding probability for the sensor measurements. In both figures, the blue and orange curves denote the probability of the presence of a landmine based on the individual MD (P(L|MD)) and GPR (P(L|GPR)) measurements, while the green and red curves denote the fusion results for the Bayesian fusion (P(L|MD,GPR)_BF) and the Kalman filter based fusion (P(L|MD,GPR)_KF), respectively; the standard deviation (SD) is shown in black. In Fig. 3 it is assumed that a metallic landmine is present at location C, and a one-dimensional scan is considered for taking the sensor measurements (from location A to E). Locations A and E are far away from the landmine, and hence the probability of the presence of a landmine measured there by both MD and GPR is very low. As the scan moves towards the landmine location, the probability of the presence of the landmine increases gradually. The GPR measurements are strongly affected by environmental parameters and by the structure of the Earth's surface; hence a slightly higher probability is assumed even at locations A and E. As the scan location approaches the landmine location (C), the probabilities measured by MD and GPR increase and are highest at location C. The landmine considered here is metallic, so the MD sensor can detect it easily, providing very high probability measurements; the metal also affects the GPR signals, so the GPR likewise indicates a high probability of landmine presence. The fused probabilities obtained with the BF and the KF indicate the combined probability of the presence of a landmine, and the fusion results also reveal the landmine at location C through high probability values. Locations A, B, D and E are away from the exact landmine location, so the fused probabilities there are comparatively low, indicating a smaller chance of a landmine being present at these locations.
Fig. 3 Sensor fusion results for metallic-type landmine detection (probability of landmine presence, 0 to 1, against scan locations A to E for P(L|MD), P(L|GPR), P(L|MD GPR)_BF, P(L|MD GPR)_KF and SD)
Fig. 4 Sensor fusion results for plastic-type landmine detection (probability of landmine presence against scan locations A to E for the same quantities as in Fig. 3)
In the second experiment, a non-metallic (plastic) landmine is taken into consideration, and the individual sensor predictions as well as the fused predictions are represented in Fig. 4. In this experiment, too, a one-dimensional scan is considered from location A to E, and a plastic-type landmine is assumed to be at location C. It can be observed that the MD is not able to identify the landmine and produce the required signatures for non-metallic landmines.
Fig. 5 Normalized error plot for metallic landmine (normalised error, approximately -2 to 6, against scan locations A to E for the BF and KF results)
However, some non-zero probability measurements are assumed for the MD, because the MD produces signatures even from metallic particles present in the soil. The GPR measurements vary even with the presence of a plastic landmine, and hence the probability of the presence of a landmine given by the GPR starts increasing as the scanning location gets closer to landmine location C. It can be observed in Fig. 4 that the probability of the presence of a landmine derived from the MD measurements is almost constant, as the MD cannot identify non-metallic landmines, whereas the probability of a landmine at location C from the GPR measurements is very high. However, GPR measurements are susceptible to environmental changes and cannot guarantee the presence of a landmine: GPR generates signatures similar to those of a landmine even when a random plastic object is present and cannot discriminate between a landmine and a random object. Plastic landmines, however, also contain a small metallic part, whose presence allows the MD sensor to be used to detect them. Including the MD alongside the GPR measurements reduces the probability that the detected object is a random plastic object and hence increases the probability that it is a landmine. In the experimentation process, the normalised error is calculated using

$E = \dfrac{x_m - x_{ref}}{\sqrt{U_m^2 + U_{ref}^2}}$    (32)

Here, $x_m$ represents the measurement and $U_m$ the uncertainty in the measurement, while $x_{ref}$ is the reference measurement and $U_{ref}$ the uncertainty in the reference. In this experimentation the uncertainty value is taken as 5%.
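The normalised-error computation of Eq. (32) can be sketched as follows; the measurement and reference values are illustrative, and the 5% uncertainty is treated here as a relative uncertainty.

```python
# Sketch of the normalised-error metric of Eq. (32), with the 5% uncertainty used
# in the experiments interpreted as a relative uncertainty (an assumption).
import math

def normalised_error(x_m, x_ref, rel_uncertainty=0.05):
    U_m, U_ref = rel_uncertainty * x_m, rel_uncertainty * x_ref
    return (x_m - x_ref) / math.sqrt(U_m ** 2 + U_ref ** 2)

# Commonly, |E| <= 1 is read as agreement between measurement and reference.
print(round(normalised_error(x_m=0.82, x_ref=0.78), 3))
```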
The normalised error plots computed for the Bayesian inference and the Kalman filter are shown in Fig. 5. From both of these experiments, it has been observed that when a landmine is made of a metallic object, both the metal detector and the ground-penetrating radar sensors
can identify the presence with considerable probability. When the landmine is made up of plastic types, then the metal detector does not produce significant signatures; however, in this case, ground-penetrating radar produces the required signatures for the presence of the landmine. It can also be observed that MD produces similar signatures for metallic landmines as well as for random metallic objects. Similarly, GPR produces similar signatures in the case of plastic landmines as well as for any random plastic object. The combination of MD and GPR can detect metallic as well as plastic landmines.
4 Conclusion

In this paper, sensor fusion methodologies to solve the landmine detection problem are discussed. Probabilistic methods such as Bayesian inference and the Naive Bayes method use the sensor confidence levels and conditional probabilities to predict the presence of a landmine; by choosing a proper threshold value, a trade-off between detection rate and false alarms can be managed. Prior information can be easily incorporated into the system by the Bayesian inference method, although the estimation of conditional probabilities remains a concern for real-time implementation. The Naive Bayes method, on the other hand, requires significantly less training data and can perform computations faster. The discriminant functions describing the rule-based method and the voting method are similar; the voting method is in fact a subset of the rule-based method. These methods use conjunctions and disjunctions to combine information. The rule-based method has the highest number of degrees of freedom among the compared methods and can incorporate more prior knowledge. Fuzzy logic and the rule-based method both follow if-then rules and hence look similar; however, because of its sharp thresholds, the rule-based method may not perform better than the baseline. Researchers have reported that these sensor fusion methods show significant improvement in landmine detection compared with a single sensor, and that the fusion of multiple algorithms even performed better than the individual discrimination algorithms discussed here.

Among the explored sensing techniques, metal detection and ground-penetrating radar are the most widely used to detect landmines. However, individually, the metal detector fails to detect non-metallic landmines, whereas ground-penetrating radar signals are affected by environmental parameters. Hence a sensor-fusion-based methodology is proposed and tested on synthetic data. Bayesian fusion and Kalman-filter-based fusion methods are applied to fuse the metal detector and ground-penetrating radar signals. Bayesian fusion is a probabilistic technique and hence requires prior probabilities for the sensor measurements; this task of obtaining prior probabilities is solved using the normalisation method. The Kalman filter requires both sensor measurements to be on the same measurement scale; the measurements from both sensors are again brought to the same scale using normalisation and combined using the Kalman filter. The fusion results for detecting landmines are promising and increase the confidence in the measurements when classifying them as background or landmine. However, although the fusion of metal detector and ground-penetrating radar can detect plastic and metallic landmines, it cannot discriminate between a landmine and a random object with metallic or plastic parts. To overcome this issue, future work can be extended to measure the chemical properties or heat signatures of landmines alongside the traditional signatures, and artificial intelligence can also be explored for the accurate detection of landmines.

Acknowledgements This research was funded by Symbiosis International (Deemed University) under the Minor Research Project Grant 2017–18 vide letter number SIU/SCRI/MRP APPROVAL/2018/1769.
References

1. IMAS 04.10 (2003) Glossary of mine action terms, definitions and abbreviations
2. Tbarki K, Ksantini R, Said SB, Lachiri Z (2021) A novel landmine detection system based on within and between subclasses dispersion information. Int J Remote Sens 42(19):7405–7427
3. Milisavljevic N, Bloch I (2003) Sensor fusion in anti-personnel mine detection using a two-level belief function model. IEEE Trans Syst Man Cybernetics Part C (Appl Rev) 33(2):269–283
4. Xiang N, Sabatier JM (2000) Land mine detection measurements using acoustic-to-seismic coupling. Detection and Remediation Technologies for Mines and Minelike Targets V, International Society for Optics and Photonics 4038:645–655
5. Hibbs AD, Barrall GA, Czipott PV, Lathrop DK, Lee Y, Magnuson EE, Matthews R, Vierkotter SA (1998) Land mine detection by nuclear quadrupole resonance. Detection and Remediation Technologies for Mines and Minelike Targets III, International Society for Optics and Photonics 3392:522–532
6. Du Bosq TW, Lopez-Alonso JM, Boreman GD (2006) Millimeter wave imaging system for land mine detection. Appl Optics 45(22):5686–5692
7. Bielecki Z, Janucki J, Kawalec A, Mikołajczyk J, Pałka N, Pasternak M, Pustelny T, Stacewicz T (2012) Sensors and systems for the detection of explosive devices-an overview. Metrol Measure Syst 19(1):3–28
8. Elmenreich W (2002) An introduction to sensor fusion. Vienna Univ Technol Austria 502:1–28
9. Cremer F, Schutte K, Schavemaker JG, den Breejen E (2000) Toward an operational sensor-fusion system for antipersonnel land mine detection. Detection and Remediation Technologies for Mines and Minelike Targets V, International Society for Optics and Photonics 4038:792–803
10. Cremer F, Schutte K, Schavemaker JG, den Breejen E (2001) A comparison of decision-level sensor-fusion methods for anti-personnel landmine detection. Inf Fusion 2(3):187–208
11. Knox M, Rundel C, Collins L (2017) Sensor fusion for buried explosive threat detection for handheld data. Detection and Sensing of Mines, Explosive Objects, and Obscured Targets XXII, International Society for Optics and Photonics 10182:101820D
12. Prado J, Marques L (2017) Reducing false-positives in multi-sensor dataset of landmines via sensor fusion regularization. In: 2017 IEEE international conference on autonomous robot systems and competitions (ICARSC)
13. Sule SD (2017) Handheld sensor fusion for landmine detection using metal detector and gpr. Frontiers Sci 7(4):51–56
14. Kim B, Kang J, Kim D, Yun J, Choi S, Paek I (2018) Dual-sensor landmine detection system utilizing gpr and metal detector. In: 2018 international symposium on antennas and propagation (ISAP)
15. Marsh LA, Van Verre W, Davidson JL, Gao X, Podd FJW, Daniels DJ, Peyton AJ (2019) Combining electromagnetic spectroscopy and ground-penetrating radar for the detection of anti-personnel landmines. Sensors 19(15):3390
16. Florez-Lozano J, Caraffini F, Parra C, Gongora M (2020) Cooperative and distributed decision-making in a multi-agent perception system for improvised land mines detection. Inf Fusion 64:32–49
17. Frigui H, Zhang L, Gader P, Wilson JN, Ho K, Mendez-Vazquez A (2012) An evaluation of several fusion algorithms for anti-tank landmine detection and discrimination. Inf Fusion 13(2):161–174
18. Prado J, Cabrita G, Marques L (2013) Bayesian sensor fusion for land-mine detection using a dual-sensor hand-held device. In: IECON 2013-39th annual conference of the IEEE industrial electronics society, IEEE, pp 3887–3892
19. Mudigonda NR, Kacelenga R, Erickson D (2003) The application of dempster-shafer theory for landmine detection. Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications, International Society for Optics and Photonics 5099:103–112
20. Zhang H (2004) The optimality of Naive Bayes. AA 1(2):3
21. Dempster AP (2008) Upper and lower probabilities induced by a multivalued mapping. In: Classic works of the Dempster-Shafer theory of belief functions, Springer, pp 57–72
22. Xu S, Hou Y, Deng X, Chen P, Ouyang K, Zhang Y (2021) A novel divergence measure in dempster-shafer evidence theory based on pignistic probability transform and its application in multi-sensor data fusion. Int J Distribut Sensor Netw 17(7):15501477211031472
23. Breejen ED, Schutte K, Cremer F (1999) Sensor fusion for antipersonnel landmine detection: a case study. Detection and Remediation Technologies for Mines and Minelike Targets IV, International Society for Optics and Photonics 3710:1235–1245
24. Meitzler TJ, Bryk D, Sohn E, Lane K, Raj J, Singh H (2003) Fuzzy-logic-based sensor fusion for mine and concealed weapon detection. Detection and Remediation Technologies for Mines and Minelike Targets VIII, International Society for Optics and Photonics 5089:1353–1362
25. Gader P, Keller JM, Frigui H, Liu H, Wang D (1998) Landmine detection using fuzzy sets with gpr images. In: 1998 IEEE international conference on fuzzy systems proceedings. IEEE World congress on computational intelligence (Cat. No. 98CH36228)
Machine Learning Techniques to Predict Intradialytic Hypotension: Different Algorithms Comparison on Unbalanced Data Sets

Domenico Vito
Abstract The purpose of this work is the development of an effective and efficient tool for the in-line prediction of intradialytic hypotension events based on pre-session patient-specific parameters. The availability of a large clinical database compiled during the DialysIS project allowed the performance of different machine learning algorithms (i.e., Random Forest, Artificial Neural Network, Support Vector Machines) to be evaluated. Given the unbalanced nature of the analysed data set, different minority-class oversampling techniques were also implemented and compared. Despite the absence of a consolidated state of the art for the proposed application, satisfying accuracies have been reached (e.g., 88.26%, 2.80% accuracy for the SVM model). The lack of similar studies in the literature makes this work an interesting starting point for further improvements.

Keywords Machine learning · Dialysis hypotension · Data analysis
D. Vito (B) Politecnico di Milano, p.zza Leonardo da Vinci 32, Milano 20090, MI, Italia. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 551, https://doi.org/10.1007/978-981-19-6631-6_63

1 Introduction

The availability of several new-generation digital instruments and sensors nowadays makes it easier to produce and gather large amounts of clinically related data, referring not only to longitudinal trends in the evolution of a pathology but also to single treatments [1]. Thanks to the use of large amounts of data, standard medical practice is moving from relatively ad hoc and subjective decision-making to evidence-based and personalised healthcare [2]. Using data for decision-making is the key on the way to personalised medical treatments: the promise of data-driven decision-making is a huge improvement in diagnostics and therapy. In recent years, a dramatic increase in the use of computation-intensive methods to analyse biomedical signals has been observed. One of the many applications of this approach is to create classifiers that can separate subjects into (usually) two or (rarely) more classes based on attributes measured in each subject. A potential use of such a classifier is to analyse biomedical data and detect or diagnose disease, making the most of the power of machine learning in discovering hidden information patterns and correlations [3]. Powerful techniques are needed to investigate such patterns and relationships among medical variables and patient pathophysiological states, and a machine learning approach can be especially useful in chronic disease management, where the pathophysiological condition of a patient is steadily monitored.

Renal pathologies affect about 10% of the world population [4]. Despite recent advances in nephrology care, renal pathologies and related complications still remain one of the main causes of death and invalidity all over the world [5]. Intradialytic hypotension (IDH) is the most common adverse complication during haemodialysis (HD) [6–9]. Intra-HD nausea, vomiting and fainting can be symptoms of hypotension, which imply discomfort for the patient and extra work for the clinicians. Moreover, when IDH determines a premature interruption of the dialysis session, the patient's blood may not be adequately purified, and in the long term frequent hypotension episodes may lead to permanent heart and intestine damage. In recent years, IDH prevention has been investigated through different approaches, highlighting the multifactorial nature of the phenomenon [10–14]. Its early prediction and prevention would dramatically improve the quality of life of patients with end-stage renal disease. Large quantities of data can indeed be collected during the administration of renal replacement therapies: HD patients are treated two to three times per week in clinics, so the data related to their pathological condition and treatment can be easily recorded (Davenport [15]). The DialysIS project is a European project based on the collaboration of the dialysis units of different hospitals in Italy and Switzerland; during its course, a wide, common database has been collected [16]. The high number of available data made it possible to evaluate the effectiveness of machine learning techniques in developing models able to assess, at the beginning of an HD session, the risk of IDH onset during that session.
2 Materials and Methods

IDH prediction has been approached as a binary classification problem. Because IDH events are recorded in only a fraction of the sessions, the considered data set is unbalanced and over-sampling was required; combinations of different techniques (i.e. bootstrap, SMOTE) were tested and compared. Given the lack of similar studies in the literature, different algorithms have been compared: Random Forest (RF), Artificial Neural Network (ANN) and Support Vector Machine (SVM). The developed models have been evaluated in terms of accuracy (ACC), precision (PRC), recall (RCL) and area under the Receiver Operating Characteristic curve (ROC), and compared with each other and with similar applications found in the literature.
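As an illustration of this kind of pipeline, the sketch below oversamples the minority class with SMOTE and then trains and scores the three classifier families using scikit-learn and imbalanced-learn. The synthetic feature matrix stands in for the (confidential) DialysIS data; it is not the data set used in this study, and the model settings are defaults rather than the tuned configurations.

```python
# Minimal sketch of the evaluation pipeline described above: minority-class
# oversampling with SMOTE, then Random Forest / SVM / neural network classifiers
# scored with accuracy, precision, recall and ROC AUC. The synthetic data below
# stands in for the DialysIS features and IDH labels.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=808, n_features=20, weights=[0.85, 0.15], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)   # oversample IDH sessions

models = {
    "RF": RandomForestClassifier(random_state=0),
    "SVM": SVC(probability=True, random_state=0),
    "ANN": MLPClassifier(max_iter=1000, random_state=0),
}
for name, model in models.items():
    model.fit(X_res, y_res)
    pred = model.predict(X_te)
    prob = model.predict_proba(X_te)[:, 1]
    print(name, accuracy_score(y_te, pred), precision_score(y_te, pred),
          recall_score(y_te, pred), roc_auc_score(y_te, prob))
```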
2.1 Data

The database used for this study was composed of 808 dialysis sessions from 130 patients. The DialysIS database was compiled by integrating treatment-recorded data and patient-specific parameter measurements. For each patient, registry data were also recorded in order to have a better definition of their medical case:

• name, surname, patient gender, birth date;
• height [cm], weight [Kg];
• main pathologies that led to Chronic Renal Disease (CRD);
• other concurrent pathologies (i.e. heart condition, diabetes, etc.), previous therapies;
• prescriptions and relative doses (if taken);
• dialytic therapy (i.e. HD or HDF) and dialytic age [months];
• estimated dry weight [Kg] and weight loss [Kg].

For each session, the following parameters were registered by the software:

• every minute:
  – time [min], ultrafiltration speed (UF) [ml/h], ultrafiltrate volume [ml], weight loss [Kg];
  – dialysate flux [ml/min] and temperature [°C];
  – hematic flux [ml/min] and cumulative hematic volume [ml];
  – fistula and body temperature [°C];
  – haematocrit [%], haemoglobin [g/dl] and Residual Blood Volume (RBV) [%];
  – conductibility;
  – arterious and venous pressure;
• every 15 min:
  – systolic and diastolic pressure [mmHg];
  – heart rate [bpm].

Hematic composition was measured at the beginning, at the end and every hour during the haemodialysis session. The hematic composition analysis considered:

• pH, oxygen partial pressure (pO2), carbon dioxide (pCO2), haematocrit (Ht), haemoglobin (Hb);
• electrolytes concentration (calcium, magnesium, sodium, potassium, chlorine, phosphates), glycemia, urea;
• albumin, total calcaemia and phosphorus (only at the beginning), and the dialysate bath (only at the beginning).

Other exams such as parathormone (PTH), Beta-2 microglobulin and creatinine plasmatic level were performed at the beginning and at the end of the RST therapy session. Patient hydration was analysed at the start of the first and last studied sessions by bioimpedentiometry. During each session, eventual complications, infusions, and food and drink consumption were also registered. IDH events were automatically identified using the IDH-D dialysis criteria, an improvement of the IDH characterization criterion proposed in [17]. There is an IDH episode if:
SAPt