English Pages 814 [782] Year 2023
Lecture Notes in Electrical Engineering 1078
Bhuvan Unhelkar Hari Mohan Pandey Arun Prakash Agrawal Ankur Choudhary Editors
Advances and Applications of Artificial Intelligence & Machine Learning Proceedings of ICAAAIML 2022
Lecture Notes in Electrical Engineering Volume 1078
Series Editors Leopoldo Angrisani, Department of Electrical and Information Technologies Engineering, University of Napoli Federico II, Napoli, Italy Marco Arteaga, Departament de Control y Robótica, Universidad Nacional Autónoma de México, Coyoacán, Mexico Samarjit Chakraborty, Fakultät für Elektrotechnik und Informationstechnik, TU München, München, Germany Jiming Chen, Zhejiang University, Hangzhou, Zhejiang, China Shanben Chen, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China Tan Kay Chen, Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore Rüdiger Dillmann, University of Karlsruhe (TH) IAIM, Karlsruhe, Baden-Württemberg, Germany Haibin Duan, Beijing University of Aeronautics and Astronautics, Beijing, China Gianluigi Ferrari, Dipartimento di Ingegneria dell’Informazione, Sede Scientifica Università degli Studi di Parma, Parma, Italy Manuel Ferre, Centre for Automation and Robotics CAR (UPM-CSIC), Universidad Politécnica de Madrid, Madrid, Spain Faryar Jabbari, Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA, USA Limin Jia, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China Janusz Kacprzyk, Intelligent Systems Laboratory, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Alaa Khamis, Department of Mechatronics Engineering, German University in Egypt El Tagamoa El Khames, New Cairo City, Egypt Torsten Kroeger, Intrinsic Innovation, Mountain View, CA, USA Yong Li, College of Electrical and Information Engineering, Hunan University, Changsha, Hunan, China Qilian Liang, Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA Ferran Martín, Departament d’Enginyeria Electrònica, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain Tan Cher Ming, College of Engineering, Nanyang Technological University, Singapore, 
Singapore Wolfgang Minker, Institute of Information Technology, University of Ulm, Ulm, Germany Pradeep Misra, Department of Electrical Engineering, Wright State University, Dayton, OH, USA Subhas Mukhopadhyay, School of Engineering, Macquarie University, NSW, Australia Cun-Zheng Ning, Department of Electrical Engineering, Arizona State University, Tempe, AZ, USA Toyoaki Nishida, Department of Intelligence Science and Technology, Kyoto University, Kyoto, Japan Luca Oneto, Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genova, Genova, Italy Bijaya Ketan Panigrahi, Department of Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, Delhi, India Federica Pascucci, Department di Ingegneria, Università degli Studi Roma Tre, Roma, Italy Yong Qin, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China Gan Woon Seng, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore Joachim Speidel, Institute of Telecommunications, University of Stuttgart, Stuttgart, Germany Germano Veiga, FEUP Campus, INESC Porto, Porto, Portugal Haitao Wu, Academy of Opto-electronics, Chinese Academy of Sciences, Haidian District Beijing, China Walter Zamboni, Department of Computer Engineering, Electrical Engineering and Applied Mathematics, DIEM—Università degli studi di Salerno, Fisciano, Salerno, Italy Junjie James Zhang, Charlotte, NC, USA Kay Chen Tan, Department of Computing, Hong Kong Polytechnic University, Kowloon Tong, Hong Kong
The book series Lecture Notes in Electrical Engineering (LNEE) publishes the latest developments in Electrical Engineering, quickly, informally and in high quality. While original research reported in proceedings and monographs has traditionally formed the core of LNEE, we also encourage authors to submit books devoted to supporting student education and professional training in the various fields and application areas of electrical engineering. The series covers classical and emerging topics concerning:
• Communication Engineering, Information Theory and Networks
• Electronics Engineering and Microelectronics
• Signal, Image and Speech Processing
• Wireless and Mobile Communication
• Circuits and Systems
• Energy Systems, Power Electronics and Electrical Machines
• Electro-optical Engineering
• Instrumentation Engineering
• Avionics Engineering
• Control Systems
• Internet-of-Things and Cybersecurity
• Biomedical Devices, MEMS and NEMS
For general information about this book series, comments or suggestions, please contact [email protected]. To submit a proposal or request further information, please contact the Publishing Editor in your country:
China: Jasmine Dou, Editor ([email protected])
India, Japan, Rest of Asia: Swati Meherishi, Editorial Director ([email protected])
Southeast Asia, Australia, New Zealand: Ramesh Nath Premnath, Editor ([email protected])
USA, Canada: Michael Luby, Senior Editor ([email protected])
All other Countries: Leontina Di Cecco, Senior Editor ([email protected])
** This series is indexed by EI Compendex and Scopus databases. **
Editors
Bhuvan Unhelkar, University of South Florida, Sarasota, FL, USA
Hari Mohan Pandey, School of Science and Technology, Bournemouth University, Poole, UK
Arun Prakash Agrawal, Sharda School of Engineering and Technology, Sharda University, Greater Noida, India
Ankur Choudhary, Sharda School of Engineering and Technology, Sharda University, Greater Noida, India
ISSN 1876-1100 ISSN 1876-1119 (electronic) Lecture Notes in Electrical Engineering ISBN 978-981-99-5973-0 ISBN 978-981-99-5974-7 (eBook) https://doi.org/10.1007/978-981-99-5974-7 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore Paper in this product is recyclable.
Contents
Development of Big Data Dimensionality Reduction Methods for Effective Data Transmission and Feature Enhancement Algorithms
    H. M. Subrahmanya and T. Shivaprakash
IndianFood-7: Detecting Indian Food Items Using Deep Learning-Based Computer Vision
    Ritu Agarwal, Nikunj Bansal, Tanupriya Choudhury, Tanmay Sarkar, and Neelu Jyothi Ahuja
Prediction of Protein-Protein Interaction Using Support Vector Machine Based on Spatial Distribution of Amino Acids
    Monika Khandelwal, Ranjeet Kumar Rout, and Saiyed Umer
A Computational Comparison of VGG16 and XceptionNet for Mango Plant Disease Recognition
    Vinita and Suma Dawn
Generate Artificial Human Faces with Deep Convolutional Generative Adversarial Network (DCGAN) Machine Learning Model
    Charu kaushik and Shailendra Narayan Singh
Robust Approach for Person Identification Using Three-Triangle Concept
    Nikita kumari, Ashish Mishra, Aadil Shafi, and Rashmi Priyadarshini
COVID-19 Disease Detection Using Explainable AI
    Vishant Mehta, Jai Mehta, Pankti Nanavati, Vighnesh Naik, and Nilkamal More
Towards Helping Visually Impaired People to Navigate Outdoor
    Rhea S Shrivastava, Abhishek Singhal, and Swati Chandna
An Analysis of Deployment Challenges for Kubernetes: A NextGen Virtualization
    Nisha Jaglan and Divya Upadhyay
A New Task Offloading Scheme for Geospatial Fog Computing Environment Using M/M/C Queueing Approach
    Manoj Ranjan Mishra, Bibhuti Bhusan Dash, Veena Goswami, Sandeep Nanda, Sudhansu Shekhar Patra, and Rabindra Kumar Barik
Face Recognition Using Deep Neural Network with MobileNetV3-Large
    Sakshi Bisht, Abhishek Singhal, and Charu Kaushik
Detection of BotNet Using Extreme Learning Machine Tuned by Enhanced Sine Cosine Algorithm
    Nebojsa Bacanin, Miodrag Zivkovic, Zlatko Hajdarevic, Aleksandar Petrovic, Nebojsa Budimirovic, Milos Antonijevic, and Ivana Strumberger
Cloud Services Management Using LSTM-RNN
    Archana Yadav, Shivam Kushwaha, Jyoti Gupta, Deepika Saxena, and Ashutosh Kumar Singh
Detection of Various Types of Thyroid-Related Disease Using Machine Learning
    Aashish Nair, Akash Sonare, Jay Deshmukh, and Pranav Gadhave
Implementation of WSN in the Smart Hanger to Facilitate MRO Operations on Aircraft Fuselage Using Machine Learning
    Tripti Singh and Surendra Rahamatkar
Wi-Fi Controlled Smart Robot for Objects Tracking and Counting
    V. Prithviraj and K. C. Sriharipriya
Speech Recognition for Kannada Using LSTM
    D. S. Jayalakshmi, K. P. Sathvik, and J. Geetha
Computer Vision-Based Smart Helmet with Voice Assistant for Increasing Driver Safety
    Yash Grover, Shreedev Ganesh Mishra, Susheet Kumar, Priyank Verma, and Soumya Ranjan Nayak
Predicting Aging Related Bugs with Automated Feature Selection Techniques in Cloud Oriented Softwares
    Harguneet Kaur and Arvinder Kaur
Time Series Analysis of Crypto Currency Using ARIMAX
    Sahil Sejwal, Kartik Aggarwal, and Soumya Ranjan Nayak
A Machine Learning Approach Towards Prediction of User’s Responsiveness to Notifications with Best Device Identification for Notification Delivery
    Devansh Srivastav, Hinal Srivastava, Naman Mittal, Bayare Deepak Singh, Induvadan Bachu, Ashwani Kumar Yadav, Pushpa Gothwal, MSSK Sharma, and Bhargav Krishnamurthy
Real-Time Full Body Tracking for Life-Size Telepresence
    Shafina Abd Karim Ishigaki and Ajune Wanis Ismail
Solar Power Generation Forecasting Using Deep Learning
    Anant Gupta and Suman Lata
Applications of Big Five Personality Test in Job Performance
    Vanshika Nehra, Simran Bafna, Renuka Nagpal, Deepti Mehrotra, and Rajni Sehgal
Local Mean Decomposition Based Epileptic Seizure Classification Using Ensemble Machine Learning
    Parikha Chawla, Shashi B. Rana, Hardeep Kaur, and Kuldeep Singh
An Equilibrium Optimizer-Based Ensemble for Aspect-Level Sentiment Classification
    Tanu Sharma and Kamaldeep Kaur
Malware Detection Using Big Data and Deep Learning
    Nitin Pise
Review on the Static Analysis Techniques Used for Privacy Leakage Detection in Android Apps
    Manish Verma and Parma Nand
Radon Transformation-Based Mammogram Image Classification
    Bhanu Prakash Sharma and Ravindra Kumar Purwar
Unmanned Arial Vehicle as a Tool for Facemask Detection
    Rajendra Kumar, Vibhor Gupta, Ngonidzashe Mathew Kanyangarara, Kalembo Vikalwe Shakrani, and Prince Tinashe Parowa
A Novel Ensemble Trimming Methodology to Predict Academic Ranks with Elevated Accuracies
    Nidhi Agarwal and Devendra K. Tayal
A Multiclass Tumor Detection System Using MRI
    G. Gayathri and S. Sindhu
Analysis on DeepFake Dataset, Manipulation and Detection Techniques: A Review
    Aale Rasool and Rahul Katarya
Vituperative Content Detection: A Multidomain Architecture Using OpenCV
    Sai Manoj Cheruvu, Domakonda Sesank, K. Sriram Sandilya, and Manu Gupta
Paraphrase Generator Using Natural Language Generation and Natural Language Processing
    Rosy Salomi Victoria Devasahayam, Megha Nagarajan, and E. R. Rajavarshini
An Investigational Study on Implementing Integrated Frameworks of Machine Learning and Evolutionary Algorithms for Solving Real-World Applications
    Sree Roja Rani Allaparthi and Jeyakumar G
Deep Learning Based Cryptocurrency Real Time Price Predication
    Nidhi Shukla, Ashutosh Kumar Singh, and Vijay Kumar Dwivedi
Secure Electronic Polling Process Utilizing Smart Contracts
    Chhaya Dubey, Dharmendra Kumar, Ashutosh Kumar Singh, and Vijay Kumar Dwivedi
Machine Learning-Based Smart Waste Management in Urban Area
    Sutanika Barik, Shaheen Naz, Usha Tiwari, and Monika Jain
Twenty V’s: A New Dimensions Towards Bigdata Analytics
    G. M. Dhananjaya and R. H. Goudar
A Computer Vision-Based Human–Computer Interaction System for LIS Patients
    Ravi Khatri, Ankit Khatri, Abhishek Kumar, and Pankaj Rawat
Semantic Segmentation for Edge Devices: Survey and an End-to-End Readiness on the Target Platform
    Vishwa Teja Gujjula, Nisarg Joshi, Srikanth Sambi, Arpit Awasthi, and Jitesh Kr. Singh
Performance Comparison of Machine Learning and Deep Learning Models in DDoS Attack Detection
    Poonam Siddarkar, R. H. Goudar, Geeta S. Hukkeri, S. L. Deshpande, Rohit B. Kaliwal, Pooja S. Patil, and Prashant Janagond
Application of Artificial Intelligence in Insurance Sector: A Bibliometric View
    Dolly Mangla, Renu Aggarwal, Mohit Maurya, Sweta Dixit, and Mridul Dharwal
Evaluation of Stock Prices Prediction Using Recent Machine Learning Algorithms
    Harshit Kesharwani, Tamoshree Mallick, Aakash Nakarmi, and Gaurav Raj
Deep Learning Techniques for Explosive Weapons and Arms Detection: A Comprehensive Review
    Anant Bhatt and Amit Ganatra
Real-Time Implementation of Automatic License Plate Recognition System
    Govind Suryakant More and Prashant Bartakke
Deep Learning and Optical Character Recognition-Based Automatic UIDAI Details Extraction System
    Sayali Ramesh Kulkarni and Ranjit Sadakale
Comprehensive Dashboard for Alzheimer’s Disease Through Machine Learning
    Sneha S. Narayan and V. K. Annapurna
Empirical Study of Meta-learning-Based Approach for Predictive Mutation Testing
    Chetna and Kamaldeep Kaur
A Comparative Analysis of Transfer Learning-Based Techniques for the Classification of Melanocytic Nevi
    Sanya Sinha and Nilay Gupta
Machine Learning Using Radial Basis Function with K Means Clustering for Predicting Cardiovascular Diseases
    Lamiaa Mohammed Salem Akoosh, Farheen Siddiqui, Sameena Naaz, and M. Afshar Alam
Land Registry System Using Smart Contract of Blockchain Technology
    Ashok Kumar Yadav, Dheeraj Kumar, Sangam, and Tanya Srivastava
An Image Based Harassment Detection System Using Emotion Feature Vector Based on MTCNN
    Gunjan Bhardwaj, Pooja Pathak, and Parul Choudhary
Recent Advancement in Pancreatic Cancer Diagnosis Using Machine Learning-Based Methods: A Systematic Review
    Deepak Painuli, Suyash Bhardwaj, and Utku Köse
Interactive Gaming Experience in VR Integrated with Machine Learning
    Sarthak Bhan, Lokita Varma, Kush Maniar, Russel Lobo, Sanket Shah, and Nilesh Patil
Comparative Analysis of U-Net-Based Architectures for Medical Image Segmentation
    Nidhi Beniwal and Srishti Vashishtha
Real-Time Fruit Detection and Recognition for Smart Refrigerator
    Aparna Aravind Patil and Prashant Bartakke
MCDM Based Evaluation of Software Defect Prediction Models
    Ajay Kumar and Kamaldeep Kaur
Leveraging Zero-Trust Architecture for Improving Transaction Quality of Financial Services Using Locational Data
    Priyali Sakhare and Ashish Jadhav
Mobile Computing Based Security Analysis of IoT Devices
    Nikita Valte and Borkar Gautam
Abusive Content Detection on Social Networks Using Machine Learning
    Monika Chhikara and Sanjay Kumar Malik
Ultrasound Nerve Image Segmentation Using Attention Mechanism
    Nemil Shah, Jay Bhatia, Nimit Vasavat, Kanishk Shah, and Pratik B. Kanani
About the Editors
Bhuvan Unhelkar is an accomplished IT professional and Professor of IT at the University of South Florida, USA. He is also the Founding Consultant at MethodScience and a Co-Founder and Director at PlatiFi. He is proficient in business analysis & requirements modeling, software engineering, Big Data strategies, agile processes, mobile business, and green IT. His domain experience spans banking, finance, insurance, government, and telecommunications. Dr. Unhelkar has designed, developed, and customized a suite of industrial courses, which are regularly delivered to business executives and IT professionals globally. Dr. Unhelkar’s thought leadership is reflected through multiple Cutter Executive Reports, various journals, and more than 25 books. He is a winner of the Computerworld Object Developer Award (1995), Consensus IT Professional Award (2006) and IT Writer Award (2010). He has a Doctorate in the area of “Object Orientation” from the University of Technology, Sydney (1997). He is a Fellow of the Australian Computer Society, a Senior Member of IEEE, a Professional Scrum Master, and a Lifetime Member of the Computer Society of India.
Hari Mohan Pandey is an associate professor in computer science at Edge Hill University, UK. He specialized in computer science & engineering for his undergraduate degree, postgraduate degree and Ph.D. His research areas include artificial intelligence, soft computing techniques, natural language processing, language acquisition and machine learning algorithms. He is also the author of various books on computer science and engineering. He has published over 80 scientific papers in reputed journals and conferences. He has also served as session chair and leading guest editor, and has delivered keynotes. He is also a recipient of the prestigious award “The Global Award for the Best Computer Science Faculty of the Year 2015”, the award for completing the INDO-US project “GENTLE”, an award (Certificate of Exceptionalism) from the Prime Minister of India and an award for developing innovative teaching and learning models for higher education.
Arun Prakash Agrawal is currently working as a professor in the Department of Computer Science and Engineering at Sharda University, Greater Noida, India. He obtained his Master’s and Ph.D. in Computer Science and Engineering from Guru Gobind Singh Indraprastha University, New Delhi, India. He has several research papers to his credit in refereed journals and conferences of international and national repute. He has also taught short-term courses at the Swinburne University of Technology, Melbourne, Australia and Amity Global Business School, Singapore. His research interests include machine learning, software testing, artificial intelligence and soft computing. He has rich experience in organizing international conferences and has chaired several sessions in conferences of repute. He is also on the reviewer board of several SCI/Scopus-indexed international journals.
Ankur Choudhary is a professor in the Department of Computer Science and Engineering at Sharda University, Greater Noida, India. He is also the Deputy Director of the Teaching & Learning Centre at Sharda University. He specialized in Computer Science and Engineering and pursued his Ph.D. at the School of ICT, Gautam Buddha University (GBU), Greater Noida, India. His areas of research are nature-inspired optimization, artificial intelligence, software engineering, medical image processing and digital watermarking. He has published research papers in various SCI/Scopus-indexed international conferences and journals. He has also organized various special sessions at international conferences, served as session chair and delivered keynotes.
Development of Big Data Dimensionality Reduction Methods for Effective Data Transmission and Feature Enhancement Algorithms
H. M. Subrahmanya and T. Shivaprakash
Abstract With the advent of 5G globally, large volumes of data are available in digital form across multiple sectors. Sales, production, marketing, medical, manufacturing, and many other sectors generate voluminous data that policymakers use for decision-making. In practice, many attributes are irrelevant or insignificant, which motivates dimensionality reduction of the data obtained. Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) are dimensionality reduction methods that can be applied alongside popular machine learning algorithms. Missing value ratio, low variance filter, ensemble trees, etc., are among the feature selection techniques used in this context. An agent-based algorithm is proposed for data reduction in both the feature and instance dimensions. The results indicate that the experiments perform well when datasets are huge; dimensionality reduction may be irrelevant for smaller datasets. PCA provides better performance than LDA. Further, the performance of the method developed using a reduced dataset is found to be favorable over other techniques.
Keywords Linear discriminant analysis (LDA) · Principal component analysis (PCA) · Decision tree induction · Support vector machine (SVM) · Naive Bayes classifier · Random forest classifier · Machine learning (ML)
H. M. Subrahmanya (B) Department of Computer Science and Engineering, KNS Institute of Technology, Bangalore, India e-mail: [email protected] Affiliated to Visvesvaraya Technological University, Belagavi, Karnataka, India T. Shivaprakash Department of Computer Science and Engineering, VijayaVittala Institute of Technology, Bangalore, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 B. Unhelkar et al. (eds.), Advances and Applications of Artificial Intelligence & Machine Learning, Lecture Notes in Electrical Engineering 1078, https://doi.org/10.1007/978-981-99-5974-7_1
1 Introduction
The Big Data era has evolved alongside 5G technologies and poses the challenge of solving large-scale machine learning problems [1]. Volume, velocity, variety, variability, veracity, and value are the inherent properties of complex data. Data generation, management, analytics, and application are the phases involved in Big Data applications such as finance, healthcare, and defense. Even though ML provides tools and methods to solve a variety of problems, the dimensionality of the data makes it inefficient [2]. Hence, the scalability, computational speed-up, and performance of ML algorithms need to be analyzed with massive datasets [3]. The approaches considered for large data are divide and conquer, parallelization, sampling, granular computing, and feature selection [4]. These techniques are currently proposed as ways to achieve scalability and speed-up [5]. A modified algorithm is needed to handle the problem of decomposing huge datasets through prototype selection from clusters of instances, stacking, rotation procedures, and an agent-based population algorithm [6]. Cardiotocography (CTG) tests generate huge, high-dimensional datasets of fetal health information and pregnancy periods [7]. Many ML techniques are available for dimensionality reduction through decomposition of attributes, decomposition of temporal information, and transformation of data to numerical form to reduce complexity [8]. In this regard, decision tree, Naïve Bayes, random forest, and SVM algorithms are used together with LDA and PCA to validate performance in the context of dimensionality reduction using open-source datasets [9].
2 Related Works
The recovery of potential fields of huge datasets is demonstrated through dimensionality reduction by identifying patterns using artificial neural networks and other ML methods. The obtained results are compared with the proposed model to validate the effectiveness of the results. Prediction of heart disease has been attempted using ECG signals with Convolutional Neural Networks (CNN) [10], which can provide features at different positions. A new method is proposed to select a subset of fields from the input dataset in order to increase computational accuracy in less time [11]. Medical practitioners are assisted by the developed system so that they can make decisions in less, and optimal, time [12]. Dimensionality reduction and optimal feature selection have improved the accuracy of medical diagnosis [13]. Preprocessing, image segmentation, noise cancellation, and signal calibration are the stages involved in image processing techniques on such huge datasets [14]. Median filtering and equalization are used for de-noising the datasets during preprocessing [15]. An SVM classifier followed by an ensemble technique is used for the confidence voting method. In some research works, SVM is integrated with the simulated annealing method to diagnose medical diseases based on large historical medical records [16]. Sometimes, a hybrid approach integrating SVM with a genetic algorithm is proposed for reducing dataset dimensions, and the results were promising [17]. A holistic approach with a private key and homomorphic encryption is proposed for securing medical Big Data health records [18]. Experimental results show that the proposed methodology gives promising results in assigning records to the correct class [19]. A decision tree-based model is used to identify the risk factors involved in medical records [20]. The authors have proposed an integrated method combining PCA with k-means to predict diabetes across different age groups [21]. Datasets with varying backgrounds and varying attributes were collected for the study [22]. PCA is applied first for dimension reduction and k-means later for clustering on some brain tumor samples. Images of particular body parts are studied and a CNN is used for accuracy identification [23].
3 Objectives
Based on the above survey of previous work by many researchers, the following objectives are defined:
• Explore the behavior of dimensionality reduction over a variety of dimensions;
• Analyze the behavior of the two considered algorithms with standard metrics;
• Verify that dimensionality reduction of the considered datasets does not degrade performance to any great extent.
4 Proposed Dimensionality Reduction Method PCA transforms correlated to uncorrelated variables through statistical procedure for exploratory analysis of data. As it is used for checking the relationships among variables, it helps in reducing the dimensionality [24]. On the other hand, LDA aims to target with clear separation capability in order to reduce computational cost [25]. It maximizes the data variance and separation of multiple classes [26]. This helps in mapping the dimension space to a lesser subspace without hampering the total class information. The two major methods used to reduce the dimension of the datasets are LDA and PCA. The proposed method in this paper follows this hybrid version and is presented in this section. The parameters considered for the study are baseline, accelerations, light decelerations, severe decelerations, prolonged decelerations, percentage of time with abnormal short-term variability, mean value of short-term variability, percentage of time with abnormal long-term variability, and mean value of long-term variability [27]. The proposed work verifies the reduction in features and dimension through ML algorithms and performance is recorded using open-source health datasets [28, 29]. Initially, feature selection, normalization, and converting categories to equivalent
H. M. Subrahmanya and T. Shivaprakash
Fig. 1 Proposed dimensionality reduction technique based on PCA and LDA
numerical data are performed [30]. The min–max scaling method is used for normalization. LDA is applied to extract the most relevant features, and the ML algorithms are run on them [31]. PCA is then applied to the result and evaluated with respect to various metrics. Over multiple iterations, the results are evaluated using the mentioned metrics and recorded [32]. The results obtained with and without dimension reduction by the two considered algorithms are then analyzed. The performance of each ML algorithm in both experiments is investigated [33]. Datasets with varying dimensions are considered for the study. Figure 1 shows the overall procedure for reducing the dimension of the dataset using LDA and PCA.
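The preprocessing steps named above (min–max normalization and converting categories to numbers) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names, the sample values, and the sorted-order integer encoding are hypothetical and are not taken from the paper.

```python
def min_max_scale(column):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column has no spread; map every value to zero.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def encode_categories(column):
    """Map each distinct category to an integer code, in sorted order."""
    codes = {label: i for i, label in enumerate(sorted(set(column)))}
    return [codes[label] for label in column]


# Hypothetical values, not taken from the paper's datasets.
baseline = [120.0, 132.0, 140.0, 150.0]
fetal_state = ["normal", "suspect", "normal", "pathologic"]

scaled = min_max_scale(baseline)
encoded = encode_categories(fetal_state)
print(scaled)
print(encoded)
```

After this step every numeric attribute lies in [0, 1], so no single attribute dominates the variance that PCA later tries to preserve.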
5 Analysis of the Obtained Results
The entire process is implemented in Python on datasets collected from an open-source ML repository. A high-end desktop with a suitable operating system and at least 8 GB of RAM is required for this experimentation [34]. The metrics used for evaluating the proposed model are precision, recall, and F1-score. Other metrics are available, but the study is limited to these three standard ones. Results are recorded at three levels: micro average, macro average, and weighted average [35]. Table 1 shows the confusion matrices that record the performance of the ML algorithms on their own. Table 2 shows the matrices of the ML algorithms integrated with PCA. Finally, Table 3 shows the confusion matrices of the ML algorithms integrated with LDA [36]. As discussed earlier, health records with varying dimensions are considered for the experiment [37]. Here, initially, after all preprocessing steps, each of the ML
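Development of Big Data Dimensionality Reduction Methods …
Table 1 Confusion matrices for ML algorithms
| Algorithm | Average | Precision | Recall | F1-score |
|---|---|---|---|---|
| Decision tree | Micro avg. | 0.98 | 0.98 | 0.98 |
| Decision tree | Macro avg. | 0.98 | 0.97 | 0.98 |
| Decision tree | Weighted avg. | 0.98 | 0.98 | 0.98 |
| Naïve Bayes | Micro avg. | 0.97 | 0.97 | 0.97 |
| Naïve Bayes | Macro avg. | 0.93 | 0.91 | 0.91 |
| Naïve Bayes | Weighted avg. | 0.97 | 0.97 | 0.97 |
| Random forest | Micro avg. | 0.99 | 0.98 | 0.92 |
| Random forest | Macro avg. | 0.99 | 0.96 | 0.94 |
| Random forest | Weighted avg. | 0.99 | 0.97 | 0.95 |
| SVM | Micro avg. | 0.97 | 0.98 | 0.91 |
| SVM | Macro avg. | 0.96 | 0.98 | 0.93 |
| SVM | Weighted avg. | 0.95 | 0.97 | 0.97 |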
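Table 2 Confusion matrices for ML algorithms with reduced dimensions using PCA
| Algorithm | Average | Precision | Recall | F1-score |
|---|---|---|---|---|
| Decision tree-PCA | Micro avg. | 0.97 | 0.89 | 0.91 |
| Decision tree-PCA | Macro avg. | 0.95 | 0.90 | 0.92 |
| Decision tree-PCA | Weighted avg. | 0.93 | 0.93 | 0.92 |
| Naïve Bayes-PCA | Micro avg. | 0.91 | 0.89 | 0.92 |
| Naïve Bayes-PCA | Macro avg. | 0.92 | 0.89 | 0.93 |
| Naïve Bayes-PCA | Weighted avg. | 0.98 | 0.93 | 0.95 |
| Random forest-PCA | Micro avg. | 0.90 | 0.88 | 0.93 |
| Random forest-PCA | Macro avg. | 0.95 | 0.90 | 0.94 |
| Random forest-PCA | Weighted avg. | 0.92 | 0.93 | 0.95 |
| SVM-PCA | Micro avg. | 0.91 | 0.98 | 0.96 |
| SVM-PCA | Macro avg. | 0.92 | 0.94 | 0.96 |
| SVM-PCA | Weighted avg. | 0.94 | 0.94 | 0.94 |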
algorithms, that is, random forest, decision tree, SVM, and Naïve Bayes are applied independently and results are recorded as per the metrics considered [38]. Later, as per the procedure explained earlier, these datasets are subjected to LDA followed by PCA for dimensionality reduction. At each stage, again the performance of the
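Table 3 Confusion matrices for ML algorithms with reduced dimensions using LDA
| Algorithm | Average | Precision | Recall | F1-score |
|---|---|---|---|---|
| Decision tree-LDA | Avg. (micro) | 0.93 | 0.99 | 0.94 |
| Decision tree-LDA | Avg. (macro) | 0.85 | 0.93 | 0.94 |
| Decision tree-LDA | Avg. (weighted) | 0.89 | 0.93 | 0.94 |
| Naïve Bayes-LDA | Avg. (micro) | 0.80 | 0.94 | 0.97 |
| Naïve Bayes-LDA | Avg. (macro) | 0.89 | 0.94 | 0.93 |
| Naïve Bayes-LDA | Avg. (weighted) | 0.92 | 0.97 | 0.92 |
| Random forest-LDA | Avg. (micro) | 0.94 | 0.97 | 0.90 |
| Random forest-LDA | Avg. (macro) | 0.95 | 0.91 | 0.90 |
| Random forest-LDA | Avg. (weighted) | 0.95 | 0.91 | 0.90 |
| SVM-LDA | Avg. (micro) | 0.96 | 0.93 | 0.90 |
| SVM-LDA | Avg. (macro) | 0.93 | 0.93 | 0.92 |
| SVM-LDA | Avg. (weighted) | 0.93 | 0.92 | 0.96 |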
ML algorithms is recorded. From the above results, it is evident that nearly the same accuracy is obtained even after the reduction in dimensions. Tables 2 and 3 show almost the same performance and favor applying LDA followed by PCA. These values justify the attempted dimensionality reduction procedure.
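The three averaging levels used in Tables 1 to 3 can be illustrated for precision with a short sketch. The per-class counts below are hypothetical and are not the paper's data; the function name `precision_averages` is likewise an assumption made for illustration. Micro averaging pools the counts of all classes, macro averaging takes the unweighted mean of per-class scores, and weighted averaging weights each class score by its support.

```python
def precision_averages(per_class):
    """per_class maps label -> (true_positives, false_positives, support)."""
    precisions = {}
    for label, (tp, fp, _support) in per_class.items():
        precisions[label] = tp / (tp + fp) if (tp + fp) else 0.0

    total_tp = sum(tp for tp, _, _ in per_class.values())
    total_fp = sum(fp for _, fp, _ in per_class.values())
    total_support = sum(s for _, _, s in per_class.values())

    # Micro: pool all counts before dividing.
    micro = total_tp / (total_tp + total_fp)
    # Macro: every class counts equally.
    macro = sum(precisions.values()) / len(precisions)
    # Weighted: classes count in proportion to their support.
    weighted = sum(precisions[k] * per_class[k][2] for k in per_class) / total_support
    return micro, macro, weighted


# Hypothetical three-class counts (not from the paper).
counts = {
    "normal": (90, 10, 100),
    "suspect": (15, 5, 30),
    "pathologic": (8, 2, 10),
}
micro, macro, weighted = precision_averages(counts)
print(round(micro, 3), round(macro, 3), round(weighted, 3))
```

On imbalanced health records the macro average can sit well below the micro and weighted averages, because a poorly classified minority class pulls it down.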
6 Conclusion
This article discusses dimensionality reduction of open-source Big Data medical datasets using LDA and PCA with four standard ML algorithms. Initially, a dataset with 38 interdependent attributes is reduced to 24 attributes, without a significant drop in the evaluation metric values. An accuracy of around 94% has been recorded for the combined procedure. LDA reduces the number of attributes far more drastically than PCA. The reduced attributes are used to train commonly used ML algorithms, and the performance of each method is analyzed. It is noticed that classifiers perform considerably better with PCA than with LDA. Random forest and decision tree show promising results compared with the other two ML classifiers. When the same pattern of experiments is performed on various other health records, different, and sometimes negative, behaviors are noticed. It is difficult to predict the behavior of PCA and LDA combinations when the datasets change. The effectiveness of the proposed technique can be analyzed further on other
types of datasets, such as images, text, etc. Other ML approaches such as CNN, deep neural networks, and recursive and recurrent neural networks can also be considered as future enhancements.
References
1. Alfirevic Z, Gyte GM, Cuthbert A, Devane D (2017) Continuous cardiotocography (CTG) as a form of electronic fetal monitoring (EFM) for fetal assessment during labour. Cochrane Database Syst Rev 2:1–108
2. Zheng, Casari A (2018) Feature engineering for machine learning: principles and techniques for data scientists. O’Reilly Media, Newton, MA, USA
3. Meidan Y, Bohadana M, Mathov Y, Mirsky Y, Shabtai A, Breitenbacher D, Elovici Y (2018) NBaIoT: network-based detection of IoT botnet attacks using deep autoencoders. IEEE Pervas Comput 17(3):12–22
4. Li Z, Ma X, Xin H (2017) Feature engineering of machine-learning chemisorption models for catalyst design. Catal Today 280:232–238
5. Cheng CA, Chiu HW (2017) An artificial neural network model for the evaluation of carotid artery stenting prognosis using a national-wide database. In: Proceedings 39th annual international conference of the IEEE engineering in medicine and biology society (EMBC). pp 2566–2569
6. Zaman S, Toufiq R (2017) Codon based back propagation neural network approach to classify hypertension gene sequences. In: Proceedings international conference on electrical, computer and communication engineering (ECCE). pp 443–446
7. Tang H, Wang T, Li M, Yang X (2018) The design and implementation of cardiotocography signals classification algorithm based on neural network. Comput Math Methods Med 2018. Art. no. 8568617
8. Zhang Y, Zhao Z (2017) Fetal state assessment based on cardiotocography parameters using PCA and AdaBoost. In: 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). pp 1–6
9. Marques JAL, Cortez PC, Madeiro JPDV, Fong SJ, Schlindwein FS, Albuquerque VHCD (2019) Automatic cardiotocography diagnostic system based on Hilbert transform and adaptive threshold technique. IEEE Access 7:73085–73094
10. Cömert Z, Şengür A, Akbulut Y, Budak Ü, Kocamaz AF, Güngör S (2019) A simple and effective approach for digitization of the CTG signals from CTG traces. IRBM 40(5):286–296
11. Abdar M, Makarenkov V (2019) CWV-BANN-SVM ensemble learning classifier for an accurate diagnosis of breast cancer. Meas 146:557–570
12. Tao Z, Huiling L, Wenwen W, Xia Y (2019) GA-SVM based feature selection and parameter optimization in hospitalization expense modeling. Appl Soft Comput 75:323–332
13. Orphanou K, Dagliati A, Sacchi L, Stassopoulou A, Keravnou E, Bellazzi R (2018) Incorporating repeating temporal association rules in Naïve Bayes classifiers for coronary heart disease diagnosis. J Biomed Informat 81:74–82
14. Qummar S, Khan FG, Shah S, Khan A, Shamshirband S, Rehman ZU, Ahmed Khan I, Jadoon W (2019) A deep learning ensemble approach for diabetic retinopathy detection. IEEE Access 7:150530–150539
15. Zhu C, Idemudia CU, Feng W (2019) Improved logistic regression model for diabetes prediction by integrating PCA and K-means techniques. Informat Med Unlocked 17. Art. no. 100179
16. Kaya E, Pehlivanlı AÇ, Sekizkardeş EG, Ibrikci T (2017) PCA based clustering for brain tumor segmentation of T1w MRI images. Comput Methods Programs Biomed 140:19–28
17. Hu L, Cui J (2019) Digital image recognition based on fractional-order-PCA-SVM coupling algorithm. Meas 145:150–159
18. Thippa Reddy G, Khare N (2016) FFBAT-optimized rule based fuzzy logic classifier for diabetes. Int J Eng Res Afr 24:137–152
19. Khare N, Reddy GT (2018) Heart disease classification system using optimised fuzzy rule based algorithm. Int J Biomed Eng Technol 27(3):183–202
20. Reddy GT, Khare N (2017) An efficient system for heart disease prediction using hybrid OFBAT with rule-based fuzzy logic model. J Circuits, Syst Comput 26(04). Art. no. 1750061
21. Gadekallu TR, Khare N (2017) Cuckoo search optimized reduction and fuzzy logic classifier for heart disease and diabetes prediction. Int J Fuzzy Syst Appl 6(2):25–42
22. Reddy GT, Khare N (2017) Hybrid firefly-bat optimized fuzzy artificial neural network based classifier for diabetes diagnosis. Int J Intell Eng Syst 10(4):18–27
23. Bhattacharya S, Maddikunta, Kaluri R, Singh S, Gadekallu TR, Alazab M, Tariq U (2020) A novel PCA-firefly based XGBoost classification model for intrusion detection in networks using GPU. Electron 9(2):219
24. Vinayakumar R, Alazab M, Soman KP, Poornachandran P, Al-Nemrat A, Venkatraman S (2019) Deep learning approach for intelligent intrusion detection system. IEEE Access 7:41525–41550
25. Vinayakumar R, Alazab M, Soman KP, Poornachandran P, Venkatraman S (2019) Robust intelligent malware detection using deep learning. IEEE Access 7:46717–46738
26. Kaur S, Singh M (2019) Hybrid intrusion detection and signature generation using deep recurrent neural networks. Neural Comput Appl 1:1–19
27. Gadekallu TR, Khare N, Bhattacharya S, Singh S, Maddikunta PKR, Ra IH, Alazab M (2020) Early detection of diabetic retinopathy using PCA-firefly based deep learning model. Electronics 9(2):274
28. Rehman MHU, Liew CS, Abbas A, Jayaraman PP, Wah TY, Khan SU (2016) Big data reduction methods: a survey. Data Sci Eng 1(4):265–284
29. Wang X, He Y (2016) Learning from uncertainty for big data: future analytical challenges and strategies. IEEE Syst, Man, Cybern Mag 2(2):26–31
30. Triguero, Galar M, Bustince H, Herrera F (2017) A first attempt on global evolutionary unsampling for imbalanced big data. In: 2017 IEEE congress on evolutionary computation (CEC). San Sebastian, Spain, pp 2054–2061
31. Li Y, Li T, Liu H (2017) Recent advances in feature selection and its applications. Knowl Inf Syst 53(3):551–577
32. Czarnowski, Jędrzejowicz P Stacking and rotation-based technique for machine learning classification with data reduction. In: 2017 IEEE international conference on innovations in intelligent systems and applications (INISTA). Gdynia, Poland, pp 55–60
33. Czarnowski, Jędrzejowicz P (2016) An approach to machine classification based on stacked generalization and instance selection. In: 2016 IEEE international conference on systems, man, and cybernetics (SMC). Budapest, Hungary, pp 4836–4484
34. Czarnowski I, Jędrzejowicz P (2017) Learning from examples with data reduction and stacked generalization. J Intell & Fuzzy Syst 32(2):1401–1411
35. Arnaiz-González A, Díez-Pastor JF, Rodríguez JJ, García-Osorio C (2016) Instance selection of linear complexity for big data. Knowl-Based Syst 107:83–95
36. Liu C, Wang W, Wang M, Lv F, Konan M (2017) An efficient instance selection algorithm to reconstruct training set for support vector machine. Knowl-Based Syst 116:58–73
37. Chen SH, Venkatachalam R (2017) Agent-based modelling as a foundation for big data. J Econ Methodol 24(4):362–383
38. Xue B, Zhang M, Browne WN, Yao X (2016) A survey on evolutionary computation approaches to feature selection. IEEE Trans Evol Comput 20(4):606–626
IndianFood-7: Detecting Indian Food Items Using Deep Learning-Based Computer Vision Ritu Agarwal, Nikunj Bansal, Tanupriya Choudhury, Tanmay Sarkar, and Neelu Jyothi Ahuja
Abstract Indian food dishes are famous around the globe. Object detection in images is a well-known task in computer vision. Detecting Indian food dishes with deep learning-based models will maximize the impact of computer vision-based models in the food domain. The use of deep learning-based models for recognizing and detecting Indian food items is still limited due to the lack of datasets on Indian food. We introduce the IndianFood-7 dataset, which contains more than 800 images with 1700+ annotations spread across seven popular Indian food items. We report a comparative study of several variants of the current state-of-the-art object detection models, YOLOR and YOLOv5. Furthermore, we inspect and evaluate model performance by reviewing the predictions made on the images of the test dataset.
All authors contributed equally to this work and all are the first author.
R. Agarwal (B), Research Scholar, School of Computer Science, University of Petroleum and Energy Studies (UPES), Dehradun, Uttarakhand 248007, India
T. Choudhury, Professor, CSE Dept., Symbiosis Institute of Technology, Symbiosis International University, Lavale, Mulshi, Pune, Maharashtra 412115, India
T. Choudhury (B), Ex-Professor, School of Computer Science, University of Petroleum and Energy Studies (UPES), Uttarakhand 248007 Dehradun, India
N. Bansal · N. J. Ahuja, School of Computer Science, University of Petroleum and Energy Studies, Dehradun, Uttarakhand 248007, India
T. Sarkar, Department of Food Processing Technology, Government of West Bengal, Malda Polytechnic, Bengal State Council of Technical Education, Malda, West Bengal 732102, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 B. Unhelkar et al. (eds.), Advances and Applications of Artificial Intelligence & Machine Learning, Lecture Notes in Electrical Engineering 1078, https://doi.org/10.1007/978-981-99-5974-7_2
R. Agarwal et al.
Keywords Indian food dishes · Deep learning · Object detection · Computer vision · Indian food dataset
1 Introduction
With the rise of autonomous vehicles, intelligent video surveillance, facial detection, and various people-counting applications, fast and accurate object detection methods are in growing demand [1]. These methods involve not only recognizing and classifying each object in an image but also localizing it by drawing an appropriate bounding box around it. Object detection is a computer vision technique that concentrates on identifying and labeling objects in images, videos, or even live streams. Object detection models are trained on large amounts of annotated visuals so that they can perform this task on new data. The model is fed input visuals and returns fully marked-up output visuals. A key element is the bounding box, a simple rectangle that marks the edges of the detected object. Each box is accompanied by a label describing the target object, such as a person, a car, or a dog. Bounding boxes can overlap to show multiple objects in a given shot, as long as the model has prior knowledge of the objects it is tagging. The main computer vision tasks are as follows. Image classification is the prediction of the class of an object in a picture. Segmentation is the task of grouping pixels with similar properties together, rather than using bounding boxes, to identify objects [2]. Object localization seeks to find the location of one or more objects in an image, while object detection identifies all objects and their borders without focusing on placement [3]. Object detection is not viable without models designed specifically for that task. These models are trained on large amounts of visual content so that detection accuracy can later be optimized automatically. Training and refining models at scale is made easier by readily available datasets such as COCO (Common Objects in Context) [4].
In 2016, researchers from the University of Washington, the Allen Institute for Artificial Intelligence, and Facebook AI Research proposed “You Only Look Once” (YOLO) [5]. YOLO is a family of deep learning-based object detection networks that improved both accuracy and speed. Its primary contribution is combining classification and object detection in a single network. Instead of extracting regions and features separately, YOLO performs every step in a single pass through a single network, hence the name “You Only Look Once.” It predicts all bounding boxes across all classes for an image in parallel. This enables real-time speed and end-to-end training while maintaining high average precision. It divides the image into an S × S grid and predicts bounding boxes described by x, y, w, h, and a confidence score. Deep learning models are still rarely trained to detect Indian food because few datasets are available. We introduce the IndianFood-7 dataset to train, validate
and test deep learning models. Various object detection models are available, such as Region-based Fully Convolutional Networks (R-FCN) [6], region-based CNN (R-CNN) [7], Single Shot Detection (SSD) [8], and You Only Look Once (YOLO) [5]. This paper compares multiple variants of two current object detection methods, YOLOv5 [9] and YOLOR (You Only Learn One Representation) [10], and evaluates their performance through predictions made on a test dataset. Our goal is to expand our Indian food dataset to increase the impact of computer vision-based methods in detecting popular Indian food items.
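The grid assignment mentioned in the introduction can be sketched as below: in the original YOLO formulation, the grid cell that contains the center of an object is responsible for predicting it. The function name `grid_cell` is hypothetical, and S = 7 is the value used in the original YOLO paper rather than a setting reported here; later variants such as YOLOv5 use multi-scale, anchor-based assignment instead.

```python
def grid_cell(x_center, y_center, img_w, img_h, s):
    """Return (row, col) of the S x S grid cell containing a box center.

    Coordinates are in pixels; centers on the right or bottom edge are
    clamped into the last cell.
    """
    col = min(int(x_center / img_w * s), s - 1)
    row = min(int(y_center / img_h * s), s - 1)
    return row, col


# A box centered at (208, 100) in a 416 x 416 image, with S = 7 (assumed).
cell = grid_cell(208, 100, 416, 416, 7)
print(cell)
```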
2 Literature Review
To maintain a healthy lifestyle, a person must keep track of their complete food intake. Smartphones and the apps available on them can do this efficiently, but the approach is prone to different kinds of errors. Researchers have developed numerous apps and systems to evaluate meal intake [11], correct these errors, and produce proper estimates. The most recent apps capture a picture of the food with the smartphone camera and record its consumption. Existing models and apps that recognize food items and their calories [12] have many drawbacks: either the result is not accurate, or the object is not detected reliably. Moreover, previous research was evaluated on western food items only and is not adequate for identifying other cuisines worldwide. Indian food is well known worldwide for its taste and unique health advantages [13]. Despite its many benefits, Indian cuisine has so far been overlooked by these innovations. In light of these conditions and previous findings, the primary objective of the proposed research is to improve training accuracy more broadly and decrease the error rate in identifying food items. Also, along with the unique features of the Indian haute cuisine, more explanation is given on the apex of Indian gastronomy and the South Indian cuisine. Our focus is to precisely calculate and calibrate the calories of the food consumed, which calls for a system that employs an image processing method and defined algorithms. Thames et al. [14] developed their own dataset of real food items with high-accuracy nutritional content annotations and high-resolution images.
The value of curating such a dataset is demonstrated by training a computer vision algorithm that can predict the macronutrient and caloric content of complex food items more accurately than a professional dietician. Sathish et al. [15] studied the recognition of Indian food and the estimation of its calories, for which they developed and analyzed an OpenCV-based deep learning model. The primary Indian food dataset was prepared by collecting pictures from the internet and preprocessing them to fair quality. After extensive analysis, the developed model detected Indian food with 99.19% accuracy on the training data and 95.30% on the testing data, and the calorie estimate was accurate to within ±10 cal of the real food. Among the InceptionV3, VGGNet16, and basic CNN models compared in that work, InceptionV3 achieved the highest training accuracy of 99.19%, compared with 82.33% for the basic CNN and 97.13% for VGGNet16. Yue et al. [16] introduced a deep learning-based object detection algorithm that lets empty-dish recycling robots automatically collect dishes in canteens and restaurants. They proposed a lightweight object detection model, YOLO-GD (GhostNet and depthwise convolution), for detecting dishes such as cups and chopsticks in images. The model is 1/5 the size of the state-of-the-art YOLOv4 and achieves a precision of 97.38%, which is 3.41% higher than YOLOv4. With estimation applied, the inference time per image of YOLO-GD decreases from 207.92 to 32.75 ms and the model reaches 97.42%, slightly higher than the model without estimation. Fakhrou et al. [17] introduced a customized food recognition dataset and trained deep CNN models on it. The customized dataset contains different varieties of fruits and 29 different food dishes. Using transfer learning, they analyzed the performance of several state-of-the-art deep CNNs for food recognition. The ensemble model they developed achieved a food recognition accuracy of 95.55% on the customized dataset, outperforming the state-of-the-art CNN models. Hassannejad et al. [18] introduced an architecture for classifying food images and evaluated it on three datasets: UEC FOOD 256, UEC FOOD 100, and ETH Food-101.
On these three datasets they achieved 76.17%, 81.45%, and 88.28% top-1 accuracy and 97.27%, 92.58%, and 96.88% top-5 accuracy, respectively. Mezgec and Koroušić Seljak [19] worked with a recognition dataset of 225,953 images (512 × 512 pixels) covering 520 food and drink classes, on which they achieved a classification accuracy of 86.72%, and with a detection dataset of 130,517 images, on which they achieved 94.47%. They also tested on a dataset of self-acquired smartphone photos for a real-world analysis, which included images from a patient with Parkinson’s disease, achieving a top-five accuracy of 55%, an encouraging result for real-world images. In addition, NutriNet was tested on the University of Milano-Bicocca 2016 (UNIMIB2016) food image dataset and was found to perform better than the baseline recognition result. Mishra et al. [20] introduced a model that determines the presence of food objects in an image using a trained deep learning-based object detection model, which is the first such attempt. They introduced Allergen30, a dataset of food items that can trigger an adverse reaction. Allergen30 contains more than 6,000 annotated images of 30 common food items. They compared the performance of multiple variants of the current state-of-the-art
YOLOR and YOLOv5 object detection methods. Moreover, they analyzed the behavior of the methods by examining the predictions made on the test images. The evaluation shows that all algorithms perform well in terms of precision and achieve an mAP higher than 0.74. YOLOv5s, with an mAP of 0.811, reports the fastest inference time of 5 ms, while YOLOR-P6 achieves the best accuracy.
3 Methodology
In this paper, we detect seven popular Indian dishes in images using deep learning-based object detection models trained on our IndianFood-7 dataset. We used different variants of YOLOv5 and YOLOR, which are state-of-the-art object detection models. We evaluated and compared model performance using F1-score, mAP, precision, and recall.
3.1 Data Preparation
We first prepared a list of popular Indian food items. Based on this list, we collected images of the different classes of Indian food to build the dataset. We then annotated each image with bounding boxes that locate the specified food items. The resulting dataset, IndianFood-7, comprises more than 800 images with 1700+ annotations spread across seven classes of popular Indian food items. The dataset, along with more information about the number of annotations per class, can be downloaded from https://universe.roboflow.com/indianfood/indian_food-pwzlc. Table 1 shows the Indian food class labels and the number of annotations per label in our dataset. The flow of dataset preparation is shown in Fig. 1. Samples of annotated images from the IndianFood-7 dataset are shown in Fig. 2.
Table 1 Food class labels and the number of annotations per label in the IndianFood-7 dataset
| S.no | Indian food classes | Number of annotations |
|---|---|---|
| 1 | Samosa | 381 |
| 2 | Idli | 310 |
| 3 | Gulab jamun | 250 |
| 4 | Besan cheela | 214 |
| 5 | Poha | 205 |
| 6 | Dosa | 199 |
| 7 | Palak paneer | 192 |
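YOLOv5 training reads one text line per annotated box in the form `class x_center y_center width height`, with coordinates normalized by the image size. The sketch below converts a pixel-space box into that form. It assumes the dataset is exported in this label format; the function name, the class index 0, and the box coordinates are hypothetical.

```python
def to_yolo_label(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box to a normalized YOLO label line."""
    x_c = (x_min + x_max) / 2 / img_w   # box center, as a fraction of width
    y_c = (y_min + y_max) / 2 / img_h   # box center, as a fraction of height
    w = (x_max - x_min) / img_w         # box width, as a fraction of width
    h = (y_max - y_min) / img_h         # box height, as a fraction of height
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


# Hypothetical box for class index 0 in a 416 x 416 image.
line = to_yolo_label(0, 104, 52, 312, 260, 416, 416)
print(line)
```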
Fig. 1 Flow of dataset preparation
Fig. 2 IndianFood-7: samples of dataset annotated images
3.2 Our Experimentation on Object Detection Models
Owing to their strong performance in both speed and accuracy, we trained and compared the YOLOR-P6, YOLOv5l, YOLOv5s, and YOLOv5m algorithms on the dataset we prepared. The models were trained on an Nvidia Tesla T4 GPU for 150 epochs. The flow of the experimentation is shown in Fig. 3.
Fig. 3 Flow of experimentation
4 Results
Our research work trains YOLOv5 and YOLOR models on our curated dataset to detect the presence of Indian food items. We trained three variants of YOLOv5, namely YOLOv5s, YOLOv5m, and YOLOv5l, and the P6 variant of YOLOR. We used an input image size of 416 × 416, trained for 150 epochs, and used a Google Colab notebook to train the models. Figures 4, 5, and 6 show how the evaluation metrics precision, recall, F1-score, and mean average precision (mAP) (y-axis) vary over the 150 training epochs (x-axis) for the YOLOv5s, YOLOv5m, and YOLOv5l models, respectively. Table 2 gives a detailed analysis of the performance of the different trained models on the testing dataset. We find that the YOLOR model performs best in terms of mAP, with a score of 0.893. In terms of F1-score, the YOLOv5l model performs best, with a score of 0.869. Higher recall is observed in the larger models (YOLOv5l and YOLOR), i.e., they can detect the presence of
Fig. 4 Training summary for YOLOv5s over 150 epochs
Indian food items in the image more reliably than the smaller models (YOLOv5s and YOLOv5m). Table 3 shows the performance of the YOLOv5l model across all the labels. Table 4 shows the performance of the YOLOv5m model across all the labels. Table 5 shows the performance of the YOLOv5s model across all the labels.
Fig. 5 Training summary for YOLOv5m over 150 epochs
Fig. 6 Training summary for YOLOv5l over 150 epochs
Table 2 Evaluating the models on the testing dataset
| Model | Precision | Recall | F1-score | mAP |
|---|---|---|---|---|
| YOLOR | 0.782 | 0.899 | 0.836 | 0.893 |
| YOLOv5s | 0.934 | 0.792 | 0.857 | 0.89 |
| YOLOv5m | 0.892 | 0.834 | 0.862 | 0.878 |
| YOLOv5l | 0.881 | 0.857 | 0.869 | 0.879 |
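The F1-score column of Table 2 is the harmonic mean of the precision and recall columns, which can be checked with a few lines of Python. The helper name `f1_score` is ours; the numbers are the precision, recall, and reported F1 values from Table 2.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) taken from Table 2.
table2 = {
    "YOLOR": (0.782, 0.899, 0.836),
    "YOLOv5s": (0.934, 0.792, 0.857),
    "YOLOv5m": (0.892, 0.834, 0.862),
    "YOLOv5l": (0.881, 0.857, 0.869),
}

for model, (p, r, reported) in table2.items():
    print(model, round(f1_score(p, r), 3), reported)
```

The recomputed values agree with the reported column to three decimals, and they show why YOLOv5l leads on F1 despite not having the highest precision or recall: its two scores are the most balanced.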
Table 3 Detailed analysis of the YOLOv5l model across all the different labels
| Classes of food | Precision | Recall | F1-score | mAP |
|---|---|---|---|---|
| Besan cheela | 0.756 | 0.818 | 0.786 | 0.853 |
| Dosa | 0.79 | 0.754 | 0.771 | 0.74 |
| Gulab jamun | 0.955 | 0.895 | 0.924 | 0.93 |
| Idli | 0.894 | 0.788 | 0.838 | 0.853 |
| Palak paneer | 0.954 | 1 | 0.976 | 0.995 |
| Poha | 0.894 | 1 | 0.944 | 0.995 |
| Samosa | 0.926 | 0.741 | 0.823 | 0.787 |
Table 4 Detailed analysis of the YOLOv5m model across all the different labels
| Classes of food | Precision | Recall | F1-score | mAP |
|---|---|---|---|---|
| Besan cheela | 0.792 | 0.691 | 0.738 | 0.812 |
| Dosa | 0.82 | 0.75 | 0.783 | 0.745 |
| Gulab jamun | 0.939 | 0.868 | 0.902 | 0.922 |
| Idli | 0.961 | 0.791 | 0.868 | 0.878 |
| Palak paneer | 0.973 | 1 | 0.986 | 0.995 |
| Poha | 0.848 | 1 | 0.918 | 0.985 |
| Samosa | 0.909 | 0.736 | 0.813 | 0.807 |
Table 5 Detailed analysis of the YOLOv5s model across all the different labels
| Classes of food | Precision | Recall | F1-score | mAP |
|---|---|---|---|---|
| Besan cheela | 0.878 | 0.545 | 0.672 | 0.752 |
| Dosa | 0.935 | 0.721 | 0.814 | 0.911 |
| Gulab jamun | 0.97 | 0.848 | 0.905 | 0.918 |
| Idli | 0.942 | 0.753 | 0.837 | 0.833 |
| Palak paneer | 0.97 | 0.95 | 0.960 | 0.993 |
| Poha | 0.93 | 0.95 | 0.940 | 0.987 |
| Samosa | 0.914 | 0.778 | 0.840 | 0.837 |
In Fig. 7, a Gulab Jamun missed by YOLOv5m is picked up by YOLOv5l, as YOLOv5l has higher recall and F1-score than YOLOv5m (as shown in Table 2). In Fig. 8, YOLOv5s confuses Idli with Besan Cheela, which is corrected by the larger model YOLOv5l. In Fig. 9, YOLOv5s produces more false positives than YOLOv5l. In Fig. 10, YOLOv5l produces false positives that are avoided by YOLOR.
Fig. 7 Gulab Jamun missed by YOLOv5m (a) is picked up by YOLOv5l (b)
Fig. 8 YOLOv5s (a) confuses Idli with Besan Cheela, which is corrected by YOLOv5l (b)
Fig. 9 YOLOv5s (a) provides more false positives in comparison to YOLOv5l (b)
Fig. 10 YOLOv5l (a) provides more false positives in comparison to YOLOR (b)
5 Conclusion
In our research, we introduced the IndianFood-7 dataset containing seven popular Indian dishes. We applied various image augmentation techniques to make our models more robust. Our objective was to train deep learning-based object detection algorithms to determine the presence of Indian food items in an image. We used several variants of the YOLOv5 and YOLOR models, which are state of the art in object detection. In future work, we would like to expand our Indian food dataset to maximize the impact of computer vision-based methods in detecting popular Indian food items.
References
1. Zhao Z-Q, Zheng P, Xu S-T, Wu X (2019) Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst 30(11):3212–3232. https://doi.org/10.1109/TNNLS.2018.2876865
2. Cheng H, Jiang X, Sun Y, Wang J (2001) Color image segmentation: advances and prospects. Pattern Recogn 34(12):2259–2281
3. Tompson J, Goroshin R, Jain A, LeCun Y, Bregler C (2015) Efficient object localization using convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
4. Lin TY et al (2014) Microsoft COCO: common objects in context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer vision–ECCV 2014. ECCV 2014. Lecture notes in computer science, vol 8693. Springer, Cham. https://doi.org/10.1007/978-3-319-10602-1_48
5. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: IEEE conference on computer vision and pattern recognition (CVPR). pp 779–788. https://doi.org/10.1109/CVPR.2016.91
6. Dai J, Li Y, He K, Sun J (2016) R-FCN: object detection via region-based fully convolutional networks. In: Advances in neural information processing systems. Curran Associates, Inc.
7. Liu Y (2018) An improved faster R-CNN for object detection. In: 2018 11th international symposium on computational intelligence and design (ISCID). pp 119–123. https://doi.org/10.1109/ISCID.2018.10128
8. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg A (2016) SSD: Single shot multiBox detector, vol 9905. pp 21–37. https://doi.org/10.1007/978-3-319-46448-0_2
9. Liu K, Tang H, He S, Yu Q, Xiong Y, Wang N (2021) Performance validation of yolo variants for object detection. In: Proceedings of the 2021 international conference on bioinformatics and intelligent computing (BIC 2021). Association for Computing Machinery, New York, NY, USA, pp 239–243. https://doi.org/10.1145/3448748.3448786
10. Wang CY, Yeh IH, Liao HY (2021) You only learn one representation: unified network for multiple tasks
11. Martin CK, Nicklas T, Gunturk B, Correa JB, Allen HR, Champagne C (2014) Measuring food intake with digital photography. J Hum Nutr Diet 27:72–81
12. Anitha U, Narmadha R, Jebaselvi GA, Kumar SL, Reddy PRS (2021) Real time estimation of calories from Indian food picture using image processing techniques. In: Recent trends in communication and electronics. CRC Press. pp 292–298
13. Sen C (2004) Food culture in India. Greenwood Press
14. Thames Q, Karpur A, Norris W, Xia F, Panait L, Weyand T, Sim T (2021) Nutrition5k: towards automatic nutritional understanding of generic food
15. Sathish S, Ashwin S, Quadira MA, Pavithra LK (2022) Analysis of convolutional neural networks on Indian food detection and estimation of calories
16. Yue X, Li H, Shimizu M, Kawamura S, Meng L (2022) YOLO-GD: a deep learning-based object detection algorithm for empty-dish recycling robots
17. Fakhrou A, Kunhoth J, Al Maadeed S (2021) Smartphone-based food recognition system using multiple deep CNN models
18. Hassannejad H, Matrella G, Ciampolini P, De Munari I, Mordonini M, Cagnoni S (2017) Food image recognition using very deep convolutional networks
19. Mezgec S, Koroušić Seljak B (2017) NutriNet: a deep learning food and drink image recognition system for dietary assessment
20. Mishra M, Sarkar T, Choudhury T, Bansal N, Smaoui S, Rebezov M, Shariat MA, Lorenzo JM (2022) Allergen30: detecting food items with possible allergens using deep learning based computer vision
Prediction of Protein-Protein Interaction Using Support Vector Machine Based on Spatial Distribution of Amino Acids Monika Khandelwal, Ranjeet Kumar Rout, and Saiyed Umer
Abstract Protein-protein interaction (PPI) is vital for understanding protein functions and various cellular processes such as DNA replication and transcription, signaling cascades, and metabolic cycles. Various experimental techniques exist for detecting protein-protein interactions, e.g., mass spectrometry, protein arrays, and yeast two-hybrid systems, but these techniques are expensive and tedious, so computational methods are needed to facilitate the prediction of protein-protein interactions. Computational methods offer a low-cost way to discover protein interactions and complement experimental methods. Methods based only on primary sequence data are more generic than methods that rely on additional details or protein-specific assumptions. This paper proposes a sequence-based model that combines local descriptors with Shannon entropy and Hurst exponent to detect PPI. Features are extracted directly from primary sequences, and the Support Vector Machine algorithm is used as the classifier. On the DIP (Database of Interacting Proteins) dataset, the proposed model gives 96.71% accuracy with 94.94% precision and 98.58% recall. The findings indicate that the proposed model performs better than various state-of-the-art predictors of protein-protein interactions.
Keywords Protein-protein interaction · Shannon entropy · Hurst exponent · Machine-learning methods · Support vector machines
M. Khandelwal · R. Kumar Rout (B) National Institute of Technology Srinagar, Hazratbal, J&K, India e-mail: [email protected] M. Khandelwal e-mail: [email protected] S. Umer Aliah University, Kolkata, West Bengal, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 B. Unhelkar et al. (eds.), Advances and Applications of Artificial Intelligence & Machine Learning, Lecture Notes in Electrical Engineering 1078, https://doi.org/10.1007/978-981-99-5974-7_3
M. Khandelwal et al.
1 Introduction In computational biology, protein-protein interactions (PPIs) are of significant interest. By simulating protein interaction, strength, and folding, computational biology can help researchers better understand how proteins interact with one another. Proteins come in a variety of shapes and perform a variety of functions. Protein-protein interactions are direct contacts between two or more protein molecules, which may be the same or different. Sets of proteins acting together perform the majority of the molecular operations that take place within the cell. Understanding protein interactions is therefore crucial for understanding cell development and repair [1]. PPIs play a significant role in diverse biological activities within a cell, including metabolic cycles, signaling cascades, DNA transcription, and DNA replication. Predicting protein interactions is important for studying disease and biological systems and for developing treatments [2]. The importance of understanding protein interactions has motivated the advancement of experimental techniques to detect PPIs. These techniques include protein chips [3], mass spectrometry [4], yeast two-hybrid systems [5], tandem affinity purification (TAP) [6], and so on. However, these high-throughput techniques are time-intensive and costly, and they produce false-positive and false-negative predictions [7]. Experimentally derived PPI pairs represent only a small proportion of the complete PPI network [8]. There is therefore a need for computational approaches to identify PPIs. Various computational techniques have been devised to provide supporting data for experimental techniques [9, 10]. Existing methods mainly employ binary classification schemes with different characteristics for representing protein pairs.
Protein interactions are inferred using a variety of protein characteristics, including gene expression, phylogenetic profiles, gene neighborhood, protein domains, knowledge mined from the literature, and protein structural information [11–13]. These techniques cannot be used unless prior knowledge of the proteins is available [14]. Recently, approaches that derive information only from protein sequences have gained significant attention, and many studies have developed sequence-based models for predicting PPI. Shen et al. [15] devised a support vector machine (SVM) based technique for predicting PPI from protein sequence information only. In their work, the 20 amino acids are clustered into seven categories depending on side-chain volumes and dipole strength. They then used conjoint triad descriptors to describe the primary sequence based on this grouping of amino acids. On the human PPI dataset, this model gives 83.9% accuracy. Guo et al. [16] devised a method for predicting PPI using SVM and auto covariance to derive interaction details from discontinuous regions of the amino acid sequences. On the yeast Saccharomyces cerevisiae PPI dataset, this method gives 88.09% accuracy. Additionally, You et al. [17] devised a sequence-based model that combines continuous and discontinuous feature sets with SVM to consider interactions among spatially close but sequentially distant amino acids. You et al. [18] developed another predictor for PPI based on a multi-scale local descriptor and random forest (RF) to extract information from continuous segments of amino acids.
Further, Sun et al. [19] utilized a stacked autoencoder, a deep learning approach based on an encoding-decoding procedure, to create a sequence-based model for predicting PPI. The model was trained and evaluated on data sets from various species, including Caenorhabditis elegans, Drosophila, human, and Escherichia coli. Hashemifar et al. [20] devised DPPI, a deep learning technique that predicts interactions between proteins by combining data augmentation with a deep convolutional neural network. Li et al. [21] devised a method that extracts discriminative features from position-specific scoring matrices, which contain evolutionary information from the primary sequences, and uses the rotation forest algorithm to find interactions between proteins. Machine learning techniques such as support vector machines [22, 23], random forests [24], artificial neural networks [25], decision trees [26], and clustering algorithms [27–29] are also popular in other research problems. This article proposes a supervised learning model to predict PPI from information extracted from primary sequences. The presented model comprises two steps: representation and prediction. In the representation step, each amino acid sequence is represented by a fixed-length vector that integrates three features, i.e., local descriptors, Shannon entropy, and Hurst exponent. In the prediction step, protein-protein interactions are predicted using the SVM algorithm. The remainder of the article is organized as follows. Section 2 describes the experimental setup used for the study. Section 3 discusses the methodology of the proposed model, along with the data set and the features used to characterize a protein sequence. The results and discussion are presented in Sect. 4, and Sect. 5 concludes the paper.
2 Experimental Setup This paper proposes a PPI prediction model to predict interacting pairs from primary sequences. First, the data set for predicting protein-protein interactions is obtained from Wei et al. [30]. Next, each protein sequence is characterized by a fixed-dimensional vector of features extracted from the amino acid sequence. The features used are Shannon entropy, local descriptors, and Hurst exponent. The supervised machine learning algorithm, SVM, is then trained on 70% of the data set, and the model's performance in classifying interacting and non-interacting pairs is assessed on the remaining 30%. The model is implemented on a workstation running the Windows operating system with a 3.6 GHz 6-core CPU and 64 GB of RAM. The SVM algorithm is implemented in Python 3.10.
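As a minimal sketch of the 70/30 hold-out protocol described above, the following Python snippet shuffles a list of labelled pairs and splits it. The function name `split_70_30`, the fixed seed, and the placeholder samples are illustrative assumptions; the paper does not state how the split was drawn (for example, whether it was stratified by class).

```python
import random


def split_70_30(samples, seed=0):
    """Shuffle samples and return (train, test) with a 70/30 split.

    Hypothetical helper: the seed and the plain (non-stratified) shuffle
    are assumptions, not details taken from the paper.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = (7 * len(shuffled)) // 10  # integer arithmetic avoids float rounding
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder data: 12890 (pair_id, label) tuples with a 1:1 class ratio.
pairs = [(i, i % 2) for i in range(12890)]
train, test = split_70_30(pairs)
print(len(train), len(test))  # 9023 3867
```

With the 12,890 pairs of the standard data set, such a split would leave 9023 pairs for training and 3867 for testing.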
3 Methodology This paper presents a prediction model to identify PPI pairs from amino acid sequences. Every protein sequence is characterized by a fixed-length vector obtained by concatenating features, i.e., Shannon entropy, Hurst exponent, and local descriptors. The vectors of two protein sequences are concatenated to characterize a PPI pair. In this paper, SVM is used to predict PPI pairs.
3.1 Data Set Wei et al. [30] provide the data set used in this work. PPIs are gathered from DIP (Database of Interacting Proteins) [31]; these form the positive samples (interacting proteins). The negative samples are created from non-interacting protein pairs randomly chosen from the set of all non-interacting pairs. After that, protein pairs with identical subcellular locations are excluded. The positive and negative samples are then combined to create the standard data set, which comprises 12,890 protein pairs with a 1:1 ratio of positive to negative samples.
3.2 Feature Representation An amino acid sequence is represented by three kinds of features: Shannon entropy, Hurst exponent, and local descriptors, which comprise composition, transition, and distribution.
3.2.1 Shannon Entropy
Shannon entropy (SE) measures the amount of uncertainty within a protein sequence. SE estimates protein sequence information and may be used to predict PPI [32]. SE can be calculated with the equation below:
$$SE = -\sum_{i=1}^{20} p_i \log_2(p_i) \tag{1}$$
where $p_i$ denotes the probability of the $i$th amino acid within a primary sequence. The SE values of two proteins $P_1$ and $P_2$ are concatenated to represent the pair $P_{12}$ (interacting or non-interacting pair).
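The entropy of Eq. (1) can be sketched in a few lines of Python, assuming that the probability of each amino acid is estimated by its relative frequency in the sequence. The function name `shannon_entropy` and the toy sequences are illustrative and do not come from the paper or the DIP data set.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Shannon entropy (in bits) of the residue frequencies in a sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    # Residues that never occur are skipped, since p * log2(p) -> 0 as p -> 0.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy sequences, used only to show the pair representation.
p1 = "ACDA"
p2 = "AACC"
pair_feature = [shannon_entropy(p1), shannon_entropy(p2)]
print(pair_feature)  # [1.5, 1.0]
```

A sequence in which all 20 amino acids are equally frequent would reach the maximum value, log2(20), which is about 4.32 bits.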
3.2.2 Hurst Exponent
Hurst exponent (HE) is utilized to depict autocorrelation in time series analysis [33]. The HE value ranges between 0 and 1. If HE is equal to 0.5, the series is random, indicating no relationship between the variable and its previous values. A time series has a negative autocorrelation if the HE value is between 0 and 0.5 and a positive autocorrelation if it is between 0.5 and 1 [34]. HE is computed by using rescaled range (R/S) analysis. The equation below can be used to compute the Hurst exponent for a protein sequence $P_n$:
$$\left(\frac{n}{2}\right)^{HE} = \frac{R(n)}{S(n)} \tag{2}$$
where
$$S(n) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(P_i - m)^2} \tag{3}$$
and
$$R(n) = \max(Q_1, Q_2, \ldots, Q_n) - \min(Q_1, Q_2, \ldots, Q_n) \tag{4}$$
$$Q_j = \sum_{i=1}^{j}(P_i - m) \quad \text{for } j = 1, 2, 3, \ldots, n \tag{5}$$
$$m = \frac{1}{n}\sum_{i=1}^{n} P_i \tag{6}$$
The HE values of two proteins $P_1$ and $P_2$ are concatenated to represent the protein pair $P_{12}$.
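Solving Eq. (2) for the exponent gives HE = log(R(n)/S(n)) / log(n/2), which the following sketch implements step by step from Eqs. (3)–(6). The function name `hurst_exponent` is illustrative, and the numeric encoding of residues is a loudly labelled assumption: the paper does not state how amino acids are mapped to numbers before the R/S analysis.

```python
import math


def hurst_exponent(series):
    """Hurst exponent from a single rescaled range, following Eqs. (2)-(6)."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three values so that log(n / 2) > 0")
    m = sum(series) / n                                    # Eq. (6)
    s = math.sqrt(sum((p - m) ** 2 for p in series) / n)   # Eq. (3)
    if s == 0:
        raise ValueError("constant series: R/S is undefined")
    q, cumulative = [], 0.0
    for p in series:                                       # Eq. (5)
        cumulative += p - m
        q.append(cumulative)
    r = max(q) - min(q)                                    # Eq. (4)
    # Eq. (2): (n / 2) ** HE = R / S  =>  HE = log(R / S) / log(n / 2)
    return math.log(r / s) / math.log(n / 2)


# Hypothetical numeric encoding: the alphabetical index of each residue is
# used purely for illustration and is NOT taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
series = [AMINO_ACIDS.index(aa) + 1 for aa in "MKVLAAGIVGL"]
print(hurst_exponent(series))
```

For the short series 1, 2, 3, 4 the steps give m = 2.5, S = sqrt(1.25), Q = (−1.5, −2, −1.5, 0), and R = 2, so HE = log2(2 / sqrt(1.25)), roughly 0.84.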
3.2.3 Local Descriptors
Local descriptors consist of composition, transition, and distribution. The composition represents the frequency of the amino acids belonging to a particular group. Transition indicates the frequency with which amino acids of one group are followed by amino acids of another group. The distribution indicates the fraction of the sequence length within which the first, 25%, 50%, 75%, and 100% of the amino acids of a group are found, respectively [17]. To reduce the complexity of representing 20 amino acids, they are arranged into 7 clusters depending on side-chain volume and dipole strength. Table 1 shows the grouping of amino acids. For every primary sequence, the three descriptors are estimated and concatenated to form a 63-dimensional vector: 7 for composition, 35 for distribution (5 values for each of the 7 groups), and 21 for transition (one for each unordered pair of the 7 groups) [18]. The local descriptors of two proteins $P_1$ and $P_2$ are concatenated to characterize the protein pair $P_{12}$. All three features, i.e., Shannon entropy, Hurst exponent, and local descriptors, are combined to describe the protein pair by a 130-dimensional feature vector, i.e., 2 × (63 + 1 + 1) = 130.
Table 1 Amino acids grouping based on side-chain volumes and dipoles
Group number    Amino acids
Group 1         A, V, G
Group 2         C
Group 3         S, M, Y, T
Group 4         P, F, L, I
Group 5         H, Q, W, N
Group 6         R, K
Group 7         E, D
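The local descriptors of Sect. 3.2.3 can be sketched with the grouping of Table 1 as follows. This is one common reading of composition, transition, and distribution, not necessarily the authors' exact implementation: the order of the three blocks in the vector, the rank rule used for the 25%, 50%, and 75% positions, and the value 0.0 for absent groups are all assumptions, and `local_descriptor` is a hypothetical name.

```python
from itertools import combinations

# Grouping from Table 1 (index 0 corresponds to Group 1, and so on).
GROUPS = ["AVG", "C", "SMYT", "PFLI", "HQWN", "RK", "ED"]
GROUP_OF = {aa: g for g, members in enumerate(GROUPS) for aa in members}
FRACTIONS = (0.0, 0.25, 0.5, 0.75, 1.0)


def local_descriptor(sequence):
    """Return 7 composition + 21 transition + 35 distribution values."""
    codes = [GROUP_OF[aa] for aa in sequence]  # KeyError on non-standard residues
    n = len(codes)
    if n < 2:
        raise ValueError("need at least two residues")

    # Composition: fraction of residues that fall in each group.
    composition = [codes.count(g) / n for g in range(7)]

    # Transition: fraction of adjacent positions that switch between groups a and b.
    adjacent = list(zip(codes, codes[1:]))
    transition = [
        sum(1 for x, y in adjacent if {x, y} == {a, b}) / (n - 1)
        for a, b in combinations(range(7), 2)
    ]

    # Distribution: relative position of the first, 25%, 50%, 75%, and last
    # residue of each group (0.0 when the group is absent; an assumption).
    distribution = []
    for g in range(7):
        positions = [i + 1 for i, c in enumerate(codes) if c == g]
        for f in FRACTIONS:
            if positions:
                k = max(1, int(f * len(positions)))  # 1-based rank within the group
                distribution.append(positions[k - 1] / n)
            else:
                distribution.append(0.0)

    return composition + transition + distribution


vector = local_descriptor("ACSPHREAVG")  # hypothetical toy sequence
print(len(vector))  # 63
```

Appending one Shannon entropy value and one Hurst exponent to these 63 values gives 65 features per protein, and concatenating two proteins gives the 130-dimensional pair vector described in the text.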
3.3 Support Vector Machines (SVM) SVM is a supervised method developed by Vapnik for regression and classification problems [35]. It is a well-known technique that has been applied in different fields of computational biology and bioinformatics, and it has recently gained a lot of interest for the prediction of PPI pairs. The critical distinction between SVM and many other classification algorithms is that it focuses on minimizing structural risk rather than empirical risk. SVM training can deal with many features because its optimization problem is convex, so it reaches a global optimum, and margin maximization helps to limit over-fitting. SVM finds the hyperplane that separates the positive and negative classes with the largest margin while minimizing the classification error.
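For reference, the standard soft-margin formulation of Cortes and Vapnik [35] is reproduced below. The paper does not state the kernel or the regularization setting, so the penalty $C$, the feature map $\phi$, and the slack variables $\xi_i$ are generic symbols rather than the authors' choices. Here $x_i \in \mathbb{R}^{130}$ is the feature vector of the $i$th protein pair and $y_i \in \{-1, +1\}$ is its label.

```latex
$$\min_{w,\,b,\,\xi}\; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
y_i\left(w^{\top}\phi(x_i) + b\right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \ldots, N$$

$$f(x) = \operatorname{sign}\left(w^{\top}\phi(x) + b\right)$$
```

The first term maximizes the margin (the structural-risk part), while the second term penalizes training errors; $C$ sets the trade-off between the two.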
4 Results and Discussion The evaluation standards that were used to examine the prediction model are covered in this section. Analysis and comparison of the proposed model’s performance against previous state-of-the-art predictors are also discussed in this section.
4.1 Evaluation Metrics The evaluation metrics like precision, accuracy, and sensitivity (recall) estimate the proposed model performance.
$$\text{Accuracy (ACC)} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$
$$\text{Precision (PR)} = \frac{TP}{FP + TP} \tag{8}$$
$$\text{Sensitivity (SE)} = \frac{TP}{FN + TP} \tag{9}$$
where FN, FP, TN, and TP indicate the count of false negatives, false positives, true negatives, and true positives, respectively.
Fig. 1 Comparison of the SVM, DT, and KNN for prediction of PPI pairs
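The three metrics of Eqs. (7)–(9) can be computed directly from the confusion-matrix counts, as in the sketch below. The function name `evaluation_metrics` and the counts are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Accuracy, precision, and sensitivity as in Eqs. (7)-(9)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return accuracy, precision, sensitivity


# Hypothetical confusion-matrix counts, chosen only for illustration.
acc, pr, se = evaluation_metrics(tp=90, tn=85, fp=15, fn=10)
print(f"ACC={acc:.4f} PR={pr:.4f} SE={se:.4f}")
# ACC=0.8750 PR=0.8571 SE=0.9000
```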
4.2 Performance of Proposed Model A model is proposed to identify PPI from primary sequences. In this paper, several classifiers are evaluated: decision tree (DT), k-nearest neighbor (KNN), and support vector machine (SVM). The comparison results of all classifiers are shown in Fig. 1. From Fig. 1, we observe that SVM performs better than the other classifiers on all evaluation metrics, i.e., precision, accuracy, and recall (sensitivity). SVM is therefore selected as the classifier to predict PPI pairs. Each protein sequence is characterized by a fixed-length vector that combines Shannon entropy, Hurst exponent, and local descriptors consisting of composition, transition, and distribution, and every PPI pair is characterized by a 130-dimensional vector. To analyze the performance of the proposed model, 70% of the data set is used for training and 30% for testing. From Fig. 1, we can observe that the proposed model gives 96.71% accuracy, 94.94% precision, and 98.58% sensitivity for PPI prediction.
Table 2 Comparison of the proposed model with existing predictors
Method              Accuracy (%)    Precision (%)    Sensitivity (%)
Yang et al. [38]    86.15           90.24            81.03
You et al. [10]     87.00           87.59            86.15
You et al. [17]     91.36           91.94            90.67
Zhou et al. [13]    88.56           89.50            87.37
Wong et al. [36]    93.92           96.45            91.10
Du et al. [37]      92.50           94.38            90.56
You et al. [18]     94.72           98.91            94.34
Our model           96.71           94.94            98.58
4.3 Proposed Model Comparison Against Various Predictors The model is also compared with various state-of-the-art predictors: Wong et al. [36], Du et al. [37], Zhou et al. [13], Yang et al. [38], You et al. [10], You et al. [17], and You et al. [18]. From Table 2, we observe that the proposed model gives an accuracy of 96.71%, which is 10.56 percentage points higher than Yang et al. [38], 9.71 higher than You et al. [10], 5.35 higher than You et al. [17], 8.15 higher than Zhou et al. [13], 2.79 higher than Wong et al. [36], 4.21 higher than Du et al. [37], and 1.99 higher than You et al. [18]. The proposed model gives 94.94% precision, which is higher than all models except Wong et al. [36] (96.45%) and You et al. [18] (98.91%), and 98.58% sensitivity, which is higher than all previous models. Table 2 indicates that the proposed prediction model outperforms previous state-of-the-art models in accuracy and sensitivity.
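The accuracy differences quoted in this comparison can be re-derived from Table 2 with a few lines of Python. The dictionary below simply copies the accuracy column of Table 2; the variable names are illustrative.

```python
# Accuracy values (%) copied from Table 2.
accuracy = {
    "Yang et al. [38]": 86.15,
    "You et al. [10]": 87.00,
    "You et al. [17]": 91.36,
    "Zhou et al. [13]": 88.56,
    "Wong et al. [36]": 93.92,
    "Du et al. [37]": 92.50,
    "You et al. [18]": 94.72,
}
OUR_ACCURACY = 96.71

# Difference in percentage points between the proposed model and each predictor.
gains = {name: round(OUR_ACCURACY - value, 2) for name, value in accuracy.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain:.2f}")
```

The differences are absolute differences between percentages, that is, percentage points rather than relative improvements.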
5 Conclusion This paper presented a model to predict PPI pairs from protein sequences. The amino acid sequences were converted into fixed-length vectors by concatenating features, i.e., Shannon entropy, Hurst exponent, and local descriptors, so that every PPI pair was represented by a 130-dimensional feature vector. SVM was used to predict interactions among proteins. The proposed model gives 96.71% accuracy, 94.94% precision, and 98.58% sensitivity. The experimental findings indicate that the proposed model performs well in differentiating between interacting and non-interacting protein pairs. Future work may extend this research by employing a better coding scheme, the fractal dimension to capture self-similarity [39], and an ensemble classifier to classify PPI pairs.
References
1. Gavin AC, Bösche M, Krause R, Grandi P, Marzioch M et al (2002) Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415(6868):141–147
2. Browne F, Zheng H, Wang H, Azuaje F (2010) From experimental approaches to computational techniques: a review on the prediction of protein-protein interactions. Adv Artif Intell 16877470
3. Zhu H, Bilgin M, Bangham R, Hall D, Casamayor A et al (2001) Global analysis of protein activities using proteome chips. Science 293(5537):2101–2105
4. Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L et al (2002) Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415(6868):180–183
5. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y (2001) A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci 98(8):4569–4574
6. Krogan NJ, Cagney G, Yu H, Zhong G, Guo X et al (2006) Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440(7084):637–643
7. You ZH, Lei YK, Gui J, Huang DS, Zhou X (2010) Using manifold embedding for assessing and predicting protein interactions from high-throughput experimental data. Bioinformatics 26(21):2744–2751
8. Han JDJ, Dupuy D, Bertin N, Cusick ME, Vidal M (2005) Effect of sampling on topology predictions of protein-protein interaction networks. Nat Biotechnol 23(7):839–844
9. Shoemaker BA, Panchenko AR (2007) Deciphering protein-protein interactions. Part II. Computational methods to predict protein and domain interaction partners. PLoS Comput Biol 3(4):e43
10. You ZH, Lei YK, Zhu L, Xia J, Wang B (2013) Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis. BMC Bioinf 14(8):1–11
11. Lei YK, You ZH, Ji Z, Zhu L, Huang DS (2012) Assessing and predicting protein interactions by combining manifold embedding with multiple information integration. BMC Bioinf 13(7):1–18
12. Zhang QC, Petrey D, Deng L, Qiang L, Shi Y et al (2012) Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature 490(7421):556–560
13. Zhou YZ, Gao Y, Zheng YY (2011) Prediction of protein-protein interactions using local description of amino acid sequence. In: Advances in computer science and education applications. Springer, Berlin, Heidelberg, pp 254–262
14. Autore F, Pfuhl M, Quan X, Williams A, Roberts RG et al (2013) Large-scale modelling of the divergent spectrin repeats in nesprins: giant modular proteins. PLoS One 8(5):e63633
15. Shen J, Zhang J, Luo X, Zhu W, Yu K et al (2007) Predicting protein-protein interactions based only on sequences information. Proc Natl Acad Sci 104(11):4337–4341
16. Guo Y, Yu L, Wen Z, Li M (2008) Using support vector machine combined with auto covariance to predict protein-protein interactions from protein sequences. Nucleic Acids Res 36(9):3025–3030
17. You ZH, Zhu L, Zheng CH, Yu HJ, Deng SP, Ji Z (2014) Prediction of protein-protein interactions from amino acid sequences using a novel multi-scale continuous and discontinuous feature set. BMC Bioinf 15(15):1–9
18. You ZH, Chan KC, Hu P (2015) Predicting protein-protein interactions from primary protein sequences using a novel multi-scale local feature representation scheme and the random forest. PLoS One 10(5):e0125811
19. Sun T, Zhou B, Lai L, Pei J (2017) Sequence-based prediction of protein protein interaction using a deep-learning algorithm. BMC Bioinf 18(1):1–8
20. Hashemifar S, Neyshabur B, Khan AA, Xu J (2018) Predicting protein-protein interactions through sequence-based deep learning. Bioinformatics 34(17):i802–i810
21. Li Y, Wang Z, Li LP, You ZH, Huang WZ, Zhan XK, Wang YB (2021) Robust and accurate prediction of protein-protein interactions by exploiting evolutionary information. Sci Rep 11(1):1–12
22. Khandelwal M, Rout RK, Umer S (2022) Protein-protein interaction prediction from primary sequences using supervised machine learning algorithm. In: 2022 12th international conference on cloud computing, data science and engineering (confluence). IEEE, pp 268–272
23. Umer S, Mohanta PP, Rout RK, Pandey HM (2021) Machine learning method for cosmetic product recognition: a visual searching approach. Multimed Tools Appl 80(28):34997–35023
24. Rodriguez-Galiano VF, Ghimire B, Rogan J, Chica-Olmo M, Rigol-Sanchez JP (2012) An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J Photogramm Remote Sens 67:93–104
25. Khandelwal M, Gupta DK, Bhale P (2016) DoS attack detection technique using back propagation neural network. In: 2016 international conference on advances in computing, communications and informatics (ICACCI). IEEE, pp 1064–1068
26. Song YY, Ying LU (2015) Decision tree methods: applications for classification and prediction. Shanghai Arch Psychiatry 27(2):130
27. Rout RK, Hassan SS, Sindhwani S, Pandey HM, Umer S (2020) Intelligent classification and analysis of essential genes using quantitative methods. ACM Trans Multimed Comput Commun Appl (TOMM) 16(1s):1–21
28. Rout RK, Hassan SS, Sheikh S, Umer S, Sahoo KS, Gandomi AH (2022) Feature-extraction and analysis based on spatial distribution of amino acids for SARS-CoV-2 Protein sequences. Comput Biol Med 141:105024
29. Khandelwal M, Sheikh S, Rout RK, Umer S, Mallik S, Zhao Z (2022) Unsupervised learning for feature representation using spatial distribution of amino acids in aldehyde dehydrogenase (ALDH2) protein sequences. Mathematics 10(13):2228
30. Wei L, Xing P, Zeng J, Chen J, Su R, Guo F (2017) Improved prediction of protein-protein interactions using novel negative samples, features, and an ensemble classifier. Artif Intell Med 83:67–74
31. Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D (2004) The database of interacting proteins: 2004 update. Nucl Acids Res 32(Suppl 1):D449–D451
32. Khandelwal M, Shabbir N, Umer S (2022) Extraction of sequence-based features for prediction of methylation sites in protein sequences. Artif Intell Technol Comput Biol
33. Hurst HE (1951) Long-term storage capacity of reservoirs. Trans Am Soc Civ Eng 116(1):770–799
34. Qian B, Rasheed K (2004) Hurst exponent and financial market predictability. In: IASTED conference on financial engineering and applications. Proceedings of the IASTED international conference, Cambridge, MA, pp 203–209
35. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
36. Wong L, You ZH, Li S, Huang YA, Liu G (2015) Detection of protein-protein interactions from amino acid sequences using a rotation forest model with a novel PR-LPQ descriptor. In: International conference on intelligent computing. Springer, Cham, pp 713–720
37. Du X, Sun S, Hu C, Yao Y, Yan Y, Zhang Y (2017) DeepPPI: boosting prediction of protein-protein interactions with deep neural networks. J Chem Inf Model 57(6):1499–1510
38. Yang L, Xia JF, Gui J (2010) Prediction of protein-protein interactions from protein sequence using local descriptors. Protein Pept Lett 17(9):1085–1090
39. Rout RK, Pal Choudhury P, Maity SP, Daya Sagar BS, Hassan SS (2018) Fractal and mathematical morphology in intricate comparison between tertiary protein structures. Comput Methods Biomech Biomed Eng Imaging Vis 6(2):192–203
A Computational Comparison of VGG16 and XceptionNet for Mango Plant Disease Recognition
Vinita and Suma Dawn
Abstract Globally, plant diseases have caused an estimated 14% annual yield loss, causing suffering to untold millions of people over the years. Early detection of plant disease is therefore critical for increased agricultural productivity. Plant diseases are caused by bacterial and fungal pathogens and are influenced by factors such as temperature, pH, humidity, and moisture. Misdiagnosis can result in misuse of chemicals, environmental imbalance, and drug resistance. Traditional approaches required a large amount of time, intensive research, and continual farm monitoring to identify plant health concerns. Thanks to recent technical developments, however, better solutions have been identified, leading to increased yields. Artificial intelligence (AI), a fast-expanding field, can be used in a range of industries to automate processes and boost productivity. In the agricultural sector, it has been employed to increase crop productivity through the early identification and classification of disease. Automatic disease detection comprises image acquisition, pre-processing, segmentation, feature augmentation, and model prediction. In recent years, deep convolutional neural networks have been used to accurately identify crop diseases and classify them into the appropriate categories. In this paper, we use a deep neural network approach to classify diseases in 11 crop species using thousands of photos. Two deep learning models, namely VGG16 and XceptionNet, are applied to the Kaggle database to classify the diseases under study. The accuracy and loss of the two models are compared, and the findings are discussed.
Keywords VGG16 · XceptionNet · Deep CNN · Convolution · Early disease detection
Vinita (B) · S. Dawn Department of Computer Science and Engineering, JIIT Noida, Sector-62, Noida, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 B. Unhelkar et al. (eds.), Advances and Applications of Artificial Intelligence & Machine Learning, Lecture Notes in Electrical Engineering 1078, https://doi.org/10.1007/978-981-99-5974-7_4
Vinita and S. Dawn
1 Introduction In India, agriculture is a significant source of income. The majority of the nation's workforce is employed in agriculture, either directly or indirectly. Increasing agricultural output is therefore essential for the nation's continuing economic growth. Furthermore, the agriculture sector has started to explore entirely new ways to improve food production in the face of population expansion, weather changes, and political unpredictability. Farmers continue to struggle with problems such as early plant disease detection. Because it is not always possible to identify the type of disease on a plant's leaf with the naked eye, an automated expert system that detects the disease in time could be very helpful. The use of image processing and machine learning techniques, in particular, can help farmers identify plant diseases in their early stages. India is the top mango-producing nation in the world, contributing roughly 40% of the total production. Mango production in India accounts for roughly 57% of global output, and there is a market for mangoes both domestically and abroad [1]. Pests and diseases are thought to be responsible for 30–40% of the agricultural yield loss. Alternaria leaf spots, gall infestation, mango deformity, Webber's attack, Anthracnose, and stem miner are some of the frequent diseases that affect mango plants. These diseases can be caused by pathogens such as bacteria, viruses, fungi, and parasites, as well as by unfavorable environmental factors. Disease in the leaf impairs photosynthesis, which kills the plant. The type of disease is determined from the symptoms and the affected areas of the diseased leaves. In the past, plant diseases were often identified by skilled farmers who examined plants regularly. On small farms, it was simple to identify infections and put prevention and control measures in place quickly. On large farms, however, this approach is time-consuming and costly.
Therefore, finding an automated, precise, quick, and less expensive system for identifying plant diseases is crucial. Image processing and machine learning are the most popular and widely used methods for identifying and categorizing plant leaf diseases. Using non-destructive, automated, and cost-effective methods, machine vision technology promises a rise in crop productivity [2]. Deep learning, a branch of machine learning, employs neural networks and has spread across several industries, offering a wide range of applications. The detection of leaf disease has been a long-running research area over the past few decades. Researchers have looked into a variety of machine learning and pattern recognition techniques to increase the accuracy of disease detection. Convolutional neural networks [3], artificial neural networks [4], back propagation neural networks [5], support vector machines [6], and other image processing techniques [7, 8] are only a few examples of the machine learning approaches used. Among these methods, convolutional neural networks perform feature extraction and classification themselves. For feature extraction, alternative techniques employ the color co-occurrence matrix [9], angle code histogram, zooming algorithm [10], Canny edge detector, and several other algorithms. Research has been done to categorize several diseases in a single plant species or a single disease in multiple plant varieties. These approaches have been applied to numerous crops, including rice, wheat [11], maize [12], and cotton [13]. In comparison to other techniques, CNNs also require little or no image pre-processing. Deep learning algorithms have recently been used in many studies on the automated identification of plant diseases. Yang Lu et al. proposed a technique for identifying and categorizing images of diseased rice plant leaves using deep convolutional neural networks. They used a dataset of 500 real photographs of diseased and healthy rice stems and leaves taken in an experimental field. CNNs were trained with tenfold cross-validation to recognize ten common rice diseases. Because color images are used, the stationarity property does not hold across color channels. To obtain training and testing features, the images are rescaled to the [0, 1] range, and then principal component analysis and whitening are applied. Alvaro Fuentes et al. [14] created a deep learning-based system for the real-time identification of infestations and leaf diseases in tomato plants. Three distinct detectors, namely Region-based Fully Convolutional Network (R-FCN), Single Shot Multibox Detector (SSD), and Faster Region-based Convolutional Neural Network (Faster R-CNN), are trained on a dataset of 5000 images to identify nine different tomato diseases. To decrease false positives and improve accuracy, data annotation and augmentation are also carried out. Mohanty et al. [15] presented a deep convolutional neural network-based approach for identifying leaf diseases. This study classified 38 classes, made up of 14 different crops and 26 disease variations, using a collection of 54,306 images from the PlantVillage dataset.
A DCNN-based computational method for evaluating plant leaf disease has also been presented; it uses batch normalization and the ReLU activation function to produce recognition results quickly and accurately. To apply deep meta-architectures to detection and recognition under real conditions using tomato plant leaves, the R-CNN, Faster R-CNN, and SSD techniques are employed in [16], and these architectures were found to be suitable for classifying diseases in tomato leaf samples. According to [17], a dense deep convolutional network design can be used for training on large sets of plant leaf images. The textural analysis of turmeric leaf blotch and leaf spot, using GLCM features and an SVM classifier to extract and classify leaf disease features, is discussed in [18]. In the computational study presented in this paper, two CNN architectures, VGG16 and XceptionNet, are used to predict the disease category affecting mango plants. The aim of this experimental work is the precise classification of the leaves. The research focuses on five distinct mango leaf diseases: Alternaria leaf spot, anthracnose, leaf gall, leaf webber, and leaf burn. The CNN automatically extracts features from the raw inputs, and the classification is determined by selecting among these features based on their probability values. The Kaggle dataset is used to train the VGG16 and XceptionNet models. Compared with the VGG16 model, which has an accuracy of 70.25%, the XceptionNet model has a much higher accuracy of 80.75%. The paper is organized as follows. The transfer learning strategy and the models used for training are described in Sect. 2. The dataset
utilized is also briefly described in Sect. 2. The experimental results and their analysis are covered in Sect. 3. Finally, Sect. 4 concludes the paper.
2 Methodology and Dataset
Images of plant leaves are used in many agricultural applications, such as plant disease identification and detection, to extract the data needed for analysis. The information is extracted using machine learning algorithms and image processing techniques. The complete training and validation procedure of the plant disease detection model is described in detail below.
2.1 Architecture of the Proposed System
The architecture of the proposed model used in our investigation is shown in Fig. 1. The mango leaf dataset is first obtained from Kaggle. The data is then pre-processed to remove noise and extraneous information. Following pre-processing, we divided the data into a training set and a validation set in a ratio of 80% to 20%. The training data is then loaded separately into the VGG16 and XceptionNet models to extract significant patterns and insights from the input data. The trained models are validated against the held-out 20% of the data to determine how well these transfer learning models perform. Finally, the two models are compared.
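The 80%/20% split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the file names are hypothetical, and splitting per class (a stratified split) is an assumption, since the text only states the overall ratio.

```python
import random
from collections import defaultdict

def stratified_split(samples, train_fraction=0.8, seed=0):
    """Split (path, label) pairs into training and validation lists, class by class."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        cut = int(round(train_fraction * len(paths)))
        train += [(p, label) for p in paths[:cut]]
        val += [(p, label) for p in paths[cut:]]
    return train, val

# Hypothetical file names: 32 classes with 24 images each (768 images in total).
samples = [(f"class_{c:02d}/img_{i:02d}.jpg", c) for c in range(32) for i in range(24)]
train, val = stratified_split(samples)
print(len(train), len(val))
```

With 24 images per class, each class contributes 19 training and 5 validation images, so the realized ratio is close to, but not exactly, 80:20.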
2.2 Dataset Description
The dataset contains 768 images (healthy and unhealthy) of leaves from 32 Indian mango species. Each species has 24 images taken from different angles and orientations. The diseased classes include various bacterial, fungal, and viral diseases of mango crops. The images in this dataset were taken in a controlled setting, i.e., with a DSLR or other high-quality camera under ideal lighting conditions. The classes available in the dataset include Alphanso, Amarpali, Amarpali Desi, Ambika, Austin, Chausa, Chausa Desi, Dusheri, Dasheri Desi, Duthpedha, Farnadeen, Kent, and Keshar. Diseased images [19] of mango plant leaves are shown in Fig. 2.
Fig. 1 Proposed methodology for mango leaf disease. Pipeline stages: Dataset images (input) → Data pre-processing (224 × 224 images) → Dataset splitting (training data 80%, testing data 20%) → Model training (VGG16 and XceptionNet) → Validating models and disease classification → Evaluating performance (accuracy and loss)
Fig. 2 (a) Malformed branches, (b) Anthracnose disease, (c) Gall formation
2.3 Data Pre-processing
The images in the aforementioned dataset are first resized. When training a convolutional neural network for image classification, it is generally recommended to reduce the image dimensions, because doing so improves training performance when the images are processed in mini-batches. We are using the VGG16 and
XceptionNet models in our study, and the input size is kept at 224 × 224 RGB images. The second step is to apply data augmentation to increase the number of images in the dataset. Data augmentation applies a variety of transformations that subtly alter the original images. The two fundamental types of data augmentation are position augmentation and color augmentation. In color augmentation, the RGB values of the pixels are modified to increase or decrease the luminance of the images. Image pre-processing was carried out using the image data generator function, which applied random transformations to the training and validation datasets. The operations include rescaling by 1/255, zooming by 0.2, shifting the width and height by 0.2, and shearing by 0.2.
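To make the listed operations concrete, the sketch below rescales pixel values and draws one random set of transform parameters. It is a standard-library illustration only and does not call the image data generator itself. How each "0.2" maps to a range is an assumption modeled on common conventions (zoom factor in [0.8, 1.2], shifts as a fraction of the image size), and the unit of the shear value is not stated in the text.

```python
import random

def rescale(pixels, factor=255.0):
    """Map 8-bit pixel values (0..255) into the [0, 1] range."""
    return [p / factor for p in pixels]

def sample_augmentation(rng, zoom=0.2, shift=0.2, shear=0.2, size=224):
    """Draw one random set of transform parameters within the stated ranges."""
    return {
        "zoom_factor": rng.uniform(1.0 - zoom, 1.0 + zoom),  # assumed range 0.8 .. 1.2
        "shift_x_px": rng.uniform(-shift, shift) * size,     # fraction of image width
        "shift_y_px": rng.uniform(-shift, shift) * size,     # fraction of image height
        "shear": rng.uniform(-shear, shear),                 # unit not stated in the text
    }

rng = random.Random(42)
params = sample_augmentation(rng)
scaled = rescale([0, 128, 255])
print(scaled)
print(params)
```

Each training image would receive a fresh draw of these parameters, which is what makes the augmented copies differ slightly from the original.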
2.4 Models Used
To address the plant disease identification problem, we have applied several machine learning models to the Kaggle dataset in this study. We employ the transfer learning strategy to increase the accuracy of our models. This method entails taking features learned on one problem and using them to solve a different but related problem. In this study, transfer learning is carried out in the following steps:
1. Take layers from a model that has already been trained.
2. Freeze them so that the information they contain is not destroyed during subsequent training.
3. Add new, trainable layers on top of the frozen layers, so that the features of the frozen layers can be used to make predictions on the new problem.
4. Train the new layers on the new dataset.
The transfer learning strategy for detecting plant diseases was applied using the pre-trained models described below.
VGG16 Architecture
The Visual Geometry Group at the University of Oxford created VGG16, where 16 denotes the number of weight layers in the model [20]. On the ImageNet dataset, which comprises more than 14 million images, VGG16 achieves a top-5 accuracy of 92.7 percent on the 1000-class classification task. The VGG16 design is shown in Fig. 3; it consists of 13 convolutional layers followed by 3 fully connected dense layers. Max-pooling layers divide the convolutional layers into blocks. The first block contains two convolutional layers with a 3 × 3 filter size. The convolutional stride and spatial padding are both set to 1 pixel. The first block is followed by a max-pooling layer with a 2 × 2 pixel window and a stride of 2. Like Block 1, Block 2 contains two convolutional layers and one max-pooling layer. The system uses three convolutional layers in blocks 3, 4,
Fig. 3 Layer representation in VGG16 [21]
and 5, and a max-pooling layer follows each block. The filter size, padding, and stride are the same for all convolutional layers, and likewise for all max-pooling layers throughout the model. Block 6 consists of two dense layers followed by a SoftMax classification layer. The ReLU activation function is used for every convolutional layer.
XceptionNet Architecture
The Xception model is a CNN architecture introduced by François Chollet. The model has 71 layers, and its architecture is shown in Fig. 4. The pre-trained version of this network was trained on more than a million images from the ImageNet dataset. The pre-trained model can classify images into 1000 object categories and has learned rich feature representations for a wide range of images.
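The VGG16 block layout described in this section can be checked with a short calculation. The sketch below traces a 224 × 224 input through the five convolutional blocks and counts the layers. The channel widths (64 to 512) come from the standard VGG16 configuration and are not stated in the text above, so they should be read as an assumption of this example.

```python
# (number of 3x3 convolutional layers, output channels) for the five blocks;
# channel counts follow the standard VGG16 configuration (assumed here).
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution (3x3, stride 1, padding 1 keeps the size)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Spatial size after max pooling (2x2 window, stride 2 halves the size)."""
    return (size - window) // stride + 1

def trace_vgg16(size=224):
    """Return the number of conv layers and the output shape of each block."""
    conv_layers = 0
    shapes = []
    for n_convs, channels in VGG16_BLOCKS:
        for _ in range(n_convs):
            size = conv_out(size)
            conv_layers += 1
        size = pool_out(size)
        shapes.append((size, size, channels))
    return conv_layers, shapes

conv_layers, shapes = trace_vgg16()
print(conv_layers + 3)  # 13 convolutional layers plus 3 dense layers
print(shapes)
```

The trace shows that each pooling step halves the spatial size, so the final convolutional block hands a 7 × 7 × 512 feature map to the dense layers.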
Fig. 4 XceptionNet layer representation [22]
2.5 Training and Compiling the Model
Training the model involves building the CNN using the VGG16 and XceptionNet architectures, compiling the model, and fitting it to the images. After pre-processing, the data is divided into training and testing sets in an 8:2 ratio. The required model is imported from the Keras applications library. For training, the input size is specified as 224 × 224 × 3, where 224 × 224 is the dimension of the input image and 3 is the number of RGB channels. The weights are specified as the ImageNet weights. The output of the VGG model is then flattened, and SoftMax is used as the activation function of the output layer. The model was trained for up to 10 epochs. Three parameters are needed to compile the model: the optimizer, the loss, and the metrics. The stochastic gradient descent optimizer was used with learning rates of 0.001 and 0.05, a momentum of 0.9, and a decay of 0.005. Categorical cross-entropy is employed as the loss function; it is also used as the loss function with the Adam optimizer. Categorical cross-entropy is applied when there are more than two target classes. The Adam optimizer combines two gradient descent methods: root mean square propagation (RMSProp) and gradient descent with momentum. It works well for large models with many classes and large amounts of data or parameters, is computationally efficient, and requires little memory.
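The loss and optimizer named above can be illustrated with a small, self-contained sketch. It implements SoftMax, categorical cross-entropy, and one form of the SGD-with-momentum update, using the learning rate 0.001 and momentum 0.9 from the text. The three-class toy problem, the function names, and the choice to optimize the logits directly are assumptions made for illustration; the learning-rate decay is omitted.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Loss for a one-hot target against predicted class probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(one_hot, probs))

def sgd_momentum_step(w, grad, velocity, lr=0.001, momentum=0.9):
    """One SGD update with momentum: v = momentum * v - lr * grad, then w = w + v."""
    velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
    w = [wi + vi for wi, vi in zip(w, velocity)]
    return w, velocity

# Toy problem: three classes, true class is index 1.
logits = [0.0, 0.0, 0.0]
target = [0, 1, 0]
velocity = [0.0, 0.0, 0.0]
losses = []
for _ in range(100):
    probs = softmax(logits)
    losses.append(categorical_cross_entropy(target, probs))
    grad = [p - t for p, t in zip(probs, target)]  # gradient of the loss w.r.t. the logits
    logits, velocity = sgd_momentum_step(logits, grad, velocity)
print(losses[0], losses[-1])
```

The first loss equals ln 3 (a uniform guess over three classes), and the loss falls as the updates push probability mass toward the true class.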
3 Result and Analysis
We conduct the following experiments to evaluate the effectiveness of plant disease detection using deep learning models: (a) train the VGG16 model on the Kaggle dataset; (b) train the XceptionNet model on the Kaggle dataset; and (c) examine each model's performance and accuracy on the dataset. The VGG16 architecture, which has 16 layers, and the XceptionNet architecture, which has 71 layers, were trained on the dataset. The results reported for both models were obtained with a learning rate of 0.001. A batch normalization layer and a dropout of 0.25 were added to the model to increase its accuracy. Dropout is used to lessen overfitting: it drops a certain proportion of neurons from the network at random during training. The batch normalization layer standardizes the inputs to the following layer; by normalizing the activations of the hidden layers, it accelerates learning. When the trained models were validated on the test data, VGG16 and XceptionNet reached accuracies of 66 and 87 percent, respectively. The loss and accuracy per epoch for the VGG16 and XceptionNet models are shown in Tables 1 and 2, respectively. The results in Tables 1 and 2 suggest that models trained using transfer learning outperform a standard CNN model (as used in the literature). However, the accuracy of VGG16 is lower than that of the Xception model. One reason for the lower accuracy of the VGG16 model is the use of the Adam optimizer in the training phase, which performed poorly given the large number of parameters in the VGG network. Second, relatively few images were available for training, so not all features were adequately learned. In this setting, the pre-trained weights of transfer learning tend to raise the model's overall accuracy.
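The two regularization layers mentioned above can be sketched in a few lines. The example shows inverted dropout with the rate 0.25 from the text and a batch normalization step without the learned scale and shift parameters. The function names and the sample activations are hypothetical, and the sketch is a simplified stand-in for the corresponding framework layers.

```python
import random

def dropout(activations, rate=0.25, rng=None, training=True):
    """Inverted dropout: zero a random fraction of units and rescale the survivors."""
    if not training:
        return list(activations)  # dropout is disabled at inference time
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def batch_norm(values, eps=1e-5):
    """Normalize one feature across a batch to zero mean and (almost) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]

acts = [0.5, 1.5, 2.0, 4.0]
dropped = dropout(acts, rng=random.Random(0))
normed = batch_norm(acts)
print(dropped)
print(normed)
```

Rescaling the surviving units by 1 / (1 - rate) keeps the expected activation unchanged, which is why no correction is needed when dropout is switched off for validation.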
The advantage of utilizing pre-trained weights is that they make it possible to overcome the limitation of a small dataset, such as one with only 600 images. However, only the final few layers should be made trainable, while the rest are kept untrainable.
Table 1 Training using VGG16
Epoch   Training loss   Validation loss   Accuracy
1       3.4949          3.5140            0.0312
2       3.4188          3.4464            0.1125
3       3.2612          3.3697            0.1150
4       3.1181          3.2987            0.2625
5       2.8506          3.1937            0.3313
6       2.6241          3.0879            0.4187
7       2.3931          2.9781            0.4625
8       2.1890          2.8829            0.5125
9       1.8715          2.7275            0.6625
10      1.7299          2.4655            0.6625
Table 2 Training using XceptionNet
Epoch   Training loss   Validation loss   Accuracy
1       2.6411          3.1242            0.7585
2       2.7960          2.9649            0.5895
3       2.6414          2.8730            0.6619
4       2.5217          2.8648            0.7344
5       2.4298          2.7855            0.7926
6       2.3969          2.6892            0.7713
7       2.3272          2.6115            0.7926
8       2.2576          2.5337            0.8139
9       2.1879          2.4560            0.8452
10      2.1182          2.3782            0.8666
The accuracy and loss on the training and validation sets are plotted. The accuracy and loss graphs for the VGG16 model are shown in Fig. 5, and those for the XceptionNet model are shown in Fig. 6. It is evident from these graphs that, as the epochs advance, the training and validation losses decrease while the accuracy increases. In this manner, an effective training result was attained. The VGG16 and XceptionNet models have an average accuracy of approximately 77%. In [20], however, the authors trained on their dataset with eight popular transfer learning models and employed GANs to increase the volume of data, obtaining greater accuracy with the XceptionNet model. This indicates that XceptionNet is well suited to identifying and classifying mango leaf diseases. Deep learning models can be trained with or without layer freezing; freezing layers shortens the training period and can also increase accuracy.
Fig. 5 VGG-16 model loss and accuracy graphs
Fig. 6 Loss and accuracy graphs of the XceptionNet model
4 Conclusion
Severe plant diseases cause a yearly decrease in agricultural productivity. Early detection of plant diseases is therefore essential for preventing such losses in the future. The most widely applied classification methods for detecting and identifying diseases on plant leaves have been reviewed. In this study, two deep convolutional neural networks, VGG16 and XceptionNet, were used to diagnose diseases in mango crops. The models were compared on various criteria, including accuracy and the ability to classify noisy and blurry images. Although the VGG16 model took less time to train than XceptionNet, it did not perform well at accurately recognizing the affected area in the mango leaves. For the small dataset used, it can be concluded that XceptionNet, which reached an accuracy of 87 percent, performed better in disease identification than the VGG16 model.
References
1. Khoje S, Bodhe SK (2013) Application of colour texture moments to detect external skin damages in guavas (Psidium guajava L). World Appl Sci J 27(5):590–596
2. Rehman TU, Mahmud MS, Chang YK, Jin J, Shin J (2019) Current and future applications of statistical machine learning algorithms for agricultural machine vision systems. Comput Electron Agric 156:585–605. ISSN 0168–1699
3. Lu Y, Yi S, Zeng N, Liu Y, Zhang Y (2017) Identification of rice diseases using deep convolutional neural networks. Neurocomputing
4. Omrani E, Khoshnevisan B, Shamshirband S, Shaboohi H, Anuar NB, Nasir MHNM (2014) Potential of radial basis function-based support vector regression for apple disease detection. Meas 55:512–519
5. Fulsoundar K, Kadlag T, Bhadale S, Bharvirkar P (2014) Detection and classification of plant leaf diseases. Int J Eng Res Gen Sci 2(6)
6. Panchal SS, Sonar R (2016) Pomegranate leaf disease detection using support vector machine. Int J Eng Comput Sci 5(6). ISSN: 2319–7242
7. Preethi R, Priyanka S, Priyanka U, Sheela A (2015) Efficient knowledge based system for leaf disease detection and classification. Int J Adv Res Sci Eng 4(01)
8. Revathi P, Hema Latha M (2012) Classification of cotton leaf spot diseases using image processing edge detection techniques. Int J Appl Res 169–173
9. Arivazhagan S, Newlin Shebiah R, Ananthi S, Vishnu Varthini S (2013) Detection of unhealthy region of plant leaves and classification of plant leaf diseases using texture features. CIGR J 15(1):211
10. Phadikar S, Sil J (2008) Rice disease identification using pattern recognition techniques. In: Proceedings of 11th international conference on computer and information technology. pp 25–27
11. Khairnar K, Dagade R (2014) Disease detection and diagnosis on plant using image processing: a review. Int J Comput Appl 108(13):36–39
12. Zhang LN, Yang B (2014) Research on recognition of maize disease based on mobile internet and support vector machine technique. Trans Tech Publ 108(13):659–662
13. Shicha Z, Hanping M, Bo H, Yancheng Z (2007) Morphological feature extraction for cotton disease recognition by machine vision. Microcomput Inf 23(4):290–292
14. Fuentes A, Yoon S, Kim SC, Park DS (2017) A robust deep-learning-based detector for real-time tomato plant diseases and pests detection. Sensors 17(9):2022
15. Mohanty SP, Hughes D, Salathe M (2016) Using deep learning for image-based plant disease detection. Front Plant Sci 7:1419. https://doi.org/10.3389/fpls.2016.01419
16. Fuentes A, Yoon S, Kim S, Park D (2017) A robust deep-learning-based detector for real-time tomato plant diseases and pests recognition. Sensors 17(9):2022
17. Tiwari V, Joshi RC, Dutta MK (2021) Dense convolutional neural networks based multiclass plant disease detection and classification using leaf images. Ecol Inform 63:101289. ISSN 1574–9541
18. Kuricheti G, Supriya P (2019) Computer vision based turmeric leaf disease detection and classification: a step to smart agriculture. In: 2019 3rd international conference on trends in electronics and informatics (ICOEI). pp 545–549
19. https://www.plantsdiseases.com/p/diseases-of-mango.html. Accessed 27 June 2022
20. Liu B, Tan C, Li S, He J, Wang H (2020) A data augmentation method based on generative adversarial networks for grape leaf disease identification. IEEE Access 8:102188–102198
21. Gopalakrishnan K, Khaitan SK, Choudhary A, Agrawal A (2017) Deep convolutional neural networks with transfer learning for computer vision-based data-driven pavement distress detection. Constr Build Mater 157(8):322–330
22. Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 1251–1258
Generate Artificial Human Faces with Deep Convolutional Generative Adversarial Network (DCGAN) Machine Learning Model
Charu Kaushik and Shailendra Narayan Singh
Abstract Generative models have received a lot of interest in the field of unsupervised learning thanks to Generative Adversarial Networks (GANs), a cutting-edge framework with impressive data-generating capabilities. Numerous GAN models have been published, and applications in computer vision and machine learning have emerged in a variety of domains. GANs combine backpropagation with a competitive process between a generative network G and a discriminative network D, in which G generates artificial images and D classifies images as real or artificial. As training progresses, G learns to generate realistic images in order to deceive D. In this work, a Deep Convolutional Generative Adversarial Network (DCGAN), a GAN-based convolutional architecture, has been trained to obtain a generative model capable of producing realistic images of human faces. The CelebFaces Attributes Dataset (CelebA) was used to train the DCGAN model. The results demonstrate that the model is capable of producing human faces from unlabeled data and random noise. The trained DCGAN model has been quantitatively evaluated using the Structural Similarity Index (SSIM), which measures the structural and contextual similarity between two images, and the Peak Signal-to-Noise Ratio (PSNR), which relates the maximum possible signal power of an image to the power of the noise that degrades its quality.
Keywords GANs · Generative network · Discriminative network · CelebFaces · Deep convolutional generative adversarial networks
C. Kaushik (B) · S. N. Singh
Amity University, Uttar Pradesh, Noida, India
e-mail: [email protected]
S. N. Singh e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 B. Unhelkar et al. (eds.), Advances and Applications of Artificial Intelligence & Machine Learning, Lecture Notes in Electrical Engineering 1078, https://doi.org/10.1007/978-981-99-5974-7_5
1 Introduction
In the area of unsupervised learning, generative networks have attracted a lot of attention, and the Generative Adversarial Network (GAN) is among the most sophisticated approaches in generative learning. In principle, a GAN performs unsupervised learning using a supervised learning strategy by producing synthetic (fake) data. Thanks to generative adversarial networks, deep representations can be learned without extensively annotated training data. This is achieved through a competitive process between two networks that oppose each other. Numerous generative models have been proposed for this problem, and the Deep Convolutional GAN (DCGAN) is one of them. A DCGAN is a type of GAN that is built from convolutional layers, and it is the GAN model employed in this work. The project's initial objective is to generate faces from latent vectors, i.e., random noise. The essence of a GAN is the simultaneous training of two networks: the generative network, denoted G, and the discriminative network, denoted D. D is a classifier that tries to tell original (real) images apart from generated ones. G, in turn, is a generator that produces images and tries to fool the discriminator by producing data that appears real. As these two networks compete, G eventually produces realistic data, while D becomes better at detecting fake data. The representations learned by GANs can also be used for a variety of tasks, including image classification, image super-resolution, image fusion, and semantic image editing. GANs have been applied in many areas, such as hand-written font generation, image blending, image in-painting, face aging, text synthesis, human pose synthesis, steganographic applications, and image manipulation.
Further applications include three-dimensional (3D) image synthesis, object detection, visual saliency prediction, medical applications, facial makeup transfer, facial image super-resolution, facial landmark detection, sketch synthesis, texture synthesis, image-to-image translation, frontal face view generation, speech and language synthesis, music generation, and video applications in the computer vision and graphics communities (Fig. 1). In both the discriminator and the generator, DCGANs use convolutional neural networks rather than plain fully connected networks. They produce better-looking images and train more stably. The generator is a stack of fractionally strided (transposed) convolutional layers, which upsample the input at each layer. The discriminator is a stack of strided convolutional layers and therefore downsamples the input image at each layer. DCGAN is a generative adversarial network architecture that follows a few specific guidelines, including [1]:
• Replacing any pooling layers with strided convolutions (discriminator) and fractionally strided convolutions (generator).
• Using batch normalization in both the generator and the discriminator.
• Removing fully connected hidden layers for deeper architectures.
Fig. 1 Generative adversarial networks (GAN)
• Using ReLU activation in the generator for all layers except the output, which uses tanh.
• Using Leaky ReLU activation in the discriminator for all layers.
DCGAN remains a useful starting point for a new project until the bottlenecks are identified and it is better understood how to train GANs more effectively. It is one of the popular and effective GAN network designs. It mostly consists of convolutional layers without any fully connected or max-pooling layers, and it uses strided convolutions for downsampling and transposed convolutions for upsampling. The generator's network design is shown in Fig. 2.
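The three activation functions named in the guidelines above are simple enough to write out directly. The sketch below is illustrative only; the negative slope of 0.2 for Leaky ReLU is an assumed, commonly used value and is not given in the text.

```python
import math

def relu(x):
    """ReLU: passes positive values, clips negative values to zero."""
    return max(0.0, x)

def leaky_relu(x, negative_slope=0.2):
    """Leaky ReLU: like ReLU, but lets a small fraction of negative values through."""
    return x if x > 0 else negative_slope * x

def tanh(x):
    """tanh: squashes any input into the open interval (-1, 1)."""
    return math.tanh(x)

for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), leaky_relu(x), tanh(x))
```

Because tanh is bounded to (-1, 1), a generator that ends in tanh produces pixel values in that range, so training images are usually scaled to the same range before they are shown to the discriminator.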
Fig. 2 Deep convolutional generative adversarial networks (DCGAN) for generative model [2]
2 Related Work
Mohana et al. [3]: A DCGAN model was trained on the CelebFaces Attributes Dataset (CelebA). The Structural Similarity Index (SSIM), which measures the structural and contextual similarity of two images, was used for a quantitative evaluation of the trained model. According to the results, the quality of the generated images is quite similar to that of the high-quality images in the CelebA dataset.
Silva Costa et al. [4]: DCGANs were trained on various samples of the public CelebA dataset that were selected by facial attributes such as cheekbone prominence, face shape, and eyebrow thickness. The results indicate that (1) filtering by facial attributes, especially face shape, significantly reduces the variability between images; (2) a smaller variability between images leads to a smaller required training dataset; and (3) filtering the DCGAN training dataset by the oval face shape leads to a required dataset almost half the size of the unfiltered one for generating humanly acceptable images.
Liu et al. [5]: The model uses two sub-encoders to first map a given face to an age-conditional latent vector and a personalized latent vector. By feeding these two vectors into the generator, stable and photorealistic face images are then generated that preserve personalized facial features while changing the age condition. The objective function in this work replaces the adversarial loss of GANs with a perceptual similarity loss. The experimental results show that face images generated with the approach are more accurate and authentic on existing face datasets.
Fang et al. [6]: A gesture recognition algorithm based on a convolutional neural network and deep convolutional generative adversarial networks is proposed. The technique achieves good results when used for text output, calculation, and expression recognition. The results show that the method achieves better gesture classification and detection while training the model with fewer samples. Additionally, this gesture recognition technique is less susceptible to illumination and background interference.
Tingfei et al. [7]: The proposed algorithm incorporates convolution into the network structure in order to perform inpainting of ISAR images.
Bhargav et al. [8]: A linear regressor is used to map fMRI data to a latent space representation that serves as input to a previously trained generative model. The regressor is trained with a composite loss function that combines perceptual and Multi-Scale Structural Similarity Index (MS-SSIM) losses.
Kim et al. [9]: A hybrid face forensics framework based on a convolutional neural network is proposed that combines the two forensics approaches to enhance manipulation detection performance. To validate the framework, the public Face2Face dataset and a custom-collected DeepFake dataset were used. Through class activation map visualization, the framework showed which face parts are considered important and revealed tampering traces invisible to the naked eye.
Han et al. [10]: Image-to-image GANs can synthesize realistic and diverse additional training images to fill gaps in the real image distribution. However, no study has reported results on combining one type of GAN with another for a further performance boost.
Liang et al. [11]: PGSgan progressively learns ultrasound (US) images from low to high resolution by gradually growing both the generator and the discriminator. Comprehensive perceptual evaluation, a user study, and segmentation results on ovarian and follicle US images highlight the promising applicability and efficacy of the proposed PGSgan.
Heo et al. [12]: Automatically colorizing a black-and-white sketch can find practical use. They propose using U-Net and a deep convolutional generative adversarial network (DCGAN) as the generative model to implement automatic sketch colorization.
3 Methodology
A Generative Adversarial Network (GAN) consists of two networks, the generator and the discriminator (Fig. 3).
3.1 Experimental Setup
Get data: the CelebFaces Attributes Dataset (CelebA) is used to train the adversarial networks. Pre-processed data: each CelebA image has been cropped to remove the parts of the image that do not contain a face and then resized to 64 × 64 × 3 NumPy images. The pre-processed dataset is a smaller subset of the very large CelebA data. Visualize the data: a helper is used to display images from the pre-processed data. Model: a GAN is made up of two adversarial networks, a discriminator and a generator. Discriminator: a convolutional classifier is used here, and it is suggested to define it with normalization. The inputs to the discriminator are 32 × 32 × 3 tensor images, and the output is a single value indicating whether an image is real or fake. Generator: The generator must upsample an input to
Fig. 3 Basic block diagram of generative adversarial network (GAN) [13]
produce a new image of size 32 × 32 × 3, the same size as the training data. It should consist mostly of transposed convolutional layers with normalization applied to their outputs. The inputs to the generator are vectors of length z_size, and the output is a 32 × 32 × 3 image. Weight initialization: all weights were initialized from a zero-centered normal distribution with a standard deviation of 0.02. Discriminator loss: the total loss for the discriminator is d_loss = d_real_loss + d_fake_loss. Generator loss: the generator loss looks similar, but with the labels flipped, because the generator wants the discriminator to believe that the images it produces are real. Training: training alternates between the discriminator and the generator. The discriminator is trained alternately on real and fake images, while the generator is trained with the opposing loss in order to deceive the discriminator (Fig. 4).
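The two losses and the weight initialization described above can be written down compactly. The following is a minimal sketch that assumes the discriminator outputs a probability of "real" and that binary cross-entropy is the underlying loss; the text names only the sum d_real_loss + d_fake_loss, so the choice of loss function and all function names here are assumptions.

```python
import math
import random

def bce(prob, label, eps=1e-12):
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    return -(label * math.log(prob + eps) + (1 - label) * math.log(1 - prob + eps))

def mean(values):
    return sum(values) / len(values)

def discriminator_loss(d_real, d_fake):
    """d_loss = d_real_loss + d_fake_loss: real images labeled 1, generated images labeled 0."""
    d_real_loss = mean([bce(p, 1.0) for p in d_real])
    d_fake_loss = mean([bce(p, 0.0) for p in d_fake])
    return d_real_loss + d_fake_loss

def generator_loss(d_fake):
    """Same form with flipped labels: the generator wants its images scored as real (1)."""
    return mean([bce(p, 1.0) for p in d_fake])

def init_weights(n, rng, std=0.02):
    """Draw n weights from a zero-centered normal distribution with the given std."""
    return [rng.gauss(0.0, std) for _ in range(n)]

# Hypothetical discriminator outputs for a batch of two real and two generated images.
d_real, d_fake = [0.9, 0.8], [0.1, 0.3]
print(discriminator_loss(d_real, d_fake))
print(generator_loss(d_fake))
weights = init_weights(10000, random.Random(0))
```

In this example the discriminator is doing well, so its loss is low while the generator loss is high; during training the two losses pull against each other in exactly this way.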
Fig. 4 Workflow
3.2 Dataset Description
The CelebFaces Attributes Dataset (CelebA) [14] includes more than 200 K celebrity images, each with 40 attribute annotations. The images cover large pose variations and heavily cluttered backgrounds. Because the images include a wide range of persons, backgrounds, and poses, the dataset is well suited for developing and testing face recognition algorithms. The attributes make it possible to find, for example, people who are smiling, have brown hair, or are wearing glasses. Overall, the dataset contains:
• 202,599 face images of celebrities
• 10,177 distinct identities (the names of the identities are not given)
• 5 landmark locations and 40 binary attribute annotations per image.
Dataset preparation: Every CelebA image has been cropped to remove the parts of the image that do not contain a face and then resized to 64 × 64 × 3 NumPy images. This pre-processed dataset is a small subset of the very large CelebA dataset. The data are batched for the neural network using a DataLoader, with the following parameters:
• batch_size: the number of images in a batch (32).
• img_size: the square size (x, y) of the image data (32).
• data_dir: the directory where the image data is located.
A DataLoader with the batched data is returned.
3.3 Model Description
Discriminator network: The discriminator is a convolutional classifier, and normalization is recommended in its layers. Its inputs are 32 × 32 × 3 tensor images. Its output is a single value that indicates whether a given image is real or fake.
Complete discriminator architecture:
(conv1): Sequential((0): Conv2d(3, 32, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False))
(conv2): Sequential((0): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False))
(conv3): Conv2d(64, 128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(fc): Linear(in_features=2048, out_features=1)
Generator network: The generator should upsample an input and create a new image of the same size as the training data, which is 32 × 32 × 3. Most of its layers should be transposed convolutional layers with normalization applied to their outputs. The inputs to the generator are vectors of length z_size, and the output is an image of shape 32 × 32 × 3. The Rectified Linear Unit (ReLU) is an activation function defined as y = max(0, x). It is the most widely used activation function in neural networks. Forward propagation: the input x is passed through the network, and the result is a 32 × 32 × 3 tensor image.
Final generator architecture:
(fc): Linear(in_features=100, out_features=2048, bias=True)
(deconv1): Sequential((0): ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False) BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
(deconv2): Sequential((0): ConvTranspose2d(32, 3, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False))
The hyperparameters are listed in Table 1.

Table 1 Hyperparameters
| Hyperparameter | Value determined |
| --- | --- |
| Batch size | 32 |
| Epochs | 2, 5, 25, 40 |
| Optimizer | Adam |
| Activation function | Leaky ReLU, ReLU, Tanh |
| Image size | 32 |
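The layer sizes listed above can be checked with the standard output-size formulas for convolution and transposed convolution. The sketch below assumes dilation 1 and no output padding, and it assumes the 2048 outputs of the generator's fully connected layer are reshaped to 128 × 4 × 4; the helper names are illustrative.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Output width/height of a convolution layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, padding=1):
    """Output width/height of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

# Discriminator: a 32 x 32 input passed through three stride-2 convolutions
disc_sizes = [32]
for _ in range(3):
    disc_sizes.append(conv_out(disc_sizes[-1]))
flat_features = 128 * disc_sizes[-1] ** 2  # 128 channels after conv3

# Generator: start from an assumed 4 x 4 feature map and double until 32 x 32
gen_sizes = [4]
while gen_sizes[-1] < 32:
    gen_sizes.append(deconv_out(gen_sizes[-1]))

print(disc_sizes, flat_features, gen_sizes)
```

Each 4 × 4 kernel with stride 2 and padding 1 halves (or, when transposed, doubles) the spatial size, so 32 → 16 → 8 → 4 gives 128 × 4 × 4 = 2048 features, matching the in_features of the discriminator's linear layer. Under the same assumptions, three doubling steps are needed to go from 4 × 4 back to 32 × 32.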
4 Results
The training losses for the generator and discriminator were recorded after every epoch and plotted. Two image-quality metrics were also calculated: PSNR, which measures the quality of an image representation relative to the noise that degrades it, and SSIM, which evaluates the structural and contextual similarity between two images. The network was trained for 40 epochs, and the findings are shown in Figs. 5, 6, 7, and 8 and Table 2. The average and minimum values are:
• Average discriminator loss: 0.8556
• Average generator loss: 7.0655
• Minimum discriminator loss: 0.0249
• Minimum generator loss: 1.5849
• Average PSNR: 30.165
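For reference, PSNR is conventionally defined as 10 · log10(MAX² / MSE), where MSE is the mean squared error between two images and MAX is the largest possible pixel value. The sketch below applies that standard definition to flat lists of pixel values. It assumes 8-bit pixels (MAX = 255) and is not necessarily how the values reported here were computed.

```python
import math

def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if not reference or len(reference) != len(generated):
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the two images differ less. Identical images give an infinite value, and a maximally different 8-bit pair gives 0 dB.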
Fig. 5 Training losses at epoch 2 and generated image
Fig. 6 Training losses at epoch 5 and generated image
Fig. 7 Training losses at epoch 25 and generated image
Fig. 8 Training losses at epoch 40 and generated image

Table 2 SSIM, PSNR, and training losses for the generator (G_losses) and discriminator (D_losses)
| Epochs | SSIM | PSNR | G_losses | D_losses |
| --- | --- | --- | --- | --- |
| 2 | 0.2830165128736583 | 29.446343524433757 | 1.906 | 1.4403 |
| 5 | 0.6863092246921659 | 30.219892653951767 | 1.6942 | 0.6678 |
| 25 | 0.7023511029633411 | 30.165919160017193 | 5.6546 | 0.4327 |
| 40 | 0.8043412731964944 | 30.135881631772975 | 12.1777 | 0.1466 |
5 Future Scope and Conclusion
The surge in interest in GANs is due both to their capacity to produce a large number of outputs from unlabeled data and to their capacity to turn latent data into relevant information. The generative model uses this capability to learn from an existing dataset and build on it. In this work, the CelebA dataset is used to train the discriminator and generator, producing the required image data. Advertising firms may use such images to present themselves as "diverse" in marketing materials and commercials without actually taking the harder measures needed to address diversity. Fake pictures can thus be used to market things to real people. The same method may be used in a variety of situations to augment existing datasets. Artificial intelligence is heading toward unsupervised learning. In general, generative models and GANs are both fascinating and confusing. They represent an additional step toward a society where artificial intelligence is more important. GANs have a wide range of uses, including the creation of realistic photographs, the generation of examples for image datasets, the translation of images from one format to another, the generation of new human poses, the aging of faces, the generation of frontal face views, the prediction of videos, and the creation of 3D objects.
References
1. Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. https://doi.org/10.48550/arXiv.1511.06434
2. Suh S, Lee H, Jo J, Lukowicz P, Lee YO (2019) Generative oversampling method for imbalanced data on bearing fault detection and diagnosis. Appl Sci 9(4):746. https://doi.org/10.3390/app9040746
3. Mohana, Shariff DM, A. H, A. D (2021) Artificial (or) fake human face generator using generative adversarial network (GAN) machine learning model. In: 2021 fourth international conference on electrical, computer and communication technologies (ICECCT). pp 1–5. https://doi.org/10.1109/ICECCT52121.2021.9616779
4. Da Silva Costa D, Moura PN, Garcia ACB (2021) Improving the human perception of GAN generated facial image synthesis by filtering the training set considering facial attributes. In: 2021 IEEE international conference on systems, man, and cybernetics (SMC). pp 100–106. https://doi.org/10.1109/SMC52423.2021.9659033
5. Liu X, Xie C, Kuang H, Ma X (2018) Face aging simulation with deep convolutional generative adversarial networks. In: 2018 10th international conference on measuring technology and mechatronics automation (ICMTMA). pp 220–224. https://doi.org/10.1109/ICMTMA.2018.00060
6. Fang W, Ding Y, Zhang F, Sheng J (2019) Gesture recognition based on CNN and DCGAN for calculation and text output. IEEE Access 7:28230–28237. https://doi.org/10.1109/ACCESS.2019.2901930
7. Tingfei W, Jingpeng G, Zhiye J (2021) ISAR image inpainting algorithm based on DCGAN. Int Symp Antennas Propag (ISAP) 2021:1–2. https://doi.org/10.23919/ISAP47258.2021.9614545
8. Bhargav K, Ambika S, Deepak S, Sudha S (2020) Imagenation-A DCGAN based method for image reconstruction from fMRI. In: 2020 fifth international conference on research in computational intelligence and communication networks (ICRCICN). pp 112–119. https://doi.org/10.1109/ICRCICN50933.2020.9296192
9. Kim E, Cho S (2021) Exposing fake faces through deep neural networks combining content and trace feature extractors. IEEE Access 9:123493–123503. https://doi.org/10.1109/ACCESS.2021.3110859
10. Han C, Rundo L, Araki R, Nagano Y, Furukawa Y (2019) Combining noise-to-image and image-to-image GANs: brain MR image augmentation for tumor detection. IEEE Access 7:156966–156977. https://doi.org/10.1109/ACCESS.2019.2947606
11. Liang J, Yang X, Li H, Wang Y, Van MT, Dou H, Chen C (2020) Synthesis and edition of ultrasound images via sketch guided progressive growing GANS. In: 2020 IEEE 17th international symposium on biomedical imaging (ISBI). pp 1793–1797. https://doi.org/10.1109/ISBI45749.2020.9098384
12. Heo H, Hwang Y (2018) Automatic sketch colorization using DCGAN. In: 2018 18th international conference on control, automation and systems (ICCAS). pp 1316–1318
13. Bharath. Complete guide to generative Network. https://blog.paperspace.com/complete-guide-to-gans/
14. Jessica Li. Celeb Attributes (CelebA) Dataset. kaggle datasets download -d jessicali9530/celeba-dataset
Robust Approach for Person Identification Using Three-Triangle Concept
Nikita Kumari, Ashish Mishra, Aadil Shafi, and Rashmi Priyadarshini
Abstract Managing a large volume of data in today's fast-paced environment is challenging. With the increasing number of data breaches, there are concerns about data security and authentic use. This led us to the significance of metrics relating to a person's identity: personal attributes cannot be stolen and are very simple to deploy. This study uses an individual's facial features to permit his/her presence in a specific location, in this case, university premises. The system is used in association with a passive infrared (PIR) sensor to detect human presence. The improved face recognition system proposes a new algorithm that combines the extraction of ear features and geometrical features of the face using the three-triangle concept. By keeping a student database along with the participation record in classrooms, the implementation of a data handling system using a cost-effective, interwoven recognition technology with KNN as the classifier is demonstrated. As a result, an innovative framework for potential consumers is devised with reduced memory size and error rate.
Keywords Ear recognition · Facial characteristics · PIR sensor · KNN classifier · Three-triangle concept
N. Kumari (B) · A. Mishra · A. Shafi · R. Priyadarshini
Department of Electrical, Electronics and Communication Engineering, Sharda University, Greater Noida, UP 201310, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 B. Unhelkar et al. (eds.), Advances and Applications of Artificial Intelligence & Machine Learning, Lecture Notes in Electrical Engineering 1078, https://doi.org/10.1007/978-981-99-5974-7_6
1 Introduction
Any biometric trait [1], whether physiological or behavioral, should be evaluated against four characteristics: individuality, measurability, universality, and existence. Individuality means that no two people should have the same attribute. A measurable characteristic is a trait that can be measured numerically. Universality means that the feature should be present in everyone. Existence means that the feature does not change over time. Established biometrics have known issues: iris recognition [2] suffers from lost pixels during image restoration, and fingerprint technology [3] struggles to accept input from users with dirty or damp hands. The trade-off between accuracy and the fewest problems led us to facial biometrics. Face recognition offers a reliable option for more accurate outcomes. While the shape of a face can vary with the shedding of skin and the passage of time, the distance between the localization sites has a minimal chance of changing. It remains stable between the ages of 14 and 58, and in some cases up to 60. Recognition technology can take advantage of this property to create secure and reliable systems.
The comparative study of different facial pattern identification techniques [4] demonstrates that hybrid approaches enhance the system's performance. Many older methodologies can be combined with cutting-edge techniques to boost a system's precision and robustness. Here, accuracy and performance are improved by proposing a hybrid model that combines an individual's face recognition trait with the ear characteristic. The suggested model incorporates the Haar algorithm together with the KNN classification method. The contribution of this work is a reduction in memory size, achieved by limiting the number of face localization points while maintaining the system's accuracy.
2 Literature Review
The models reviewed from past work have a few limitations. Scarf-covered faces are not handled well by identification technology and do not yield reliable results. The system must be trained on images captured under varied conditions, such as occlusion, lighting, and orientation. The paper [5] uses only high-quality photos for detection and so makes only correct predictions. Some of the datasets are too small [6, 7]. To obtain reliable predictions, the system must be trained on standard databases. All of the issues listed in Table 1 are addressed in our proposed work, and we devised a system that is both productive and efficient. It employs the ESP32 Cam module, which has an embedded OV2640 camera module (1600 × 1200), a slot for storing captured images on an SD card of up to 4 GB, and a Wi-Fi and Bluetooth chip. The system is highly adaptable, owing to a novel face recognition algorithm that blends OpenCV with an ear recognition method to boost the system's performance. The Haar method, which is very proficient, is used in designing the system.
3 Methodology
The research is performed under various tasks:
Task 1: To build a face detection system for testing equipment and the base code.
Task 2: Embedding PIR Sensor Module into the existing circuitry.
Task 3: Adding an additional algorithm in the previous code of face detection.
3.1 Block Diagram of Recommended System See (Fig. 1).
3.2 Algorithm Used
Novel Approach (Concept of Three-Triangle Formation and Reduced Localization Points)
The ear is a distinctive attribute of the individual that does not change over time. The distance between the top lobe of the ear and the frontal nose, the distance between the inner lobe of the ear and the frontal nose, and the distance between the two lobes form a triangle. Between the ages of 14 and 58, these distances, i.e., the side lengths of the triangle, are nearly constant. The localization points [15] that create the triangle are indicated in Fig. 2. Because the recognition method is no longer based only on face points/nodes, this feature reduces the number of face localization points. Furthermore, the inclusion of the ear characteristic makes the system more capable. The combined strategy significantly decreases the memory size and increases the system performance.
For Ear Recognition
Pre-processing → Ear Detection → Edge Detection → Post-processing → Feature extraction → Classifier.
Pre-processing: A Gaussian filter is used to reduce noise in the images and obtain a smoothed image for detection. The Gaussian function used is given in Eq. (1).
Table 1 Previous work done in the area (2003–2022)
| Year | Trait used | Description | Accuracy | Challenges offered |
| --- | --- | --- | --- | --- |
| 2022 | Hybrid | Proposed system: a highly competent model that combines ear and facial features to set a new standard in recognition technology, culminating in a system that is flexible, cost-effective, and highly accurate. Features: cost-effective; highly accurate as it uses the fusion of face and ear traits; reduction in memory size; easy to deploy; no security issues | 98% | |
| 2022 | Hybrid [8] (Face + Ear) | Technique: PCA (face) and ICP (ear). Software: MATLAB. Database: FRGC database (length = 30 persons) | 99.25% | Costly as it uses a 3D scanner. Not flexible, as it is not deployed in light devices like mobiles |
| 2021 | Face [9] | Technique: Neural networks. Classifier: Linear SVM. Library: OpenCV. Range: PIR sensor is used to detect the range between door and person. Description: adaptive and low-cost surveillance system that uses a standard USB camera to distinguish between relatives and intruders by use of facial characteristics | Effective | Large time required to train the dataset. SVM not suitable for large datasets. USB cameras can run only for Android applications |
| 2020 | Face [10] | Technique: Haar-cascade [11]. Classifier: AdaBoost classifier. Range: 50–70 cm. Database: own database | 96.4% [9] | Sensitive to quality of dataset. Dependency on outliers |
| 2019 | Face [12] | Technique: PCA. Database: tested under standard databases with noisy images (Yale B, Caltech, etc.). Approach: 2D face recognition with eigenvector approach. Description: if an intruder is not recognized by the system but belongs to the premises, the operator will send a password, which can be inserted via the keypad to unlock the door, and till then the alarm will ring | Excellent (can also detect poor quality images) | Not flexible: due to trouble caused by the ethernet cable physically; system setup along with monitor and keypad is not movable |
| 2012 | Ear [6] | Technique: Geometrical feature extraction. Software: MATLAB. Database: own database (length = 30) | 90% | Sensitive to variations in image. Small database |
| 2010 | Ear [7] | Technique: Moment invariant technique along with back propagation neural network. Database: own database (length = 60) | 91.8% | Small database |
| 2006 | Ear [13] | Technique: Haar-wavelet. Calculation: Hamming distance. Database: standard database (length = 950). Description: coefficient matrices of the wavelet are calculated to form the feature template | 96% | Large matrices |
| 2005 | Ear [5] | Technique: Feature extraction using contour detection. Database: standard database (length = 240 images). Description: only the high-quality, simple images were used for testing | No error with such an easy dataset | Erroneous curve detection |
| 2004 | Ear [14] | Technique: Edge based ear recognition. Classifier: back propagation learning. Range: 2 m away from camera. Database: own database (length = 77) | 85% | Sensitive to noisy data. Accuracy is … |
Fig. 1 Block diagram of the proposed system
Fig. 2 Reduced localization points to 31
\[
G(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right) \tag{1}
\]
where (x, y) are the pixel coordinates and σ is the standard deviation.
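As an illustration of Eq. (1), the sketch below samples the Gaussian on a small square grid and normalizes it so that the weights sum to one, which is how a smoothing kernel is usually built. The kernel size and σ values are arbitrary assumptions, not values reported in this work.

```python
import math

def gaussian(x, y, sigma):
    """Value of the 2D Gaussian of Eq. (1) at pixel offset (x, y)."""
    return math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def gaussian_kernel(size=5, sigma=1.0):
    """Sample Eq. (1) on a size x size grid centred at (0, 0) and normalize."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre pixel")
    half = size // 2
    kernel = [[gaussian(x, y, sigma) for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]
```

Convolving an image with such a kernel replaces each pixel by a weighted average of its neighbours, with the weights falling off with distance from the centre. A larger σ gives stronger smoothing.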
Distance of Sides of Triangle: The Euclidean distance (D) is used to compute the distance between two pixels.
\[
D\{(x_1, y_1), (x_2, y_2)\} = \sqrt{(x_1 - x_2)^{2} + (y_1 - y_2)^{2}} \tag{2}
\]
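Eq. (2) can be applied to the three landmarks of one triangle as sketched below. The landmark names and coordinates are hypothetical placeholders for the localization points, which the text does not give numerically.

```python
import math

def euclidean(p, q):
    """Eq. (2): distance between two pixel coordinates (x, y)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def triangle_sides(ear_top, ear_inner, nose_tip):
    """Side lengths of the triangle formed by two ear landmarks and the nose."""
    return (
        euclidean(ear_top, nose_tip),    # top lobe of the ear to the nose
        euclidean(ear_inner, nose_tip),  # inner lobe of the ear to the nose
        euclidean(ear_top, ear_inner),   # between the two lobes
    )

# Example with made-up pixel coordinates
sides = triangle_sides(ear_top=(0, 0), ear_inner=(0, 3), nose_tip=(4, 0))
```

The three side lengths form a compact feature vector for one triangle, which is the kind of reduced representation the approach relies on.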
Classifier: The classifier is K-Nearest Neighbor (KNN) [16, 17, 18], which assigns the class with the minimum absolute difference, computed as:
\[
D(x, y) = \sum_{i=1}^{N} |x_i - y_i| \tag{3}
\]
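A minimal K-Nearest Neighbor classifier using the absolute-difference distance of Eq. (3) might look as follows. The value of k, the majority-vote rule, and the toy feature vectors are assumptions made for illustration.

```python
from collections import Counter

def l1_distance(x, y):
    """Eq. (3): sum of absolute differences between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))

def knn_predict(train, query, k=3):
    """Return the majority label among the k training samples nearest to query.

    train is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: l1_distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy example: triangle side lengths for two hypothetical subjects
train = [
    ((4.0, 5.0, 3.0), "A"), ((4.1, 5.1, 3.0), "A"),
    ((6.0, 7.0, 2.0), "B"), ((6.2, 6.9, 2.1), "B"),
]
label = knn_predict(train, (4.05, 5.0, 3.0), k=3)
```

With k = 1 this reduces to picking the enrolled sample with the minimum absolute difference, which matches the wording of the text most directly.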
Implementation of Algorithm
Image Processing Library: OpenCV provides the base of our algorithm. OpenCV includes object detectors [19] based on the Viola–Jones architecture. Most of them are specialized for the frontal face pattern and its interior components, such as the eyes, nose, and mouth. In this study we focus on ear pattern identification, especially when the image shows the head in profile or near-profile view. The goal is to develop real-time ear detectors using OpenCV's generic object detection framework. The classifiers trained to recognize left ears, right ears, and ears in general performed accurately.
Installing ESP32 Cam Library in Arduino IDE: On the ESP32 microcontroller, the esp32cam library offers an object-oriented API for using the OV2640 camera.
Platform Used: We used Visual Studio Code [20] to write the code, since it has an excellent code editing environment, and we used Python for facial identification. When a person enters the room, the PIR sensor [21] detects his or her presence and the camera module turns on. To capture an image, the person must place his or her face in front of the ESP32 Cam module, as shown in Fig. 3. The controller module, an Arduino UNO, then compares the captured image with the images recorded in the database.
1. If the person's image is found, the database is updated to include the student's information.
2. If not, the image is saved as "unidentified" in an unknown folder (Fig. 4).
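The two-branch decision above can be sketched as a small function. Everything in it is hypothetical: the acceptance threshold, the record fields, and the in-memory lists standing in for the attendance sheet and the unknown folder are illustrative assumptions, not the deployed code.

```python
def l1(x, y):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))

def handle_capture(features, enrolled, attendance, unknown, timestamp, threshold=1.0):
    """Match a captured feature vector against enrolled students.

    enrolled maps student_id -> (name, feature_vector). The threshold is an
    assumed acceptance limit on the distance to the best match.
    """
    best_id = min(enrolled, key=lambda sid: l1(enrolled[sid][1], features), default=None)
    if best_id is not None and l1(enrolled[best_id][1], features) <= threshold:
        # Branch 1: person found, so record name, ID, and time of entrance
        name = enrolled[best_id][0]
        attendance.append({"name": name, "student_id": best_id, "time": timestamp})
        return best_id
    # Branch 2: no match, so file the capture as unidentified
    unknown.append({"label": "unidentified", "time": timestamp})
    return None
```

In a real deployment the two lists would be replaced by the Excel sheet and the unknown folder on the SD card, and the threshold would have to be tuned on enrolment data.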
Fig. 3 Concept of three-triangle formation
Fig. 4 3D view of proposed model
4 Circuit Layout See (Fig. 5).
5 Interfacing of Components See (Fig. 6).
Fig. 5 Schematic of hardware
Fig. 6 Assembling and interfacing of components (Hardware section)
6 Experimental Results
First, we tested the system on our class of 28 subjects. Each student's data is saved on the Cam module's SD card. As illustrated in Fig. 7, the automatically created Excel sheet contains the student's name, student ID, and time of entrance. We then ran our algorithm on IIT Delhi's standard database, which has three ear images for each of 50 users. For each user, we took two images for training and one image for testing. Consequently, the training dataset has 100 samples and the testing dataset has 50 samples. The system achieved a 98% accuracy rate. We compared our work to past approaches based on Table 1, and the resulting comparison is displayed in the pie chart (Fig. 8).
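The per-user split described above (two training images and one test image for each of 50 users) can be reproduced as follows. The file names are placeholders, and the accuracy helper simply shows the arithmetic: on 50 test samples, 98% corresponds to 49 correct predictions.

```python
def split_per_user(images_by_user, n_train=2):
    """Put the first n_train images of each user in train and the rest in test."""
    train, test = [], []
    for user, images in sorted(images_by_user.items()):
        train += [(user, image) for image in images[:n_train]]
        test += [(user, image) for image in images[n_train:]]
    return train, test

def accuracy(predictions, labels):
    """Fraction of predictions that equal the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# 50 users with three placeholder image names each
data = {user: [f"user{user}_ear{i}.bmp" for i in range(3)] for user in range(50)}
train, test = split_per_user(data)
```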
Fig. 7 Data record in Excel sheet
The pie chart in Fig. 8 illustrates how the system performs when several algorithms are combined. According to the statistical study, the hybrid approach enhances system robustness. The chart indicates that accuracy progressively improves when ear traits are combined with edge-based and backpropagation classifiers. Accuracy rises sharply, however, when the suggested model, which combines the Haar classifier and KNN with facial and ear-lobe features, is implemented.
Fig. 8 Comparison chart of different techniques and traits based on performance rate [8]
7 Conclusions
The trade-off between time, speed, security, false recognition rate, and accuracy led us to this approach. The face detection, capturing, and processing system was designed successfully. The PIR sensor detects the presence of a person at the entry gate, which activates the camera system and records an image of the individual. The captured image is compared to the database, and an Excel sheet with the database's details is generated as a result. The proposed autonomous system consumes less memory and reduces the storage needed to access the system. The optimized algorithm is specifically designed to identify frontal face patterns using fewer localization units. The PIR sensor included in the circuitry extends the system's interoperability. The benefit is broadened further by system protection against data breaches, with continuous monitoring of any abrupt environmental change. Consequently, a feasible model is deployed without compromising the device's overall performance.
8 Future Scope
The proposed system is useful because it can be employed for security applications. Each of the other biometric features comes with its own set of disadvantages. The flaws discovered with each deployment of different biometric traits are thoroughly examined in [22]. Iris recognition has security challenges, as it can only be deployed in advanced security applications and has problems with pixels being lost during image resolution. Signatures and fingerprints [23], on the other hand, do not always yield reliable findings. For identification [24] purposes, the hybrid method, which combines face and ear features [25, 26], is particularly effective. The system has several benefits, including reduced system memory size due to the inclusion of the ear attribute, which reduces localization points while preserving system accuracy. The system is tested with covered ear lobes and only three-quarters of the face exposed on the identifying screen. The system also works well across a variety of angles and lighting environments. However, when half of the face is hidden, the system fails to recognize the individual. The technique could be refined in the future to recognize faces that are half covered.
Author(s) Contributions The project is being undertaken under the supervision of Dr. Rashmi Priyadarshini, the guide of the project. Her consistent monitoring of work progress and regular evaluation directed the team's efforts in the right direction. Further gratitude extends to Sharda University for providing the necessary laboratory for the assembly and testing of the circuitry. As a co-author, Ashish Mishra contributed to the interfacing section of components. Meanwhile, Aadil Shafi assembled the components and tested them for smooth operation before incorporating them into the final system. Effective integration resulted in data gathering and testing under variable circumstances to train the model for unpredictability.
References
1. Alrahawe EA, Humbe VT, Shinde GN (2019) An analysis on biometric traits recognition. Int J Innov Technol Explor Eng 8:307–312
2. Sallam A, Amery H, Qudasi S, Ghorbani S, Rassem TH, Makbo NM (2021) Iris recognition system using convolutional neural network. In: International conference on software engineering & computer systems and 4th international conference on computational science and information management. ISBN: 978–1–6654–1407–4
3. Drahansky M, Kanich O, Brezinova E (2017) Challenges for fingerprint recognition: spoofing, skin diseases, and environmental effects. In: Handbook of biometrics for forensic science. pp 63–83
4. Priyadarshini R, Kumari N, Mishra A, Mishra A (2022) A review on various face recognition techniques and comparative analysis based on their performance rate. Sci, Technol Dev J 11:325–332
5. Choras M (2005) Ear biometrics based on geometrical feature extraction. Electron Lett Comput Vis Image Anal, 84–95
6. Jawale J, Bhalchandra AS (2012) The human identification system using multiple geometrical feature extraction of ear–an innovative approach. Int J Emerg Technol Adv Eng 2(3)
7. Wang X, Xia H, Wang Z (2010) The research of ear identification based on improved algorithm of moment invariants. In: Proceedings of third international conference on information and computing. pp 58–60
8. Tharewal S, Malche T, Tiwari P, Jabarulla M, Alnuaim A, Mostafa AM, Ullah M (2022) Score-level fusion of 3D face and 3D ear for multimodal biometric human recognition. In: Comput Intell Neurosci. pp 1–9
9. Shetty AB, Bhoomika D, Ramyashree J (2021) Facial recognition using Haar cascade and LBP classifiers. Proc Glob Transit 2:330–335
10. Surve M, Joshi P, Jamadar S, Vharkat M (2020) Automatic attendance system using face recognition technique. Int J Recent Technol Eng 9:2134–2138
11. A. P, D. C, R. Agarwal, Shrivastava T, K (2022) Face detection using Haar cascade classifier. In: Conference of Advancement in Electronics & Communication Engineering. pp 593–599
12. Rakesh G, Kora P, Swaraja K, Meenakshi K, Karuna G (2019) Gsm based face recognition using Pir sensor on raspberry Pi 3. Int J Innov Technol Explor Eng 8:329–332
13. Sana A, Gupta P, Purkait R (2006) Ear biometrics: a new approach. Adv Pattern Recognit, 46–50
14. Mu Z, Yuan L, Xu Z, Xi D, Qi S (2004) Shape and structural feature based ear recognition. In: Li SZ, Sun Z, Tan T, Pankanti S, Chollet G, Zhang D (eds) Advances in biometric person authentication, LNCS, vol 3338. Springer, Berlin, Heidelberg, pp 663–670
15. Alirezaee S, Aghaeinia H, Faez K, Askari F (2006) An efficient algorithm for face localization. Int J Inf Technol 12:30–35
16. Ali N, Neagu D, Trundle P (2019) Evaluation of k-nearest neighbor classifier performance for heterogeneous data sets. J SN Appl Sci
17. Amato G, Falchi F (2011) On kNN classification and local feature based similarity functions. In: Filipe J, Fred A (eds) Agents and artificial intelligence, vol 271. Springer, Berlin, Heidelberg
18. Kang M, Kim J (2007) Real time object recognition using K-nearest neighbor in parametric eigenspace. In: Li K, Fei M, Irwin GW, Ma S (eds) Bio-inspired computational intelligence and applications, LSMS, vol 4688. Springer, Berlin, Heidelberg
19. Santana C, Navarro J, Sosa D (2011) A study on ear detection and its applications to face detection. In: Proceedings of advances in artificial intelligence. pp 1–10
20. Amann S, Proksch S, Nadi S, Mezini M (2016) A study of visual studio usage in practice. In: International conference on software analysis, evolution and reengineering. pp 124–134
21. Kolluru PK, Vijaya KS, Marouthu A (2021) IoT based face recognition using PIR sensor for home security system. Turk Online J Qual Inq 12:1042–1052
22. Kumari N, Priyadarshini R (2022) A comprehensive synopsis of scanning technology with its application in framing of an IOT-based, autonomous data handling system. Int J Sci Res Eng Dev 5:1362–1369
23. S. M., S. G., Narayan L, Manju (2020) Fingerprint recognition and its advanced features. Int J Eng Res & Technol 9
24. Priya B, Rani M (2020) A multimodal biometric user verification system with identical twin using SVM 2. Int J Recent Technol Eng 8:118–122
25. Anwar AS, Ghany KA, Elmahdy H (2015) Human ear recognition using geometrical features extraction. Int Conf Commun, Manag Inf Technol, 529–537
26. Sharkas M (2022) Ear recognition with ensemble classifiers; A deep learning approach. J Multimed Tools Appl
27. Chang K, Bowyer KW, Sarkar S, Victor B (2003) Comparison and combination of ear and face images in appearance-based biometrics. IEEE Trans Pattern Anal Mach Intell 1160–1165
COVID-19 Disease Detection Using Explainable AI
Vishant Mehta, Jai Mehta, Pankti Nanavati, Vighnesh Naik, and Nilkamal More
Abstract With the advent of technologies like Artificial Intelligence, Machine Learning and Deep Learning, there has been unfathomable progress across all the fields. This progress has produced equally important results in the medical field, where these technologies are being used for diagnosis, identifying healthcare requirements and providing the necessary solutions with greater accuracy. These technologies have particularly shown highly accurate results in medical imaging, and the results are also extremely sensitive to the minute details while processing the information. Although we have achieved such overwhelming headway, when it comes to the layman, these results become extremely difficult to analyse and always require professionals for help. The entire process of making predictions using these technologies is covered under a black box, which needs to be made comprehensible for everyone. To make this information understandable, there is another concept arising known as Explainable AI.
Keywords COVID-19 · Support vector machines (SVM) · Explainable AI (XAI) · Convolutional neural network (CNN) · ResNet50 · Medical field
V. Mehta (B) · J. Mehta · P. Nanavati · V. Naik · N. More
K J Somaiya College of Engineering, Mumbai 400077, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 B. Unhelkar et al. (eds.), Advances and Applications of Artificial Intelligence & Machine Learning, Lecture Notes in Electrical Engineering 1078, https://doi.org/10.1007/978-981-99-5974-7_7
1 Introduction
In 2019 began one of the most dreadful pandemics, the COVID-19 pandemic. The outbreak of the coronavirus, starting from a laboratory in China, slowly engulfed the entire world. There has been an official claim of 5 million lives taken by this pandemic, and the actual estimate is around 17 million. During this gruesome period, a large number of medical professionals were losing their lives, making it even more difficult to fight the pandemic and curb its spread. In addition, hospitals were full of infected patients, so people were scared to visit hospitals for a diagnosis even when they had severe symptoms. This made us realise that we need advanced healthcare systems that can help in combating such situations. Such systems should also be easy to understand and usable by anyone, from a layman to a professional, so that they can reach the highest number of individuals.
To meet this requirement, we decided as a group to create a system that allows people to diagnose themselves at home. All they require is a CT scan image of their chest showing both lungs. Our system takes this image as input, preprocesses it and passes it through a classification algorithm, which classifies it as either COVID-19 positive or COVID-19 negative. If the person is classified as COVID-19 positive, the system additionally shows the user another image that highlights the areas influencing the classification result. This is the most significant part of the project, as we aim to create a system that produces an explainable output.
We started the implementation with a dataset containing a large number of chest CT scan images of patients. The dataset was balanced and contained approximately equal numbers of images of COVID-19 positive and COVID-19 negative patients. Following that, we preprocessed the dataset to extract as much information as possible from the images for use in training the classification model. After preprocessing, we selected the classification model. For this selection, we tried various algorithms: Support Vector Machines, Convolutional Neural Networks and the ResNet50 model with added dropout and max pooling layers. Finally, after choosing the appropriate model, we came to the Explainable AI part, where the output had to be made explainable. For this part of the project, we tried the LIME and Grad-CAM techniques. The results of the various trials and the outputs of all the implementations are discussed in further detail in the paper.
2 Explainable Artificial Intelligence
Artificial intelligence refers to building smart machines, i.e. machines that incorporate intelligence similar to that of humans and are able to think, learn and act like humans. Over the years, artificial intelligence has grown significantly in importance.
The three principles/pillars of Explainable AI
The three main principles, or pillars, of explainable AI are transparency, interpretability and explainability.
– Transparency: Transparency has become a very important aspect for all clients. The system should be able to explain how it arrived at a particular solution, so it should essentially be a white box as opposed to a black box. It should be able to explain its reasoning and the factors it took into consideration before coming to a decision. The user should be able to understand the working of the system in order to trust its decision [1].
– Interpretability: Interpretability is a key aspect that should be considered while designing artificial intelligence models. Developers aim for a model that is simple and easy to explain to clients, which can mean removing complicated tools from the developer's toolkit. In other words, interpretability is the ability of a system to show which features led to a particular output or decision simply by looking at that output [2].
– Explainability: Explainability is a core aspect of artificial intelligence. The outputs for particular inputs should be understandable by the users. This concept is known as a white box, in contrast to a black box, where even the designers and developers cannot tell why an AI system arrived at a particular decision. This is visualised in Fig. 1. Explainable models provide explanations for their outputs that give users more information about the decision that has been made [2].
Fig. 1 Black box versus white box
Explainable AI in Health care
Explainable Artificial Intelligence has numerous applications, ranging from health care to manufacturing. Various AI-powered medical solutions are emerging that can help minimise repetitive tasks and save the time and energy of doctors as well as patients. However, the major disadvantage is that doctors are unable to explain to patients how the outcome was derived. This tends to affect patients' trust in AI-driven technology. A study was conducted to examine people's trust in AI-powered technology. People were offered a free assessment of their stress level. When they were informed that the test would be conducted by a human doctor, approximately 40% of them took the test. When they were told that the test would be performed by an algorithm, only 25% took it. This suggests a lack of trust in AI-powered methods. Hence, it becomes imperative to bring explainable AI to health care, which should increase patients' trust and can help reduce errors and improve accuracy.
We have used the LIME (Local Interpretable Model-agnostic Explanations) method for Explainable AI, which focusses on local rather than global interpretation. It perturbs a data sample by changing its feature values and then observes the effect on the output. It then uses the new perturbed dataset to learn the relationship between the features and the prediction by fitting an interpretable model (e.g. a linear model) to it. In this way, it finds the features that are important for explaining a particular prediction.
3 Dataset Description
The dataset consists of chest CT scan images that are prelabelled as COVID-19 positive or COVID-19 negative. There are 1252 CT scans of patients who are positive for SARS-CoV-2 infection, i.e. COVID-19, and 1230 CT scans of patients who are negative, i.e. not infected by SARS-CoV-2, giving a total of 2482 CT scan images. The dataset was collected from real patients in hospitals in Sao Paulo, Brazil [3]. Its aim is to support identifying whether a person is infected with COVID-19 through the analysis of a CT scan image of his or her chest. A good dataset requires a good balance between the different classes of data, and our dataset conforms to this norm: it contains approximately equal numbers of CT scans from non-infected and infected patients, which helps the model classify well and helps keep the numbers of false positives and false negatives low.
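The balance claim can be checked directly from the reported counts. The short sketch below only uses the two class sizes stated in this section; the variable names are illustrative.

```python
# Class counts as reported for the SARS-CoV-2 CT-scan dataset in this section.
counts = {"COVID-19 positive": 1252, "COVID-19 negative": 1230}

total = sum(counts.values())
fractions = {label: n / total for label, n in counts.items()}
# Ratio of the larger class to the smaller one; 1.0 means perfectly balanced.
imbalance_ratio = max(counts.values()) / min(counts.values())

print(f"total images: {total}")
for label, frac in fractions.items():
    print(f"{label}: {frac:.1%}")
print(f"imbalance ratio: {imbalance_ratio:.3f}")
```

With roughly 50.4% positive and 49.6% negative scans, neither class dominates, so plain accuracy is not distorted by class frequency.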
4 Approach to the Proposed System
4.1 Support Vector Machine
After preprocessing, the first algorithm we implemented was the Support Vector Machine (SVM). It is a supervised machine learning algorithm used to solve classification problems by finding a separating hyperplane in an N-dimensional feature space. To train the model, we used the Support Vector Classification (SVC) class from the sklearn.svm module of scikit-learn. Two parameters were set: the kernel function and the kernel coefficient gamma, which is used for non-linear hyperplanes. We set the kernel function to linear, as it is relatively fast and yields good results in the majority of cases, and gamma to 'auto'. The gamma parameter determines the scope of influence of each training example; it applies to the non-linear kernels and has no effect when the kernel is linear. When the trained model was tested on unseen data, the accuracy score was 80.83%. Besides the accuracy score, we used the confusion matrix to obtain the sensitivity, precision and F-measure. For COVID-19 positive classification, the sensitivity, precision and F-measure were 0.83, 0.79 and 0.81, respectively, as shown in Table 1. For COVID-19 negative classification, they were 0.73, 0.83 and 0.81, respectively, as shown in Table 2. In medical imaging, sensitivity is an important measure, and since we were not satisfied with the sensitivity and accuracy, we decided to train a model using another algorithm.
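The three measures read off the confusion matrix can be sketched in a few lines of plain Python. The counts passed in below are hypothetical and chosen only for illustration (the paper does not list its confusion matrix); the helper names are likewise assumptions.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, sensitivity, F-measure) for one class.

    tp, fp and fn are the true-positive, false-positive and false-negative
    counts read off a confusion matrix for that class.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    f1 = harmonic_f_measure(precision, sensitivity)
    return precision, sensitivity, f1


def harmonic_f_measure(precision, sensitivity):
    """F-measure: harmonic mean of precision and sensitivity."""
    total = precision + sensitivity
    return 2 * precision * sensitivity / total if total else 0.0


# Hypothetical counts, not the paper's actual confusion matrix.
p, r, f = precision_recall_f1(tp=83, fp=22, fn=17)
print(f"precision={p:.2f} sensitivity={r:.2f} F-measure={f:.2f}")

# F-measure recomputed from the reported SVM precision/sensitivity pairs.
print(f"positive class: {harmonic_f_measure(0.79, 0.83):.2f}")
print(f"negative class: {harmonic_f_measure(0.83, 0.73):.2f}")
```

Applied to the reported positive-class values the formula gives about 0.81, in line with Table 1. For the negative-class pair (0.83, 0.73) it gives about 0.78, so the 0.81 listed for that class may be worth re-checking against the original confusion matrix.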
4.2 Convolutional Neural Networks
Convolutional neural networks are a class of deep neural networks that are able to capture the spatial dependencies in images. By applying convolution and pooling layers, they address issues posed by regular neural network models, such as loss of spatial structure in two-dimensional images and intensive computation [4]. The convolution layer performs an operation called "convolution" for feature extraction. The main function of the pooling layer is to decrease the spatial size of the convolved image while retaining the dominant features. In the model implemented, we defined the number of training, testing and validation images in the dataset and set a common size for all the images. To speed up the training process, the image shape was set to (240, 240), which is a comparatively small size. We performed preprocessing to remove ambiguities and unwanted variations in the data and also carried out rescaling and shuffling. After preprocessing, the CNN model was built from convolution2D, dropout, pooling, flatten and dense layers [5].
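To make the roles of the two layer types concrete, the following NumPy sketch applies one 3 × 3 filter and one 2 × 2 max-pooling step to a random array of the (240, 240) shape mentioned above. It is an illustration only: the filter is hand-picked rather than learned, the input is random rather than a CT scan, and the function names are assumptions.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """2-D 'valid' cross-correlation, the operation CNN layers call convolution."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))


rng = np.random.default_rng(0)
image = rng.random((240, 240))                # stand-in for one greyscale CT slice
vertical_edge = np.array([[1.0, 0.0, -1.0],
                          [1.0, 0.0, -1.0],
                          [1.0, 0.0, -1.0]])  # hand-picked 3x3 edge filter

features = conv2d_valid(image, vertical_edge)
pooled = max_pool2d(features, size=2)
print(features.shape, pooled.shape)  # (238, 238) (119, 119)
```

The convolution keeps the 2-D layout of the image, and pooling halves each spatial dimension while retaining the strongest response in every 2 × 2 window.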
The model was compiled and trained for 50 epochs. For COVID-19 positive classification, the precision, recall and F1 score were 0.19, 0.51 and 0.28, respectively, as shown in Table 1. For COVID-19 negative classification, they were 0.81, 0.50 and 0.62, respectively, as shown in Table 2. The testing accuracy of the model was 66.53%, which was not up to the mark. When we tried different sample CT scan images as input for the prediction of COVID-19, many of the results were correct, but a few were not. The main issue with our CNN model was overfitting, which persisted even after varying the parameters of the model and trying different combinations. For these reasons, we concluded that this model was not the right fit for our implementation and selected a different algorithm.
4.3 ResNet50
ResNet, or Residual Neural Network, is an artificial neural network that stacks residual blocks on top of each other to form a network. ResNet50, a variant of ResNet, is a convolutional neural network that is 50 layers deep. It has 48 convolution layers along with 1 MaxPool and 1 AveragePool layer. The ResNet50 architecture is used for object detection, image classification, image segmentation and object localisation. As ResNet works well for image classification, as seen in [6], we used a ResNet50 layer pre-trained on more than a million images from the ImageNet database. First, the CT scan images are resized so that they all have the same shape. Then, after the image data are normalised, the dataset is split into training and validation data. To improve the performance of the model, we applied data augmentation using Keras' ImageDataGenerator class, which rotates, shifts and zooms the images to create different variations on which to train the model. Along with the ResNet50 layer, the model has a combination of GlobalAveragePooling2D, Dropout and normalisation layers, and its output is a dense layer with two nodes and softmax activation. Because the two classes are represented by two output nodes, we used the categorical cross-entropy loss function, with Adam as the optimiser. We also used the ReduceLROnPlateau callback, which reduces the learning rate once the validation loss has stopped improving. After 50 epochs, the model achieved an accuracy of 86% and a validation loss of 0.29. Performance metrics for COVID-19 positive and COVID-19 negative images are listed in Tables 1 and 2. Compared with the other models, i.e. the support vector machine and the convolutional neural network, ResNet50 performed significantly better. Thus, we explain the output of ResNet50 using LIME.
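The choice of a two-node softmax output with categorical cross-entropy for a two-class problem can be illustrated numerically. The NumPy sketch below uses made-up logits (not outputs of the trained model) and hypothetical function names; it shows that this setup gives the same loss as a single sigmoid output with binary cross-entropy applied to the difference of the two logits.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Mean categorical cross-entropy over a batch of one-hot labels."""
    return float(-np.mean(np.sum(one_hot * np.log(probs + eps), axis=1)))


def binary_cross_entropy(labels, p_positive, eps=1e-12):
    """Mean binary cross-entropy for labels in {0, 1}."""
    return float(-np.mean(labels * np.log(p_positive + eps)
                          + (1 - labels) * np.log(1 - p_positive + eps)))


# Made-up logits for three scans; column 0 = negative, column 1 = positive.
logits = np.array([[2.0, -1.0], [0.5, 1.5], [-0.3, 0.2]])
labels = np.array([0, 1, 1])
one_hot = np.eye(2)[labels]

probs = softmax(logits)
cce = categorical_cross_entropy(one_hot, probs)

# A sigmoid on the logit difference gives the same positive-class probability.
p_positive = 1.0 / (1.0 + np.exp(-(logits[:, 1] - logits[:, 0])))
bce = binary_cross_entropy(labels, p_positive)
print(round(cce, 4), round(bce, 4))
```

Both formulations are therefore valid for binary classification; the two-node version simply requires one-hot encoded labels.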
4.4 Implementation of Explainable AI
When a machine learning model is used for critical use cases such as those in the medical industry, it becomes essential to corroborate the output. This can be achieved by using Explainable AI to explain how the model reaches its result. To explain the outcome of the machine learning model, we used the LIME algorithm. Local Interpretable Model-agnostic Explanations (LIME) is model-agnostic, so it can be used with any machine learning model. Magesh et al. [7] use LIME to incorporate explainability into their model for the detection of Parkinson's disease. LIME works by dividing the image into superpixels, using the quickshift algorithm from the skimage segmentation module. The result of the quickshift algorithm can be seen in Fig. 2. The next step is to generate perturbations by turning combinations of superpixels on and off. Each perturbation is a one-dimensional array of 0s and 1s, with each index representing one superpixel. We use 200 such combinations, one of which is shown in Fig. 3. The ResNet50 model then predicts the class probabilities for every perturbed image. The main aim is to see how the perturbations affect the output. We do that by training a linear model on the perturbation arrays and the predicted probability of the class we want to explain. The weight given to each perturbation in this model signifies its importance and is derived from the distance of the perturbation from the original image. Once fitted, the linear model gives a coefficient for each superpixel, which states how much that superpixel affects the classification result. From these coefficients, the most influential superpixels are extracted and displayed as a heatmap over the original image. Pseudocode for LIME can be seen in Algorithm 1:
Fig. 2 Superpixels on CT scan
Fig. 3 Perturbations on CT scan
Algorithm 1 LIME image classification
Select image img
Divide img into superpixels
Create 200 combinations of superpixels, called perturbations
Model ← ResNet50
ImageClass ← Model.predict(img)
predictions ← []
for every perturbation P do
    perturbedImg ← img with the superpixels that are off in P masked out
    append(predictions, Model.predict(perturbedImg)[ImageClass])
end for
Model2 ← LinearRegression
Model2.fit(perturbations, predictions)
coefficient ← Model2.coef
The superpixels with the highest coefficients affect the output of classification the most.
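The steps of Algorithm 1 can be sketched in runnable form with NumPy alone. This is a minimal sketch under stated assumptions, not the authors' implementation: a regular grid stands in for quickshift superpixels, a toy scoring function stands in for ResNet50, the distance kernel and its width of 0.25 are illustrative choices, and all function names are hypothetical.

```python
import numpy as np


def grid_superpixels(height, width, cell):
    """Label map that tiles the image into square cells (stand-in for quickshift)."""
    rows = np.arange(height) // cell
    cols = np.arange(width) // cell
    n_cols = -(-width // cell)  # ceiling division
    return rows[:, None] * n_cols + cols[None, :]


def explain_with_lime(image, segments, predict_fn, n_samples=200,
                      kernel_width=0.25, seed=0):
    """Return one linear-surrogate coefficient per superpixel."""
    rng = np.random.default_rng(seed)
    n_segments = int(segments.max()) + 1

    # Each row switches superpixels on (1) or off (0).
    perturbations = rng.integers(0, 2, size=(n_samples, n_segments))
    perturbations[0, :] = 1  # keep the unperturbed image in the sample

    predictions = np.empty(n_samples)
    for k, row in enumerate(perturbations):
        mask = row[segments].astype(image.dtype)   # per-pixel on/off mask
        predictions[k] = predict_fn(image * mask)  # score of the explained class

    # Weight each perturbation by its cosine closeness to the all-ones vector.
    cosine = np.sqrt(perturbations.sum(axis=1) / n_segments)
    weights = np.exp(-((1.0 - cosine) ** 2) / kernel_width ** 2)

    # Weighted least squares: scale rows by sqrt(weight), then solve.
    design = np.hstack([perturbations, np.ones((n_samples, 1))])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], predictions * sw, rcond=None)
    return coef[:-1]  # drop the intercept


def toy_model(img):
    """Hypothetical classifier: only the top-left 4x4 block drives the score."""
    return float(img[:4, :4].mean())


image = np.ones((8, 8))
segments = grid_superpixels(8, 8, cell=4)  # four superpixels, labelled 0..3
coefficients = explain_with_lime(image, segments, toy_model)
top_superpixel = int(np.argmax(coefficients))
print(top_superpixel)  # 0
```

Because the toy score depends only on the top-left block, the surrogate assigns that superpixel a coefficient close to 1 and the others close to 0, which is the behaviour the heatmap step relies on.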
5 Proposed Methodology
To select the model that works best on our dataset, we trained three different algorithms: Support Vector Machine (SVM), Convolutional Neural Network (CNN) and ResNet50. All three are widely used in image classification. Since the performance metrics for SVM and CNN were not satisfactory, we trained the ResNet50 model, which outperformed the other two. Next comes the task of explaining the output of the ResNet50 model using LIME. As part of the explanations provided by LIME, we obtain the superpixels that most affect the output of the model. These superpixels are converted into a heatmap and superimposed on the original image. A sample output is shown in Fig. 4, where the highlighted parts are the superpixels that influence the result of the classification.
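One way to turn superpixel coefficients into an overlay is a simple alpha blend, sketched below with NumPy. The colour (red), the blending factor and the function name are assumptions made for illustration; the paper does not specify how its heatmap is rendered.

```python
import numpy as np


def overlay_heatmap(gray_image, segments, coefficients, top_k=4, alpha=0.5):
    """Blend a red heatmap of the top_k most influential superpixels over an image.

    gray_image: 2-D float array in [0, 1]; segments: integer label map of the
    same shape; coefficients: one importance value per superpixel label.
    """
    segments = np.asarray(segments)
    coefficients = np.asarray(coefficients, dtype=float)
    top = np.argsort(coefficients)[-top_k:]      # labels with the largest coefficients
    strength = np.zeros_like(coefficients)
    peak = coefficients[top].max()
    if peak > 0:
        strength[top] = np.clip(coefficients[top] / peak, 0.0, 1.0)
    heat = strength[segments]                    # per-pixel heat in [0, 1]

    rgb = np.stack([gray_image] * 3, axis=-1)    # greyscale -> RGB
    red = np.zeros_like(rgb)
    red[..., 0] = 1.0
    blend = alpha * heat[..., None]              # stronger heat -> more red
    return (1.0 - blend) * rgb + blend * red


# Tiny made-up example: a uniform grey image with four superpixels.
image = np.full((4, 4), 0.5)
segments = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1],
                     [2, 2, 3, 3],
                     [2, 2, 3, 3]])
overlay = overlay_heatmap(image, segments, [0.9, 0.1, 0.0, -0.2], top_k=1)
print(overlay[0, 0], overlay[3, 3])
```

Pixels in the most influential superpixel are tinted red, while the rest of the image is left unchanged, so the underlying scan stays readable.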
Fig. 4 Final output
Fig. 5 Flask upload page
We have implemented the concept as a Flask application, in which the user receives explainable COVID-19 results after uploading a CT scan image. Figure 5 shows the upload page and Fig. 6 illustrates the output of the application.
Fig. 6 Flask application output
6 Results
On the COVID-19 CT scan dataset, we trained three models, i.e. Support Vector Machine (SVM), Convolutional Neural Network (CNN) and ResNet50, as discussed earlier, and compared them in terms of precision, recall and F-measure. For the COVID-19 positive images, ResNet50 outperforms SVM and CNN on all three measures; the results are summarised in Table 1. Similarly, for COVID-19 negative images, ResNet50 again outperforms SVM and CNN on all three measures; the results are summarised in Table 2. It is clear from the comparison in Tables 1 and 2 that ResNet50 performs best. Hence, we used the ResNet50 model with LIME to explain the output. As explained above, LIME divides each input image into several superpixels, i.e. groups of pixels that share a similar visual pattern. We obtain the most influential superpixels through LIME and display them as a heatmap, so that the infected areas are easily spotted.
Table 1 Comparison for COVID-19 positive images
Model     Precision  Recall  F-measure
ResNet50  0.89       0.89    0.89
SVM       0.79       0.83    0.81
CNN       0.19       0.51    0.28
Table 2 Comparison for COVID-19 negative images
Model     Precision  Recall  F-measure
ResNet50  0.89       0.89    0.89
SVM       0.83       0.73    0.81
CNN       0.81       0.50    0.62
7 Conclusion and Future Scope
The current application provides comprehensive results based on the CT scan images of the lungs uploaded by the patient. It predicts the possibility of COVID-19 infection using machine learning and deep learning techniques. One possible improvement is an automatic appointment booking system through which the user can directly consult a doctor on receiving a COVID-19 positive report, without any extra effort. On a positive COVID-19 result, medically approved remedies for the containment and cure of the disease could be provided to the user. A feature that shows important information, such as the nearest hospitals and isolation centres based on the patient's location, could also be added to the application. As more CT scan image samples become available, the dataset can be updated and the model can be trained better. This can improve performance measures such as accuracy, precision and sensitivity (recall), which are crucial for the model. The implemented approach can also be extended to other diseases for which CT scan images are required for detection.
References
1. Došilović FK, Brčić M, Hlupić N (2018) Explainable artificial intelligence: a survey. In: 2018 41st international convention on information and communication technology, electronics and microelectronics (MIPRO), pp 0210–0215. https://doi.org/10.23919/MIPRO.2018.8400040
2. Hanif A, Hanif A, Wood S (2021) A survey on explainable artificial intelligence techniques and challenges. In: 2021 IEEE 25th international enterprise distributed object computing workshop (EDOCW), pp 81–89. https://doi.org/10.1109/EDOCW52865.2021.00036
3. Soares E, Angelov P, Biaso S, Higa Froes M, Kanda Abe D (2020) SARS-CoV-2 CT-scan dataset: a large dataset of real patients CT scans for SARS-CoV-2 identification. medRxiv. https://doi.org/10.1101/2020.04.24.20078584; Angelov P, Soares E (2020) Towards explainable deep neural networks (xDNN). Neural Netw 130:185–194
4. Irmak E (2020) A novel deep convolutional neural network model for COVID-19 disease detection. In: 2020 medical technologies congress (TIPTEKNO) conference, December 2020. https://doi.org/10.1109/TIPTEKNO50054.2020.9299286
5. Munawar MR (2021) Image classification using convolutional neural networks (CNN). https://medium.com/nerd-for-tech/image-classification-using-convolutional-neural-networks-cnneef587ed0c1. Accessed 22 May 2021
6. Wang M, Gong X (2020) Metastatic cancer image binary classification based on ResNet model. In: 2020 IEEE 20th international conference on communication technology (ICCT), pp 1356–1359. https://doi.org/10.1109/ICCT50939.2020.9295797
7. Magesh PR, Myloth RD, Tom RJ (2020) An explainable machine learning model for early detection of Parkinson's disease using LIME on DaTSCAN imagery. Comput Biol Med 126:104041. ISSN: 0010-4825. https://doi.org/10.1016/j.compbiomed.2020.10404
Towards Helping Visually Impaired People to Navigate Outdoor
Rhea S Shrivastava, Abhishek Singhal, and Swati Chandna
Abstract Vision is one of the crucial senses and is the birthright of every human being. Its impairment or loss leads to various difficulties. Blind and Visually Impaired (BVI) people find it tough to maneuver outdoors daily. Even though the market is laden with countless aids for BVI people, there is still a lot to be achieved. The idea behind every new piece of research is to assist these individuals in any possible way. Individuals deprived of vision require numerous reliable methods to overcome these barriers. Also, with the advent of science and technology, there is nothing that a human being can't do, and researchers and manufacturers keep coming up with new inventions and tech gadgets. In this paper, the Convolutional Neural Network (CNN) models vgg16 and vgg19 are used with a self-created dataset of two classes, roads and crosswalks, and are trained to detect the respective classes accurately.
Keywords Vision · Machine learning · Blind and visually impaired · BVI · Neural network · CNN · VGG16 · VGG19 · Roads · Crosswalks
R. S Shrivastava (B)
Indraprastha Institute of Information Technology, Delhi (IIIT-Delhi), New Delhi, India
e-mail: [email protected]
A. Singhal
Amity University, Uttar Pradesh, Noida, India
S. Chandna
SRH University Heidelberg, Heidelberg, Germany
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 B. Unhelkar et al. (eds.), Advances and Applications of Artificial Intelligence & Machine Learning, Lecture Notes in Electrical Engineering 1078, https://doi.org/10.1007/978-981-99-5974-7_8
1 Introduction
Vision is crucial to the proper motor functioning of the human body. Most daily tasks are possible because of this valuable sense, be it reading a document, proofreading an article, verifying accounts, observing the work on a construction site, or going grocery shopping. All these chores are easy to perform because of the ability to see. So what happens if we are deprived of it? Will it be as easy to carry out all of the above? The answer is no: without vision, it is challenging to undertake any of these tasks. According to the World Health Organization (WHO), around 2.2 billion people across the globe have a near or distance vision impairment. The WHO also notes that approximately a billion of these cases, almost half, could have been prevented or are yet to be addressed [1]. To help blind and visually impaired individuals, many researchers are contributing to this cause. The published research shows that the You Only Look Once (YOLO) model has often been used for the task, while only a few studies have worked with Visual Geometry Group (VGG) models. So, in this paper, two Convolutional Neural Networks (CNNs), vgg16 and vgg19, are used to identify two classes, roads and crosswalks, on a self-created dataset, so that persons with vision impairment can navigate outdoors freely and safely.
1.1 Convolutional Neural Network
A convolutional neural network is one type of artificial neural network. It is used for image recognition and processing and employs deep learning to carry out these tasks [2]. Each layer takes a 3D input volume and transforms it into a 3D output volume through a differentiable function. Hyperparameters are tuned as required by the task.
1.2 Visual Geometry Group
VGG is a CNN architecture built from 3 × 3 convolution filters, pooling layers and, on top of these, fully connected layers [3]. Its main focus is "depth". The two models are discussed below.
• VGG16: The VGG framework was developed by Simonyan and Zisserman. In the ImageNet Large Scale Visual Recognition Challenge 2014, it took first place in the localisation task and second place in the classification task. Its basic structure differs from that of other CNN models in that it stacks filters with a 3 × 3 kernel one on top of the other. This is one aspect that improves its performance compared with AlexNet [4].
• VGG19: This model is similar to the above, except that it has 19 weight layers instead of 16. VGG19 has 16 convolutional and 3 fully connected layers, whereas VGG16 has 13 convolutional and 3 fully connected layers [5, 6].
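The benefit of stacking 3 × 3 filters can be shown with simple arithmetic: a stack of small filters covers the same receptive field as one larger filter while using fewer weights. The sketch below is illustrative only; the channel width of 64 and the helper names are assumptions.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of num_layers stacked stride-1 convolutions of equal kernel size."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels, num_layers=1):
    """Weight count (biases ignored) for layers mapping `channels` to `channels` maps."""
    return num_layers * kernel * kernel * channels * channels


channels = 64  # illustrative channel width
for layers, single_kernel in [(2, 5), (3, 7)]:
    assert stacked_receptive_field(layers) == single_kernel
    stacked = conv_weights(3, channels, layers)
    single = conv_weights(single_kernel, channels)
    print(f"{layers} x (3x3): {stacked} weights; one {single_kernel}x{single_kernel}: {single}")

# Weight layers = convolutional layers + fully connected layers.
vgg_layers = {"VGG16": (13, 3), "VGG19": (16, 3)}
for name, (conv, dense) in vgg_layers.items():
    print(name, conv + dense)
```

Three stacked 3 × 3 layers see a 7 × 7 region with 27C² weights instead of 49C² for a single 7 × 7 layer, and they add two extra non-linearities along the way.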
2 Literature
The problem statement was formulated after studying the existing research. Shin et al. [7] proposed a crossing assistance system based on Bluetooth Low Energy and the received signal strength indicator (RSSI) for tracking indoor/outdoor locations, which overcomes the inherent restrictions of outdoor noise to find the user quickly. The first implementation resulted in 99.8% accuracy. The second evaluation was more complex, as it assessed nine zones and used kNN and support vector machine (SVM) algorithms; of the two, SVM gave the best results with 97.7% accuracy. Plikynas et al. [8] worked on an approach based on recordings of indoor pathways collected from volunteers, in which algorithms process the sensory data used by blind and severely visually impaired people. Zhao et al. [9] used augmented reality (AR) visualisations to help people with low vision navigate stairs. The visualisations indicate the position of the individual on the stairs and were tested on 12 people with low vision; the time they took on the stairs decreased significantly. Htike et al. [10] worked on eight visual augmentations: lines on walls, colored-line overlays, comic overlays, zoom overlays, color overlays, live view overlays, object recognition, and edge overlays. The study, carried out with 18 participants using the HoloLens v1, proved beneficial for individuals with moderate vision loss. Croce et al. [11] used sensors and smartphones for navigation with the help of landmarks; their system delivers direction estimates to the users. Lima et al. [12] used data on pre-existing routes between various landmarks of a city in Portugal, together with other applications, to help BVI individuals use public transport with ease.
Velázquez et al. [13] proposed a wearable aid for BVI people that tracks real-time GPS data from a smartphone, which is analysed by a reliable system. Directions are then encoded as vibrations and delivered to the user's shoe via a tactile device in the insole. The results implied that the users responded quickly. Ran et al. [14] presented Drishti, a device that helps BVI people navigate effectively and safely, with an indoor precision of 22 cm. Zhao et al. [15] interviewed and observed 14 BVI individuals as they performed navigation tasks. The volunteers walked indoors and outdoors, along four stairwells and then two city blocks. This experiment highlighted that people with low vision are sensitive to lighting conditions, which might influence their judgment when using visual enhancements. Alghamdi et al. [16] proposed a framework combining RSSI and RFID techniques, in which RSSI is used in two ways to detect distances from short range up to 70 m. The indoor implementation of the framework achieved a successful identification rate of 98%. Bilal Salih et al. [17] investigated various strategies and interviewed 60 partially or fully visually impaired people. They found that visually impaired people rely heavily on their other senses, and the interviewees stressed how crucial it is for new models to use aural signals properly and efficiently.
3 Methodology
The methodology adopted for this work is described in Fig. 1. First, the dataset was created, and then the applicability of the existing approaches was checked and analysed. Once that was done, we detected the objects of interest in the images, trained and tested the models, and concluded the work by analysing the results.
3.1 Create the Dataset
The dataset contains a total of 1300 images of roads and crosswalks. The images were downloaded from various online sources, and some were taken from Kaggle and combined with the downloaded ones. Some of the images are shown in Figs. 2 and 3.
Fig. 1 The proposed methodology
Fig. 2 Images of crosswalks [18, 19]
Fig. 3 Images of roads taken from various sources [20]
3.2 Applying Existing Approach
Once the literature review was completed, it was evident that applying the existing outdoor-navigation approaches requires separate location data. Such data are available only if the photographs are taken manually with smartphones or if sensors are used. In this case, the images were downloaded from online sources. Hence, evaluating the existing methods on this dataset was not feasible.
3.3 Analyzing the Existing Approach
The approach needed numerical data for proper analysis, which the dataset did not contain. Hence, the conclusion drawn from this step was straightforward: sensor data are a necessity for the existing approach.
3.4 Detect Objects in Image
After understanding the existing approach, the next step was to perform object detection on the custom image dataset. This step required knowledge of various object detection algorithms.
3.5 Train and Test the Model
Once the necessary knowledge of the algorithms had been acquired, the CNN models were chosen for training and testing. The models used in this work are vgg16 and vgg19.
3.6 Analyzing the Results After the execution, the images were analyzed to check whether the system was able to detect the desired classes effectively. This is the breakdown of the steps that are followed in this research work.
4 Experimentation
The experimentation is carried out on two CNN models, vgg16 and vgg19. It begins by setting the hyperparameters, which are used throughout the process of execution:
• Split: The dataset is split into Train: 80% and Test: 20%.
• Seed: A fixed seed ensures that the random functions generate the same values on every execution of the code, on the same or on different systems.
• Batch_size: The batch size initialized in the code is 50.
• Img_size: This parameter is set to 220 × 220.
The code uses the ModelCheckpoint and EarlyStopping callbacks to avoid overfitting, as portrayed in Fig. 4. Callbacks can execute functions at any stage while the model is being trained [21]. ModelCheckpoint is a Keras callback that saves the model at the end of each epoch. Its "filepath" parameter specifies where the model file is saved, and when "save_best_only" is set, the model regarded as the best according to the monitored quantity is not overwritten. EarlyStopping is a Keras callback that stops the training of the model early to avoid overfitting. As shown in Fig. 4, "val_loss" (validation loss) is the quantity monitored to decide when to stop training [22, 23]. The "min_delta" parameter is the minimum change in the monitored quantity that qualifies as an improvement [23]. The "patience" parameter defines the number of epochs without significant improvement after which training is stopped [24]. Next, the characteristics of the pre-trained model are assessed and displayed. To optimize the working of the model, some layers were frozen, and the "Adam" and "RMSprop" optimizers were chosen for vgg16 and vgg19, respectively. Both are first-order optimization algorithms. Table 1 lists the numbers of trainable and non-trainable parameters that result from freezing the layers.
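The interplay of "min_delta", "patience" and "save_best_only" can be replayed in plain Python. The function below is a simplified sketch of the stopping rule for a monitored loss, not the Keras implementation itself, and the loss values are made up for illustration.

```python
def run_with_early_stopping(val_losses, min_delta=0.0, patience=10):
    """Replay per-epoch validation losses through an early-stopping rule.

    Returns (stopped_epoch, best_epoch, best_loss); epochs are counted from 1.
    An epoch counts as an improvement only if the loss drops by more than
    min_delta. The best epoch is where a save-best-only checkpoint would
    last have been written.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0  # checkpoint saved here
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch, best_loss       # stop training
    return len(val_losses), best_epoch, best_loss          # ran all epochs


# Made-up validation losses: improvement stalls after epoch 4.
losses = [0.90, 0.70, 0.55, 0.50, 0.51, 0.50, 0.52, 0.53]
print(run_with_early_stopping(losses, min_delta=0.01, patience=3))  # (7, 4, 0.5)
```

With a patience of 3, training stops at epoch 7, while the weights kept on disk are those from epoch 4. A larger patience lets training run longer before giving up.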
Once the base models were adapted in this way, they were executed on the dataset with various values of epochs (20, 30, 50, 80, 100) and patience (10, 20), with the batch size set to 256. The results are shown in Figs. 5, 6, 7, and 8. Complete execution took 6 hours, and in the displayed output the model detected 11 out of 12 images of roads and crosswalks correctly.
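The effect of these settings on the amount of work per epoch can be sketched with the standard library. The seed value and the index-based split below are assumptions made for illustration, and the paper does not state whether every epochs/patience pair was actually run.

```python
import itertools
import math
import random

NUM_IMAGES = 1300      # dataset size reported in Sect. 3.1
TRAIN_PERCENT = 80     # 80/20 train/test split
SEED = 42              # illustrative value; the seed used in the paper is not stated

indices = list(range(NUM_IMAGES))
random.Random(SEED).shuffle(indices)   # same order on every run and machine
n_train = NUM_IMAGES * TRAIN_PERCENT // 100
train_idx, test_idx = indices[:n_train], indices[n_train:]

# Number of weight updates per epoch for the two batch sizes discussed.
steps_per_epoch = {batch: math.ceil(n_train / batch) for batch in (50, 256)}

# All (epochs, patience) pairs that can be formed from the listed values.
grid = list(itertools.product((20, 30, 50, 80, 100), (10, 20)))

print(len(train_idx), len(test_idx), steps_per_epoch, len(grid))
```

A batch size of 256 means only 5 weight updates per epoch on the 1040 training images, against 21 updates with a batch size of 50, so each update averages over many more images.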
Towards Helping Visually Impaired People to Navigate Outdoor
89
Fig. 4 Callbacks
Table 1 Freezing the layers for optimized performance
Sr no | VGG parameters | Number of parameters
01 | Total parameters | 19,709,250
02 | Trainable parameters | 4,955,650
03 | Non-trainable parameters | 14,753,600
Fig. 5 Training and validation loss and accuracy of vgg16
Fig. 6 Training and validation loss and accuracy of vgg19
Fig. 7 Comparing loss and accuracy of the model
Fig. 8 Output of the model
5 Conclusion and Future Work
The best results for the vgg16 and vgg19 models were observed when each model was executed for 30 epochs with patience set to 10 and a batch size of 256, which gives an accurate output. This result was compared against runs with the same parameters except for batch size. With a batch size of 50, the system gives inconsistent results. A higher batch size therefore appears beneficial in this case, as the model works accurately. The results can be further improved by using a larger dataset that incorporates more classes. Other enhancements to the existing model include the identification of pedestrians, various automobiles on the road, and other obstacles on the path. A real-time web-based application could also be built to assist BVI people by providing audio instructions about their environment.
References
1. Who.int (2022) Vision impairment and blindness. [online]. https://www.who.int/news-room/fact-sheets/detail/blindness-and-visual-impairment#:%7E:text=Globally%2C%20at%20least%202.2%20billion,uncorrected%20refractive%20errors%20and%20cataracts. Accessed 17 May 2022
2. Contributor T (2018b) Convolutional neural network. SearchEnterpriseAI. https://www.techtarget.com/searchenterpriseai/definition/convolutional-neural-network#:%7E:text=CNNs%20are%20powerful%20image%20processing,natural%20language%20processing%20(NLP. Accessed 9 Sep 2022
3. Wei J VGG neural networks: the next step after alexNet. https://towardsdatascience.com/vggneural-networks-the-next-step-after-alexnet-3f91fa9ffe2c
4. GeeksforGeeks (2022b) VGG-16|CNN model. https://www.geeksforgeeks.org/vgg-16-cnnmodel/. Accessed 9 Sep 2022
5. VGG-19 convolutional neural network-MATLAB vgg19-mathworks United Kingdom (n.d.). https://uk.mathworks.com/help/deeplearning/ref/vgg19.html;jsessionid=2fac10e45795a23fce5e37733563#:%7E:text=VGG%2D19%20is%20a%20convolutional,%2C%20pencil%2C%20and%20many%20animals. Accessed 9 Sep 2022
6. Boesch G VGG very deep convolutional networks (VGGNet)-what you need to know-viso.ai. https://viso.ai/deep-learning/vgg-very-deep-convolutional-networks/#:%7E:text=The%20concept%20of%20the%20VGG19,more%20convolutional%20layers%20than%20VGG16
7. Shin K, McConville R, Metatla O, Chang M, Han C, Lee J, Roudaut A (2022) Outdoor localization using BLE RSSI and accessible Pedestrian signals for the visually impaired at intersections. Sensors 22(1):371. https://doi.org/10.3390/s22010371
8. Plikynas D, Indriulionis A, Laukaitis A, Sakalauskas L (2022) Indoor-guided navigation for people who are blind: crowdsourcing for route mapping and assistance. Appl Sci 12(1):523. https://doi.org/10.3390/app12010523
9. Zhao Y, Kupferstein E, Castro BV, Feiner S, Azenkot S (2019) Designing AR visualizations to facilitate stair navigation for people with low vision. In: UIST ’19: proceedings of the 32nd annual ACM symposium on user interface software and technology. New Orleans, LA USA. https://doi.org/10.1145/3332165.3347906
10. Min Htike H, Margrain TH, Lai YK, Eslambolchilar P (2021) Augmented reality glasses as an orientation and mobility aid for people with low vision: a feasibility study of experiences and requirements. In: Proceedings of the 2021 CHI conference on human factors in computing systems. https://doi.org/10.1145/3411764.3445327
11. Croce D, Giarre L, Pascucci F, Tinnirello I, Galioto GE, Garlisi D, lo Valvo A (2019) An indoor and outdoor navigation system for visually impaired people. IEEE Access 7:170406–170418. https://doi.org/10.1109/access.2019.2955046
12. Paiva S, Lima A, Mendes D (2018) Outdoor navigation systems to promote urban mobility to aid visually impaired people. J Inf Syst Eng & Manag 3(2). https://doi.org/10.20897/jisem.201814
13. Velázquez R, Pissaloux E, Rodrigo P, Carrasco M, Giannoccaro N, Lay-Ekuakille A (2018) An outdoor navigation system for blind pedestrians using GPS and tactile-foot feedback. Appl Sci 8(4):578. https://doi.org/10.3390/app8040578
14. Ran L, Helal S, Moore S (2004) Drishti: an integrated indoor/outdoor blind navigation system and service. In: Second IEEE annual conference on pervasive computing and communications, 2004. Proceedings of the, 2004. pp 23–30. https://doi.org/10.1109/PERCOM.2004.1276842
15. Zhao Y, Kupferstein E, Tal D, Azenkot S (2018) “It looks beautiful but scary.” In: Proceedings of the 20th international ACM SIGACCESS conference on computers and accessibility. https://doi.org/10.1145/3234695.3236359
16. Alghamdi S, van Schyndel R, Khalil I (2014) Accurate positioning using long range active RFID technology to assist visually impaired people. J Netw Comput Appl 41:135–147. https://doi.org/10.1016/j.jnca.2013.10.015
17. Bilal Salih HE, Takeda K, Kobayashi H, Kakizawa T, Kawamoto M, Zempo K (2022) Use of auditory cues and other strategies as sources of spatial information for people with visual impairment when navigating unfamiliar environments. Int J Environ Res Public Health 19(6):3151. https://doi.org/10.3390/ijerph19063151
18. Crosswalk-dataset (2020) Elias Teodoro da Silva Junior, Fausto Sampaio, Lucas Costa da Silva, David Silva Medeiros, Gustavo Pinheiro Correia. https://www.kaggle.com/datasets/davidsilvam/crosswalkdataset
19. (n.d.-d). https://www.google.co.in/search?q=crosswalk&sca_esv=566033897&tbm=isch&source=lnms&sa=X&ved=2ahUKEwiBjdXphbGBAxXRwjgGHeAGAOcQ_AUoAXoECAYQAw&biw=1536&bih=707&dpr=1.25
20. Unsplash. (n.d.-a). 100+ Roads Pictures [HD] | Download free images on Unsplash. Unsplash. https://unsplash.com/s/photos/roads
21. Team K Keras documentation: Callbacks API. https://keras.io/api/callbacks/#:%7E:text=A%20callback%20is%20an%20obje
22. Dwivedi R (2021) Beginners guide to Keras callBacks, modelCheckpoint and EarlyStopping in deep learning. Analytics India Magazine. https://analyticsindiamag.com/tutorial-on-kerascallbacks-modelcheckpoint-and-earlystopping-in-deep-learning/. Accessed 25 May 2022
23. Team K (n.d) Keras documentation: earlyStopping. Keras. https://keras.io/api/callbacks/early_stopping/. Accessed 25 May 2022
24. Early stopping of deep learning experiments | Peltarion Platform (n.d) Peltarion. https://peltarion.com/knowledge-center/documentation/modeling-view/run-a-model/early-stopping. Accessed 25 May 2022
An Analysis of Deployment Challenges for Kubernetes: A NextGen Virtualization
Nisha Jaglan and Divya Upadhyay
Abstract Virtualization is a technology that allows us to create numerous simulated environments: instead of using an actual version of something, a virtual copy is created. It makes resources easier to manage and has also been a means of optimizing server capacity. With the introduction of Kubernetes, we get a highly efficient way to perform these tasks using containers. Containers emerged as a lightweight replacement for Virtual Machines (VMs) and are well suited to microservice architectures in areas such as the Internet of Things (IoT) and big data across various clouds. Though a lot of effort has already been made to explore Kubernetes, a methodical unification of the current publications evaluating its role in transforming the cloud is still not readily available. This paper analyzes Kubernetes and its requirements for deployment in any cloud environment. It was observed that deploying Kubernetes on a private cloud is not as simple as it is on public clouds. The performance measures depend heavily on redundancy, which can greatly reduce downtime. It was also observed that, with redundancy, the service is restored as soon as Kubernetes detects the failure, so the outage does not depend on the restoration time of the defective unit.
Keywords Virtualization · Virtual machine · Kubernetes · Containerization · Docker · Cloud computing
N. Jaglan · D. Upadhyay (B) Department of Computer Science & Engineering, ABES Engineering College, Ghaziabad, India e-mail: [email protected] N. Jaglan e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 B. Unhelkar et al. (eds.), Advances and Applications of Artificial Intelligence & Machine Learning, Lecture Notes in Electrical Engineering 1078, https://doi.org/10.1007/978-981-99-5974-7_9
1 Introduction
In the past decade, a major shift of the computing community toward the cloud has been seen [1]. Cloud computing re-modeled the ways one can get access to various computing resources. It has already gained a very strong hold in the computing market because it gives access to numerous resources in a pay-as-you-go
model, economically and virtually, without the overhead of maintaining the resources. The key technology behind this revolution is virtualization. Virtualization involves creating a virtual form of a physical resource such as compute, network, or storage; these virtual resources are usually subject to restrictions on how the underlying physical resource is shared. The end user does not see much difference between using physical resources and virtual ones. Kubernetes, also called K8s, on the other hand, is an open-source system that helps to manage containerized applications. It groups containers, which are independent units, for easy packaging, deployment, and updating of web apps. A hybrid cloud is one that simultaneously uses public cloud, private cloud, and on-premises infrastructure to form a single, adaptable, less costly IT infrastructure. Kubernetes is an ideal foundation for a hybrid cloud strategy because it provides consistency irrespective of where it is deployed, whether on-premises or on one or multiple public clouds [2]. In this paper, we discuss and analyze various papers to understand why Kubernetes is known as the NextGen Virtualization, by examining the application deployment challenges faced on the cloud and how Kubernetes addresses them (Fig. 1). In virtualization, VMs are placed on top of the hypervisor. Kubernetes is different: it uses Docker containers. Let us now discuss the theoretical and procedural approaches established in work related to ours.
Fig. 1 Container architecture of Kubernetes, hypervisor, and traditional deployment
2 Origin, History of Kubernetes, and the Community Behind
Let us go back in time to before virtualization (Tolosana-Calasanz [3]). In the prevalent flow by which code was written and deployed, a developer who wanted to write an application first needed a host appropriate to the OS and the programming language the application would be written in. Hence, even before writing a piece of code, developers had to decide which OS and which programming language they would use, and on that basis request hardware, because there was no virtualization. The only option was a physical host with a one-to-one mapping between hardware and OS. Even starting to write application code was therefore time-consuming. This process applied not only to the development units but also to the units that tested the application. The QA units had to do the same things: raise a ticket with IT to provision hardware installed with the OS the application was written for. Only then could the test unit begin testing the application. Once the application was developed and tested, it was passed to the operations team, which would plan how to deploy it. The operations team, in turn, had to work out the capacity required in terms of storage, networking needs, etc. [4]. It also had to work with the networking unit to carve out VLANs with enough routing to direct traffic to the right application. That is why the time between developing an application and deploying it was long. The time needed to resolve bugs and problems was also high.
If a bug was found, the application went back to the development units, which started again with the code and corrected it [2]; this required a fresh host, or a host re-imaged with the application, to be provisioned and handed over to the QA teams. Hence, the overall time from writing an application from scratch to getting it deployed was around 2–6 months, or even a year if the application was large. The birth of virtualization, in which physical hardware could host more than one OS, simplified many things. An application development unit no longer necessarily had to wait for an operations unit to deploy a physical host with the right operating system before it could start working on its code. The overall approach was simplified by the introduction of the virtual machine as the unit of interaction, in place of a piece of code that had to be deployed on a physical server somewhere. The complexity of scaling was also reduced, as scaling now meant replicating a virtual machine instead of acquiring a completely new physical host [3]. Even so, a development unit still had to work with many teams and pass through many hoops to deploy an application.
3 Related Works
This section comprises two parts: a literature review and an understanding of the techniques used in various research papers. The literature review helped in understanding how Kubernetes differs from virtualization, its impact on digital transformation, and how it has changed the face of the cloud.
3.1 Literature Review
This section of our survey covers the analysis techniques used by researchers in work that supports the view of Kubernetes as a new generation of virtualization. Abdollahi Vayghan et al. [1] investigated Kubernetes architectures and conducted experiments to evaluate the availability that Kubernetes delivers for its managed microservices; the results show that, in certain cases, the service outage for applications managed with Kubernetes is significantly high. Qiang et al. [5] presented the architecture of a cloud-native version of IBM Streams [6], with Kubernetes as the target platform, which eliminated 75% of the original platform code. Qiang Wu et al. proposed a generic system to dynamically adjust the scale of a Kubernetes cluster, which reduces wasted resources while guaranteeing QoS [7]. Soltesz et al. showed that container-based operating system virtualization is a scalable, high-performance alternative to hypervisors [8]. Bhardwaj et al. consolidated the literature on hypervisor-based and container-based virtualization by formulating a research question framework around key concerns, namely application deployment on hypervisor versus container virtualization platforms, container orchestration systems, performance monitoring tools, and future research challenges [9]. Dragoni et al., in “Microservices: Yesterday, Today, and Tomorrow” (in Present and Ulterior Software Engineering, Mazzara and Meyer, Eds. Cham: Springer International Publishing, 2017, pp. 195–216), provide an academic viewpoint on the topic, investigate some open issues, and identify potential solutions [10]. Nicola Dragoni et al. discussed the main features of microservices and highlighted how these features help in improving scalability [11].
As per Forbes [8], since 2015 two major trends have begun to change the face of the hybrid cloud: containers and Kubernetes. When operating containers, issues would at times occur on the test side that had not occurred in the development phase [7]. Situations of this kind create the need for a container orchestration system. When numerous services run inside containers, we may need to scale these containers [12]. In large-scale industry this would be difficult, because it would raise the cost of maintaining the services and the difficulty of running them together (Fig. 2).
Fig. 2 Basic architecture of Kubernetes [13]
3.2 Objective
A Kubernetes cluster features a master/slave architecture. Nodes in a Kubernetes cluster may be either physical machines or virtual machines. The master node contains the collection of processes needed to maintain the desired state of the cluster. Slave nodes, which we refer to simply as nodes, hold the processes that run the containers and are governed by the master [11]. Kubelet is a process that resides on each node. The smallest and simplest unit deployed and managed by Kubernetes is called a Pod, which is a group of one or more containers and provides shared storage and a shared network for its containers. Containers in the same pod have the same IP address and port space. A pod also holds the specification of how to run its containers. Custom labels can be assigned to pods to group and query them within the cluster. All of this information is given in the pod specification. Pods are managed by controllers [11]. A controller’s specification contains the pod template, the desired number of replicas of the pod that the controller should maintain, and other details such as the pod’s labels and upgrade strategy. When a controller is deployed into the cluster, it creates the desired number of pods from the provided template and continuously maintains that number. For example, if a node failure causes a pod failure, the corresponding controller automatically creates a replacement on another node. There are several kinds of controller, each suited to a particular purpose. For instance, DaemonSet controllers are used to run copies of a pod on various nodes, Job controllers are used to create pods and run them to successful termination, and StatefulSet controllers are used to manage stateful applications.
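The way a controller keeps the number of pods at the desired count after a node failure can be sketched as a single reconciliation pass. This is a simplified illustration under stated assumptions, not the Kubernetes implementation: the function `reconcile`, the pod and node names, and the least-loaded placement rule are all hypothetical, and scaling down is not modeled.

```python
def reconcile(desired_replicas, pods, healthy_nodes):
    """One reconciliation pass over a pod -> node mapping.

    Pods on failed nodes are dropped, then replacements are created on
    healthy nodes until the desired replica count is reached.
    """
    # Keep only the pods whose node is still healthy.
    running = {name: node for name, node in pods.items() if node in healthy_nodes}
    nodes = sorted(healthy_nodes)
    counter = 0
    while len(running) < desired_replicas and nodes:
        # Count pods per healthy node and pick the least loaded one.
        load = {n: 0 for n in nodes}
        for node in running.values():
            load[node] += 1
        target = min(nodes, key=lambda n: (load[n], n))
        # Generate a unique (hypothetical) name for the replacement pod.
        while f"pod-new-{counter}" in running:
            counter += 1
        running[f"pod-new-{counter}"] = target
    return running


# node2 has failed, taking two of the three replicas with it.
before = {"pod-a": "node1", "pod-b": "node2", "pod-c": "node2"}
after = reconcile(3, before, healthy_nodes={"node1", "node3"})
```

In this sketch the surviving pod is left untouched and two replacements are placed on the remaining healthy nodes, restoring the desired count of three.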
In this paper, we analyze the Deployment controller, which is used for deploying stateless applications. The remainder of the paper is organized as follows. Section 4 discusses the deployment of a Kubernetes cluster in the public cloud. Section 5 analyzes the failure challenge and its countermeasures.
4 Deployment of Application in Kubernetes Cluster in Public Cloud
Here, following [1], it is assumed that a Kubernetes cluster composed of virtual machines is running in the public cloud. Kubernetes [15, 16] runs on each VM [14], and together the VMs present a unified view of the cluster. One of these VMs is chosen as the master and is responsible for managing the nodes. Where high availability is required, one should consider a high-availability cluster comprising more than one master. However, such a setup is still under testing and immature for Kubernetes, so we proceed with a single master only and keep all master failures out of scope. To keep things simple and clear, we consider an application composed of one microservice. Two ways to expose services in the public cloud environment are investigated:
(1) Using a Service of Type Load Balancer: Fig. 3 shows the architecture for deploying applications [1] in a Kubernetes cluster by employing a service of type Load Balancer. Together with the cluster IP, an external IP address is automatically set to the IP address of the cloud service provider’s load balancer. Using this external IP address, the pods become reachable from outside the cluster.
Fig. 3 Deployment of an application to Kubernetes clusters (in public cloud)
(2) An Ingress: [17] A situation may arise in which multiple services need to be exposed externally; with services of type Load Balancer, each service requires its own load balancer. Using a Kubernetes ingress resource instead gives the flexibility to place more than one service behind it as backends and so reduces the number of load balancers used. The ingress is deployed and exposed through a service of type Load Balancer. Hence, all requests for services are received by the ingress controller and then redirected to the relevant service according to the rules described in the ingress resource.
Deployment of Application in Kubernetes Cluster in Private Cloud. The main difference between the public and the private cloud lies in the way the application is exposed externally. In the private cloud there are also two ways of exposing applications:
(1) External Load Balancer: Fig. 4 illustrates the architecture for exposing services using an external load balancer. The disadvantage of this architecture is that one external load balancer is needed for each service in the cluster that is exposed externally.
(2) Ingress: This [17] is a more structured method for exposing a service. A service of type Cluster IP is created to redirect all requests to the pods and is used as a backend for the ingress resource. In a private cloud, integrating the ingress controller with the Kubernetes cluster is not straightforward, and no adequate documentation is available for doing so. Figure 4 displays a general architecture for exposing services in the private cloud to the external world through an ingress.
Fig. 4 Deployment of an application to Kubernetes clusters in a private cloud
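The contrast between one load balancer per service and a single ingress that routes by rule can be made concrete with a small sketch. The rule table, service names, and helper functions below are hypothetical, and the plain string-prefix match is a simplification of how a real ingress controller matches paths.

```python
# Hypothetical ingress rules: (path prefix, backend service name).
INGRESS_RULES = [
    ("/video", "video-service"),
    ("/api", "api-service"),
    ("/", "web-service"),
]


def route(path, rules=INGRESS_RULES):
    """Return the backend service for a request path, or None if no rule matches.

    The longest matching prefix wins, so "/" acts as the default backend.
    """
    matches = [(prefix, service) for prefix, service in rules if path.startswith(prefix)]
    if not matches:
        return None
    return max(matches, key=lambda m: len(m[0]))[1]


def load_balancers_needed(num_services, use_ingress):
    """One load balancer in front of the ingress, or one per exposed service."""
    return 1 if use_ingress else num_services


backend = route("/video/stream")
```

With three exposed services, the per-service approach needs three load balancers, whereas the ingress approach needs one in front of the ingress controller regardless of how many backends the rules name.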
4.1 Survey to Examine Kubernetes Impact
Survey Question: The main aim of this survey is to examine existing investigations and facts in order to evaluate the impact of Kubernetes on digital transformation: the availability Kubernetes achieves for its managed services solely through its repair features, the effect of adding redundancy on the availability achieved with Kubernetes, and the availability achievable with the most responsive configuration of Kubernetes compared with existing solutions.
5 Analysis of Deployment Failure Strategies and Measures
Here we review the failure scenarios and availability measures already discussed by researchers [1]. For this, a Kubernetes cluster is set up in a private cloud, composed of 3 virtual machines running on OpenStack [1]. All the VMs run Ubuntu 16.04 with Kubernetes v1.8.2, and the container runtime is Docker v17.09 [18]. The Network Time Protocol (NTP) is used to synchronize time between nodes, and the application used is VideoLAN Client (VLC). VLC is installed on a container image referenced in the pod template. After the pod is deployed, an application container is created from this image and begins streaming from a file. The metrics used to assess the availability of Kubernetes [17] are as follows:
1. Reaction Time: The time between the introduction of the failure event and the first response of Kubernetes indicating that the failure was detected.
2. Repair Time: The time between Kubernetes’ first response and the restoration of the failed pod.
3. Recovery Time: The time between Kubernetes’ first response and the moment the service becomes functional again.
4. Outage Time: The time during which the service is unavailable. It is equal to the sum of the reaction time and the recovery time (Fig. 5).
Repair Actions with the Default Settings of Kubernetes for Maintaining Availability, and the Effect of Redundancy [1].
Service Outage because of VLC Container Failure: Researchers simulated this scenario by killing the VLC container process from the OS. As soon as the container crashes, Kubelet detects it and changes the state of the pod so that it no longer receives new requests. The pod is then removed from the endpoint list; the time up to this point is the reaction time. Kubelet then restarts the VLC container. This duration is called the repair time.
The recovery time is the time until the pod is added back to the endpoint list and is again ready to accept new requests.
Service Outage because of Pod Container Failure: Here, the researchers deployed a pod along with an application container. As the pod container is a process in the operating system, it can crash. The researchers simulated this by killing the pod container process.
Fig. 5 Metrics to evaluate Kubernetes
Service Outage because of Node Failure: Researchers simulated the failure of a node by running the Linux reboot command on the virtual machine hosting the pod.
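The four availability metrics defined in this section can be derived from four timestamps of a failure experiment. The sketch below assumes such timestamps are available in seconds; the class `FailureTimeline`, its field names, and the sample values are hypothetical and are not taken from the experiments in [1].

```python
from dataclasses import dataclass


@dataclass
class FailureTimeline:
    failure: float           # moment the failure is introduced
    first_reaction: float    # first response of Kubernetes to the failure
    repaired: float          # failed pod restored
    service_restored: float  # service functional again


def availability_metrics(t):
    """Compute reaction, repair, recovery, and outage time from a timeline."""
    reaction = t.first_reaction - t.failure
    repair = t.repaired - t.first_reaction
    recovery = t.service_restored - t.first_reaction
    return {
        "reaction": reaction,
        "repair": repair,
        "recovery": recovery,
        "outage": reaction + recovery,  # outage = reaction time + recovery time
    }


# Hypothetical timestamps, for illustration only.
metrics = availability_metrics(
    FailureTimeline(failure=10.0, first_reaction=10.5, repaired=13.5, service_restored=13.75)
)
```

Note that repair time does not enter the outage directly: when a redundant pod can take over, the service may be restored well before the failed pod is repaired.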
6 Result Analysis
Table 1 shows the reaction time [1] of Kubelet to the VLC container failure, measured up to the removal of the IP address of Pod1 from the endpoint list, and compares it with the case of pod container failure. The repair time in all scenarios is the time between the creation of the new pod and the moment video streaming starts again. Removal of the IP of Pod1 from the endpoint list takes 0.579 s for VLC container failure and 0.696 s for pod container failure, whereas for node failure the reaction time is 38.554 s. The repair times are 0.499, 30.986, and 262.178 s, respectively.
Repair Actions with the Most Responsive Configuration of Kubernetes for Maintaining Availability [1]. Here, the researchers [1] conducted the experiment with two architectures: the No-Redundancy and the N-Way Active redundancy models. When the pod container process fails, a graceful termination signal is sent to Docker to terminate the application container, which delays the repair by 30 s. The recovery time of the No-Redundancy model is affected by this grace period because a new pod is not created until the old pod has terminated completely. To remove this delay, the pod template is updated and the grace period is set to zero. To obtain the most responsive Kubernetes configuration, the Kubelet of every node is reconfigured to send the node status to the master every second. The experiment is then repeated to analyze the impact of these reconfigurations on the service outage due to pod and node failure, respectively.
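The grace-period change described above corresponds to the pod-level field `terminationGracePeriodSeconds`, whose default of 30 s matches the delay reported. A minimal sketch of a pod template with this field set to zero is shown below as a Python dictionary; the label, container name, and image are hypothetical placeholders, and the exact manifest used in [1] is not reproduced here.

```python
import json

# Pod template fragment with the termination grace period set to zero.
pod_template = {
    "metadata": {"labels": {"app": "vlc"}},          # hypothetical label
    "spec": {
        "terminationGracePeriodSeconds": 0,          # default is 30 seconds
        "containers": [
            {"name": "vlc", "image": "example/vlc"}  # hypothetical name and image
        ],
    },
}

# Serialize to JSON, a format in which Kubernetes manifests can be written.
manifest = json.dumps(pod_template, indent=2)
print(manifest)
```

The once-per-second status reporting is a separate, node-level setting of the Kubelet (its node status update frequency) and is therefore not part of the pod template.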
7 Result Analysis
The results of these experiments are shown in Tables 2 and 3. As anticipated, Table 2 shows a marked decrease in repair time, which reduces the service interruption for the No-Redundancy model. The service interruption for the N-Way Active redundancy model does not change, because repair time plays no role in interrupting the service in that model. Moreover, Table 3 shows a large change in all measured metrics. With the new configuration, the master tolerates only a single missed status update, and since every node sends a status update every second, the reaction time is reduced to about 1 s. The repair time also decreases by a large margin, as the master now has to wait only a single second before it starts a new pod on a healthy node.
Table 1 Reaction time of kubelet to VLC container
Failure trigger (in secs) | VLC container failure | Pod container failure | Node failure
Reaction time | 0.579 | 0.696 | 38.554
Repair time | 0.499 | 30.986 | 262.178
Recovery time | 0 | 0.034 | 0.028
Outage time | 0.579 | 0.73 | 38.582
Table 2 Service outage because of pod container failure
Redundancy model (unit: seconds) | No-redundancy | N-way active
Reaction time | 0.708 | 0.521
Repair time | 3.039 | 3.008
Recovery time | 3.337 | 0.032
Outage time | 4.045 | 0.554
Table 3 Service outage because of node failure
Redundancy model (unit: seconds) | No-redundancy | N-way active
Reaction time | 0.976 | 0.849
Repair time | 2.791 | 2.173
Recovery time | 2.998 | 0.022
Outage time | 3.974 | 0.872
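The values in Tables 2 and 3 can be checked against the definition of outage time as the sum of reaction and recovery time, and the effect of redundancy can be quantified. The sketch below only re-uses the numbers reported in the two tables; the variable and function names are hypothetical.

```python
# Values copied from Tables 2 and 3 (seconds).
TABLE_2 = {  # pod container failure
    "No-redundancy": {"reaction": 0.708, "recovery": 3.337, "outage": 4.045},
    "N-way active": {"reaction": 0.521, "recovery": 0.032, "outage": 0.554},
}
TABLE_3 = {  # node failure
    "No-redundancy": {"reaction": 0.976, "recovery": 2.998, "outage": 3.974},
    "N-way active": {"reaction": 0.849, "recovery": 0.022, "outage": 0.872},
}


def outage_gap(row):
    """Difference between the reported outage and reaction + recovery."""
    return abs(row["reaction"] + row["recovery"] - row["outage"])


def redundancy_gain(table):
    """Ratio of the No-redundancy outage to the N-way active outage."""
    return table["No-redundancy"]["outage"] / table["N-way active"]["outage"]


gaps = [outage_gap(row) for table in (TABLE_2, TABLE_3) for row in table.values()]
gain_pod = redundancy_gain(TABLE_2)
gain_node = redundancy_gain(TABLE_3)
```

The reported outage times agree with the sum of reaction and recovery time to within about a millisecond, which is consistent with rounding. On these figures, the N-Way Active model shortens the outage by a factor of roughly 7 for pod container failure and roughly 4.6 for node failure.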
8 Conclusion
In this paper, the results of experiments done by researchers are analyzed, showing how Kubernetes supports automatic deployment and scaling of microservice-based applications. Though Kubernetes can run in any environment, deployment in a private cloud is not as straightforward as it is in public clouds. Kubernetes provides availability through its repair actions, but these alone are not adequate, particularly for high availability. However, redundancy can greatly reduce downtime, because the service is restored as soon as Kubernetes detects the failure and so does not depend on the repair time of the faulty unit. It can therefore fairly be said that Kubernetes has become a much easier and more seamless way to deploy applications on the cloud than using virtual machines, and in that sense Kubernetes is a NextGen Virtualization. The adoption of containers is widespread and still offers many research possibilities, mainly in the area of security and in using containers to deliver serverless computing. The authors will work on further analysis in this promising domain.
References
1. Abdollahi Vayghan L, Saied MA, Toeroe M, Khendek F (2018) Deploying microservice based applications with kubernetes: experiments and lessons learned. In: 2018 IEEE 11th international conference on cloud computing (CLOUD). pp 970–973. https://doi.org/10.1109/CLOUD.2018.00148
2. Medel V, Rana O, Bañares J, Arronategui U (2016) Modelling performance & resource management in kubernetes. pp 257–262. https://doi.org/10.1145/2996890.3007869
3. Tolosana-Calasanz R, Bañares JA, Colom JM (2014) Towards petri net-based economical analysis for streaming applications executed over cloud infrastructures. In: Economics of grids, clouds, systems, and services-11th international conference, GECON’14, Cardiff, UK, September 16–18, 2014., ser. LNCS, vol 8914. pp 189–205
4. Chen X, Xiaoping D, Fan X, Giuliani G, Zhongyang H, Wang W, Liu J, Wang T, Yan Z, Zhu J, Jiang T, Guo H (2022) Cloud-based storage and computing for remote sensing big data: a technical review. Int J Digit Earth 15(1):1417
5. Wu Q, Yu J, Lu L, Qian S, Xue G (2019) Dynamically adjusting scale of a kubernetes cluster under QoS guarantee. In: Parallel and distributed systems (ICPADS) 2019 IEEE 25th international conference on, pp 193–200
6. https://www.ibm.com/cloud/blog/containers-vs-vms. Accessed 31 Dec 2021
7. Soltesz S, Pötzl H, Fiuczynski ME, Bavier A, Peterson L (2007) Container-based operating system virtualization: a scalable, high-performance alternative to hypervisors. SIGOPS Oper Syst Rev 41(3):275–287
8. Dragoni N, Lanese I, Larsen ST, Mazzara M, Mustafin R, Safina L (2018) Microservices: how to make your application scale. In: Perspectives of system informatics. pp 95–104
9. Kanso A, Toeroe M, Khendek F (2014) Comparing redundancy models for high availability middleware. Computing 96(10):975–993
10. Netto HV, Lung LC, Correia M, Luiz AF, Sá de Souza LM (2017) State machine replication in containers managed by Kubernetes. J Syst Architect 73:53–59
11. Integrating open SAF high availability solution with open stack-IEEE conference publication. [Online]. https://ieeexplore.ieee.org/abstract/document/7196529. Accessed 12 Nov 2022
12. Container and microservice driven design for cloud infrastructure devOps. IEEE conference publication. [Online]. https://ieeexplore.ieee.org/abstract/document/7484185. Accessed 22 Nov 2022
13. https://www.forbes.com/sites/janakirammsv/2019/12/16/how-kubernetes-has-changed-theface-of-hybrid-cloud/?sh=4e39f951228d
14. https://thenewstack.io/the-case-for-virtual-kubernetes-clusters/. Accessed 01 Jan 2022
15. Kubernetes, Kubernetes. [Online]. https://kubernetes.io/. Accessed 12 Dec 2021
16. Kubernetes Documentation, Kubernetes. [Online]. https://kubernetes.io/docs/home/. Accessed 23 Jan 2022
17. Ahmad I, AlFailakawi MG, AlMutawa A, Alsalman L (2021) Container scheduling techniques: a survey and assessment. J King Saud Univ-Comput Inf Sci
18. Docker-Build, Ship, and Run Any App, anywhere. [Online]. https://www.docker.com/. Accessed 12 Jan 2022
A New Task Offloading Scheme for Geospatial Fog Computing Environment Using M/M/C Queueing Approach
Manoj Ranjan Mishra, Bibhuti Bhusan Dash, Veena Goswami, Sandeep Nanda, Sudhansu Shekhar Patra, and Rabindra Kumar Barik
Abstract Geospatial fog computing has become an attractive paradigm for supporting delay-sensitive jobs in the Internet of Spatial Things (IoST): it offers scalable, shared computational resources for geospatial applications in addition to geospatial cloud services. Delay-sensitive jobs are offloaded to the geospatial fog computing environment to reduce overall communication time. A good offloading technique selects a suitable computing device, that is, a fog node, according to the task's resource needs while meeting the deadline. This article proposes a new offloading technique for scheduling and processing tasks generated by IoST devices on appropriate computing devices. An M/M/c queueing model is used to model the system and to find the optimal number of virtual machines (VMs) to activate in the fog layer for processing the offloaded tasks. The scheme decreases the queue waiting time of delay-sensitive tasks and optimizes the required number of VMs in the fog layer. The efficacy of the proposed technique is validated with simulation results.
M. R. Mishra · B. B. Dash · V. Goswami · S. S. Patra (B) · R. K. Barik
School of Computer Applications, KIIT Deemed to Be University, Bhubaneswar, India
S. Nanda
School of Computer Engineering, KIIT Deemed to Be University, Bhubaneswar, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 B. Unhelkar et al. (eds.), Advances and Applications of Artificial Intelligence & Machine Learning, Lecture Notes in Electrical Engineering 1078, https://doi.org/10.1007/978-981-99-5974-7_10
Keywords Cloud computing · Fog computing · Task offloading · M/M/c queuing model · Geospatial big data
1 Introduction
Nowadays, many cloud computing workloads are generated by mobile devices. Many of these tasks demand high computing capability and consume considerable power. Because edge devices have limited computing capability, computation offloading is used. Fog computing is one of the newer and more promising computing paradigms and serves as an assistive layer for cloud computing. It has the potential to be extremely useful in geospatial cloud computing environments that deal with massive amounts of geospatial big data [1, 2]. Figure 1 shows a fog-assisted geospatial cloud computing environment. As shown in Fig. 2, tasks submitted by mobile users are not sent directly to the cloud layer but are first sent to fog nodes for processing. If a VM is free when a task is offloaded, the task is processed immediately; otherwise, it waits in the waiting queue. This paper provides an analytical model based on the M/M/c queueing system and studies the performance of such a system. The model is used to illustrate some of the system's characteristics, and its validity is demonstrated by numerical evaluations and simulations. The rest of the article is arranged as follows. Section 2 reviews previous work in this area. Section 3 establishes the model. Section 4 assesses the model with numerical and simulation results. Section 5 concludes the article and outlines future directions.
Fig. 1 Fog-assisted geospatial cloud computing environment
Fig. 2 Fog-assisted geospatial cloud computing environment with M/M/C queueing approach
2 Related Work
Cloud and fog computing are the technologies that serve both naive and specialized users who want to be close to the level of information storage, processing, analysis, networking, and delivery that they need. Compared with a cloud computing-based paradigm, the Internet of Things (IoT) layer, or the clients, are positioned closer to the fog nodes, which process and deliver geospatial data and send it on to cloud data centers. First, the services are distributed very close to the end user. The fog tier also provides an intermediary layer between the end devices and the cloud servers. Local servers are used to store as much geographical data as possible. These geospatial data sets contain intelligence data that is wrapped and sent to the cloud servers. By using these techniques, fog-assisted cloud computing frameworks can decrease latency and delay [3–5]. Although the fog computing paradigm can perform better than the cloud for geospatial big data applications, the cloud is still preferred. Fog cannot be removed from cloud computing; instead, the two must work together as required. Figure 1 shows the fog-assisted geospatial cloud computing architecture for IoST services. In a fog-based paradigm, the end users' requests are transmitted to the fog tier and processed by the fog servers, and the results are returned to the end users [6, 7]. Geospatial storage and analysis for users take place at the cloud tier, which is managed by the system. When the workload is unevenly distributed across the fog nodes/servers, the resource deployment components, which include the fog nodes/servers, must be properly positioned. Inefficient resource management and resource usage reduce the overall quality of service (QoS) of the system and increase its energy consumption [8, 9].
Table 1 The considered concepts of fog computing in queueing approaches Authors Yousefpour et al. [9] Liu et al. [5] Barik et al. [7]
Exponential queue model √
Non-exponential queue model
√ √ √
Patra et al. [13]
√
Panigrahi et al. [2]
√
Nikoui et al. [4]
This section reviews previous work that applies queueing theory and analytical modeling in distributed computing environments, together with its implications for future work. Several kinds of queue model have been considered, including batch service, batch arrival, and tandem queues [10–13]. Table 1 shows the queue models considered in the reviewed publications.
3 Establishing the Model
The system is modeled as an M/M/c queue, where c is the number of VMs in the fog layer that serve the offloaded tasks. The service and arrival rates are

$$\mu_n = \begin{cases} n\mu, & 1 \le n \le c \\ c\mu, & c < n \end{cases} \tag{1}$$

$$\lambda_n = \lambda \tag{2}$$
The arrival of tasks at the fog layer follows a Poisson process, and the service time of each task on a VM follows a negative exponential distribution. The waiting capacity of the system is infinite. If all the VMs are busy when a task is offloaded to the fog layer, the task has to wait in the waiting queue. Figure 3 shows the state transition diagram of the system.
Fig. 3 State transition diagram
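The queueing behaviour described above (Poisson arrivals, exponential service times, c VMs, unlimited waiting room) can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the authors' simulator: the function name `simulate_mmc_wait`, the seed handling, and the number of simulated tasks are hypothetical, and tasks are assumed to be served in arrival order.

```python
import heapq
import random


def simulate_mmc_wait(lam, mu, c, n_tasks, seed=0):
    """Estimate the mean queueing delay of an FCFS M/M/c queue by simulation.

    lam: task arrival rate, mu: service rate of one VM, c: number of VMs.
    All names here are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    free_at = [0.0] * c              # time at which each VM next becomes idle
    heapq.heapify(free_at)
    now = 0.0
    total_wait = 0.0
    for _ in range(n_tasks):
        now += rng.expovariate(lam)          # exponential gaps = Poisson arrivals
        earliest = heapq.heappop(free_at)    # the VM that becomes idle first
        start = max(now, earliest)           # wait only if every VM is busy
        total_wait += start - now
        heapq.heappush(free_at, start + rng.expovariate(mu))
    return total_wait / n_tasks


if __name__ == "__main__":
    # lambda = 8 and mu = 5 are the values used in Sect. 4; c is varied here.
    for vms in (2, 3, 4):
        print(vms, simulate_mmc_wait(8.0, 5.0, vms, n_tasks=50_000))
```

The estimate for a given c can be compared with the closed-form waiting time derived in this section; the two should agree more closely as the number of simulated tasks grows.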
Table 2 List of key notations
Computational variable    Meaning
λ                         Tasks offloading rate
μ                         Tasks served rate
c                         Number of VMs in the fog layer
Let ρ = λ/μ denote the service intensity and ρ_c = λ/(cμ) the intensity per VM. When ρ_c = λ/(cμ) < 1, the system can reach the steady state and has a stationary distribution, so tasks will not queue up in infinite numbers. Table 2 lists the key notations used in the article, and Table 3 lists the result variables. For a single server, we know that

$$P_n = \rho^n P_0 = \left(\frac{\lambda}{\mu}\right)^n P_0 \tag{3}$$
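For c servers this can be rewritten as

$$P_n = \frac{\lambda^n}{\mu \cdot 2\mu \cdot 3\mu \cdots n\mu} P_0 = \left(\frac{\lambda}{\mu}\right)^n \frac{1}{n!} P_0, \quad n < c \tag{4}$$

and

$$P_n = \frac{\lambda^n}{\mu \cdot 2\mu \cdot 3\mu \cdots c\mu \cdot c\mu \cdots c\mu} P_0 = \left(\frac{\lambda}{\mu}\right)^n \frac{1}{c!\, c^{n-c}} P_0, \quad n \ge c \tag{5}$$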
In the steady state,

$$\sum_{k=0}^{\infty} P_k = 1 \tag{6}$$

Therefore, the probability that there are no tasks in the fog layer is

$$P_0 = \left[\sum_{n=0}^{c-1} \frac{1}{n!}\rho^n + \frac{1}{c!}\rho^c \frac{c\mu}{c\mu - \lambda}\right]^{-1} \tag{7}$$

Table 3 List of all result variables
Result variable    Meaning
C                  The probability that the tasks offloaded to the system must be queued
Lq                 Average number of tasks in the queue
L                  Average number of tasks that are currently offloaded in the system
Wq                 Tasks average queueing time
W                  Tasks average waiting time in the system
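Through analysis of the system, the following target parameters can be derived. With ρ_c = ρ/c, the probability that a task offloaded to the fog system must queue is

$$C(c, \rho) = \sum_{k=c}^{\infty} P_k = \sum_{k=c}^{\infty} \frac{c^c}{c!} \rho_c^{k} P_0 = \sum_{k=c}^{\infty} P_c\, \rho_c^{k-c} = \frac{P_c}{1 - \rho_c} = \frac{c P_c}{c - \rho} \tag{8}$$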
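With ρ_c = λ/(cμ), the average queue length, that is, the average number of tasks waiting in the queue, is

$$L_q = \sum_{n=c}^{\infty} (n-c) P_n = \sum_{n=c}^{\infty} (n-c) \frac{\rho^n}{c!\, c^{n-c}} P_0 = \frac{\rho_c (c\rho_c)^c}{c!\,(1-\rho_c)^2} P_0 \tag{9}$$

The average system length, that is, the average number of tasks in the system (waiting or in service), is

$$L = L_q + \frac{\lambda}{\mu} = \frac{\rho_c (c\rho_c)^c}{c!\,(1-\rho_c)^2} P_0 + \rho \tag{10}$$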
The average waiting time of a task in the queue is

$$W_q = \frac{L_q}{\lambda} \tag{11}$$

The average time a task stays in the system is

$$W = \frac{L}{\lambda} = W_q + \frac{1}{\mu} \tag{12}$$
When the system is stable, the fog layer can activate the smallest number of VMs that still meets the required average waiting time. The system then not only processes the offloaded tasks but also reduces the operational cost and contributes to green energy use [14].
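The idea of activating the fewest VMs that still meet a waiting-time requirement can be sketched directly from Eqs. (7)–(12). The code below is a minimal sketch, not the authors' procedure: the function names `mmc_metrics` and `min_vms`, the threshold `max_wq`, and the search limit `c_limit` are hypothetical, and a plain linear search over c is assumed.

```python
from math import factorial


def mmc_metrics(lam, mu, c):
    """Steady-state M/M/c metrics following Eqs. (7)-(12)."""
    rho = lam / mu            # service intensity, rho = lambda / mu
    rho_c = rho / c           # intensity per VM, lambda / (c * mu)
    if rho_c >= 1.0:
        raise ValueError("unstable queue: need lam < c * mu")
    # Eq. (7): probability that the fog layer holds no tasks
    head = sum(rho**n / factorial(n) for n in range(c))
    tail = rho**c / factorial(c) * (c * mu) / (c * mu - lam)
    p0 = 1.0 / (head + tail)
    p_c = rho**c / factorial(c) * p0                     # Eq. (5) at n = c
    queue_prob = c * p_c / (c - rho)                     # Eq. (8)
    lq = rho_c * (c * rho_c) ** c / (factorial(c) * (1.0 - rho_c) ** 2) * p0  # Eq. (9)
    l_sys = lq + rho                                     # Eq. (10)
    wq = lq / lam                                        # Eq. (11)
    w = wq + 1.0 / mu                                    # Eq. (12)
    return {"P0": p0, "C": queue_prob, "Lq": lq, "L": l_sys, "Wq": wq, "W": w}


def min_vms(lam, mu, max_wq, c_limit=1000):
    """Smallest c whose mean queueing time Wq does not exceed max_wq (assumed criterion)."""
    c = int(lam // mu) + 1    # smallest c that satisfies lam < c * mu
    while c <= c_limit:
        if mmc_metrics(lam, mu, c)["Wq"] <= max_wq:
            return c
        c += 1
    raise ValueError("no c up to c_limit meets the target")


if __name__ == "__main__":
    print(mmc_metrics(8, 5, 2))
    print(min_vms(8, 5, max_wq=0.05))
```

Because Wq decreases as c grows, the first c that satisfies the threshold is also the cheapest one; a stricter criterion (for example a bound on W or on the queueing probability) could be swapped in at the comparison line.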
4 Numerical and Simulation Examples
We illustrate the numerical results obtained from the presented queueing model using MAPLE 18. In all the figures, the parameters are λ = 8 and μ = 5. Figure 4 shows the number of user requests in the system (Ls) versus the number of VMs (c): Ls decreases as c increases and is constant from c = 6 onwards. Figure 5 gives Lq versus c. Here also Lq decreases as c increases and is constant from c = 5 onwards. Figure 6 depicts the relationship between the cdf and the number of tasks offloaded to the fog layer for various values of c. The cdf increases with the number of tasks in the fog layer and stays at 1 beyond a certain number of tasks.
Fig. 4 Ls versus c
Fig. 5 Lq versus c
Fig. 6 Number of tasks versus CDF
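The cdf curves of Fig. 6 can be approximated from the state probabilities in Eqs. (4), (5), and (7). The following is a minimal sketch, assuming λ = 8 and μ = 5 as in this section; the function names `state_probabilities` and `cdf` and the truncation point `n_max` are hypothetical and are not taken from the authors' MAPLE worksheets.

```python
from math import factorial


def state_probabilities(lam, mu, c, n_max):
    """Return [P_0, ..., P_n_max] for a stable M/M/c queue."""
    if lam >= c * mu:
        raise ValueError("unstable queue: need lam < c * mu")
    rho = lam / mu
    # Eq. (7): normalising constant
    head = sum(rho**n / factorial(n) for n in range(c))
    tail = rho**c / factorial(c) * (c * mu) / (c * mu - lam)
    p0 = 1.0 / (head + tail)
    probs = []
    for n in range(n_max + 1):
        if n < c:
            probs.append(rho**n / factorial(n) * p0)                       # Eq. (4)
        else:
            probs.append(rho**n / (factorial(c) * c ** (n - c)) * p0)      # Eq. (5)
    return probs


def cdf(lam, mu, c, n_max):
    """Cumulative probability of having at most n tasks, for n = 0..n_max."""
    running = 0.0
    values = []
    for p in state_probabilities(lam, mu, c, n_max):
        running += p
        values.append(running)
    return values


if __name__ == "__main__":
    for vms in (2, 3, 4):
        curve = cdf(8, 5, vms, 20)
        print(vms, [round(v, 3) for v in curve[:6]], round(curve[-1], 4))
```

With more VMs the curve starts higher and approaches 1 after fewer tasks, which is the qualitative behaviour the text attributes to Fig. 6.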
5 Conclusions and Future Work
Fog computing for geospatial applications has emerged as a promising paradigm for supporting time-sensitive tasks in the Internet of Spatial Things (IoST). It provides flexible, shared computing and communication resources and a low-cost infrastructure for processing tasks offloaded from the edge layer to the fog nodes. Once the tasks are offloaded to the fog layer, they follow the M/M/c queueing model and wait in the waiting buffer if all the VMs are busy. They are then processed by the VMs in FCFS order. We provided numerical examples in the form of figures that will help providers model their systems. In the future, advanced queueing models can be used for priority tasks offloaded to the fog layer.
References
1. Al Ahmad M, Patra SS, Bhattacharya S, Rout S, Mohanty SN, Choudhury S, Barik RK (2021) Priority based VM allocation and bandwidth management in SDN and fog environment. In: 2021 8th international conference on computing for sustainable global development (INDIACom). IEEE, pp 29–34
2. Rout S, Patra SS, Mohanty JR, Barik RK, Lenka RK (2021) Energy aware task consolidation in fog computing environment. In: Intelligent data engineering and analytics 2021. Springer, Singapore, pp 195–205
3. Barik RK, Dubey H, Samaddar AB, Gupta RD, Ray PK (2016) FogGIS: fog computing for geospatial big data analytics. In: 2016 IEEE Uttar Pradesh section international conference on electrical, computer and electronics engineering (UPCON). IEEE, pp 613–618
4. Nikoui TS, Rahmani AM, Balador A, Javadi HH (2022) Analytical model for task offloading in a fog computing system with batch-size-dependent service. Comput Commun
5. Liu L, Chang Z, Guo X (2018) Socially aware dynamic computation offloading scheme for fog computing system with energy harvesting devices. IEEE Internet Things J 5(3):1869–1879
6. Naik M, Barik L, Kandpal M, Patra SS, Jena S, Barik RK (2021) EVMAS: an energy-aware virtual machine allocation scheme in fog centers. In: 2021 2nd international conference for emerging technology (INCET). IEEE, pp 1–6
7. Panigrahi S, Barik RK, Mukherjee P, Pradhan C, Patro R, Patra SS (2021) Optimization policy for different arrival modes in IoT assisted geospatial fog computing environment. In: 2021 2nd international conference for emerging technology (INCET). IEEE, pp 1–6
8. Patra SS, Mittal M, Jude Hemantha D, Ahmad MA, Barik RK (2022) Performance evaluation and energy efficient VM placement for fog-assisted IoT environment. In: Energy conservation solutions for fog-edge computing paradigms. Springer, Singapore, pp 129–146
9. Yousefpour A, Ishigaki G, Gour R, Jue JP (2018) On reducing IoT service delay via fog offloading. IEEE Internet Things J 5(2):998–1010
10. Zhou Z, Chang Z, Liao H (2021) Dynamic computation offloading scheme for fog computing system with energy harvesting devices. In: Green internet of things (IoT): energy efficiency perspective. Springer, Cham, pp 143–161
11. Khazaei H, Misic J, Misic VB (2011) Modelling of cloud computing centers using M/G/m queues. In: 2011 31st international conference on distributed computing systems workshops. IEEE, pp 87–92
12. Outamazirt A, Barkaoui K, Aïssani D (2018) Maximizing profit in cloud computing using M/G/c/k queuing model. In: 2018 international symposium on programming and systems (ISPS). IEEE, pp 1–6
13. Goswami V, Patra SS, Mund GB (2012) Performance analysis of cloud with queue-dependent virtual machines. In: 2012 1st international conference on recent advances in information technology (RAIT). IEEE, pp 357–362
14. Patra SS (2018) Energy-efficient task consolidation for cloud data center. Int J Cloud Appl Comput (IJCAC) 8(1):117–142
Face Recognition Using Deep Neural Network with MobileNetV3-Large
Sakshi Bisht, Abhishek Singhal, and Charu Kaushik
Abstract With the advancement of computer technology, many advanced techniques rely on machine vision, and biometric systems in particular play an essential part. A face recognition system takes image or video data containing a face, then locates and analyzes the face region. Face detection and recognition are performed by a group of Artificial Intelligence (AI) techniques. Face recognition has a broad range of uses and has become a thriving topic of study. In our proposed model, we employ AveragePooling2D with MobileNetV3Large. The model is trained on the collected dataset after a few preprocessing steps. According to the findings, MobileNetV3 with AveragePooling2D reaches an accuracy of 91.12 percent on the training dataset and 99.79 percent on the validation dataset after 50 epochs.
Keywords Face recognition · Face detection · MobileNetV3Large · CNN · Preprocessing · AveragePooling2D
S. Bisht (B) · A. Singhal · C. Kaushik
Department of Computer Science and Engineering, ASET, Amity University Uttar Pradesh, Noida, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 B. Unhelkar et al. (eds.), Advances and Applications of Artificial Intelligence & Machine Learning, Lecture Notes in Electrical Engineering 1078, https://doi.org/10.1007/978-981-99-5974-7_11
1 Introduction
An observer can glean a great deal of useful information about an individual by looking at their face. A face can reveal an individual's mood, attitude, and intent, and it can also be used to recognize them. Of course, a person may be recognized without looking at their face. When face evidence is unavailable, speech, body structure, movement, and sometimes even clothes can all be used to verify identity. Nonetheless, the face is by far the most distinctive and widely used cue for personal identification, and losing the ability to recognize faces has a significant impact on people's lives [1]. Detection, extraction, and recognition are the three phases in developing a strong biometric system. The face detection stage is used to identify individual faces in a picture. This stage determines whether the incoming image contains any human faces. Face detection may be hampered by variations in lighting and expression. In the extraction stage, the characteristics of a face found in the detection stage are extracted. This stage represents a face with a "signature" that specifies the major characteristics of the face image and their geometric distribution. Lastly, the face recognition stage compares the extracted features with those of known individuals stored in a database. Face recognition may be used for identification and verification [2]. Some challenges faced by recognition systems are illumination, pose, occlusion, expressions, low resolution, aging, and model complexity.
Illumination: Variations in lighting are referred to as illumination. Even the smallest variation in lighting is a significant difficulty for automatic face recognition and considerably affects its success.
Pose: Recognition systems are extremely sensitive to changes in pose. Whenever the head moves and the viewing angle changes, the pose of the face changes. Head movements or different camera viewpoints inevitably create high intra-class variance.
Occlusion: Occlusion refers to blocking of part of the face, which prevents the entire face from being used as an input picture. It is one of the most difficult problems for recognition systems.
Expressions: Different conditions lead to different moods and emotions, and the resulting changes in expression cause challenges for recognition.
Low Resolution: A low-resolution image, smaller than 16 × 16 pixels, does not convey enough detail.
Aging: The appearance of a person's face changes over time as they get older, which is an issue.
Model Complexity: Current recognition systems rely on complicated convolutional neural networks (CNNs) that are too complex and inadequate for real-time use.
In this paper, we employ an upgraded CNN to increase accuracy. By adopting AveragePooling2D and applying batch normalization to keep the training and validation stages consistent, the model size is greatly decreased, improving recognition performance and quality. The following are the study's primary contributions:
• This study focuses on the MobileNetV3Large algorithm for facial recognition to achieve better and faster outcomes.
• It demonstrates the value of hyperparameter adjustment.
• It uses batch normalization to stabilize training and AveragePooling2D to boost accuracy.
The remainder of this paper is structured as follows: Section 2 briefly reviews important related work. Section 3 describes the recognition system's preprocessing and procedures. Section 4 uses experiments to demonstrate the method's usefulness. Lastly, Sect. 5 presents conclusions and future work.
2 Related Work
This section reviews different techniques for face recognition. Liu et al. [3] addressed deep face recognition under the open-set protocol, in which the angular softmax loss allows CNNs to learn angularly discriminative features. Cao et al. [4] presented VGGFace2, a large new face dataset. They trained ResNet-50 convolutional neural networks on VGGFace2, MS-Celeb-1M, and their union to measure face recognition accuracy. Wang et al. [5] approached the same goal from a different angle and presented a new loss function called large margin cosine loss (LMCL). Through normalization and maximization of the cosine decision margin, minimum intra-class variance and maximum inter-class variance are attained. The resulting model is named CosFace. Ranjan et al. [6] proposed HyperFace. The authors also offer two HyperFace variants: (a) HyperFace-ResNet, which builds on ResNet-101 and provides good performance, and (b) Fast-HyperFace, which improves the algorithm's speed by using a high-recall fast face detector. Wang et al. [7] proposed additive margin softmax, a large-margin strategy used in classification to encourage intra-class variance reduction. Tran et al. [8] propose DR-GAN, which has three major features. First, in addition to image synthesis, the encoder–decoder structure of the generator enables DR-GAN to learn a representation that is both generative and discriminative. Second, this representation is explicitly disentangled from other face variations such as pose, with the pose estimated in the discriminator. Third, it can take one or more images as input and produce one unified representation. Ding et al. [9] first artificially blur training data composed of clear still images. They then propose the TBE-CNN model, which extracts complementary information from holistic face images and from patches cropped around facial components. Yang et al. [10] present a two-dimensional image-matrix-based error model, nuclear norm based matrix regression (NMR), for feature representation and classification. Gao et al. [11] suggest an approach based on semi-supervised sparse representation (S3RC). The key ideas are that (a) a variation dictionary is used to characterize linear nuisance variables via the sparsity framework, and (b) prototype face images are estimated as a gallery dictionary via a Gaussian mixture model (GMM) with mixed labeled and unlabeled samples. Yin et al. [12] explore multi-task learning. They present a pose-directed multi-task CNN that groups different poses and learns pose-specific identity features jointly across all poses. Cavazos et al. [13] examine data-driven factors and scenario modeling factors that affect how algorithms perform. They present data from four face recognition algorithms for East Asian and Caucasian faces to show how these issues apply. Cho et al. [14] offer the relational graph module, a graph-structured module that extracts global relational information in addition to general face features. Zhou et al. [15] propose a face and gender recognition system based on convolutional neural networks. The system consists of two modules: one for face recognition and one for gender recognition. Mantoro et al. [16] proposed a face recognition approach that employs a combination of Haar cascades and Eigenface methods to identify several faces (55 in total) in a single detection step, with 91.67 percent accuracy.
3 Methodology
Detection, alignment, representation, and classification are the four steps of facial recognition currently in use. A classification algorithm can still determine identity and gender from a facial description taken against a complex background (Fig. 1).
3.1 Dataset
A dataset is a collection of information. The dataset used in this study is taken from Kaggle [18]. The Kaggle dataset is a collection of 100 different celebrities, from which 40 classes were chosen; we then added more images to those classes. Each class in our dataset has more than 700 photos. In total, 28,733 images were used to create this model. The dataset is split into a training set and a validation set in the ratio 8:2 (Table 1).
3.2 Pre-processing
The face area in a complex background is first detected using Multi-task Cascaded Convolutional Networks (MTCNN). Every picture is first normalized to 224 by 224 pixels, and a 128D representation is created. Face detection and alignment are used as pre-processing steps to prepare for the later face recognition.
Fig. 1 Face recognition system [17]
Table 1 Dataset count
Identity    Total     Training    Validation
            28,733    22,987      5746
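The 8:2 split summarized in Table 1 can be sketched as a shuffled hold-out split. This is only an illustration: the helper name `split_dataset`, the seed, and the choice to floor the validation share are assumptions, and the paper does not say how the authors actually partitioned the images (for example, whether the split was done per class).

```python
import random


def split_dataset(items, val_fraction=0.2, seed=42):
    """Shuffle items and split them into (training, validation) lists.

    The validation size is the floor of val_fraction * len(items);
    this rounding rule is an assumption made for illustration.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    # Stand-in for the 28,733 image paths of the dataset.
    image_ids = range(28733)
    train, val = split_dataset(image_ids)
    print(len(train), len(val))
```

Under this rounding rule the sizes for 28,733 images match the counts in Table 1.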
Fig. 2 MobileNetV3 last stage [19]
3.3 MobileNetV3Large Model
MobileNetV3 is a convolutional neural network proposed by Howard et al. in 2019 [19]. It is the next generation of MobileNets, based on a combination of complementary search techniques and a new architecture design. MobileNetV3 is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) and the NetAdapt algorithm, and is then further improved through novel architectural advances. MobileNetV3Large and MobileNetV3Small are two MobileNetV3 models aimed at high and low resource use cases, respectively (Fig. 2). Compared with MobileNetV2, MobileNetV3Large is 3.2 percent more accurate on ImageNet classification while reducing latency by 15 percent. MobileNetV3Small is 4.6 percent more accurate than MobileNetV2 while reducing latency by 5 percent. On COCO detection, MobileNetV3Large is 25 percent faster at roughly the same accuracy as MobileNetV2.
3.4 Hyperparameter Tuning
The model's design and training procedure depend heavily on a set of hyperparameters. Before a model can be applied to a new set of data, all hyperparameters must be chosen correctly. Because deep learning methods are usually governed by many hyperparameters, they need to be tuned so that the model's capabilities are used as fully as possible. Tuning these hyperparameters is not an easy process. A variety of hyperparameters related to MobileNetV3 need to be tuned: the number of hidden layers and nodes, the dropout rate, the activation function, the number of epochs, the batch size, the learning rate, and the number of units in the dense layer are only a few of them. In Table 2, CCE and SCCE stand for categorical cross-entropy and sparse categorical cross-entropy, respectively. There are two basic approaches to hyperparameter optimization: manual and automatic. Manual optimization is time-consuming because it relies on experts, who must determine how the hyperparameters affect the model's result. After reviewing various papers and performing experiments, we selected the hyperparameters manually, as shown in Table 2.
Table 2 Hyperparameters selected for the face recognition model
Name of hyperparameter    Given values       Optimal value determined
Batch size                16, 32, 64         32
Epochs                    30, 40, 50, 100    40, 50
Optimizer                 Nadam, Adam        Adam
Loss function             CCE, SCCE          SCCE
Activation function       ReLU, softplus     Softplus
Dropout                   0.3, 0.4, 0.5      0.5
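Two of the choices in Table 2, the softplus activation and the sparse categorical cross-entropy (SCCE) loss, can be illustrated in plain Python. This is a minimal sketch of the standard definitions for a single sample, with hypothetical function names; it is not the training code used in the study, which would rely on a deep learning framework.

```python
import math


def softplus(x):
    """Numerically stable softplus, log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def categorical_cross_entropy(probs, one_hot):
    """CCE for one sample: the label is given as a one-hot vector."""
    return -sum(t * math.log(p) for p, t in zip(probs, one_hot) if t > 0)


def sparse_categorical_cross_entropy(probs, class_index):
    """SCCE for one sample: the label is given as an integer class index."""
    return -math.log(probs[class_index])


if __name__ == "__main__":
    predicted = [0.7, 0.2, 0.1]          # softmax output for a 3-class toy example
    print(softplus(0.0))
    print(categorical_cross_entropy(predicted, [1, 0, 0]))
    print(sparse_categorical_cross_entropy(predicted, 0))
```

Both losses give the same value for the same sample; SCCE simply avoids building one-hot label vectors, which is convenient when there are many classes (40 in this study).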
4 Result
We used a celebrity image dataset [18] to develop the model. This study focuses on the MobileNetV3Large algorithm [19] for facial recognition to achieve better and faster outcomes. We employ AveragePooling2D rather than a fully connected layer before the final output, followed by the softmax layer, which reduces the size of the network. We trained our model for five different numbers of epochs (10, 20, 30, 40, 50) to observe the changes in accuracy and loss. After 50 epochs of training, the validation accuracy saturates and no longer improves with each epoch. The validation accuracy was 99.79 percent and the validation loss was 0.0083 after the model was trained for 50 epochs. Table 3 shows the experimental outcomes. We computed the accuracy and loss curves for the model; face verification accuracy and loss are plotted for both the training and validation sets. Figure 3 shows the training and validation accuracy and loss with respect to epochs. We tested the face recognition system on the dataset and found that certain photos were misclassified, as shown in Table 4. Most of the errors are caused by face angle and occlusion. Some are associated with the face-angle problem, others with the occlusion problem, some with the illumination problem, and the remaining ones with the complexity problem.
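The size reduction obtained by pooling before the classifier, mentioned at the start of this section, can be made concrete with a parameter count. The sketch below is illustrative only: the 7 × 7 × 960 feature-map shape is an assumption about the backbone output at 224 × 224 input rather than a figure reported in the paper, and the helper name `dense_params` is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


# Assumed backbone output shape at 224 x 224 input (not stated in the paper).
HEIGHT, WIDTH, CHANNELS = 7, 7, 960
NUM_CLASSES = 40  # number of celebrity classes used in Sect. 3.1

# Classifier fed by the flattened feature map.
flatten_head = dense_params(HEIGHT * WIDTH * CHANNELS, NUM_CLASSES)
# Classifier fed by a 7 x 7 average pooling, which leaves one value per channel.
pooled_head = dense_params(CHANNELS, NUM_CLASSES)

print(flatten_head, pooled_head, flatten_head / pooled_head)
```

Under these assumptions the pooled head needs roughly 49 times fewer classifier parameters, since pooling collapses the 7 × 7 spatial grid to a single value per channel.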
Table 3 Training and validation accuracy and loss of the model
Epochs    Accuracy    Loss      Val. accuracy    Val. loss
10        69.97       1.0022    92.92            0.2956
20        83.43       0.5543    98.96            0.0475
30        87.64       0.4145    99.56            0.0175
40        89.69       0.3424    99.70            0.0113
50        91.12       0.2963    99.79            0.0083
Fig. 3 Validation loss and validation accuracy graph with respect to epochs
Table 4 Model output of face recognition on training and validation set
          Training set (22,987)                              Validation set (5746)
Epochs    Correctly identified    Incorrectly identified    Correctly identified    Incorrectly identified
10        21,359                  1628                      5339                    407
20        22,748                  239                       5686                    60
30        22,885                  102                       5720                    26
40        22,918                  69                        5728                    18
50        22,939                  48                        5736                    10
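The counts in Table 4 can be turned into identification rates with a few lines of arithmetic. This is a small sketch for readers who want to recompute the percentages; the dictionary layout and the function name `identification_rate` are hypothetical, and the numbers are copied from Table 4.

```python
# (correctly identified, incorrectly identified) per epoch, copied from Table 4
TABLE_4 = {
    10: {"train": (21359, 1628), "val": (5339, 407)},
    20: {"train": (22748, 239), "val": (5686, 60)},
    30: {"train": (22885, 102), "val": (5720, 26)},
    40: {"train": (22918, 69), "val": (5728, 18)},
    50: {"train": (22939, 48), "val": (5736, 10)},
}


def identification_rate(correct, incorrect):
    """Share of correctly identified images, in percent."""
    return 100.0 * correct / (correct + incorrect)


if __name__ == "__main__":
    for epochs, row in TABLE_4.items():
        train_rate = identification_rate(*row["train"])
        val_rate = identification_rate(*row["val"])
        print(epochs, round(train_rate, 2), round(val_rate, 2))
```

Each row sums to the set sizes given in the table header (22,987 training and 5746 validation images), which is a quick consistency check on the transcription.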
5 Conclusion
This research used the MobileNetV3Large deep learning model for facial recognition. Preprocessing and recognition were the two main steps of the whole process. We employ AveragePooling2D rather than a fully connected layer before the final output, followed by the softmax layer, which reduces the size of the network. Developing such a simple structure increased recognition accuracy. The hyperparameter tuning process helped improve the model's performance. Our work achieves decent facial recognition accuracy, but there is still room for improvement, for example when working with half-visible or low-visibility faces. Another option is to embed other algorithms into our base model to improve its accuracy. In future, we will also assess our method's effectiveness in other areas such as detection, pose prediction, object tracking, and pedestrian detection.
References
1. Bruce V, Young A (1986) Understanding face recognition. Br J Psychol 77(3):305–327. https://doi.org/10.1111/j.2044-8295.1986.tb02199.x
2. Kortli Y, Jridi M, Al Falou A, Atri M (2020) Face recognition systems: a survey. Sensors 20:342. https://doi.org/10.3390/s20020342
3. Liu W, Wen Y, Yu Z, Li M, Raj B, Song L (2017) SphereFace: deep hypersphere embedding for face recognition. In: IEEE conference on computer vision and pattern recognition (CVPR). pp 6738–6746. https://doi.org/10.1109/CVPR.2017.713
4. Cao Q, Shen L, Xie W, Parkhi OM, Zisserman A (2018) VGGFace2: a dataset for recognising faces across pose and age. In: 2018 13th IEEE international conference on automatic face & gesture recognition (FG 2018). pp 67–74. https://doi.org/10.1109/FG.2018.00020
5. Wang H, Wang Y, Zhou Z, Ji X, Gong D, Zhou J, Li Z, Liu W (2018) CosFace: large margin cosine loss for deep face recognition. In: 2018 IEEE/CVF conference on computer vision and pattern recognition. pp 5265–5274. https://doi.org/10.1109/CVPR.2018.00552
6. Ranjan R, Patel VM, Chellappa R (2019) HyperFace: a deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. IEEE Trans Pattern Anal Mach Intell 41(1):121–135. https://doi.org/10.1109/TPAMI.2017.2781233
7. Wang F, Cheng J, Liu W, Liu H (2018) Additive margin softmax for face verification. IEEE Signal Process Lett 25(7):926–930. https://doi.org/10.1109/LSP.2018.2822810
8. Tran L, Yin X, Liu X (2017) Disentangled representation learning GAN for pose-invariant face recognition. In: IEEE conference on computer vision and pattern recognition (CVPR). pp 1283–1292. https://doi.org/10.1109/CVPR.2017.141
9. Ding C, Tao D (2018) Trunk-branch ensemble convolutional neural networks for video-based face recognition. IEEE Trans Pattern Anal Mach Intell 40(4):1002–1014. https://doi.org/10.1109/TPAMI.2017.2700390
10. Yang J, Luo L, Qian J, Tai Y, Zhang F, Xu Y (2017) Nuclear norm based matrix regression with applications to face recognition with occlusion and illumination changes. IEEE Trans Pattern Anal Mach Intell 39(1):156–171. https://doi.org/10.1109/TPAMI.2016.2535218
11. Gao Y, Ma J, Yuille AL (2017) Semi-supervised sparse representation based classification for face recognition with insufficient labeled samples. IEEE Trans Image Process 26(5):2545–2560. https://doi.org/10.1109/TIP.2017.2675341
12. Yin X, Liu X (2018) Multi-task convolutional neural network for pose-invariant face recognition. IEEE Trans Image Process 27(2):964–975. https://doi.org/10.1109/TIP.2017.2765830
13. Cavazos JG, Phillips PJ, Castillo CD, O'Toole AJ (2021) Accuracy comparison across face recognition algorithms: where are we on measuring race bias? IEEE Trans Biom, Behav, Identity Sci 3(1):101–111. https://doi.org/10.1109/TBIOM.2020.3027269
14. Cho M, Kim T, Kim I-J, Lee K, Lee S (2021) Relational deep feature learning for heterogeneous face recognition. IEEE Trans Inf Forensics Secur 16:376–388. https://doi.org/10.1109/TIFS.2020.3013186
15. Zhou Y, Ni H, Ren F, Kang X (2019) Face and gender recognition system based on convolutional neural networks. IEEE Int Conf Mechatron Autom (ICMA) 2019:1091–1095. https://doi.org/10.1109/ICMA.2019.8816192
16. Mantoro T, Ayu MA, Suhendi (2018) Multi-faces recognition process using Haar cascades and eigenface methods. In: 2018 6th international conference on multimedia computing and systems (ICMCS). pp 1–5. https://doi.org/10.1109/ICMCS.2018.8525935
17. Napoléon T, Alfalou A (2017) Pose invariant face recognition: 3D model from single photo. Opt Lasers Eng 89:150–161
18. Kumar R Bollywood celebrity faces in Kaggle. https://www.kaggle.com/datasets/havingfun/100-bollywood-celebrity-faces
19. Howard A, Sandler M, Chen B, Wang W, Chen LC, Tan M, Chu G, Vasudevan V, Zhu Y, Pang R, Adam H, Le QV (2019) Searching for MobileNetV3. In: 2019 IEEE/CVF international conference on computer vision (ICCV). pp 1314–1324. https://doi.org/10.1109/ICCV.2019.00140
Detection of BotNet Using Extreme Learning Machine Tuned by Enhanced Sine Cosine Algorithm

Nebojsa Bacanin, Miodrag Zivkovic, Zlatko Hajdarevic, Aleksandar Petrovic, Nebojsa Budimirovic, Milos Antonijevic, and Ivana Strumberger
Abstract The rise in popularity of the internet of things (IoT), driven by smart home systems and improvements in embedded device technologies, has raised interest in such solutions. However, as these technologies have become more widely used, the security risks have increased as well. Connecting a large number of devices to the same network is risky, because a network is only as secure as its weakest part, and in the case of IoT the weakest link can be a small home appliance. This research aims to improve the means for detecting and preventing attacks on such networks via a hybrid method that combines machine learning and population-based metaheuristics. The study introduces an enhanced version of the recently proposed sine cosine algorithm (SCA), which is used for tuning an extreme learning machine (ELM). The hybrid method was validated on the UNSW-NB15 dataset, which consists of features that describe regular (normal) and botnet traffic, where the latter refers to the distributed denial of service (DDoS) attack. The performance metrics of the hybrid framework were compared with those of several classic machine learning models, as well as with ELM solutions tuned by other population-based metaheuristics. Based on the experimental data, it can be concluded that the proposed method obtains the best performance on average.

Keywords IoT · Cybersecurity · DDoS · ELM · Metaheuristics · Swarm intelligence

N. Bacanin (B) · M. Zivkovic · Z. Hajdarevic · A. Petrovic · N. Budimirovic · M. Antonijevic · I. Strumberger
Singidunum University, Danijelova 32, 11000 Belgrade, Serbia

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
B. Unhelkar et al. (eds.), Advances and Applications of Artificial Intelligence & Machine Learning, Lecture Notes in Electrical Engineering 1078, https://doi.org/10.1007/978-981-99-5974-7_12
1 Introduction

The importance of cybersecurity has grown substantially with the rising number of connected devices. Advances in technology allow an ever larger number of electronic devices to communicate and to achieve combined functionality, that is, functions that no single device can perform on its own. An example of such a scenario is the programming of smart LED systems within an in-house system. A network of such connected devices is referred to as the internet of things (IoT). In the twenty-first century, the IoT has become a significant technology for data collection and analysis that allows companies to better understand their products and customer needs, provide better service, base decisions on more accurate data, and improve their business.

The continuous growth in the number of IoT devices has led to an increase in network attacks, because there are more potential targets. Attacks on IoT devices increased by more than 200% in recent years, according to the SonicWall report [15]. Distributed denial of service (DDoS) is a type of attack that focuses on disrupting the communication between the connected sides, which are typically a host that provides a service and a client that uses it. In the case of IoT, blocking the communication means that no device receives packet traffic. This situation is dangerous for several reasons. To start with, some home appliances could catch fire in this case due to improper handling. Additionally, these scenarios increase the chance of a hardware error, because a device is more likely to deviate from its predicted behavior. Finally, in terms of security, once the network of devices is down the attacker has more room for gaining physical entrance to the property, since the devices are more vulnerable when they have stopped communicating with each other and with the owner of the system.
Therefore, one of the most important challenges in IoT is to differentiate between normal and BotNet (DDoS) traffic. Fortunately, if a historic dataset is available, machine learning (ML) models can help resolve this issue. However, since each dataset is specific, ML models need to be tuned for every specific task. Additionally, traditional ML training algorithms, such as stochastic gradient descent (SGD), are prone to becoming trapped in local optima.

The research proposed in this study tries to tackle the above-mentioned ML issues and to improve classification performance for detecting BotNet traffic on IoT networks. For this purpose, a special type of artificial neural network (ANN), the extreme learning machine (ELM), is chosen due to its efficiency and simplicity. However, the number of neurons in the ELM's hidden layer needs to be determined for this particular problem. Also, since the ELM does not require classical training, the generated weights and biases between the input and hidden layers in most cases do not yield satisfactory solutions. Determining the number of neurons in the ELM's hidden layer (hyper-parameter optimization) and the values of the weights and biases (training) represents an NP-hard challenge, and this research proposes an enhanced version of the recently proposed sine cosine algorithm (SCA) [22] for solving this task. It is well known that population-based metaheuristics, such as SCA, are efficient NP-hard problem solvers [2, 13, 26].

The remainder of the paper is structured as follows: Sect. 2 provides a literature review and a description of the components of the proposed hybrid solution; Sect. 3 focuses on the proposed metaheuristic method; Sect. 4 presents the experimental setup and discusses the results of the research; and the final Sect. 5 briefly summarizes the findings and outlines possibilities for future improvements.
2 Background and Related Work

This section briefly introduces background information related to the proposed research, along with a literature review.

2.1 BotNet and DDoS

A bot is an automated piece of software that acts according to programmed rules. Applications of bots vary from harmless conversational agents, which serve either for entertainment or for providing information, to malicious bots whose purpose is to produce negative effects on the victim. This work focuses on harmful bots that target IoT systems with the purpose of blocking the flow of information between the devices in such a network. Multiple bots performing a coordinated DDoS attack are referred to as a botnet. Among these units, a central unit, known as the botmaster, is distinguished. The botnet grows as the number of infected computers increases, and a single infected computer is known as a bot. These machines, as well as their users, are unaware of such activities, as the bot does not necessarily have any visible effect on the host machine. The botmaster exerts control over the network and attacks the target from anywhere between a few hundred infected devices and extensively large botnets numbering in the tens of thousands. The attacker's goal is to have as large a network of bots as possible, so that the attack can gain greater momentum. Bot devices can perform harmful processes for years before being discovered.
2.2 Extreme Learning Machine

The ELM is applied as a training algorithm for single (hidden) layer feedforward neural networks (SLFNs). Hidden neurons are initialized randomly, and the Moore-Penrose (MP) generalized inverse is used for analytical determination of the output weights [19, 20]. As already mentioned in Sect. 1, the two most important challenges with ELMs are as follows [19]: first, a proper number of neurons in the hidden layer must be determined so that good generalization performance is achieved without over-fitting; second, the weights and biases between the input and hidden layers are generated only once, which cannot guarantee satisfactory model performance.

The mathematical formulation of ELM is as follows. For a set of training examples $\mathcal{N} = \{(x_i, t_i) \mid x_i \in R^d, t_i \in R^m, i = 1, \ldots, N\}$ and an activation function $g(x)$, the output of an SLFN with $L$ hidden neurons is represented as [20]:

$$\sum_{i=1}^{L} \beta_i \, g(w_i \cdot x_j + b_i) = y_j, \quad j = 1, \ldots, N, \tag{1}$$

where $w_i = [w_{i1}, \ldots, w_{id}]$ and $b_i$ denote the input weight and the bias of hidden neuron $i$, respectively, $\beta_i = [\beta_{i1}, \ldots, \beta_{im}]$ represents the output weight, and $w_i \cdot x_j$ indicates the inner product of $w_i$ and $x_j$. The following equation estimates the parameters $\beta_i$, $i = 1, \ldots, L$:

$$\sum_{i=1}^{L} \beta_i \, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, \ldots, N. \tag{2}$$

Eq. (2) can be written in matrix form, as suggested in [20]:

$$H \beta = T, \tag{3}$$

where

$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}, \tag{4}$$

$$\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}, \tag{5}$$

and

$$T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}. \tag{6}$$

Matrix $H$ is the output of the hidden layer. The output weight $\beta$ can be calculated analytically as the minimum norm least-squares solution:

$$\beta = H^{\dagger} T, \tag{7}$$

where $H^{\dagger}$ represents the Moore-Penrose generalized inverse of $H$.
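The training procedure in Eqs. (1)-(7) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the implementation used in the experiments: the sigmoid activation, the uniform initialization range [-1, 1], and the function names (`hidden_output`, `solve`, `train_elm`, `predict`) are all hypothetical. The sketch also assumes that the hidden-layer matrix H has full column rank, in which case the Moore-Penrose inverse equals (H^T H)^{-1} H^T and Eq. (7) reduces to solving the normal equations.

```python
import math
import random


def sigmoid(z):
    """Logistic activation g(z); the choice of g is an assumption."""
    return 1.0 / (1.0 + math.exp(-z))


def hidden_output(X, W, b):
    """Build H as in Eq. (4): H[j][i] = g(w_i . x_j + b_i)."""
    return [[sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + bi)
             for w, bi in zip(W, b)] for x in X]


def solve(A, B):
    """Solve A Z = B by Gaussian elimination with partial pivoting."""
    n, m = len(A), len(B[0])
    M = [row_a[:] + row_b[:] for row_a, row_b in zip(A, B)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + m):
                M[r][c] -= f * M[col][c]
    Z = [[0.0] * m for _ in range(n)]
    for r in range(n - 1, -1, -1):
        for k in range(m):
            s = M[r][n + k] - sum(M[r][c] * Z[c][k] for c in range(r + 1, n))
            Z[r][k] = s / M[r][r]
    return Z


def train_elm(X, T, L, seed=0):
    """Draw (w_i, b_i) at random once, then compute beta analytically.

    Assumption: H has full column rank, so H^+ = (H^T H)^{-1} H^T and
    beta = H^+ T (Eq. 7) is the solution of (H^T H) beta = H^T T.
    """
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(L)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(L)]
    H = hidden_output(X, W, b)
    N, m = len(H), len(T[0])
    HtH = [[sum(H[j][p] * H[j][q] for j in range(N)) for q in range(L)]
           for p in range(L)]
    HtT = [[sum(H[j][p] * T[j][k] for j in range(N)) for k in range(m)]
           for p in range(L)]
    beta = solve(HtH, HtT)
    return W, b, beta


def predict(X, W, b, beta):
    """Network output y_j = sum_i beta_i g(w_i . x_j + b_i), as in Eq. (1)."""
    H = hidden_output(X, W, b)
    m = len(beta[0])
    return [[sum(h[i] * beta[i][k] for i in range(len(beta)))
             for k in range(m)] for h in H]
```

A call such as `W, b, beta = train_elm(X, T, L=3)` followed by `predict(X, W, b, beta)` fits and evaluates the network. Because the input weights and biases are never revisited after the random draw, the quality of the fit depends on that draw and on L, which is exactly the sensitivity that motivates tuning these quantities with a metaheuristic.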
2.3 Population-Based Metaheuristics

Population-based metaheuristics are optimization methods that incrementally improve a problem solution over iterations. In general, they conduct the search by simultaneously performing exploitation and exploration with a set (population) of solutions. According to one taxonomy, population-based metaheuristics can be divided into nature-inspired and metaphor-based methods. The most notable representative of nature-inspired metaheuristics is swarm intelligence, while examples of metaphor-based algorithms include SCA [22] and the gravitational search algorithm (GSA) [24]. Swarm intelligence approaches, in turn, include artificial bee colony (ABC) [21], the bat algorithm (BA) [28], and Harris hawks optimization (HHO) [18].

The domain with the most applications of population-based metaheuristics is that of NP-hard challenges from the field of computer science. These applications range from feature selection [5, 33], cloud-edge environment task scheduling [2, 10, 13], localization and lifetime maximization of wireless sensor networks [9, 29, 31], tuning and training of artificial neural networks [1, 3, 4, 8, 26], price and value prediction [25], and assisted medical diagnosis [11, 12, 27] to COVID-19 related applications [30, 32, 34], among others [6]. An extended survey of the literature reveals that population-based metaheuristics have not been widely applied to ELM optimization, with only a few approaches found [7, 14, 16, 17].
3 Proposed Method

In this section, the original version of the SCA metaheuristic is presented first, followed by an overview of the enhanced version proposed in this study.
The SCA is a recently proposed population-based metaheuristic, introduced by Mirjalili for tackling global optimization challenges [22]. As is the case with all other metaheuristic approaches, the SCA's search process is founded on two main mechanisms: exploration, in which the algorithm combines random solutions to find promising regions (avoiding local solutions), and exploitation, in which existing solutions from the population are combined (searching within the vicinity of local solutions). The SCA conducts exploration and exploitation by using the following search equations, inspired by basic trigonometric functions [22]:

$$X_i^{t+1} = X_i^t + r_1 \cdot \sin(r_2) \cdot \left| r_3 \cdot P_i^t - X_i^t \right| \tag{8}$$

$$X_i^{t+1} = X_i^t + r_1 \cdot \cos(r_2) \cdot \left| r_3 \cdot P_i^t - X_i^t \right|, \tag{9}$$

where $r_1$, $r_2$, and $r_3$ denote random numbers from the interval $[0, 1]$, $X_i^t$ and $X_i^{t+1}$ represent solution $i$ in iterations $t$ and $t+1$, respectively, and $P_i^t$ is the current best solution (point) in the population. In every iteration, every individual is updated by applying either Eq. (8) or Eq. (9), according to [22]:

$$X_i^{t+1} = \begin{cases} X_i^t + r_1 \cdot \sin(r_2) \cdot \left| r_3 \cdot P_i^t - X_i^t \right|, & \text{for } r_4 < 0.5 \\ X_i^t + r_1 \cdot \cos(r_2) \cdot \left| r_3 \cdot P_i^t - X_i^t \right|, & \text{for } r_4 \geq 0.5, \end{cases} \tag{10}$$

where $r_4$ is another pseudo-random number that controls the above-mentioned choice. During the algorithm's execution, the range of the sine and cosine functions, and therefore the balance between intensification and diversification, is controlled by adjusting $r_1$ with the following expression:

$$r_1 = a - t \frac{a}{T}, \tag{11}$$

where $a$ is a constant and $T$ is the total number of iterations in a run.
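The update rule of Eqs. (10) and (11) can be illustrated with a short Python sketch. It is a simplified reading of the basic SCA and not the authors' code. Several choices are assumptions made for the sketch: $r_2$ is drawn from $[0, 2\pi]$ and $r_3$ from $[0, 2]$, so that the sine and cosine terms can take both signs, $r_1$ is set deterministically by Eq. (11), the destination point $P$ is taken as the best solution found so far, positions are clipped to the search bounds, and the function name `sca_minimize` and its parameters are hypothetical.

```python
import math
import random


def sca_minimize(f, dim, lower, upper, pop_size=20, T=200, a=2.0, seed=0):
    """Minimize f over [lower, upper]^dim with a basic sine cosine search."""
    rng = random.Random(seed)
    X = [[rng.uniform(lower, upper) for _ in range(dim)]
         for _ in range(pop_size)]
    best = min(X, key=f)[:]          # destination point P (best so far)
    best_fit = f(best)
    history = [best_fit]
    for t in range(T):
        r1 = a - t * a / T           # Eq. (11): shrinks linearly from a to 0
        for x in X:
            for k in range(dim):
                r2 = rng.uniform(0.0, 2.0 * math.pi)   # assumed range
                r3 = rng.uniform(0.0, 2.0)             # assumed range
                r4 = rng.random()
                # Eq. (10): sine branch if r4 < 0.5, cosine branch otherwise
                trig = math.sin(r2) if r4 < 0.5 else math.cos(r2)
                x[k] += r1 * trig * abs(r3 * best[k] - x[k])
                x[k] = min(max(x[k], lower), upper)    # stay inside bounds
        for x in X:                  # refresh the destination point
            fx = f(x)
            if fx < best_fit:
                best, best_fit = x[:], fx
        history.append(best_fit)
    return best, best_fit, history
```

For example, `sca_minimize(lambda x: sum(v * v for v in x), dim=3, lower=-10, upper=10)` searches for the minimum of the sphere function. Early iterations have a large $r_1$ and therefore take large steps (exploration), while late iterations have $r_1$ close to zero and only make small moves around the destination point (exploitation).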
3.1 Suggested Improved SCA

The original variant of the SCA metaheuristic is considered very powerful in various optimization tasks; however, as is the case with all other metaheuristic algorithms, it has certain known drawbacks. Extensive experiments executed over a standard benchmark function test suite have shown that the original SCA implementation excels in exploration during the early rounds of execution. Nevertheless, the simulations have also exposed problems in later rounds, where the basic SCA can suffer from a lack of exploitation strength.
In order to address this known flaw of the original SCA variant, the enhanced SCA suggested in this research adopts the following simple strategy: in the last third of the rounds of a run ($t \in [2/3 \cdot T, T]$), when the algorithm should narrow down, converge, and exploit the favorable regions of the search space, the worst individual is removed in every iteration and replaced by a novel, randomly produced individual. The novel solution is generated by a guided-random approach, as its parameters are produced inside the borders defined by the arithmetic mean of the parameters of the best solution and the median parameter values of the complete population. The guided-random approach therefore generates the new individual as

$$X_i = rnd \cdot params, \tag{12}$$

where $params$ denotes an array of parameters calculated as $params = (params\_best + params\_median)/2$, while $rnd$ is a pseudo-random number in $[0, 2]$. The proposed improved SCA is named enhanced SCA (eSCA), and its workflow is shown in Algorithm 1.

Algorithm 1 Pseudo-code of suggested eSCA algorithm
Initialize search agents (solutions) (X)
while t […]

[…] 0.7 is considered reliable. The α value obtained is 0.72, which is greater than 0.7; hence, this questionnaire can be considered reliable [12, 13].

Kaiser–Meyer–Olkin (KMO) and Bartlett's Test of Sphericity. Before conducting factor analysis, it is necessary to determine whether the data are appropriate for the factor extraction procedure. The KMO statistic indicates the proportion of variance in the variables that might be caused by underlying factors. In general, high values (close to 1.0) indicate that factor analysis could be useful for the data. If the value is less than 0.50, the results of the factor analysis are unlikely to be very useful. The KMO statistic obtained here, 0.85, is large (greater than 0.50). Bartlett's test of sphericity tests whether the correlation matrix is an identity matrix, which would indicate that the variables are unrelated and hence unsuitable for structure discovery. If the significance value is less than 0.05, a factor analysis could work with the data [14, 15].
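For reference, the two suitability statistics discussed above are commonly defined as shown below. These are the standard textbook forms and are not taken from this chapter, which reports only the resulting values. The symbols are introduced here for illustration: $r_{ij}$ is the correlation between variables $i$ and $j$, $u_{ij}$ is their partial correlation given all other variables, $R$ is the correlation matrix, $n$ is the sample size, and $p$ is the number of variables.

```latex
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} u_{ij}^{2}},
\qquad
\chi^{2} = -\left(n - 1 - \frac{2p + 5}{6}\right) \ln \lvert R \rvert,
\qquad
\mathrm{df} = \frac{p(p - 1)}{2}.
```

When the partial correlations are small relative to the ordinary correlations, KMO approaches 1. A correlation matrix close to the identity has $\lvert R \rvert$ close to 1, so the Bartlett statistic is close to 0 and the test is not significant. With $p = 25$ items, the test has $25 \cdot 24 / 2 = 300$ degrees of freedom.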
5 Data Analysis Technique Employed

The R tool is used for predictive and statistical analysis. It serves as a tool for data cleaning, analysis, and representation, and it can handle a wide range of tasks, such as data manipulation, data analysis, and graphics. The methodologies employed in this research are exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). The dataset is subsequently analyzed using cluster analysis.
5.1 Exploratory Factor Analysis

Principal Component Analysis and Factor Reduction. Factor analysis (FA) was used to reduce the large number of variables in the dataset and to bring out the structure of the relationships among them. RStudio was used for the factor analysis. FA is a statistical technique for explaining the variation among related variables by means of a small number of unobserved variables known as factors. Although FA and principal component analysis (PCA) are similar, they are not identical. FA is commonly used to identify the structure behind the variables and to estimate scores that quantify the latent factors themselves, whereas
V. Nehra et al.
principal components analysis is commonly used to determine the best ways to arrange variables into a restricted number of subsets. FA is used to calculate the numerical values of the significance and correlation observed among the attributes. The data are presented in a more intelligible way by reducing the dimensionality and showing the principal components. This 2-D representation reveals some trends in the data, but additional information may be gained by investigating the driving mechanism that distributes or ranks the samples in the 2-D plane. In this test, however, five factors must be extracted, based on their relatedness, to summarize the supplied items. By reducing the plotting area and improving the clarity of observation in the derived factors, PCA makes it possible to view higher-dimensional variables [16].

Scree Plot. The scree plot helps determine how many components to retain. It displays the eigenvalue of each component in the initial solution. In general, the components on the steep slope should be extracted, while the components on the shallow slope have a negligible impact on the outcome. For example, when the final major decrease happens between the third and fourth components, retaining the first three is a straightforward choice. The scree plot is used in exploratory factor analysis (FA) to determine how many factors to keep, or in PCA to determine how many principal components to keep, and thus serves as a method for finding statistically significant factors or components. Figure 1 represents the scree plot, with eigenvalues plotted against factor or component numbers. The amount of total variation explained by the first five components differs significantly, as seen in Fig. 1. The graph becomes practically flat after component 6, indicating that each consecutive component accounts for a substantially smaller proportion of the overall variation. It is concluded that the last major drop happens at the sixth component; hence, the first five components should be used as the final solution (see Fig. 2).

Variance. A rotation technique must be chosen for factor analysis. The Varimax approach is chosen because the goal is to reduce the number of variables to independent factors that best suit the data. The Varimax rotation aims to maximize the variance of the squared loadings within each factor. There are no missing values in the dataset for this study. Initially, a factor analysis has as many factors as variables. Each factor accounts for a fraction of the overall variance in the observed variables, and the factors are usually listed in order of how much variation they account for. The eigenvalue is a measure of how much of the variance in the observed variables a factor explains. Five components with eigenvalues larger than 1 were retained. The total number of variables considered for this study (i.e., 25) was set as the initial number of factors. The final result reports three quantities for each factor (see Table 1). Column 1 lists the SS loadings of each factor. Column 2 (Proportion variance) shows the proportion of variance accounted for by each factor. Column 3 (Cumulative variance) shows the total proportion of variance explained by the current and all preceding factors.

Rotation Component Matrix. An orthogonal rotation is a transformation of the factor loadings that makes them easier to interpret. The correlation or covariance matrix, the residual matrix, the individual variances, and the communalities are all retained by the
Applications of Big Five Personality Test in Job Performance
Fig. 2 Scree plot obtained after the analysis
Table 1 Proportion and cumulative variance with SS loading across the five factors

            SS loadings   Proportion variance   Cumulative variance
Factor 1    2.719         0.109                 0.109
Factor 2    2.099         0.084                 0.193
Factor 3    2.068         0.083                 0.275
Factor 4    2.047         0.082                 0.357
Factor 5    1.598         0.064                 0.421
rotated loadings. The variance accounted for by each factor and the associated percentages change as the loadings change. The axes are rotated to bring them as close as possible to as many points as possible, and each group of variables is assigned to a factor. In some circumstances, however, a variable is close to more than one axis and is therefore related to several factors. Each rotated factor is generally highly correlated with its own group of variables and only weakly with the others. The obtained set of components is numbered, as can be seen. After that, each group is given a name that represents the grouped factor as well as all of the items included in it. Table 2 shows the findings after rotation. Before rotation, the variables loaded heavily onto the first component, so the unrotated solution failed to produce clear results and left the other components only faintly apparent [17].
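The proportion and cumulative variance columns of Table 1 can be reproduced from the SS loadings alone. The sketch below assumes, as the reported values suggest, that each proportion is the SS loading divided by the number of observed variables (25 items); the function name `variance_table` is hypothetical.

```python
def variance_table(ss_loadings, n_variables):
    """Return (SS loading, proportion, cumulative) rows, rounded to 3 places."""
    rows = []
    cumulative = 0.0
    for ss in ss_loadings:
        proportion = ss / n_variables   # share of total variance per factor
        cumulative += proportion        # running total over the factors so far
        rows.append((ss, round(proportion, 3), round(cumulative, 3)))
    return rows


# SS loadings of the five retained factors, as listed in Table 1
ss_loadings = [2.719, 2.099, 2.068, 2.047, 1.598]
rows = variance_table(ss_loadings, n_variables=25)
```

Applied to the SS loadings of Table 1, this yields the proportions 0.109, 0.084, 0.083, 0.082, and 0.064 and the cumulative values 0.109, 0.193, 0.275, 0.357, and 0.421, which match the table. The five factors together therefore account for about 42% of the total variance.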
Table 2 Rotation component matrix of 25 items

Items   Factors                                     Rotated component matrix (components 1 to 5)
O1      Am full of ideas
O2      Avoid difficult reading material            0.510
O3      Carry the conversation to a higher level    0.674
O4      Spend time reflecting on things             0.326
O5      Will not probe deeply into a subject
C1      Am exacting in my work                      0.518
C2      Continue until everything is perfect        0.646
C3      Do things according to a plan               0.595
C4      Do things in a halfway manner               −0.634
C5      Waste my time                               −0.586
E1      Don’t talk a lot                            −0.559
E2      Find it difficult to approach others        −0.657
E3      Know how to captivate people                0.443
E4      Make friends easily                         0.558
E5      Take charge
A1      Am indifferent to the feelings of others    −0.338
A2      Inquire about others’ well-being            0.568
A3      Know how to comfort others                  0.667
A4      Love children                               0.477
A5      Make people feel at ease
N1      Get angry easily                            0.823
N2      Get irritated easily                        0.779
N3      Have frequent mood swings                   0.718
N4      Often feel blue                             0.558
N5      Panic easily                                0.521
                                                    −0.427   −0.489   0.451   0.589
5.2 Confirmatory Factor Analysis

Evaluating Model Fit. In CFA, several statistical tests are employed to measure how well the model fits the data. It is critical to note that a good fit between the model and the data does not imply that the model is “right” or that it explains a large portion of the variation. “Excellent model fit” means only that the model is plausible. The following model fit indices are used to examine the goodness of fit of the model to the given dataset: root mean square error of approximation (RMSEA), goodness-of-fit index (GFI), normed fit index (NFI), Tucker-Lewis index (TLI), comparative fit index (CFI), and adjusted goodness-of-fit index (AGFI). After evaluating the model fit, construct reliability and average variance extracted are also calculated. The values obtained are CFI = 0.774; AGFI = 0.823 (>0.80); TLI = 0.744 (